DEV Community

Soma Györe

Explain python global interpreter lock (GIL) Like I'm Five

Top comments (24)

Eric Appelt

Alice and the Snake Library

Help Wanted

Alice was very smart and did well in school. One day, she saw a sign
that said, "help wanted at Snake Library". Alice was not a snake, but
she was curious, so she read the address on the sign and went to Snake
Library to ask about the job.

When she got to Snake Library the snakes were very happy to see her and
asked her to come in. They gave her some tea and cakes, and then told her
why they needed help. The snakes were very smart and good with their books,
but they did not have hands to move books around and write. They needed
a human to come and help that was smart and good with reading and writing.
The snakes were nice and fun to be around, so Alice took the job.

It turned out to be a strange job, but also exciting since Alice liked
books. The snakes had books of instructions of what to do, but these were
written in Snake. Alice didn't speak Snake, so she would read the instructions
out loud to one of the snakes, and that snake would explain to her what that
instruction meant. Usually this would involve getting a book off the shelf,
putting it on a stack, taking a book off a stack, writing something down
in a book, or putting books back on the shelf. Sometimes there would be
math, like adding or multiplying numbers, but she would need help with this
too as the numbers were written in Snake. Every now and then the instructions
would say to take a book to the post office and mail it far away, or go and
wait for a new book to arrive.

Another human

One morning Alice came in to work and the snakes handed her a golden key.
They said that from now on she would need to use the golden key to open the
big human door, and that it would lock behind her. They said that there would
be another human coming to help the snakes, and his name was Bob.
The snakes said that it was very important that only one human be in the
library at a time, reading instructions, stacking books, and writing things
down. After some time, the snakes would tell her to leave and give Bob the
key so that he could work for a while.

Alice asked, "But why can Bob and I not both work here at once? The library
is very large, and we could each have a different snake help us read the
instructions?" One snake started to explain that they had just set things up
that way, and if Alice and Bob were both supposed to write in the same book
at the same time, everything would get garbled. Another snake said that they
could solve that problem, but yet another snake said that would slow everything
down when only Alice or Bob was working. The snakes spent the rest of the
afternoon arguing and hissing at each other. Alice decided not to ask
this question again.

As Alice and Bob took turns going into the library and working on the snake
books, Alice wondered why they hired Bob at all. It was nice having the
company of another human, but overall it just slowed things down a bit, since
only one of them could work at a time and they wasted time going in and out
of the library.

Then while Alice was in the library working, she read an instruction out loud,
and the snake said that it meant to take a certain book to the post office.
Alice left the library where Bob was waiting outside, and she handed him the
key. Bob went into the library and started working. As Alice walked down the
road to the post office, she realized that it did make sense for the snakes
to hire Bob. If it was only her working for the snakes, then no work would
get done while she walked to the post office. With two humans working, one
could follow instructions in the library while the other walked to the post.

A lot of math

One day Alice came to work in the morning and saw not just Bob, but her
friends Chelsea, Daniel, Erica, and Ferdinand waiting outside. They said that
they had all been hired by Snake Library to help. Alice used her golden key
to go into the library while everyone else waited. Inside, she asked the
snakes if there were going to be a lot of trips to the post office. Surely
there must be a lot to take, since otherwise it wouldn't make sense to
hire so many humans, since only one could work in the library at a time.
The snakes told her, no - this week they needed to do a lot of math, and they
needed to do it fast.

She asked, "But how will all these people help with
doing math, since only one of us can be in the library at a time getting
help from a snake?" The snakes pointed at a big package labeled "Numpy".
The first task of the day was to unpack the new books and put them on the
shelf. As Alice was working on this, the snakes explained that these were
special books. The first and last few pages were written in Snake, but most
of the book would be written in human. Once the snakes helped Alice get
started into one of the math problems, she would be able to take the book
outside and work on the math problems alone. Once she got to the last part,
she would have to wait to get the golden key and go back into the library.

Alice got to work on one of the books and took it outside, handing the key
to Daniel. Alice started doing the math problems written in human. Since she
didn't have to read them out loud to a snake, she could do the math
problems much faster. But there were a lot of math problems in the book!
While she was doing the problems, Daniel came out with his own math book,
then Bob, Chelsea, Ferdinand, and finally Erica. Everyone was sitting outside
doing math problems on a beautiful sunny day. When Alice got to the last
part of the book written in Snake, she took the golden key and went back
into the library. She didn't even have to wait for the key since everyone
else was busy doing math problems outside.

All through the day, everyone sat outside doing math problems. Every now and
then someone would finish and go back inside to get a new book, but since
most of the time was spent outside no one had to wait very long for the
golden key. They got a lot of math done that day, and the snakes were very
happy. The End.

Moral of the Story

The library and snakes represent the Python interpreter, and the books
represent Python objects and instructions. The "Snake language" is Python
bytecode, which is not made of native instructions for the CPU but must be
processed by the interpreter. The humans represent individual execution
threads, and the golden key is the Global Interpreter Lock (GIL).

The GIL prevents any two threads from simultaneously executing Python
bytecode instructions. However, when there is IO to be performed, represented
by going to the post office, a thread will release the GIL so that other
threads can execute Python bytecode instructions.
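The post office trips can be sketched in a few lines. This is only an illustration, not anything from the story's author: `fake_io` and `run_sleepers` are made-up names, and `time.sleep` stands in for a real blocking IO call (in CPython it releases the GIL while waiting, the same way socket or file reads do).

```python
import threading
import time


def fake_io(results, index, delay):
    """Hypothetical stand-in for blocking IO (the walk to the post office)."""
    time.sleep(delay)          # the GIL is released while this thread waits
    results[index] = index     # back in the library: runs Python bytecode


def run_sleepers(n=4, delay=0.3):
    """Start n threads that each 'do IO', and time the whole batch."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io, args=(results, i, delay))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_sleepers()
    print(results, f"{elapsed:.2f}s")
```

If the waits were serialized, four sleeps of 0.3 s would take about 1.2 s. Because the threads wait at the same time, the batch should finish in roughly the time of a single sleep.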

The GIL doesn't prevent computation in general, just the interpretation of
Python bytecode. High-performance numerical packages like NumPy perform
operations on large arrays of data using machine instructions, and are
generally written in C or another compiled language. While these large
computations are being performed, the thread can release the GIL since
it isn't manipulating Python objects outside its array of numerical data.

So the CPython GIL isn't quite as restrictive as it might first seem. For
IO-bound network applications, threads are spending most of their time
waiting on network traffic and not executing Python instructions. For
high-performance numerical work, Python is a slow interpreted language
anyway, so it makes sense to rewrite the bulk of the work in C. When this
is done right, it is possible to release the GIL while performing
computational work written as compiled extensions.
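Here is a rough sketch of "taking the book outside" that runs without installing anything. It swaps NumPy for the standard library's `hashlib`, which is also implemented in C and, per the CPython documentation, releases the GIL while hashing inputs larger than 2047 bytes. The helper names `digest` and `digest_all_threaded` are invented for this example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """Hash one block of bytes; the heavy lifting happens in compiled C code."""
    return hashlib.sha256(data).hexdigest()


def digest_all_threaded(chunks, workers=4):
    """Hash every chunk on a thread pool.

    Assumption: CPython, where hashlib drops the GIL for large inputs,
    so several threads can be hashing at the same moment.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))


if __name__ == "__main__":
    chunks = [bytes([i]) * 5_000_000 for i in range(4)]
    for hex_digest in digest_all_threaded(chunks):
        print(hex_digest[:16])
```

The threads only need the golden key briefly, at the start and end of each call. How much wall-clock time this saves depends on the machine and the size of the inputs.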

rhymes

Best explanation ever!

You should "upgrade it" to an article, it would be very helpful ;-)

Rachit Gupta

Hi Eric, I have some questions about this:
1) This made sense when we were considering only plain Python, the basic python3 that ships with Linux. As I understand it, the GIL applies to multi-processing but not to multi-threading? Please correct me if I'm wrong.

2) How does CPython relate to standard Python, and how does the GIL's role differ between Python and its other implementations?

3) Libraries like numpy and pandas are written in C, so does the GIL play a role there as well? Can we take advantage of multi-processing by default (that is, without using the multiprocessing module explicitly)?

Can you help me with this ?

Georgi Tenev

I was like genuinely into the story haha you should release a book

Matteo Joliveau

This is just beautiful, you should definitely turn it into a standalone article!
Plus, it's a beautiful story on its own :D

Soma Györe

Thank you for this long explanation. I really enjoyed it and it is really useful!

Paula Hasstenteufel

I love this.

Nested Software

This is seriously awesome!

rhymes

Your parents buy ONE toy for you and your sibling.

They tell you that you can both play with it in the toy room, but only one of you can enter at a time. To make sure you respect the instructions, you have to go and ask them each time who can play with it first.

The toy is the variable, the room is the GIL, the parents are the interpreter, and you and your sibling are the threads.

Hope it makes sense :D

Soma Györe

Great thanks. If I understand it correctly the GIL is kind of an automatic mutex implementation. Is it correct?

rhymes

Exactly, it's a giant mutex that serialises access to non-thread-safe resources.
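One caveat worth a small sketch: the resources the GIL serialises are the interpreter's own internals. A compound update such as `value += 1` is several bytecode instructions, and a thread switch can land between them, so shared application data still needs its own lock. The `Counter` class and `hammer` helper below are made up for illustration.

```python
import threading


class Counter:
    """A shared counter guarded by an explicit, application-level lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read, add, and store happen as one unit while the lock is held.
        with self._lock:
            self.value += 1


def hammer(counter, threads=4, per_thread=10_000):
    """Increment the counter from several threads and return the total."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    print(hammer(Counter()))
```

With the lock in place the total is always `threads * per_thread`. Without it, whether updates get lost depends on the interpreter version and on timing, which is exactly why relying on the GIL alone is fragile.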

Idan Arye

And in Python - everything except IO is considered non-thread-safe.

Alex Miasoiedov

Just imagine you need to coordinate the work of 10 people who work together on 4 projects. Once a project is complete you need to ship it, remove it from the board, and maybe add a new one.
How do you know when a project is complete and nobody is still working on it?
Well, you can check each day by stopping the work, checking the status of each worker, and concluding which project is ready (mark-and-sweep garbage collection).

Another way is to create a counter representing how many people are working on the same project simultaneously. Once this counter reaches 0, we can remove the project (the reference counting approach).

But to make the last approach work, before incrementing or decrementing that counter each worker needs to make sure the counter is consistent and nobody else is going to increment or decrement it at the same time (the GIL).
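The "counter per project" can be observed directly in CPython through `sys.getrefcount`. This is a minimal sketch that assumes CPython; `refcount_delta` is a name invented for the example, and only differences between readings are compared, since the absolute numbers vary between interpreter versions.

```python
import sys


def refcount_delta():
    """Show the reference count moving as a 'worker' joins and leaves.

    Returns (change after adding a reference, change after removing it).
    """
    project = object()          # the project on the board
    workers = []                # anyone holding a reference to it

    before = sys.getrefcount(project)
    workers.append(project)     # one more worker on the project
    after_add = sys.getrefcount(project)
    workers.pop()               # that worker leaves again
    after_remove = sys.getrefcount(project)

    return after_add - before, after_remove - before


if __name__ == "__main__":
    print(refcount_delta())
```

Every one of those increments and decrements touches a plain integer stored on the object, and that bookkeeping is what the GIL keeps consistent when several threads share the same objects.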

Idan Arye

That's not an explanation of the GIL - that's an explanation of locks in general...

Alex Miasoiedov

GIL is a general lock :)

Idan Arye

GIL is also a three letter acronym - I don't see you explaining about acronyms.

Alex Miasoiedov

What difference does it make? The simplest possible explanation for a five-year-old is not supposed to rely on the definition of a mutex, semaphores, object graph traversal, several generations of garbage-collected objects, the entire POSIX standard, Linux kernel internals, CPU architecture, modern microelectronics, and recent discoveries of quantum field theory.

Idan Arye

Of course not - that's going down into details, when you should go up.

Say you go to take your bike from where you left it, and you see me putting my own chain and lock on it. You approach me - "mind explaining this lock?" - and I start explaining what locks are used for and why humans need to lock things. But that's not what you wanted, right? You probably didn't want an explanation on the internal clockwork of the lock either. No - what you wanted me to explain is why I put a lock on your bike.

When someone asks about a specific lock, they want to know why you need a lock there. The GIL is a "hidden" lock - it's applied automatically by the Python interpreter and you don't usually notice it - so asking for it to be explained is asking what it locks and why it needs to be locked - not what the purpose of locks in general is.

Alex Miasoiedov

Idan, I understand where you're coming from. An oversimplified explanation might zoom out from all of these concrete details. Nevertheless, it can at least explain things for those who don't know what a mutex is.

The GIL is a "hidden" lock it's applied automatically

The words "hidden" and "applied automatically" are kind of misleading. The GIL is an implementation detail of CPython's reference-counting garbage collector, and CPython could be implemented without it.

Rachit Gupta

Hi Alex, I have some questions about this:
1) This made sense when we were considering only plain Python, the basic python3 that ships with Linux. As I understand it, the GIL applies to multi-processing but not to multi-threading? Please correct me if I'm wrong.

2) How does CPython relate to standard Python, and how does the GIL's role differ between Python and its other implementations?

3) Libraries like numpy and pandas are written in C, so does the GIL play a role there as well? Can we take advantage of multi-processing by default (that is, without using the multiprocessing module explicitly)?

Can you help me with this ?

Nested Software

I really love @Eric Appelt's answer. It's amazing. For those who just want a concise and boring answer, here's my attempt at that:

In CPython (the reference implementation of Python), there's this thing called the GIL, or Global Interpreter Lock. I believe there are other implementations of Python that don't have it.

The GIL means that for a given Python process, only one thread can execute Python bytecode at a time, so pure-Python code effectively uses only one CPU core, regardless of how many threads you may have running.

Let's say you have a computer with a 4-core CPU. The GIL means that, contrary to what you may expect, if you have 4 threads, and they all want to use the CPU a lot, your program won't run any faster for having threads! In fact, it will probably run a bit slower!

So what are threads useful for in CPython? If your threads are I/O bound, that is, threads are often just sitting idle waiting to get results back from the disk drive, or fetching things from the Web, stuff like that, then you can get a benefit out of multithreading. The threads that want the CPU can use it while the idle threads sleep.

If you do want to utilize more than one core, or more than one CPU for that matter, you can do so in CPython with the multiprocessing module. This can be annoying though, since it is my understanding that each process will have to run its own instance of the Python runtime. I have not used this module, so I am not sure how communication among processes is handled. Traditionally, that is supposed to be one benefit of threads over processes, that all the threads share access to the heap.
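For the curious, here is a minimal sketch of what using several processes can look like. It uses `concurrent.futures.ProcessPoolExecutor` from the standard library, which is built on top of multiprocessing; `count_primes` and `count_primes_in_processes` are hypothetical helpers written for this example.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound, pure-Python work: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits, workers=2):
    """Run count_primes for each limit in a pool of worker processes.

    Each worker is a separate Python process with its own interpreter
    and its own GIL, so the jobs can run on different cores.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(count_primes_in_processes([10, 100, 1000]))
```

Arguments and results are pickled and sent between the processes, so this pays off when each job does a lot of computation relative to the data it exchanges.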

It's not really for 5 year olds, but I hope this explanation is clear for someone who knows some programming basics.

Eric Appelt mentions in this discussion that numpy code, for example, appears to run independently of the GIL, since it is compiled native code rather than Python code that is run directly by the interpreter. Apparently that is another case where the GIL will not interfere, allowing code to run on multiple cores at a time. However, I have never used numpy, so I don't know any details.

rhymes

If you do want to utilize more than one core, or more than one CPU for that matter, you can do so in CPython with the multiprocessing module. This can be annoying though, since it is my understanding that each process will have to run its own instance of the Python runtime. I have not used this module, so I am not sure how communication among processes is handled. Traditionally, that is supposed to be one benefit of threads over processes, that all the threads share access to the heap.

I wouldn't say it's annoying, just don't expect the overhead to be like 3kb of memory or something.

The various ways of creating a process are explained in the docs at docs.python.org/3.6/library/multip... - but long story short: you can either spawn a process (a new interpreter) or clone the existing one (fork). In that version of the docs, the default on Unix is the clone.

Processes can communicate through queues or pipes or they can share parts of the memory (not recommended).

Queues and pipes are implemented using Unix pipes, which is not that different from what happens when you do this in Unix:

cat verylongfile.txt | sort

Those two totally unrelated processes communicate through a unix pipe.
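The same idea can be tried from Python. This sketch is an approximation of the shell pipeline, not how multiprocessing itself is wired up: the parent starts a second Python interpreter as a child process (so it does not depend on a `sort` binary being installed) and talks to it through the pipes that `subprocess` attaches to the child's stdin and stdout. `CHILD_CODE` and `sort_in_child` are names made up for the example.

```python
import subprocess
import sys

# Program run by the child process: read lines from stdin, write them sorted.
CHILD_CODE = "import sys; sys.stdout.writelines(sorted(sys.stdin))"


def sort_in_child(lines):
    """Send lines to a separate process over a pipe and read back the result."""
    text = "".join(line + "\n" for line in lines)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text,             # written to the child's stdin pipe
        capture_output=True,    # child's stdout and stderr come back via pipes
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(sort_in_child(["pear", "apple", "fig"]))
```

The two processes share no memory here. Everything the child knows arrived as bytes through the pipe, which is the trade-off compared with threads sharing one heap.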

Thyag Sundararmoorthy

Imagine lunch time in your office pantry. People have their lunch boxes. They are waiting to warm their food in the microwave.

• The pantry room = process
• Lunch boxes = threads
• The microwave oven = CPU
• The food = instructions
• Warming the food = execution
• The microwave lock = The GIL

At any given time, only the food in a single lunch box can be warmed. Other lunch boxes have to wait. The microwave lock prevents other lunch boxes from entering the microwave oven.

Similarly, only the instructions in a single thread can be executed at a time. Other threads have to wait. The GIL prevents other threads from entering the CPU.

This whole act of waiting is the reason why the GIL is frowned upon. The CPU itself is doing work, but the other threads are idle. Hence "overall performance" takes a hit.

Hence, to improve performance, the management creates a new pantry room with its own microwave oven. In Python, the equivalent is creating a new process via the multiprocessing module. The flip side is that this option costs more (more resources are needed).

Rémy 🤖

Threads are a lie. Use processes instead.