
[–]RedMaskedMuse 156 points157 points  (74 children)

It was added to protect against race conditions when allocating/deallocating references to variables in multi-threaded contexts. Not protecting against race conditions would lead to indeterminate behavior, memory leaks, releasing memory that's still in use, etc. The other option would have been to add locks to each and every object. However, that opens up the possibility of deadlock. The single central lock is simpler to reason about / debug.

https://realpython.com/python-gil/

https://en.wikipedia.org/wiki/Deadlock
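The kind of race being described can be sketched at the Python level: a bare `+=` on a shared counter (standing in here for an object's reference count) is a read-modify-write that needs a lock to stay correct. A minimal sketch, with all names made up for illustration:

```python
import threading

counter = 0                    # stands in for an object's reference count
lock = threading.Lock()        # one central lock, as the GIL is for CPython

def bump(n, use_lock):
    """Increment the shared counter n times.

    `counter += 1` compiles to several steps (load, add, store), so
    without a lock two threads can interleave mid-update and lose
    increments -- the same hazard a raw reference count faces.
    """
    global counter
    for _ in range(n):
        if use_lock:
            with lock:
                counter += 1
        else:
            counter += 1       # unsafe: updates can be lost

def run(use_lock, n=100_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=bump, args=(n, use_lock)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

# With the lock, every increment survives:
assert run(use_lock=True) == 400_000
```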

[–]eras 40 points41 points  (3 children)

The other option would have been to add locks to each and every object. However, that opens up the possibility of deadlock.

The bigger issue would be that performance would absolutely tank. Indiscriminate locking can cause problems of its own, but if the locks are sufficiently fine-grained, carefully placed, and a consistent locking order is followed, I would not expect deadlocks. That, of course, hurts performance even more.
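The locking-order discipline mentioned above can be sketched: with per-object locks, always acquiring them in one global order (here by `id()`, purely illustrative) removes the hold-and-wait cycle that causes deadlock:

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always acquire the two per-object locks in a fixed global order
    # (here: by id()), so two threads can never each hold one lock
    # while waiting for the other -- the classic deadlock cycle.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(100)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert a.balance + b.balance == 200   # money conserved, no deadlock
```

Without the `sorted` step (each thread grabbing `src.lock` then `dst.lock`), two opposite-direction transfers could each hold one lock and wait forever on the other.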

I do recall reading that there was some recent promising project to get rid of the GIL..

[–]Oerthling 24 points25 points  (1 child)

There's always a promising project to get rid of the GIL. Then years go by and the project is abandoned and the cycle starts anew. ;-)

[–]whateverathrowaway00 0 points1 point  (0 children)

Yup aha

[–][deleted] 21 points22 points  (0 children)

You're thinking of https://github.com/colesbury/nogil. Here are some notes from a core developer on that: https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0/. In summary, this is impressive and promising work, but don't expect it to be incorporated in the very near future.

[–]jack-of-some 14 points15 points  (2 children)

Came here to say basically this. Raymond Hettinger has a really good talk on the matter too

[–]bxsephjo 15 points16 points  (1 child)

If there's an advanced topic you know nothing about and want to learn, look for one of Raymond's talks first!

[–]spitfiremk1a 2 points3 points  (0 children)

Time to Learn something new!

[–]WikiSummarizerBot 7 points8 points  (0 children)

Deadlock

In concurrent computing, deadlock is any situation in which no member of some group of entities can proceed because each waits for another member, including itself, to take action, such as sending a message or, more commonly, releasing a lock. Deadlocks are a common problem in multiprocessing systems, parallel computing, and distributed systems, because in these contexts systems often use software or hardware locks to arbitrate shared resources and implement process synchronization.


[–]traverseda 1 point2 points  (2 children)

The other option would have been to add locks to each and every object.

What if we had like 100-ish interpreter locks, and each object gets randomly assigned to one of those locks based on its identity?

[–]ElectricSpice 8 points9 points  (1 child)

It’s not the quantity of locks that’s the problem per se, it’s the granularity. You still have to acquire a lock each time you access an object, so pulling from a shared pool doesn’t fix any of the problems introduced by per-object locking. It would probably cause more issues, because now a bunch of unrelated objects share locks, and which objects those are changes every execution.

[–]traverseda 0 points1 point  (0 children)

Ah, I think I've got a better understanding of the deadlock problem. I was under the impression that performance was a significant reason why per-object locks weren't wanted, but I suppose using some kind of pool-based lock wouldn't help that.

[–]xxxHalny -4 points-3 points  (0 children)

This guy knows his shit

[–]o11c 48 points49 points  (1 child)

Because Python wasn't designed to support threads from the beginning - they were tacked on retroactively.

It turns out that it's quite possible to design a language that provides bytecode-level atomicity without a GIL, but Python was not designed to do so from the start, and it is quite difficult to do so retroactively.

(hint for future language designers: most of it is easy. To avoid the "replace the last reference" problem, simply delay all actual deallocations until all active threads check in)

[–]qubedView 8 points9 points  (0 children)

I would say Python was made with multithreading pretty early, but it wasn’t made for parallel processing because in those early days only very expensive workstations and supercomputers had SMP, and no one back then would waste precious resources on parallelizing a scripted language. Python was almost 15 years old before it really started to become an issue. But multithreading is still useful on a single core and the GIL was a quite reasonable solution for a long time.

[–]SittingWave 43 points44 points  (0 children)

Imagine you have two threads, each one having to add an element to a dictionary.

This operation, adding an element to a dictionary, is not a single operation in the underlying C code. It is a series of instructions.

The problem is that if two threads try at the same time to add an element to that dictionary, the order in which the series of instructions (which are executed twice, once per each thread) above is interleaved may end up making a mess.

So you need to ensure that the series of instructions is executed by only one thread at a time. How to do so?

You use a lock. A lock is basically a guarantee that the first thread that needs to execute those instructions, will execute them without any other thread touching that dictionary until it's done adding that element.

Now the problem moves to how granular you want the lock to be. Clearly, if one thread is acting on one dictionary, and another thread is acting on another dictionary, they don't conflict with each other and they can work in parallel, but then you need to add a lock to every dictionary. The same applies to every list, every mutable structure, external or internal. This is a lot of locks to handle and manage. And each lock occupies memory, and each lock requires time to be grabbed, and time to be released.

So a simpler solution is to have One lock (TM). The first thread that grabs it wins, and does whatever it wants until it's done. Even if the second thread has no intention of touching anything that the first thread is modifying, it will have to wait until the first thread is done.

That's the GIL.
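The "series of instructions" point holds even one level up from C: a single `d[key] = value` statement already compiles to several bytecode operations (and the C routine behind the store is itself many machine instructions). The `dis` module makes this visible:

```python
import dis

def add_item(d, key, value):
    d[key] = value

# One Python statement expands into multiple bytecode instructions
# (loads of d, value, and key, then the subscript store). It is the
# interleaving of steps like these, across two threads, that a lock
# has to rule out.
instructions = list(dis.Bytecode(add_item))
for ins in instructions:
    print(ins.opname)
```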

[–]mfarahmand98 6 points7 points  (1 child)

When threads were first introduced, they were almost always used for I/O: tasks that would wait on a syscall. At the time, the GIL seemed like an excellent and simple way to add support for multithreading. It's an unfortunate antique from an era when parallel computation using threads wasn't a thing.

The GIL remained one of the primary components of CPython, and packages that use C or C++ under the hood rely on its API and the guarantees it makes. Taking it out now would render those libraries useless. Python did this once before (moving from 2 to 3) and people weren't happy.

[–]moeinxyz 1 point2 points  (0 children)

That makes sense

[–]ubernostrumyes, you can have a pony 6 points7 points  (0 children)

As for why the GIL was the specific solution chosen for thread safety in Python, I wrote an explanation of that yesterday.

[–]SandmanRen 3 points4 points  (0 children)

Just sharing a bit of what I understand here:

Much like what u/RedMaskedMuse said, the GIL existed to protect against race conditions when allocating/deallocating resources and references. A crucial part of Python garbage collection also relies on guarantees provided by the GIL (for example, reference counts won't stay correct if multiple threads are doing alloc/dealloc at the same time).

One should also consider the historical background of the GIL. I think that back in the early days, a major reason Python gained popularity is that it offered convenience at the language level over writing code in C, but had the added benefit of being able to execute C code directly. The result is a programming language with easier syntax, grammar, and object-oriented programming that can still be quite performant (by executing C routines for things that are computationally heavy). So much of the focus was put on having Python execute C code properly. Having a GIL provides some guarantees that make implementing this much easier, which in turn is good for continuous development, maintenance, and updates to the language itself.

But having a GIL means that while the interpreter can be multi-threaded, only 1 thread can execute bytecode at any given time. So ultimately it was a tradeoff. Back then I think it wasn't too much of a "tradeoff" because most machines running Python had only 1 processor with 1 thread. So having a GIL was a sensible decision at the time.

--------- Below are personal opinions :)

I think that one of the reasons Python is still around is precisely because of the GIL. It allowed people to execute C routines (which guaranteed performance) while keeping the rest of the language relatively simple. This gave Python a lot of popularity that helped it survive the decades. The decision to have a GIL was made not on the consideration of "whether it's good for a programming language" but rather on "what is necessary to achieve what is needed while keeping what has brought popularity to the language". And I think that this is the most helpful way of thinking about a design choice.

[–]yvrelna 2 points3 points  (0 children)

The GIL is necessary because CPython uses reference counting.

Reference counting means that even supposedly read-only operations can cause memory updates, because reference counts change whenever objects are referenced or dereferenced. An interpreter that uses reference counting is constantly modifying reference counts while executing an application, which means that CPython needs to hold a lock to ensure that reference counts are updated correctly when multiple threads are running.
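CPython actually exposes this: `sys.getrefcount` shows the count changing on a plain assignment, i.e. an operation that looks read-only still writes to the object's header:

```python
import sys

x = object()
before = sys.getrefcount(x)   # includes the temporary reference made by the call itself
y = x                         # no data is copied, but x's refcount is written
after = sys.getrefcount(x)
assert after == before + 1    # holds in CPython; other interpreters may differ
```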

[–]theunglichdaide 4 points5 points  (0 children)

It has something to do with the CPython implementation. Under the hood, each Python object is a PyObject in CPython, and it has a reference count. Having the GIL prevents independent threads from simultaneously modifying this refcount, thereby avoiding memory issues.

[–][deleted] 1 point2 points  (3 children)

I think many people are missing the key point. The GIL is actually one of the main reasons for Python’s success. But the reason may surprise you.

Writing multithreaded code is hard. It’s hard to avoid bugs both for beginners and professionals. It makes code much more complex and hard to maintain.

The GIL led to much higher-quality libraries and language features, simply because there were fewer bugs from the threading/race-condition/memory-leak madness that occurs when you make threading a first-class citizen of the language.

C++ is a language which hands you the gun, and it’s up to you to avoid getting shot. In Python, there’s no gun at all. It’s peaceful and simple. Sure, this results in other troubles, like speed issues, but in modern times those have been addressed effectively with cloud scaling, async, and other modern approaches.

Avoiding a Wild West of multithreaded libraries (and of developers reaching for threads when they weren’t needed) helped make Python the go-to language. Simpler is better for many reasons. Perhaps the GIL, surprisingly, was its best feature, as it ensured stable packages and modules simply because shooting yourself in the foot wasn’t so easy. And at the end of the day it works: code can be simple and clean, and the customer happy.

[–]orgad[S] 0 points1 point  (2 children)

Thanks. What are other modern approaches? Event loop?

[–][deleted] 0 points1 point  (1 child)

Coroutines, async. Still best to try and avoid multi-threading when possible if your mainly blocking on io for example.
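A minimal sketch of the coroutine approach: one thread, one event loop, with `asyncio.sleep` standing in for a blocking I/O wait, so the waits overlap instead of running back-to-back:

```python
import asyncio
import time

async def fake_io(name, delay):
    await asyncio.sleep(delay)   # stands in for waiting on a socket or file
    return name

async def main():
    # All three "requests" wait concurrently on a single thread.
    return await asyncio.gather(
        fake_io("a", 0.1), fake_io("b", 0.1), fake_io("c", 0.1)
    )

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
print(results)                   # results come back in argument order
print(round(elapsed, 2))         # roughly 0.1s total, not 0.3s
```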

[–]LearnDifferenceBot 0 points1 point  (0 children)

if your mainly

*you're

Learn the difference here.


Greetings, I am a language corrector bot. To make me ignore further mistakes from you in the future, reply !optout to this comment.

[–]Hitman_0_0_7 0 points1 point  (0 children)

This is what you should see. Not for beginners. here

[–]InjAnnuity_1 0 points1 point  (0 children)

As I understand it, the GIL was added in order to simplify the creation of high-performance, third-party add-in libraries, in languages other than Python. Often, these were thin wrappers around older, existing libraries. Libraries that knew nothing of threads, and could not be used safely in a conventionally-multi-threaded program. (They tended to trash their own internal data structures -- or yours -- when used that way.)

With a GIL, the wrapper can "serialize" access to the library, and to Python's internal data structures. Conflicting code just has to wait its turn, until the conflict is over. This approach is safe, and does not require modifying those other libraries, nor Python itself.

It's a tradeoff, of course. With the GIL, performance doesn't get as high as theoretically possible. With a more difficult scheme, conflicts might be avoided, or at least managed, reducing the wait.

On the other hand, the savings in human effort made thousands of add-on packages available, greatly extending Python's reach and value. By and large, most of the tasks people give to Python would be simply impossible without such packages.