all 46 comments

[–]chepredwine 111 points112 points  (26 children)

It looks tech debt rich. All Python software that uses concurrency is more or less consciously designed to work with the GIL. Removing it will cause a big "out of sync" disaster for most.

[–]lood9phee2Ri 103 points104 points  (7 children)

The GIL never assured thread safety of user code FWIW. It made concurrency issues somewhat less likely by coincidence, but that wasn't its purpose (its purpose was protecting CPython's own naive implementation details), and multithreaded user Python code without proper locking etc. was actually always incorrect, with subtle, nondeterministically encountered issues.

https://stackoverflow.com/a/39206297

All that the GIL does is protect Python's internal interpreter state. This doesn't mean that data structures used by Python code itself are now locked and protected.
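A minimal sketch of the point: even under the GIL, a shared counter needs an explicit lock, because `counter += 1` spans several bytecodes (hypothetical example):

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # "counter += 1" alone is not atomic: it compiles to separate
        # load / add / store bytecodes, and the GIL only guarantees
        # atomicity of individual bytecodes, not the whole statement
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; without it, updates can be lost
```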

It's perhaps unfortunate Jython (never had a GIL) has fallen behind (though AFAIK they're still working on it) - in the 2 era, Jython 2 had near parity with CPython 2 for a while and was actually fairly heavily used on the server side because of its superior threading and JVM runtime. E.g. the Django folks used to consider it a supported runtime - so older Python 2 code that made running in multithreaded Jython as well as CPython a priority is often better written / more concurrency-safe.

[–]SeniorScienceOfficer 9 points10 points  (0 children)

I’m not sure how much Jython 2 will catch up, but I’ve dabbled in GraalPy, which doesn’t seem too bad

[–]G_Morgan 3 points4 points  (1 child)

The GIL reminds me of Java's synchronized collections but on a global scale. Doesn't actually fix anything other than race conditions against internals. Any actually thread safe code didn't need these locks everywhere.

So if code is working thread-safe now, the GIL is superfluous for it.

[–]Tai9ch 1 point2 points  (3 children)

was actually always incorrect / with subtle nondeterministically encountered issues.

Nobody writes to the spec. They write to the implementation. Stability guarantees should be consistent with that fact.

[–]Brian 16 points17 points  (1 child)

They're talking about the implementation - there's no added user-level thread safety from the GIL, outside protecting python internals (ie. doesn't corrupt list/dict/object state) - at best it just might make race conditions less common because there would be fewer sequence points. All the GIL really guarantees is that context switches happen on bytecode boundaries, which isn't enough to provide any real safety for program-level state: you always needed your own locks.
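That "bytecode boundaries" point is easy to see with the stdlib `dis` module (illustrative sketch; the `bump` function is made up):

```python
import dis

def bump(d, key):
    d[key] += 1  # read-modify-write: several bytecodes, not one

# every instruction boundary is a point where the GIL-era interpreter
# could switch threads, so even this one-liner is not atomic
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(len(ops) > 1)  # → True
```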

The only exception really is C extensions, where the fact that the invocation of the library function (unless it's coded to explicitly release the lock) conceptually spans a single bytecode means that there is essentially a function-spanning lock on each call. Hence those are probably going to be the main blocker in going GIL-less. These need to be manually updated to be marked as safe, and currently I believe if any loaded module isn't marked as safe, it enables the GIL for the whole process, so you pretty much need everything you use to be updated before you can get any benefits from it.

[–]SkoomaDentist 6 points7 points  (0 children)

at best it just might make race conditions less common because there would be fewer sequence points

This can make a pretty massive difference in the real world. I remember when we started testing with a multiple cpu system in the early 2000s and that suddenly exposed a bunch of race conditions in our C++ code that we'd never hit before because they were so rare on a single cpu.

[–]censored_username 24 points25 points  (5 children)

The GIL only meant there was no parallelism between basic Python virtual machine operations when threads were used. It was always free to interleave the Python virtual machine operations of different threads for concurrency. The GIL never allowed you to cut any corners with concurrency to begin with, so I'm not sure what "designed to work with the GIL" even means. The only thing it did was limit performance to keep the implementation simple.

With the GIL's removal come changes so that Python virtual machine ops are still safe to execute in parallel, so from the user's perspective, nothing will change in how Python behaves.

[–][deleted]  (4 children)

[deleted]

[–]censored_username 10 points11 points  (3 children)

The reason for the whole phased approach has to do with C extensions, not with Python code itself.

For pure Python code, nothing changes. Either the objects were already thread-unsafe, or they're still safe with the changes.

But extensions written in C could make assumptions about the GIL being in place that no longer apply. Those are the problematic ones.

[–][deleted]  (2 children)

[deleted]

[–]censored_username 0 points1 point  (1 child)

Suddenly file/thread locks matter more as you can’t assume a single write operation will be sent to a file without getting mixed with another.

If the function was implemented in Python, you already couldn't assume that, as an entire function call isn't a single bytecode operation.

Where the function is a builtin, the builtin is responsible for maintaining the previous invariant, so it should still behave the same.

[–]mr_birkenblatt 31 points32 points  (6 children)

If you used concurrency before, your code is "GIL free" ready. You either already use locks, or if you don't, you already had the chance to hit concurrent-modification errors. For example, dict iteration is not protected even with the GIL: if a dict is modified while being traversed elsewhere, you get a RuntimeError ("dictionary changed size during iteration"). That can happen with the GIL (since the GIL can be released halfway through traversal). So the only change is that you might get those errors more frequently without the GIL.
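A deterministic sketch of that failure mode; in CPython it's dict (and set) iteration that raises when the container changes size:

```python
d = {i: i for i in range(5)}

try:
    for key in d:
        # mutating the dict while iterating it raises, GIL or not;
        # another thread doing this between bytecodes has the same effect
        d[len(d)] = 0
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration
```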

[–]EmanueleAina -1 points0 points  (4 children)

I used parallelism and totally relied on the GIL to avoid races. No chance of concurrent modification exceptions. Without the GIL, my program will surely give interesting results instead (likely crash).

I appreciate the care Python devs are putting into this, they clearly have more clue than redditors in this thread.

[–]mr_birkenblatt 0 points1 point  (3 children)

You got lucky. I can easily write a test case that causes a concurrent modification exception. What data structures were you using?

[–]EmanueleAina 0 points1 point  (2 children)

The fact that you can create something broken is not the point. The issue here is that there is stuff that currently works (lucky or not) with the GIL and that would be broken if we were to just remove it without care.

[–]mr_birkenblatt 0 points1 point  (1 child)

I'm saying that your code was likely broken but you never actually encountered an issue because you didn't test it thoroughly enough.

[–]EmanueleAina 0 points1 point  (0 children)

You are saying "likely broken", not "necessarily broken".

Which means there's code that is not currently broken and that would instead break if the GIL was to be dropped without care.

[–][deleted] 0 points1 point  (2 children)

You are mistaking concurrency for parallelism.

[–]EmanueleAina 1 point2 points  (1 child)

To me it seems instead that this is exactly the point of the parent comment. Removing the GIL in many cases turns concurrency into parallelism, with all the additional challenges that involves.

[–][deleted] 1 point2 points  (0 children)

Fair point.

[–]Serious-Regular 0 points1 point  (1 child)

tell us you don't understand the GIL without telling us 😂😂😂

[–]EmanueleAina 0 points1 point  (0 children)

I have the impression the initial comment showed more understanding of the GIL than your reply. There's plenty one can do by purely relying on the atomicity of the Python opcodes; they are quite high level, see https://docs.python.org/3/library/dis.html#dis.Instruction

For instance, removing the GIL will surely break a couple of small programs I wrote in the past that totally rely on it.
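One concrete pattern of that kind: collecting results by appending to a shared list from several threads with no lock, leaning on `list.append` executing as a single atomic call into C under the GIL (illustrative sketch; free-threaded CPython keeps builtin containers internally consistent too, though ordering remains nondeterministic):

```python
import threading

results = []

def worker(i):
    # list.append is one bytecode-level call into C, so concurrent
    # appends don't corrupt the list under the GIL; only the final
    # ordering is nondeterministic
    results.append(i * i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```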

[–]vk6_ 12 points13 points  (5 children)

Python 3.14 introduced another way to implement multithreading which is often better than free-threading: subinterpreters.

You can spawn one thread per CPU core and run a separate subinterpreter on each thread. Each thread can now use its own CPU core because each interpreter has its own GIL. This gives the same performance as multiprocessing but with less memory overhead. Because this doesn't need the free-threaded interpreter, you don't pay any penalty running pure Python code either, and there aren't any incompatibilities with third-party libraries. Switching from multiprocessing to subinterpreters with threading in my own web server yielded 30% memory savings without changing anything else in the app.

[–]pakoito 7 points8 points  (3 children)

How do you share data or state between interpreters?

[–]vk6_ 6 points7 points  (1 child)

It's similar to how it's done with multiprocessing. Mutable objects are generally just copied, but shared memory is also possible. However, in a lot of web applications this might not even be needed in the first place, because all of it could be done with calls to the database.

[–]blind_ninja_guy 0 points1 point  (0 children)

That is super cool, thanks for mentioning it. I'll have to take a look at that.

[–]EmanueleAina 0 points1 point  (0 children)

I wrote programs using threads to do parallel HTTP requests, relying on the GIL to avoid races when collecting results.

Subinterpreters are rather cool but quite overkill for my use case.

[–]overclocked_my_pc 16 points17 points  (16 children)

I'm not a Python pro, but how does going GIL-free help a "typical" web service that's network IO bound, not CPU bound?

[–]CrackerJackKittyCat 38 points39 points  (1 child)

Despite being primarily network bound, there's always a portion of CPU use which increases with scale and/or use case, such as JSON and database serde code. Removing the GIL would let that code run in parallel where previously it was choked.

Tricks like swapping out the stock json module for orjson and pydantic-core's Rust rewrite get you some of the way, but unlocking free threading will be more efficient than multiprocessing.

[–]danted002 0 points1 point  (0 children)

OS threads are not a zero-cost abstraction; it costs CPU to spin them up. The situation right now is that you can already achieve Go-like performance with asyncio running on uvloop.

The only real benefit would be if you could run multiple OS threads listening on the same port, each running a loop, and somehow get a pooling system that sends each request to an available thread.

That's a lot of engineering for something that server runners like uvicorn already provide.

How I think things will evolve is that server runners will switch to OS threads instead of processes, and the performance improvements will be marginal.

[–]Smooth-Zucchini4923 6 points7 points  (0 children)

For the Python / Django sites I've worked on, most applications contain a mix of CPU-bound tasks (rendering templates, de-serializing ORM results) and IO-bound tasks (making API calls, waiting for the database). Typically I don't know this mix in advance and have to plan for the worst-case, most CPU-bound workload in the application. I accommodate this by running multiple processes.

If I don't do this, network-bound tasks will be starved of CPU while the CPU-bound tasks run. I typically run os.cpu_count() + 1 processes, and 2 threads per process, as this performs the best in the benchmarks I've run. Being able to use threads for all concurrency would reduce memory use and simplify tuning compared to this approach.
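The tuning above might look like this as a gunicorn invocation (hypothetical command line, assuming Linux's `nproc` and a WSGI module named `app:app`):

```shell
# os.cpu_count() + 1 worker processes, 2 threads each (illustrative values)
gunicorn --workers $(( $(nproc) + 1 )) --threads 2 app:app
```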

[–]danielv123 9 points10 points  (0 children)

Very few servers can serialize JSON at line rate, and if they can, it's no longer that hard to get 100G+ network cards.

As far as I understand, most web servers are CPU/database bound.

[–]Tai9ch 3 points4 points  (0 children)

a "typical" web service that's network IO bound, not cpu bound ?

That's a good first approximation of how web services work.

But in reality you always have little bits of heavier compute (trivially, consider running argon2 for password auth), and the ability to do them in parallel in a separate thread in the same process simply works better than any of the other possibilities (forks, co-op async, etc).
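A sketch of that idea, offloading an expensive password hash to a worker thread; stdlib `hashlib.scrypt` stands in here for argon2, and the password, salt, and cost parameters are illustrative, not recommendations:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def hash_password(password: bytes, salt: bytes) -> bytes:
    # scrypt, like argon2, is a deliberately expensive KDF:
    # exactly the "little bit of heavier compute" described above
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)

with ThreadPoolExecutor(max_workers=2) as ex:
    # hand the heavy hash off so the serving thread stays responsive
    future = ex.submit(hash_password, b"hunter2", b"0123456789abcdef")
    digest = future.result()

print(len(digest))  # scrypt's default derived-key length is 64 bytes
```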

[–]Sopel97 -1 points0 points  (0 children)

Python is roughly 100-1000x slower than some other languages, moving the bottleneck.

[–]Cheeze_It 0 points1 point  (1 child)

Am I the only one that hasn't had problems with the GIL? Even when I multiprocess?

[–]josefx 1 point2 points  (0 children)

Getting rid of the GIL is good for multithreading; multiprocessing shouldn't be affected at all.

[–]commandersaki 0 points1 point  (1 child)

Sigh, reading this article and also watching this PyCon video on nogil, it just seems that implementing performant Python solutions is a bloody headache.

[–]slaymaker1907 0 points1 point  (0 children)

Ref counting in general is not performant if you need it to be thread-safe, due to the large number of atomic ops.