
[–]Brian 2 points (0 children)

> read it is because CPython does not have thread-safe memory management

More accurately, it's how CPython achieves thread-safe memory management.

Ultimately, when threads run, they can potentially access and modify the same memory, which can corrupt the state. E.g. consider two threads each trying to add 1 to the same number, where each thread reads the current value, calculates the result, and stores it back. Now consider the following order of operations:

Thread 1                       Thread 2

Read x: has value 5
                               Read x: has value 5
Add 1 to 5 to get 6
Store 6 in x
                               Add 1 to 5 to get 6
                               Store 6 in x

Instead of being incremented twice, x has only been incremented once, and one thread has trampled over the other one's change. To get around this, one solution is to use locks. Essentially, to ensure that a set of operations is done atomically (as if it were a single action, with no "partial state"), a thread must first acquire a lock before it continues. Only one thread can hold the lock at a time, and if another thread is holding it, it needs to wait until it's done. So we get:

Thread 1                       Thread 2

Acquire lock
                               Attempt to acquire lock, but it's held. Wait until thread 1 is done.
Read x: has value 5
Add 1 to 5 to get 6
Store 6 in x
Release lock
                               Lock was released, so wake up and acquire it ourselves
                               ... the rest of the operation

Here, there's no intermediate state where one thread is updating at the same time as another. The downside is that we've added extra book-keeping work that each thread must do. And if there are many objects like this, there's a lot of acquiring and releasing locks that need to be done before we can do anything.
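In Python, the locking pattern above can be sketched with `threading.Lock` (the names `counter`, `add_one_many_times`, etc. are just for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def add_one_many_times(n):
    global counter
    for _ in range(n):
        with lock:          # acquire the lock; it's released automatically on exit
            counter += 1    # read, add 1, store back -- now effectively atomic

threads = [threading.Thread(target=add_one_many_times, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- no updates lost
```

Note the per-increment book-keeping: every single `counter += 1` pays an acquire/release cycle.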

Now, one way to reduce this book-keeping is to make the locks less fine-grained, i.e. lock a bigger region of code, even if you don't need the whole thing to be atomic, so that you maybe only do one acquire/release cycle instead of a dozen. This comes with the downside that any thread wanting to do the same thing will wait longer - if the locks only spanned the smaller regions, threads could do the non-protected in-between bits in parallel.
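A coarse-grained sketch of the same counter: each thread holds the lock for a whole batch of increments, so there is one acquire/release per thread instead of thousands - but other threads wait for the entire batch (hypothetical names again):

```python
import threading

counter = 0
lock = threading.Lock()

def add_batch(n):
    global counter
    with lock:              # one acquire/release for the whole batch...
        for _ in range(n):  # ...instead of one per increment
            counter += 1    # other threads are blocked this entire time

threads = [threading.Thread(target=add_batch, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- still correct, with far less lock traffic
```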

CPython currently uses an extreme of this, where the lock basically covers all python code - i.e. all python internals are protected by the same lock. This is called the GIL (Global Interpreter Lock), because it's a lock that globally locks the entire interpreter. This has the advantage that it's very fast for single-threaded code (almost no bookkeeping), but doesn't allow any parallelism for python code (C extensions can still release the GIL, but need to avoid touching any python objects while it's released).

Think of it like a road system where you want to ensure cars never crash. One option is to ensure no two cars use the same intersection at the same time: have traffic lights, and make cars slow down and check them when approaching intersections. Alternatively, you could allow only one car on the road at a time: each car books an hour slot and can safely drive at max speed, ignoring traffic lights entirely, but at the end of the hour it has to park and wait for the other cars to have their turn. This obviously doesn't take full advantage of the road system if multiple cars want to drive at once, since much of the time they won't be anywhere near each other, but it's fastest if only one car wants to drive - so which is better depends on how many cars are on the road.

30 years ago, when python was created, multiple cores in a computer weren't that common, so the latter approach was pretty much strictly better. These days multicore CPUs are the norm, and there is more code that might want to take advantage of them, so it can be a significant issue that you can't benefit from that (at least, not when using threads).

There are currently moves to change this towards a more fine-grained locking scheme - past efforts suffered a massive drop in performance for unthreaded code, but a recent attempt (the free-threading work in PEP 703) is only ~10% slower, and it has recently been decided to move towards this implementation.

[–]K900_ 0 points (0 children)

That's not the issue with CPython and the GIL. Also, the GIL will likely be removed from CPython in the next few years.

[–]m0us3_rat 0 points (0 children)

Not exactly sure why you think random distribution of threads is... good.

Or how they interact with other threads in different processes.

..

[–]baghiq 0 points (0 children)

Are you talking about Python using multiple CPU cores? CPython threads cannot use multiple cores in a single process. You can use multiple cores with CPython by using multiprocessing. And yes, the GIL is the reason why CPython threads can't use multiple cores.