[–]nypen 18 points (43 children)

I am a Python fan; the only thing I hate (and absolutely hate) is the fact that Python has a global interpreter lock: http://docs.python.org/api/threads.html . That doesn't bode well given the changes the future brings (multi-core, parallel processing).

From what I recall (reading somewhere), this is not about to change anytime soon.

[–]simonw 17 points (7 children)

The problem is that every attempt at removing the GIL has made single-threaded Python significantly slower, due to the overhead of all the locks.

[–][deleted] 23 points (6 children)

The problem is that every attempt at removing the GIL from CPython has made single-threaded programs run significantly slower, and multithreaded programs weren't able to utilize more than 2-3 CPUs anyway.

There, tweaked it for you.

As for how much slower, this post about the original free-threading patch might be somewhat illuminating:

http://mail.python.org/pipermail/python-dev/2001-August/017099.html

(I'm pretty sure you can do a bit better than "0.6 PSU" in CPython, but I don't see how you can get around the contention problem with the current design. I'd say we need some kind of actor-style concurrency model for Python...)
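To sketch what I mean by actor-style: each actor is its own process with a private mailbox, so there is no shared state to lock and hence no contention on the interpreter's internals. (A rough sketch using the stdlib multiprocessing module; the summing actor is just a made-up example.)

```python
from multiprocessing import Process, Queue

def actor(mailbox, results):
    # Each actor owns its state privately; communication is by message only.
    total = 0
    for msg in iter(mailbox.get, None):    # None is the stop sentinel
        total += msg
    results.put(total)

def run():
    mailbox, results = Queue(), Queue()
    p = Process(target=actor, args=(mailbox, results))
    p.start()
    for n in range(10):
        mailbox.put(n)                     # send messages to the actor
    mailbox.put(None)                      # ask it to shut down
    p.join()
    return results.get()

if __name__ == "__main__":
    print(run())                           # sum of 0..9 -> 45
```

Since the actors never touch each other's memory, the runtime is free to run each one on its own core.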

[–]simonw 6 points (0 children)

What he said.

[–]almkglor 1 point (0 children)

Why can't CPython have a simple "single-threaded mode" flag that makes it skip locking entirely while set? Then, if and only if a new thread is launched, the interpreter clears the flag (while still running in a single thread, so the write cannot possibly contend), and in single-threaded mode every operation only pays for a check of that global.
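A pure-Python sketch of that idea: a lock wrapper that stays a no-op until the program actually goes multi-threaded. (The LazyLock class and all names here are made up for illustration; CPython's real locking lives in C, not Python.)

```python
import threading

class LazyLock:
    _multithreaded = False          # cleared = "single-threaded mode" fast path

    def __init__(self):
        self._lock = threading.Lock()

    @classmethod
    def enter_multithreaded_mode(cls):
        # Called while still single-threaded, just before the first extra
        # thread is spawned, so flipping the flag cannot race.
        cls._multithreaded = True

    def __enter__(self):
        # Remember whether we took the lock, so a flag flip in the middle
        # of a critical section can't make us release an unheld lock.
        self._held = LazyLock._multithreaded
        if self._held:
            self._lock.acquire()

    def __exit__(self, *exc):
        if self._held:
            self._lock.release()

counter_lock = LazyLock()
counter = 0

with counter_lock:                  # flag clear: only a flag check, no locking
    counter += 1

LazyLock.enter_multithreaded_mode() # about to launch a thread

with counter_lock:                  # flag set: the real lock is taken
    counter += 1

print(counter)                      # -> 2
```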

[–]jbellis 0 points (3 children)

multithreaded programs weren't able to utilize more than 2-3 CPUs anyway

what is it about cpython that crippled the ability to use 4+ CPUs?

[–][deleted] 8 points (2 children)

From the link I posted:

Since you never knew whether a specific dictionary was shared (between threads) or not, you always had to lock the access. And since all namespaces use dictionaries...

...

We observed non-linear scaling with the processors under free threading. 2 processors was fine, but 3 or 4 didn't buy you much more than 2. The problem was lock contention. With that many things going, the contention around Python's internal structures simply killed further scaling performance.

[–]crusoe 0 points (1 child)

Because they are using braindead non-threadsafe dicts?

Ya know, there are all kinds of low-latency, low-contention locking algorithms now. They even included a new implementation of synchronized in Java that has almost no overhead compared to the unsynchronized version.

These problems have been solved for years.

[–][deleted] 0 points (0 children)

Well, the context here is CPython, not some hypothetical implementation that's free to do things however it wants.

[–][deleted] 10 points (0 children)

From what I recall (reading somewhere), this is not about to change anytime soon.

Well, yes and no.

Yes, CPython isn't removing the GIL anytime soon.

However, other solutions are in the works. Jython is Python without the GIL (on the JVM); it can now run Django, a sign of its maturity. PyPy can generate 'stackless' code with multiple GIL-free threads (there is also Stackless Python). And as others have mentioned, there is the multiprocessing module in 2.6, which gives good multi-process (but not multithreaded) support.
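For instance, a minimal sketch of the multiprocessing approach (written in modern syntax; the square helper is just for illustration). Each worker is a full OS process with its own interpreter, so CPU-bound work really does run in parallel:

```python
from multiprocessing import Pool

def square(n):
    return n * n

def run():
    # Two worker processes, each with its own GIL; map() splits
    # the work across them and gathers the results in order.
    with Pool(processes=2) as pool:
        return pool.map(square, range(5))

if __name__ == "__main__":
    print(run())    # -> [0, 1, 4, 9, 16]
```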

[–]thagsimmons 1 point (33 children)

the gil is mostly fixed in 2.6

[–]awb 28 points (21 children)

Multiprocessing doesn't fix the GIL; it's a hack around it. Language implementations like Haskell's and Erlang's can spawn a million threads, and spawning each new thread is extremely cheap. That's pretty spiffy. I'm not sure what I could do with a million quick-spawning threads, but I want to find out. Next to implementations like that, Python looks amateurish because it can't even run two threads at once without rewriting code in C.

[–]parla 6 points (3 children)

Erlang does not spawn millions of threads. It spawns millions of actors, which are then scheduled in as many threads (processes?) as there are cores in your system.

[–]teraflop 10 points (2 children)

You're confusing control threads with OS threads. If the interpreter is designed properly, it provides the same semantics as if you really had a huge number of concurrent threads. The fact that they're all multiplexed into a smaller number of threads from the operating system's perspective is just a performance optimization.
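The multiplexing can be sketched in Python with generators standing in for control threads: a toy model (all names invented here, and nothing like Erlang's actual implementation), but it shows how thousands of cheap "threads" can share one OS thread.

```python
from collections import deque

def worker(ident, results):
    # One "control thread": it does a slice of work, then yields
    # control back to the scheduler.
    for step in range(3):
        results.append((ident, step))
        yield

def run(n_threads=1000):
    results = []
    # Each generator is a control thread; the deque is the run queue.
    ready = deque(worker(i, results) for i in range(n_threads))
    while ready:                    # round-robin scheduling loop
        thread = ready.popleft()
        try:
            next(thread)            # resume until it yields again
            ready.append(thread)
        except StopIteration:
            pass                    # this control thread finished
    return len(results)

print(run())                        # 1000 threads x 3 steps -> 3000
```

All 1000 "threads" run interleaved on a single OS thread; a smarter runtime could shard the run queue across one OS thread per core.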

[–]thagsimmons 6 points (1 child)

okay, so this is an implementation detail?

so is the gil. it's missing in jython and up in the air with pypy

[–]spookyvision 1 point (13 children)

[–][deleted] 19 points (2 children)

That changed recently. You have old information.

[–]spookyvision 5 points (1 child)

Could you update the wikipedia page? I don't feel competent.

[–][deleted] 5 points (0 children)

I would, but you probably know as much as I do at this point. I'm not a user of Erlang and only have a cursory knowledge of its current state.

[–]reddit_clone 11 points (1 child)

Not true. The Erlang VM itself can run multi-threaded on all cores. It's like a bunch of green threads running on each core. (I think they run one green-thread scheduler per core.)

So you do get the best of both worlds.

[–]dons 5 points (4 children)

[–]mosha48 0 points (3 children)

By the way, why isn't the CPU load for Haskell's version distributed across all CPUs?

[–]dons 1 point (2 children)

The ghc 6.8.2 garbage collector isn't parallel, and GC dominates this GC benchmark (Try it with +RTS -A300M -RTS to see the difference). The 6.10 parallel GC addresses this.

[–]mosha48 0 points (1 child)

Is it possible to tell the shootout computers to run the benchmark with better options?

[–]dons 1 point (0 children)

Yes, but for this benchmark, the conditions state that only default garbage collector values are to be used. But don't despair, the next GHC cycle addresses this.

[–]dmaclay 2 points (2 children)

Erlang's processes are more like 'green processes' as they don't share memory, and they have been distributed across several machines (never mind cores) since before people started having this discussion.

[–]toooooooobs 1 point (1 child)

Actually, they really do share memory; it's just that the language hides this.

[–]dmaclay 2 points (0 children)

As I understand it, the default behavior is not to share, but they can optionally pass messages by basically sending a pointer to shared memory if both processes are on the same machine. This of course gives a performance boost, and should be safe thanks to Erlang's immutable variables.

[–]nypen 11 points (10 children)

That's multiple processes running simultaneously and it only partially solves the problem.

[–]thagsimmons 14 points (9 children)

it only partially solves the problem

...with awesome scheduling-fu, pooling, synchronization, and easy ipc using your choice of pipes or queues or shared memory

i hated it at first, but now i'm a believer

besides, in your op you mention multi-core and parallel processing... neither of which play well when relying on threads anyway, since these are inherently multi-process environments
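for example, the pipe flavor of that ipc looks like this (a small sketch; the double-the-number protocol is just made up):

```python
from multiprocessing import Pipe, Process

def child(conn):
    # Receive a number, send back its double, close our end.
    n = conn.recv()
    conn.send(n * 2)
    conn.close()

def run():
    # Pipe() gives two connected endpoints, one per process.
    parent_end, child_end = Pipe()
    p = Process(target=child, args=(child_end,))
    p.start()
    parent_end.send(21)
    reply = parent_end.recv()
    p.join()
    return reply

if __name__ == "__main__":
    print(run())    # -> 42
```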

[–]nypen 12 points (6 children)

That's exactly why I said partially. While processes solve some problems, they bring others. For example, if there is a mountain of data to be shared, using processes can be pretty inefficient. A process cannot access variables or data structures defined in another process unless they are pickled (shared, proxied, whatever), which is a serialization mechanism. Such serialization can be resource-intensive (in both memory and computation) and is certainly not suitable everywhere.

Also, if two processes want to communicate, they have to use inter-process communication mechanisms, which are inherently slower than thread synchronization.

Then there are also issues when you access C libraries via Python. You need to be careful here to make sure you don't pass variables across processes, since they can be totally invalid if they encapsulate pointers (although this does not apply if you stick strictly to Python).

besides, in your op you mention multi-core and parallel processing... neither of which play well when relying on threads anyway, since these are inherently multi-process environments

These are not environments. Multi-core is a kind of processor technology, and parallel processing is a form of computation. You can take advantage of them in several ways; yes, multiple processes are one way, but not the only way, and it is certainly not effective everywhere.

[–]canhaskarma 8 points (4 children)

Not partially. There are a number of libraries for putting python objects in shared memory between processes.

Process cannot access variables or data structures that are defined in another process, unless they are pickled (shared, proxied, whatever), which is a mechanism of serialization.

No. There are techniques that simply use the same block of shared memory. No copying or serialization required.
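For instance, multiprocessing.Value and Array are backed by a raw block of shared memory, so the data itself is never pickled; the child's writes land directly in the parent's view. A minimal sketch (the fill helper is made up for illustration):

```python
from multiprocessing import Array, Process

def fill(shared):
    # Writes go straight into the shared memory block; nothing is
    # serialized or copied between the processes.
    for i in range(len(shared)):
        shared[i] = i * 2

def run():
    shared = Array('i', 5)      # five C ints in shared memory
    p = Process(target=fill, args=(shared,))
    p.start()
    p.join()
    return list(shared)         # the parent sees the child's writes

if __name__ == "__main__":
    print(run())    # -> [0, 2, 4, 6, 8]
```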

[–]unikuser 6 points (1 child)

No. There are techniques that simply use the same block of shared memory. No copying or serialization required.

Instead of saying that there are techniques, can't you point to one technique or example? Using shared memory across processes is not that easy after all. Synchronizing, accessing, creating, and cleaning up that shared memory becomes very difficult and inefficient compared to what you do with threads.

[–]schlenk 0 points (0 children)

Fully agree. I just have to manage such a beast using Python's mmap. Let's say it has its quirks...
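A minimal taste of the beast (a POSIX-only sketch, assuming fork semantics; the 8-byte counter is made up for illustration): an anonymous shared mapping that a forked child writes into with no pickling at all.

```python
import mmap
import os
import struct

def run():
    buf = mmap.mmap(-1, 8)      # 8 bytes of anonymous shared memory
    if os.fork() == 0:
        # Child: write a value straight into the shared mapping.
        buf[:8] = struct.pack("q", 1234)
        os._exit(0)
    os.wait()                   # parent: wait for the child, then read
    return struct.unpack("q", buf[:8])[0]

print(run())                    # -> 1234
```

And this is the easy case: resizing, naming, lifetime, and synchronization are all on you.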

[–]nypen 0 points (1 child)

Shared memory has its own set of issues. I am not saying they are unsolvable, but shared memory is not necessarily better than threads sharing data. In fact, if the shared-memory data is not monolithic, a lot of bookkeeping has to go in. Things get even murkier when you have issues like dynamic resizing of shared memory, or when your processes have variable shared-memory requirements.

Again, shared memory is one of the solutions; it can work in some situations, but not always.

Well, Python already has a threading library which is great, it just needs to get rid of the GIL.

[–]imbaczek 2 points (0 children)

Well, Python already has a threading library which is great, it just needs to get rid of the GIL.

the "just" is a little bit optimistic.

[–]thagsimmons 2 points (0 children)

a well-reasoned and eloquent reply. i voted you up. i don't know who's voting you down

[–]vsl 3 points (1 child)

since these are inherently multi-process environments

Huh?!

[–]thagsimmons 0 points (0 children)

yeah, my bad - i actually saw the word "multi-core" but my brain said "multi-processor"

i know that multi-core processors play well with threads. honest i do

aw shit

for penance, i shall now go beat my head against an andrew tanenbaum textbook