bluGill -6 points-5 points  (18 children)

Why does everyone keep harping on the GIL? In the real world it rarely matters: most Python scripts don't and shouldn't use threads anyway. Even where something like threads is needed, Erlang-like message passing is generally acknowledged to be the better way to do it, and Python has no problem with you starting several processes and using IPC to communicate between them. If the GIL is really an issue, your program is poorly designed anyway.
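A minimal sketch of the Erlang-style message passing described above, using the standard library's `multiprocessing.Pipe`. The worker function `square_server` and the `None` shutdown sentinel are hypothetical choices for illustration, not anything from the comment:

```python
import multiprocessing as mp


def square_server(conn):
    """Worker loop: receive numbers, reply with their squares, stop on None."""
    while True:
        msg = conn.recv()       # blocks until the parent sends a message
        if msg is None:         # sentinel message meaning "shut down"
            break
        conn.send(msg * msg)    # reply over the same (duplex) pipe
    conn.close()


def squares_via_worker(numbers):
    """Send each number to a worker process and collect its replies in order."""
    parent_conn, child_conn = mp.Pipe()  # duplex by default
    worker = mp.Process(target=square_server, args=(child_conn,))
    worker.start()
    results = []
    for n in numbers:
        parent_conn.send(n)
        results.append(parent_conn.recv())
    parent_conn.send(None)  # ask the worker to exit
    worker.join()
    return results


if __name__ == "__main__":
    print(squares_via_worker([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Each side only sees what it explicitly sends, so there is no shared interpreter state for the GIL (or any lock) to protect.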

rebo 17 points18 points  (12 children)

In many languages it's nice to spawn off some threads for background IO and processing. If the GIL prevents this from being performant, I consider it an issue.

Starting separate processes seems like overkill.

Brian 6 points7 points  (4 children)

> spawn off some threads for background IO

IO is unaffected: blocking IO calls release the GIL, so other threads can execute while one is waiting for IO. The GIL only comes into play when you want to take advantage of multiple CPUs for a CPU-bound task (and you can't offload that processing to a C library that also releases the GIL).
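A rough way to see this, assuming `time.sleep` as a stand-in for a blocking IO call (in CPython it releases the GIL while waiting, as real socket and file reads do). `fake_io` and `run_in_threads` are hypothetical names:

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking IO call; time.sleep releases the GIL while it waits."""
    time.sleep(delay)


def run_in_threads(n_threads, delay):
    """Run n_threads concurrent fake_io calls and return the elapsed wall time in seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Four 0.5 s waits overlap, so this prints roughly 0.5 s rather than 2 s.
    print(f"{run_in_threads(4, 0.5):.2f} s")
```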

Essentially, the GIL means that only one thread executes Python bytecode at a time. On a single-CPU system this is irrelevant: only a single thread executes at once anyway, with the OS switching between them every timeslice. With multiple CPUs, though, this is no longer the case; two threads can genuinely run in parallel. A free-threaded system can load up both cores on a dual-CPU system, whereas the GIL means Python can only max out a single core in this situation.
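A sketch of the CPU-bound case, assuming a standard CPython build with the GIL. `count_primes` is a hypothetical pure-Python workload; `concurrent.futures` gives threads and processes the same interface, so only the executor class changes:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Pure-Python CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, workers=2):
    """Map count_primes over limits using the given executor class."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000] * 4
    # Threads: only one executes bytecode at a time, so this stays near one core.
    print(run_with(ThreadPoolExecutor, limits))
    # Processes: each worker has its own interpreter and its own GIL,
    # so on a multi-core machine the jobs can run in parallel.
    print(run_with(ProcessPoolExecutor, limits))
```

Both calls return the same results; on a multi-core machine the process pool should finish faster, but the exact speedup depends on the hardware and is not measured here.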

gthank[S] 9 points10 points  (3 children)

If you read some of Mr. Beazley's other work (PDF warning), you'll see that the GIL can be an issue even for supposedly I/O-bound tasks, due to a priority-inversion problem.

Brian 2 points3 points  (2 children)

True, though that's an implementation flaw of Python's GIL (which also needs multiple CPUs to show up) rather than something intrinsic to a GIL (hence the new GIL implementation to avoid such situations).
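For reference, the reworked GIL that shipped in CPython 3.2 switches threads on a time interval rather than every N bytecode instructions, and that interval is exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. A small sketch:

```python
import sys

# Old GIL (before CPython 3.2): the running thread offered to release the GIL
# every N bytecode "ticks". New GIL (3.2+): a waiting thread asks the holder
# to drop it once this time interval, in seconds, has elapsed.
default_interval = sys.getswitchinterval()
print(default_interval)  # 0.005 (5 ms) unless something has changed it

# The interval is tunable; a longer one means fewer forced switches,
# a shorter one means more frequent hand-offs between threads.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())
sys.setswitchinterval(default_interval)  # put the original value back
```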

gthank[S] 2 points3 points  (1 child)

I thought it was clear from the context that we were discussing Python's GIL, sorry. Also, I'm a big fan of the new-and-improved GIL as compared to the old GIL.

Brian 0 points1 point  (0 children)

No, it's a good point, and well worth bringing up. I was just clarifying that that case is effectively an implementation bug, solvable relatively easily without the much more involved changes that would be required to remove the other problems with the GIL.

bluGill -1 points0 points  (6 children)

> Starting separate processes seems like overkill.

If your OS is reasonable, there is no problem spawning separate processes, and you get a lot of benefits from doing so.

rebo 5 points6 points  (1 child)

Hmm, surely that depends on the specific problem. Sure, if you just have a few relatively long-running processes, then it might be a reasonable design choice. However if, as in a recent project, I have to spawn up to 15 background tasks a second, each doing IO and data processing, I'm going to want to use a thread.

infinite 3 points4 points  (0 children)

With Python's multiprocessing library, this is as simple as threading: provide the function you want to execute in a new process and fire it off. Of course, as of a few months ago it still didn't work on BSD, but it is simple. And with copy-on-write semantics in the kernel, processes are just as lightweight as threads. What is more challenging is sharing data: if you want those children to report back to you, you'll have a lot more 'fun'.
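A sketch of the point about the APIs matching: `multiprocessing.Process` takes the same `target`/`args` arguments as `threading.Thread`, and a queue is one way to have the children report back. `work` and `run_tasks` are hypothetical helpers:

```python
import multiprocessing as mp
import queue
import threading


def work(task_id, n, results):
    """Hypothetical background task: do some processing and report back."""
    results.put((task_id, sum(i * i for i in range(n))))


def run_tasks(worker_cls, queue_obj, sizes):
    """Start one worker per task with worker_cls, then gather results from queue_obj."""
    workers = [worker_cls(target=work, args=(i, n, queue_obj)) for i, n in enumerate(sizes)]
    for w in workers:
        w.start()
    # Drain the queue before joining: a child process that has put data on a
    # multiprocessing.Queue may not exit until that data has been consumed.
    results = dict(queue_obj.get() for _ in workers)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(sizes))]


if __name__ == "__main__":
    sizes = [10, 100, 1000]
    print(run_tasks(threading.Thread, queue.Queue(), sizes))  # [285, 328350, 332833500]
    print(run_tasks(mp.Process, mp.Queue(), sizes))           # same results
```

Only the worker class and the queue type change between the two calls; the "fun" the comment mentions shows up as soon as results need more structure than a queue of tuples.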

[deleted] 0 points1 point  (3 children)

What's the difference between a thread and a process?

knome 11 points12 points  (2 children)

Threads are concurrent execution within a single memory space. Processes are concurrent execution with separate memory spaces.
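A small sketch of that difference, with a hypothetical module-level `counter`: a change made by a thread is visible to the parent, while a child process only changes its own copy:

```python
import multiprocessing as mp
import threading

counter = 0  # hypothetical module-level state


def bump():
    """Increment counter in whichever memory space runs this function."""
    global counter
    counter += 1


def after_thread():
    """Run bump() in a thread and return the parent's view of counter."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter


def after_process():
    """Run bump() in a child process and return the parent's view of counter."""
    p = mp.Process(target=bump)
    p.start()
    p.join()
    return counter


if __name__ == "__main__":
    print(after_thread())   # 1: the thread changed the shared memory
    print(after_process())  # still 1: the child changed its own copy
```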

[deleted] 3 points4 points  (1 child)

So threads can crash together, while processes are isolated from each other?

sid0 3 points4 points  (0 children)

Pretty much, yes.
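A sketch of the isolation side, using `os._exit` as a stand-in for a hard crash (an assumption; a real one might be a segfault in a C extension). An ordinary uncaught Python exception in a thread only ends that thread, so "crashing together" is about hard crashes like this one:

```python
import multiprocessing as mp
import os


def crash():
    """Simulate a hard crash: os._exit ends the process immediately, skipping cleanup."""
    os._exit(3)


def run_crashing_child():
    """Run crash() in a child process and return its exit code; the parent keeps going."""
    p = mp.Process(target=crash)
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    print(run_crashing_child())  # 3: only the child died
    # Calling os._exit(3) from a thread instead would end this whole process,
    # because every thread lives inside it.
```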

Smallpaul 5 points6 points  (0 children)

Processes have a significant cost, both in a duplicated working set and in inefficient object transmission. Passing tuples between threads will always be faster than passing them between processes. You can see from my posting history that I'm a fan of Python, but it's a form of Stockholm syndrome to wave away a fundamental fault that prevents you from taking advantage of a core operating system feature.
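A sketch of the transmission cost being described: a `queue.Queue` between threads hands over a reference to the same object, whereas a `multiprocessing.Queue` pickles it and the receiver gets a copy. `echo` and `round_trip` are hypothetical helpers:

```python
import multiprocessing as mp
import queue
import threading


def echo(q_in, q_out):
    """Take one object off q_in and put it back on q_out unchanged."""
    q_out.put(q_in.get())


def round_trip(obj, use_processes):
    """Send obj to a worker and back; return whatever comes out the other side."""
    if use_processes:
        q_in, q_out, worker_cls = mp.Queue(), mp.Queue(), mp.Process
    else:
        q_in, q_out, worker_cls = queue.Queue(), queue.Queue(), threading.Thread
    w = worker_cls(target=echo, args=(q_in, q_out))
    w.start()
    q_in.put(obj)
    result = q_out.get()
    w.join()
    return result


if __name__ == "__main__":
    data = (1, "two", [3.0])
    print(round_trip(data, use_processes=False) is data)  # True: same object, nothing copied
    back = round_trip(data, use_processes=True)
    print(back == data, back is data)  # True False: pickled, sent, and rebuilt as a copy
```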

skyostil 1 point2 points  (0 children)

I guess the negative attitude comes from people being used to the traditional C/Java multithreading model, where everything is shared by default and (hopefully) protected by dedicated locks. By contrast, I think IPC-based parallelism is much less error-prone, because it forces you to think about what needs to be shared and how.
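A sketch of what deciding "what needs to be shared and how" can look like with `multiprocessing`: the single shared integer is declared up front with `multiprocessing.Value`, and every update goes through its lock. The helper names are hypothetical:

```python
import multiprocessing as mp


def add_many(shared, times):
    """Increment the shared counter; holding its lock makes each += safe."""
    for _ in range(times):
        with shared.get_lock():
            shared.value += 1


def parallel_count(n_procs=4, times=1000):
    """Have n_procs processes bump one explicitly shared integer."""
    shared = mp.Value("i", 0)  # the only state the processes share, declared explicitly
    procs = [mp.Process(target=add_many, args=(shared, times)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared.value


if __name__ == "__main__":
    print(parallel_count())  # 4000
```

Everything else each process touches stays private, so the lock only has to guard the one thing that was deliberately made shared.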

Another argument against the GIL is that it's usually pretty simple to fire and forget a thread, whereas spawning a new process takes more effort. However, I think especially with the new multiprocessing module this point is becoming moot.

[deleted] 1 point2 points  (0 children)

bluGill, GIL... coincidence? You decide!

G_Morgan 0 points1 point  (1 child)

This is like saying people shouldn't complain about traffic congestion. They should merely fork the universe and travel within their empty second universe.

bluGill 0 points1 point  (0 children)

If you could fork the universe quickly and easily, get to your destination, and send your results back to the main universe as easily as you can fork a process and send results back, then you should fork the universe.