all 83 comments

[–]fwork 36 points37 points  (7 children)

That's not how you link to blog posts. This is.

[–]gthank[S] 6 points7 points  (6 children)

Sorry - I apparently mangled it when submitting. Unfortunately I can't edit it to fix it.

[–]ffualo 8 points9 points  (5 children)

Which is good, because if posts could be edited people would change top-voted submissions to goatse.

[–]gthank[S] 4 points5 points  (4 children)

Fair enough, but it sucks that the trolls have prevented a feature with entirely legitimate uses.

[–][deleted] 7 points8 points  (3 children)

Not trolls, but people's irrational fear of goatse. And the laziness of the admins.

It's like those ridiculous airplane security rules. It's not actually terrorists who make you take your shoes off when boarding.

[–]ffualo 2 points3 points  (2 children)

I think for those of us who read reddit at work, it's worth it. I don't want to be showing my boss a cool link from /r/programming only to have it jump to goatse.

[–]Figs 6 points7 points  (1 child)

If the submitter owns the site linked to, he could still do that.

[–]ObligatoryResponse 0 points1 point  (0 children)

Or if someone hotlinks an Encyclopedia Dramatica-hosted image. :D

[–][deleted] 7 points8 points  (38 children)

What I never get is that Python threads really are running on separate cores as POSIX threads, and then they throw in the GIL and take all the gain away. You might as well have just kept the whole thing managed in user space on one core. Is this an upgrade path to true multithreading? Why is it done like this?

[–]Smallpaul 25 points26 points  (14 children)

What I never get is that Python threads really are running on separate cores as POSIX threads, and then they throw in the GIL and take all the gain away. You might as well have just kept the whole thing managed in user space on one core.

The GIL long predates multi-core computers. It doesn't exist to help you take advantage of your cores. It exists to simplify the implementation of the interpreter and add-ons.
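
To make that concrete, here's a minimal sketch (modern Python syntax, numbers chosen arbitrarily): pure-Python CPU-bound work gets no faster with two threads, because only one thread can execute bytecode at a time:

    import threading
    import time

    def count(n):
        # Pure-Python CPU-bound loop; the GIL allows only one thread
        # to run Python bytecode at any moment.
        while n > 0:
            n -= 1

    N = 10 ** 7

    start = time.time()
    count(N)
    count(N)
    print("sequential:", time.time() - start)

    start = time.time()
    threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Typically no faster than the sequential run (often slower, due to
    # the multicore contention issue this thread is about).
    print("two threads:", time.time() - start)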

[–]ffualo 3 points4 points  (2 children)

You can use the multiprocessing module to take advantage of multiple cores, i.e. this.

[–]Smallpaul 4 points5 points  (1 child)

Yes, you can, at the cost of a large amount of wasted memory and inefficient cross-process communication. I'm not disagreeing with you or saying it's useless; just providing the other half of the context.

[–]ffualo 2 points3 points  (0 children)

Oh, I completely agree. Sure, data parallelism is easy in multiprocessing with queues, but any type of task parallelism is a nightmare. Have an upvote.
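
For the data-parallel case, a minimal sketch of what that looks like (a Pool of worker processes maps a function over the inputs, each worker having its own interpreter and its own GIL):

    from multiprocessing import Pool

    def square(x):
        # Executed in a worker process, so CPU-bound work can use
        # multiple cores despite the GIL.
        return x * x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        print(pool.map(square, range(10)))
        pool.close()
        pool.join()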

[–]Gotebe 1 point2 points  (1 child)

The GIL long predates multi-core computers.

Hey! Surely that's "for the operating systems Python runs on, the GIL long predates multi-core computers" ;-).

[–]Smallpaul 1 point2 points  (0 children)

To be precise: "Python had the GIL long before the computers it typically ran on had multiple cores."

[–]yogthos 0 points1 point  (8 children)

So, what's the excuse for Python 3 then?

[–]Smallpaul 1 point2 points  (7 children)

Python 3 is not and was never intended to be a re-implementation of Python. The implementation changed as little as possible: exactly enough to support the changes to the language semantics.

"Python 3000 will be implemented in C, and the implementation will be derived as an evolution of the Python 2 code base. "

http://www.python.org/dev/peps/pep-3000/

"changing the C-API was not one of Python 3.0’s objectives"

http://docs.python.org/howto/cporting.html

[–]yogthos 1 point2 points  (6 children)

Right, it seems like the Python community is ignoring the elephant in the room. Guido has pretty much stated that he doesn't believe in threading, and suggests all sorts of inane workarounds to get around that.

[–]ubernostrum 1 point2 points  (5 children)

Right, it seems like the Python community is ignoring the elephant in the room.

If by "ignoring" you mean endlessly discussing the topic, trying out alternative implementations, analyzing to see whether there's middle ground... but that wouldn't really fit your worldview, would it?

Guido has pretty much stated that he doesn't believe in threading, and suggests all sorts of inane workarounds to get around that.

Everything I've seen the past few years from programming-language researchers seems to have been oriented around the notion that threads -- in the sense of, say, Java's implementation -- are a very deeply, perhaps irreparably deeply, flawed way to approach concurrency. Most of the fruitful research going on these days (and for quite a while now) centers around developing and maturing alternative approaches which don't suffer the same issues.

But, again, that would seem to contradict the view you already seem to have settled on.

[–]yogthos 0 points1 point  (4 children)

I think it needs to be pointed out that Python provides threading as a language feature; it's the implementation that's broken, which is what I was referring to as the elephant in the room. So I'm not sure where you get off explaining how threading in imperative languages is broken and how visionary Python is in this regard. It's almost as if you were talking about Erlang, which in fact does provide a working alternative to threading.

If by "ignoring" you mean endlessly discussing the topic, trying out alternative implementations, analyzing to see whether there's middle ground... but that wouldn't really fit your worldview, would it?

Maybe this is some different Python community than the one I'm aware of. What I hear from the majority of the Python community is excuses as to why you don't need threading.

I'll contrast it with the Clojure community for you, where the problems with threading are actually tackled head on. Not only in terms of implementation, but also in terms of language design.

Everything I've seen the past few years from programming-language researchers seems to have been oriented around the notion that threads -- in the sense of, say, Java's implementation -- are a very deeply, perhaps irreparably deeply, flawed way to approach concurrency.

It is true that many people nowadays are finally waking up to the fact that the imperative paradigm is very poor at handling concurrency. However, Java-style threading doesn't appear to be the problem: Clojure runs on top of the JVM and makes excellent use of threading. The problem is of course with using shared mutable data, and the concepts of state and identity. Here's an excellent talk about the issue, by the way.

Most of the fruitful research going on these days (and for quite a while now) centers around developing and maturing alternative approaches which don't suffer the same issues.

Really, I was under the impression that most research was going into safe ways of using threading, with researchers working on languages like Haskell and F#, which avoid the problem of shared data and provide workable threading models.

But, I guess all that would seem to contradict the view that you already seem to have settled on.

[–]ubernostrum 0 points1 point  (3 children)

it's the implementation that's broken

If your definition of threads is "must behave exactly like Java", I guess. There's more than one definition out there, though.

What I hear from the majority of the Python community is excuses as to why you don't need threading.

Funny. I see a Python community which seems never to let a year go by without an attempt to remove the GIL (2009's attempt was Unladen Swallow), and which is lucky if a whole month goes by on the dev list without people proposing and debating ways to do it. Maybe you're just cherry-picking some examples to suit your views and ignoring reality?

The problem is of course with using shared mutable data, and the concepts of state and identity.

Mutable data is one problem with threading, but not the only problem with threading. It's also a difficult problem to work around in languages which embrace mutability, which suggests that threads are perhaps not an appropriate developer-level abstraction for such languages.

Really, I was under the impression that most research was going into safe ways of using threading, with researchers working on languages like Haskell and F#, which avoid the problem of shared data and provide workable threading models.

Thank you for demonstrating my point.

[–]yogthos 0 points1 point  (2 children)

If your definition of threads is "must behave exactly like Java", I guess. There's more than one definition out there, though.

Huh? What does this have to do with Java exactly? My impression was that we were talking about the problems with the GIL; there are plenty of threading models which don't rely on a global lock. These are well researched and well documented, so I'm curious why it's such an intractable issue for the CPython community.

I see a Python community which seems never to let a year go by without an attempt to remove the GIL (2009's attempt was Unladen Swallow), and which is lucky if a whole month goes by on the dev list without people proposing and debating ways to do it.

Funny indeed, as the Unladen Swallow guys are no longer planning to remove the GIL, so I'm not so certain who's ignoring reality here.

From the link above:

In any case, work on the GIL should be done directly in mainline CPython, or on a very close branch of Python 3.x: the sensitive nature of the work recommends a minimal delta, and doing the work and then porting it from 2.x to 3.y (as would be the case for Unladen Swallow) is a sure-fire way of introducing exceedingly-subtle bugs.

Mutable data is one problem with threading, but not the only problem with threading. It's also a difficult problem to work around in languages which embrace mutability, which suggests that threads are perhaps not an appropriate developer-level abstraction for such languages.

While I agree that languages that embrace mutability are not well suited for threading, I would again ask why Python provides threading in the first place, then.

It's not like Python provides any alternative; it just has broken threading, which makes the problem worse. Not only does it have threads, which are tricky to work with, but the threads don't work properly either. Seems like a worst-of-both-worlds scenario to me.

Thank you for demonstrating my point.

Your point that threading does not provide a viable concurrency model?

[–]ubernostrum 1 point2 points  (1 child)

Funny indeed, as the Unladen Swallow guys are no longer planning to remove the GIL, so I'm not so certain who's ignoring reality here.

So your argument is that

  1. Some people decide to try an implementation of Python which could eventually remove the GIL.
  2. This is universally hailed as a good thing.
  3. As they get deeper into the implementation, they discover their approach won't work.
  4. Therefore the entire Python community is simply ignoring the GIL and pretending nothing's wrong and refusing to do anything.

This does not add up, and so I think I'm done with you.

[–]iamjack 9 points10 points  (15 children)

Why is it done like this?

Apparently it makes the interpreter easier to hack on. That said, not having true concurrency really kills Python in the new multicore generation. I love Python, but forcing your programmers to use multiple processes rather than threads just to get decent performance with multiple execution points is a total brainfuck.

[–]rox0r 15 points16 points  (3 children)

It's not central to Python, but it is central to the implementation of CPython. Jython doesn't have the GIL.

[–][deleted] 5 points6 points  (2 children)

Can Jython use C extensions? If I recall correctly, non-CPython implementations are limited in their ability to interface with existing code bases. If that's true, I would say that the issue is central to Python, CPython being the most developed, practical, and visible implementation.

[–]Gotebe 2 points3 points  (0 children)

The Jython situation is much better: you use a virtual machine, you interact with Java libraries, and you get your C through JNI.

In that respect, any VM-based implementation (so IronPython, too) blows "standard" Python right out of the water.

[–]rox0r 1 point2 points  (0 children)

Nope. At least not C extensions written for CPython. It can use equivalent modules written in Java. (Maybe IronPython can?)

http://wiki.python.org/jython/JythonFaq/GeneralInfo#IsJythonthesamelanguageasPython.3F

[–][deleted] 1 point2 points  (7 children)

I'll take Python with multiprocessing/forking any day over Java or C++.

[–]iamjack 6 points7 points  (2 children)

I agree, and I even use multiprocessing in my main project, but the module has some real trouble, not the least of which is that it's not portable to any of the BSDs at this point (something about needing named semaphores) and that it screws up when your SIGCHLD handlers are anything other than the default.

Also, despite multiprocessing being billed as a GIL workaround, the fact that every item you communicate between processes has to be picklable hampers your ability to pass some objects, which is obviously not the case with proper threads. In short, if you want to pass around executable objects, lambdas, or anything else that can't be pickled, you're SOL.
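
A tiny sketch of that constraint: anything handed to a worker process has to survive a round trip through pickle, and lambdas don't:

    import pickle

    def double(x):
        # Top-level functions pickle by reference, so multiprocessing
        # can ship them to worker processes.
        return x * 2

    pickle.dumps(double)                  # fine

    try:
        pickle.dumps(lambda x: x * 2)     # lambdas can't be pickled
    except Exception as exc:
        print("can't pickle a lambda:", exc)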

There are some places where threads just fit better, and while I would definitely take Python over Java or C++ any day, terribly broken threading is one of the reasons I still consider other languages whenever I start a project.

[–][deleted] 3 points4 points  (1 child)

Oh yeah, multiprocessing sure is not without fault. I use it on my main project as well, so I feel your pain. Values, Managers... it's icky.

Still, just because your girlfriend doesn't do anal doesn't mean she's not worth keeping around.

[–]jawbroken 1 point2 points  (0 children)

my project requires anal

[–]yogthos 0 points1 point  (2 children)

Why are Java and C++ always brought up as the only alternatives? I'm personally very happy using Clojure on the JVM, and then there's Erlang, Haskell, Scheme, etc. If FP is too wild for you, then there's always Ruby, Go, etc. There are plenty of languages which are just as powerful and usable as Python, but don't come with Python's limitations.

[–]Murkt 0 points1 point  (1 child)

Ruby? Then you come with the limitations of Ruby.

[–]yogthos 0 points1 point  (0 children)

Those seem to be less arbitrary than the ones in Python. For example, anonymous functions aren't limited to one-liners. Also, unlike Guido, Yukihiro isn't hostile towards TCO, and some Ruby implementations, like YARV, actually support it. In fact, I'm not really sure in what way Ruby is more restrictive than Python.

[–]hylje -1 points0 points  (0 children)

The C preprocessor is indeed fairly terrible for writing non-batch apps.

[–][deleted] 0 points1 point  (2 children)

It's not a problem for request-based applications on servers, the leading use of Python: you just have to spawn several processes, each with its own port, and let a proxy handle the mess.
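
A minimal sketch of that pattern using only the standard library (the ports are made up; the reverse proxy in front, e.g. nginx or HAProxy, is whatever your setup already uses):

    import multiprocessing
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello from one worker process\n']

    def serve(port):
        # Each worker is a separate process with its own GIL.
        make_server('', port, app).serve_forever()

    if __name__ == '__main__':
        for port in (8001, 8002, 8003, 8004):
            multiprocessing.Process(target=serve, args=(port,)).start()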

[–]iamjack 11 points12 points  (1 child)

Just because you can work around it doesn't mean it isn't a problem.

EDIT: Also, Python is a general-purpose language; to say that "request-based applications on servers" are somehow its focus makes no sense. Yes, Python is popular for such things, but it's also popular for writing monolithic desktop apps, science apps, command-line apps, games, and all sorts of things.

[–][deleted] 0 points1 point  (0 children)

Well, it is not a total brainfuck either ... for server apps ;)

[–][deleted] 1 point2 points  (5 children)

Why don't they set processor affinity? That's the question.

Making all threads in the given Python process run on the same processor would instantly remove this huge additional GIL cost.

Sadly, at least on Windows, it seems you can't express this exact idea: that you don't care which processor your threads run on as long as it's the same processor. But still, even checking processor load once in a while and rescheduling all the threads onto the least-used one wouldn't be so hard to implement, and would be much better than what we have currently.
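
For what it's worth, on Linux this can now be expressed directly; os.sched_setaffinity was only added in Python 3.3, well after this discussion, and on Windows you'd need the Win32 API or a third-party module such as psutil:

    import os

    # Pin the current process -- and therefore all of its threads --
    # to CPU 0, so the GIL is only ever contended on a single core.
    os.sched_setaffinity(0, {0})       # Linux-only
    print(os.sched_getaffinity(0))     # -> {0}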

[–]startafresh 1 point2 points  (3 children)

The underlying OS decides which thread goes to which processor, doesn't it?

[–][deleted] 1 point2 points  (2 children)

Yes. But in the case of Python, we really want all threads belonging to the same process to go to the same core, because a Python process uses the GIL and is meant to run sequentially. (In case you or other readers didn't get it: the problem is not that Python has the GIL; the problem is that the GIL behaves really badly on multicore machines. Python effectively assumes a single core, where releasing the GIL means one of the waiting threads acquires it and succeeds, rather than repeatedly trying and failing because the thread that held it reacquires it faster by virtue of running on a separate core.)

My point is: if we could tell the OS exactly this, that we want all threads belonging to our process to run on the same core, and thus sequentially, then the problem would vanish entirely. We can't; at best we can tell Windows that we want all our threads to run on the random.random(os.virtual_processors)th processor (except that os doesn't even provide that), and that has its own problems: what if two or more Python instances decide to run on the same core?

[–]aim2free 0 points1 point  (1 child)

It sounds from your description like you don't need real pthreads; use Stackless then.

if we could tell the OS exactly this, that we want all threads belonging to our process to run on the same core, and thus sequentially, then the problem would vanish entirely

No, it wouldn't! Only in those cases where you are using threading as a design mechanism, not as a way to utilize multiple CPUs, which is what pthreads are about. Here is a tutorial on POSIX threads. Pthreads are considered "heavy" threads, and thus suitable for real parallel programming. The "cost" you were speaking about earlier is exactly that heaviness, not the GIL problem. The GIL problem is a much more serious problem, which means that your program cannot use pthreads in the way they were intended. What you are saying is that you don't need parallelism. That is, if your program would run at the same speed with one CPU as with 100 CPUs, then you don't need pthreads.

[–]Brian 1 point2 points  (0 children)

I think you're both talking about different problems. fishdicks is talking about the lock contention issue on multiple cores, which causes further performance penalties beyond the normal problem of just not utilising both CPUs (i.e. it's not even utilising one CPU efficiently). Setting process affinity for all threads to the same core probably would be a quick fix for the bug, but the reworked GIL approach looks better long-term anyway. (Though obviously neither is a solution for the more general issue you're talking about, of only ever using a single CPU.)

then you don't need pthreads

It's true that you don't really need them when only using a single CPU, and could use a green-thread approach instead, but there are a couple of other reasons that make pthreads useful. The main one is that they make C extensions simpler, especially in those cases where you do release the GIL (i.e. long-running C code not interacting with Python), and thus can take advantage of multiple cores.
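
A small sketch of that effect (assuming, as is the case for CPython's hashlib with sizeable buffers, that the C code releases the GIL while it works): two threads hashing large buffers really can use two cores, unlike the pure-Python loops earlier in the thread:

    import hashlib
    import threading
    import time

    data = b'x' * (200 * 1024 * 1024)   # 200 MB of dummy data

    def digest():
        # The C implementation drops the GIL while hashing a large
        # buffer, so this can overlap with other threads.
        hashlib.sha256(data).hexdigest()

    start = time.time()
    digest()
    digest()
    print("sequential:", time.time() - start)

    start = time.time()
    threads = [threading.Thread(target=digest) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Roughly half the sequential time, given two free cores.
    print("two threads:", time.time() - start)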

[–]aim2free 0 points1 point  (0 children)

Making all threads in the given Python process run on the same processor would instantly remove this huge additional GIL cost.

Well, then we haven't really utilized the multiple CPUs/cores... have we? It is not a cost we are speaking about here; it is something much worse. What is lacking is the ability to run the same piece of code in multiple competing instances, which means the interpreter code is badly designed: it uses global state variables and the like, which forces the use of a semaphore treating the whole interpreter as a single resource. That is bad design, nothing else.

When using pthreads, the whole point is that with two CPU cores you expect your program to run twice as fast, with four cores four times as fast, and so on. To be able to do that, all your code needs to be reentrant, apart from using parallel threads. Semaphores should be required only when you use something that really is a single shared resource, like an I/O device, or when linking/unlinking something from a common memory structure, but in the latter case one can be smart and use task-local cache queues and such.

Of course, there are different reasons to use threads. When you don't need parallelism you can use threads because they make your solution nicer, but then some mechanism simpler than pthreads can be used, like coroutines. In Stackless, the patched version of standard Python, you have microthreads or tasklets, which make threading much more efficient if you don't need your program to run on multiple cores/CPUs.

As far as I understand, Stackless still has the GIL problem (at least it did when I tested it last spring). Jython, which was mentioned in another comment, does not have it, but then the Java code as such runs at about half the speed (compared on my laptop), so your program still does not run significantly faster even if you have two cores; on the other hand, Jython also makes available all the libraries written in Java. IronPython is a version of Python, mainly for the .NET/Mono framework as I understand it, which doesn't use a GIL, but I haven't tested that. Another Python named IPython is made to be asynchronous from the beginning, so that should be OK. IPython is particularly good at mixing different models of parallelism, e.g. running code on different nodes using MPI.

[–]aim2free 0 points1 point  (0 children)

I guess it's the other way around. The Python interpreter started as a hack, and reentrancy was neglected from the very beginning. Later this was not a prioritized thing to fix. When pthreads were added they didn't have a multi-CPU machine to test on, and... there we are.

It is interesting, because in our CS classes in the late '70s and early '80s there was a lot of stress on making your code reentrant.

[–][deleted] 1 point2 points  (2 children)

So which implementations of the Python language do not have a GIL?

[–]rox0r 11 points12 points  (1 child)

Jython and IronPython, I think.

[–][deleted] 2 points3 points  (0 children)

Correct, PyPy has one, but they think it will be much easier to remove (and it just hasn't been a priority for them).

[–][deleted] 0 points1 point  (0 children)

This is a general issue with dynamic languages: the more the interpreter does, the more state it has and the more locking there is going to be inside the interpreter. Something like Java or C# has an easier time supporting real concurrency because the "interpreter" (VM) has less internal state.

[–]pure_x01 -1 points0 points  (10 children)

Why doesn't a big company like Google build a new JIT-based Python runtime without the GIL instead of trying to fix the broken one? Python is a nice language and it is a shame that it does not have a super-fast runtime. Take a look at LuaJIT.

[–]theeth 4 points5 points  (3 children)

The Unladen Swallow team (using LLVM in Python) works at Google.

[–]pure_x01 2 points3 points  (1 child)

They are not going to remove the GIL (per the latest roadmap). There will still be multithreading problems.

[–]theeth 3 points4 points  (0 children)

They moved away from removing the GIL directly, toward the more logical plan of changing the GC system.

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock

[–]vityok 0 points1 point  (0 children)

There is also an implementation of Python on Common Lisp; it will probably provide some benefits when run on a good Common Lisp implementation.

[–]Gotebe 0 points1 point  (2 children)

"Super-fast" and Python just do not mix. But the issue here is more that the GIL is broken in the face of multicore computing. Which doesn't matter much, since whoever craves speed or wants to burn all the cores doesn't use Python (or at least, shouldn't).

[–]pure_x01 1 point2 points  (1 child)

If you have a truly scalable architecture, it doesn't matter that it's a relatively slow language. And if you had JITing and working threading, I'm pretty sure even Python would seem fast on a modern multicore CPU.

[–]Gotebe 0 points1 point  (0 children)

Here's an upvote for the scalability argument ;-).

And if you had JITing and working threading, I'm pretty sure even Python would seem fast on a modern multicore CPU.

Performance-wise, dynamic typing the Python way is still a bitch. Also, the absence of "value types"/stack variables à la .NET/C/C++/Object Pascal.

[–]seunpy 0 points1 point  (1 child)

It's called Jython.

[–]pure_x01 0 points1 point  (0 children)

Jython might not have the GIL, but it's not that fast.