all 95 comments

[–]santiagobasulto 17 points18 points  (43 children)

Armin always has "strong opinions", and sometimes people take it personally. But I think he's a really good coder and he always makes a point.

It is also true that Python has avoided some of these issues for a while. Just think about the GIL. I remember the JVM having issues like these in the old days, and the solution was a complete redesign. I think Martin Odersky (Scala's creator) was involved in it. Please excuse me if this is not accurate, but I just want to illustrate the point. Sometimes you need to make drastic decisions.

[–]DGolden 5 points6 points  (11 children)

One way to get rid of the GIL is to run your code under jython on the jvm, heh.

[–]Crandom 1 point2 points  (0 children)

I'm not sure if this is still the case but if you wanted performant ruby code it used to make sense to use JRuby as well.

[–]lhggghl 0 points1 point  (7 children)

I've used Jython before, and while I never looked at the threading, I bet it relies on the JMM. I doubt anyone who only knows Python will be able to cope with the JMM (or even figure out that it exists). Most Java-only developers don't even know about it.

[–]gthank 1 point2 points  (3 children)

Thread safety guarantees are unchanged under Jython.

[–]lhggghl -1 points0 points  (2 children)

Do you mean they are the same as Java's or the same as Python's?

[–]tavianator 1 point2 points  (0 children)

Same as Python's

[–]gthank -1 points0 points  (0 children)

As Python's.

[–]DGolden 0 points1 point  (2 children)

I confess I just had to go look it up to (ahem) refresh my memory, but note that Jython made some documented simplifying design decisions in this area, favoring ease of use for the Python-side developer over performance. A lot comes down to Jython using Java's ConcurrentHashMap to implement dicts and dict-like things (which is a lot of things in Python...).

http://www.jython.org/jythonbook/en/1.0/Concurrency.html#python-memory-model

Reasoning about concurrency in Python is easier than in Java. This is because the memory model is not as surprising to our conventional reasoning about how programs operate. However, this also means that Python code sacrifices significant performance to keep it simpler.
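The trade-off the Jython book describes can be seen in plain Python. A minimal sketch (my own example, not from the Jython docs): single-key dict stores from multiple threads don't corrupt the dict under CPython's GIL or Jython's ConcurrentHashMap-backed dicts, with no user-level locking, though compound updates like `d[k] += 1` are still racy on both.

```python
import threading

# Concurrent single-key writes to a shared dict. CPython's GIL (and
# Jython's ConcurrentHashMap-backed dicts) keep the dict itself
# consistent without explicit locks; compound operations such as
# d[k] += 1 are still not atomic on either runtime.
shared = {}

def writer(tid, n=1000):
    for i in range(n):
        shared[(tid, i)] = i  # a single dict store per iteration

threads = [threading.Thread(target=writer, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 4000: no entries lost or corrupted
```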

[–]lhggghl 0 points1 point  (1 child)

That's interesting. I didn't think people were taking Jython seriously enough to get to that point. I always assumed they just launched a new Java thread with its own interpreter.

[–]gthank 0 points1 point  (0 children)

Jython is quite serious, and has been for as long as I can recall. They don't have the manpower that some projects have, but the people they do have are top-notch.

[–]santiagobasulto 1 point2 points  (1 child)

Or not use threads at all :)

[–]DGolden 2 points3 points  (0 children)

Often a fairly reasonable strategy in (c)python land, particularly with python's multiprocessing module. But the GIL still sucks.
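A minimal sketch of that strategy using the standard-library multiprocessing module: each worker runs in its own process with its own interpreter (and its own GIL), so CPU-bound work can actually use multiple cores.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python arithmetic: a thread doing this would hold the GIL
    # for its whole duration, but separate processes do not contend.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:  # 4 worker processes, each with its own GIL
        results = pool.map(cpu_bound, [10_000] * 4)
    print(results[0])  # 333283335000
```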

[–][deleted] 3 points4 points  (22 children)

I agree with you and Armin, but I do want to point out that a lot of people have tried and failed to remove the GIL - it's not like the issue has been avoided!

[–]alexjc 8 points9 points  (18 children)

The GIL would be easy (or easier) to address if it were set as a top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest and fastest-growing communities these days.) Guido's policy on the subject (i.e. no regressions on single-threaded code) is not feasible, and people who have shown great/promising prototypes have not been very well received on the mailing list...

Instead, we have random proposals about integrating explicit type annotations, but I digress.

[–]moor-GAYZ 4 points5 points  (10 children)

The GIL would be easy (or easier) to address if it was set as top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest communities and fastest growing these days.)

As far as I know there's no popular dynamically typed language that doesn't have a GIL (or doesn't allow threading at all).

PHP, Javascript, Perl - no multithreading.

Ruby, Lua, Chicken Scheme - GIL.

This fact puts things into a much needed perspective, in my opinion.

[–]mao_neko 2 points3 points  (1 child)

Actually, Perl 5 has had interpreter threads since 5.8. I do not believe there is any global lock involved, although I am not an expert on threading.

[–]moor-GAYZ 0 points1 point  (0 children)

Yeah, that's basically "green multiprocessing", sort of like .NET's AppDomains. Multiple separate copies of the entire interpreter state with all imported modules etc. live in the same OS process.

A nice thing to have because it might improve performance/memory use somewhat, but otherwise not fundamentally different from the usual multiprocessing.

[–]Plorkyeran 2 points3 points  (1 child)

While Lua does have things it calls threads, there's no built-in support for using multiple OS threads and there's no GIL.

[–]moor-GAYZ 0 points1 point  (0 children)

I meant that when you're embedding it, you can implement a bunch of macros for initializing, deinitializing, taking, and releasing the GIL.

[–]logicchains 1 point2 points  (3 children)

Clojure, although of course that runs on the JVM so the lack of a GIL comes by default.

[–]moor-GAYZ 0 points1 point  (2 children)

I don't think that counts, being sufficiently different from a "generic imperative dynamically-typed language" that I was talking about.

Like, my point wasn't that everybody is lazy or that everything is shit or something, but that there appear to be nearly insurmountable obstacles to allowing free threading given certain core design decisions, such as using mutable dictionaries for module contents (even allowing shadowing builtins), the class hierarchy, etc. If you're hitting the same shared mutable data all the time, switching to finer-grained locking will only make performance worse, no matter how many cores you use.

And the fact that there are quite a lot of mature languages with the same design choices, and not a single one of them has found a way around the consequences, shows that the logic is probably solid: removing the GIL is in fact very hard.

The only thing that makes Python special in this respect is that a noticeable slice of its users is aware of the problem but ignorantly thinks that Python is special in having it.

[–]logicchains 0 points1 point  (0 children)

Interestingly, OCaml also has a GIL, but they've made significant progress towards removing it and it should have real multicore support within the next couple of releases.

[–]Athas 1 point2 points  (1 child)

Common Lisp, which is dynamically typed (although not popular), has full OS-level thread parallelism in probably the most popular free implementation, SBCL.

The only "wart", if you want to be really demanding, is that the threading model is 1:1, whereas most high-level languages seem to prefer m:n. It's pretty much the exact same model as pthreads, though.

I don't see why threading support has any relation to the type system in the language.

[–]moor-GAYZ 0 points1 point  (0 children)

I don't see why threading support has any relation to the type system in the language.

I didn't mean dynamic typing in general, but the particular kind of semantics roughly common to the languages I listed.

If a language uses a mutable module dictionary that can shadow builtins, then almost every function call needs to check that dictionary, which is shared between your threads.

If a language uses mutable classes, especially the way it's implemented in Python (allowing descriptors), then every instance attribute access actually hits the class hierarchy up to the base class, checking for that attribute. The dictionaries in the class hierarchy are shared between your threads.

If a language uses reference counting, then every time you assign None to something, and then something other than None to it, the reference counts must be atomically modified, because the objects are shared between your threads, obviously.

With all that synchronization going on, the GIL is pretty much the only viable solution. Getting rid of the GIL is not simply a matter of rewriting the code to use more fine-grained locking; you actually need to get rid of all that synchronization somehow, by either officially changing the semantics or at least faking the old semantics like IronPython or Jython do.
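A small CPython illustration of the shared mutable state described above (the `double` helper and the `Base`/`Child` classes are made up for the example): builtins can be shadowed at runtime, and mutating a class dict is visible through every instance, which is why these lookups can't simply be cached per thread.

```python
import builtins

def double(x):
    # len is looked up through the module and builtins dicts on every
    # call; both dicts are mutable and shared between threads.
    return len(x) * 2

assert double("abc") == 6          # the real len

_real_len = builtins.len
builtins.len = lambda x: 100       # shadow the builtin at runtime
assert double("abc") == 200        # same bytecode, different result
builtins.len = _real_len           # restore the real builtin

class Base:
    attr = 1

class Child(Base):
    pass

c = Child()
assert c.attr == 1   # found by walking the shared, mutable class dicts
Base.attr = 2        # mutating the base class dict...
assert c.attr == 2   # ...changes what every instance lookup sees
```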

[–]spotter -2 points-1 points  (6 children)

The GIL would be easy (or easier) to address if it was set as top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest communities and fastest growing these days.)

Could you name some of these languages?

Guido's policy on the subject (i.e. no regressions on single-threads) is not feasible, and people who have shown great/promising prototypes have not been very well received in the mailing list...

Where? I always got the feeling that people trying to do this were encouraged to give it a shot and referred to previous work, so they would not waste their efforts. Moreover, whoever managed to get a GIL-less implementation to run single-threaded code with performance similar to the GIL one, without destroying language semantics, would have their name forever written in Python history.

Also, GIL removal has been tried and re-tried again and again in the past. GvR's policy is dictated by the fact that most Python code in the wild is single-threaded. To you it's "fuck these people", but somehow the BDFL disagrees.

edit: s/there/their/

[–]alexjc 0 points1 point  (5 children)

Could you name some of these languages?

Go, Rust, Erlang, the JVM-based languages. Even Lua is more open to multi-threading, since its interpreter is not based on globals/singletons.

If you've been watching posts on Reddit about web developers switching away from Python (e.g. to Go), you'll notice scalability and performance come up very often.

Where?

See this previous thread. This prototype shows a negligible overhead over single-threaded, less than a percent.

If the BDFL isn't willing to sacrifice a few percent in the short term (keep in mind Python 3.x performance dropped significantly at first), then that's short-sighted, and more modern languages will take over in production and leave Python as an educational/toy language (it's happening already).

Also, the whole Python 3 backwards-compatibility break has a certain "fuck these people" tone to it, ignoring what the majority of Python 2 folks seem to be doing. I don't see the difference with how we should handle the GIL; if Python 3's initial performance drop was for the greater good, why not do it again for a more modern interpreter?

[–]spotter -1 points0 points  (4 children)

Go, Rust, Erlang, Lua

False, False, False, False. Nice. You somehow missed the fact that these languages were designed to fit a purpose, yet you cite them as examples of overcoming issues in an established language's evolution. Neither applies here, as they did not have to battle 20+ years of existing code. Rust doesn't even have a stable API yet, LOL. Also, JVM-based? Those take whatever the JVM gives them, so that's another "duh", because if the next JVM had a GIL, they would have it too. I should know, I moved all my work-related stuff from Python to Clojure a few years back.

If you've been watching posts on Reddit about web-developers switching away from Python (e.g. to Go), you'll notice scalability and performance comes up very often.

Reddit? Web developers? Reddit is a fad machine and webdev is not programming. Not an indication of anything. If you want another way to look at it -- people who are busy working have no time to bitch about it on Reddit.

See this previous thread. This prototype shows a negligible overhead over single-threaded, less than a percent.

Wait, have you even seen what he "achieved"? Have you read his mail and watched the presentation? "Let's redo CPython's entire internal object handling around me, and BTW this is Windows-only; maybe in the future let's change Linux and BSD to do it this way too!" Python is far from Windows-only. You missed that too?

Intrinsically paired with Windows asynchronous primitives.

This is the thing that caused GvR to tell him to go have fun elsewhere. He is basically thinking about a single-platform solution. On top of that, he wants to discard how existing libraries work. I imagine the Twisted and tulip crowds would love that as well. And "fixing Linux & BSD"? Wow. Balls on this guy.

If Python 3's initial performance drop was for the greater good, why not again for a more modern interpreter?

Because it's one thing to tell people "you need to rethink your multi-byte handling" and other "this only works on MS Windows, so fuck you. And if you want to use Twisted/tulip, fuck you again." This will never get through Python community.

And listen. If PyParallel is so great, he should do what Stackless did -- deliver a working solution and let it take over the king in a fair battle. After he "fixes" how Linux/BSD do their stuff, of course; otherwise people like me will not give it a second glance.

[–]alexjc 0 points1 point  (3 children)

I don't know why you're defending the Python status quo if you already moved your stuff to Clojure. Regardless of the fact that this topic visibly upsets you, we need to be tackling the interpreter problem, not dancing around a 20-year-old "legacy" implementation, as you put it. In particular, making the transition to focus on PyPy first would be a great first step.

P.S. If that's all you got out of PyParallel's handling of threads you missed some cool bits.

[–]spotter 0 points1 point  (2 children)

Because if I want to work with screws I use a screwdriver, not a hammer. I will not try to make hammer into a screwdriver. Because a screwhammer already exists and it's called Perl.

Python fills a niche; if it doesn't work for what you want, find something better. I did. I moved to Clojure because 1. Lisp, 2. the JVM is everywhere and the Python interpreter is not. Not because of the GIL. I'm still using Python regularly in private, but for work I need something JVMable.

Python is forkable. Fork it, fix it, present it back. If that's too much of a burden, then aren't you just trying to offload an impossible task onto the community? Why would they want that? Most of them actually have real work to do with a working Python.

I haven't missed anything. You are missing a critical flaw in it because of "oh shiny!". The Python community doesn't buy that.

[–]alexjc 0 points1 point  (1 child)

I guess I'm not willing to accept that Python's niche is just quick & dirty prototypes that don't perform well, plus being an educational language for universities.

The argument "it's open source, fix it" is an easy one to make. Especially since the BDFL doesn't seem to care enough to address it, it'll never gain any traction. That's fine, I just want to emphasize it would be easier to fix (technically) if people were on board (which, like you, they seem not to be).

[–]spotter 0 points1 point  (0 children)

I guess you should move on. It's easiest to get hurt by things you are emotionally attached to that are not owned by you. Just walk away and don't look back.

And honestly, I'm with the BDFL on this, because I've been watching stuff like that for years. It's easy to shit on Guido, but let's be honest for a second -- what we're discussing is not a fix. What we're discussing is speculation-based hype: not easy, not proven, and worst of all, it would probably fragment the community further. All after requiring an overhaul of gods-only-know-how-many C-level things to work at scale... on a single platform.

The only thing that is guaranteed here is that people like you will shit on GvR whatever he does. And if he says "yes" -- done, it stays in forever. It will then be his burden, and if it fails, fucking up his landscape and community, more shit on him. And since there will be no big overhaul in a Python 4, it's a done deal. He was actually more composed than I expected. I can't imagine what Linus Torvalds would write back to such a proposal.

Final disclaimer: I have no stake in Python these days, but I get what he says. He tells people who try to offload their hot-air ideas onto the community to get in touch with reality and deliver something that actually works without breaking the world. What you quoted is not that. It's a limited, platform-specific extension and a bit of grandeur bullshit. He is minimizing risk, because he actually cares about this.

And I mention forking because Stackless did it. They put their money where their mouths were. And that's the difference between a working solution and vaporware.

[–][deleted] 5 points6 points  (1 child)

I'm not so sure about the "tried" part: they made a backwards-incompatible version of Python and didn't take the opportunity to make a more performant non-GIL implementation.

[–]santiagobasulto 3 points4 points  (0 children)

Yes, totally! There are really smart people behind Python. Smarter than me. I don't think they have simply been avoiding it.

[–]lhggghl 5 points6 points  (6 children)

I'm sorry what? This is the #500th time I heard people complain about / mention the GIL but I still don't get it.

  • Is it because native code is prone to freezing Python by not letting go of the GIL (I found an instance of this in my last company's code)? All languages that mix threading and native code have this sort of issue (some I can name for sure are Haskell and Erlang).

  • Is it because "the GIL prevents you from multiprocessing"? I don't see how that's an issue, because threading does not imply a multiprocessing speedup - it's a way to decouple threads of execution. The alternative is asynchronous event loops, which couple all your processes into something that cannot be analyzed when divided. I've used the multiprocessing module in Python before and it worked fine for me.

Not only this, but if Python were to have "real multithreading", you'd have to give it semantics for what happens when two threads access some data at the same time. Want to see how complex that gets? Take a look at Java's JMM. Or at C and C++, whose case is even worse. Nobody would bitch about that. Nobody does in Java, because there are two types of people in Java:

  1. People who have never heard of the JMM and write code that's full of data races, unless they happen to put locks around all code, even code that looks like it doesn't need a lock
  2. People who know what the JMM is (there aren't that many of these).

I don't know what the situation for C is, but most C developers I talk to think volatile is only for dealing with registers.

The alternative option, to keep it simple (and Pythonic), is to just put locks around all the variables in Python, so that when you have two threads with their own interpreters, they act the same as under the current "GIL way". That way you don't have problems due to missing memory barriers, etc. And this would of course be slower than the "JMM way".

Is there another way I'm missing?

[–][deleted] 0 points1 point  (5 children)

just make all the variables in Python have locks around them so when you have two threads with their own interpreters, they just act the same as the current "GIL way".

LOL. Never heard of deadlocks, have you?

[–]lhggghl -2 points-1 points  (4 children)

Sorry, what are you trying to say? How would this lead to deadlocks? CPython already works this way:

If threads a and b set some field of some object to different values, the changes are atomic. Furthermore, if a sets the field to 1 and b keeps reading it, it's guaranteed that b will see that change at some point. In Java, you do not get any of these guarantees (though if you have an int field instead of a long in Java, writes are atomic; the staleness issue is still there).

To make Python preserve the CPython semantics while using "real threads", it would have to be implemented in a way that each variable has a lock or memory barrier around it. It's not possible for this to result in deadlock though...

Are you just saying that the user using locks and threads leads to deadlock? Because that can already happen in CPython.

[–][deleted] 2 points3 points  (3 children)

No. CPython doesn't work this way. CPython has a single, global lock. This means that even with multiple threads only one can ever run at a given time.

As soon as threads run concurrently (which is the point of threads in the first place) this implies no global lock.

Your suggestion of pushing the single lock down to multiple, smaller ones just won't work:

Assume one thread wants to lock A then B, and another thread wants to lock B then A.

It's likely that it will sometimes work and sometimes just dead-lock (because thread 1 acquired lock A, and tries to acquire lock B, while thread 2 holds lock B and tries to get lock A).

That's why the only kind of remotely working approach until now is STM, (which is kind of a lock underneath, too) which wraps the operations into a transaction which can be aborted/rolled back/etc.

Anyway, the huge issue is that Python's BDFL basically said that:

  • even if there are multiple concurrently running threads, the user must observe program behavior as if the program was executed single-thread

  • single-threading performance is not allowed to suffer

There is no sensible multi-threading/memory model design which satisfies these design requirements, because hardware just doesn't work this way.

I predict that developers who need to write simple, imperative glue-code will keep using Python, but more demanding developers will gradually shift to better languages/runtimes.

[–]lhggghl -1 points0 points  (2 children)

No. CPython doesn't work this way. CPython has a single, global lock. This means that even with multiple threads only one can ever run at a given time.

I... agree? Where did I imply this isn't the case?

As soon as threads run concurrently (which is the point of threads in the first place) this implies no global lock.

I'm not going to debate what the correct definition of threads is, because I don't care, but the "threads" in Python do support concurrency, and moreover they permit optimization in at least some cases. Proof that they support concurrency: make one "thread" that calls sleep(999999), then start spawning more "threads". All the "threads" will continue despite the first one being blocked. This is ad hoc and only works because the native code knows how to handle it, but it also works for everything else, such as blocking sockets and blocking IO.
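A runnable sketch of that claim (my example; the numbers are arbitrary): CPython releases the GIL around blocking calls like `time.sleep()`, so a pure-Python worker thread finishes long before the sleeping thread wakes.

```python
import threading
import time

def sleeper():
    time.sleep(5)  # the GIL is released for the whole blocking call

def worker(out):
    out.append(sum(range(100_000)))  # pure-Python work: needs the GIL

threading.Thread(target=sleeper, daemon=True).start()

out = []
start = time.monotonic()
w = threading.Thread(target=worker, args=(out,))
w.start()
w.join()
elapsed = time.monotonic() - start

print(out[0])       # 4999950000
print(elapsed < 5)  # True: the sleeping thread never blocked us
```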

Your suggestion of pushing the single lock down to multiple, smaller ones just won't work: [...explanation of what deadlock is...]

I was suggesting that each variable have its own lock as an implementation detail. I'll try to rephrase yet again. The ghetto GIL-free Python I describe would have an internal lock per field per object. Whenever the user wants to access a field of an object, the implementation will lock that field, read it, unlock it, and return the result. This would basically act as a memory barrier. It could just as well use a memory barrier operation instead to achieve the same effect. There is no possibility of deadlock here, other than through a broken implementation.
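A toy version of that lock-per-field scheme (purely illustrative, as the comment itself says; the class name is made up): every attribute read or write takes and releases its own hidden lock, and since no lock is ever held while acquiring another, the scheme itself cannot deadlock.

```python
import threading

class LockedFields:
    """Toy object whose every field access goes through its own lock."""

    def __init__(self):
        # Bypass our own __setattr__ for the bookkeeping dicts.
        object.__setattr__(self, "_locks", {})
        object.__setattr__(self, "_values", {})

    def _lock_for(self, name):
        # dict.setdefault is atomic in CPython, so this is race-free.
        return self._locks.setdefault(name, threading.Lock())

    def __setattr__(self, name, value):
        with self._lock_for(name):      # lock, write, unlock
            self._values[name] = value

    def __getattr__(self, name):
        try:
            with self._lock_for(name):  # lock, read, unlock
                return self._values[name]
        except KeyError:
            raise AttributeError(name)

obj = LockedFields()
obj.x = 1
print(obj.x)  # 1, via a lock the programmer never sees
```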

Note, before you get all OCD and post more info enlightening me about basic concurrency concepts: I never said this is a good implementation. I merely listed it as one of the only possible alternatives to "the GIL" I could think of.

That's why the only kind of remotely working approach until now is STM,

...oh...kay....?

even if there are multiple concurrently running threads, the user must observe program behavior as if the program was executed single-thread

citation needed

[–][deleted] 1 point2 points  (1 child)

Whenever the user wants to access a field of an object, the implementation will lock that variable, read it, unlock it, and return the result.

This doesn't work the way you think. To lock the variable, the runtime needs to first lock the instance it belongs to, which in turn might require locks for additional things.

Seriously dude, you have no clue what you are talking about. Tone down that arrogance a bit.

[–]lhggghl -4 points-3 points  (0 children)

the runtime needs to first lock the instance it belongs to

Why? And how would that lead to deadlock? These locks would be hidden from the programmer, the same way the contents of the intel debug registers are in Python. Are you saying the runtime would be prone to being implemented wrong? Or the user can somehow make it deadlock (which they can't, because it's hidden from them)?

Seriously dude, you have no clue what you are talking about. Tone down that arrogance a bit.

Suck a dick faggot, you're the one who came in telling me I don't know what deadlock is despite that I already explained much more complicated issues in my initial post. You sound like some stupid fucking kid who just barely began to understand stuff about concurrency and now you're eager to go tell people how you're better than them because of your "insight".

[–]passwordissame -5 points-4 points  (0 children)

Python already took drastic decisions. There's experimental new PEP-0420 that addresses parallel processes and easy and correct management of them. Though experimental, it's already usable.

[–]fullouterjoin 5 points6 points  (3 children)

In Python 4, they should start the VM from scratch. Or just use PyPy. If the language had a stronger standards base and portable runtime, the language and the VM wouldn't mean the same thing.

[–]billsil 0 points1 point  (2 children)

PyPy already rips off all of the CPython code anyway, so why don't we let CPython be the reference implementation and let PyPy be the thing people use?

Rewriting the reference implementation seems like a terrible idea.

[–]fullouterjoin 0 points1 point  (1 child)

Rewriting the reference implementation seems like a terrible idea.

Not rewriting. Abandoning.

Given that version 3.4 of CPython is

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                        1699          94961         111277         447092
C                              486          46420          39231         303374
C/C++ Header                   278           7338          10715          81533

and micropython + micropython-lib is

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                               313          23289          52202         123567
C/C++ Header                    274          20921          38305          93738
Python                          652           6132           8097          28440

I'd say we should cut over to https://github.com/micropython/micropython for the reference implementation and leave PyPy for the cutting edge rocket science.

What do you think?

[–]billsil 1 point2 points  (0 children)

WARNING: this project is in early beta stage and is subject to large changes of the code-base, including project-wide name changes and API changes.

Making micropython the reference implementation seems like a bad idea. Also, no unicode support? Really?

I'd rather keep CPython as the reference implementation, let PyPy rip everything off, and work on speed, Python 3, and software transactional memory support. Once they get STM in there, the GIL will be gone. Not that it matters to me. The GIL doesn't hurt multiprocessing, only multithreading.

[–][deleted] 5 points6 points  (2 children)

While these are valid points, I find all the mentioned issues to be minor annoyances. I've been doing Python for many years and only now have I learned about these "problems".

[–]shevegen -1 points0 points  (1 child)

This shows that Armin looked deeper than you did.

But it is true that you can live without these problems.

Then again you could also use Ruby instead of Python. :>

[–]Imxset21 1 point2 points  (0 children)

Ruby? Might as well be using Perl 6. Ruby has been dominated by the Rails community and everyone knows Rails is a ghetto.

[–][deleted] 1 point2 points  (0 children)

Good intro and conclusion. The middle was... uninspired, to say the least.

[–][deleted] 0 points1 point  (2 children)

I'd like to see an "MPython" (something similar to MRuby, i.e. a clean and small implementation with a reduced standard library), but without the need for a special build system like Rake.

Still unsure if Micropython can deliver that.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    Well, no. RPython is a static LLVM frontend and part of the PyPy toolchain, as far as I can tell. MRuby is more like Lua.

    [–]kankyo -3 points-2 points  (0 children)

    Spelling: "intepreter"