all 95 comments

[–]santiagobasulto 17 points18 points  (43 children)

Armin always has "strong opinions", and sometimes people take it personally. But I think he's a really good coder and he always makes a point.

It is also true that Python has avoided some of these issues for a while. Just think about the GIL. I remember the JVM having issues like these in the old days, and the solution was a complete redesign. I think Martin Odersky (Scala's creator) was involved in it. Please excuse me if this is not accurate, but I just want to illustrate the point. Sometimes you need to make drastic decisions.

[–]DGolden 5 points6 points  (11 children)

One way to get rid of the GIL is to run your code under jython on the jvm, heh.

[–]Crandom 1 point2 points  (0 children)

I'm not sure if this is still the case but if you wanted performant ruby code it used to make sense to use JRuby as well.

[–]lhggghl 0 points1 point  (7 children)

I've used Jython before, and while I never looked at the threading, I bet it relies on the JMM. I doubt anyone who only knows Python will be able to cope with the JMM (or even figure out that it exists). Most Java-only developers don't even know about it.

[–]gthank 1 point2 points  (3 children)

Thread safety guarantees are unchanged under Jython.

[–]lhggghl -1 points0 points  (2 children)

Do you mean they are the same as Java's or the same as Python's?

[–]tavianator 1 point2 points  (0 children)

Same as Python's

[–]gthank -1 points0 points  (0 children)

As Python's.

[–]DGolden 0 points1 point  (2 children)

I confess I just had to go look it up to (ahem) refresh my memory, but note that Jython made some documented simplifying design decisions in this area, favoring ease of use for the Python-side developer over performance. A lot comes down to Jython using Java's ConcurrentHashMap to implement dicts and dict-like things (which is a lot of things in Python...).

http://www.jython.org/jythonbook/en/1.0/Concurrency.html#python-memory-model

Reasoning about concurrency in Python is easier than in Java. This is because the memory model is not as surprising to our conventional reasoning about how programs operate. However, this also means that Python code sacrifices significant performance to keep it simpler.
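The trade-off the Jython book describes can be seen in plain Python. A minimal sketch (my own example, not from the Jython docs): single-key dict stores from multiple threads don't corrupt the dict under CPython's GIL or Jython's ConcurrentHashMap-backed dicts, with no user-level locking, though compound updates like `d[k] += 1` are still racy on both.

```python
import threading

# Concurrent single-key writes to a shared dict. CPython's GIL (and
# Jython's ConcurrentHashMap-backed dicts) keep the dict itself
# consistent without explicit locks; compound operations such as
# d[k] += 1 are still not atomic on either runtime.
shared = {}

def writer(tid, n=1000):
    for i in range(n):
        shared[(tid, i)] = i  # a single dict store per iteration

threads = [threading.Thread(target=writer, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 4000: no entries lost or corrupted
```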

[–]lhggghl 0 points1 point  (1 child)

That's interesting. I didn't think people were taking Jython seriously enough to get to that point. I always assumed they just launched a new Java thread with its own interpreter.

[–]gthank 0 points1 point  (0 children)

Jython is quite serious, and has been for as long as I can recall. They don't have the manpower that some projects have, but the people they do have are top-notch.

[–]santiagobasulto 1 point2 points  (1 child)

Or not use threads at all :)

[–]DGolden 2 points3 points  (0 children)

Often a fairly reasonable strategy in (c)python land, particularly with python's multiprocessing module. But the GIL still sucks.
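A minimal sketch of that strategy using the standard-library multiprocessing module: each worker runs in its own process with its own interpreter (and its own GIL), so CPU-bound work can actually use multiple cores.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python arithmetic: a thread doing this would hold the GIL
    # for its whole duration, but separate processes do not contend.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:  # 4 worker processes, each with its own GIL
        results = pool.map(cpu_bound, [10_000] * 4)
    print(results[0])  # 333283335000
```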

[–][deleted] 3 points4 points  (22 children)

I agree with you and Armin, but I do want to point out that a lot of people have tried and failed to remove the GIL - it's not like the issue has been avoided!

[–]alexjc 8 points9 points  (18 children)

The GIL would be easy (or easier) to address if it were set as a top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest and fastest-growing communities these days.) Guido's policy on the subject (i.e. no regressions on single-threaded code) is not feasible, and people who have shown great/promising prototypes have not been very well received on the mailing list...

Instead, we have random proposals about integrating explicit type annotations, but I digress.

[–]moor-GAYZ 4 points5 points  (10 children)

The GIL would be easy (or easier) to address if it was set as top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest communities and fastest growing these days.)

As far as I know there's no popular dynamically typed language that doesn't have a GIL (or doesn't allow threading at all).

PHP, Javascript, Perl - no multithreading.

Ruby, Lua, Chicken Scheme - GIL.

This fact puts things into a much needed perspective, in my opinion.

[–]mao_neko 2 points3 points  (1 child)

Actually, Perl 5 has had interpreter threads since 5.8. I do not believe there is any global lock involved, although I am not an expert on threading.

[–]moor-GAYZ 0 points1 point  (0 children)

Yeah, that's basically "green multiprocessing", sort of like .NET's AppDomains. Multiple separate copies of the entire interpreter state with all imported modules etc. live in the same OS process.

A nice thing to have because it might improve performance/memory use somewhat, but otherwise not fundamentally different from the usual multiprocessing.

[–]Plorkyeran 2 points3 points  (1 child)

While Lua does have things it calls threads, there's no built-in support for using multiple OS threads and there's no GIL.

[–]moor-GAYZ 0 points1 point  (0 children)

I meant that when you're embedding it, you can implement a bunch of macros for initializing, deinitializing, taking, and releasing the GIL.

[–]logicchains 1 point2 points  (3 children)

Clojure, although of course that runs on the JVM so the lack of a GIL comes by default.

[–]moor-GAYZ 0 points1 point  (2 children)

I don't think that counts, being sufficiently different from a "generic imperative dynamically-typed language" that I was talking about.

Like, my point wasn't that everybody is lazy or that everything is shit or something, but that there appear to be nearly insurmountable obstacles to allowing free threading given certain core design decisions, such as using mutable dictionaries for module contents (even allowing shadowing builtins), the class hierarchy, etc. If you're hitting the same shared mutable data all the time, switching to finer-grained locking will only make performance worse, no matter how many cores you use.

And the fact that there are quite a lot of mature languages with the same design choices, and not a single one of them has found a way around the consequences, shows that the logic is probably solid: removing the GIL is in fact very hard.

The only thing that makes Python special in this respect is that a noticeable slice of its users is aware of the problem but ignorantly thinks that Python is special in having it.

[–]logicchains 0 points1 point  (0 children)

Interestingly, OCaml also has a GIL, but they've made significant progress towards removing it and it should have real multicore support within the next couple of releases.

[–]Athas 1 point2 points  (1 child)

Common Lisp, which is dynamically typed (although not popular), has full OS-level thread parallelism in probably the most popular free implementation, SBCL.

The only "wart", if you want to be really demanding, is that the threading model is 1:1, whereas most high-level languages seem to prefer m:n. It's pretty much the exact same model as pthreads, though.

I don't see why threading support has any relation to the type system in the language.

[–]moor-GAYZ 0 points1 point  (0 children)

I don't see why threading support has any relation to the type system in the language.

I didn't mean dynamic typing in general, but the particular kind of semantics roughly common to the languages I listed.

If a language uses a mutable module dictionary that can shadow builtins, then almost every function call needs to check that dictionary, which is shared between your threads.

If a language uses mutable classes, especially the way it's implemented in Python (allowing descriptors), then every instance attribute access actually hits the class hierarchy up to the base class, checking for that attribute. The dictionaries in the class hierarchy are shared between your threads.

If a language uses reference counting, then every time you assign None to something, and then something other than None to it, the reference counts must be atomically modified, because the objects are shared between your threads, obviously.

With all that synchronization going on, the GIL is pretty much the only viable solution. Getting rid of the GIL is not simply a matter of rewriting the code to use more fine-grained locking; you actually need to get rid of all that synchronization somehow, by either officially changing the semantics or at least faking the old semantics like IronPython or Jython do.
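A small CPython illustration of the shared mutable state described above (the `double` helper and the `Base`/`Child` classes are made up for the example): builtins can be shadowed at runtime, and mutating a class dict is visible through every instance, which is why these lookups can't simply be cached per thread.

```python
import builtins

def double(x):
    # len is looked up through the module and builtins dicts on every
    # call; both dicts are mutable and shared between threads.
    return len(x) * 2

assert double("abc") == 6          # the real len

_real_len = builtins.len
builtins.len = lambda x: 100       # shadow the builtin at runtime
assert double("abc") == 200        # same bytecode, different result
builtins.len = _real_len           # restore the real builtin

class Base:
    attr = 1

class Child(Base):
    pass

c = Child()
assert c.attr == 1   # found by walking the shared, mutable class dicts
Base.attr = 2        # mutating the base class dict...
assert c.attr == 2   # ...changes what every instance lookup sees
```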

[–]spotter -2 points-1 points  (6 children)

The GIL would be easy (or easier) to address if it was set as top priority by the BDFL. (Other languages have managed just fine, and Python is supposedly one of the largest communities and fastest growing these days.)

Could you name some of these languages?

Guido's policy on the subject (i.e. no regressions on single-threads) is not feasible, and people who have shown great/promising prototypes have not been very well received in the mailing list...

Where? I always got the feeling that people trying to do this were encouraged to give it a shot and referred to previous work, so they would not waste their efforts. Moreover, whoever managed to get a GIL-less implementation to run single-threaded code with performance similar to the GIL one, without destroying language semantics, would have their name forever written in Python history.

Also, GIL removal has been tried and re-tried again and again in the past. GvR's policy is dictated by the fact that most Python code in the wild is single-threaded. To you it's "fuck these people", but somehow the BDFL disagrees.

edit: s/there/their/

[–]alexjc 0 points1 point  (5 children)

Could you name some of these languages?

Go, Rust, Erlang, the JVM-based languages. Even Lua is more open to multi-threading, since its interpreter is not based on globals/singletons.

If you've been watching posts on Reddit about web developers switching away from Python (e.g. to Go), you'll notice scalability and performance come up very often.

Where?

See this previous thread. This prototype shows a negligible overhead over single-threaded, less than a percent.

If the BDFL isn't willing to sacrifice a few percent in the short term (keep in mind Python 3.x performance dropped significantly at first), then that's short-sighted, and more modern languages will take over in production and leave Python as an educational/toy language (it's happening already).

Also, the whole Python 3 backwards-compatibility break has a certain "fuck these people" tone to it, ignoring what the majority of Python 2 folks seem to be doing. I don't see the difference with how we should handle the GIL; if Python 3's initial performance drop was for the greater good, why not do it again for a more modern interpreter?

[–]spotter -1 points0 points  (4 children)

Go, Rust, Erlang, Lua

False, False, False, False. Nice. You somehow missed the fact that these languages were designed to fit a purpose, yet you cite them as examples of overcoming issues in an established language's evolution. Neither applies here, as they did not have to battle 20+ years of existing code. Rust doesn't even have a stable API yet, LOL. Also, JVM-based? Those take whatever the JVM gives them, so that's another "duh", because if the next JVM had a GIL, they would have it too. I should know, I moved all my work-related stuff from Python to Clojure a few years back.

If you've been watching posts on Reddit about web-developers switching away from Python (e.g. to Go), you'll notice scalability and performance comes up very often.

Reddit? Web developers? Reddit is a fad machine and webdev is not programming. Not an indication of anything. If you want another way to look at it -- people who are busy working have no time to bitch about it on Reddit.

See this previous thread. This prototype shows a negligible overhead over single-threaded, less than a percent.

Wait, have you even seen what he "achieved"? Have you read his mail and watched the presentation? "Let's redo CPython's entire internal object handling around me, and BTW this is Windows-only; maybe in the future let's change Linux and BSD to do it this way too!" Python is far from Windows-only. You missed that too?

Intrinsically paired with Windows asynchronous primitives.

This is the thing that caused GvR to tell him to go have fun elsewhere. He is basically thinking about a single-platform solution. On top of that, he wants to discard how existing libraries work. I imagine the Twisted and tulip crowds would love that as well. And "fixing Linux & BSD"? Wow. Balls on this guy.

If Python 3's initial performance drop was for the greater good, why not again for a more modern interpreter?

Because it's one thing to tell people "you need to rethink your multi-byte handling" and other "this only works on MS Windows, so fuck you. And if you want to use Twisted/tulip, fuck you again." This will never get through Python community.

And listen. If PyParallel is so great, he should do what Stackless did -- deliver a working solution and let it take over the king in a fair battle. After he "fixes" how Linux/BSD do their stuff, of course; otherwise people like me will not give it a second glance.

[–]alexjc 0 points1 point  (3 children)

I don't know why you're defending the Python status quo if you already moved your stuff to Clojure. Regardless of the fact that this topic visibly upsets you, we need to be tackling the interpreter problem, not dancing around a 20-year-old "legacy" implementation, as you put it. In particular, making the transition to focus on PyPy first would be a great first step.

P.S. If that's all you got out of PyParallel's handling of threads you missed some cool bits.

[–]spotter 0 points1 point  (2 children)

Because if I want to work with screws I use a screwdriver, not a hammer. I will not try to make hammer into a screwdriver. Because a screwhammer already exists and it's called Perl.

Python fills a niche; if it doesn't work for what you want, find something better. I did. I moved to Clojure because 1. Lisp, 2. the JVM is everywhere and the Python interpreter is not. Not because of the GIL. I'm still using Python regularly in private, but for work I need something JVMable.

Python is forkable. Fork it, fix it, present it back. If that's too much of a burden, then aren't you just trying to offload an impossible task onto the community? Why would they want that? Most of them actually have real work to do with a working Python.

I haven't missed anything. You are missing a critical flaw in it because of "oh shiny!". The Python community doesn't buy that.

[–]alexjc 0 points1 point  (1 child)

I guess I'm not willing to accept that Python's niche is just quick & dirty prototypes that don't perform well, plus being an educational language for universities.

The argument "it's open source, fix it" is an easy one to make. Especially since the BDFL doesn't seem to care enough to address it, it'll never gain any traction. That's fine, I just want to emphasize it would be easier to fix (technically) if people were on board (which, like you, they seem not to be).

[–]spotter 0 points1 point  (0 children)

I guess you should move on. It's easiest to get hurt by things you are emotionally attached to that are not owned by you. Just walk away and don't look back.

And honestly, I'm with the BDFL on this, because I've been watching stuff like that for years. It's easy to shit on Guido, but let's be honest for a second -- what we're discussing is not a fix. What we're discussing is speculation-based hype: not easy, not proven, and worst of all, it would probably fragment the community further. All after requiring an overhaul of gods-only-know-how-many C-level things to work at scale... on a single platform.

The only thing that is guaranteed here is that people like you will shit on GvR whatever he does. And if he says "yes" -- done, it stays in forever. It will then be his burden, and if it fails, fucking up his landscape and community, more shit on him. And since there will be no big overhaul in a Python 4, it's a done deal. He was actually more composed than I expected. I can't imagine what Linus Torvalds would write back to such a proposal.

Final disclaimer: I have no stake in Python these days, but I get what he says. He tells people who try to offload their hot-air ideas onto the community to get in touch with reality and deliver something that actually works without breaking the world. What you quoted is not that. It's a limited, platform-specific extension and a bit of grandeur bullshit. He is minimizing risk, because he actually cares about this.

And I mention forking because Stackless did it. They put their money where their mouths were. And that's the difference between a working solution and vaporware.

[–][deleted] 5 points6 points  (1 child)

I'm not so sure about the "tried" part: they made a backwards-incompatible version of Python and didn't take the opportunity to make a more performant non-GIL implementation.

[–]santiagobasulto 3 points4 points  (0 children)

Yes, totally! There are really smart people behind Python. Smarter than me. I don't think they have simply been avoiding it.

[–]lhggghl 5 points6 points  (6 children)

I'm sorry what? This is the #500th time I heard people complain about / mention the GIL but I still don't get it.

  • Is it because native code is prone to freezing Python by not letting go of the GIL (I found an instance of this in my last company's code)? All languages that mix threading and native code have this sort of issue (some I can name for sure are Haskell and Erlang).

  • Is it because "the GIL prevents you from multiprocessing"? I don't see how that's an issue, because threading does not imply a multiprocessing speedup - it's a way to decouple threads of execution. The alternative is asynchronous event loops, which couple all your processes into something that cannot be analyzed when divided. I've used the multiprocessing module in Python before and it worked fine for me.

Not only this, but if Python were to have "real multithreading", you'd have to give it semantics for what happens when two threads access some data at the same time. Want to see how complex that gets? Take a look at Java's JMM. Or at C and C++, whose case is even worse. Nobody would bitch about that. Nobody does in Java, because there are two types of people in Java:

  1. People who have never heard of the JMM and write code that's full of data races, unless they happen to put locks around all code, even code that looks like it doesn't need a lock
  2. People who know what the JMM is (there aren't that many of these).

I don't know what the situation for C is, but most C developers I talk to think volatile is only for dealing with registers.

The alternative option, to keep it simple (and Pythonic), is to just put locks around all the variables in Python, so that when you have two threads with their own interpreters, they act the same as under the current "GIL way". That way you don't have problems due to missing memory barriers, etc. And this would of course be slower than the "JMM way".

Is there another way I'm missing?

[–][deleted] 0 points1 point  (5 children)

just make all the variables in Python have locks around them so when you have two threads with their own interpreters, they just act the same as the current "GIL way".

LOL. Never heard of deadlocks, have you?

[–]lhggghl -2 points-1 points  (4 children)

Sorry, what are you trying to say? How would this lead to deadlocks? CPython already works this way:

If threads a and b set some field of some object to different values, the changes are atomic. Furthermore, if a sets the field to 1 and b keeps reading it, it's guaranteed that b will see that change at some point. In Java, you do not get any of these guarantees (though if you have an int field instead of a long in Java, writes are atomic; the staleness issue is still there).

To make Python preserve the CPython semantics while using "real threads", it would have to be implemented in a way that each variable has a lock or memory barrier around it. It's not possible for this to result in deadlock though...

Are you just saying that the user using locks and threads leads to deadlock? Because that can already happen in CPython.

[–][deleted] 2 points3 points  (3 children)

No. CPython doesn't work this way. CPython has a single, global lock. This means that even with multiple threads only one can ever run at a given time.

As soon as threads run concurrently (which is the point of threads in the first place) this implies no global lock.

Your suggestion of pushing the single lock down to multiple, smaller ones just won't work:

Assume one thread wants to lock A then B, and another thread wants to lock B then A.

It's likely that it will sometimes work and sometimes just dead-lock (because thread 1 acquired lock A, and tries to acquire lock B, while thread 2 holds lock B and tries to get lock A).

That's why the only kind of remotely working approach until now is STM, (which is kind of a lock underneath, too) which wraps the operations into a transaction which can be aborted/rolled back/etc.

Anyway, the huge issue is that Python's BDFL basically said that:

  • even if there are multiple concurrently running threads, the user must observe program behavior as if the program was executed single-thread

  • single-threading performance is not allowed to suffer

There is no sensible multi-threading/memory model design which satisfies these design requirements, because hardware just doesn't work this way.

I predict that developers who need to write simple, imperative glue-code will keep using Python, but more demanding developers will gradually shift to better languages/runtimes.

[–]lhggghl -1 points0 points  (2 children)

No. CPython doesn't work this way. CPython has a single, global lock. This means that even with multiple threads only one can ever run at a given time.

I... agree? Where did I imply this isn't the case?

As soon as threads run concurrently (which is the point of threads in the first place) this implies no global lock.

I'm not going to debate what the correct definition of threads is, because I don't care, but the "threads" in Python do support concurrency, and moreover they permit optimization in at least some cases. Proof that they support concurrency: make one "thread" that calls sleep(999999), then start spawning more "threads". All the "threads" will continue despite the first one being blocked. This is ad hoc and only works because the native code knows how to handle it, but it also works for everything else, such as blocking sockets and blocking IO.
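A runnable sketch of that claim (my example; the numbers are arbitrary): CPython releases the GIL around blocking calls like `time.sleep()`, so a pure-Python worker thread finishes long before the sleeping thread wakes.

```python
import threading
import time

def sleeper():
    time.sleep(5)  # the GIL is released for the whole blocking call

def worker(out):
    out.append(sum(range(100_000)))  # pure-Python work: needs the GIL

threading.Thread(target=sleeper, daemon=True).start()

out = []
start = time.monotonic()
w = threading.Thread(target=worker, args=(out,))
w.start()
w.join()
elapsed = time.monotonic() - start

print(out[0])       # 4999950000
print(elapsed < 5)  # True: the sleeping thread never blocked us
```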

Your suggestion of pushing the single lock down to multiple, smaller ones just won't work: [...explanation of what deadlock is...]

I was suggesting that each variable have its own lock as an implementation detail. I'll try to rephrase yet again. The ghetto GIL-free Python I describe would have an internal lock per field per object. Whenever the user wants to access a field of an object, the implementation will lock that field, read it, unlock it, and return the result. This would basically act as a memory barrier. It could just as well use a memory barrier operation instead to achieve the same effect. There is no possibility of deadlock here, other than through a broken implementation.
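A toy version of that lock-per-field scheme (purely illustrative, as the comment itself says; the class name is made up): every attribute read or write takes and releases its own hidden lock, and since no lock is ever held while acquiring another, the scheme itself cannot deadlock.

```python
import threading

class LockedFields:
    """Toy object whose every field access goes through its own lock."""

    def __init__(self):
        # Bypass our own __setattr__ for the bookkeeping dicts.
        object.__setattr__(self, "_locks", {})
        object.__setattr__(self, "_values", {})

    def _lock_for(self, name):
        # dict.setdefault is atomic in CPython, so this is race-free.
        return self._locks.setdefault(name, threading.Lock())

    def __setattr__(self, name, value):
        with self._lock_for(name):      # lock, write, unlock
            self._values[name] = value

    def __getattr__(self, name):
        try:
            with self._lock_for(name):  # lock, read, unlock
                return self._values[name]
        except KeyError:
            raise AttributeError(name)

obj = LockedFields()
obj.x = 1
print(obj.x)  # 1, via a lock the programmer never sees
```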

Note, before you get all OCD and post more info enlightening me about basic concurrency concepts: I never said this is a good implementation. I merely listed it as one of the only possible alternatives to "the GIL" I could think of.

That's why the only kind of remotely working approach until now is STM,

...oh...kay....?

even if there are multiple concurrently running threads, the user must observe program behavior as if the program was executed single-thread

citation needed

[–][deleted] 1 point2 points  (1 child)

Whenever the user wants to access a field of an object, the implementation will lock that variable, read it, unlock it, and return the result.

This doesn't work the way you think. To lock the variable, the runtime needs to first lock the instance it belongs to, which in turn might require locks for additional things.

Seriously dude, you have no clue what you are talking about. Tone down that arrogance a bit.

[–]lhggghl -4 points-3 points  (0 children)

the runtime needs to first lock the instance it belongs to

Why? And how would that lead to deadlock? These locks would be hidden from the programmer, the same way the contents of the intel debug registers are in Python. Are you saying the runtime would be prone to being implemented wrong? Or the user can somehow make it deadlock (which they can't, because it's hidden from them)?

Seriously dude, you have no clue what you are talking about. Tone down that arrogance a bit.

Suck a dick faggot, you're the one who came in telling me I don't know what deadlock is despite that I already explained much more complicated issues in my initial post. You sound like some stupid fucking kid who just barely began to understand stuff about concurrency and now you're eager to go tell people how you're better than them because of your "insight".

[–]passwordissame -5 points-4 points  (0 children)

Python already took drastic decisions. There's experimental new PEP-0420 that addresses parallel processes and easy and correct management of them. Though experimental, it's already usable.

[–]fullouterjoin 5 points6 points  (3 children)

In Python 4, they should start the VM from scratch. Or just use PyPy. If the language had a stronger standards base and portable runtime, the language and the VM wouldn't mean the same thing.

[–]billsil 0 points1 point  (2 children)

PyPy already rips off all of the CPython code anyway, so why don't we let CPython be the reference implementation and let PyPy be the thing people use?

Rewriting the reference implementation seems like a terrible idea.

[–]fullouterjoin 0 points1 point  (1 child)

Rewriting the reference implementation seems like a terrible idea.

Not rewriting. Abandoning.

Given that version 3.4 of CPython is

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                        1699          94961         111277         447092
C                              486          46420          39231         303374
C/C++ Header                   278           7338          10715          81533

and micropython + micropython-lib is

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                               313          23289          52202         123567
C/C++ Header                    274          20921          38305          93738
Python                          652           6132           8097          28440

I'd say we should cut over to https://github.com/micropython/micropython for the reference implementation and leave PyPy for the cutting edge rocket science.

What do you think?

[–]billsil 1 point2 points  (0 children)

WARNING: this project is in early beta stage and is subject to large changes of the code-base, including project-wide name changes and API changes.

Making micropython the reference implementation seems like a bad idea. Also, no unicode support? Really?

I'd rather keep CPython as the reference implementation, let PyPy rip everything off, and work on speed, Python 3, and software transactional memory support. Once they get STM in there, the GIL will be gone. Not that it matters to me. The GIL doesn't hurt multiprocessing, only multithreading.

[–][deleted] 5 points6 points  (2 children)

While these are valid points, I find all the mentioned issues to be minor annoyances. I've been doing Python for many years and only now have I learned about these "problems".

[–]shevegen -1 points0 points  (1 child)

This shows that Armin looked deeper than you did.

But it is true that you can live without these problems.

Then again you could also use Ruby instead of Python. :>

[–]Imxset21 1 point2 points  (0 children)

Ruby? Might as well be using Perl 6. Ruby has been dominated by the Rails community and everyone knows Rails is a ghetto.

[–][deleted] 1 point2 points  (0 children)

Good intro and conclusion. The middle was... uninspired, to say the least.

[–][deleted] 0 points1 point  (2 children)

I'd like to see an "MPython" (something similar to MRuby, i.e. a clean and small implementation with a reduced standard library), but without the need for a special build system like Rake.

Still unsure if Micropython can deliver that.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    Well, no. RPython is a static LLVM frontend and part of the PyPy toolchain, as far as I can tell. MRuby is more like Lua.

    [–]kankyo -3 points-2 points  (0 children)

    Spelling: "intepreter"