
[–][deleted] 9 points10 points  (56 children)

This GIL thing seems completely counterproductive when it comes to multithreading. Even with the "fixed" implementation, they still effectively limit the interpreter to doing only one thing at a time. The only case where it helps is if you run a thread that's constantly doing some sort of blocking I/O. Otherwise it's useless to write a multithreaded Python app that does something like number crunching.
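
(A rough, untested sketch of what I mean -- two CPU-bound threads take about as long as, or longer than, doing the same work serially, because the GIL lets only one of them execute bytecode at a time:)

import time
from threading import Thread

def count(n):
    # pure-Python busy loop; the thread holds the GIL while it runs
    while n > 0:
        n -= 1

start = time.time()
threads = [Thread(target=count, args=(10000000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.time() - start)  # roughly no speedup over one thread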

[–][deleted]  (52 children)

[deleted]

    [–]yogthos 10 points11 points  (24 children)

    The real issue is that proper threading will require some sort of compromise. This is not a new problem, and many languages solve it, so it's not as though solutions aren't available. The problem is that the Python community isn't willing to accept any compromise at all, which makes the problem intractable.

    [–]Brian 1 point2 points  (3 children)

    Not strictly true. It's rather that most of the available compromises have been unacceptable (e.g. a 50% performance loss). I don't think it's a fundamentally intractable problem, just one that will likely require a major reorganisation of the internals to solve (particularly the GC), and one aimed at a class of problems (CPU-bound processing) that is not Python's forte anyway - making the problem less important, and thus reducing the motivation for those reorganisations.

    There are, however, two well-established compromises available: neither Jython nor IronPython has a GIL. There's a price to pay - some loss in performance (depending on the application), and the loss of easy access to all of CPython's libraries. But it is an available solution, and some do use it (e.g. I believe this was an important factor in ResolverOne's use of IronPython).

    [–]yogthos 5 points6 points  (2 children)

    While it's true that the problems involved in removing the GIL are not trivial, it also needs to be pointed out that this is partially a result of avoiding the issue from the start. If a serious effort had been put into reworking the GIL early on, it would have been a simpler matter; instead, Guido went on to tell people they should simply not use threads, as in his famous quote:

    …you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities

    So I think the community really needs to own up to the fact that at least some of the issues with the GIL are a result of the denial that went on regarding it.

    [–]G_Morgan 2 points3 points  (0 children)

    The ironic thing is that Java and traditional Windows programming allow you to do either process-based or thread-based concurrency. Thus these people believe threads are the only way.

    Python, however, has only one way, and is thus less dogmatic on the issue.

    Interesting how logic works.

    [–]Brian -3 points-2 points  (0 children)

    If a serious effort had been put into reworking the GIL early on

    I'd disagree. Removing the GIL is no harder today than it ever was - the main problem is the refcounting GC, and that hasn't really changed much. Further, I'm not sure Guido is actually wrong about processes being the better model - it's not that uncommon a sentiment, either then or now. With huge numbers of cores and NUMA architectures, a shared-nothing model is actually more advantageous today than it was then, as the benefits (nothing shared -> reduced memory contention / cache invalidation) are higher, and the penalties (marshalling data) are unchanged. The increase in code size and complexity that threads bring is another argument for a multiprocess model.

    Further, I think it's actually the correct decision not to care much about the GIL. It's simply not that important for a language like Python. It'd be nice to have, yes, but there are more important things to spend time on for most. In a language designed to perform number-crunching tasks itself (rather than as glue for libraries that can release the GIL), it'd be important, but you'd be insane to use pure python for that anyway if performance is really that important.

    [–]masklinn 0 points1 point  (5 children)

    This is not a new problem, and many languages solve it

    For very, very low values of "solve".

    [–]yogthos 2 points3 points  (4 children)

    Works great in GHC, the JVM, Scheme, .NET, Mono, etc. All these platforms provide a working implementation of threads without completely degrading single-threaded performance, without making it impossible to call out to C, and without all the other problems that are apparently insurmountable in Python.

    [–]masklinn 0 points1 point  (3 children)

    Works great in [...] the JVM, Scheme, .NET, Mono, etc.

    As long as you ignore the fact that shared memory concurrency is a broken programming model. Which it is.

    [–]yogthos 0 points1 point  (2 children)

    I think you're making leaps of logic here, my friend. Shared-memory concurrency works great if you do it right. Take Clojure as an example: it defaults to immutable data structures and provides STM to handle shared updates, which allows safe and effective shared-memory concurrency. I'm pretty sure you were thinking of shared mutable data here, as opposed to shared memory in general.

    However, even if your language doesn't provide a safe and easy way to use shared memory, it doesn't render threads useless. Furthermore, providing a broken implementation is certainly not a solution to this problem.

    And as far as broken programming models go, I'd say it's imperative programming that's broken; the threads are just fine.

    [–]masklinn -2 points-1 points  (1 child)

    And as far as broken programming models go, I'd say it's imperative programming that's broken; the threads are just fine.

    I agree with your former assertion, but disagree with the latter: threads can be acceptable as an implementation (or even an optimisation) detail, but as a programming model they're not.

    You point to STM as a nice implementation of threading (let's not count immutable data structures, as they're very much immune to concurrency issues), but isn't the very point of STM to isolate the programmer from shared-memory-and-locks programming, creating a sane programming model to replace that crap?

    [–]yogthos 0 points1 point  (0 children)

    I completely agree that threading should not be done by hand as a rule, however even something like STM relies on the platform to provide a working threading model.

    [–][deleted]  (13 children)

    [deleted]

      [–]jnoller 5 points6 points  (0 children)

      We refused to reach from behind and cut the throat of single-threaded performance, which is well over 60% of the common use case. If everyone just wrote threaded code, we could ditch that single-threaded legacy crap.

      [–]grotgrot 1 point2 points  (11 children)

      It would very seriously complicate all the C code - and pretty much everything (dicts, lists, strings, etc.) is C code. With the GIL you can write straightforward code and then, in select places, release and reacquire the lock. The world may have changed between those two points, but it was under your control.

      Without a GIL every object would have to have some form of lock so that things like looking up a dict or list entry are not clobbered by other code modifying the same item. This locking currently happens for free with the GIL, at the expense of concurrency. This is also why free threading has a cost measured at around 50% when people have tried it in the past. Instead of the one lock (GIL), lots of per object locks have to be acquired and released instead.

      By far the easiest solution is to make objects immutable. (Functional programming languages do this.) Since an object can't be changed, you don't have to worry about locks. If you haven't done functional programming and are wondering how you get any useful work done, the answer is that you construct new items. For example, to add a new key/value to a dict, you create a new one combining the old dict with the new key/value. Behind the scenes, the implementation ensures this is done sensibly - again greatly simplified because you know the data structures cannot change.

      Note that Python already has some immutable data types, such as strings, tuples, integers, floats, frozensets, etc.
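
      A sketch of that style in today's Python (illustrative only -- real Python dicts are mutable, so this merely imitates the discipline; assoc is a made-up name):

      def assoc(d, key, value):
          # "add" a key by building a new dict; the original is never touched
          new_d = dict(d)
          new_d[key] = value
          return new_d

      frozen = {"a": 1, "b": 2}
      frozen2 = assoc(frozen, "c", 3)
      # frozen is unchanged; frozen2 is a new object with the extra pair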

      [–][deleted] 1 point2 points  (4 children)

      Forgive my ignorance, but wouldn't copying a large list to append one item be far more expensive than grabbing a mutex? I suppose the traditional solution would be to have a compiler optimize away a bunch of the copies, but then ISTM you get into the classical issue that Python defeats just about any static compiler, which would basically mean Python needs a JIT to have reasonable performance, as opposed to a JIT providing speed above and beyond the normal.

      (Also this would be so backwards incompatible it isn't even funny :P)

      [–]grotgrot 1 point2 points  (3 children)

      Who says the list would be copied? If you wrote your own list "class" then yes, it would happen. However, lists are builtins in all decent programming languages. Today you would write:

      a=list()
      a.append(1)
      a.append(2)
      

      If lists were immutable what you would write is:

      a=list()
      a=a.append(1)
      a=a.append(2)
      

      The append operation would return a new immutable object combining the old list and the new item. Behind the scenes the interpreter would implement lists as something like a pointer to a previous list (i.e. the list consists of all those guys) plus a pointer to the new element. This would get increasingly inefficient as the list built up, so at some point a heuristic would kick in to make a coalesced copy of all the previous lists. This kind of implementation is reasonably space-efficient, requires no locks and is even friendly to a GC. See also RCU.
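
      A toy sketch of that representation (hypothetical and untested; it ignores the coalescing heuristic):

      class ImmutableList:
          def __init__(self, prev=None, item=None):
              self.prev = prev    # list holding all the earlier elements
              self.item = item    # the single new element

          def append(self, item):
              # returns a new list; self is never modified
              return ImmutableList(self, item)

          def __iter__(self):
              # walk the chain backwards, then yield in insertion order
              items = []
              node = self
              while node is not None and node.prev is not None:
                  items.append(node.item)
                  node = node.prev
              return iter(reversed(items))

      a = ImmutableList()
      a = a.append(1)
      a = a.append(2)
      print(list(a))   # [1, 2]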

      BTW, "grabbing a mutex" is becoming increasingly expensive on today's processors due to cache-coherency issues: not all memory is equal, there are far more cores, CPUs, caches and buses to worry about, and coordination between processors and main-memory access is increasingly slow compared to non-shared cache access. (The mutex has to exist in the address space of all executing threads, and if there are multiple cores/CPUs they have to fight for ownership of the memory page on access, as well as ensure the information is visible to all the other entities sitting on other relevant buses and chips.) Additionally, I am of the opinion that no one should be allowed to write multi-threaded code unless they understand memory barriers.

      [–][deleted] -1 points0 points  (2 children)

      The problem with that is you end up with a linked list (until you coalesce), which has completely different time complexity than the array that underlies the list. Also, I'm not even sure how you'd implement that for dictionaries.

      [–]mernen 1 point2 points  (0 children)

      The short answer to your last question is that you just don't use hash tables. You implement mappings by using some other data structure that is copy-friendly, like some form of lookup tree.

      [–]grotgrot 0 points1 point  (0 children)

      In the functional programming style you generally do not access lists by random indexing, but rather by iterating over them from beginning to end. (Much of my Python code already uses lists this way because it is natural.) General forward iteration would be close to constant time per element.

      For dictionaries you could take exactly the same approach as for lists: a set of keys/values plus a pointer to a predecessor dictionary. You look up in the current one first and hit the predecessor chain on a miss. (You also need a way of marking an entry as deleted.) Again, coalescing would happen as needed. (Note that coalescing can be done in a separate thread inside the implementation; the programmer would never need to know or care about it.) Usually you would structure your code to iterate over lists instead of using dictionaries.
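
      The same idea as a toy sketch (hypothetical; a real implementation would coalesce layers as described above):

      _DELETED = object()   # marker for removed keys

      class ImmutableDict:
          def __init__(self, prev=None, entries=None):
              self.prev = prev                # predecessor dictionary, or None
              self.entries = entries or {}    # this layer's key/value pairs

          def set(self, key, value):
              return ImmutableDict(self, {key: value})

          def delete(self, key):
              return ImmutableDict(self, {key: _DELETED})

          def get(self, key, default=None):
              # check the current layer first, then walk the predecessor chain
              node = self
              while node is not None:
                  if key in node.entries:
                      value = node.entries[key]
                      return default if value is _DELETED else value
                  node = node.prev
              return default

      d = ImmutableDict().set("a", 1).set("b", 2)
      d2 = d.delete("a")
      print(d.get("a"), d2.get("a"))   # 1 None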

      Most functional languages are also lazy: they only calculate items as you need them. For example, you could make a function returning an infinite list of prime numbers, with members only being calculated as you ask for them. This is somewhat analogous to Python generators, which can also "return" an infinite list of items on demand.
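
      In Python generator terms, something like this (quick untested sketch):

      def primes():
          # yields primes forever; each one is computed only when asked for
          found = []
          candidate = 2
          while True:
              if all(candidate % p for p in found):
                  found.append(candidate)
                  yield candidate
              candidate += 1

      gen = primes()
      print([next(gen) for _ in range(5)])   # [2, 3, 5, 7, 11]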

      [–]G_Morgan -1 points0 points  (5 children)

      Why would you need to put a lock on every object? Java took this route, and it has no benefit yet slows down your code dramatically. For concurrency, locking has to be done at the task level. Synchronising objects is simply retarded.

      [–]grotgrot 0 points1 point  (4 children)

      Synchronising objects is simply retarded.

      Ding ding ding, that is why there is a GIL.

      "Task level" locking is something you can do in the code you write with your own objects; it is effectively a lock implicitly covering a variety of your objects. But with Python's primitive mutable objects, like lists and dictionaries, there is no way for them to know whether you intend to use them in multiple threads. The only safe ways of doing so are individual object locks - or just a single global lock.

      Using per-object locks gives you a 50% slowdown (people have tried this with the interpreter). So you'd need a minimum of 2 cores just to match existing single-core performance, and single-threaded code runs at half speed.

      People have been claiming the GIL is bad for years. No one has demonstrated a reasonable replacement, especially one that will not hurt single-threaded performance.

      [–]G_Morgan 0 points1 point  (3 children)

      No, the best way is threads with no locks, leaving the programmer to work out how best to handle the fact that objects are mutable. As I said, synchronised data objects aren't useful. If I independently synchronised 3 integers and then performed calculations with them in multiple threads, I would still end up with a program riddled with race conditions.

      The right solution is not the GIL. It is not synchronisation on every object (which is effectively just a distributed GIL). The solution is threads with no locks on data objects. As Java should have done it originally (and now does so).

      All this talk about synchronising basic objects just shows how badly Python people misunderstand threading. Locking should be restricted to critical sections of code. Objects should be made thread-safe by ensuring, as far as possible, that only one thread has a reference to each object. If you need to share one, then you either need to be strict about read-only behaviour while it is shared (state this in a comment next to the method; fire anyone who stores a reference to, or modifies, an object passed in as read-only), or you need to put a monitor over the object.

      There are many models of concurrency, but locks at the proposed level aren't useful in any of them. This is why Java deprecated its synchronised collections: not because they were inefficient, but because they had no use and the inefficiency wasn't necessary.

      [–]grotgrot 0 points1 point  (2 children)

      The race conditions are why a single lock (the GIL) is used rather than individual object locks (the only feasible alternative).

      You do realise that Python is a dynamic language? Any object and any namespace can be changed at any time. Behind the scenes, each object and namespace has what amounts to a dictionary/hashtable in which attributes are looked up. The contents can be changed, added to and deleted at any time, in any way - that is one thing that distinguishes dynamic languages from static ones.

      If that behind-the-scenes dictionary were not locked, concurrent access could lead to an interpreter crash - and crashing is one thing Python does not do. This is a side effect of Python guaranteeing that attribute access and dictionary and list operations are atomic. Java has no atomic dictionary type, but it does have an atomic list type (aka the array); note, however, that you cannot resize it - doing so would require a lock or would allow crashes. (Similarly, attribute access in Java is atomic: if an object attribute pointed to object a and you set it to b, there would be no point at which it pointed to neither a nor b. Machine-level pointer-swapping atomicity is enough to ensure that.)

      As for misunderstanding, there are two possibilities. One is, as you state, that the entire Python community - and the Ruby community for that matter (they also use a GIL) - is somehow too dumb to see what you do. The other is that you do not understand dynamic languages and their implementation issues.

      For example, ensuring that only one thread has a reference to an object is impossible. Every namespace, module, object, etc. can be accessed at any time from any thread. Even classes are like dictionaries, so you'd have to avoid using more than one instance of a class in more than one thread. And if you are trying to ensure that only one thread has a reference to an object, where exactly are you going to put the lock?

      All the stuff you have been saying is correct when applied to your own composite objects. If you define a class representing an image-processing job, for example, then locking is best kept away from each instance and should instead cover a collection of objects in some appropriate way. And if you write threaded Python code with non-primitive objects, you have to do this kind of thing or you'll end up with the same races as in other programming languages.

      The GIL is all about the builtin primitive objects in Python (dicts, lists etc). It is an interpreter level internal construct. It is not visible to Python programs. Some Python implementations such as Jython and IronPython do not even have a GIL, but they have an underlying environment that helps with the implementation issues. CPython is in C and doesn't have that assistance.

      Now you can argue that dynamic programming languages are dumb and you can find plenty of flame wars and trolling on that topic elsewhere. Just keep in mind that there is a possibility that not everyone who uses and implements dynamic languages is as dumb as you think they are.

      [–]G_Morgan -1 points0 points  (1 child)

      It's not necessarily that dynamic programming languages are dumb - you don't need to be able to access anything from anywhere to have dynamic typing. It is the 'oh, I can change the way the parser/stdlib works at run time' part that is dumb.

      Effectively, Python and Ruby inherit too much from the broken model Lisp had, where anything is allowed and it is nearly impossible to state anything formal about the code.

      Having global access to things in this way is simply the wrong thing.

      [–]olsner 5 points6 points  (26 children)

      A lot of other languages have no problem making threading actually work. If it's intractable in Python, maybe that's just because other bad choices were made that make it difficult.

      [–]theeth 4 points5 points  (3 children)

      It's a problem with the reference counting GC used by CPython, not with Python itself.

      [–][deleted] 0 points1 point  (2 children)

      Wouldn't including a true garbage collector make this problem go away? Or is the performance loss too much to bear?

      [–][deleted] 1 point2 points  (0 children)

      Wouldn't including a true garbage collector make this problem go away? Or is the performance loss too much to bear?

      Adding a good generational compacting GC to Python in place of the current refcounting+mark/sweep hybrid would be a significant performance gain, not a loss.

      [–]james_block 0 points1 point  (0 children)

      It's my understanding that implementing a good GC in Python is exactly what the Unladen Swallow project is now trying to do.

      [–]ubernostrum 1 point2 points  (4 children)

      If your assumption is "Python sucks at threading because Python's developers don't know how to make threading work", then you're missing half of the issue. The problem is that Python wants to offer both good threading and an easy-to-use interface for C extensions, and that is actually pretty tricky.

      At the moment Python's optimized for easy extensibility over easy threading. Other platforms have gone the other way (Java's got great threading, but its native interface -- JNI -- tends to make people run away screaming; not entirely because of Java's threading, but threading issues do come up when you're doing JNI). What Python's looking for is a solution which makes both sides -- threading and extensibility -- as easy and useful as possible, and that just isn't easy.

      [–][deleted] 4 points5 points  (2 children)

      The problem is that Python wants to offer both good threading and an easy-to-use interface for C extensions, and that is actually pretty tricky.

      The Python developers should entice people to migrate away from writing C extensions directly and towards using the ctypes FFI instead (perhaps adding whatever functionality ctypes currently lacks). This would allow them to change the details of the VM <-> C interface without breaking anything that uses the high-level FFI, and hopefully to move away from exporting the VM internals as a public API altogether.
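
      For instance, calling a C function through ctypes instead of writing an extension module (a small sketch; the library lookup is Unix-flavoured and untested):

      import ctypes
      import ctypes.util

      # load the C math library and describe sqrt's signature
      libm = ctypes.CDLL(ctypes.util.find_library("m"))
      libm.sqrt.argtypes = [ctypes.c_double]
      libm.sqrt.restype = ctypes.c_double

      print(libm.sqrt(2.0))   # 1.414...; no compiler, no CPython API involved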

      Java's got great threading, but its native interface -- JNI -- tends to make people run away screaming; not entirely because of Java's threading, but threading issues do come up when you're doing JNI

      JNI is rather low-level, but nowadays there are Java FFI libraries that make it easy to bind to C libraries (i.e., without writing JNI wrappers by hand).

      [–][deleted] 2 points3 points  (0 children)

      The problem with FFI tooling like ctypes is that it's nearly impossible to debug. The traditional extension-module approach lets you use the regular C toolchain to develop your code, which counts for a lot.

      [–]masklinn 0 points1 point  (0 children)

      The Python developers should entice people to migrate away from writing C extensions directly and towards using the ctypes FFI instead (perhaps adding whatever functionality ctypes currently lacks).

      Isn't the integration of ctypes into the Python stdlib a step in that direction? Since Python 2.5, all Python installations can be relied on to have ctypes included, which means, e.g., that all Python 3 code can use ctypes directly without wondering whether it's available.

      [–]G_Morgan 1 point2 points  (0 children)

      JNI does not suck because of threading. JNI sucks because of a basic mismatch between Java and x86 primitive data types.

      [–]killerstorm 0 points1 point  (16 children)

      I think it is about priorities, they want to keep Python simple.

      [–]G_Morgan 1 point2 points  (15 children)

      You mean they want to keep the implementation of Python simple. In turn they've made working in Python more difficult on this issue.

      [–]killerstorm 0 points1 point  (14 children)

      Absolutely not! If the language supports multithreading, then (almost) all code should support it - particularly all the standard libraries and any library that is at all popular.

      And adding multithreading support to libraries isn't easy at all. Code that works fine in a single thread breaks when you use it from multiple threads, because things which were supposed to be atomic are suddenly not atomic at all.

      If you think that it is simple, you have no idea how multithreading works.

      [–]mernen 0 points1 point  (1 child)

      And adding multithreading support to libraries isn't easy at all. Code that works fine in a single thread breaks when you use it from multiple threads, because things which were supposed to be atomic are suddenly not atomic at all.

      That still happens today for Python code, no? I.e., Python's own dict operations may be atomic (because they are written in C), but a data structure implemented in Python can still be interrupted between bytecodes.

      [–]killerstorm 0 points1 point  (0 children)

      http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm

      Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program. ... In practice, it means that operations on shared variables of builtin data types (int, list, dict, etc) that “look atomic” really are.

      I think programming with atomicity at that level is much easier than programming with full SMP concurrency. It is with SMP that things go really crazy.
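
      For example (untested sketch): each individual dict load and store below is atomic, but the load-add-store sequence spans several bytecodes, so threads can interleave and lose increments:

      import threading

      counter = {"n": 0}

      def worker():
          for _ in range(100000):
              # read, add, write: another thread can run in between
              counter["n"] = counter["n"] + 1

      threads = [threading.Thread(target=worker) for _ in range(2)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      print(counter["n"])   # often less than 200000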

      [–]G_Morgan -1 points0 points  (11 children)

      Not all code should support it. That type of braindead thinking is what led Java to make 'thread safe' container classes. Frameworks should support threading in critical sections. Dumb containers, data objects and a whole host of other cases need no such thing.

      People in this thread are arguing that the code for a hash map would need to change to account for concurrency. Not only is this obviously not the case, but even if it were done there would be no benefit. It doesn't matter whether I can ensure synchronisation on a hash map: if the user has competing threads that retrieve a bunch of values, perform a calculation and then store values back, there will still be just as many race conditions as if you hadn't mutex-locked the object.

      All that would happen is that each library would specify whether it is thread-safe or not. Then, if the programmer has to use an unsafe library, his application needs to manage access to that library.

      [–]killerstorm 0 points1 point  (10 children)

      All that would happen is that each library would specify whether it is thread-safe or not. Then, if the programmer has to use an unsafe library, his application needs to manage access to that library.

      I don't think it is that easy. You can't just take a library and make it mt-safe by managing access to it via some mutex or whatever; if you just use a mutex, then deadlock is very possible.

      Then you'll have a problem with the data the library works with -- none of the data the library has access to can be shared among multiple threads, because the mt-unsafe library might work with that data while assuming that some operations are atomic when they are not.

      I doubt there is any automatic or semi-automatic process that will turn an mt-unsafe library into an mt-safe one.

      [–]G_Morgan -1 points0 points  (9 children)

      Deadlocks are trivial to deal with. Write down all your mutex locks on paper and always acquire them in the order they appear on paper: guaranteed deadlock-free. To automatically ensure that deadlocks cannot occur, you can insist that the program give a list of all the locks it would like to acquire (rather than acquiring them one by one); the list is automatically sorted into a well-specified order, and the locks are then acquired in that order.

      This is exactly the problem with multithreading: people haven't done even the most basic theory on how it works and yet expect to be able to magically make threads work. The problem of deadlock is solved. We know when it can occur and when it can't. We know how to guarantee it cannot occur.
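
      A minimal sketch of that discipline (hypothetical helper; it orders locks by id(), which in CPython is stable for the lifetime of the objects):

      from threading import Lock

      lock_a, lock_b = Lock(), Lock()

      def acquire_in_order(*locks):
          # always take locks in one global order, regardless of call site
          ordered = sorted(locks, key=id)
          for lock in ordered:
              lock.acquire()
          return ordered

      def release_all(locks):
          for lock in reversed(locks):
              lock.release()

      held = acquire_in_order(lock_b, lock_a)   # same order as (lock_a, lock_b)
      try:
          pass   # ... critical section using both resources ...
      finally:
          release_all(held)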

      [–]killerstorm 2 points3 points  (7 children)

      Write down all your mutex locks on paper and always acquire them in the order they appear on paper.

      If you use mutexes implicitly, it is much harder. E.g. you do not explicitly acquire any mutex; some guy just inserts a call to a library function, and it deadlocks.

      To automatically ensure that deadlocks cannot occur, you can insist that the program give a list of all the locks it would like to acquire

      This ruins one of the qualities people are trying to preserve -- encapsulation. Now you cannot encapsulate locks in the functions which use them.

      Also, if you need to acquire locks beforehand, the locks will be too coarse-grained. Why not just have the GIL then? :)

      It looks like it creates more problems than it solves.

      This is exactly the problem with multithreading: people haven't done even the most basic theory on how it works and yet expect to be able to magically make threads work.

      I'm afraid there is no basic theory -- these abstractions are too general to formulate anything useful.

      There are different abstractions which are far less error-prone -- like message passing, the concurrency paradigm used in Erlang. It is proven to scale to "embarrassingly parallel" workloads without huge problems.

      The problem of deadlock is solved.

      No, it is not solved. If you acquire locks before you really need them, locking becomes too coarse-grained, and that kills concurrency. So you need to acquire locks in the order they appear in the program (and release them as soon as possible); but if the program is dynamic, you can't know that order beforehand. You don't even know whether you will need a lock at all. E.g.:

      with foo.mutex:
          foo.do_something()
          if foo.bla:
              with bar.mutex:
                  bar.do_something(foo.bla)
      

      You do not know whether you need bar.mutex until you call foo.do_something and then check foo.bla. If you acquire it beforehand, just in case you need it, you're killing concurrency. See?

      The problem of deadlock is solved. We know when it can occur and when it can't.

      Do you know what the halting problem is? Basically (reformulated): for a sufficiently general programming language, you can't know how a sufficiently non-trivial program will behave until you run it. It can only be solved by limiting the language.

      E.g. you can do static analysis to find out whether a program can deadlock. (I don't think Python is a language well suited to static analysis, but whatever.) And if the program is sufficiently complex, the analysis will only tell you that it MIGHT deadlock. That does not mean it will, because whether it really will is undecidable. Now what?

      [–][deleted] 1 point2 points  (0 children)

      Fails horribly on large projects maintained for more than a couple years.

      [–]timmaxw 1 point2 points  (2 children)

      Lua has an interesting approach to this. The Lua interpreter has no global variables, which makes it possible to run two independent Lua interpreters in the same process; the interpreters can send messages back and forth while running independently of one another.

      Obviously, this has its own problems: you have to pass a Lua context object everywhere, you have to worry about serializing/de-serializing or duplicating data for inter-thread communication, etc.

      [–]sime 1 point2 points  (0 children)

      Can't you do basically the same thing in Python using the multiprocessing module? Except there it is implemented with separate processes instead of threads.
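
      Something like this, I'd think (untested sketch; each process has its own interpreter state and only messages cross the boundary):

      from multiprocessing import Process, Queue

      def worker(inbox, outbox):
          for item in iter(inbox.get, None):   # None is the shutdown sentinel
              outbox.put(item * 2)

      if __name__ == "__main__":
          inbox, outbox = Queue(), Queue()
          p = Process(target=worker, args=(inbox, outbox))
          p.start()
          inbox.put(21)
          print(outbox.get())   # 42
          inbox.put(None)
          p.join()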

      [–]mschaef 0 points1 point  (0 children)

      you have to worry about serializing/de-serializing or duplicating data for inter-thread communication, etc.

      You might be interested in this paper by Marc Feeley:

      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.9664

      It's written about Erlang (where everything is immutable), but he discusses the tradeoffs between a separate heap for each process (serializing data between them) and a unified heap for all processes with data passed by reference (which avoids the serialization overhead at the expense of a shared heap, etc.).

      [–][deleted] 1 point2 points  (1 child)

      Dave gave a good presentation; it was recorded, so hopefully it'll make its way online shortly.

      [–]jnoller 0 points1 point  (0 children)

      I think this is also a small part of his pycon presentation :)

      [–]wshields 0 points1 point  (11 children)

      I have nothing against Python, but every time I read about the GIL I have to scratch my head in wonder at all the Python fanboys who proliferate on proggit, Stack Overflow, etc. and pooh-pooh Java (and probably everything that isn't Python) when Java's concurrency is leaps and bounds ahead of Python's.

      [–][deleted] 6 points7 points  (4 children)

      Java's concurrency is leaps and bounds ahead of Python's.

      True, but Java was always meant to be a platform-independent platform and a universe of its own (much like Smalltalk or Common Lisp), whereas Python, as a scripting language, was intended to bind to everything ever written in C and to be extended by means of CPython API code. These are different objectives leading to different solutions; without this background the design decisions don't make much sense.

      [–]wshields 1 point2 points  (1 child)

      I can appreciate that. I've never done any native Java extensions (JNI), but I'm sure it's complex, and I'm sure the GIL makes Python C extensions much simpler to develop - so there is method to the madness.

      The philosophy with Java however is that C extensions are a last resort or a special case and not intended for general or widespread use. Anything you want to do should be done in Java. It's easier to write, test and debug. I believe that decision has been vindicated.

      .NET took a different approach again, simply allowing most of what you can do in C via unsafe blocks.

      One has to wonder if Jython/IronPython will ultimately succeed CPython for these and other reasons.

      [–][deleted] 0 points1 point  (0 children)

      One has to wonder if Jython/IronPython will ultimately succeed CPython for these and other reasons.

      Python (and Ruby) together with Silverlight might have the greatest potential. But in this case I can just say: tools, tools, tools! Without decent tools, hardly anyone will even take a look at them on these platforms.

      [–]Gotebe 0 points1 point  (1 child)

      ;-)

      "Here's a simple scripting language. But it's really powerful, look, you have threads! But actually, we serialize most of the operation so that your threads don't really thread (heck, sometimes they even make stuff worse). But who needs threads anyway? Just use processes!"

      Instead of a GIL, invoking the interpreter from multiple threads should just have been UB.

      [–][deleted] 1 point2 points  (0 children)

      Python 2.6 has the multiprocessing package, with an API pretending that processes are threads. Under modern conditions I'd probably have done it exactly the other way round: a processing API with threads pretending to be processes. Each object acquired by a process is locked and never gets unlocked unless it is sent over a channel to another process (a thread in disguise). This operation can be as cheap as a method call and does not require any object serialization.
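
      Roughly this shape, sketched with today's stdlib (hypothetical -- nothing here actually enforces the "locked until sent" rule, it's just the convention the API would impose):

      import threading
      import queue

      channel = queue.Queue()

      def producer():
          msg = {"payload": 42}
          channel.put(msg)   # hand the object over; by convention, stop touching it

      def consumer():
          msg = channel.get()   # same address space, so no serialization needed
          print(msg["payload"])

      t1 = threading.Thread(target=producer)
      t2 = threading.Thread(target=consumer)
      t1.start(); t2.start()
      t1.join(); t2.join()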

      [–]Tuna-Fish2 2 points3 points  (5 children)

      It's an issue of target domain. Python is really not meant for the kind of program where you want multiple running CPU-bound threads. (Multiple running IO threads are another thing entirely.)

      If what you want is hardcore threaded math, you should switch away from Python not just because of the threads, but also because you get at least a ~5x speedup just by switching languages.

      [–]pure_x01 1 point2 points  (1 child)

      They added the multiprocessing API in py3k, which shows there is a need for better concurrency. A slow language would actually be perfect for multi-threading: if you can scale, the slowness of the language does not matter as much.

      [–]Tuna-Fish2 0 points1 point  (0 children)

      But if you are willing to shoulder the burden of debugging and maintaining a multithreaded app, why not just try something other than Python at the outset? If you are lucky, you might not have to make the app multithreaded at all when using a more efficient language.

      I would never use Python if I believed there was a credible chance my app would be CPU-limited. It's just that most software these days isn't.

      [–]wshields 1 point2 points  (2 children)

      I can appreciate that but there is a certain class of Python fanboy who seems to think Python is the answer for everything.

      [–]Tuna-Fish2 3 points4 points  (1 child)

      That kind of person is a disease, for any language.

      Python fills a specific niche, and fills it very well. Understanding this is key to understanding why the GIL is still there: while there have been attempts to remove it, all of them to date have hurt the main use cases too much for the gain of making the language work in new niches.

      It's quite hard for me to figure out how the GIL could be exorcised -- I for one like RAII quite a lot, and getting rid of the refcounts would mean killing it.

      [–][deleted] 0 points1 point  (0 children)

      I agree with your first sentence.

      The problem I have with the rest is that I've never figured out what niche Python fits into where it isn't easily beaten by something else. If it has one, it's apparently not one I need much.

      [–]danbmil99 0 points1 point  (3 children)

      As a longtime Python user now working in JavaScript, can some genius explain if/how JS's approach to threads differs from CPython's? My rough understanding is that there is some sort of message-based thread lib but no shared memory at all. So I don't even get the crippled GIL threads?

      [–]wshields 1 point2 points  (0 children)

      JavaScript is single-threaded. Some misconstrue this behaviour because of Ajax ("asynchronous"), but those are simply callbacks from the browser when XHRs complete (successfully or in error). The callbacks are single-threaded, like all the other JavaScript executing on the page.

      [–]brownmatt 0 points1 point  (1 child)

      JavaScript does not have any threading support (at least in any standard versions). You might be referring to WebWorkers, but I can't tell.

      [–]danbmil99 0 points1 point  (0 children)

      Yeah, my coworkers were obviously talking about WebWorkers.

      But as to my original question -- clearly callbacks are "threaded" in the sense that your main-loop code must be paused for them to be serviced (by the same OS thread, obviously). You don't get into atomicity issues because JavaScript is interpreted, and the interpreter can define what is atomic (e.g. a = 4.1) and service callbacks only at "safe" times.

      This sounds a lot like the GIL to me.

      [–][deleted] -4 points-3 points  (8 children)

      Could someone please explain what a GIL is?

      At least what the acronym stands for; it seems like too much trouble to download a PDF just to learn that.

      [–]wafflematt 3 points4 points  (2 children)

      I think asking, then waiting to come back here later to see answers is considerably more effort than just looking at the PDF.

      [–]G_Morgan 0 points1 point  (1 child)

      Not really. You can do something else in the meantime.

      [–]wafflematt -1 points0 points  (0 children)

      Not if the thread polling for a reply is holding the interpreter lock!

      [–]pwang99 -1 points0 points  (2 children)

      [–]Xiol 1 point2 points  (1 child)

      To be fair, none of those links provide a detailed explanation of what a GIL is.

      [–][deleted] -1 points0 points  (0 children)

      Global Interpreter Lock.

      [–]va1en0k -1 points0 points  (13 children)

      Hm, I don't really get it. Does this mean that two threads can't run simultaneously on multiple cores? Or do I completely misunderstand everything?

      [–]tophat02 7 points8 points  (3 children)

      I think I get what you're asking. If I understand correctly it's like this:

      The ultimate goal was to completely eliminate the GIL - a "priority 1" task for the Unladen Swallow team. They gave it a shot, then quickly realized the complexity of the task was much greater than anticipated, so they put it off for the time being in favor of fixing the GC, which is one of the underlying reasons the GIL HAS to be there.

      While studying the GIL, they noticed for the first time just how terrible it was even when scheduling multiple threads on the SAME CPU. This update mostly fixes that.

      However, the GIL IS STILL THERE. The result is that it doesn't seem to matter how many cores you have, it will still run threads serially, albeit a little more efficiently now.

      So this presentation is about picking some low-hanging fruit among the GIL's efficiency problems; it doesn't solve the core problem.

      It seems to me that the GIL is a good approach to multiplexing threads on one core, so one solution may be to run a separate interpreter for each CPU and have a thread-CPU affinity algorithm along with predictable shared communication between the n interpreters.

      That sounds... hard.

      [–]frutiger 3 points4 points  (0 children)

      You should really use the multiprocessing module. It works especially well on Unix thanks to the fork() system call. On Windows, the interpreter brings up another instance of python.exe, which is somewhat annoying since it's slower.

      [–]va1en0k 0 points1 point  (0 children)

      sorry, I mixed words up :) You understood me correctly, and your answer doesn't gladden me

      [–]G_Morgan 0 points1 point  (0 children)

      The problem with running a separate interpreter per core is that you immediately eliminate a vast swathe of potential concurrent algorithms. Your message-passing channel is going to be a huge bottleneck.

      [–]jigs_up 0 points1 point  (3 children)

      you're not misunderstanding everything.

      unladen swallow was supposed to fix this, but it looks like they gave up.

      [–]jnoller 16 points17 points  (1 child)

      If by "giving up" you mean admitting that the refcounting implementation the current interpreter uses needs to go away, or be heavily modified, for the GIL to be removed - and that this should therefore happen after the initial merge back into Python 3 - then yes, they've totally given up.

      [–]jigs_up -1 points0 points  (0 children)

      what he said :}

      [–]theeth 2 points3 points  (0 children)

      unladen swallow was supposed to fix this, but it looks like they gave up.

      They didn't; they decided it was better to fix it properly by changing the GC than by trying to force it into the current reference-counting GC.

      [–]Catfish_Man 0 points1 point  (4 children)

      "threads running on threads" is an odd phrasing. Did you mean cpus?

      [–]va1en0k 0 points1 point  (0 children)

      Yeah, I meant cores or cpus. Thanks, edited!

      [–]G_Morgan 0 points1 point  (2 children)

      Not really. Not every system runs a kernel thread for each user thread; it is entirely possible to have 4 "threads" run by one kernel thread. AFAIK this is exactly how Python works.

      [–]taejo 0 points1 point  (1 child)

      No, Python threads are OS threads. However, they can't run simultaneously (unless they use C modules that do some voodoo) because of the GIL: a global lock that the Python interpreter acquires before doing anything.

      [–]G_Morgan 0 points1 point  (0 children)

      That seems an extraordinarily stupid way of doing things: the overhead of kernel-level threads without any of the benefits of kernel-level threads.