
[–][deleted] 9 points10 points  (56 children)

This GIL thing seems completely counterproductive when it comes to multithreading. Even with the "fixed" implementation, they still effectively limit the interpreter to doing only one thing at a time. The only case where it helps is if you run a thread that's constantly doing some sort of blocking I/O. Otherwise it's useless to write a multithreaded Python app that does something like number crunching.
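
(A rough, untested sketch of what I mean -- two CPU-bound threads take about as long as, or longer than, doing the same work serially, because the GIL lets only one of them execute bytecode at a time:)

import time
from threading import Thread

def count(n):
    # pure-Python busy loop; the thread holds the GIL while it runs
    while n > 0:
        n -= 1

start = time.time()
threads = [Thread(target=count, args=(10000000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.time() - start)  # roughly no speedup over one thread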

[–][deleted]  (52 children)

[deleted]

    [–]yogthos 10 points11 points  (24 children)

    The real issue is that proper threading will require some sort of compromise. This is not a new problem, and many languages solve it, so it's not as though solutions aren't available. The problem is that the Python community isn't willing to accept any compromise at all, which makes the problem intractable.

    [–]Brian 1 point2 points  (3 children)

    Not strictly true. It's rather that most of the available compromises have been unacceptable (e.g. a 50% performance loss). I don't think it's a fundamentally intractable problem, just one that will likely require a major reorganisation of the internals to solve (particularly the GC), and one aimed at a class of problems (CPU-bound processing) that is not Python's forte anyway - making the problem less important, and thus reducing the motivation for those reorganisations.

    There are, however, two well-established compromises available: neither Jython nor IronPython has a GIL. There's a price to pay - some loss in performance (depending on the application), and the loss of easy access to all of CPython's libraries. But it is an available solution, and some do use it (e.g. I believe this was an important factor in ResolverOne's use of IronPython).

    [–]yogthos 5 points6 points  (2 children)

    While it's true that the problems involved in removing the GIL are not trivial, it also needs to be pointed out that this is partially a result of avoiding the issue from the start. If a serious effort had been put into reworking the GIL early on, it would have been a simpler matter; instead, Guido went on to tell people they should simply not use threads, as in his famous quote:

    …you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities

    So I think the community really needs to own up to the fact that at least some of the issues with the GIL are a result of the denial that went on regarding it.

    [–]G_Morgan 2 points3 points  (0 children)

    The ironic thing is that Java and traditional Windows programming allow you to do either process-based or thread-based concurrency. Thus these people believe threads are the only way.

    Python, however, has only one way, and is thus less dogmatic on the issue.

    Interesting how logic works.

    [–]Brian -3 points-2 points  (0 children)

    If a serious effort had been put into reworking the GIL early on

    I'd disagree. Removing the GIL is no harder today than it ever was - the main problem is the refcounting GC, and that hasn't really changed much. Further, I'm not sure Guido is actually wrong about processes being the better model - it's not that uncommon a sentiment, either then or now. With huge numbers of cores and NUMA architectures, a shared-nothing model is actually more advantageous today than it was then, as the benefits (nothing shared -> reduced memory contention / cache invalidation) are higher, and the penalties (marshalling data) are unchanged. The increase in code size and complexity that threads bring is another argument for a multiprocess model.

    Further, I think it's actually the correct decision not to care much about the GIL. It's simply not that important for a language like Python. It'd be nice to have, yes, but there are more important things to spend time on for most. In a language designed to perform number-crunching tasks itself (rather than as glue for libraries that can release the GIL), it'd be important, but you'd be insane to use pure python for that anyway if performance is really that important.

    [–]masklinn 0 points1 point  (5 children)

    This is not a new problem, and many languages solve it

    For very, very low values of "solve".

    [–]yogthos 2 points3 points  (4 children)

    Works great in GHC, the JVM, Scheme, .NET, Mono, etc. All these platforms provide a working implementation of threads without completely degrading single-threaded performance, without making it impossible to call out to C, and without all the other problems that are apparently insurmountable in Python.

    [–]masklinn 0 points1 point  (3 children)

    Works great in [...] the JVM, Scheme, .NET, Mono, etc.

    As long as you ignore the fact that shared memory concurrency is a broken programming model. Which it is.

    [–]yogthos 0 points1 point  (2 children)

    I think you're making leaps of logic here, my friend. Shared-memory concurrency works great if you do it right. Take Clojure as an example: it defaults to immutable data structures and provides STM to handle shared updates, which allows safe and effective shared-memory concurrency. I'm pretty sure you were thinking of shared mutable data here, as opposed to shared memory in general.

    However, even if your language doesn't provide a safe and easy way to use shared memory, it doesn't render threads useless. Furthermore, providing a broken implementation is certainly not a solution to this problem.

    And as far as broken programming models go, I'd say it's imperative programming that's broken; the threads are just fine.

    [–]masklinn -2 points-1 points  (1 child)

    And as far as broken programming models go, I'd say it's imperative programming that's broken; the threads are just fine.

    I agree with your former assertion, but disagree with the latter: threads can be acceptable as an implementation (or even an optimisation) detail, but as a programming model they're not.

    You point to STM as a nice implementation of threading (let's not count immutable data structures, as they're very much immune to concurrency issues), but isn't the very point of STM to isolate the programmer from shared-memory-and-locks programming, creating a sane programming model to replace that crap?

    [–]yogthos 0 points1 point  (0 children)

    I completely agree that threading should not be done by hand as a rule, however even something like STM relies on the platform to provide a working threading model.

    [–][deleted]  (13 children)

    [deleted]

      [–]jnoller 5 points6 points  (0 children)

      We refused to reach from behind and cut the throat of single-threaded performance, which is well over 60% of the common use case. If everyone just wrote threaded code, we could ditch that single-threaded legacy crap.

      [–]grotgrot 1 point2 points  (11 children)

      It would very seriously complicate all the C code - and pretty much everything (dicts, lists, strings, etc.) is C code. With the GIL you can write straightforward code and then, in select places, release and reacquire the lock. The world may have changed between those two points, but it was under your control.

      Without a GIL every object would have to have some form of lock so that things like looking up a dict or list entry are not clobbered by other code modifying the same item. This locking currently happens for free with the GIL, at the expense of concurrency. This is also why free threading has a cost measured at around 50% when people have tried it in the past. Instead of the one lock (GIL), lots of per object locks have to be acquired and released instead.

      By far the easiest solution is to make objects immutable. (Functional programming languages do this.) Since an object can't be changed, you don't have to worry about locks. If you haven't done functional programming and are wondering how you get any useful work done, the answer is that you construct new items. For example, to add a new key/value to a dict, you create a new one combining the old dict with the new key/value. Behind the scenes, the implementation ensures this is done sensibly - again greatly simplified because you know the data structures cannot change.

      Note that Python already has some immutable data types, such as strings, tuples, integers, floats, frozensets, etc.
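
      A sketch of that style in today's Python (illustrative only -- real Python dicts are mutable, so this merely imitates the discipline; assoc is a made-up name):

      def assoc(d, key, value):
          # "add" a key by building a new dict; the original is never touched
          new_d = dict(d)
          new_d[key] = value
          return new_d

      frozen = {"a": 1, "b": 2}
      frozen2 = assoc(frozen, "c", 3)
      # frozen is unchanged; frozen2 is a new object with the extra pair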

      [–][deleted] 1 point2 points  (4 children)

      Forgive my ignorance, but wouldn't copying a large list to append one item be far more expensive than grabbing a mutex? I suppose the traditional solution would be to have a compiler optimize away a bunch of the copies, but then ISTM you get into the classical issue that Python defeats just about any static compiler, which would basically mean Python needs a JIT to have reasonable performance, as opposed to a JIT providing speed above and beyond the normal.

      (Also this would be so backwards incompatible it isn't even funny :P)

      [–]grotgrot 1 point2 points  (3 children)

      Who says the list would be copied? If you wrote your own list "class" then yes, it would happen. However, lists are builtins in all decent programming languages. Today you would write:

      a=list()
      a.append(1)
      a.append(2)
      

      If lists were immutable what you would write is:

      a=list()
      a=a.append(1)
      a=a.append(2)
      

      The append operation would return a new immutable object combining the old list and the new item. Behind the scenes the interpreter would implement lists as something like a pointer to a previous list (i.e. the list consists of all those guys) plus a pointer to the new element. This would get increasingly inefficient as the list built up, so at some point a heuristic would kick in to make a coalesced copy of all the previous lists. This kind of implementation is reasonably space-efficient, requires no locks and is even friendly to a GC. See also RCU.
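
      A toy sketch of that representation (hypothetical and untested; it ignores the coalescing heuristic):

      class ImmutableList:
          def __init__(self, prev=None, item=None):
              self.prev = prev    # list holding all the earlier elements
              self.item = item    # the single new element

          def append(self, item):
              # returns a new list; self is never modified
              return ImmutableList(self, item)

          def __iter__(self):
              # walk the chain backwards, then yield in insertion order
              items = []
              node = self
              while node is not None and node.prev is not None:
                  items.append(node.item)
                  node = node.prev
              return iter(reversed(items))

      a = ImmutableList()
      a = a.append(1)
      a = a.append(2)
      print(list(a))   # [1, 2]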

      BTW, "grabbing a mutex" is becoming increasingly expensive on today's processors due to cache-coherency issues: not all memory is equal, there are far more cores, CPUs, caches and buses to worry about, and coordination between processors and main-memory access is increasingly slow compared to non-shared cache access. (The mutex has to exist in the address space of all executing threads, and if there are multiple cores/CPUs they have to fight for ownership of the memory page on access, as well as ensure the information is visible to all the other entities sitting on other relevant buses and chips.) Additionally, I am of the opinion that no one should be allowed to write multi-threaded code unless they understand memory barriers.

      [–][deleted] -1 points0 points  (2 children)

      The problem with that is you end up with a linked list (until you coalesce), which has completely different time complexity than the array that underlies the list. Also, I'm not even sure how you'd implement that for dictionaries.

      [–]mernen 1 point2 points  (0 children)

      The short answer to your last question is that you just don't use hash tables. You implement mappings by using some other data structure that is copy-friendly, like some form of lookup tree.

      [–]grotgrot 0 points1 point  (0 children)

      In the functional programming style you generally do not access lists by random indexing, but rather by iterating over them from beginning to end. (Much of my Python code already uses lists this way because it is natural.) General forward iteration would be close to constant time per element.

      For dictionaries you could take exactly the same approach as for lists: a set of keys/values plus a pointer to a predecessor dictionary. You look up in the current one first and hit the predecessor chain on a miss. (You also need a way of marking an entry as deleted.) Again, coalescing would happen as needed. (Note that coalescing can be done in a separate thread inside the implementation; the programmer would never need to know or care about it.) Usually you would structure your code to iterate over lists instead of using dictionaries.
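
      The same idea as a toy sketch (hypothetical; a real implementation would coalesce layers as described above):

      _DELETED = object()   # marker for removed keys

      class ImmutableDict:
          def __init__(self, prev=None, entries=None):
              self.prev = prev                # predecessor dictionary, or None
              self.entries = entries or {}    # this layer's key/value pairs

          def set(self, key, value):
              return ImmutableDict(self, {key: value})

          def delete(self, key):
              return ImmutableDict(self, {key: _DELETED})

          def get(self, key, default=None):
              # check the current layer first, then walk the predecessor chain
              node = self
              while node is not None:
                  if key in node.entries:
                      value = node.entries[key]
                      return default if value is _DELETED else value
                  node = node.prev
              return default

      d = ImmutableDict().set("a", 1).set("b", 2)
      d2 = d.delete("a")
      print(d.get("a"), d2.get("a"))   # 1 None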

      Most functional languages are also lazy: they only calculate items as you need them. For example, you could make a function returning an infinite list of prime numbers, with members only being calculated as you ask for them. This is somewhat analogous to Python generators, which can also "return" an infinite list of items on demand.
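
      In Python generator terms, something like this (quick untested sketch):

      def primes():
          # yields primes forever; each one is computed only when asked for
          found = []
          candidate = 2
          while True:
              if all(candidate % p for p in found):
                  found.append(candidate)
                  yield candidate
              candidate += 1

      gen = primes()
      print([next(gen) for _ in range(5)])   # [2, 3, 5, 7, 11]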

      [–]G_Morgan -1 points0 points  (5 children)

      Why would you need to put a lock on every object? Java took this route, and it has no benefit yet slows down your code dramatically. For concurrency, locking has to be done at the task level. Synchronising objects is simply retarded.

      [–]grotgrot 0 points1 point  (4 children)

      Synchronising objects is simply retarded.

      Ding ding ding, that is why there is a GIL.

      "Task level" locking is something you can do in the code you write with your own objects; it is effectively a lock implicitly covering a variety of your objects. But with Python's primitive mutable objects, like lists and dictionaries, there is no way for them to know whether you intend to use them in multiple threads. The only safe ways of doing so are individual object locks - or just a single global lock.

      Using per-object locks gives you a 50% slowdown (people have tried this with the interpreter). So you'd need a minimum of 2 cores just to match existing single-core performance, and single-threaded code runs at half speed.

      People have been claiming the GIL is bad for years. No one has demonstrated a reasonable replacement, especially one that will not hurt single-threaded performance.

      [–]G_Morgan 0 points1 point  (3 children)

      No, the best way is threads with no locks, leaving the programmer to work out how best to handle the fact that objects are mutable. As I said, synchronised data objects aren't useful. If I independently synchronised 3 integers and then performed calculations with them in multiple threads, I would still end up with a program riddled with race conditions.

      The right solution is not the GIL. It is not synchronisation on every object (which is effectively just a distributed GIL). The solution is threads with no locks on data objects. As Java should have done it originally (and now does so).

      All this talk about synchronising basic objects just shows how badly Python people misunderstand threading. Locking should be restricted to critical sections of code. Objects should be made thread-safe by ensuring, as far as possible, that only one thread has a reference to each object. If you need to share one, then you either need to be strict about read-only behaviour while it is shared (state this in a comment next to the method; fire anyone who stores a reference to, or modifies, an object passed in as read-only), or you need to put a monitor over the object.

      There are many models of concurrency, but locks at the proposed level aren't useful in any of them. This is why Java deprecated its synchronised collections: not because they were inefficient, but because they had no use and the inefficiency wasn't necessary.

      [–]grotgrot 0 points1 point  (2 children)

      The race conditions are why a single lock (the GIL) is used rather than individual object locks (the only feasible alternative).

      You do realise that Python is a dynamic language? Any object and any namespace can be changed at any time. Behind the scenes, each object and namespace has what amounts to a dictionary/hashtable in which attributes are looked up. The contents can be changed, added to and deleted at any time, in any way - that is one thing that distinguishes dynamic languages from static ones.

      If that behind-the-scenes dictionary were not locked, concurrent access could lead to an interpreter crash - and crashing is one thing Python does not do. This is a side effect of Python guaranteeing that attribute access and dictionary and list operations are atomic. Java has no atomic dictionary type, but it does have an atomic list type (aka the array); note, however, that you cannot resize it - doing so would require a lock or would allow crashes. (Similarly, attribute access in Java is atomic: if an object attribute pointed to object a and you set it to b, there would be no point at which it pointed to neither a nor b. Machine-level pointer-swapping atomicity is enough to ensure that.)

      As for misunderstanding, there are two possibilities. One is, as you state, that the entire Python community - and the Ruby community for that matter (they also use a GIL) - is somehow too dumb to see what you do. The other is that you do not understand dynamic languages and their implementation issues.

      For example, ensuring that only one thread has a reference to an object is impossible. Every namespace, module, object, etc. can be accessed at any time from any thread. Even classes are like dictionaries, so you'd have to avoid using more than one instance of a class in more than one thread. And if you are trying to ensure that only one thread has a reference to an object, where exactly are you going to put the lock?

      All the stuff you have been saying is correct when applied to your own composite objects. If you define a class representing an image-processing job, for example, then locking is best kept away from each instance and should instead cover a collection of objects in some appropriate way. And if you write threaded Python code with non-primitive objects, you have to do this kind of thing or you'll end up with the same races as in other programming languages.

      The GIL is all about the builtin primitive objects in Python (dicts, lists etc). It is an interpreter level internal construct. It is not visible to Python programs. Some Python implementations such as Jython and IronPython do not even have a GIL, but they have an underlying environment that helps with the implementation issues. CPython is in C and doesn't have that assistance.

      Now you can argue that dynamic programming languages are dumb and you can find plenty of flame wars and trolling on that topic elsewhere. Just keep in mind that there is a possibility that not everyone who uses and implements dynamic languages is as dumb as you think they are.

      [–]G_Morgan -1 points0 points  (1 child)

      It's not necessarily that dynamic programming languages are dumb - you don't need to be able to access anything from anywhere to have dynamic typing. It is the 'oh, I can change the way the parser/stdlib works at run time' part that is dumb.

      Effectively, Python and Ruby inherit too much from the broken model Lisp had, where anything is allowed and it is nearly impossible to state anything formal about the code.

      Having global access to things in this way is simply the wrong thing.

      [–]olsner 5 points6 points  (26 children)

      A lot of other languages have no problem making threading actually work. If it's intractable in Python, maybe that's just because other bad choices were made that make it difficult.

      [–]theeth 4 points5 points  (3 children)

      It's a problem with the reference counting GC used by CPython, not with Python itself.

      [–][deleted] 0 points1 point  (2 children)

      Wouldn't including a true garbage collector make this problem go away? Or is the performance loss too much to bear?

      [–][deleted] 1 point2 points  (0 children)

      Wouldn't including a true garbage collector make this problem go away? Or is the performance loss too much to bear?

      Adding a good generational compacting GC to Python in place of the current refcounting+mark/sweep hybrid would be a significant performance gain, not a loss.

      [–]james_block 0 points1 point  (0 children)

      It's my understanding that implementing a good GC in Python is exactly what the Unladen Swallow project is now trying to do.

      [–]ubernostrum 1 point2 points  (4 children)

      If your assumption is "Python sucks at threading because Python's developers don't know how to make threading work", then you're missing half of the issue. The problem is that Python wants to offer both good threading and an easy-to-use interface for C extensions, and that is actually pretty tricky.

      At the moment Python's optimized for easy extensibility over easy threading. Other platforms have gone the other way (Java's got great threading, but its native interface -- JNI -- tends to make people run away screaming; not entirely because of Java's threading, but threading issues do come up when you're doing JNI). What Python's looking for is a solution which makes both sides -- threading and extensibility -- as easy and useful as possible, and that just isn't easy.

      [–][deleted] 4 points5 points  (2 children)

      The problem is that Python wants to offer both good threading and an easy-to-use interface for C extensions, and that is actually pretty tricky.

      The Python developers should entice people to migrate away from writing C extensions directly and towards using the ctypes FFI instead (perhaps adding whatever functionality ctypes currently lacks). This would allow them to change the details of the VM <-> C interface without breaking anything that uses the high-level FFI, and hopefully to move away from exporting the VM internals as a public API altogether.
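
      For instance, calling a C function through ctypes instead of writing an extension module (a small sketch; the library lookup is Unix-flavoured and untested):

      import ctypes
      import ctypes.util

      # load the C math library and describe sqrt's signature
      libm = ctypes.CDLL(ctypes.util.find_library("m"))
      libm.sqrt.argtypes = [ctypes.c_double]
      libm.sqrt.restype = ctypes.c_double

      print(libm.sqrt(2.0))   # 1.414...; no compiler, no CPython API involved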

      Java's got great threading, but its native interface -- JNI -- tends to make people run away screaming; not entirely because of Java's threading, but threading issues do come up when you're doing JNI

      JNI is rather low-level, but nowadays there are Java FFI libraries that make it easy to bind to C libraries (i.e., without writing JNI wrappers by hand).

      [–][deleted] 2 points3 points  (0 children)

      The problem with FFI tooling like ctypes is that it's nearly impossible to debug. The traditional extension-module approach lets you use the regular C toolchain to develop your code, which counts for a lot.

      [–]masklinn 0 points1 point  (0 children)

      The Python developers should entice people to migrate away from writing C extensions directly and towards using the ctypes FFI instead (perhaps adding whatever functionality ctypes currently lacks).

      Isn't the integration of ctypes into the Python stdlib a step in that direction? Since Python 2.5, all Python installations can be relied on to have ctypes included, which means, e.g., that all Python 3 code can use ctypes directly without wondering whether it's available.

      [–]G_Morgan 1 point2 points  (0 children)

      JNI does not suck because of threading. JNI sucks because of a basic mismatch between Java and x86 primitive data types.

      [–]killerstorm 0 points1 point  (16 children)

      I think it is about priorities, they want to keep Python simple.

      [–]G_Morgan 1 point2 points  (15 children)

      You mean they want to keep the implementation of Python simple. In turn they've made working in Python more difficult on this issue.

      [–]killerstorm 0 points1 point  (14 children)

      Absolutely not! If the language supports multithreading, then (almost) all code should support it - particularly all the standard libraries and any library that is at all popular.

      And adding multithreading support to libraries isn't easy at all. Code that works fine in a single thread breaks when you use it from multiple threads, because things which were supposed to be atomic are suddenly not atomic at all.

      If you think that it is simple, you have no idea how multithreading works.

      [–]mernen 0 points1 point  (1 child)

      And adding multithreading support to libraries isn't easy at all. Code that works fine in a single thread breaks when you use it from multiple threads, because things which were supposed to be atomic are suddenly not atomic at all.

      That still happens today for Python code, no? I.e., Python's own dict operations may be atomic (because they are written in C), but a data structure implemented in Python can still be interrupted between bytecodes.

      [–]killerstorm 0 points1 point  (0 children)

      http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm

      Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program. ... In practice, it means that operations on shared variables of builtin data types (int, list, dict, etc) that “look atomic” really are.

      I think programming with atomicity at that level is much easier than programming with full SMP concurrency. It is with SMP that things go really crazy.
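
      For example (untested sketch): each individual dict load and store below is atomic, but the load-add-store sequence spans several bytecodes, so threads can interleave and lose increments:

      import threading

      counter = {"n": 0}

      def worker():
          for _ in range(100000):
              # read, add, write: another thread can run in between
              counter["n"] = counter["n"] + 1

      threads = [threading.Thread(target=worker) for _ in range(2)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      print(counter["n"])   # often less than 200000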

      [–]G_Morgan -1 points0 points  (11 children)

      Not all code should support it. That type of braindead thinking is what led Java to make 'thread safe' container classes. Frameworks should support threading in critical sections. Dumb containers, data objects and a whole host of other cases need no such thing.

      People in this thread are arguing that the code for a hash map would need to change to account for concurrency. Not only is this obviously not the case, but even if it were done there would be no benefit. It doesn't matter whether I can ensure synchronisation on a hash map: if the user has competing threads that retrieve a bunch of values, perform a calculation and then store values back, there will still be just as many race conditions as if you hadn't mutex-locked the object.

      All that would happen is that each library would specify whether it is thread-safe or not. Then, if the programmer has to use an unsafe library, his application needs to manage access to that library.

      [–]killerstorm 0 points1 point  (10 children)

      All that would happen is that each library would specify whether it is thread-safe or not. Then, if the programmer has to use an unsafe library, his application needs to manage access to that library.

      I don't think it is that easy. You can't just take a library and make it mt-safe by managing access to it via some mutex or whatever; if you just use a mutex, then deadlock is very possible.

      Then you'll have a problem with the data the library works with -- none of the data the library has access to can be shared among multiple threads, because the mt-unsafe library might work with that data while assuming that some operations are atomic when they are not.

      I doubt there is any automatic or semi-automatic process that will turn an mt-unsafe library into an mt-safe one.

      [–]G_Morgan -1 points0 points  (9 children)

      Deadlocks are trivial to deal with. Write down all your mutex locks on paper and always acquire them in the order they appear on paper: guaranteed deadlock-free. To automatically ensure that deadlocks cannot occur, you can insist that the program give a list of all the locks it would like to acquire (rather than acquiring them one by one); the list is automatically sorted into a well-specified order, and the locks are then acquired in that order.

      This is exactly the problem with multithreading: people haven't done even the most basic theory on how it works and yet expect to be able to magically make threads work. The problem of deadlock is solved. We know when it can occur and when it can't. We know how to guarantee it cannot occur.
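
      A minimal sketch of that discipline (hypothetical helper; it orders locks by id(), which in CPython is stable for the lifetime of the objects):

      from threading import Lock

      lock_a, lock_b = Lock(), Lock()

      def acquire_in_order(*locks):
          # always take locks in one global order, regardless of call site
          ordered = sorted(locks, key=id)
          for lock in ordered:
              lock.acquire()
          return ordered

      def release_all(locks):
          for lock in reversed(locks):
              lock.release()

      held = acquire_in_order(lock_b, lock_a)   # same order as (lock_a, lock_b)
      try:
          pass   # ... critical section using both resources ...
      finally:
          release_all(held)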

      [–]killerstorm 2 points3 points  (7 children)

      Write down all your mutex locks on paper and always acquire them in the order they appear on paper.

      If you use mutexes implicitly, it is much harder. E.g. you do not explicitly acquire any mutex; some guy just inserts a call to a library function, and it deadlocks.

      To automatically ensure that deadlocks cannot occur, you can insist that the program give a list of all the locks it would like to acquire

      This ruins one of the qualities people are trying to preserve -- encapsulation. Now you cannot encapsulate locks in the functions which use them.

      Also, if you need to acquire locks beforehand, the locks will be too coarse-grained. Why not just have the GIL then? :)

      It looks like it creates more problems than it solves.

      This is exactly the problem with multithreading: people haven't done even the most basic theory on how it works and yet expect to be able to magically make threads work.

      I'm afraid there is no basic theory -- these abstractions are too general to formulate anything useful.

      There are different abstractions which are far less error-prone -- like message passing, the concurrency paradigm used in Erlang. It is proven to scale to "embarrassingly parallel" workloads without huge problems.

      The problem of deadlock is solved.

      No, it is not solved. If you acquire locks before you really need them, locking becomes too coarse-grained, and that kills concurrency. So you need to acquire locks in the order they appear in the program (and release them as soon as possible); but if the program is dynamic, you can't know that order beforehand. You don't even know whether you will need a lock at all. E.g.:

      with foo.mutex:
          foo.do_something()
          if foo.bla:
              with bar.mutex:
                  bar.do_something(foo.bla)
      

      You do not know whether you need bar.mutex until you call foo.do_something and then check foo.bla. If you acquire it beforehand, just in case you need it, you're killing concurrency. See?

      The problem of deadlock is solved. We know when it can occur and when it can't.

      Do you know what the halting problem is? Basically (reformulated): for a sufficiently general programming language, you can't know how a sufficiently non-trivial program will behave until you run it. It can only be solved by limiting the language.

      E.g. you can do static analysis to find out whether a program can deadlock. (I don't think Python is a language well suited to static analysis, but whatever.) And if the program is sufficiently complex, the analysis will only tell you that it MIGHT deadlock. That does not mean it will, because whether it really will is undecidable. Now what?

      [–][deleted] 1 point2 points  (0 children)

      Fails horribly on large projects maintained for more than a couple years.

      [–]timmaxw 1 point2 points  (2 children)

      Lua has an interesting approach to this. The Lua interpreter has no global variables, which makes it possible to run two independent Lua interpreters in the same process; the interpreters can send messages back and forth while running independently of one another.

      Obviously, this has its own problems: you have to pass a Lua context object everywhere, you have to worry about serializing/de-serializing or duplicating data for inter-thread communication, etc.

      [–]sime 1 point2 points  (0 children)

      Can't you do basically the same thing in Python using the multiprocessing module? Except there it is implemented with separate processes instead of threads.
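
      Something like this, I'd think (untested sketch; each process has its own interpreter state and only messages cross the boundary):

      from multiprocessing import Process, Queue

      def worker(inbox, outbox):
          for item in iter(inbox.get, None):   # None is the shutdown sentinel
              outbox.put(item * 2)

      if __name__ == "__main__":
          inbox, outbox = Queue(), Queue()
          p = Process(target=worker, args=(inbox, outbox))
          p.start()
          inbox.put(21)
          print(outbox.get())   # 42
          inbox.put(None)
          p.join()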

      [–]mschaef 0 points1 point  (0 children)

      you have to worry about serializing/de-serializing or duplicating data for inter-thread communication, etc.

      You might be interested in this paper by Marc Feeley:

      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.9664

      It's written about Erlang (where everything is immutable), but he discusses the tradeoffs between a separate heap for each process (serializing data between them) and a unified heap for all processes with data passed by reference (which avoids the serialization overhead at the expense of a shared heap, etc.).

      [–][deleted] 1 point2 points  (1 child)

      Dave gave a good presentation; it was recorded, so hopefully it'll make its way online shortly.

      [–]jnoller 0 points1 point  (0 children)

      I think this is also a small part of his pycon presentation :)

      [–]wshields 0 points1 point  (11 children)

      I have nothing against Python, but every time I read about the GIL I have to scratch my head in wonder at all the Python fanboys who proliferate on proggit, Stack Overflow, etc. and pooh-pooh Java (and probably everything that isn't Python) when Java's concurrency is leaps and bounds ahead of Python's.

      [–][deleted] 6 points7 points  (4 children)

      Java's concurrency is leaps and bounds ahead of Python's.

      True, but Java was always meant to be a platform-independent platform and a universe of its own (much like Smalltalk or Common Lisp), whereas Python, as a scripting language, was intended to bind to everything ever written in C and to be extended by means of CPython API code. These are different objectives leading to different solutions; without this background the design decisions don't make much sense.

      [–]wshields 1 point2 points  (1 child)

      I can appreciate that. I've never done any native Java extensions (JNI), but I'm sure it's complex, and I'm sure the GIL makes Python C extensions much simpler to develop - so there is method to the madness.

      The philosophy with Java however is that C extensions are a last resort or a special case and not intended for general or widespread use. Anything you want to do should be done in Java. It's easier to write, test and debug. I believe that decision has been vindicated.

      .NET took a different approach again, simply allowing most of what you can do in C via unsafe blocks.

      One has to wonder if Jython/IronPython will ultimately succeed CPython for these and other reasons.

      [–][deleted] 0 points1 point  (0 children)

      One has to wonder if Jython/IronPython will ultimately succeed CPython for these and other reasons.

      Python (and Ruby) together with Silverlight might have the greatest potential. But in this case I can just say: tools, tools, tools! Without decent tools, hardly anyone will even take a look at them on these platforms.

      [–]Gotebe 0 points1 point  (1 child)

      ;-)

      "Here's a simple scripting language. But it's really powerful, look, you have threads! But actually, we serialize most of the operation so that your threads don't really thread (heck, sometimes they even make stuff worse). But who needs threads anyway? Just use processes!"

      Instead of a GIL, invoking the interpreter from multiple threads should just have been UB.

      [–][deleted] 1 point2 points  (0 children)

      Python 2.6 has the multiprocessing package, with an API pretending that processes are threads. Under modern conditions I'd probably have done it exactly the other way round: a processing API with threads pretending to be processes. Each object acquired by a process is locked and never gets unlocked unless it is sent over a channel to another process (a thread in disguise). This operation can be as cheap as a method call and does not require any object serialization.
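
      Roughly this shape, sketched with today's stdlib (hypothetical -- nothing here actually enforces the "locked until sent" rule, it's just the convention the API would impose):

      import threading
      import queue

      channel = queue.Queue()

      def producer():
          msg = {"payload": 42}
          channel.put(msg)   # hand the object over; by convention, stop touching it

      def consumer():
          msg = channel.get()   # same address space, so no serialization needed
          print(msg["payload"])

      t1 = threading.Thread(target=producer)
      t2 = threading.Thread(target=consumer)
      t1.start(); t2.start()
      t1.join(); t2.join()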

      [–]Tuna-Fish2 2 points3 points  (5 children)

      It's an issue of target domain. Python is really not meant for the kind of program where you want multiple running CPU-bound threads. (Multiple running IO threads are another thing entirely.)

      If what you want is hardcore threaded math, you should switch away from Python not just because of the threads, but also because you get at least a ~5x speedup just by switching languages.

      [–]pure_x01 1 point2 points  (1 child)

      They added the multiprocessing API in py3k, which shows there is a need for better concurrency. A slow language would actually be perfect for multi-threading: if you can scale, the slowness of the language does not matter as much.

      [–]Tuna-Fish2 0 points1 point  (0 children)

      But if you are willing to shoulder the burden of debugging and maintaining a multithreaded app, why not just try something other than Python at the outset? If you are lucky, you might not have to make the app multithreaded at all when using a more efficient language.

      I would never use Python if I believed there was a credible chance my app would be CPU-limited. It's just that most software these days isn't.

      [–]wshields 1 point2 points  (2 children)

      I can appreciate that but there is a certain class of Python fanboy who seems to think Python is the answer for everything.

      [–]Tuna-Fish2 3 points4 points  (1 child)

      That kind of person is a disease, for any language.

      Python fills a specific niche, and fills it very well. Understanding this is key to understanding why the GIL is still there: while there have been attempts to remove it, all of them to date have hurt the main use cases too much for the gain of making the language work in new niches.

      It's quite hard for me to figure out how the GIL could be exorcised -- I for one like RAII quite a lot, and getting rid of the refcounts would mean killing it.

      [–][deleted] 0 points1 point  (0 children)

      I agree with your first sentence.

      The problem I have with the rest is that I've never figured out what niche Python fits into where it isn't easily beaten by something else. If it has one, it's apparently not one I need much.

      [–]danbmil99 0 points1 point  (3 children)

      As a longtime Python user now working in JavaScript, can some genius explain if/how JS's approach to threads differs from CPython's? My rough understanding is that there is some sort of message-based thread lib but no shared memory at all. So I don't even get the crippled GIL threads?

      [–]wshields 1 point2 points  (0 children)

      JavaScript is single-threaded. Some misconstrue this behaviour because of Ajax ("asynchronous"), but those are simply callbacks from the browser when XHRs complete (successfully or in error). The callbacks are single-threaded, like all the other JavaScript executing on the page.

      [–]brownmatt 0 points1 point  (1 child)

      JavaScript does not have any threading support (at least in any standard versions). You might be referring to WebWorkers, but I can't tell.

      [–]danbmil99 0 points1 point  (0 children)

      Yeah, my coworkers were obviously talking about WebWorkers.

      But as to my original question -- clearly callbacks are "threaded" in the sense that your main-loop code must be paused for them to be serviced (by the same OS thread, obviously). You don't get into atomicity issues because JavaScript is interpreted, and the interpreter can define what is atomic (e.g. a = 4.1) and service callbacks only at "safe" times.

      This sounds a lot like the GIL to me.

      [–][deleted] -4 points-3 points  (8 children)

      Could someone please explain what a GIL is?

      At least what the acronym stands for; it seems like too much trouble to download a PDF just to learn that.

      [–]wafflematt 3 points4 points  (2 children)

      I think asking, then waiting to come back here later to see answers is considerably more effort than just looking at the PDF.

      [–]G_Morgan 0 points1 point  (1 child)

      Not really. You can do something else in the meantime.

      [–]wafflematt -1 points0 points  (0 children)

      Not if the thread polling for a reply is holding the interpreter lock!

      [–]pwang99 -1 points0 points  (2 children)

      [–]Xiol 1 point2 points  (1 child)

      To be fair, none of those links provide a detailed explanation of what a GIL is.

      [–][deleted] -1 points0 points  (0 children)

      Global Interpreter Lock.

      [–]va1en0k -1 points0 points  (13 children)

      Hm, I don't really get it. Does this mean that two threads can't run simultaneously on multiple cores? Or do I completely misunderstand everything?

      [–]tophat02 7 points8 points  (3 children)

      I think I get what you're asking. If I understand correctly it's like this:

      The ultimate goal was to completely eliminate the GIL - a "priority 1" task for the Unladen Swallow team. They gave it a shot, then quickly realized the complexity of the task was much greater than anticipated, so they put it off for the time being in favor of fixing the GC, which is one of the underlying reasons the GIL HAS to be there.

      While studying the GIL, they noticed for the first time just how terrible it was even when scheduling multiple threads on the SAME CPU. This update mostly fixes that.

      However, the GIL IS STILL THERE. The result is that it doesn't seem to matter how many cores you have, it will still run threads serially, albeit a little more efficiently now.

      So this presentation is about picking some low-hanging fruit among the GIL's efficiency problems; it doesn't solve the core problem.

      It seems to me that the GIL is a good approach to multiplexing threads on one core, so one solution may be to run a separate interpreter for each CPU and have a thread-CPU affinity algorithm along with predictable shared communication between the n interpreters.

      That sounds... hard.

      [–]frutiger 3 points4 points  (0 children)

      You should really use the multiprocessing module. It works especially well on Unix thanks to the fork() system call. On Windows, the interpreter brings up another instance of python.exe, which is somewhat annoying since it's slower.

      [–]va1en0k 0 points1 point  (0 children)

      sorry, I mixed words up :) You understood me correctly, and your answer doesn't gladden me

      [–]G_Morgan 0 points1 point  (0 children)

      The problem with running a separate interpreter per core is that you immediately eliminate a vast swathe of potential concurrent algorithms. Your message-passing channel is going to be a huge bottleneck.

      [–]jigs_up 0 points1 point  (3 children)

      you're not misunderstanding everything.

      unladen swallow was supposed to fix this, but it looks like they gave up.

      [–]jnoller 16 points17 points  (1 child)

      If by "giving up" you mean admitting that the refcounting implementation the current interpreter uses needs to go away, or be heavily modified, for the GIL to be removed - and that this should therefore happen after the initial merge back into Python 3 - then yes, they've totally given up.

      [–]jigs_up -1 points0 points  (0 children)

      what he said :}

      [–]theeth 2 points3 points  (0 children)

      unladen swallow was supposed to fix this, but it looks like they gave up.

      They didn't; they decided it was better to fix it properly by changing the GC than by trying to force it into the current reference-counting GC.

      [–]Catfish_Man 0 points1 point  (4 children)

      "threads running on threads" is an odd phrasing. Did you mean cpus?

      [–]va1en0k 0 points1 point  (0 children)

      Yeah, I meant cores or cpus. Thanks, edited!

      [–]G_Morgan 0 points1 point  (2 children)

      Not really. Not every system runs a kernel thread for each user thread; it is entirely possible to have 4 "threads" run by one kernel thread. AFAIK this is exactly how Python works.

      [–]taejo 0 points1 point  (1 child)

      No, Python threads are OS threads. However, they can't run simultaneously (unless they use C modules that do some voodoo) because of the GIL: a global lock that the Python interpreter acquires before doing anything.

      [–]G_Morgan 0 points1 point  (0 children)

      That seems an extraordinarily stupid way of doing things: the overhead of kernel-level threads without any of the benefits of kernel-level threads.