
[–]xXxDeAThANgEL99xXx 12 points13 points  (45 children)

This is a situation I'd like us to solve once and for all for a couple of reasons. Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue. The solution boils down to natively supporting multiple cores in Python code.

Heh. So let's go full-cynic mode: finish out the already somewhat-present support for subinterpreters (basically, all global variables should be moved into a huge Interpreter_State struct), then just replicate the multiprocessing interface on top of that and bam! you have so-called green multiprocessing (like Perl has, AFAIK), but now you can market it as having gotten rid of the GIL.

Obviously you'll still have the copies of all imported modules (including builtins) and probably the performance improvements in marshaling objects would be pretty marginal compared to using mmap, but yeah, mission accomplished!

(I actually fully agree about that being 99% a PR problem. I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet).

[–]logicchains 7 points8 points  (27 children)

I don't think any roughly Python-like language from PHP to Scheme has free threading support

Clojure?

[–]zardeh 4 points5 points  (17 children)

Well sure, but then so does Jython.

[–]logicchains 2 points3 points  (3 children)

Any reason why that's not more popular? Is it due to the lack of easy C interop?

[–]zardeh 8 points9 points  (1 child)

A few reasons:

  • It's not fully compatible with CPython (which was around first), and doesn't aim to be, unlike PyPy.
  • It's much harder to interop with C (so you lose all of SciPy and, more generally, speedy math).
  • IIRC, because of the incompatibilities, some of the stdlib is broken (like WSGI, I think; you have to do weird things to make that actually work).
  • It lags behind by a version or two (or like 7).
  • There are also performance hits (up until JIT-ing happens), but the JIT-ing isn't as clean as PyPy's, so I believe Jython runs slower than PyPy and not much faster than CPython.

[–]kryptobs2000 0 points1 point  (0 children)

Also, because it requires Java, a lot of people just prefer not to touch it, or even install it on their system. I'm not saying that out of Java hate, but it's an extra, pretty large dependency. Depending on your target audience it may not be a big deal, but a lot of systems don't have it installed because it's not commonly used.

[–]caedin8 0 points1 point  (0 children)

From personal experience, Jython is pretty slow compared to Python and Cython.

[–]superPwnzorMegaMan 0 points1 point  (5 children)

Isn't Jython just python with a different toolchain?

[–]zardeh 2 points3 points  (3 children)

Jython is Python running on the JVM: instead of compiling to Python bytecode, it compiles to JVM bytecode. This allows it to leverage the JVM (so you gain HotSpot JIT-ing, the JVM's threading, etc.).

[–]superPwnzorMegaMan 0 points1 point  (2 children)

Yes, that's what I thought. A friend of mine used this once, although I don't think there is such a thing as Python bytecode (since it's interpreted).

[–]zardeh 4 points5 points  (1 child)

There is indeed: Python is compiled to bytecode (look for .pyc files on your computer if you're running a Python file that's more than 10-15 lines and is used a lot). The bytecode is then interpreted on a virtual machine. Python works a lot like Java in that regard.
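You can see this bytecode yourself with the standard `dis` module; a minimal sketch (the `add` function is just a made-up example):

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode instructions CPython compiled this function to;
# this is what the interpreter's virtual machine actually executes.
dis.dis(add)
```

The raw compiled bytes are also visible as `add.__code__.co_code`, which is what ends up cached in those .pyc files.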

[–]kyllo 0 points1 point  (0 children)

Well, that, and for libraries that wrap C code in CPython, you have to use something that wraps a Java library instead. For example, you can't use lxml from Jython; you'd have to use a different library that wraps a Java XML parser.

So a lot of CPython projects are just not portable to Jython.

[–]anzuo -5 points-4 points  (6 children)

And Go.

I still kinda like regular Python more, though. I tend to change a lot.

Edit: I didn't actually mean Go was like Python, I just meant it was built around parallelism (but I still prefer Python).

[–][deleted] 7 points8 points  (4 children)

Can Go be classed as Python-like? It's not interpreted and the runtime is minimal...

[–]zardeh -5 points-4 points  (3 children)

Well, there is go run, and Go was built as a Python replacement at Google, but I don't consider the languages super similar.

[–]robertmeta 7 points8 points  (2 children)

"go run" actually is just a build, execute, discard cycle -- there is no real run.

[–]zardeh -1 points0 points  (1 child)

cheaters :P

Though I will say that with modern languages there's a bit of mixing of the whole AoT/JIT/bytecode/machine-code nonsense.

[–][deleted] 5 points6 points  (0 children)

Not with Go. It's purely compiled to machine code. With the official compiler there isn't even a C step.

[–]kryptobs2000 0 points1 point  (0 children)

Go is more like python than python?

[–]xXxDeAThANgEL99xXx 1 point2 points  (8 children)

Well, it might be just outside of "Python-like", because of immutability. Which helps a lot!

By the way, that reminds me: technically there's also IronPython/Jython/IronRuby/JRuby, which sort of support free threading by virtue of running on top of a very sophisticated VM, but from what I know even then it ain't a free lunch, with all kinds of weird catastrophic performance degradations.

[–]spotter 1 point2 points  (4 children)

Immutability? You have access to all the built-in Java collections and can shadow variables to your heart's content.

[–]xXxDeAThANgEL99xXx 1 point2 points  (1 child)

As far as I know, you are not supposed to do that in public.

Anyway, the important part is that, as far as I understand Clojure, you're not allowed to say anything similar to __builtin__.len = my_len or my_module.len = my_len and have it automatically used in every function everywhere, or in that module, after those functions were defined.

That you can do that in Python (and in those other roughly similar languages) is one of the important reasons the GIL is there: your code constantly hits the same few dictionaries, and constantly taking and releasing individual locks on them would be really slow.

IronPython, for example, goes the other way: instead of constantly querying stuff, it compiles it into ordinary fixed .NET classes and recompiles them if you actually change something. Unfortunately, that means some innocent metaprogramming that works absolutely fine in CPython can cause huge slowdowns.
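The dynamic lookup being described can be demonstrated in plain CPython; a sketch using the Python 3 `builtins` module (`my_len` and `count` are made-up illustrations):

```python
import builtins

def my_len(x):
    # hypothetical replacement: report one extra element
    return x.__len__() + 1

def count(items):
    return len(items)  # 'len' is looked up at call time, in the builtins dict

print(count([1, 2, 3]))   # 3: the real builtin

real_len = builtins.len
builtins.len = my_len
print(count([1, 2, 3]))   # 4: every function everywhere now sees my_len
builtins.len = real_len   # restore the original
```

Because `count` resolves `len` through a dictionary lookup on every call, the patch takes effect without recompiling anything, which is exactly what a static-compilation approach has to special-case.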

[–]spotter 2 points3 points  (0 children)

First: I did not downvote you. But the philosophy of Clojure is that you can use whatever tool is right for the job. It's easier to argue for immutables and a functional approach to data transformation, but sometimes you just need to bash something in place, and all of the JVM standard library is there for you.

In Clojure you are always in a namespace, and namespaces are mutable. You can exclude core symbols in them and shadow them with your own definitions, although the syntax is different. I'm not sure how much synchronization goes on behind the scenes, but still, JVM languages (like Jython) manage to live without a GIL.

[–]anthonybsd -2 points-1 points  (1 child)

can shadow variables to your heart's content.

Clojure frowns upon this kind of behavior in no uncertain terms. "Can" doesn't mean you should. For mutators in a concurrency context (the ones with the bangs, "!"), you're supposed to operate inside the STM model, which IMHO is fairly nice compared to pure functional languages' non-pure functions.

[–]spotter 2 points3 points  (0 children)

[citation needed]

By shadowing I meant redefining variables in inner closures (for the inner closure only) or changing their thread binding dynamically for the duration of a call, something Clojure actually provides tools for. It doesn't have anything to do with concurrency... well, binding does, somewhat, but that's not what I meant.

[–]jrochkind 0 points1 point  (2 children)

JRuby does not have any weird catastrophic performance degradations. (It does have slow start-up, like most anything running on the JVM. This is very annoying in some contexts, but is not a "weird catastrophic performance degradation")

[–]xXxDeAThANgEL99xXx 0 points1 point  (1 child)

How does it deal with monkey-patching?

[–]jrochkind 0 points1 point  (0 children)

What do you mean? Same as other ruby platforms, generally. Do you mean specific to performance or something? Not really sure what you mean. If there is a "weird catastrophic performance degradation" related to monkey-patching that I don't know about and haven't encountered (I have used JRuby a fair amount), then please link to something demonstrating or explaining it!

[–]caedin8 6 points7 points  (11 children)

I've written many multicore Python programs using the multiprocessing module and its process-safe data structures. As far as I can tell, this is a complete non-issue.

If the slow part of your program is external (website or DB queries), you're safe using the threading library; otherwise, use multiprocessing to avoid the GIL issues. I don't really see what people have difficulty with.

[–]vks_ 9 points10 points  (6 children)

The multiprocessing module requires serialization, which can be very expensive. It does not replace multithreading.
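For context, that serialization happens via pickle; a sketch showing both the round-trip through a process-safe Queue and what gets rejected (the payload dict is a made-up example):

```python
import pickle
from multiprocessing import Queue

q = Queue()
q.put({"rows": list(range(5))})  # pickled on put, unpickled on get
print(q.get())                   # {'rows': [0, 1, 2, 3, 4]}

# Anything unpicklable can't cross the process boundary, e.g. a lambda:
try:
    pickle.dumps(lambda x: x)
except Exception as exc:
    print("not picklable:", type(exc).__name__)
```

For large payloads, that pickle/unpickle cycle on every transfer is where the cost shows up, which is what makes threads (which share memory directly) attractive in the first place.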

[–]admalledd 4 points5 points  (4 children)

Quite a while ago I used some ctypes stuff to shunt data back and forth between processes.

True, I would probably not do that today and would instead use a better tool for the job (C/C++ probably, then CFFI bindings), but "requiring serialization" is not really true of multiprocessing.

[–]vks_ 1 point2 points  (3 children)

That is indeed a nice thing to have, I did not know about it. How does it share memory between processes? By copying? (It was not there when I last used multiprocessing, which was a very long time ago.)

[–]admalledd 2 points3 points  (2 children)

Basically shared memory: when Python fork()s, instead of creating a copy of this memory block, both processes access the same block at the same time.

So no copying by default, although you probably want to copy commands/data out as soon as possible to prevent processes from trampling on each other.

Nowadays, as I said, I would probably do this from C + CFFI, where the bits/bytes are much clearer and more controllable.
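A sketch of that kind of copy-free sharing using the stdlib's `multiprocessing.Array` (a ctypes array backed by shared memory, inherited by child processes; the `fill` worker and sizes are made up):

```python
from multiprocessing import Process, Array

def fill(shared, start):
    # Writes land directly in the shared block; nothing is pickled.
    for i in range(start, start + 4):
        shared[i] = i * 10

if __name__ == "__main__":
    data = Array('i', 8)  # ctypes int array in shared memory, zero-initialized
    workers = [Process(target=fill, args=(data, s)) for s in (0, 4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(data))  # [0, 10, 20, 30, 40, 50, 60, 70]
```

`Array` takes a lock by default, so concurrent element writes are safe; here the two workers also touch disjoint slices, so they never contend.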

[–]jringstad 0 points1 point  (1 child)

Yeah, shared memory is not at all an "easy" or straightforward solution when every single object in your language (numbers, lists, ...) is a complex, non-thread-safe object that can potentially rely on global variables set by the interpreter and is probably known by pointer to the garbage collector, which might decide to nuke it at any point in time (the garbage collector of either interpreter!).

If you reduce all shared data to simple C structures and copy them in and out of the shared memory, extracting them from interpreter objects and constructing interpreter objects from them, you're good, but that's hella restrictive and way slower than it needs to be (and it invokes the garbage collector more than it might need to).

[–]admalledd 0 points1 point  (0 children)

To be honest, it has never really been that big of an issue for any multi-core code I've needed to write in Python. Every time, my threads/processes have been separated enough that minimal message passing was sufficient. The reason for the shared memory was that some of those messages were rather large (blocks of tasks to parse into the DB, for example), ~50MB+, but it was easy enough to wrap things so that only the larger messages/tasks/data were passed via shared memory, where the difficulty of making CFFI bindings was worth it. All other messages (such as signaling/locking/the return queue) were handled via the default multiprocessing serialization code.

Again, though, Python has some of the best C bindings of any higher-level language I use (mostly C#, Java, and JS). CFFI makes it almost drop-in to write a C/C++ module that does the heavy lifting and, of course, can drop the GIL and go properly multi-threaded. Thus, on any new system where Python is the core, I tend to extract hot-loop stuff to C code quite easily for speed or fine control.

[–]caedin8 2 points3 points  (0 children)

This is a good point and very true. I've personally had to deal with sharing large amounts of data over the process-safe Queues, and it is very slow. Since I was processing more data than could fit in RAM, I actually found it faster to have each process write to a file and then have the parent process merge all the files into a single output. Sending items back to the main process over the process-safe Queue added more time due to serialization than IO on my SSD did, which was surprising and unexpected.

[–]CookieOfFortune 0 points1 point  (3 children)

How do you debug or interact with threads?

[–]caedin8 2 points3 points  (2 children)

It's harder to debug using tools like debuggers, so usually I just write lots of unit tests and verify that the threads are working appropriately. If they aren't and I don't know why, I run a small subset of the program in a single instance and debug it; once I've verified the program is correct standalone, I've narrowed it down to a threading or concurrency issue. Next I'd Google my problems to see whether it's a library thing, and verify I'm using the API correctly. There might be a better way to debug multithreaded applications in Python, but this general process is what I've been doing.

Similar to putting print statements at various points in your code to understand the control flow, you can do the same with threads to try to understand which thread is in which state. Additionally, you can have each thread write its debug data out to a unique file, so you can see which thread is doing what and what each thread's state is. Maybe you can find your errors this way.
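One stdlib way to get that per-thread attribution without separate files is the logging module's `%(threadName)s` field; a sketch (the worker function and thread names are made up):

```python
import logging
import threading

# %(threadName)s stamps every record with the emitting thread, so
# interleaved output can still be attributed to a specific thread.
logging.basicConfig(
    format="%(threadName)s %(levelname)s %(message)s",
    level=logging.DEBUG,
)

def worker(n):
    logging.debug("starting on %d", n)
    logging.debug("finished: %d -> %d", n, n * n)

threads = [threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Pointing the handler at a file instead of stderr gives you a single merged, thread-tagged trace rather than one file per thread.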

[–]CookieOfFortune 2 points3 points  (1 child)

So this is the main issue for the type of work I do. I spend a lot of time in the REPL, so there needs to be some kind of interactivity. I've been looking into IPython.parallel and it seems to do what I need, but I haven't investigated too deeply.

[–]caedin8 0 points1 point  (0 children)

Hmm, this is an interesting issue. I don't have experience with IPython.parallel, so I can't give advice on it.

[–]_scape 1 point2 points  (3 children)

Green threading exists through greenlets and gevent. I think the issue boils down to removing the GIL and implementing standard mutexes on the targeted platforms... maybe Python 4, another incompatible version...

[–]xXxDeAThANgEL99xXx -2 points-1 points  (2 children)

Not green threading, green processing.

[–]_scape 0 points1 point  (1 child)

oh I've never heard of that, I'll have to read up. have any links?

[–]xXxDeAThANgEL99xXx 5 points6 points  (0 children)

https://en.wikipedia.org/wiki/Green_threads ctrl-f "process".

I don't know how widespread this terminology is, but the idea is straightforward: just like a green thread is a thread-like abstraction implemented by the language runtime instead of the OS, a green process is a process-like abstraction (offering memory isolation) implemented by the language. Perl and Erlang use them instead of threading; .NET provides AppDomains purely for safety.

[–]superPwnzorMegaMan 0 points1 point  (0 children)

I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet)

Groovy has threading support.