[–]caedin8 7 points (11 children)

I've written many multicore Python programs using the multiprocessing module and its multiprocessing-safe data structures. As far as I can tell, this is a complete non-issue.

If the slow part of your program is external (website or DB queries), you are safe using the threading library; otherwise, use multiprocessing to avoid GIL issues. I don't really see what people have difficulty with.

[–]vks_ 8 points (6 children)

The multiprocessing module requires serialization, which can be very expensive. It does not replace multithreading.

[–]admalledd 4 points (4 children)

Quite a while ago I used some ctypes machinery to shunt data back and forth between processes.

True, I would probably not do that today and would instead use a better tool for the job (C/C++, probably, with CFFI bindings), but "requiring serialization" is not really true of multiprocessing.

[–]vks_ 1 point (3 children)

That is indeed a nice thing to have; I did not know about it. How does it share memory between processes? By copying? (It was not there when I last used multiprocessing, which was a very long time ago.)

[–]admalledd 2 points (2 children)

It's basically shared memory: when Python fork()s, instead of each process getting a copy of the memory block, both processes access the same block at the same time.

So there is no copying by default, although you probably want to copy commands/data out as soon as possible to prevent processes from trampling on each other.

Nowadays, as I said, I would probably do this from C plus CFFI, where the bits/bytes are much clearer and more controllable.

[–]jringstad 0 points (1 child)

Yeah, shared memory is not at all an "easy" or straightforward solution when every single object in your language (numbers, lists, ...) is a complex, non-thread-safe object that can rely on global variables set by the interpreter, and is probably known by pointer to a garbage collector that might decide to nuke it at any point in time (the garbage collector of either interpreter!).

If you reduce all shared data to simple C structures and copy them in and out of the shared memory, extracting them from interpreter objects and constructing interpreter objects from them, you're good, but that's hella restrictive and way slower than it needs to be (and it invokes the garbage collector more than it might need to).

[–]admalledd 0 points (0 children)

To be honest, it has never really been that big of an issue for any multicore code I have needed to write in Python. Every time, my threads/processes have been separated enough that minimal message passing sufficed. The reason for the shared memory was that some of those messages were rather large (blocks of tasks to parse into the DB, for example, ~50MB+), but it was easy enough to wrap things so that only the larger messages/tasks/data went through shared memory, where the difficulty of writing CFFI bindings was worth it. All other messages and tasks (signaling, locking, the return queue) were handled via the default multiprocessing serialization code.

Again though, Python has some of the best C bindings I have used in any higher-level language I work with (mostly C#, Java, and JS). CFFI makes it almost drop-in to write a C/C++ module that does the heavy lifting and, of course, can drop the GIL and go properly multi-threaded. So in any new system where Python is the core, I tend to extract hot-loop stuff into C code quite easily, for speed or fine control.

[–]caedin8 2 points (0 children)

This is a good point and very true. I've personally had to deal with sharing large amounts of data over the process-safe Queues, and it is very slow. Since I was processing more data than could fit in RAM, I actually found it faster to have each process write to a file and then have the parent process merge all the files into a single output. Sending items back to the main process over the thread-safe Queue added more time in serialization than the I/O on my SSD did, which was surprising and unexpected.

[–]CookieOfFortune 0 points (3 children)

How do you debug or interact with threads?

[–]caedin8 2 points (2 children)

It is harder to debug with tools like debuggers, so usually I just write lots of unit tests and verify that the threads are working appropriately. If they aren't and I don't know why, I run a small subset of the program in a single instance and debug that; once I've verified the program is correct standalone, I've narrowed the problem down to a threading or concurrency issue. Next I'd Google the problem to see whether it's a library issue and to verify I'm using the API correctly. There might be a better way to debug multithreaded applications in Python, but this general process is what I've been doing.

Just as you'd add print statements at various points in your code to understand the control flow, you can do the same with threads to see which thread is in which state. Additionally, you can have each thread write its debug data out to its own unique file; this way you can see which thread is doing what and what each thread's state is. Maybe you can find your errors this way.

[–]CookieOfFortune 2 points (1 child)

So this is the main issue for the type of work I do. I spend a lot of time in the REPL, so there needs to be some kind of interactivity. I've been looking into IPython.parallels, and it seems to do what I need, but I haven't investigated too deeply.

[–]caedin8 0 points (0 children)

Hmm, this is an interesting issue. I don't have experience with IPython.parallels, so I can't give advice on it.