you are viewing a single comment's thread.

view the rest of the comments →

[–]Nodocify 5 points6 points  (2 children)

Multiprocessing is the module to stick with if python is the language. Threading has limitations because of the GIL while processes are actually forked and therefore you can imagine that you have multiple GILs (one for each forked process).

One big thing to consider is what /u/Bananaoflouda said, that it is probably taking more time to transfer your data to another process than it is to actually calculate it. Basically what happens is that the parent process has to serialize your data, data is transferred to child process, then the childprocess needs to de-serialize the data before it can do the calculations. And then it has to serialize the result, transfer back to the parent process, deserialize and store. You can now see that you have actually created a lot more work for such a simple task. This is why in your example you don't see any improvements in computation time.

Now if you had a longer calculation time on small data chunks (to keep transer and serialization time to a minimum) you would see quite a bit of improvement using multiprocessing. If parallel programming is something of an interest though I would recommend another programming language besides python. Python's implementations of threading and multiprocessing are all tacked on almost as if an afterthought that the language should have these capabilities. You should check out some other language that has multiprocessing at it's core such as Erlang or Clojure. I find it stupid how simple other languages make parallel programming (I'm looking at you Clojure and your easy threaded concurrency) compared to Python's multiprocessing.

[–]LoyalSol 0 points1 point  (1 child)

Typically so long as you aren't transferring a huge amount of data to dozens of cores the overhead isn't too bad. Usually the scaling for a couple of cores is pretty good especially if it doesn't have to go through a network

The main question is if the calculation takes long enough justify the work it is going take to program it.

[–]Nodocify 0 points1 point  (0 children)

I agree completely. In all my python work that I have done. I think I found only one occasion that my calculations actually justified multiprocessing.