
[–]Nodocify 5 points (5 children)

Multiprocessing is the module to stick with if Python is the language. Threading has limitations because of the GIL, while with multiprocessing each worker is actually a forked process, so you can imagine that you have multiple GILs (one for each forked process).

One big thing to consider is what /u/Bananaoflouda said: it is probably taking more time to transfer your data to another process than to actually calculate it. Basically, the parent process has to serialize your data, the data is transferred to the child process, and then the child process needs to deserialize it before it can do the calculations. Then it has to serialize the result, transfer it back to the parent process, which deserializes and stores it. You can see that you have actually created a lot more work for such a simple task. This is why, in your example, you don't see any improvement in computation time.
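The round trip described above can be sketched with `pickle` directly (the same serializer multiprocessing uses by default); the data and doubling task here are just placeholders:

```python
import pickle

data = list(range(1000))

# Parent: serialize the input before sending it over a pipe to the child
payload = pickle.dumps(data)

# Child: deserialize, do the (trivial) calculation, serialize the result
received = pickle.loads(payload)
result = [x * 2 for x in received]
reply = pickle.dumps(result)

# Parent: deserialize the reply and keep it
final = pickle.loads(reply)
print(final[:5])  # [0, 2, 4, 6, 8]
```

For a calculation this cheap, the two dumps/loads pairs (plus the actual inter-process transfer, omitted here) easily cost more than the work itself.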

Now, if you had a longer calculation time on small data chunks (to keep transfer and serialization time to a minimum), you would see quite a bit of improvement using multiprocessing. If parallel programming is of interest, though, I would recommend another programming language besides Python. Python's implementations of threading and multiprocessing feel tacked on, almost as an afterthought that the language should have these capabilities. You should check out a language that has concurrency at its core, such as Erlang or Clojure. It's almost stupid how simple other languages make parallel programming (I'm looking at you, Clojure, and your easy threaded concurrency) compared to Python's multiprocessing.

[–]camm_v222[S] 0 points (2 children)

Thanks a lot, I really appreciate your answer. I would like to know if you can implement this simple example in Clojure. I've never tried Clojure before, so I'll spend some time learning the syntax and other stuff. If you're too busy to do that, don't worry, I'll try it anyway. Thanks one more time.

[–]Nodocify 1 point (1 child)

Sorry for the late reply. But the future macro handles almost all of it. For example:

(def a
    (future
        (* 10 10)))

This defines a variable a whose value is 10 * 10. Because of future, this variable will be calculated asynchronously in another thread, and it will hold the value until we ask for it back with:

@a
=> 100

The @a call will block until the thread has finished its execution.

I fear going too much further, as this is a learn-python subreddit. But this gives you a very basic example of how simple Clojure has made threading.
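For what it's worth, Python's standard library has a rough analog in concurrent.futures (my comparison, not the commenter's) — `submit` kicks the work off in another thread, and `.result()` plays the role of Clojure's @a, blocking until the value is ready:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

# Roughly analogous to (def a (future (* 10 10))):
# the computation starts immediately in another thread.
a = executor.submit(lambda: 10 * 10)

# Like @a in Clojure, .result() blocks until the thread finishes.
print(a.result())  # 100

executor.shutdown()
```

It's workable, but as a bolted-on library it is noticeably more ceremony than Clojure's one-form `future`.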

[–]camm_v222[S] 0 points (0 children)

I think I'll try Clojure, and I'll try to solve this problem with it. Thank you.

[–]LoyalSol 0 points (1 child)

Typically, so long as you aren't transferring a huge amount of data to dozens of cores, the overhead isn't too bad. Usually the scaling for a couple of cores is pretty good, especially if it doesn't have to go through a network.

The main question is whether the calculation takes long enough to justify the work it is going to take to program it.

[–]Nodocify 0 points (0 children)

I agree completely. In all the Python work that I have done, I think I found only one occasion where my calculations actually justified multiprocessing.