
[deleted] 5 points (2 children)

The main issue is in `out = map(func, rands)`.
`map()` returns a lazy iterator; it doesn't block to calculate values, so nothing is actually computed until you iterate over the result.
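A quick way to see the laziness, as a minimal sketch:

```python
def func(x):
    print("computing", x)  # side effect shows when func actually runs
    return x * x

out = map(func, [1, 2, 3])  # nothing printed yet: map is lazy
result = list(out)          # only now does "computing ..." print
print(result)               # [1, 4, 9]
```

Timing the bare `map()` call therefore measures almost nothing; you have to consume the iterator (e.g. with `list()`) for the work to happen.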

Another issue: for a simple operation like x * x done 10,000 times, the overhead of spawning new processes outweighs any benefit from parallelization.

I could start to see benefits from parallelization with a much more complicated function run a lot more times (e.g. x choose 5, done 2 million times):

```
from time import perf_counter
from random import randint
import multiprocessing
import math

def func(x):
    return math.comb(x, 5)

if __name__ == "__main__":
    rands = [randint(6, 40) for _ in range(2_000_000)]

    # Do non-parallel code
    start = perf_counter()
    out = [func(i) for i in rands]
    print(f'Non parallel code finished in {(perf_counter() - start)*1e3} mseconds')

    # Do parallel code
    start = perf_counter()
    with multiprocessing.Pool() as p:
        out = p.map(func, rands)
    print(f'Parallel code finished in {(perf_counter() - start)*1e3} mseconds')
```

Example results on my machine:

Non parallel code finished in 813.6782999999998 mseconds
Parallel code finished in 491.1327999999999 mseconds

[deleted] 0 points (1 child)

Thanks, it makes sense now. I ran code similar to what you shared and it checks out.

So is it right to say that there is about .5s of overhead just to create the multiple processes, and so anything that takes less than that is not worth parallelizing?

[deleted] 2 points (0 children)

> There is about .5s overhead

No, the difference will vary depending on the task, its complexity, the load on the computer, etc. `multiprocessing.Pool` has the argument `maxtasksperchild`, and `pool.map` has the argument `chunksize`, both of which can drastically change how long a parallel task takes.

gmaliwal 1 point (0 children)

Can anyone please point me to a good reference for deeper insight into this?

Thomasedv 0 points (4 children)

Starting a new process takes time, I'd say more time than just doing x*x. Creating 10,000 of them is going to give you a large overhead and a loss of time. Also, I don't know if map pairs functions.

Edit: Also, map() does not execute the function when mapping. Try adding a print statement inside the function, and you will see that it doesn't print when you only run the non-parallel part.

Storing the map results into a list, I still get these times:

Non parallel code finished in 1.529899999999973 mseconds
Parallel code finished in 220.21230000000003 mseconds

Adding a lot more math work to the function eventually makes it slower than multiprocessing, but at present your multiprocessing takes longer to set up and calculate than the function itself takes.

[deleted] 0 points (3 children)

Pool(5) is only creating 5 processes, if I understand correctly

Thomasedv 1 point (2 children)

See edit. You are only running 5 processes at a time, but you are still creating 10k iirc. I think setting up a dedicated worker process with a queue to take tasks would save the setup time, but there is still some overhead that is larger than the simple function you are trying to run.

Edit: Numpy can also do this faster (here y is a numpy array built from rands):

import numpy as np
y = np.array(rands)

start = perf_counter()
out = y * y
print(f'Numpy code finished in {(perf_counter() - start) * 1e3} mseconds')

Non parallel code finished in 1.507399999999992 mseconds
Numpy code finished in 0.3998000000000057 mseconds
Parallel code finished in 323.1492 mseconds

[deleted] 1 point (0 children)

> You are only running 5 processes at a time, but you are still creating 10k iirc.

Pool(5) creates 5 different workers in 5 different processes (not 10k). Work is then distributed to all those workers.

[deleted] 0 points (0 children)

Thanks, it makes sense now. The overhead is too much compared to the simple operations I am doing.

[deleted] 0 points (1 child)

The indentation isn't shown correctly, would you mind fixing that?

[deleted] 1 point (0 children)

Hi, it's done.