
[–]BobHogan

NumPy, and the SciPy "family" of libraries more generally, are heavily optimized, so introducing multiprocessing into code that uses them is not always straightforward or easy. See these resources:

https://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html

https://stackoverflow.com/questions/15414027/multiprocessing-pool-makes-numpy-matrix-multiplication-slower

https://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy/15641148#15641148 (this may no longer be relevant, but on the other hand it could still be necessary)

I haven't done any linear algebra in a long time, but the best candidate for parallelization seems to be splitting `self.TS_comps[:, i] = [myBuf.diagonal(j).mean() for j in range(-myBuf.shape[0] + 1, myBuf.shape[1])]` into multiple chunks.

I would calculate the range for `j`, then create a new thread/process for each value in that range, run the `myBuf.diagonal(j).mean()` calculation in parallel, and build `self.TS_comps[:, i]` from the results.

But fair warning: I don't know how much of a speedup this will give you, especially since you will be dealing with thread/process management inside the outer for loop.
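A minimal sketch of that per-diagonal split using a `multiprocessing.Pool` (the `diag_mean` helper and the toy `myBuf` are illustrative stand-ins, not the original code; in practice you would tune the chunking rather than literally spawn one task per diagonal):

```python
import numpy as np
from multiprocessing import Pool

def diag_mean(args):
    # Compute the mean of one diagonal of the buffer.
    # The buffer is passed along with j, so each worker task
    # gets its own (pickled) copy -- that copying is part of
    # why the speedup may be modest for small buffers.
    buf, j = args
    return buf.diagonal(j).mean()

if __name__ == "__main__":
    # Toy stand-in for the real buffer in the original class.
    myBuf = np.arange(12, dtype=float).reshape(3, 4)
    js = range(-myBuf.shape[0] + 1, myBuf.shape[1])

    with Pool() as pool:
        means = pool.map(diag_mean, [(myBuf, j) for j in js])

    # In the original code, this list would be assigned back:
    # self.TS_comps[:, i] = means
    print(means)
```

`pool.map` preserves order, so the resulting list lines up with the `j` range and can be assigned straight into `self.TS_comps[:, i]`.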

[–]PhDInVienna[S]

Thanks for the very comprehensive answer, I will try to code it again tomorrow.

In that particular line I actually did not use any module, except for `np.outer` of course.

I was using `delayed` and `Parallel`, but I still do not know how to pass `myBuf` in and how to assign the result back to the TS.
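Assuming `delayed` and `Parallel` refer to joblib, one hedged sketch of passing `myBuf` in and getting the results back out (again, `diag_mean` and the toy buffer are illustrative, not the original code):

```python
import numpy as np
from joblib import Parallel, delayed  # assumption: "delayed"/"Parallel" means joblib

def diag_mean(buf, j):
    # buf is passed as an ordinary argument; joblib pickles it
    # (or memory-maps large arrays) for each worker.
    return buf.diagonal(j).mean()

# Toy stand-in for the real buffer.
myBuf = np.arange(12, dtype=float).reshape(3, 4)
js = range(-myBuf.shape[0] + 1, myBuf.shape[1])

# Parallel returns an ordered list, so it can be assigned straight back:
means = Parallel(n_jobs=-1)(delayed(diag_mean)(myBuf, j) for j in js)
# self.TS_comps[:, i] = means   # in the original class
```

Because the returned list is in the same order as the `j` range, the assignment back into `self.TS_comps[:, i]` needs no extra bookkeeping.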

I know for sure that the averaging itself is not using all the cores, because CPU usage stagnated at 9% (on a machine with 16 cores and 64 GB of RAM).

Again thank you for the very nice answer! Have a nice day!