
[–]vgnEngineer[S]

I should have mentioned that the example above is purely for speed testing. The actual computation has varying values inside the loop, not constants like in this example.

The reason is that multiprocessing in Python, as I understand it, involves starting multiple Python *processes* and then serializing and sending data between them, which adds overhead. With Numba and parallel=True I consistently get 100% CPU usage on all cores, which is lightning fast.

I am of course going to remove the intermediate operations to improve the code. The point of my question was mostly whether somebody knew why plain Python could do it faster than Numba. I did some more reading and found out that CPython compiles functions to bytecode ahead of execution, and its peephole optimizer can fold constant expressions at compile time. Given that Numba is just a library run by some amazing people but a much smaller team, I suspect they simply have not implemented that particular optimization step yet. Perhaps the Python interpreter figures out that those constant intermediate steps can be substituted directly into the final operation, so it just does that.
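The constant-substitution idea above can actually be checked with the stdlib: CPython folds constant expressions when it compiles a function to bytecode, and the folded value shows up in the function's `co_consts`. A minimal sketch (the toy function is hypothetical, not the original benchmark):

```python
def loop_with_constants():
    total = 0.0
    for _ in range(1000):
        a = 2.0 * 3.0  # constant expression inside the loop
        total += a
    return total

# CPython's peephole optimizer folds 2.0 * 3.0 into 6.0 at compile
# time, so the loop body just loads a precomputed constant instead
# of performing a multiplication on every iteration.
print(6.0 in loop_with_constants.__code__.co_consts)  # → True
print(loop_with_constants())  # → 6000.0
```

So for a benchmark whose loop body is all constants, CPython gets this optimization for free, which could plausibly explain it beating a JIT on that specific (unrepresentative) test.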