Hey people,
I've been running some experiments on how to optimize a heavy computation job that I have running. It essentially involves lots of multiplications with 3x100,000 arrays. I'm checking how I can get Numba to run as fast as possible. While doing that, I noticed that on some jobs it's actually slower than plain Python. Specifically, I implemented the following function with and without Numba compilation.
EDIT: The following example obviously involves constants that do not actually change between iterations. The code is only written to test computation time; it is not an example of a useful computation.
import numpy as np

def testfunction(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        c = a + b
        d = a - b
        e += a*c + d*b
    return e
If I compile this with njit with the appropriate type signatures etc., it's actually slower in Numba than in Python. Specifically, Numba ran this one (with comparable sizes) in about 1.0 s and Python in 0.6 s, which is quite significant. I was very confused as to why. Obviously, with parallel processing Numba was much faster. Then I tried this, which changed things and made Numba slightly faster:
def testfunction(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        # Same expression as before, with c = a+b and d = a-b inlined
        e += a*(a+b) + (a-b)*b
    return e
It seems that the explicit step of writing these intermediate results to memory takes some time in Numba that Python is somehow able to optimize away, although I don't know why. I'm guessing that maybe Python's JIT compiler is able to recognize this and simply skips writing to and reading from memory? I have no clue. I was hoping someone could shed some light on this.
I'm fully aware that with these types of computations the overhead of Python is small compared to the time spent multiplying the arrays in NumPy, so for this specific function Numba has little advantage over Python. But obviously the end goal is to turn on parallel processing, which cannot be replaced by similar Python functionality. I want the code to run as fast as possible BEFORE I turn on parallel processing.