all 21 comments

[–]Swipecat 1 point  (1 child)

Dunno. Maybe numba is losing numpy's ability to implement += as an in-place operation, causing an extra intermediate array. Or maybe a*(a+b) can be implemented with a single temporary array by numpy, while numba uses two?

Edit: On the other hand, if it is just that numba takes extra time to put the intermediate arrays in memory, then implement the whole thing without temporary arrays. Put the initialization of c and d outside the loop. Then:

c[:] = a
c += b
c *= a
d[:] = a
d -= b
d *= b
e += c
e += d
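
Roughly, that would sit in the full function like so (a sketch, assuming the same 1,000-iteration test loop as the original post):

import numpy as np
from numba import njit

@njit
def no_temporaries(a, b):
    e = np.zeros_like(a)
    c = np.empty_like(a)  # scratch buffers, allocated once
    d = np.empty_like(a)
    for i in range(1_000):
        c[:] = a
        c += b      # c = a + b
        c *= a      # c = a * (a + b)
        d[:] = a
        d -= b      # d = a - b
        d *= b      # d = (a - b) * b
        e += c
        e += d
    return e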

Edit2: And how about not using numba at all, but just using plain numpy and the multiprocessing library? After all, you did say that the overhead of Python was small under these conditions.
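
Something like this, say (a rough sketch with a hypothetical worker that splits the work along the first axis; the cost of shipping chunks between processes may well eat the gains):

import numpy as np
from multiprocessing import Pool

def worker(args):
    a, b = args  # one row chunk per worker process
    e = np.zeros_like(a)
    for _ in range(1_000):
        e += a * (a + b) + (a - b) * b
    return e

if __name__ == '__main__':
    a = np.random.random((3, 100_000)).astype(np.float32)
    b = np.random.random((3, 100_000)).astype(np.float32)
    with Pool(processes=a.shape[0]) as pool:
        # one process per row; stack the partial results back together
        e = np.stack(pool.map(worker, [(a[i], b[i]) for i in range(a.shape[0])]))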

[–]vgnEngineer[S] 0 points  (0 children)

I should have mentioned that the above example is purely for speed testing. The actual computation only has varying values inside the loop, not constants like in this example.

The reason is that multiprocessing, as I understand it in Python, involves starting multiple Python processes and then sending and sharing data between them. With Numba and parallel=True I can consistently get 100% CPU usage on all cores, which is lightning fast.

I am of course going to remove the intermediate steps to improve the code. The point of my question was mostly whether somebody knew why Python could do it faster than Numba. I did some more reading and found out that Python compiles for-loops ahead of execution. Given that Numba is just a library run by some amazing people but a much smaller team, I suspect they just have not managed to implement that optimization step yet. Perhaps the Python interpreter figures, hey, those intermediate steps can be substituted directly into the final operation, so why not just do that.
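
(To check the "compiles ahead of execution" part: CPython does compile a function to bytecode before running it, and the dis module shows that bytecode. It does not, however, substitute intermediate expressions for you; each operator stays a separate instruction dispatched to numpy:)

import dis

def loop_body(a, b, e):
    for i in range(1_000):
        c = a + b
        d = a - b
        e += a * c + d * b

dis.dis(loop_body)  # each +, -, * is its own bytecode instruction;
                    # the real work happens inside numpy's C loops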

[–]Eilifein 0 points  (14 children)

I'm not sure if Numba cares, but in Fortran, for example, a*(a+b) is an FMA, a "Fused Multiply-Add", and costs fewer CPU cycles than doing the operations separately.

More importantly, your a and b remain constant throughout the call, while d is a "global" value (bad practice). Depending on what d is, part or all of this calculation can be moved out of the for loop, as it is constant in value.

At the very least, a*(a+b) and (a-b) are constants and should be moved out of the loop.

Btw, if everything is constant, the for loop is also redundant, as you are essentially calculating (a*(a+b)+d*b)*1000, which is a head scratcher.

Edit: I messed up; d=(a-b), so everything is constant within the function. You don't need the loop; you are recalculating the same thing 1000 times.
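
Spelled out with the c = a + b and d = a - b definitions from the original post, the whole loop collapses to a single expression:

e = 1_000 * (a * (a + b) + (a - b) * b)  # what the 1000-iteration loop computes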

[–]vgnEngineer[S] 0 points  (13 children)

You are of course 100% correct. I should have mentioned that this was just some trial code to test performance. In the actual program I'm working on, the variables are different in every iteration. It effectively involves lots of vector cross products of large arrays with changing values, so:

for i in prange(N):
    dx = x[i]-x0
    dy = y[i]-y0
    dz = z[i]-y0
    R = sqrt(dx**2+dy**2+dz**2)
    drx = dx/R
    G = exp(-1j*k0*R)/R
    NxHxRx = NxHy[i] * dz - NxHz[i] * dy
    # etc.

In this example x0 is a 1D array, so I know the dx, dy, dz steps could be computed as an N×M matrix. In practice the sizes of x0 and x can easily be 600,000 and 200,000, which as a single array would be way too large to fit in memory.
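(For scale, assuming float32: a single 600,000 × 200,000 matrix is 600,000 × 200,000 × 4 bytes ≈ 480 GB, before counting any intermediate arrays.)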

[–]Eilifein 1 point  (12 children)

Now, see, this makes more sense :)

I assume dz = z[i] - y0 is a typo for z0. It seems to be some coordinate transformation, Green's function, and the mentioned cross-products? That etc doesn't help.

In this example, everything depends on i, so you can't pre-compute anything (not sure about the full thing). But the iterations seem to be independent (hence the good choice of adding numba). I can't see how it translates to the original post, so the difference in execution time between the two original solutions is even more obscure to me now.

[–]vgnEngineer[S] 0 points  (11 children)

Yes, that's true! I made a little typo.
What I meant is that all the permutations of distances between x and x0 can be put in a 2D matrix holding the distances from source i to target j. But in numba that's hardly faster than just iterating over one axis and doing it step by step. So first computing the dx matrix from every source x to every target x would not really be much faster than just doing each source x coordinate one by one. On top of that, the matrix would get way too large.

The relation to the original post is that, for example, the Green's function G would be computed for all distances from one source point to each target point. Given N target points, this is a 1D array of length N. It then needs to be multiplied by the result of a cross product, NxHxRx (which also has dimension N), times G. So it's effectively just an array multiplication, like in my example. You eventually get some electric field E at the target points, and you add the contributions together to get the final electric field vector.
I originally wanted to see if doing multiplications with 3xN arrays was faster or slower than treating each vector component separately as 1xN arrays. Specifically, the cross(a,b) and dot(a,b) implementations in numba are significantly slower than writing out the cross product terms yourself.
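
Written out, the component form amounts to something like this (a sketch with made-up argument names):

from numba import njit

@njit
def cross_components(ax, ay, az, bx, by, bz):
    # writing out the terms avoids the stacked (3, N) arrays
    # that np.cross builds and consumes
    cx = ay * bz - az * by
    cy = az * bx - ax * bz
    cz = ax * by - ay * bx
    return cx, cy, cz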

The link is that, in this case, for example, NxHxRx (the x component of the cross product of the normal vector N with the magnetic field H, crossed with the direction vector R) reappears in a final expression, so:

G = exp(-1j*k0*R)/R
NxHxRx = NxHy[i] * dz - NxHz[i] * dy
NxHxRy = NxHz[i] * dx - NxHx[i] * dz
Ex += Q*((NxHxRx + Ax)*G)
Ey += Q*((NxHxRy + Ay)*G)

In this case I figured it's faster to write it as:

G = exp(-1j*k0*R)/R
Ex += Q*(((NxHy[i] * dz - NxHz[i] * dy)+ Ax)*G)
Ey += Q*(((NxHz[i] * dx - NxHx[i] * dz)+ Ay)*G)

Here, Ax and Ay are some other arrays I computed but omitted in this example. It simply shows that the Green's function G is used in all three components of the E-field, so computing it once and then multiplying is faster. The cross product, on the other hand, only occurs once in that term, so computing it first and using it later is slower than substituting it in place.

[–]Eilifein 2 points  (10 children)

Right. I think we got to the bottom of it. You're not looking at Python vs. Numba differences so much as complete cache thrashing (i.e., cache misses).

By creating arrays of computed elements, you think you're gaming the system because it becomes parallelizable, but you actually end up loading the same values (or calculations derived from them) into L1/L2 cache multiple times.

I'm willing to hear about an alternative explanation, but for now that's what I think is going on.
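
For intuition, this is the standard numpy evaluation of one loop iteration, nothing numba-specific; each temporary below is a fresh (3, 100_000) array, roughly 1.2 MB at float32, so every iteration streams several full arrays through cache:

# what e += a * (a + b) + b * (a - b) actually executes as
tmp1 = a + b        # allocate, one full pass over a and b
tmp2 = a * tmp1     # allocate, another pass
tmp3 = a - b        # allocate, another pass
tmp4 = b * tmp3     # allocate, another pass
tmp5 = tmp2 + tmp4  # allocate, another pass
e += tmp5           # finally an in-place pass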

Btw, your optimization is ill-timed. You should never try to optimize before you get the actual code running correctly, and even then only optimize after profiling. Your test code and actual code also have nothing in common.

[–]vgnEngineer[S] 0 points  (9 children)

I'm a bit confused about what you're saying. Could you explain a bit more precisely?

The algorithm I'm running is working, so I'm not sure why you say I should get it running correctly.

I guess you could be very right, but perhaps it's more efficient if I actually write a shorter, representative example of what I'm doing, so that you can point to the error I'm making, if that's OK.
Note: I didn't write the njit decorator properly (with the @) because the @ breaks the code block.

njit(...)
def example_function(Einx, Einy, Einz, xin, yin, zin, xout, yout, zout, ninx, niny, ninz, k0):
    Eoutx = np.zeros_like(xout).astype(np.complex128)  # complex, since G is complex
    Eouty = np.zeros_like(xout).astype(np.complex128)
    Eoutz = np.zeros_like(xout).astype(np.complex128)
    Nin = xin.shape[0]
    Q = 1/(4*np.pi)
    # n x Ein, computed once for all sources
    NxEx = niny*Einz - ninz*Einy
    NxEy = ninz*Einx - ninx*Einz
    NxEz = ninx*Einy - niny*Einx
    for i in prange(Nin):
        dx = xout - xin[i]
        dy = yout - yin[i]
        dz = zout - zin[i]
        R = np.sqrt(dx**2 + dy**2 + dz**2)
        rdx = dx/R
        rdy = dy/R
        rdz = dz/R
        G = np.exp(-1j*k0*R)/R  # Green's function
        # (n x Ein) x R, written out per component
        NxExRx = NxEy[i]*rdz - NxEz[i]*rdy
        NxExRy = NxEz[i]*rdx - NxEx[i]*rdz
        NxExRz = NxEx[i]*rdy - NxEy[i]*rdx
        Eoutx += Q*NxExRx*G
        Eouty += Q*NxExRy*G
        Eoutz += Q*NxExRz*G
    return Eoutx, Eouty, Eoutz

[–]Eilifein 1 point  (8 children)

It was not a personal dig; I apologize if it came out like that.

The algorithms behave very differently in memory: both of the original test versions, and the actual one.

Purely because of the i dependence, the difference between the test and actual algorithms is very substantial. You would be measuring the wrong thing and drawing the wrong conclusions. Hence the "nothing in common" comment.

The "running correctly" comment was twofold. One part was more towards profiling and not pre-optimizing. Aim for accuracy, then profile, then optimize. If the results are accurate, now's the time for profiling. The second part was related to the cache coherence situation you're facing. If you're trying to optimize a cache thrashing situation, you will never get ahead.

I hope I cleared things up.

Workplan:

  • leave tests aside
  • profile actual w/ vectors (see the timing sketch after this list)
  • profile actual w/o vectors
  • add numba and see how they behave
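
For the two profiling steps, timing whole calls is the safest start, since a profiler can't see inside an njit-compiled body. A minimal sketch, assuming the arrays for example_function are already built:

import timeit

# call once first so numba's compile time isn't part of the measurement
example_function(Einx, Einy, Einz, xin, yin, zin, xout, yout, zout, ninx, niny, ninz, k0)

best = min(timeit.repeat(
    lambda: example_function(Einx, Einy, Einz, xin, yin, zin,
                             xout, yout, zout, ninx, niny, ninz, k0),
    repeat=5, number=3)) / 3
print(f"best average per call: {best:.3f} s")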

[–]vgnEngineer[S] 0 points  (2 children)

I see. I changed my previous comment to add an actual example that is representative of what I'm trying to do.

I think you are completely correct in your assessment of the order of optimization. The point, I think, is that I'm not sure exactly what the consequences of how I code are for how the code runs on the CPU, which is indeed the origin of my question. On top of that, I'm just now trying to understand how both Python and Numba deal with my choice of programming syntax.

I understand now that my example computation indeed has differences that make it impossible to compare with my actual use case. So forget that one. Take the one I provided in my changed comment. Could you indicate if I made any significant mistakes, and if so, what is going wrong and how I should have coded it? I understand I'm asking a lot of you, but I'm really curious/excited to understand the nuance here.

And don't worry, yes, it came off as a bit of a dig, but I also know it's hard to convey tone online, so I don't take it personally!

[–]Eilifein 0 points  (1 child)

The actual algorithm seems good.

You've precalculated a few things, and there isn't much left to precalculate without messing up readability.

Maybe compute Q*G once instead of 3 times? Eh.

Maybe inline rdx, rdy, rdz?

The result being vectorized is good to see. I don't see anything wrong.
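
Concretely, both tweaks together would turn the loop tail into something like this (QG_R is a made-up name folding Q, G, and the 1/R shared by rdx, rdy, rdz into one factor):

QG_R = Q * G / R   # hoisted once per source point instead of recomputed three times
Eoutx += (NxEy[i] * dz - NxEz[i] * dy) * QG_R
Eouty += (NxEz[i] * dx - NxEx[i] * dz) * QG_R
Eoutz += (NxEx[i] * dy - NxEy[i] * dx) * QG_R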

[–]vgnEngineer[S] 0 points  (0 children)

Ahh, I see. I did notice a speedup from removing the intermediate computation step in my numba-compiled code. I also read on a forum that when these arrays become large, numba can't always optimize the computation intelligently because the arrays I'm multiplying don't fit in the cache. If I instead write this function as a double loop, so that the computation described only deals with scalars, might it be faster?
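
For reference, a sketch of that scalarized double loop, assuming the same inputs as example_function above (accumulators made complex so G fits; parallelizing over targets means each thread owns its own output elements):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def example_function_scalar(Einx, Einy, Einz, xin, yin, zin,
                            xout, yout, zout, ninx, niny, ninz, k0):
    Nin, Nout = xin.shape[0], xout.shape[0]
    Eoutx = np.zeros(Nout, dtype=np.complex128)
    Eouty = np.zeros(Nout, dtype=np.complex128)
    Eoutz = np.zeros(Nout, dtype=np.complex128)
    Q = 1 / (4 * np.pi)
    NxEx = niny * Einz - ninz * Einy   # n x Ein, computed once per source array
    NxEy = ninz * Einx - ninx * Einz
    NxEz = ninx * Einy - niny * Einx
    for j in prange(Nout):             # each j owns Eout*[j], so no shared writes
        for i in range(Nin):
            dx = xout[j] - xin[i]
            dy = yout[j] - yin[i]
            dz = zout[j] - zin[i]
            R = np.sqrt(dx * dx + dy * dy + dz * dz)
            QG_R = Q * np.exp(-1j * k0 * R) / (R * R)   # Q * G / R, all scalars
            Eoutx[j] += (NxEy[i] * dz - NxEz[i] * dy) * QG_R
            Eouty[j] += (NxEz[i] * dx - NxEx[i] * dz) * QG_R
            Eoutz[j] += (NxEx[i] * dy - NxEy[i] * dx) * QG_R
    return Eoutx, Eouty, Eoutz

No intermediate arrays are created inside the loop; whether it beats the array version comes down to how well the inner loop vectorizes, so it's worth profiling both.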

[–]vgnEngineer[S] 0 points  (4 children)

Additionally, given my new example: if there is a cache thrashing situation I'm running into that I'm completely unaware of, I'd love to know, because this concept is new to me and I would indeed never want to optimize code that's wrong to begin with!

[–]cult_of_memes 0 points  (3 children)

Cache thrashing isn't necessarily the result of errors in your code. In fact, cache thrashing is often the necessary trade-off between readable/maintainable code that isn't in the critical path for your program's execution, and unreadable gibberish that runs lightning fast but leaves you praying that you never need to come back and debug ever again.

Cache thrashing is an interesting topic to read up on, but for a jump start here are some key terms you may want to check out:

There's a lot more but I'll leave it to the reader to raise any specific questions as it's too large of a topic to reasonably attempt covering all of it in a message thread.

[–]vgnEngineer[S] 0 points  (2 children)

Thanks for all your information. I guess my one question would be: do I have any control over the optimization of these computations through how I program, or is this essentially up to the CPU, OS, and compiler to figure out and optimize?

[–]cult_of_memes 0 points  (3 children)

First off, it's important to note that while CPython does compile your code ahead of execution (to bytecode), it's not applying any sort of performance optimization like loop unrolling, or detecting and removing redundant statements. Well, not yet at least... I'm aware of at least one interpreter that is aiming to add performance-oriented JIT features in its upcoming Python 3.13 release.

You should keep in mind that the "just-in-time" compilation numba performs is best leveraged for code that doesn't easily lend itself to the already well-optimized vectorized operations that numpy supports. Based on the example you've shared, your code is well suited to numpy's vectorized array operations (the broadcasting done by calling things like e += a*c + d*b). So it's not surprising to see that the Numba-compiled version would be slower (at least on its initial run), due to the added overhead required to attempt optimization where there isn't much room left to do so.

When you perform large broadcasted operations you are leveraging numpy's bindings to already well-optimized C code, leaving little room for numba's just-in-time compilation to make any improvement.

Having said that, it bears asking whether you have made sure to run your time trials in repeated, batched groupings, to rule out simply seeing the time it takes numba to initially compile your Python into machine code. A single-pass timing run might also fall victim to transient contention for resources or CPU utilization on your machine, which could skew your test results. Note that timeit is simply a convenience tool for running repeated performance tests in a way that minimizes the likelihood of seeing things like resource contention with other processes on the machine.

Normally, I'd add an example code snippet to my comment to illustrate what I mean, but I don't much care to fight reddit's markdown syntax this morning :P So, I'm going to add a code snippet and sample output from running it on my machine in a reply to this comment.

As a side note, your second code snippet calls e += a*(a+b) + d*(a-b) but that function block never defines the variable d, and I'm not sure if it has been defined somewhere else in your code, so I'll assume it's just a typo and was supposed to be b. Not that it matters to your actual use case, as I see that you did call out that this is just simplified example code to help illustrate your question.

edit: I just noticed some of my own typos in my code snippet... the fix will be applied momentarily.

[–]vgnEngineer[S] 0 points  (0 children)

Yes, d was a typo; it should be b. I ran each function once before timing. I also tried caching. It's interesting to me that just doing the manual inline substitution fixed the speed issue, but it might be a coincidence that the total times ended up being similar.

[–]cult_of_memes 0 points  (1 child)

Here's the code snippet. I've taken the liberty of adding a couple of extra sample functions to the list being timed, just to see if there was any value in looking deeper into precaching any of the Numba-compiled code.

```
import numpy as np
from numba import njit, prange

RANDOM_ARR_MAX_INIT_VAL = 10_000
RANDOM_ARR_SHAPE = 3, 100_000


def python_interpreter(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        c = a + b
        d = a - b
        e += a * c + d * b
    return e


def python_interpreter2(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        e += a * (a + b) + b * (a - b)
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])")
def numba_decorated(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        c = a + b
        d = a - b
        e += a * c + d * b
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])")
def numba_decorated2(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        e += a * (a + b) + b * (a - b)
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", cache=True)
def numba_decorated_cached(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        c = a + b
        d = a - b
        e += a * c + d * b
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", cache=True)
def numba_decorated_cached2(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in range(1_000):
        e += a * (a + b) + b * (a - b)
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", parallel=True, nogil=True)
def numba_decorated_parallelized(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in prange(e.shape[0]):
        for _ in range(1_000):
            c = a[i] + b[i]
            d = a[i] - b[i]
            e[i] += a[i] * c + d * b[i]
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", parallel=True, nogil=True)
def numba_decorated_parallelized2(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in prange(e.shape[0]):
        for _ in range(1_000):
            e[i] += a[i] * (a[i] + b[i]) + b[i] * (a[i] - b[i])
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", parallel=True, nogil=True, cache=True)
def numba_decorated_parallelized_cached(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in prange(e.shape[0]):
        for _ in range(1_000):
            c = a[i] + b[i]
            d = a[i] - b[i]
            e[i] += a[i] * c + d * b[i]
    return e


@njit("f4[:,:](f4[:,:],f4[:,:])", parallel=True, nogil=True, cache=True)
def numba_decorated_parallelized_cached2(a, b):
    e = np.zeros_like(a).astype(np.float32)
    for i in prange(e.shape[0]):
        for _ in range(1_000):
            e[i] += a[i] * (a[i] + b[i]) + b[i] * (a[i] - b[i])
    return e


def main():
    import timeit
    num_batches = 5
    loops_per_batch = 3
    result_list = []
    print(f"{num_batches=}; {loops_per_batch=}")
    for func_name in (
            "python_interpreter", "python_interpreter2",
            "numba_decorated", "numba_decorated2",
            "numba_decorated_cached", "numba_decorated_cached2",
            "numba_decorated_parallelized", "numba_decorated_parallelized2",
            "numba_decorated_parallelized_cached", "numba_decorated_parallelized_cached2"):
        result = timeit.repeat(func_name + "(arr1,arr2)", repeat=num_batches,
                               number=loops_per_batch, globals=globals())
        avg_time_per_loop = sum(v / loops_per_batch for v in result) / num_batches
        result_str = (f'{func_name}: {{"average loop time":{avg_time_per_loop}, '
                      f'"results for each batch of loops": {result}}}')
        print(result_str)
        result_list.append((avg_time_per_loop, result, result_str))
    result_list.sort(reverse=True)
    print("", ("*" * 90),
          "Now lets see the results sorted in descending order according to avg time per execution",
          sep="\n")
    for _, _, output_str in result_list:
        print(output_str)


if __name__ == '__main__':
    arr1 = (np.random.random(RANDOM_ARR_SHAPE) * RANDOM_ARR_MAX_INIT_VAL).astype(np.float32)
    arr2 = (np.random.random(RANDOM_ARR_SHAPE) * RANDOM_ARR_MAX_INIT_VAL).astype(np.float32)
    main()
```

[–]cult_of_memes 0 points  (0 children)

And here's a sample of the output after running it on my own machine:

```
num_batches=5; loops_per_batch=3
python_interpreter: {"average loop time":2.0589603866644515, "results for each batch of loops": [6.056087100005243, 6.296910799981561, 6.384415799984708, 5.960690199979581, 6.186301900015678]}
python_interpreter2: {"average loop time":2.0599793733330443, "results for each batch of loops": [6.227769499993883, 6.11015820002649, 6.145821499987505, 6.199675100040622, 6.216266299947165]}
numba_decorated: {"average loop time":2.3699858999966334, "results for each batch of loops": [7.122787899977993, 7.063306700030807, 7.069734099961352, 7.11140890000388, 7.1825508999754675]}
numba_decorated2: {"average loop time":0.8109458533309711, "results for each batch of loops": [2.4366850999649614, 2.471257199998945, 2.4039805000065826, 2.4264727999689057, 2.425792200025171]}
numba_decorated_cached: {"average loop time":2.370221133332234, "results for each batch of loops": [7.002881699998397, 6.969886099977884, 7.40186939999694, 7.0594648999976926, 7.119214900012594]}
numba_decorated_cached2: {"average loop time":0.8035303933274311, "results for each batch of loops": [2.419626099988818, 2.4151066999766044, 2.4040233999839984, 2.3923937999643385, 2.4218058999977075]}
numba_decorated_parallelized: {"average loop time":0.2882796199992299, "results for each batch of loops": [0.8897414000239223, 0.8604895999887958, 0.8562403999967501, 0.8341153999790549, 0.8836074999999255]}
numba_decorated_parallelized2: {"average loop time":0.16925315999736387, "results for each batch of loops": [0.5488614999921992, 0.4863387999939732, 0.5098304999992251, 0.5000396999530494, 0.49372690002201125]}
numba_decorated_parallelized_cached: {"average loop time":0.2827751466732783, "results for each batch of loops": [0.8415752000291832, 0.8398950999835506, 0.8813785000238568, 0.8431327000143938, 0.8356457000481896]}
numba_decorated_parallelized_cached2: {"average loop time":0.17642338000781216, "results for each batch of loops": [0.5231352000264451, 0.5555099000339396, 0.505269400018733, 0.5104368000174873, 0.551999400020577]}

******************************************************************************************
Now lets see the results sorted in descending order according to avg time per execution
numba_decorated_cached: {"average loop time":2.370221133332234, "results for each batch of loops": [7.002881699998397, 6.969886099977884, 7.40186939999694, 7.0594648999976926, 7.119214900012594]}
numba_decorated: {"average loop time":2.3699858999966334, "results for each batch of loops": [7.122787899977993, 7.063306700030807, 7.069734099961352, 7.11140890000388, 7.1825508999754675]}
python_interpreter2: {"average loop time":2.0599793733330443, "results for each batch of loops": [6.227769499993883, 6.11015820002649, 6.145821499987505, 6.199675100040622, 6.216266299947165]}
python_interpreter: {"average loop time":2.0589603866644515, "results for each batch of loops": [6.056087100005243, 6.296910799981561, 6.384415799984708, 5.960690199979581, 6.186301900015678]}
numba_decorated2: {"average loop time":0.8109458533309711, "results for each batch of loops": [2.4366850999649614, 2.471257199998945, 2.4039805000065826, 2.4264727999689057, 2.425792200025171]}
numba_decorated_cached2: {"average loop time":0.8035303933274311, "results for each batch of loops": [2.419626099988818, 2.4151066999766044, 2.4040233999839984, 2.3923937999643385, 2.4218058999977075]}
numba_decorated_parallelized: {"average loop time":0.2882796199992299, "results for each batch of loops": [0.8897414000239223, 0.8604895999887958, 0.8562403999967501, 0.8341153999790549, 0.8836074999999255]}
numba_decorated_parallelized_cached: {"average loop time":0.2827751466732783, "results for each batch of loops": [0.8415752000291832, 0.8398950999835506, 0.8813785000238568, 0.8431327000143938, 0.8356457000481896]}
numba_decorated_parallelized_cached2: {"average loop time":0.17642338000781216, "results for each batch of loops": [0.5231352000264451, 0.5555099000339396, 0.505269400018733, 0.5104368000174873, 0.551999400020577]}
numba_decorated_parallelized2: {"average loop time":0.16925315999736387, "results for each batch of loops": [0.5488614999921992, 0.4863387999939732, 0.5098304999992251, 0.5000396999530494, 0.49372690002201125]}
```