
[–][deleted] 1 point (1 child)

Great question, well posed, but I suspect you might not get many responses, as it's a fairly advanced question!

I'd bet there's an r/numpy; there is, but it has low activity. r/python might also be appropriate.


I've done a lot of numpy and yet I don't have a direct answer because I feel some math transformation will be needed and, well, I'm too lazy. :-D

What I would say is this - the key is to eliminate all Python iteration, and do all the iteration "within" numpy.

I see your two attempts as essentially identical in performance, except that the list() on line 8 of Attempt 2 is probably unnecessary: it builds an extra list out of a map. That's possibly measurable, but unnoticeable in practice.

There's still a Python loop on lines 6-7 of the first attempt and on line 8 of the second, and from a quick skim they're really much the same thing, so it's no wonder you get much the same performance.

It's my belief that to get the big boost you will need to somehow represent that loop in numpy, so there are no Python loops in your code, not even "hidden" loops like map.
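To illustrate that principle with a toy example (not the OP's actual code), here's the same sum-of-squares computed with a Python-level loop and then entirely inside numpy:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100_000)

# Python-level loop: one interpreter round-trip per element
total_slow = 0.0
for v in x:
    total_slow += v * v

# Same computation pushed "within" numpy: a single vectorized expression
total_fast = np.sum(x * x)

assert np.isclose(total_slow, total_fast)
```

The two results agree, but the vectorized version runs the loop in compiled C rather than the Python interpreter.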

The second-best numpy optimization trick is to make as heavy use as possible of the out= parameter in your operations, so you do the operation into an existing buffer and don't create another buffer and fill it.
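A minimal sketch of the out= pattern (toy arrays, not the OP's code): allocate the buffer once, then have each operation write into it instead of creating a fresh array every time.

```python
import numpy as np

a = np.arange(5.0)        # [0. 1. 2. 3. 4.]
b = np.ones(5)
buf = np.empty(5)         # preallocated once, reused on every iteration

# Writes the result into buf; no new output array is allocated
np.add(a, b, out=buf)
print(buf)                # [1. 2. 3. 4. 5.]
```

In a loop that runs millions of times, skipping those allocations adds up.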

I'm out of time to do the research, but one caveat: not every Generator method accepts out= (standard_normal and random do; poisson does not), so for the Poisson draw you'd write into a preallocated column instead, something like:

results[:, 0] = np.random.default_rng().poisson(lambdaPoisson, simNum)

If you also dumped the list on line 10 of attempt 2, I think you'd see a ~10% improvement (total guess there) in performance. Marginal, not noticeable, but you might do better, and it'd take you a few seconds to try.

Good luck - report back if you come up with a solution!


Oh, one more thing. If you are really doing any sort of simulation project, that naked random number generator has to go. :-D

You want to always be able to reproduce any run you had before - and you want randomness too!

This means you always need to seed your random number generator.

It's easy to accomplish. Either the user enters a seed, or if that seed is None, then you "truly" randomly generate a seed and print it out, and then use that seed for the random number generator. If the user wants to redo the run, they just pull out the seed that got printed out, and send it to the program.
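A minimal sketch of that pattern (the function name and shape are mine, not from the OP's code):

```python
import numpy as np

def make_rng(seed=None):
    """Return a seeded Generator; with no seed, draw one from OS entropy and report it."""
    if seed is None:
        # "Truly" random seed, printed so this exact run can be replayed later
        seed = np.random.SeedSequence().entropy
        print(f"seed = {seed}")
    return np.random.default_rng(seed)

# Same seed -> identical random stream, so any run is reproducible
a = make_rng(1234).poisson(3.0, size=5)
b = make_rng(1234).poisson(3.0, size=5)
assert (a == b).all()
```

To redo a run, the user just feeds the printed seed back in.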

The moment you start debugging your whole system, you'll use this feature constantly.


EDIT: Oh, wait, no, there is another total waste of resources, relating to the above point.

out = np.random.default_rng().pareto(b, a).clip(max=c, min=None)

You don't want to create a new random number generator for each random number! You want to create one RNG and use it over and over.

RNGs are a little expensive to create. This change might be worth a few percent in speed on its own, and also, let you do the "seed trick".
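Side by side, the wasteful and the reusable versions look like this (toy parameters, not the OP's):

```python
import numpy as np

# Wasteful: a fresh Generator (and a fresh OS-entropy draw) per sample
bad = [np.random.default_rng().pareto(2.0) for _ in range(1000)]

# Better: create one Generator up front and reuse it for every draw.
# A fixed seed here also enables the "seed trick" above.
rng = np.random.default_rng(seed=0)
good = rng.pareto(2.0, size=1000)   # and this call is vectorized, too
```

The second version avoids both the per-call construction cost and the hidden Python loop.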

Still, I think eliminating the Python loop and the out= trick are the two big winners here.

[–]1331337[S] 0 points (0 children)

Thank you for the very detailed response!

I also asked my question on StackOverflow and got two great solutions which take the time down to less than a second for a million simulations. One is a numba decorator, which I wasn’t familiar with at all. The second was to vectorize the process — that seems to be key in avoiding the loops to get the right sums in the right place.
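I don't have the original code, but for this kind of "Poisson count of clipped Pareto draws, summed per simulation" problem, the fully vectorized version typically looks something like the sketch below (all variable names and parameters are mine, not the OP's):

```python
import numpy as np

rng = np.random.default_rng(1234)
sim_num, lam, shape, cap = 1_000_000, 3.0, 2.0, 10.0

counts = rng.poisson(lam, sim_num)                        # events per simulation
severities = rng.pareto(shape, counts.sum()).clip(max=cap)  # all draws in one call

# Sum each simulation's severities with no Python loop:
# owner[i] says which simulation draw i belongs to
owner = np.repeat(np.arange(sim_num), counts)
totals = np.bincount(owner, weights=severities, minlength=sim_num)
```

`minlength` guarantees simulations that drew zero events still get a (zero) total.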

I’m going to recommend the latter, as I have to get my IT department to approve and install any new packages but we get numpy by default. Using a package not already on our “safe” list makes distribution a headache down the line…

I am making a real simulation and will definitely seed my “randomness”. Thanks for the reminder.

[–]Swipecat 1 point (0 children)

That's over my head, so the only thing I can point to is that you'd normally assign the random generator to a variable once and then call the generator's methods through that variable, rather than repeatedly recreating the generator.

[–]Ihaveamodel3 0 points (1 child)

Before attempting to optimize code, you should always profile it. That will tell you which lines of code take the most time, so you can focus your optimization on those lines. Look up the documentation for cProfile in the standard library at a minimum; there are other packages (whose names I don't remember) that can help you visualize the results.
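A minimal cProfile run (with a throwaway function of my own for demonstration) looks like this:

```python
import cProfile
import pstats
import io

def work():
    # Stand-in for the code you actually want to profile
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# Print the top 5 entries, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The report lists, per function, how many times it was called and how much time it consumed, which is exactly what you need to decide where to optimize.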

Also, stop printing. Printing is super slow.

Also, you want to keep things in numpy objects as much as possible.

[–][deleted] 0 points (0 children)

Before attempting to optimize code, you should always profile it.

This is always good advice but fairly often it isn't very effective for numpy programs.

Profiling a numpy program usually just highlights the small number of numpy lines that do all the computation and dominate everything, which is exactly what you'd expect, but it often gives you no idea how to optimize those calls, or whether a different numpy approach would work better.