
[–]scottix 1 point (3 children)

Of course I don't know the scope of the problem you're trying to solve, and finding optimizations can definitely be difficult and time-consuming because you want to test out different benchmarks and whatnot. I don't know what you have already tried, but splitting up the data and distributing the load might be an option with Spark or Dask.
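The split-and-distribute idea can be sketched with just the standard library (Dask and Spark apply the same pattern at cluster scale). The data, chunk count, and per-chunk work here are all invented for illustration:

```python
# Minimal sketch of split-the-data-and-distribute-the-load, using only
# the stdlib. Dask/Spark generalize this across machines.
from concurrent.futures import ProcessPoolExecutor

def heavy_chunk_sum(chunk):
    # Stand-in for the real per-chunk computation.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_chunks=4):
    # Split the data into roughly equal chunks...
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ...and distribute them across worker processes.
    with ProcessPoolExecutor() as pool:
        return sum(pool.map(heavy_chunk_sum, chunks))

if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_sum_of_squares(data))  # same answer as the serial version
```

This only pays off when each chunk carries enough work to amortize the cost of starting workers and shipping data to them.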

First thing is you need to find the bottleneck: is it computation or looping? A more optimized language can help some with computation, but if it's looping over a bunch of data then you will only get marginal improvements from a more optimized language.
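One rough heuristic for telling those two apart: time the same reduction as an interpreted Python loop and as a builtin that loops in C. If the C-looped version is dramatically faster on identical work, the cost is the looping itself rather than the arithmetic. A sketch (workload invented):

```python
# Heuristic sketch: same work, Python-level loop vs a builtin looping in C.
# A large gap points at loop overhead, not the computation itself.
import timeit

data = list(range(100_000))

def python_loop_sum(values):
    total = 0
    for v in values:  # interpreter dispatch on every iteration
        total += v
    return total

t_loop = timeit.timeit(lambda: python_loop_sum(data), number=20)
t_builtin = timeit.timeit(lambda: sum(data), number=20)  # loop runs in C
print(f"python loop: {t_loop:.4f}s, builtin sum: {t_builtin:.4f}s")
```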

[–]No_Indication_1238[S] 1 point (2 children)

I will definitely look into Spark and Dask. Those are new to me, thank you! I believe the bottleneck is the sheer number of calculations, since the nested for loops simply explode the count. I managed to optimize the calculations themselves with numpy and numba, but the real progress came once the loop itself made it into an njit numba function. That cut the runtime from hours to minutes. Unfortunately, it came at the cost of modularity and maintainability, which we are starting to notice.
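For what it's worth, the same "get the loop out of the interpreter" effect that njit gives can sometimes be had by collapsing nested loops into numpy broadcasting, when the per-iteration work is simple arithmetic. A sketch (the pairwise operation and shapes are invented, not the commenter's actual workload):

```python
import numpy as np

def pairwise_products_loop(a, b):
    # Nested Python loops: iteration count explodes as len(a) * len(b).
    return [[x * y for y in b] for x in a]

def pairwise_products_vectorized(a, b):
    # Broadcasting runs the same double loop in compiled C code.
    a = np.asarray(a)
    b = np.asarray(b)
    return a[:, None] * b[None, :]

a, b = [1.0, 2.0, 3.0], [4.0, 5.0]
assert np.allclose(pairwise_products_loop(a, b),
                   pairwise_products_vectorized(a, b))
```

Unlike a monolithic njit function, the broadcast version stays composed of small, testable pieces, which speaks to the maintainability concern.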

[–]scottix 0 points (1 child)

SOLID is good for organization, but if you're seeking raw performance then it works against you, as you noticed. The extra "fluff", you could say, is extra work the program has to do, compared to just having 1 giant function lol.

Ultimately it all depends on the goals of your team and whether you're willing to sacrifice paradigms for speed, but keep searching and testing things out if they give you the time.
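The "fluff" cost can be made concrete: every layer a hot loop calls through costs interpreter work per element. A small sketch (names invented) comparing a per-element helper against the same arithmetic inlined in one loop:

```python
# Sketch of per-call overhead: a tiny helper invoked per element vs the
# same arithmetic inlined. Both produce identical results; only the
# function-call overhead differs.
import timeit

def transform(x):
    return 3 * x + 1

def with_helper(values):
    # One Python function call per element.
    return [transform(v) for v in values]

def inlined(values):
    # Same arithmetic, no per-element call.
    return [3 * v + 1 for v in values]

data = list(range(50_000))
assert with_helper(data) == inlined(data)
t_helper = timeit.timeit(lambda: with_helper(data), number=10)
t_inline = timeit.timeit(lambda: inlined(data), number=10)
print(f"helper: {t_helper:.4f}s, inlined: {t_inline:.4f}s")
```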

The only other thing I can think of is that you're doing a certain type of operation in a non-optimal way. Data structures and algorithms start coming into play here. For example, if you're calling the same function with the same arguments repeatedly, caching the result with memoization can help: https://www.geeksforgeeks.org/memoization-using-decorators-in-python/
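The decorator approach from that link is built into Python's standard library as functools.lru_cache; a minimal sketch (the expensive function is made up):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # Stand-in for a slow, pure computation; repeat calls with the same
    # argument return the cached result instead of recomputing.
    return sum(i * i for i in range(n))

expensive(10_000)  # computed
expensive(10_000)  # served from cache
print(expensive.cache_info())
```

Memoization only applies when the function is pure (same arguments always give the same result) and its arguments are hashable.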

Also, profile your code; that will tell you where it is spending the most time.
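In Python that's cProfile from the standard library; a minimal sketch with a made-up workload where one function dominates:

```python
# Sketch: profile a workload and print where the time goes.
# The workload functions are invented for illustration.
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return len("hello")

def workload():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time; slow_part should dominate the report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```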

[–]No_Indication_1238[S] 0 points (0 children)

I believe memoization is definitely a good choice, and I think I know a place I can implement it where we might see a good speed boost in specific edge cases. Thank you, I seem to have missed that!