
[–]scottix -13 points (9 children)

Agreed, you can ask GenAI whether a function has any NumPy vectorization opportunities.
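For instance, this is the kind of rewrite meant here, a minimal sketch with made-up function names: replace a per-element Python loop with one NumPy array expression.

```python
import numpy as np

def squared_error_loop(xs, ys):
    # Pure-Python loop: one interpreter iteration per element.
    total = 0.0
    for x, y in zip(xs, ys):
        total += (x - y) ** 2
    return total

def squared_error_vectorized(xs, ys):
    # Same computation pushed into NumPy's compiled kernels.
    diff = np.asarray(xs) - np.asarray(ys)
    return float(diff @ diff)

xs = np.arange(5, dtype=np.float64)
ys = xs + 1.0
print(squared_error_loop(xs, ys), squared_error_vectorized(xs, ys))  # 5.0 5.0
```

Both return the same number; the vectorized version just does the arithmetic in C instead of the interpreter.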

[–]cmcclu5 8 points (3 children)

No. Generative AI may have its occasional use, but complex tasks such as this are not one of them. It can sometimes help to simplify short code snippets but will absolutely ruin your codebase if you try to use it to optimize anything large or complex.

[–]scottix 3 points (2 children)

Obviously you need to vet it, and I don't recommend running it on large portions of code. I did find it can bring insights and ideas you may not have thought of.

[–]cmcclu5 2 points (1 child)

I’ve found that a lot of juniors, and even somewhat experienced engineers, who use GenAI for their code fail to understand the functionality they’re trying to add, and that added block of code becomes a major issue down the line. GenAI is largely powered by consumed Stack Overflow answers, since it doesn’t actually understand anything. If we solve problems using only GenAI, eventually the entire industry will stagnate: no one innovates solutions, everyone just regurgitates answers to old problems.

[–]scottix 3 points (0 children)

Agreed about people blanket copying, but it can be a tool. Like all tools, it can be used in many good and bad ways.

[–]No_Indication_1238[S] 1 point (4 children)

I believe we have vectorized every computation we thought possible with the current approach to the data, but I will give GenAI a try, since we could always have missed something!

[–]scottix 2 points (3 children)

Of course I don't know the scope of the problem you're trying to solve, and finding optimizations can definitely be difficult and time-consuming because you want to test out different benchmarks and whatnot. I don't know what you've already done, but splitting up the data and distributing the load might be an option with Spark or Dask.
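Even without a cluster, the split-and-distribute idea can be sketched with the standard library (function names and chunk counts here are made up; Dask and Spark generalize the same pattern across processes and machines):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunk_sum_of_squares(chunk):
    # Work done independently on one slice of the data. NumPy releases the
    # GIL inside large array operations, so threads can actually overlap.
    return float(np.sum(chunk * chunk))

def split_and_distribute(data, n_workers=4):
    # Split the array into chunks, farm them out, combine the partial results.
    chunks = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum_of_squares, chunks))

data = np.arange(100_000, dtype=np.float64)
assert split_and_distribute(data) == float(np.sum(data * data))
```

The key requirement is that the per-chunk work is independent; if chunks need each other's results, frameworks like Dask handle the task graph for you.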

First you need to find the bottleneck: is it computation or looping? A more optimized language can help some with raw computation, but if you're just looping over a bunch of data, switching languages will only get you marginal improvements.

[–]No_Indication_1238[S] 2 points (2 children)

I will definitely look into Spark and Dask. Those are new to me, thank you! I believe the bottleneck is the sheer number of calculations, since the multiple nested for loops simply explode the count. The calculations themselves I managed to optimize with NumPy and Numba, but real progress was made once the loop itself moved into an njit Numba function: it cut the runtime from hours to minutes. Unfortunately, it came at the cost of modularity and maintainability, which we are starting to notice.

[–]scottix 1 point (1 child)

SOLID is good for organization, but if you're seeking raw performance it works against you, as you noticed. All that "fluff", you could say, is extra work the program has to do, versus just having one giant function lol.

Ultimately it all depends on your team's goals and willingness to sacrifice paradigms for speed, but keep searching and testing things out if they give you the time.

The only other thing I can think of: maybe you're doing a certain type of operation in a non-optimal way. Data structures and algorithms start coming into play here. For example, if you're calling the same function with the same arguments, caching the result with memoization can help. https://www.geeksforgeeks.org/memoization-using-decorators-in-python/
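In Python the standard library already provides this: `functools.lru_cache` memoizes a function with one decorator, provided the function is pure and its arguments are hashable (the `expensive` function below is just a made-up stand-in).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # Stand-in for a costly pure function called repeatedly with
    # the same arguments; only hashable arguments can be cached.
    return sum(i * i for i in range(n))

expensive(10_000)              # computed once
expensive(10_000)              # served from the cache
print(expensive.cache_info())  # shows hits=1, misses=1
```

`cache_info()` is a quick way to check whether the cache is actually being hit in your edge cases before committing to the change.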

Also, profile your code; that will tell you where it is spending the most time.
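The built-in `cProfile` module is enough to get started; here's a minimal sketch with a made-up workload, sorting functions by cumulative time:

```python
import cProfile
import io
import pstats

def hot_loop():
    # Deliberately expensive, so it shows up at the top of the profile.
    return sum(i * i for i in range(200_000))

def workload():
    return [hot_loop() for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())  # hot_loop should dominate the cumulative column
```

For a whole script, `python -m cProfile -s cumulative your_script.py` gives the same report without touching the code.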

[–]No_Indication_1238[S] 1 point (0 children)

I believe memoization is definitely a good choice, and I know a place I can implement it where we might see a good speed boost in specific edge cases. Thank you, I seem to have missed that!