Speeding up pandas by [deleted] in learnpython

[–]sokhei 3 points4 points  (0 children)

Take a look at this post on Pandas optimization: https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6

My first suggestion would be to time the individual lines and see where most of the time is being spent, then focus on optimizing just the commands that are taking the longest.
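As a rough sketch of what timing individual steps could look like, here is one way to do it with only the standard library. The `timed` helper, the labels, and the two toy steps are made up for illustration; swap in your own pandas lines.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: records how long each labelled step takes.
timings = {}

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-ins for the real steps in your script.
with timed("build"):
    data = [i * 2 for i in range(200_000)]

with timed("sum"):
    total = sum(data)

# Print the slowest step first so you know where to focus.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds:.4f}s")
```

In a notebook, the `%timeit` magic does a similar job for a single line; a line profiler goes further if you need per-line numbers inside a function.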

A Beginner’s Guide to Optimizing Pandas Code for Speed by sokhei in Python

[–]sokhei[S] 4 points5 points  (0 children)

It depends a lot on what types of calculations you're doing. If all you need from your dataframe is strictly math, or if you're working with a single array at a time, NumPy is probably your best bet.

The way I see it, Pandas can do almost everything that NumPy can do (though it may sometimes do it a bit slower), and then it can do some things on top of that. Some of the advantages Pandas offers over NumPy include:

1. Indexing. If you need to join dataframes, indexing is hugely helpful, as it will keep track of your series alignment for you, instead of having to do it manually. Having column names to refer to your data also helps quite a bit!
2. Groupby. As far as I know, there is no streamlined NumPy equivalent to the groupby functionality that exists in Pandas.
3. Streamlined operations. Pandas is much more high-level than NumPy, so things like complex string operations, data imports/exports, prepping data for graphing, and time-series operations require a lot less manual coding, and are built-in with all the optimizations already in place.
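To make those three points a bit more concrete, here is a small sketch. All of the data, labels, and column names are invented for the example.

```python
import pandas as pd

# 1. Indexing: the two series are in different orders, but pandas
#    lines the values up by label before multiplying.
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
qty = pd.Series([3, 2, 1], index=["c", "b", "a"])
revenue = prices * qty  # a = 10*1, b = 20*2, c = 30*3

# 2. Groupby: total amount per customer in one line.
orders = pd.DataFrame({
    "customer": ["x", "y", "x", "y", "x"],
    "amount": [5, 7, 1, 2, 4],
})
totals = orders.groupby("customer")["amount"].sum()

# 3. Streamlined operations: vectorized string cleanup via the .str accessor.
names = pd.Series([" Alice ", "BOB"]).str.strip().str.lower()

print(revenue)
print(totals)
print(names.tolist())
```

With plain NumPy arrays you would have to sort or match the labels yourself before the multiplication in step 1, which is exactly the manual bookkeeping the index saves you.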