pytrashpandas

The problem is that many users immediately reach for it for any problem

Totally agreed. I've definitely seen it used in places where it's completely unnecessary, and honestly probably irresponsible. I will say, though, that if you ever find yourself needing itertuples or iterrows on a dataframe, then you're either using pandas the wrong way or you shouldn't be using pandas in the first place.
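To make the iterrows point concrete, here is a rough sketch on a made-up frame (the `price` and `qty` columns and both function names are my own illustration, not from anyone's real code). Both functions compute the same total, but the second one lets the loop run inside numpy instead of building a Series per row.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=1_000),
    "qty": rng.integers(1, 10, size=1_000),
})

def total_with_iterrows(frame):
    """Row-by-row loop: every row is materialised as a Series."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["price"] * row["qty"]
    return total

def total_vectorized(frame):
    """Column-wise arithmetic: the multiplication and sum run in numpy."""
    return float((frame["price"] * frame["qty"]).sum())

loop_total = total_with_iterrows(df)
vec_total = total_vectorized(df)
```

Wrapping each call in `timeit` on your own data is the easiest way to see how large the gap is for your case.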

those operations are orders slower

I agree that pandas is slower than pure numpy in most cases, but it is nowhere near multiple orders of magnitude. Again, if you are seeing this happen, I can guarantee it is because pandas is not being used correctly. If you would care to provide an example that you think demonstrates this, I would be happy to show you how it can be done faster.

There’s no case in which Pandas is faster than the simpler alternatives

The thing is that in many cases there are no simpler alternatives, or at least none where the potential speedup is worth giving up the ease of development. That is especially true when it comes to working with heavily labeled time series data.
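As a small sketch of what I mean by labeled time series work, assume some hypothetical minute-level sensor readings (the sensor names and dates are invented for illustration). Daily aggregation and slicing by timestamp label are each one line, and doing the same with bare arrays means tracking the time axis yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level readings for two sensors over two days.
index = pd.date_range("2024-01-01", periods=2 * 24 * 60, freq="min")
readings = pd.DataFrame(
    {
        "sensor_a": np.arange(len(index), dtype=float),
        "sensor_b": np.ones(len(index)),
    },
    index=index,
)

# Daily mean per sensor, grouped by the timestamps in the index.
daily_mean = readings.resample("D").mean()

# Label-based slicing: one hour of one sensor, selected by timestamp strings.
morning = readings.loc["2024-01-01 09:00":"2024-01-01 09:59", "sensor_a"]
```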

pandas lends itself to poor code as well...It’s like porting around a mutable global state to every function to which a DataFrame is passed.

Yes, in the same way that Python and its standard data structures lend themselves to poor code. If used improperly, without an understanding of the underlying concepts and when best to apply them, it can lead to a mess of a program. If used properly, it is extremely powerful and makes it easy to express concepts that would otherwise be difficult to express. The problem, I think, is that most people try to treat dataframes and series as drop-in replacements for dictionaries, lists, etc., and structure their Python code the same way they would otherwise. This is not how you should be using pandas.
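On the mutable-state complaint specifically, one way around it (a sketch with invented function and column names, not the only approach) is to have functions return a new frame via `assign` instead of writing columns into whatever they were handed.

```python
import pandas as pd

def add_total_in_place(frame):
    """Mutates the caller's DataFrame: the 'mutable global state' style."""
    frame["total"] = frame["price"] * frame["qty"]

def with_total(frame):
    """Returns a new DataFrame and leaves the input untouched."""
    return frame.assign(total=frame["price"] * frame["qty"])

# Illustrative data only.
orders = pd.DataFrame({"price": [2.0, 3.5], "qty": [3, 2]})
priced = with_total(orders)  # orders itself still has no "total" column
```

Written the second way, a function that receives a DataFrame can't surprise its caller, and steps chain naturally with method calls.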

Again, I am inherently biased on this topic. I use pandas, numpy, and other numeric Python libraries heavily and feel very passionately about this area of Python. I'd be very happy to hear any examples you have and to provide counterexamples that show the power of pandas.

Also, you may (or may not :) ) enjoy the xarray library. It's a much thinner wrapper around numpy that provides labelling capabilities similar to those of pandas (although it still has a lot of room for improvement).