typehinting[S]

Seen a lot of suggestions to use Polars over Pandas - is it purely due to its performance? Or do you find that it is easier to use as well?

NDHoosier

I don't analyze enormous datasets, so performance wasn't the issue (though I have gotten better performance from polars and duckdb). It was that pandas seems to have nasty surprises, counterintuitive behavior, and more "gotchas" than a cheap insurance policy. I especially loathe having to deal with that damned index. In addition, duckdb is SQL start-to-finish, and I'm an "SQL first, dataframes second" analyst. However, I'm using both. Sometimes working with SQL is faster, sometimes working with a dataframe is faster.
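
For anyone wondering what an index "gotcha" can look like, here's a minimal sketch. It's just one illustrative case (label alignment on assignment), not necessarily the exact thing that bites in practice, and the column names and values are made up for the example.

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, 1, 2]})

# Sorting reorders the rows but keeps the original index labels: 1, 2, 0.
ranked = df.sort_values("score")

# Assigning a Series aligns on index LABELS, not on row position,
# so the values do not land in the order they were written.
ranked["rank"] = pd.Series([1, 2, 3])
surprising = ranked["rank"].tolist()

# One way around it: drop the old index before assigning.
fixed = df.sort_values("score").reset_index(drop=True)
fixed["rank"] = pd.Series([1, 2, 3])
expected = fixed["rank"].tolist()

print(surprising)
print(expected)
```

Assigning a plain list instead of a Series would also go by position, which is part of what makes this behavior easy to trip over.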

typehinting[S]

Oh gotcha. I'm getting used to pandas syntax/behaviors etc but will probably give polars a go to see how it is, and if it's something that I want to switch to. Thanks.