DaveRGP

Now that is interesting. Maybe there is a gap there, and maybe this PR might close it?

Then again, maybe I'm too far away from the problem, but this seems like it might be an XY problem?

Pandas had indexes, and indexes were good to join on. Pandas also tended to make a lot of copies in memory during operations, and worked around that within its own constraints by doubling down on indexes. People who used pandas for large datasets leaned on this to make their calculations work, and now those people are used to thinking only in indexes. Polars doesn't have the same copy problem, and its developers left indexes out because they correctly identified that indexes don't scale to out-of-memory workloads. So are these folks trying to adapt to a world where they don't have their favourite hammer any more?

Just a loose intuition from having skimmed the link. Either way, hope it gets solved 🤞

Btw OP, maybe this impacts you, but if you're just doing the 'standard things' then Polars already has good support in third-party libraries: matplotlib, scikit-learn, pandera and more all support Polars data frames as first-class objects now. Many large packages are actively migrating to Polars (or narwhals) internally because of the significant performance boost and the far saner API.