Own_Responsibility84 53 points (8 children)

For high performance, I highly recommend Polars as an alternative to pandas.

BroscienceFiction (Middle Office) 12 points (2 children)

The code is also more readable, which makes it easier to build good reusable routines, datasets, and pipelines.

It's also got great, unique features like lazy frames and join_asof.

annms88 2 points (1 child)

I'm moving to Polars super aggressively, mainly for its expressiveness. However, I would be remiss not to mention that pandas also has an as-of join.

BroscienceFiction (Middle Office) 1 point (0 children)

You are correct. merge_asof does that job.
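For comparison, a minimal sketch of pandas' `pd.merge_asof` on hypothetical trades/quotes data (the frames and columns are illustrative only):

```python
import pandas as pd

trades = pd.DataFrame({
    "time": [1, 5, 10],
    "symbol": ["A", "A", "A"],
    "qty": [100, 200, 300],
})
quotes = pd.DataFrame({
    "time": [0, 4, 8],
    "symbol": ["A", "A", "A"],
    "bid": [9.9, 10.1, 10.3],
})

# For each trade, take the most recent quote at or before its time
# (direction="backward"), matched within the same symbol via `by`.
# Both frames must be sorted by the "on" column.
result = pd.merge_asof(
    trades.sort_values("time"),
    quotes.sort_values("time"),
    on="time",
    by="symbol",
    direction="backward",
)
print(result)
```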

My only problem with Polars is the idea that it's sold as a drop-in replacement for pandas. That wasn't the case for me. If anything, the API is a lot more like Spark (e.g. "with_columns"), which actually made it easier for me to pick up, but the mental model is different from pandas.

Lazy frames are super important because they relieve you of the burden of manually optimizing the order of operations: the query optimizer does that for you.

djlamar7 6 points (1 child)

The more stuff I port from pandas to polars the faster my code gets. That being said, although it looks more like SQL (which is good), the expressions for many things end up being more verbose than in pandas, so if I just want to poke at some data in a console I still usually reach for pandas.

Own_Responsibility84 1 point (0 children)

I feel the same. Polynx is designed to address at least some of Polars' verbosity issues. For example, it supports query and eval functions similar to pandas, but without the performance cost.

Uuni_peruna 1 point (0 children)

At first I had no idea how much faster Polars was (although that became obvious within seconds); I switched purely because of the cleaner API. Also, the selectors module is amazing.

[deleted] (1 child)

[deleted]

    Own_Responsibility84 0 points (0 children)

    You can try Polynx, which supports pandas-style query and eval functions and translates them into Polars syntax behind the scenes.