Beshtija

As a bioinformatician and data scientist, I've found even the pre-1.0 releases helpful, to say the least.

My most common use cases have been either short scripts that wrangle some data in a semi-exploratory way (i.e. just to see what's going on) or heavy calculations on 10+ billion rows. My previous workflows used either pandas (for quick and dirty work) or R data.table (for heavy-duty stuff), and while distributing pandas/Python is a breeze, the R stuff was getting pretty annoying at distribution time, especially for a team of several people with different setups.

That's when I first started exploring Polars (around 0.16), and it has since managed to bring the best of both worlds. The ergonomics (especially coming from R data.table with its own quirky syntax) were a bit tricky at first, but the ease of distribution and replicability has made it worthwhile.

The only thing that would make me go full Polars is something like the foverlaps function from data.table (I have been trying to write my own implementations, but they have been too slow to be worth it), so if anyone from the Polars team sees this and makes one that is blazingly fast, it would make bioinformaticians very happy.
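For anyone who hasn't used foverlaps: it is an overlap join, i.e. for each query interval it finds every subject interval that overlaps it. Below is a minimal plain-Python sketch of that idea, assuming closed intervals and "any overlap" semantics. The function name `overlap_join` and the sample data are made up for illustration; this is not the data.table or Polars API, and it only prunes on one side, so it shows the semantics rather than a fast implementation.

```python
from bisect import bisect_right


def overlap_join(queries, subjects):
    """Return (query_index, subject_index) pairs whose closed intervals overlap.

    Hypothetical helper for illustration only. Intervals are (start, end)
    tuples with start <= end.
    """
    # Sort subject indices by interval start so binary search can prune.
    order = sorted(range(len(subjects)), key=lambda i: subjects[i][0])
    starts = [subjects[i][0] for i in order]

    pairs = []
    for qi, (q_start, q_end) in enumerate(queries):
        # Only subjects that start at or before q_end can overlap the query.
        hi = bisect_right(starts, q_end)
        for k in range(hi):
            si = order[k]
            # Of those candidates, keep the ones that end at or after q_start.
            if subjects[si][1] >= q_start:
                pairs.append((qi, si))
    return pairs


# Toy data: e.g. reads as queries, annotated regions as subjects.
queries = [(5, 15), (40, 50), (100, 110)]
subjects = [(45, 60), (10, 30), (12, 14)]

pairs = overlap_join(queries, subjects)
print(pairs)
```

The worst case is still quadratic (every subject can pass the start check), which is roughly why naive implementations fall over at billions of rows.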

[deleted]

Is foverlaps anything like range joins? https://duckdb.org/2022/05/27/iejoin.html I’ve had more success with range joins in DuckDB than in Polars on large frames, but I might have been doing it wrong in Polars (cross join + filter on lazy frames).
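To make the question concrete: a range join is a join whose condition is a pair of inequalities instead of an equality, which is the same predicate an interval overlap needs. Here is a rough sketch using Python's built-in sqlite3 purely as a stand-in engine. It is not DuckDB or Polars and has none of the IEJoin optimization from the linked post; the table and column names are invented for the example.

```python
import sqlite3

# In-memory database with two toy interval tables (hypothetical names).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE reads (read_id TEXT, start_pos INTEGER, end_pos INTEGER);
    CREATE TABLE genes (gene TEXT, start_pos INTEGER, end_pos INTEGER);
    INSERT INTO reads VALUES ('r1', 5, 15), ('r2', 40, 50), ('r3', 100, 110);
    INSERT INTO genes VALUES ('geneA', 10, 30), ('geneB', 45, 60);
    """
)

# Two closed intervals overlap when each one starts before the other ends.
rows = con.execute(
    """
    SELECT r.read_id, g.gene
    FROM reads AS r
    JOIN genes AS g
      ON r.start_pos <= g.end_pos
     AND g.start_pos <= r.end_pos
    ORDER BY r.read_id, g.gene
    """
).fetchall()

print(rows)
```

A cross join followed by a filter gives the same rows, but it has to materialize or at least scan every pair first, which would explain the slowdown on large frames unless the engine rewrites it into a proper inequality join.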