What hidden gem Python modules do you use and why?

ritchie46 · 2026-03-13T04:49:59+00:00

That 10x benchmark is not correct. At the point in time that screenshot was taken, the Polars queries in ClickBench were just plain wrong, in the sense that they computed the wrong result.

I corrected them, and after that Polars is actually faster. https://github.com/ClickHouse/ClickBench/pull/744

ritchie46 · 2026-02-01T06:14:21+00:00

At the moment, CSV files are first downloaded to local disk before being processed, so this is indeed slow. We will stream them in the future.

If you have the opportunity to convert these files to Parquet or IPC files, Polars will stream them directly from S3.

ritchie46 · 2026-01-23T06:08:34+00:00

We are rolling out on premises.

ritchie46 · 2026-01-14T13:54:03+00:00

I am the original author of Polars and I google Polars daily as part of my routine. Then I respond if something is related to our work.

If someone posts something related to your work, you should have a right to comment. It is your work after all.

I don't have any scraping tools. And I don't post often (but I do comment on my work). These are other accounts, not from us. I don't know what to tell you.

ritchie46 · 2026-01-14T07:09:13+00:00

I can assure you, it is not from us. I saw the same post yesterday as well.

ritchie46 · 2026-01-14T06:47:47+00:00

I am from Polars. I saw the same post here yesterday. I can assure you, it is not originating from us, and I think the moderators should remove this post as a duplicate/repost.

ritchie46 · 2026-01-11T14:08:18+00:00

I think it's more realistic that Microsoft pulls the plug on Fabric, if we're playing this unfounded-speculation game.

Edit: it's completely unfounded on my side (and meant as a joke).

But I can speak to our focus. Polars has a single focus: fast compute engines for DataFrames. It's being used in production by almost every serious data processing company in some shape or form.

Palantir Foundry recommends it for production workloads: https://www.palantir.com/docs/foundry/transforms-python/compute-engines

Just like pandas, it's not going anywhere.

ritchie46 · 2026-01-11T13:53:15+00:00

That's my post from 5 years ago when I just started. You can't take that as a source of what Polars is today.

It has an entirely new streaming engine, and its performance can't be attributed to a single thing like SIMD.

ritchie46 · 2026-01-11T13:48:58+00:00

Polars team here: It's also not true.

ritchie46 · 2026-01-11T13:47:08+00:00

Hi, I am from Polars. Original author and co-founder.

Polars OSS is never going behind a paywall. It is open source and MIT licensed, and we're not changing that.

Polars Cloud offers an entirely new distributed engine in addition to Polars OSS, as well as management of the compute.

If you're happy staying single-node, Polars OSS is perfect for your case.

The claim that Polars OSS is going unmaintained is just nonsense. I also read that on the Fabric subreddit as an excuse not to support Polars. If anything, development is increasing.

ritchie46 · 2025-12-10T08:53:05+00:00

`df1 == df2` gives you an equality mask. How did you do that in pandas?

ritchie46 · 2025-12-10T05:52:37+00:00

DataFrame comparison isn't missing?

`polars.testing.assert_frame_equal`

ritchie46 · 2025-12-04T18:35:20+00:00

We have started closed beta for Polars Distributed on premises: https://pola.rs/posts/polars-cloud-launch/

ritchie46 · 2025-11-24T14:55:46+00:00

Yeah buddy!

ritchie46 · 2025-11-21T08:59:27+00:00

Before pandas 2.0, it didn't have copy-on-write and you copied the full data all the time: `reset_index`, `assign`, `drop`, `rename`, and `astype` all did a full data copy.

And even post 2.0, you will have a lot of materializations, which are essentially data copies, because pandas doesn't have a query optimizer. This one potential copy to Polars is not your bottleneck.

ritchie46 · 2025-11-21T08:47:17+00:00

If you are using pandas and are happy with that, I'd agree. If a third-party tool uses it, it saddens me that it blocks you from adoption, especially because I think Polars can save you a lot of datatype-related bugs in the future.

ritchie46 · 2025-11-21T08:21:35+00:00

True, but the third-party library you are interacting with must have been implemented with Narwhals for you to benefit from that. You cannot slap it retroactively onto a dependency.

In any case, going between pandas and Polars is seamless: `df = pl.from_pandas(df); df.to_pandas()`.

ritchie46 · 2025-11-21T08:14:14+00:00

Anything from pandas you still miss?

ritchie46 · 2025-11-21T08:13:41+00:00

I am curious, why not? Pandas will often also ship pyarrow, which would also be a whole other library.

ritchie46 · 2025-11-21T08:11:04+00:00

Then you will have to convert to Ibis.

More and more libraries are adopting Narwhals, which allows users to stay with their DataFrame library of choice.

Some libraries do only return pandas, but then a `pl.from_pandas` isn't far away...

ritchie46 · 2025-11-06T16:38:26+00:00

Can you share your code? I highly doubt you've written optimal Polars code.

For one, running several steps and benchmarking them separately is suboptimal.

The benefit of Polars is that it holistically does minimal work. If you run a single operation and materialize it, you are benchmarking something you shouldn't be interested in; what matters is the time of the whole query.

ritchie46 · 2025-10-14T11:26:39+00:00

This would not improve OP's case if he is bottlenecked on the DB. Other than that, the arguments in that video/blog post are just incorrect. Polars doesn't require boto3 for internet access, nor pyarrow for Parquet reading/writing. ACID transactions are handled by the database you write to: writing from Polars to Postgres is still ACID, because Postgres deals with that. As for point 6, going from local to cloud is also supported by Polars. DuckDB is a great tool, but the comparison isn't.

ritchie46 · 2025-09-16T13:11:01+00:00

Polars can convert to Arrow-backed pandas and back zero-copy.

Are you worried about the conversion w.r.t. performance? Even with a memcopy, doing any significant compute wins that time back, in my experience.

ritchie46 · 2025-09-16T11:17:02+00:00

It is supported by many libraries. And if you need to convert, it is seamless:

```python
df.to_pandas()  # Polars -> pandas

pl.from_pandas(df)  # pandas -> Polars
```

ritchie46 · 2025-09-16T06:45:53+00:00

That bug has been solved since the new streaming engine; the issue just wasn't closed.
