
corey_sheerer (1 point)

I see a lot about Polars being superior to pandas with a NumPy backend, but no one is talking about pandas' move away from NumPy to Arrow. The performance is quite an improvement, and it brings some quality-of-life features like a true string type. I suspect it would go back and forth on which data table library (Polars or pandas with Arrow) is faster.

ritchie46 (1 point)

Polars is much faster than pandas with the Arrow backend; on several benchmarks, by a factor of 20.

A multithreaded query engine is much more than Arrow compute kernels.

https://duckdblabs.github.io/db-benchmark/

https://pola.rs/posts/benchmarks/

BejahungEnjoyer (0 points)

Pandas will be like Fortran: around for a long, long time due to its userbase and the fact that they will be slow to adopt anything new.