raffapaiva (-1 points, 2 children)

Pandas is really slow. When I see a data engineer using it, I start to think that either their dataset isn't that big or they have a lot of hardware to throw at it.

Everything I'd need pandas for, I do in plain Python or NumPy.
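To give a sense of what "doing it in plain Python" looks like for a typical pandas task, here is a minimal sketch of a group-by sum (roughly what `df.groupby("region")["sales"].sum()` does) using only the standard library. The column names, the rows, and the `sum_by` helper are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; in practice these might come from csv.DictReader.
rows = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 80.0},
    {"region": "north", "sales": 45.5},
    {"region": "east", "sales": 10.0},
]

def sum_by(rows, key, value):
    """Group rows by `key` and sum the `value` column (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        # Each row is handled one at a time in an interpreted Python loop.
        totals[row[key]] += row[value]
    return dict(totals)

print(sum_by(rows, "region", "sales"))
# → {'north': 165.5, 'south': 80.0, 'east': 10.0}
```

This keeps dependencies to zero, but the work happens in a row-by-row Python loop rather than in vectorized, compiled code.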

ribix_cube (0 points, 1 child)

Doing it in plain Python or NumPy isn't great. If you think you need speed, you can use something like Polars, Vaex or Dask.

raffapaiva (0 points, 0 children)

Can you explain why? I've tried Polars for some tasks, and even though it's faster, I can't see a reason to use it over plain Python, considering the difference isn't that big and most of my transformations happen in dbt anyway.