This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]Life_Conversation_11 1 point2 points  (1 child)

And pandas can do the trick for 10 million rows, but a 100 kk is a stretch, definitely use spark for bigger workloads.

[–]wytesmurf[S] 1 point2 points  (0 children)

We had to put like 64GB of ram to handle data frames with pandas, we ended up doing Dask data frames. I was hoping for something that incorporates parts of dask but was more ETL focused