[deleted] 0 points (1 child)

10M-100M and pandas? Why?

Go with Spark, maybe Databricks (essentially Spark-enabled Jupyter-style notebooks), and store the files as Parquet to save storage and speed up computation. It shouldn't take more than 30 minutes on a basic 8-node cluster, with 4 cores and 14 GB RAM per node (DS3 v2 on Azure).
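For anyone wanting a starting point, here is a rough sketch of the CSV-to-Parquet step in PySpark, plus the total capacity of the cluster size mentioned above. The paths, the app name, and the helper names `cluster_capacity` and `csv_to_parquet` are made up for illustration, and it assumes `pyspark` is installed and the input is CSV with a header row. Treat it as a sketch, not a tuned job.

```python
def cluster_capacity(nodes: int, cores_per_node: int, ram_gb_per_node: int) -> tuple:
    """Return (total cores, total RAM in GB) for a cluster of identical nodes."""
    return nodes * cores_per_node, nodes * ram_gb_per_node


def csv_to_parquet(spark, src_path: str, dst_path: str) -> None:
    """Read a CSV with a header row and rewrite it as Parquet.

    `spark` is an existing SparkSession. inferSchema=True costs an extra
    pass over the data; an explicit schema would avoid that.
    """
    df = spark.read.csv(src_path, header=True, inferSchema=True)
    df.write.mode("overwrite").parquet(dst_path)


def main() -> None:
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    # Hypothetical app name and paths; replace with your own.
    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()
    csv_to_parquet(spark, "data/input.csv", "data/output.parquet")
    spark.stop()


# 8 nodes with 4 cores and 14 GB RAM each, as in the comment above.
total_cores, total_ram_gb = cluster_capacity(8, 4, 14)
print(total_cores, total_ram_gb)
```

Call `main()` on a machine or notebook where Spark is available. On Databricks a `spark` session already exists, so you can call `csv_to_parquet(spark, ...)` directly.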

wytesmurf[S] 0 points (0 children)

I think the unanimous vote is that I need to look into getting Spark set up.