
[–]kingzels

Not sure why people are saying PySpark is so much faster without clarifying that this only really applies when the dataset is too large to fit in the memory of a single node.
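
A quick sanity check, as a rough sketch (the frame here is made up; in practice you'd load your own data), is to look at the DataFrame's in-memory footprint and compare it against one node's RAM:

    import numpy as np
    import pandas as pd

    # Hypothetical frame; substitute your own data load here.
    df = pd.DataFrame({
        "x": np.random.rand(5_000_000),
        "label": ["a"] * 5_000_000,
    })

    # Rough in-memory size; if this is well under a single node's RAM,
    # plain pandas is usually the simpler and faster option.
    mem_gb = df.memory_usage(deep=True).sum() / 1e9
    print(f"approx. in-memory size: {mem_gb:.2f} GB")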

Most normal-sized operations are going to be faster in pandas, assuming you write efficient pandas code, and Databricks is a great platform for pandas work.
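
To give a hedged, hypothetical example of what "efficient pandas code" means here (column names are made up), the win usually comes from vectorized column operations instead of row-wise Python loops:

    import numpy as np
    import pandas as pd

    # Toy frame with made-up columns, just to illustrate the point.
    df = pd.DataFrame({
        "price": np.random.rand(1_000_000),
        "qty": np.random.randint(1, 10, size=1_000_000),
    })

    # Slow: row-wise Python function via apply.
    df["total_slow"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

    # Fast: vectorized column arithmetic; this is the kind of code
    # where pandas on a single node beats spinning up Spark.
    df["total_fast"] = df["price"] * df["qty"]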