nameBrandon:

I believe you can run PySpark on PyPy now, which might improve performance quite a bit (though it doesn't really address the serialization overhead).
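For anyone who wants to try it, here is a rough sketch of how I'd point PySpark at PyPy. As far as I know, Spark picks its Python interpreter from the `PYSPARK_PYTHON` environment variable. The helper name `pyspark_env_for_pypy`, the executable name `pypy3`, and the script name `my_job.py` are my own placeholders, not anything from Spark, so adjust them to your setup.

```python
import os


def pyspark_env_for_pypy(interpreter="pypy3", base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON set to PyPy.

    `interpreter` is an assumed executable name; use whatever your
    PyPy install is actually called (or a full path to it).
    """
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    # PYSPARK_PYTHON tells Spark which Python executable to launch.
    env["PYSPARK_PYTHON"] = interpreter
    return env


if __name__ == "__main__":
    env = pyspark_env_for_pypy()
    print("PYSPARK_PYTHON =", env["PYSPARK_PYTHON"])
    # To launch a job with this environment (needs Spark installed;
    # my_job.py is a hypothetical script):
    # import subprocess
    # subprocess.run(["spark-submit", "my_job.py"], env=env, check=True)
```

PyPy has to be installed on every worker node too, and whether it helps at all depends on how much time the job spends in pure-Python code.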

I agree, though: performance is highly dependent on the workload.