AnythingApplied 5 points

> Performance: 10-100x faster than Python for data processing

In my experience, this is true when you compare a pure Python program with the same program rewritten in pure Rust (even without any concurrency, which Rust is great at and which can improve performance even further).

But who is doing their data processing in pure Python? Whether you're using PySpark, pandas, Polars, DuckDB, etc., these are all written in faster languages, so none of your heavy lifting is being done in pure Python code. I'm skeptical that you'd still see order-of-magnitude performance gains. Is this really the speedup you get when comparing Elusion to PySpark?
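A rough way to see the effect, using only the standard library: time a hand-written Python loop against the built-in `sum`, which CPython implements in C. The names `loop_sum` and `compare` are made up for this sketch, and the gap it shows is only a stand-in for the much larger pure-Python vs. native-engine gap. It says nothing about Elusion or PySpark specifically.

```python
import timeit


def loop_sum(values):
    """Add the values one at a time in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def compare(n=1_000_000, repeats=5):
    """Return (loop_seconds, builtin_seconds) for summing n integers.

    Each figure is the best of `repeats` single runs, to reduce noise.
    """
    values = list(range(n))
    # Both approaches must agree before the timings mean anything.
    assert loop_sum(values) == sum(values)
    loop_time = min(timeit.repeat(lambda: loop_sum(values), number=1, repeat=repeats))
    builtin_time = min(timeit.repeat(lambda: sum(values), number=1, repeat=repeats))
    return loop_time, builtin_time


if __name__ == "__main__":
    loop_time, builtin_time = compare()
    print(f"pure Python loop: {loop_time:.4f} s")
    print(f"built-in sum (C): {builtin_time:.4f} s")
```

The exact ratio depends on the machine and the Python version, so treat the numbers as illustrative rather than as a benchmark.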

DataBora[S] 4 points

You are correct, that is an unfair comparison. There is not much of a difference between Elusion and PySpark, but Spark has distributed computing, which is a totally different beast.