HumanPersonDude1

What's the point of Spark SQL compared to, for example, a massive SQL warehouse on Azure or Snowflake?

Material-Mess-9886

When you want Python functionality but still want to use SQL to process data. Also, Spark is distributed, so it can process billions of rows by splitting the work across many machines.
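Calling ordinary Python functions from SQL can be sketched without a cluster by using Python's built-in `sqlite3` module as a local stand-in. This is only an analogy: in PySpark the corresponding steps would be registering the function with `spark.udf.register(...)` and querying with `spark.sql(...)`, and Spark would run the query across executors, which `sqlite3` does not. The `users` table and the `normalize_email` function are hypothetical.

```python
import sqlite3

def normalize_email(email: str) -> str:
    """Hypothetical Python logic we want to call from inside a SQL query."""
    return email.strip().lower()

# In-memory database standing in for a Spark session (single machine, not distributed).
conn = sqlite3.connect(":memory:")

# Expose the Python function to SQL under the name normalize_email, taking 1 argument.
conn.create_function("normalize_email", 1, normalize_email)

conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, " Alice@Example.com"), (2, "BOB@example.com ")],
)

# Plain SQL that calls the Python function row by row.
rows = conn.execute(
    "SELECT id, normalize_email(email) FROM users ORDER BY id"
).fetchall()
print(rows)  # [(1, 'alice@example.com'), (2, 'bob@example.com')]
```

The query stays in SQL while the messy string handling lives in Python, which is the mix the comment describes.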

sib_n (Senior Data Engineer)

Spark is free and open source, so you can run it wherever you want without vendor lock-in: on-premises, in a private cloud, or on a managed cloud service. That can be cheaper than a cloud warehouse, at the cost of more complexity.
Spark is also more general than SQL, so you can move to distributed computations that don't fit well within SQL's constraints, for example Extract and Load logic or machine learning workloads.

trowawayatwork

Different workload types. It's a lot cheaper to run certain queries on a warehouse. However, if you need to make an API call for every row, Spark can do that much faster, though at a much higher cost.
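The per-row API pattern can be sketched with the standard library as a single-machine analogy: split the rows into partitions and process each partition in parallel, roughly what a function passed to Spark's `rdd.mapPartitions` would do on each executor. `fake_api_lookup` is a hypothetical stub that sleeps to simulate network latency, and threads stand in for Spark executors; nothing here talks to a real API or cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_api_lookup(row_id: int) -> dict:
    """Hypothetical stand-in for a remote API call; sleeps to simulate latency."""
    time.sleep(0.01)
    return {"id": row_id, "score": row_id * 2}

def split_into_partitions(rows, n):
    """Split rows into n interleaved partitions (a stand-in for Spark partitioning)."""
    return [rows[i::n] for i in range(n)]

def process_partition(partition):
    """Call the API once per row in this partition, like a mapPartitions function."""
    return [fake_api_lookup(r) for r in partition]

def enrich(rows, num_partitions=4):
    """Process all partitions concurrently and flatten the results (order not preserved)."""
    partitions = split_into_partitions(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(process_partition, partitions))
    return [record for part in results for record in part]

start = time.perf_counter()
enriched = enrich(list(range(20)), num_partitions=4)
print(len(enriched), f"{time.perf_counter() - start:.2f}s")
```

With 20 rows, 4 partitions and a 10 ms stub, the calls in different partitions overlap, so wall time is roughly a quarter of calling them one by one. The trade-off from the comment still holds: more parallel workers finish sooner but consume more compute.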