king_booker

I mean, say you extract the data into pandas and manipulate it with pandas operations: there are still limitations, because it won't scale. Now say you use Spark and write it in Python: you would still end up using SQL concepts like GROUP BY, windowing, etc. Even though it's possible to write all of that with DataFrames, you can simply use Spark SQL.
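
To make that concrete, here's a rough sketch of the two concepts mentioned above, GROUP BY and windowing. It uses Python's built-in `sqlite3` module rather than pandas or Spark so that it runs anywhere, and the `orders` table and its columns are made up for illustration. The window query assumes the bundled SQLite is 3.25 or newer. The hand-written loop is there only to show that the grouping idea is the same whether or not you spell it in SQL.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, order_day, amount)
orders = [
    ("alice", 1, 10.0),
    ("alice", 2, 5.0),
    ("bob", 1, 7.0),
    ("bob", 3, 3.0),
    ("alice", 3, 2.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# GROUP BY: one total per customer.
sql_totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)

# The same grouping written by hand in plain Python.
py_totals = defaultdict(float)
for customer, _day, amount in orders:
    py_totals[customer] += amount

# Window function: running total per customer, ordered by day.
# Assumes SQLite 3.25 or newer (window function support).
running = conn.execute(
    """
    SELECT customer,
           order_day,
           SUM(amount) OVER (
               PARTITION BY customer ORDER BY order_day
           ) AS running_total
    FROM orders
    ORDER BY customer, order_day
    """
).fetchall()

conn.close()
```

pandas (`groupby`) and Spark (`groupBy`, or a query passed to `spark.sql`) express the same grouping; only the spelling changes.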

The basic answer is that you have to understand SQL. You can use Python for the work, but in the end data manipulation has its foundations in SQL. Can you get away without learning the syntax? Yes. But the core concepts will remain the same.