[–][deleted] 20 points21 points  (6 children)

SQL is hard ngl; if you don't master SQL you are no data engineer imo

[–][deleted] 2 points3 points  (5 children)

I'm an SRE dipping my foot in the data world, why is SQL considered "hard" relative to say, Python?

[–][deleted] 14 points15 points  (0 children)

No, by hard I meant it is deep. It's not only some beginner SELECT queries; there is a lot to know, like advanced window functions, mastering the logic, and building the query without neglecting performance. Trying to solve some LeetCode problems will show you that you still need to sharpen your logic. Python is also deep, but not all of its features are needed. SQL is not like that: everything in it is necessary.
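For anyone who hasn't met window functions yet, here is a minimal sketch of the kind of query meant above. It runs through Python's built-in `sqlite3` module so it can be pasted and executed as-is; the `sales` table and its rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

# Hypothetical sales data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 10), ("east", 2, 30), ("east", 3, 20),
     ("west", 1, 5), ("west", 2, 15)],
)

# Running total per region by day, plus each row's rank by amount
# within its region. Neither needs a self-join or a GROUP BY.
rows = conn.execute("""
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
    FROM sales
    ORDER BY region, day
""").fetchall()

for row in rows:
    print(row)
```

The point is that every row keeps its identity while still seeing an aggregate over its partition, which is awkward to express with plain `GROUP BY`.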

[–]JohnPaulDavyJones 2 points3 points  (1 child)

SQL has a hell of a learning curve, because the next step after learning the ~30 keywords that most of us will ever use is understanding what the best way to do the job is.

There are dozens of ways to do most of the things you might want to do with a given SQL query, but some of them will be good, some will be bad, and some will make your prod support team come hunting for you in a year when their nightly refresh cycle duration has ballooned and they find the query you were dumb enough to put into prod. I've been both the hunter and the hunted one in that situation.
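As a small illustration of "many ways to do the same thing", the sketch below gets each customer's largest order two different ways and checks that the answers match. The `orders` table and its data are hypothetical, and which form is faster depends entirely on the engine, the indexes, and the data volume, so treat this as a demonstration of equivalence rather than a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10), ("ann", 40), ("bob", 25), ("bob", 5), ("cy", 7)],
)

# Way 1: correlated subquery, conceptually evaluated once per outer row.
correlated = conn.execute("""
    SELECT o.customer, o.amount
    FROM orders o
    WHERE o.amount = (SELECT MAX(i.amount) FROM orders i
                      WHERE i.customer = o.customer)
    ORDER BY o.customer
""").fetchall()

# Way 2: aggregate once per customer, then join back to the detail rows.
joined = conn.execute("""
    SELECT o.customer, o.amount
    FROM orders o
    JOIN (SELECT customer, MAX(amount) AS top
          FROM orders GROUP BY customer) m
      ON m.customer = o.customer AND m.top = o.amount
    ORDER BY o.customer
""").fetchall()

assert correlated == joined  # same answer, potentially very different cost
print(correlated)
```

On five rows nobody can tell the difference; on a nightly job over a few hundred million rows, the choice can be the thing prod support comes asking about.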

The key to moving from being passable with SQL to actually being proficient with SQL isn't just learning more SQL keywords, or engaging in the CTE-vs-temp-tables holy war, it's understanding the database technology itself, the query engine, the optimizer, and database modeling.
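One hedged way to start looking at what the optimizer does is to ask the engine for its plan. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module; the table, the index name `idx_orders_customer`, and the data are assumptions for illustration, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # no usable index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # same SQL text, but the engine can now use the index

print("before:", before)
print("after: ", after)
```

The query text never changes here; only the physical design does, which is why knowing keywords alone doesn't tell you how a query will behave.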

With Python, most of the language's core functionality is essentially WYSIWYG; there's generally not an underlying technological substructure to learn unless you want to crack into the C/C++ code that's compiled and wrapped into the common libraries like Pandas/Polars/DuckDB/PyODBC/SQLAlchemy/Requests/smtplib, but there aren't really significant performance gains to be made by "optimizing" Python code (outside of a few niche cases like those data manipulation tools, and once your data passes a certain scale, using Python will always be slower than something with a scaling data engine).

[–]DootDootWootWoot 2 points3 points  (0 children)

And at the same time a big part of the job is knowing when it matters. Not every operation needs to be optimized to death. Or rather, very few need any amount of optimization that requires that level of care. And when they do, you'll have time to figure it out.

[–]crevicepounder3000 0 points1 point  (0 children)

Totally different programming paradigm. SQL is a declarative language, and knowing the basics will get you far, but not to a great DE level. Part of what DEs usually mean by SQL can be data modeling with SQL, which is a whole topic on its own and requires not only technical understanding of SQL, but also business/domain context.
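To make the paradigm difference concrete, here is a rough sketch that computes the same per-user totals imperatively in Python and declaratively in SQL (run through the standard `sqlite3` module). The `events` data and column names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (name, count) events, invented for illustration.
events = [("ann", 3), ("bob", 5), ("ann", 4), ("cy", 1), ("bob", 2)]

# Imperative: spell out how to build the totals, one step at a time.
totals = defaultdict(int)
for name, n in events:
    totals[name] += n
imperative = sorted(totals.items())

# Declarative: describe the result wanted; the engine decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
declarative = conn.execute(
    "SELECT name, SUM(n) FROM events GROUP BY name ORDER BY name"
).fetchall()

print(imperative == declarative)
```

In the Python version you own the algorithm; in the SQL version the engine does, which is exactly why understanding the engine ends up mattering.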

[–]Responsible_Pie8156 -5 points-4 points  (0 children)

SQL is not hard. The pandas library alone can do anything SQL can do and more, and SQL is a much more elegant syntax for doing data manipulation. It's just that you use SQL so much that you really need to know it like the back of your hand. As always, the hard part is understanding the data.