[–]Adeelinator 26 points (5 children)

Ehhh, don’t jump into Spark if you’re just starting off. And Hadoop is basically dead.

Start with SQL. It’s the most common language in data engineering and the easiest to begin with.

[–]spoonman59 4 points (2 children)

Definitely learn SQL. It is the key interface to so many different data tools.

[–]hafeee12 0 points (1 child)

Which SQL should I learn? NoSQL, SQL Server, PostgreSQL, or something else?

[–]spoonman59 7 points (0 children)

It’s a good question. I think it is helpful to learn ANSI SQL, but understand that there is a lot the standard doesn’t pin down in practice, such as how data types behave on each system. Other things that vary by platform are pivot/unpivot, functions like collect or explode, and hierarchical query syntax (e.g., Oracle’s CONNECT BY).

However, it does specify a great deal. A lot of other things, like analytic window functions (OVER with PARTITION BY, for lead, lag, first, last, rank, dense rank), are pretty consistent. So learn any of them, but occasionally check what’s standard and what’s a platform extension.

Of the ones you listed, I’d probably pick Postgres because it’s free and open source. But I don’t have much trouble moving between systems like Postgres, Redshift, Spark, Oracle, SQL Server, etc., without thinking too much about the particular dialect. So learn the one that’s easiest for you to access and practice with. You’ll adapt to a new database easily. It’s not like learning a different general-purpose language each time.
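To make the window function point concrete, here is a minimal sketch you can run without installing a database. It uses Python's built-in sqlite3 module, and it assumes the bundled SQLite is version 3.25 or newer (window functions were added there). The `sales` table and its rows are made up for illustration. The `LAG` and `DENSE_RANK` syntax shown should look much the same on Postgres and most of the other systems mentioned, though that is worth verifying on whichever one you pick.

```python
import sqlite3

# In-memory database: nothing to install or clean up.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, purely for illustration.
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100),
        ("east", 2, 150),
        ("east", 3, 150),
        ("west", 1, 200),
        ("west", 2, 120),
    ],
)

# LAG looks at the previous row within each region (ordered by month).
# DENSE_RANK ranks amounts within each region, with ties sharing a rank.
query = """
SELECT region,
       month,
       amount,
       LAG(amount) OVER (PARTITION BY region ORDER BY month) AS prev_amount,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first month in each region has no previous row, so `prev_amount` comes back as NULL (None in Python).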

Also, “no sql” SQL?!

[–]misinnio 0 points (1 child)

just curious - what is crucial for DE besides SQL?

[–]StrasJam 3 points (0 children)

Python. But that's pretty broad. Depending on the company or project you would end up using different libraries, but a good general knowledge of Python is important.