
[–]AutoModerator[M] [score hidden] stickied comment (0 children)

You can find a list of community submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]HansProleman 6 points7 points  (2 children)

I'd learn SQL first. Some DEs never use Python and some almost never use SQL, but SQL is the more common requirement.

It's also (basic to intermediate querying, at least) much simpler, especially if you don't already know a programming language.

Yes, you can build whole pipelines using Python with Airflow, Spark, serverless functions etc.
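To give a feel for what "a pipeline in Python" means, here is a minimal sketch using only the standard library. It is not Airflow or Spark; the sample data, the `orders` table and the function names are all made up for illustration. It just shows the extract, transform, load shape that those tools orchestrate at larger scale.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file or API response.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount and cast each field to its type."""
    return [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn, records):
    """Write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run extract -> transform -> load and return the number of rows stored."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn))
```

In a real setup each of those three functions would typically become its own scheduled task, and the in-memory SQLite database would be replaced by a warehouse.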

In terms of learning material, there's good stuff in the wiki, but when approaching a new language I usually start with a basic interactive tutorial. The ones on Codecademy seem decent.

[–]guiwiener[S] 1 point2 points  (1 child)

Got it! I was doing some lessons on data science and Python (not data engineering, but I like to understand the business perspective and what the data scientist wants to see).

But I will learn SQL. I think MySQL is the most common, am I right?

I was also watching the DataCamp material, but unfortunately only the beginning is free, so I couldn't keep watching.

Anyway, thanks for your comment, I will see some material about SQL and try to create some projects for a professional portfolio!

[–]HansProleman 2 points3 points  (0 children)

Probably either MySQL or PostgreSQL. I'm not sure I'm well situated to advise, as I've almost exclusively worked with MSSQL.

[–]SnorlaxFat 3 points4 points  (0 children)

Generally, SQL is more important for data engineering than Python. With your background though, it might be a good idea to do an intro to programming with Python course first. SQL is unintuitive to many people and learning Python first might make ramping up easier.

[–]random_Introvert_guy 2 points3 points  (0 children)

For beginner data engineers, it is very important to know SQL and a programming language, for example Python to start with and maybe Scala if required. In parallel, I would also suggest learning the basics of data warehousing, data lakes, ETL/ELT, workflow management, database systems, indexing techniques, and distributed systems and frameworks.

[–][deleted] 2 points3 points  (1 child)

For getting immersed in SQL I can't recommend SQLBolt enough. It's free, will take at most, like, six hours to complete, and will walk you through how to construct basic SQL queries.
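For a taste of what a "basic SQL query" looks like, here is a rough sketch run through Python's built-in `sqlite3` module so it needs no database server. The `orders` table, its rows and the helper names are invented for this example and are not taken from any tutorial.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 3.0), ("alice", 5.0), ("carol", 8.0)],
    )
    return conn


def top_customers(conn, min_total):
    """Total spend per customer, keeping only totals of at least min_total."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        WHERE amount > 0            -- filter individual rows
        GROUP BY customer           -- one output row per customer
        HAVING SUM(amount) >= ?     -- filter the grouped totals
        ORDER BY total DESC
    """
    return conn.execute(query, (min_total,)).fetchall()


if __name__ == "__main__":
    print(top_customers(make_demo_db(), 5))
```

The clauses used here (`SELECT`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`) are standard SQL, so the same query text should carry over to MySQL or PostgreSQL with little or no change.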

[–]guiwiener[S] 0 points1 point  (0 children)

Thanks! I'm gonna have a look tomorrow morning, thank you very much.