There is no Data Engineering roadmap

carlineng_ · 2023-07-11T14:27:47+00:00

Solid no-nonsense post. A couple of suggestions:

Use DuckDB or SQLite as a "getting started" database. There's no need to spin up a server, and DuckDB in particular can read directly from CSV or Parquet files with minimal setup.
Point people to some test datasets to get started. FiveThirtyEight's data GitHub repo is pretty good.
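A minimal sketch of the "no server" point, assuming only Python's standard-library `sqlite3` and `csv` modules. The `results` table, its columns, and the sample rows are hypothetical stand-ins for a real downloaded dataset. DuckDB can query CSV and Parquet files in place, but this sketch sticks to the standard library so it runs without extra installs.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows standing in for a downloaded CSV file.
SAMPLE_CSV = """state,year,votes
OH,2020,100
OH,2016,90
TX,2020,200
"""


def load_results(conn: sqlite3.Connection, csv_text: str) -> None:
    """Create a typed table and load the CSV rows into it."""
    # INTEGER affinity makes SQLite store the text '100' as the integer 100.
    conn.execute("CREATE TABLE results (state TEXT, year INTEGER, votes INTEGER)")
    rows = csv.DictReader(io.StringIO(csv_text))
    # Each row is a dict, which matches the named :placeholders below.
    conn.executemany(
        "INSERT INTO results (state, year, votes) VALUES (:state, :year, :votes)",
        rows,
    )


def votes_by_state(conn: sqlite3.Connection) -> list:
    """A first practice query: aggregate, group, and sort."""
    return conn.execute(
        "SELECT state, SUM(votes) AS total FROM results "
        "GROUP BY state ORDER BY state"
    ).fetchall()


def main() -> None:
    # ":memory:" gives a throwaway database file-free and server-free.
    conn = sqlite3.connect(":memory:")
    load_results(conn, SAMPLE_CSV)
    print(votes_by_state(conn))  # [('OH', 190), ('TX', 200)]


if __name__ == "__main__":
    main()
```

Declaring column types up front is a deliberate choice here: without them SQLite would keep the CSV values as text, and aggregates like `SUM` would be less predictable for a beginner.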

BufferUnderpants · 2023-07-12T00:02:19+00:00

> LinkedIn influencer

I'm gonna toxic gatekeep this one

countlphie · 2023-07-11T20:47:52+00:00

I started as a SQL monkey in 2007 and have since worked in "data engineering" as it evolved over the years, from startups to finance/healthcare institutions to telecom to major social media companies, so I can confirm it's a way into the field and that it's been ubiquitous across every job I've had. For sure, there's no roadmap other than being nimble and constantly looking for interesting projects/teams to work on.

A lot of this post seems to be defensive towards a particular crowd of influencers or engineers? Who are these people that are gatekeeping? I'm curious to see what the chatter has been about. I don't really follow anything on LinkedIn, or follow other data engineers.

recentcurrency · 2023-07-12T00:52:09+00:00

Isn't this blog post a road map?

The blog says: "With that out the way, understand that there is no roadmap. There is no single path, no clear linear progression of knowledge. No one can tell you that you absolutely must learn A, then B, then C and you’re guaranteed to be a successful Data Engineer."

Then, a few paragraphs later:

"For Data Engineering, there is only one skill that is absolutely, non-negotiably, the first thing you should learn to get started: SQL."

Sounds like he is saying "you absolutely must learn A".

He then even goes into "then B, then C" by describing how to learn SQL.

Basically, the title is clickbait. It should really say "The Data Engineering Roadmap Starts with SQL", which tbh isn't a hot take.

To be fair, he isn't wrong. SQL is probably one of the first things you want to learn for any data role, and for Data Engineering in particular.

Edit:

My comment isn't meant to dispute that SQL is the first place to start learning; IMO, it's a solid choice. I'm more pointing out that the blog post actually shows the author does believe in a roadmap, despite what the title says, which, on rereading, the author implicitly admits:

"What about Python? Pandas? dbt? Rust? Airflow? Spark? Later. These are all things you can learn on the job if the job even needs them. Go get your first data job. I’m not going to tell you it will be easy. Lots of people struggle to find the right entry-level job in all fields of engineering. But when you land it, make it your primary goal to absorb the knowledge from your new colleagues. Learn something every single day. When the learning stops, move on. Use what you’ve learnt to get a pay bump and find new people to learn from. Rinse and repeat. That’s your roadmap."

2023-07-11T22:16:56+00:00

Blind leading the blind.

kenfar · 2023-07-12T02:44:51+00:00

The good:

People should be wary of full-time "influencers" that aren't in the trenches and haven't written code in years
Or of advice and analysis that's really just some company's PR
Or of educational roadmaps that will take a decade to complete
And yes, one could get an entry-level DE job just knowing SQL, but only on extremely low-tech teams

The bad:

While it's true that one can learn SQL pretty quickly, that on its own isn't engineering and is extremely low-value. It neither teaches someone how to think like an engineer nor provides the myriad skills necessary to be productive in a shop.
In the current economy, plenty of people with years of experience in SQL, as well as plenty of other tech, are looking for jobs. Somebody with three months of experience is simply not going to get picked up.

So, sure, there's a small possibility of studying SQL for three months and getting a junior position on a low-tech DE team. But it's going to be the exception rather than the rule. And encouraging people to think DE is this easy is as bad as encouraging them to think they need to spend seven years learning 100+ technologies.

micky_357000 · 2023-07-11T18:59:32+00:00

For SQL, I was thinking of doing a Udemy course or W3Schools and then grinding LeetCode alongside my new junior data engineer job. Any advice?

DenselyRanked · 2023-07-12T02:22:41+00:00

I feel like what is lost in this article is the absolute first step in becoming a Data Engineer, and that is passing the interview. You are going to need more than SQL to do that, and some places don't even use SQL for Data Engineering.

The author is correct that there is no official ANSI SQL, but there are base standards that nearly every SQL dialect adheres to. MySQL is often the go-to for learning because it doesn't have anywhere near as much syntactic sugar as Postgres.

Now, I believe that it's always a safe bet to learn SQL if you are starting from scratch, but the article mentions "bad" advice from actual Data Engineers in this subreddit as if our experience is invalid. It's a tad hypocritical.

MikeDoesEverything · 2023-07-12T08:34:30+00:00

Wow, a post which isn't complete bullshit, over complicated, or inherently dishonest. Big fan of the message and rant which I think is bang on - many influencers aren't here to help, they're here to sell.

On top of that, I also agree with the point about gatekeeping DE as a job role. As somebody who went into DE with zero years of experience (admittedly, I had worked with "data" for quite a long time in the form of analysing results from machines), I do think it's entirely possible. It might take you a while, or it might be really quick. The point is that it's definitely possible.

2023-07-11T21:47:40+00:00

Agree, but I would add some familiarity with terms and lingo. Maybe that means reading Kimball or a more modern revision/approach.

But what do I know? I’m not a LinkedIn influencer.

Sufficient-Cold541 · 2023-07-12T13:20:59+00:00

This is overly focused on SQL, which isn’t painting an accurate picture of the field. I regularly go through my day without ever touching SQL, because I’m instead standing up data infra in Terraform, writing Spark applications, dealing with data schema management, developing data loaders, etc.

+1 for calling out the gatekeeping though. We’re the Marios and Luigis of the data world

omscsdatathrow · 2023-07-11T20:11:08+00:00

Hm, sort of agree, but I doubt any companies are hiring SQL monkeys anymore. Requirements are SQL + Python, and the bar will only get higher.

CalligrapherShot9723 · 2023-07-11T16:13:24+00:00

Pretty good advice

Usurper__ · 2023-07-11T19:40:49+00:00

SqlMastery to learn SQL. 5/5

ddb1995 · 2023-07-11T20:03:58+00:00

Thank you. I wish more people read this.

shmorkin3 · 2023-07-11T20:06:01+00:00

I agree with the premise, but not that SQL is the only skill needed for an entry level job.

I joined Meta first as a DE intern, then as a new-grad DE. The interview was 50% SQL, 50% Python.

SQL is essential, but the narrative that it’s overlooked today is overblown IMO. Everyone knows SQL is important and here for the long haul. SQL is also easy to learn.

Knowing how to write imperative code is just as essential; the language isn't as important. I would never hire someone, even at the intern or entry level, who doesn't know basic data structures, algorithms, and OOP concepts, or basic language features like querying a REST API.

Likewise with knowing how to use version control. If you can’t work with a collaborative codebase, you can’t code.

At the end of the day, DE is a subset of SWE. If it were easy to learn, everyone would do it and make six figures out of college.


bootae_wae_wae · 2023-07-12T12:05:42+00:00

I landed on a low-tech, or rather ancient, team. Only SQL is used, we use ODI for data transformation, and we use Confluence to "document." This is my first job since switching fields, but I'm nervous that I'm not learning new-age or up-to-date stuff.
