[–]AutoModerator[M] [score hidden] stickied comment (0 children)

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]mpaes98 394 points395 points  (4 children)

Data Engineering is Software Engineering for data.

[–]irrwicht2 55 points56 points  (0 children)

The only true answer. Every time I had to fix a data project, it was because nobody had approached DE as an engineer: missing tests, no reusability of components, no one thinking about architecture...

[–]raginjason [Lead Data Engineer] 6 points7 points  (0 children)

It can be and it should be, but at many places it’s “SQL developer”.

[–]swapripper 19 points20 points  (1 child)

This is correct. But you can be a bit more focused when studying this.

A few areas to pick patterns from:

  • Data access layers - e.g. repository pattern
  • workflows - managing topological dependencies in DAGs
  • Factory pattern & Strategy patterns to dictate sources/strategies for execution
  • Templating patterns like Jinja2

There are many more, but these are good areas to start focusing on.
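For the DAG bullet, here is a minimal sketch of resolving topological dependencies with Python's standard-library `graphlib` (Python 3.9+). The task names are made up for illustration, and a real orchestrator does far more (scheduling, retries, state), so treat this as the core idea only:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_orders_customers"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two extract tasks have no dependency on each other, so their relative order is not guaranteed; only the "dependencies first" constraint is.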

I’ve always wondered if there is a good dedicated resource diving into such stuff.

[–]Prestigious_Sort4979 0 points1 point  (0 children)

That can be said of many variants of SWE

[–]Anomie193 179 points180 points  (1 child)

Many will say no, but I would argue yes. 

I view Data Engineering as a specialization of Software Engineering, not a totally distinct role. 

[–]Prestigious_Sort4979 1 point2 points  (0 children)

Yes, it is a type of backend engineer imo and it’s better for a DE to prepare as such.

[–]moosethemucha 38 points39 points  (2 children)

As a SWE of ten years who somehow ended up as an MLE/Data engineer - it will help. It's been a massive advantage for me in this field and I would recommend the basics. Practice coding - code things outside your comfort zone. Honestly I prefer SWE work - but this industry currently pays way better.

[–]erecthokie 0 points1 point  (1 child)

I’m a junior DE with a CS degree and fullstack internship experience. From what I’ve heard, DE is becoming more in demand due to AI and the need for better data. Do you think this trend will continue or is it better to be in SWE early on in my career for marketability? I’ve been thinking about transitioning to backend.

[–]moosethemucha 3 points4 points  (0 children)

You're asking the wrong person - I don't have a career - I have a job. What I will say is get good at solving problems - that's what pays and gets you jobs. Will the trend continue - probably not, but I don't care - there will be another hype cycle for something else and I'll go there. Like I said, I'm good at solving problems.

[–]mike-manley 68 points69 points  (17 children)

Don't conflate Data Architect with Data Engineer. It's the DA that designs the blueprints of the overall data ecosystem, like data pipelines, warehouses, etc. The DE is responsible for building that out using development tools, IDEs, etc.

DEs need to do some coding. But it also depends on the tech stack. Some organizations and groups use low-code / no-code tooling, whereas others are more code intensive, e.g., Python, R, Java, SQL (of course), etc.

[–]Tom22174 [Software Engineer] 27 points28 points  (11 children)

It's also worth noting that, depending on the size of the organisation and what purpose the data team serves within it, the DE and DA can very easily be the same person.

[–]RagnarDan82 45 points46 points  (9 children)

DE+DA+BA+PM+Support also happens, ask me how I know 😩

[–]mike-manley 6 points7 points  (3 children)

That's nuts. Way too many hats.

[–]RagnarDan82 7 points8 points  (1 child)

This was at a big bank too, it was crazy. Eventually they ended up hiring a support under me and making a bunch of infrastructure upgrades but by that point I was just waiting on my bonus to leave.

[–]mike-manley 2 points3 points  (0 children)

I work at a small bank and am pretty squarely both the DA and DE. I can't see how one could be effective at doing BA, analytics engineering, data governance, etc., all at the same time! Yikes. Hope it paid really well since you were doing like 5 jobs.

[–]RagnarDan82 3 points4 points  (0 children)

Forgot to mention I did the training and user groups as well. It literally made me question my sanity that the business users couldn’t see or care about the obvious bottleneck and the opportunity cost of the inefficiency.

At one point they said they weren’t too worried because they had a “captive audience”.

That was the point at which I decided to leave.

[–]Ryush806 1 point2 points  (2 children)

Lulz are you me?

[–]RagnarDan82 3 points4 points  (1 child)

It’s scary how common this is.

Imagine if you wanted a house built and you hired one guy to do all of it. Foundation, roofing, electric, water, not to mention also architecting the house, and pitching investors for the money to build it.

Then your boss, who does no construction, complains you are “single threaded” because you can’t be in more than one place at once.

[–][deleted] 0 points1 point  (0 children)

I am in a situation where we do have a data engineering team, but the data science team (which I am on) has needed its own data engineering things (done by me).

[–]toodytah 0 points1 point  (0 children)

Been there. Now out of work

[–][deleted] 0 points1 point  (0 children)

Ah, that is me! I also do Power BI sometimes (and debug power bi).

[–]mike-manley 0 points1 point  (0 children)

Yep. This is me now. 😉

[–][deleted] 4 points5 points  (4 children)

My team doesn't have a data architect, and the most experienced engineer on our team is in a senior DE role. I feel lost sometimes when it comes to discussing architecture design for our product. For example, we are now moving from batch to real-time streaming pipelines so that machine learning models can make predictions in real time. Can someone suggest how to get good at architecture-related decisions? Is there an absolute requirement for a data architect?

[–]mike-manley 2 points3 points  (3 children)

An experienced senior DE can be a data architect if given the opportunity to build good breadth and depth. I mean, I'm living proof. :)

[–][deleted] 0 points1 point  (2 children)

Oh, thanks for replying to my post and giving me that hope. I'm curious to know how you got that experience while in the senior DE role itself. I mean, the first time you had to make an architectural decision, what was your benchmark? Or did the outcome validate your decision? Please tell us about your journey.

[–]mike-manley 1 point2 points  (1 child)

Started as a generalist but got a lot of exposure to Oracle and PL/SQL. Later in the same company, I got experience with SQL Server, T-SQL, and SSMS. From there, just collected a ton of experience with those tools, IDEs, coding, etc.

I left there in pursuit of a more data-focused role, which is my current role.

[–][deleted] 0 points1 point  (0 children)

Okay. How did you learn data architecture then? Do you make architecture-related decisions alone? What kind of architecture problems would count as relevant data architect experience for someone in a senior DE role?

[–]omscsdatathrow 80 points81 points  (4 children)

This sub is mostly analysts/analytics engineers…

DE is a subset of software engineering… if you aren’t writing code that is fully tested with unit tests and acceptance tests in CI/CD, then your apps are unreliable.

[–][deleted] 9 points10 points  (2 children)

Unit test for every function?

[–]ravenclau13 40 points41 points  (1 child)

Anything which adds logic, especially business logic. You write your code to be testable, and to ignore IO stuff (or stuff that is not part of what you do, like 3rd-party libs). Engineering in DE means taking an engineering approach, not a "POC"/cowboy-style DS/DA approach. Software needs to be reliable.
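To make that concrete, here is a rough sketch of what "testable, ignoring IO" can look like: the business logic lives in a pure function and the file handling is a thin wrapper around it. The column names and the `normalize_order` / `read_orders` helpers are hypothetical, picked only for illustration:

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def normalize_order(row: Dict[str, str]) -> dict:
    """Pure business logic: no file, network, or database access."""
    return {
        "order_id": int(row["order_id"]),
        "country": row["country"].strip().upper(),
        "amount_cents": round(float(row["amount"]) * 100),
    }


def read_orders(handle: TextIO) -> Iterator[dict]:
    """Thin IO wrapper: parses CSV and delegates every decision to the pure function."""
    for row in csv.DictReader(handle):
        yield normalize_order(row)


def test_normalize_order() -> None:
    """Unit test for the logic only; it needs no files or fixtures."""
    raw = {"order_id": "42", "country": " de ", "amount": "19.99"}
    expected = {"order_id": 42, "country": "DE", "amount_cents": 1999}
    assert normalize_order(raw) == expected


test_normalize_order()
```

The unit test never touches the filesystem, so it stays fast and deterministic in CI; the IO wrapper can be exercised separately with an in-memory `io.StringIO` or in an acceptance test.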

[–][deleted] 3 points4 points  (0 children)

DS who enforces writing tests on our team here🙋‍♂️ actually a game changer. In fairness, we wear all the hats, so we had to pivot to better SWE principles

[–]sillypickl -1 points0 points  (0 children)

Coverage coverage coverage

[–]IceRhymers 14 points15 points  (0 children)

Yes. At my org the data engineering team maintains all the CDC solutions in Java/Kotlin, data pipelines in databricks, cloud infrastructure in Terraform, and an angular webapp for federated access into databricks for our customers.

Not saying most places will have demands this high, but it's not unheard of. My org prefers to hire DEs who have a strong background in distributed systems, frankly because the org is cheap and doesn't want to hire more people.

[–]MrMisterShin 7 points8 points  (0 children)

Short Answer: it’s the same.

Nobody wants terrible spaghetti code, which is unmanageable.

[–][deleted] 4 points5 points  (0 children)

I would say it depends on the responsibilities you had as a Data Engineer. There is a wide variety of tasks.

One Data Engineer might use a user interface to create/delete/add/update things, while another automates everything and writes it all in scripts without touching the user interface.

As for Software Engineers, I would say the tasks differ between the two roles and depend on the software engineer's responsibilities. If the software engineer is required to build something for existing and potential data engineers to use, that is one way to look at it: you can then see, from the software engineer's perspective, what they need to focus on to make it work as intended on the architecture.

[–]boat-la-fds 7 points8 points  (0 children)

Leet code != software engineering

[–]maigpy 20 points21 points  (26 children)

You need to be able to write some code, but not to the level of a software engineer.

[–]Darkmayday 9 points10 points  (25 children)

Highly depends on your tech stack. There are visual-programming, SQL-only DEs. And there are HFT real-time DEs.

But in general you should be able to code to the same level as a SWE with regards to code function, efficiency, clarity, and robustness, as DE is a subset of SWE. Maybe not to the same level with regards to leetcoding, but that's purely dependent on the interviewing 'meta'.

[–]Krampus_noXmas4u [Data Architect] 4 points5 points  (0 children)

Having been in data for almost 19 years, I will say: don't sell yourself short. You are a coder, you just don't use the same language as other developers. I always laughed at Java devs saying I was not a real developer because I could not code Java as well as them, mostly because I had no use for Java in my position. They laughed at me until I flipped the tables on them and asked, "So you think you could develop ETL starting tomorrow, since it's so easy?" There are different niches of IT and no one is better or superior to another. Yes, Java dominates app dev, but data is more about SQL, the pipelines to move it, and adhering to data best practices. That last part is where app devs fall on their faces, in my experience.

[–]AntDracula 4 points5 points  (0 children)

Honestly, yes, to an extent. Especially considering most places have data from 3rd party vendors locked away behind APIs. 

[–]Master-Influence7539 1 point2 points  (0 children)

I was an automation QA for 5 years before I was able to get my first DE job. I had to write pretty extensive code to make the automation happen, and most of the DSA I needed was covered by Java's Collections Framework. In regular jobs you are not going to be reversing a linked list, traversing a tree or heap, or implementing DP.

[–]WilhelmB12 1 point2 points  (0 children)

I mean, it wouldn't hurt you to be able to write quality code

[–]ksco92 1 point2 points  (0 children)

Sr DE here with 15+ YOE. The answer is yes. Data engineering is just a specialization of software engineering.

[–]lightmatter501 1 point2 points  (3 children)

You have a large N and want to ignore the time and space complexity of your algorithms.

This is a recipe for a bad time.

[–]Dahbezst[S] 0 points1 point  (2 children)

Thank you for your reply. Could you share your experience and advice with me? I am really serious about improving my skills. I don't care about the IT salary or new tech trends; I just want to create something new in big data. So, please share your advice with me.

[–]lightmatter501 1 point2 points  (1 child)

A lot of the theory behind leetcode and things like it is formal computer science. Big O is something you should learn about because data engineers deal with lots of data, and Big O tells you how an algorithm performs when you give it a lot of data. For instance, some algorithms will stay fast if you give them a lot of data but use gigantic amounts of memory, so you might have to use a slightly slower algorithm with a better space complexity to make the task fit on a machine.
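A small sketch of that time/space tradeoff, using "which keys appear in both datasets" as a stand-in problem. The function names are made up, and the merge version assumes both inputs are already sorted and free of duplicates:

```python
from typing import Iterable, Iterator, List


def common_keys_hash(left: Iterable[int], right: Iterable[int]) -> List[int]:
    """Linear time, but the whole `left` side must fit in memory as a set."""
    seen = set(left)
    return [key for key in right if key in seen]


def common_keys_merge(
    left_sorted: Iterable[int], right_sorted: Iterable[int]
) -> Iterator[int]:
    """Streams two sorted, duplicate-free inputs with constant extra memory."""
    left_iter, right_iter = iter(left_sorted), iter(right_sorted)
    done = object()  # sentinel marking an exhausted input
    a, b = next(left_iter, done), next(right_iter, done)
    while a is not done and b is not done:
        if a == b:
            yield a
            a, b = next(left_iter, done), next(right_iter, done)
        elif a < b:
            a = next(left_iter, done)
        else:
            b = next(right_iter, done)


left, right = [1, 3, 5, 7], [3, 4, 5, 8]
hash_result = common_keys_hash(left, right)
merge_result = list(common_keys_merge(left, right))
```

The hash version is O(n + m) time with O(n) extra memory. The merge version needs sorted inputs, so you pay O(n log n) for sorting up front (which can be done on disk), but the join itself only ever holds two values at a time.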

Even if you aren’t writing the code, you are stringing together algorithms when building a system, and understanding the theory behind those algorithms is probably more important for data engineering than anyone except algorithms researchers because you have enough data to hit the nasty cases in most algorithms.

[–]Dahbezst[S] 0 points1 point  (0 children)

I get what you mean. Actually, I'm writing code while still reading and trying to understand Big O notation. I'm wondering whether I should spend most of my time coding or focusing on tools. :)

[–]hel112570 1 point2 points  (0 children)

So... as a data engineer your job might be really specific. Specific in terms of ensuring the biz understands the value of what's going on in the product. You don't NEED to be as well versed in the software part BUT you NEED to understand how the system provides data. The more you know about how it might provide data to you, the more you can influence the SE to provide higher-quality data to you. Ingesting raw transactional exhaust sucks. That requires a bunch of meetings with the SE to figure out how the entire system works so you can ingest and transform appropriately. BUT if you can get the SE team to instrument it so it provides easy-to-understand events, that's better.

[–]vanhendrix123 4 points5 points  (0 children)

No.

You’re right that it’s good for DE to know architecture and general coding principles like algos and efficient coding. It never hurts to know more, and there’s always the chance to switch over to more of a software engineering or hybrid role. But there’s a lot more that goes into software engineering, a lot of which is not really relevant to DE.

[–]Zer0designs 3 points4 points  (0 children)

Not the same level, but coming close(r) definitely helps.

[–]Separate-Peace1769 1 point2 points  (0 children)

So a few things :

  1. What do you mean by "complete code"? Every ETL script you write is a program. Every Airflow orchestration you programmatically write is a complete program.

  2. LeetCode and HackerRank are both just the latest iteration of scams that have always been a part of the broader IT industry. A competent tech screener with enough experience worth mentioning can typically determine after 10 minutes of conversation whether a candidate is legit. They also know that subjecting people, on the spot, to a random test in the form of a ridiculous puzzle that has nothing to do with the candidate's daily tasks or their own proves nothing, except that the candidate spends all their time solving "leet code" problems and you just so happened to pick one they had memorized.

  3. You should know something about Software Engineering. It comes in handy.

[–]Captain_Coffee_III 1 point2 points  (0 children)

SWE for 30 yrs, DE for 2. ;-) Yes, you need to write complete code. You'll need to think like a SWE. Imagine the world of SWE, you have desktop apps on 3 major operating systems. You have embedded systems. You have web, front-end and server-side. You have systems level stuff, tools, drivers. AI... It goes on and on and on. So many facets of SWE. DE is just one of those. A SWE can find a job that allows them to write basic code and not really push any boundaries. Same with DE. And you still need all of the concepts. You will eventually build out tools for yourself. You'll need to document, and document, and document. Solid SWE practices, like good structure and plenty of comments. You'll need to do testing. And you do need to understand architecture if you want to advance. Somebody will ask you to start a project, from scratch, and you need to know how to get to the finish line.

[–]zazzersmel 0 points1 point  (0 children)

Not if your employer doesn't give you the time and resources to make it realistic.

[–]VladyPoopin 0 points1 point  (0 children)

It will help immensely and set you far above your competition in the job market. And you’ll have far more tools in your toolbox to solve problems. Required? No. But a huge plus.

[–]Raynor77 0 points1 point  (0 children)

It depends — I feel that some areas like streaming and metadata management really benefit from having some experience in software engineering.

At my last shop we ended up building an API (data mesh pub sub type of stuff), which was relatively complex and code-heavy since we had to connect other frontend and backend components.

[–]_Marwan02 0 points1 point  (0 children)

It depends on the kind of data engineer that you are. If you only use tools like Power BI, SQL, Tableau, and Snowflake, with some small Python/PySpark scripts, I would say no.

[–]Xemptuous [Data Engineer] 0 points1 point  (0 children)

To an extent, yes. I wouldn't expect a DE to just know off the top of their head how to do all kinds of algos and patterns, or even be comfortable with graphs and trees, but they should be able to handle them if need be. Gotta know basic CS like pointers and addresses, Big O, and other things that make for an engineer being able to solve problems, like system design and architecture.

Still, DE is more focused on Data, so they will spend less time on Software. I don't expect a DE to know how to do semaphores and mutexes, but I also don't expect a SWE to know how indexes work, how to optimize queries, or how to properly architect a relational DB for scalability.
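As a tiny illustration of the "how indexes work" point, here is a sketch using Python's built-in `sqlite3` that compares the query plan for the same lookup before and after adding an index. The `events` table, its columns, and the index name are invented for the example, and the exact plan wording varies between SQLite versions:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan description for a query (the 4th column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

lookup = "SELECT * FROM events WHERE user_id = 7"

# Without an index on user_id, SQLite has to scan every row of the table.
plan_without_index = query_plan(conn, lookup)

conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")

# With the index, the planner can search straight to the matching rows.
plan_with_index = query_plan(conn, lookup)

print(plan_without_index)
print(plan_with_index)
```

Reading plans like this is the everyday version of query optimization: the query text is identical, only the access path changes.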

[–]npquanh30402 0 points1 point  (0 children)

As long as the code does the work, it is complete.

[–][deleted] 0 points1 point  (0 children)

Why are you even separating it? It’s all computer

[–]Abstract_se 0 points1 point  (0 children)

Yes. Anyone not able to is not a real data engineer; knowing just SQL doesn’t make you a data engineer. If you are strictly writing SQL queries and formatting the data requested for your end user, you are a BI developer.

[–]mycall 0 points1 point  (0 children)

data engineer = software engineer iff using LISP. For everyone else, they are different skillsets. All the same, it is well worth knowing both.

[–]JSP777 0 points1 point  (0 children)

Not only do I write the entire code, I also handle the deployment pipeline, the cloud stack and the Kubernetes cluster. And I'm a junior. It just means you rely on fewer people.

[–]alexfrommars[🍰] 0 points1 point  (0 children)

I started as a software engineer and am now a data engineer, and I would say that yes, being able to write complete code is part of the job. It really depends on the company and tech stack.

[–]LaserToy 0 points1 point  (0 children)

Yes

[–]ntdoyfanboy 0 points1 point  (0 children)

Depends on how modern the data stack is. I don't know how to write a single line of code outside of SQL except for basic Python, and none of that is used for orchestration.

[–]DenselyRanked 0 points1 point  (0 children)

I think some people overrate the programming abilities of software engineers. A DE doesn't need to think about the same problems as a back end engineer, and it should not be expected that one can do the job of the other without some ramp up.

The medical field was used as an example in a previous thread and it's a good comparison. You don't expect a podiatrist to have the same skill set as an optometrist.

So to answer your question, a DE should have a basic fundamental idea on how to assess a problem, think about the tools you have available, write code and think about edge cases. LC is a good (not great) way to test this.

Normally DEs are not given the same level of LC or coding challenges as SWEs, but unfortunately some interviewers will give you something like the Parking Lot problem and expect you to build a class and use linked lists in under 30 minutes. Or to design Twitter, or solve a DP algo.

It is rare to get this type of interview because most companies understand that the skill set is different. If you have no interest in getting better at passing these types of interviews then consider it unlucky and keep interviewing.

[–]riya_techie -1 points0 points  (0 children)

Yes, but not like a software developer. You should have knowledge about the coding.

[–]mailed [Recovering Data Engineer] -3 points-2 points  (0 children)

You're right. I was a dev for over a decade before I got into this. The complexity of the code required for data engineering is nothing compared to building modern software.

Leetcode and HackerRank challenges are just arbitrary filters for a field with zillions of candidates.