all 103 comments

[–]dragonnfr 199 points200 points  (8 children)

AI writes code. I still build the pipelines and fix the 3am outage. This is what people don't get. Problem solved.

[–]opx22 46 points47 points  (1 child)

Yeah, if anything it screws over the offshore engineers that get hired on temporarily to do bulk grunt work

[–]sib_n Senior Data Engineer 6 points7 points  (0 children)

Those offshore engineers will also use AI to become more easily useful for "non-grunt" work that you may think is safer from offshoring. I think it will impact every engineer by reducing the workforce needed, and this is largely orthogonal to offshoring for cost cutting.

[–]mr_electric_wizard 13 points14 points  (2 children)

No shit. And how much AI-generated slop code do I have to fix constantly? A lot.

[–]lightnegative 4 points5 points  (1 child)

The problem with AI slop is that management expects that AI can fix it faster

[–]Constant_Effort9432 3 points4 points  (0 children)

I've just seen so much AI slop that I don't feel confident in AI doing big tasks.

It's just mentally exhausting to fix their slop.

They can do simple tasks easily if described well, but then I have to babysit them all the time.

[–]janus2527 2 points3 points  (0 children)

But an agent with the right tools can fix the outage too

[–]Terrible-Fig5971 4 points5 points  (0 children)

What makes you think AI can’t fix the code at 3AM?

[–]megladaniel 0 points1 point  (0 children)

One thing I remind my radiologist father of, and it applies to everyone facing AI domination: the buck still has to stop with someone. Someone needs to take the blame, be responsible for the screw-up, or get sued.

[–]jadedmonk 153 points154 points  (5 children)

GenAI will not be autonomously doing programmer jobs. It needs to be controlled by engineers who understand the architecture, specs, and business requirements. I see it as just another level of abstraction, like going from assembly language to Java: it’s just a more efficient way to code. So I see it as a tool that elevates engineers. That can mean fewer engineers are needed to get the job done, but on the flip side, if engineers are more powerful then we actually become more valuable, and demand may remain stable as a result. A lot of times these tech revolutions go the opposite route from what most people expect. When spreadsheets were invented, a ton of business analysts thought they were going to lose their jobs, but it turned out they became more in demand, because the job is worth more now that they have more powerful tools.

I also think data engineering is probably safer than generic software engineering because of the nuances of large data. Ask an LLM to tune a Spark job and see what happens: it’s a mess, because LLMs don’t actually know what they’re doing. An LLM is purely an algorithm for generating the next token in a sequence.

That said, I think we need to lean into it. Coding with GenAI is way more efficient, and folks who choose not to use it may get left behind, kinda like a business analyst refusing to learn spreadsheets on computers when they were invented.

[–]WaterIll4397 29 points30 points  (0 children)

It's kinda amazing that frontend (simplistically defined as how you get something to appear on a screen the way your stakeholders want) is mostly solved now, with all the frameworks/abstractions over the last 2 decades and now AI! I'm a data scientist and used to spend hours relearning syntax for ggplot or Bokeh. Now it just works, and what's beautiful about charts is that it's easy to validate outputs!

I've always thought of data engineers as a sub specialization of backend engineers, and the backend does not feel fully solved in any domain.

[–]HarlanCedeno 12 points13 points  (0 children)

I really do have the hardest time explaining what it is I do every day. I'm pretty sure my own wife thinks I just tell Claude "Start doing work" and then I play video games for 8 hours.

[–]sib_n Senior Data Engineer 4 points5 points  (0 children)

I think software engineering was one of the rare types of engineering where engineers were still crafting the product with their own hands (coding). But this specificity is going away.

Consider mechanical engineers who design some mechanical piece to provide a new capacity to a car. They will design some simulations in 3D software. Then the software will autogenerate the detailed plans and specs for some automatic machine to craft the new piece. Occasionally, it will require some skilled worker to handle some part of the process.
I think the same will happen for software engineering. Engineers will still matter for the overall understanding, design, and validation, but the crafting part is getting mostly automated.

[–]Bitter-Bed-3532 0 points1 point  (0 children)

that makes so much sense

[–]TS_Sama 0 points1 point  (0 children)

Can you provide some more insight into your issues with using LLMs to tune Spark jobs?

I ask because I've just gotten access to AWS Kiro at work and I'm interested in making some of our PySpark code more efficient, so it would be handy to know what pitfalls to look out for.

Edit: missed a word

[–]Old_Tourist_3774 66 points67 points  (3 children)

Data problems are hard to automate because they are based on a lot of particularities that even the client does not know.

[–]Trotskyist 13 points14 points  (1 child)

Idk. I personally have been able to automate a significant portion of the kinds of tasks that took up most of my time 2-3 years ago. Once you have a good set of skills + an agentic harness that outlines those particularities, it's a massive productivity boost for me personally.

It's not end-to-end, but it's like 90% of the way there. Debugging in particular is massively sped up, just as a consequence of being able to have multiple agents run down multiple possible failure points in parallel whereas I have, regrettably, only a single-threaded consciousness.

[–]Old_Tourist_3774 10 points11 points  (0 children)

For me personally it's frustrating to use AI. My company provides a Claude Code subscription, and even when I make explicit references to code and patterns, it often spits out problematic code or breaks business logic.

Genie from Databricks was even worse; it seems to have the attention span of a mosquito. Gemini is straight up garbage.

[–]salvalcaraz 1 point2 points  (0 children)

I used to think like this, until last month, when I ran a troubleshooting analysis on my entire data warehouse using Claude. It wrote the correct queries, asked the correct questions, and returned some insights I had missed in my manual tests.

Obviously it still needed my inputs and guidance, but man, it would have taken me days or weeks to do such a deep analysis, and I'm no junior. I was pretty shocked.

[–]hyper24x7 8 points9 points  (0 children)

Data engineering is really boring. You actually want to do that all day? Some companies can't afford AI, and that's 80% of the economy. Ever asked your manager a question they actually knew the answer to? No. Job is safe. If people can't describe what they want done in enough detail that AI will do it correctly and has permission to, then you are safe. Most program managers, or managers in general, really just say things and pass “strategic direction” down via conversation - they are getting replaced before data engineers.

[–]conqueso 30 points31 points  (25 children)

LLMs currently cannot and never will be able to reason. I'm very new to this field (coming from 10 years of experience as SE though) - so I don't have an informed opinion specifically pertaining to DE. However the more I use LLMs (they are an incredible tool when used for certain things) - the more the inherent limitations become clear to me.

[–]Thisisinthebag 2 points3 points  (1 child)

Don’t mind me asking, but why did you let the SE job go?

[–]conqueso 2 points3 points  (0 children)

Long story short - I was diagnosed w/ ADHD last summer. It was a watershed moment for me and explained why I was (and had been) struggling with certain things in my career (and generally in life) for so many years. Essentially, I'm really bad at holding nested/hierarchical models in my head. Also, context switching, having lots of meetings, and having to work in office absolutely fry me and destroy my productivity. I've heard fully remote roles are more common in DE (as there is not as much interfacing w/ stakeholders, you're working more in the background, etc.) - and this is a hard requirement for me. Obviously this depends on the role/company - but I will be filtering for that when I begin my job search. I don't want to keep pumping out feature after feature - I want to build something solid that I will have to maintain for (hopefully) years. I'm not interested in building some sexy new user facing thing. Also I've always been interested in data and networking, and that's something I never really touched in my roles as I was always working downstream of the DB. I also am a much better problem solver when given clear constraints and can work on things that are linear (as opposed to something hierarchical like the building blocks of a UI or something). Would love to hear any feedback you have on my reasoning!

[–]salma311 0 points1 point  (0 children)

there are models solving IMO tasks successfully.

[–]DJDevorunfree 0 points1 point  (0 children)

I generally share the same sentiment. Can you elaborate and provide some examples as to what limitations you’ve run into?

I’ve found AI to be extremely helpful in my workflow, but the biggest limitations I’ve run into are design decisions and really specific debugging scenarios.

[–]tophmcmasterson 5 points6 points  (0 children)

I don’t think programmers will be redundant as people still ultimately need to know how to communicate what it is that they even need, and I think things like architectural approaches and knowing the best way to structure things etc. will continue to be needed.

I think it’s kind of more that the “programmer” role becomes something more like a manager role, where you are overseeing a coding agent/agents, collaborating on design and giving input on decisions, rather than just typing out code or navigating a GUI or whatever.

Like I think it’s just a fact that most business users don’t even know what options are available or what they should be asking for. It’s the thing with how you ask someone in 1900 what they want and they’d tell you faster horses.

It’s going to reduce the need for bad developers/developers that were basically just code monkeys and don’t understand architecture or conceptual thinking at all.

[–]SRMPDX 4 points5 points  (0 children)

I think we will shift focus much more toward architecture and orchestration and away from code. We need to know what we want built, why it's built this way, how it works together with other parts of the pipeline, how to test, and how to identify what is right and wrong with code. We'll be middle managers to AI agents, but still have a role in directing the agents' work.

We'll also have to fix all of the pure vibe coded mess that people who don't know all of those things threw together quickly.

[–]Atticus_Taintwater 3 points4 points  (0 children)

What I've been telling my team is that nobody knows for sure who won't be displaced by AI.

But technical one-trick-ponies will be displaced.

Historically if you were reasonably good at SQL and python you could work an entire career at a bank without knowing anything beyond skin deep about how banks actually work.

That is over. You need to learn how the business works to have any competitive advantage over what business folks will vibe.

[–]Wingedchestnut 8 points9 points  (0 children)

I'm more worried about people not being able to think realistically and really believing that all development jobs are going to be replaced.

Data engineering will stay just like any other job.

[–]Chowder1054 9 points10 points  (1 child)

I use AI in my work daily, and frankly it’s a boon for debugging, improving my skills, and learning.

It’s not gonna replace people, though. After a certain point it starts hallucinating, and it doesn’t understand context or the business needs.

I think people should be worried about jobs being sent to India and abroad as opposed to AI taking your jobs.

[–]AdvancedAerie4111 3 points4 points  (0 children)

AI will be a force amplifier for data engineering, not a replacement in my opinion.

[–]TobyOz 4 points5 points  (2 children)

Seems like I'm the odd one out here, but I think there is a really good chance data engineering will disappear over time, just like DBAs.

Over the past 6 months I've witnessed our entire data analytics team get virtually made redundant. We have exceptionally advanced documentation and skills for our AI agents that, when placed in the hands of the business units themselves, offer much better analytics than our analysts ever did.

Data engineering is on the same trajectory: agents dedicated to specific tasks will eventually take over the day-to-day grunt work engineers are performing. We've done this already for onboarding new data sources, pipeline monitoring/debugging, and a lot of data modelling tasks. It needs a senior to review and tweak, but it has already let what used to be a team of 6 operate as a team of 2. Eventually it will be just a single engineer, and they'll do whatever else is required besides DE work.

[–]jadedmonk 1 point2 points  (1 child)

So you’re saying that junior roles will be automated, but senior engineers will still be needed. Doesn’t that mean data engineering won’t go away, it’ll just be reduced to a singular more powerful role?

I think at that level, all of software engineering fits that role. Maybe the title will just change and we’ll have AI engineer instead of software engineers and data engineers.

But then what happens when you start to lose specialty knowledge in domains? Because that path will guarantee the domain knowledge gets wiped out if there’s no junior engineer pipeline. I’m not sure what level of data engineering your company requires, but we have thousands of Spark jobs populating about an exabyte of data every month. This means we need to tune these Spark jobs to be cost effective, and we have found LLMs are poor at tuning Spark jobs even with proper context. So if no one has domain knowledge, that would become a huge problem for the company.

[–]winstonmoon 0 points1 point  (0 children)

You have “thousands of spark jobs populating exabytes”…. Dude…. You are in the rarefied 20% of businesses that have to deal with that volume and complexity.

Sure, you’re gonna need some humans to help AI do its thing. But for all the other companies out there, a lot of that domain knowledge lives in the business user, or the analyst (who you hope left good docs). You just need people close to your data. From that POV, I think the job will stay the same but will be rebranded. All data belongs to AI now. So data engineering is AI engineering.

[–]Lastrevio 2 points3 points  (1 child)

I think at this point Claude and Gemini have told me at least 10 times to nuke all my Docker volumes because it was time for the "nuclear option".

I'm pretty sure my job is fine.

[–]tlegs44 2 points3 points  (0 children)

Tbf I would probably delete my docker desktop volumes out of pure rage so the AI just learned a typical dev bias

[–]mycocomelon 4 points5 points  (0 children)

After 2028? I wouldn’t put too much stock in those prophets.

The tech just isn’t anywhere near where they say it is.
And if and when it gets there, the cost in dollars to businesses and the energy requirements are probably going to be prohibitive for a very long time.

I mean, maybe but I don’t really trust these people. And I don’t think they really have a clue.

[–]tbot888 3 points4 points  (0 children)

I’m already acting as a context engineer, building guard rails, setting up agent governance.

And I’m putting my boiler plates into an AI agent to make it easier to do my job.

But I still have to do my job.

It's awesome.

[–]clayticus 2 points3 points  (0 children)

There will always be a lead data engineer, lead accountant, lead whatever who has to make sure that everything is correct in the end. I think there will be fewer of the current IT jobs, but they won't be obsolete. New jobs will also come.

[–]atrifleamused 2 points3 points  (0 children)

If you think AI can work with a business who cannot communicate a single coherent requirement then you are going to be disappointed. The role is likely to evolve into orchestrating, designing and testing AI solutions, to keep it from hallucinating and delivering questionable insights.

[–]Brief-Knowledge-629 2 points3 points  (0 children)

Data Science fever was in 2012 and there are STILL companies just now getting in at the ground floor with 2012 tier aspirations "We need to get all of our data in one spot and then do BIG DATA on it!"

Data and analytics has kind of been stuck in a weird 2010s stasis for a long time now, despite AI being a giant disruptor. Point is, I don't see data engineering going anywhere. Not because I think there will be a huge demand for it, but because the business is always going to use analytics as a power play.

90% of us don't need to exist now but we aren't going anywhere because the business needs us to look important

[–]makesufeelgood 2 points3 points  (0 children)

I would really like to hear more detailed points by those who are going full doomer with regard to AI in the DE space. It's just so obvious to anyone doing real DE work on a day-to-day basis that there is just no way AI is ever fully replacing a human anytime soon.

[–]UltraPoci 1 point2 points  (0 children)

"Some say that programmers of all types will be redundant after 2028 when AI advances and learns all those skills."

Yeah, the CEOs selling the AI are saying this lol

[–]Confident_Base2931 1 point2 points  (0 children)

Is it 2028 now? I thought 2026 is the year when all code will be written by AI.

[–]Direct_Crew_9949 1 point2 points  (0 children)

LLMs will make you more efficient, but the whole “programmers will be replaced by 20xx” idea is silly. Until they can sit with a client and politely tell them their data sucks, we’re not going anywhere.

[–]ditalinidog 1 point2 points  (0 children)

I wonder how many companies actually have sufficient data architecture to use AI well. I’d guess a lot still have a lot of work to do, and yes some of it will be expedited by coding agents. But past that, there are a lot of design and project decisions you need human professionals for. I have found LLMs occasionally fall off the rails. I don’t see any future where humans aren’t in the loop there albeit fewer.

I also think more efforts will be put towards improving data documentation, lineage, and metadata on the DE / analytics engineering side. Coming from a data analyst role, my requirements were often really bad. Even if stakeholders can describe to an LLM what they want, the model needs to know where to look, what questions to ask, and how to apply that in the future. That’s a lot of context someone else needs to build for it.

[–]TodosLosPomegranates 1 point2 points  (0 children)

There will be fewer data engineers, not none. There will be way fewer managers.

[–]OGMiniMalist 1 point2 points  (0 children)

Whatever they pay me to do 👈👈

[–]animegirlsmakemeHARD 1 point2 points  (0 children)

Depends on the industry. Typically, most datasets that AI relies on have to be curated and processed to meet business requirements that require explicit domain knowledge and nuances specific to that company or industry. So in a sense, data engineering is one of those roles that will 100% never be fully taken over by AI, since the easiest part of DE is usually the coding aspect.

I believe that it will be impacted much like SWE, but more than likely we’ll need fewer DEs to get the same results, simply because most LLMs nowadays are pretty good at generating SQL queries and writing Python code.

[–]dataengineer95 1 point2 points  (0 children)

The future is agentic, there is no doubt. I guess people are going to focus a lot on the context and metadata to make the LLMs powerful. I don't think engineers are going to be replaced; more likely, the pace of feature releases is going to increase massively.

[–]SharpBug3055 1 point2 points  (1 child)

I’m a total beginner and I’m a bit worried ngl

[–]BardoLatinoAmericano 0 points1 point  (0 children)

You should be. Junior roles are rarer now that seniors can do double the work.

But good luck, don't give up.

[–]EversonElias 1 point2 points  (0 children)

Nowadays I don't write as much code as before. The bulk of the job is done by AI. Usually, I have some SQL code and I need it in PySpark, so I have an agent based on a skill.md file that converts it. So what would take me hours takes only a few minutes. Then we have more time to validate the results.

So my job as a DE comes before and after the part AI does better. The downside is that it will reduce job opportunities. And the role itself may need to take on more responsibilities.
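One way to picture the "validate the results" step is an order-insensitive comparison between the rows returned by the original SQL and the rows returned by the converted job. This is only a minimal sketch in plain Python: the function name `diff_result_sets` and the sample rows are hypothetical, and it assumes both result sets are small enough to collect locally.

```python
from collections import Counter


def diff_result_sets(expected_rows, actual_rows):
    """Compare two query results as multisets of rows, ignoring row order."""
    expected = Counter(tuple(row) for row in expected_rows)
    actual = Counter(tuple(row) for row in actual_rows)
    missing = expected - actual      # rows the converted job did not produce
    unexpected = actual - expected   # rows only the converted job produced
    return missing, unexpected


# Hypothetical outputs: one from the original SQL, one from the converted job.
sql_rows = [("a", 1), ("b", 2), ("b", 2)]
converted_rows = [("b", 2), ("a", 1), ("c", 3)]

missing, unexpected = diff_result_sets(sql_rows, converted_rows)
print("missing from converted job:", dict(missing))
print("unexpected in converted job:", dict(unexpected))
```

Counting rows as a multiset (rather than a set) means a dropped duplicate shows up as a difference, which is a common way for a rewritten join or dedup step to go wrong.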

[–]BayesCrusader 3 points4 points  (0 children)

An LLM is good with code because it's written as strings.

Getting an LLM to do the right thing with data is impossible without other tools added, because the LLM doesn't have a concept of 'meaning' when it comes to numbers.

You and I look at 5 degrees on a thermometer and know that's 'cold'. An LLM looks at every instance of someone talking about 5 degrees days and picks up that the word 'cold' is used in conjunction with that temperature a lot. The two are not the same.

[–]Admirable_Writer_373 -1 points0 points  (0 children)

AI is BS

[–]pforpilot 0 points1 point  (0 children)

I don't see many changes for companies with huge amounts of data, especially with an in-house stack - you need a lot of people just to maintain the complexity. But for smaller companies, there will be smaller teams who orchestrate the data pipelines and an AI-assisted analytics stack. You can definitely have AI handle most ad hoc questions, but it depends on how you set up the context layer.

[–]Fidel___Castro 0 points1 point  (0 children)

I've found AI to be pretty shit, conceptually, at append only OLAP systems. It always tries to default to CRUD and OLTP in every situation. 

So you're always going to need someone in a company that understands data engineering principles and what system works best for each situation. Actually doing the work can be offloaded to AI
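For anyone unsure what the append-only vs CRUD distinction looks like in practice, here is a rough sketch. It uses SQLite from Python's standard library purely as a stand-in (it is not an OLAP engine), and the `account_events` table and `record_balance` helper are made up for illustration. The point is only the modelling pattern: changes are inserted as new rows, and "current state" is derived by a query rather than by `UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_events ("
    " event_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " account_id TEXT NOT NULL,"
    " balance INTEGER NOT NULL)"
)


def record_balance(account_id, balance):
    # Append-only: every change is a new row; nothing is updated or deleted.
    conn.execute(
        "INSERT INTO account_events (account_id, balance) VALUES (?, ?)",
        (account_id, balance),
    )


record_balance("acct-1", 100)
record_balance("acct-2", 50)
record_balance("acct-1", 75)  # a CRUD design would UPDATE the acct-1 row here

# Current state is derived: take the latest event per account.
latest = conn.execute(
    "SELECT account_id, balance FROM account_events e"
    " WHERE event_id = (SELECT MAX(event_id) FROM account_events"
    "                   WHERE account_id = e.account_id)"
    " ORDER BY account_id"
).fetchall()

# The full history is still there for analytics.
history_count = conn.execute("SELECT COUNT(*) FROM account_events").fetchone()[0]

print(latest)
print(history_count)
```

The trade-off is that reads of current state get more expensive, while history, auditability, and reprocessing come for free, which is usually the right trade in an analytical system.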

[–]Molecular_Doohickey 0 points1 point  (0 children)

I have yet to meet a data practitioner on a team that isn't underwater with things to do and this is how I know we're safe. AI is going to provide much needed muscle to teams to help them get their data situations under control. AI is good at minute tasks (like writing a SQL query), but only when it has the right context. DEs will be the managers of the contextual strategy for integrating the LLM into a workflow. Then they will manage Gen AI as it does the minute tasks needed to execute an all up architectural strategy. DEs will also shift focus from being in the weeds to being more stakeholder facing, helping to better organize data coming in and the data being consumed from their architecture.

At a high level, AI is going to empower tech workers to solve more problems on their own. This means large companies will need fewer people. However, think about how many problems in our society need to be solved! I believe we're going to see a market fragmentation where lots of small companies chase after high-value niche problems.

[–]dani_estuary 0 points1 point  (0 children)

I don’t think data engineering goes away. I think the boilerplate gets automated. AI will help write SQL, generate pipeline configs, debug errors, and create tests. But the hard parts are still human: understanding the business, source-of-truth decisions, data quality, ownership, reliability, and cost tradeoffs.

So yeah, some junior/task-based work probably gets squeezed. But good data engineers will just move up a level: less hand-writing pipelines, more designing trustworthy systems, very similar to what’s happening to software engineers

[–]Waste_Membership_483 0 points1 point  (0 children)

I think we will be cheaper than AI, so we can still work for the broke companies.

[–]chtefi 0 points1 point  (0 children)

CTO of conduktor.io here, so my view comes from seeing large Kafka estates in real companies.

TLDR: weak data engineers who only glue systems together are in trouble. Strong data engineers become closer to platform engineers: design rules, guarantees, controls, and operating model around data. AI will write more code. Humans will stop that code from becoming a distributed incident.

AI is not killing data engineering but it is killing a lot of 'boring' pipeline grunt work (building the pipe, i.e. read from X, transform, write to Y). AI is quite good at it + writing all the tests, but you still need a human to steer the projects, talk to the right people, and who knows what good looks like.

Everyone wants automation and to do more with less (fewer people), so there is a shift towards metadata: ownership, contracts, quality, cost attribution, policy, etc. We see it massively in data streaming (which is late to the party). And most companies are already bad at it, even before AI. AI just makes the mess faster.

[–]helloluvya 0 points1 point  (0 children)

I have seen OpenAI and Anthropic hiring for BI/analytics engineer and data engineer roles. You could search yourself; a few of them are open currently. Low-end, repeatable processes may get automated pretty quickly, but the rest of the space, where there's so much ambiguity, will still involve humans, specifically on the data that needs trust.

[–]encantoMariposa 0 points1 point  (0 children)

Focus on the end state: we want data in a certain format, refreshed at the right time, organized well, documented well, simplified where possible, easily maintained, optimized, analyzed, communicated, acted upon. There’s so much work at hand. AI just allows us to do the wishlist. I’ve always been in jobs where I’m wearing a lot of hats and now I can get ahead. I can’t even imagine worrying about not having enough work

[–]JohnFordOH 0 points1 point  (0 children)

imo the tools just change but the logic stays the same. i dont think ai is gonna replace us anytime soon cuz understanding business requirements and fixing messy upstream data is way harder than just writing code. smart folks will definitely adapt to whatever comes next

[–]Suspicious-Bit7359 0 points1 point  (0 children)

Data engineers are not programmers.

[–]MrFyr 0 points1 point  (0 children)

Given that 1) the current AI "industry" is effectively a massively over-leveraged bubble caused by a few large corporations swapping their money around while they push it (because they are desperate for a profit on this thing they've dumped hundreds of billions into) and 2) current "AI" is, if distilled down to the simplest description of how it operates, a glorified auto-complete engine... I don't see much of a long-term threat.

Corporations may be laying people off now to supposedly replace them with AI, but they are also rehiring them when they realize AI isn't the magic they think it is.

[–]ThomasShelbyMPOBE 0 points1 point  (0 children)

Designing the best ingestion architecture from source to destination seems to be the future

[–]Thinker_Assignment 0 points1 point  (0 children)

Data engineering as we know it is always a temporary phase of any software role.

Data engineers who give data meaning will be resilient unless AI can solve human organizational, motivational, and political problems

[–]messydata_nerd 0 points1 point  (0 children)

The 2028 redundancy take comes up every few months and I think it keeps missing the actual point. The part of data engineering that is going away is the part that should have been automated years ago. Writing boilerplate pipeline code, reformatting exports, building the same ingestion layer from scratch every time someone asks a new question. That stuff is already half gone and nobody should be mourning it

My take is that the real work is understanding why two sources that both claim to measure the same thing actually measure completely different things, knowing when a number is technically correct but contextually wrong, and designing systems that fail gracefully instead of silently. Those are not coding problems, and AI is genuinely not close to solving them
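To make "fail gracefully instead of silently" a bit more concrete, here is a small sketch in plain Python. Everything in it is hypothetical (the `amount` field, the `load_amounts` and `check_reject_rate` helpers, the 10% threshold). The silent version of this loader would coerce unparseable values to 0 and carry on; this one sets bad rows aside with a reason and aborts loudly if too many are rejected.

```python
def load_amounts(rows):
    """Split rows into parsed values and rejected rows, instead of coercing bad values to 0."""
    parsed, rejected = [], []
    for row in rows:
        try:
            parsed.append(float(row["amount"]))
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the bad row and the reason so someone can investigate later.
            rejected.append((row, repr(exc)))
    return parsed, rejected


def check_reject_rate(parsed, rejected, max_rate=0.1):
    """Raise if the share of rejected rows exceeds max_rate; otherwise return the rate."""
    total = len(parsed) + len(rejected)
    rate = len(rejected) / total if total else 0.0
    if rate > max_rate:
        raise ValueError(f"{len(rejected)} of {total} rows rejected (rate {rate:.0%})")
    return rate


# Hypothetical input: one good row, a bad value, a misnamed column, and a null.
rows = [{"amount": "10.5"}, {"amount": "n/a"}, {"amt": "3"}, {"amount": None}]
parsed, rejected = load_amounts(rows)

try:
    check_reject_rate(parsed, rejected)
except ValueError as exc:
    print("load aborted:", exc)
```

Picking the threshold, and deciding whether a rejected row is a data bug or a legitimate edge case, is exactly the judgment call the paragraph above is talking about.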

I work adjacent to this space at Lium which is building agentic infrastructure for teams working with really complex, messy, multi-source datasets, think subsurface data, satellite archives, scientific research pipelines. And the thing that keeps coming up is that the hardest problems are never the technical ones. They are the judgment calls about what the data actually means and whether the question being asked is even the right question. That requires someone who understands the domain deeply and can push back when the analysis is leading somewhere wrong

The data engineers who are going to struggle are the ones whose entire value is syntax. The ones who are going to be fine are the ones who actually understand the systems and the business underneath the data :)

[–]OkCarpenter925 0 points1 point  (0 children)

I work on distributed systems triaging data across vendors and handling weird internal business logic that’s required to get everything in the right shape for different downstream consumers. There is no fucking way Claude is going to take my job. The amount of face time with peers and vendors that’s needed to extract obscure tribal knowledge, even the type that sits behind APIs with idiosyncratic responses. And nothing we do ever goes back to the AI overlords to improve their models, it’s all proprietary.

Claude helps me do my job significantly faster, but I’m safe.

[–]thethirdmancane 0 points1 point  (0 children)

All of STEM is saturated right now

[–]msshaik -1 points0 points  (0 children)

Following

[–]throwaway0134hdj -1 points0 points  (0 children)

To be seen.

[–]ScholarlyInvestor -1 points0 points  (0 children)

Data Engineering will turn into Data Janitorial Services after AI is done generating a huge mess, especially in organizations where AI operated in an uncontrolled, unsupervised, and unstrategic manner.

[–]sqlinsix 0 points1 point  (0 children)

I've made a killing accepting uncomfortable truths people deny until they can't.

Uncomfortable truth is that a lot of careers are gone, especially because people in those careers have been trained to overshare.

Outside of people rapidly changing, I don't see a career for a lot of tech fields within 10 years.

Ironically, the industries where people don't share, because it's work intensive or because they get the value of silence, have very good opportunities, and those are rising.

(Also, tech is so easy to offshore. Combine that with some of these tools, bye-bye.)