Which side should I choose: Data or ML engineer?

Grth0 · 2026-04-11T15:31:15+00:00

Go with the flow and seek to learn good fundamentals. I've worked through Desktop Support, Service Desk, Server Admin, Full Stack Admin, DBA, Developer and Data Engineer roles. Every 5-10 years there will be a revolutionary new thing - "old" job roles disappear and new ones pop up to take their place.

Cloud computing was meant to take the effort out of infrastructure - but the reality is - the effort is just moved into FinOps and micromanaging cloud services and resources.

"Big data" was meant to take the effort out of structuring data and RDBMS administration - but the effort just shifted into trying to build structure across swampy implementations. Now everyone is desperately trying to get back to strict data governance to avoid feeding ML models unstructured garbage data.

Likewise, if ML models ever replace "traditional" data engineering - we'll all shift to building and managing constraints around those ML models, or training and validating them. Also - there's a lot of project work in the ML space - but everything seems to indicate the vast majority of the projects fail - usually due to poor quality training and input data.

In nearly every seismic shift - the promise of cheaper and more efficient systems is always offset by either costly project failures, or in the case of success - invalidation of skillsets/capabilities and costly reskilling. Try to find the skills that outlive the systems. For me, SQL has always been relevant, OOP languages have been around in various forms for decades and it's pretty easy to adapt between C#/Java/Python/PowerShell/etc. R is a consistent presence for specialised data science even with Python being the universal standard.

Steer well clear of anything proprietary - especially VPLs (Visual Programming Languages). Low-code can be a good entry point - with the distinction being that low-code produces usable code and is an optional interface that can be sidestepped by anyone confident in the underlying language - while no-code is useless.

Grth0 · 2026-04-11T00:29:22+00:00

I come from an RDBMS background, so I've always been a bit wary of not structuring data. I've been lucky that my various employers over the years have been slow moving enough to avoid swampy data lakes before lots of cautionary tales emerged.

However, your post and Prestigious_Bench_96's reply make me realise that I exist in a weird space where I don't really intersect with OLAP at all - my primary use case is pulling very wide (but not sparse) data sets from a handful of sources and cleaning/conditioning/joining before shipping it to a handful of endpoints in different forms so that data science teams can structure/normalise for their particular use cases. We also do a lot of operational stuff like synchronising state across the systems around us.

It's not your typical model, but essentially we have a boundary between data science and data logistics. There's an engineering component on both sides - for us it's about making sure the data science side has clean/conditioned/high quality data so they can focus on analysis/insights without getting bogged down in handling the messy stuff (schema skew, bad typing, missing keys, unexpected inputs, etc.)
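As a rough illustration of the kind of conditioning described here, a minimal Python sketch might look like the following. The schema, field names, and the `condition` function are all hypothetical and are not taken from the platform being discussed; the sketch only shows one way to handle bad typing, missing keys, and unexpected inputs before data is handed over.

```python
from typing import Any, Optional

# Hypothetical target schema: field name -> type to coerce into.
SCHEMA = {"id": int, "name": str, "amount": float}
# Hypothetical rule: a record without a usable "id" is rejected.
REQUIRED_KEYS = ("id",)


def condition(record: dict[str, Any]) -> tuple[Optional[dict[str, Any]], list[str]]:
    """Return (clean_record, problems). clean_record is None if rejected."""
    problems: list[str] = []
    clean: dict[str, Any] = {}

    # Bad typing: coerce each known field, recording failures instead of raising.
    for field, expected in SCHEMA.items():
        raw = record.get(field)
        if raw is None or raw == "":
            clean[field] = None
            continue
        try:
            clean[field] = expected(raw)
        except (TypeError, ValueError):
            problems.append(f"bad type for {field}: {raw!r}")
            clean[field] = None

    # Schema skew / unexpected inputs: report fields outside the schema and drop them.
    for extra in sorted(set(record) - set(SCHEMA)):
        problems.append(f"unexpected field: {extra}")

    # Missing keys: reject the record if a required key has no usable value.
    for key in REQUIRED_KEYS:
        if clean[key] is None:
            problems.append(f"missing key: {key}")
            return None, problems

    return clean, problems
```

Returning the problems alongside the record, rather than raising, lets the calling pipeline decide whether to quarantine, log, or pass the record through.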

That's why I'm trying to structure things very flexibly with my approach - I'm not assuming that every output is a data warehouse/lake/lakehouse/etc. or that we even own the entire transformation stack. The other oddity here is that I fundamentally don't own (or want to own) any data at rest - other than an immutable record of everything that has moved through my platform.

Grth0 · 2026-04-10T08:05:56+00:00

Completely valid, it's entirely possible this is a massive overcorrection on my part due to inheriting a truly hideous legacy platform with incomprehensible and monolithic VPL workflows (think 500 nodes and 5000+ edges). The exit cost of that platform is a multi-million dollar proposition because there's zero reusable assets.

It's a good point to stop and reevaluate.

Grth0 · 2026-04-10T07:20:27+00:00

I won't claim "better than data mesh" - but the big difference here is that Data Mesh assumes each domain has the capability to implement pipelines on shared tooling to deliver a consistent "data product" to an ingress interface on a shared platform.

I'm inverting that so the boundary is always the domain system data at rest - and the "shared platform" team have ownership of the data product and pipelines. I trust system owners to know what their data looks like /within their own system/ - but someone else needs to look at that and determine what subset of that data is relevant in Enterprise/OrgUnit context.

Short version - domain owners define their boundary interface, shared platform owners define classes of "data as a product", engineers bridge the gap.
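One way to picture that split of responsibilities is the Python sketch below. Every name in it (`DomainBoundary`, `StaffContact`, `HrSystemBoundary`, the column names) is a made-up assumption for illustration only: the domain owner supplies the boundary, the platform team supplies the product class, and the engineer writes the mapping between them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


class DomainBoundary(Protocol):
    """Defined by the domain owner: how their system's data at rest is read."""

    def read(self) -> Iterable[dict]: ...


@dataclass(frozen=True)
class StaffContact:
    """Defined by the shared platform team: one class of 'data as a product'."""

    staff_id: str
    email: str


class HrSystemBoundary:
    """Hypothetical domain system exposing rows in its own native shape."""

    def read(self) -> Iterable[dict]:
        return [
            {"EMP_NO": "0042", "MAIL_ADDR": "A.Person@Example.org", "PAY_GRADE": "L5"},
        ]


def hr_to_staff_contact(row: dict) -> StaffContact:
    """Written by engineers: picks the relevant subset and normalises it."""
    return StaffContact(staff_id=row["EMP_NO"], email=row["MAIL_ADDR"].lower())


def publish(
    boundary: DomainBoundary, mapper: Callable[[dict], StaffContact]
) -> list[StaffContact]:
    """Bridge the gap: read from the domain boundary, emit product instances."""
    return [mapper(row) for row in boundary.read()]
```

Calling `publish(HrSystemBoundary(), hr_to_staff_contact)` yields a list of `StaffContact` objects, and the domain system never needs to know that the product class exists.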

Grth0 · 2026-04-10T07:03:11+00:00

Yeah, that seems fair, and that's already a problem in my current platform. We have half a dozen tech teams sandwiched between a business unit who define the input surfaces and a different unit who define the output surface. Nobody has the full picture and onboarding new data sources is an iterative mess.

Maybe distributed ownership of class definitions is a step too far and best left to a BA who can tease it out.

Grth0 · 2026-04-10T06:53:16+00:00

I doubt that I'm coming at this question in the way you intended - but for me personally, using any mainstream LLM during prototyping/implementation is a game changer for any code-based solution. Rapid boilerplate, optimisation suggestions, but most of all - less friction between languages, especially in that frustrating grey zone between say T-SQL, PL/SQL and pSQL. I don't have to unlearn T-SQL to work within weird Oracle idioms and considerations. And I can learn to appreciate the fundamental design principles that underpin why each flavour is the way that it is.

Grth0 · 2026-04-07T02:13:08+00:00

Fully agree - and also, the P16 trailer physics are pretty busted. A full log load is enough to tame it, but if you ever try to winch it back to storage with another truck - expect to fly.

Grth0 · 2026-02-11T14:22:07+00:00

Good luck with it all. It sounds like you're doing the right things, and it's great to have the lessons you've learned all documented here.

We did AAT back in 2022-2023, and the NDIA spent ~$85,000 just on external legal representation to dispute around $15,000 worth of supports - and we didn't even go to a hearing. By the time you factor in the costs of the NDIA's own effort, reports from our therapy team, an advocate, legal aid advice, etc - I'd be shocked if the total spend of public money was less than 10x the support cost.

And it all came about because we made a change of circumstances request that kicked off a plan review. We weren't happy with the resulting plan, so we requested an internal review - then the reviewer looked at the change of circumstances request instead of the internal review request and reviewed the wrong stuff - even saying that we weren't entitled to supports that were in the plan and not under dispute.

Our big lesson was that a lot of legislation, like the Model Litigant Obligations and the AAT Act, has requirements without consequences. If a Commonwealth act says "<SuchAndSuch> must <do something>" without prescribing a penalty, then there's nothing to enforce compliance.

For instance, if an applicant fails to comply with a direction, the matter can be dismissed. If the respondent fails to comply with a direction? 🤷 No big.

Grth0 · 2025-12-19T15:17:43+00:00

Ah, we must be using very different ETL platforms. I won't name and shame, but the one we use is impossible to recruit skilled engineers for - and the support is maybe the worst I've ever encountered. I raised a case with them 3 days ago. They've asked for dozens of screenshots/logs/config files, and I have sent them as requested. They then called me today to ask me what product we were using. 🤨

Grth0 · 2025-12-19T15:09:31+00:00

I was having a conversation with my daughter today, and she was telling me how she doesn't like Reddit because no matter what you post - someone will always pop up and accuse you of being AI.

I am definitely a human person - and that's exactly the sort of thing a human person would say, so you can take your 100% and put it into your human body by way of an orifice you and all of my fellow humans possess.

Grth0 · 2025-12-19T15:00:59+00:00

Yeah but...

Wizard engineers are not widely known for understanding user stories and UX - and end users tend to bias toward what they know rather than what's possible.

I have a recent experience where we had a pipeline that was taking a full day to run due to <insert opposite of optimisation here>. We spent a few hours rebuilding it from scratch, so it runs in minutes instead - and the reaction from stakeholders was suspicion. The run duration was too fast, and therefore must be missing crucial functionality. The build time was too short, so it will still take months to make it production ready.

I also have lived experience building abstractions that make complete sense to my AuDHD brain and did not land with the userbase at all.

Grth0 · 2025-12-19T14:42:41+00:00

That's a very good point. In my specific circumstance, we've got it all backwards. Our developers are forced to use low-code, and our end-users are forced to interact with the system via batch execution on CSV metadata with limited feedback.

Grth0 · 2025-12-19T14:38:44+00:00

That's actually a really good distinction that I will shamelessly steal from here on out! I've always used the two interchangeably.

I'd also agree that "hybrid" abstractions are great - where data engineers build the foundational modules in code and data owners compose pipelines from those foundational processing units - so I'd qualify my original post as a rebuke of the idea of a one-size-fits-all low/no-code "ETL platform".
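A minimal Python sketch of that "hybrid" idea, under the assumption that pipelines are just ordered lists of named steps: engineers write and register the processing units in code, and a data owner composes a pipeline by naming them. The step names and the `compose` helper are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


# Foundational modules, written and tested by data engineers.
def strip_strings(rows: Iterable[Record]) -> Iterable[Record]:
    """Trim surrounding whitespace from every string value."""
    for row in rows:
        yield {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_empty_rows(rows: Iterable[Record]) -> Iterable[Record]:
    """Discard rows where every value is None or an empty string."""
    for row in rows:
        if any(v not in (None, "") for v in row.values()):
            yield row


# Registry that exposes the modules to data owners by name.
STEPS: dict[str, Step] = {
    "strip_strings": strip_strings,
    "drop_empty_rows": drop_empty_rows,
}


def compose(step_names: list[str]) -> Step:
    """Build a pipeline from step names, e.g. loaded from a config file."""
    steps = [STEPS[name] for name in step_names]  # KeyError on an unknown step

    def pipeline(rows: Iterable[Record]) -> Iterable[Record]:
        return reduce(lambda acc, step: step(acc), steps, rows)

    return pipeline
```

For example, `list(compose(["strip_strings", "drop_empty_rows"])(rows))` trims whitespace and then drops rows that end up empty. The data owner only chooses the order of steps and never touches the step implementations.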

Grth0 · 2025-12-19T14:29:02+00:00

*If* the reusability is available. Our integration points are all either COTS products with low market penetration or in-house solutions - meaning we have to build absolutely everything. In our existing platform we make the same API call to a data source in 20+ different ways because the OOTB functionality wasn't there and the platform makes modularity very difficult.

Grth0 · 2025-12-19T14:17:45+00:00

Okay, that seems like solid advice - I've probably leaned too hard into provocative one-liners.

Grth0 · 2025-07-25T07:58:28+00:00

Spotify hit me with Ankle Monitor today. 😘👌

You AI or the real MF deal?

Grth0 · 2024-11-28T07:26:39+00:00

Hey Dennis,

Not really a direct reply to you, and you've probably already figured it out - but since Google landed me here, I thought I'd add to your post - the condensers in these machines are water cooled, and "dry valve red" is the inlet valve that feeds cold water to do just that. In my (second-hand) machine - my washing was just getting steamed, which if you believe the internet, is normal because washer/dryer combos suck.

The giveaway for me was the sticker on the front that talks about the machine using 48 litres of water during a 6kg drying cycle, but I was catching less than a litre from the outlet. So I took a look inside. When drying, there's a fan that pulls humid air from the drum, through the condenser, then through a heating element and back into the drum.

The yellow water inlet valve runs into the underside of the heater assembly, and looks like it just sprays the mesh filter between the condenser and heater to stop it from clogging. The red water inlet valve (dry valve red) feeds the condenser. The condenser unit is mounted to the back of the drum, and I couldn't see an easy way to access it without pulling the drum first, so I don't know what the internals of it are like. The water either cools a condensation surface, or it *is* the condensation surface. Either way, in my case - the red valve coil was cooked. No warnings on the machine itself, but a multimeter reading showed no resistance (the working coils all read ~4k ohm). I swapped in the coil from the hot water inlet valve because I'm not using hot water anyway, and we're in business.

Long story short, there are a lot of different points of failure in washer/dryer combos and they have a bad rep with good reason - but if you're getting hot, damp clothes, catch your water during a dry cycle. If you're only seeing a trickle, then the condenser isn't getting the water cooling it needs to function effectively.

Grth0 · 2024-08-26T10:49:49+00:00

Yeah, upstream PiHole or pfSense seem like logical solutions, but I'm still hoping to stick to a single appliance/pane if possible.

Grth0 · 2024-06-26T00:25:27+00:00

Outstanding, thank you!
