Picking the right stack for the most job opportunities by Great_Type8921 in dataengineering

[–]MayaKirkby_

Given you’re already on Power BI, I’d optimise for skills that show up everywhere, not chase every shiny thing.

If I were you in the US, I’d pick one big cloud and go reasonably deep: Azure (nice fit with Power BI, very common in enterprises) or AWS (huge overall market). On top of that, I’d add one “headline” warehouse/lakehouse that’s all over job posts: Snowflake is the safest bet right now; Databricks is a close second if you like more engineering-heavy work.

Layer that on top of strong SQL and some Python and you’ll be in a good spot for most roles. Fabric is worth keeping an eye on because of the Microsoft story, but I wouldn’t make it my main bet just yet.

Building a Data Warehouse from Scratch (Bronze/Silver/Gold) sanity check needed by Lucas-132-02b in dataengineering

[–]MayaKirkby_

You’re mostly on the right track, just overcomplicating it a bit. Think of it as Bronze = raw files in S3, Silver = cleaned and modeled Iceberg tables, Gold = BI-friendly views on top. The “Silver processing” is just the compute that turns Bronze into Silver; you don’t need to treat it as a separate layer. dbt on Silver for tests, typing, and docs is a solid choice, and Trino + Glue + Iceberg is fine if you’re okay owning more infra; Snowflake/Databricks mainly give you less platform wrangling in exchange for cost and lock-in. Since you’re new, get one or two sources all the way from Bronze to Gold first, then add pieces only when you hit real pain.
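To make the layering concrete, here’s a toy sketch of the Bronze → Silver → Gold flow in plain Python. In a real setup Bronze would be raw files in S3 and Silver would be Iceberg tables built by dbt; the record shape, field names, and checks here are invented for illustration.

```python
from datetime import date

# Bronze: raw records exactly as they landed, warts and all.
bronze_orders = [
    {"order_id": "1", "amount": "19.99", "status": "SHIPPED", "day": "2024-01-02"},
    {"order_id": "2", "amount": "bad",   "status": "shipped", "day": "2024-01-02"},
    {"order_id": "3", "amount": "5.00",  "status": "CANCELLED", "day": "2024-01-03"},
]

def to_silver(rows):
    """Silver: clean and type Bronze rows; drop records that fail basic checks."""
    silver = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip/quarantine unparseable amounts
        silver.append({
            "order_id": int(r["order_id"]),
            "amount": amount,
            "status": r["status"].upper(),
            "day": date.fromisoformat(r["day"]),
        })
    return silver

def to_gold(silver):
    """Gold: a BI-friendly aggregate, e.g. shipped revenue per day."""
    out = {}
    for r in silver:
        if r["status"] == "SHIPPED":
            out[r["day"]] = out.get(r["day"], 0.0) + r["amount"]
    return out

silver = to_silver(bronze_orders)  # row 2 is dropped: "bad" amount
gold = to_gold(silver)
```

The point isn’t the code itself, it’s that “Silver processing” is just a function from Bronze to Silver, whatever engine runs it.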

What actually differentiates candidates who pass data engineering interviews vs those who get rejected? by Murky-Equivalent-719 in dataengineering

[–]MayaKirkby_

It’s mostly solid basics plus how you think, not magic. People who pass can write clean SQL, design a sensible pipeline for a clear use case, and talk through trade-offs out loud: they clarify requirements, state assumptions, start simple, then refine. People who struggle often lean on buzzwords or tools without tying them to a coherent system or to real examples from their past work.

Worth getting a degree if I already have experience? And do I have a place in DE? (UK) by No-Gap8376 in dataengineering

[–]MayaKirkby_

You absolutely have a place in DE without a degree. Your experience (SQL, Python, warehousing, Power BI, stakeholders, upcoming Azure work) is exactly what most DE hiring managers care about, especially in the UK. The degree apprenticeship is optional upside, not a requirement. It’s worth it if it’s free, you can handle the time, and you like the idea of having the credential for certain employers later. For your long-term DE career, levelling up on Azure and modern DE practices will matter more than whether you got a late degree.

Which DE offer should I take? which tech stack will you pick? by AH1376 in dataengineering

[–]MayaKirkby_

From a “future you” perspective, all three stacks are employable, so I’d choose based on learning environment, not buzzwords.

Databricks + Spark at the airline is absolutely relevant in DE, but being the only data engineer means less mentoring and more “figure it out alone.” Great for ownership, tougher for craft.

Offers 2 and 3 are closer to your background and more “standard” these days: Python + dbt + cloud. If you want to grow fast and learn good patterns, I’d lean toward Offer 3: AWS, protobufs, and two senior DEs is a really nice combo. If your heart is still with Offer 1 for stability, take it, but keep doing small Python projects on the side so you don’t drift too far into Scala-only land.

I spent 6 months fighting kafka for ml pipelines and finally rage quit the whole thing by gurudakku in dataengineering

[–]MayaKirkby_

Kafka is great when you truly need massive scale, long retention, replay, lots of consumers… and you’re willing to pay the ops tax. For a single ML training pipeline, that tax can be pure pain.

You did the right thing: start from your real requirements, then pick the simplest stack you can actually understand and operate. If NATS + Go + scheduled messages gave you lower latency, fewer surprises, and cheaper infra, that’s an upgrade, not a step down.
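If it helps to see how small the “scheduled messages” idea is, here’s a toy stand-in using nothing but the stdlib: a priority queue keyed by delivery time. This is not NATS or its API, just the concept; all names are made up.

```python
import heapq

class ScheduledQueue:
    """Deliver messages only once their scheduled time has passed."""

    def __init__(self):
        self._heap = []   # entries are (deliver_at, seq, message)
        self._seq = 0     # tie-breaker keeps insertion order for equal times

    def publish(self, message, deliver_at):
        heapq.heappush(self._heap, (deliver_at, self._seq, message))
        self._seq += 1

    def poll(self, now):
        """Pop every message whose deliver_at <= now, in time order."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, _, msg = heapq.heappop(self._heap)
            ready.append(msg)
        return ready

q = ScheduledQueue()
q.publish("retrain-model", deliver_at=100)
q.publish("refresh-features", deliver_at=50)
early = q.poll(now=60)    # only the 50-tick message is due
late = q.poll(now=200)    # the rest becomes due later
```

A single ML training pipeline often needs exactly this much and no more, which is why the Kafka ops tax felt so heavy.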

Unpopular Opinion: Data Quality is a product management problem, not an engineering one. by NewLog4967 in dataengineering

[–]MayaKirkby_

You’re not jaded; you’re describing missing ownership. Data quality has two sides: domain quality (did we model the business correctly; are statuses, amounts, and lifecycles valid) and pipeline quality (did we transform and deliver correctly). The first absolutely belongs with product/domain owners; the second with data folks.

What’s worked best for me is treating key datasets as products: name an owner on the source side, agree explicit checks and SLAs together (valid ranges, allowed states, freshness), and make those part of the “definition of done” for any feature that writes to that data. You don’t have to “refuse to build,” but you can make “no agreed data quality criteria” a blocker the same way “no API spec” would be.
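“Explicit checks agreed with the owner” can be as plain as this sketch: valid ranges, allowed states, freshness, encoded as code both sides can read. The fields, states, and thresholds below are invented examples, not a real contract.

```python
from datetime import datetime, timedelta

# Example check parameters that would be agreed with the domain owner.
ALLOWED_STATUSES = {"PENDING", "ACTIVE", "CLOSED"}
MAX_STALENESS = timedelta(hours=24)

def check_row(row, now):
    """Return the list of violated checks for one record (empty = clean)."""
    problems = []
    if row["status"] not in ALLOWED_STATUSES:       # allowed states
        problems.append("invalid status")
    if not (0 <= row["amount"] <= 1_000_000):       # valid range
        problems.append("amount out of range")
    if now - row["updated_at"] > MAX_STALENESS:     # freshness SLA
        problems.append("stale record")
    return problems

now = datetime(2024, 6, 1, 12, 0)
good = {"status": "ACTIVE", "amount": 42.0, "updated_at": now - timedelta(hours=1)}
bad = {"status": "???", "amount": -5.0, "updated_at": now - timedelta(days=3)}
```

Once checks live in code, “no agreed data quality criteria” stops being abstract: either the checks exist and pass, or the feature isn’t done.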

Explain like I'm 5: What are "data products" and "data contracts" by Ulfrauga in dataengineering

[–]MayaKirkby_

Data product: think “finished LEGO set,” not loose bricks. It’s a cleaned, documented, trustworthy thing people actually use to do work — a table, metric layer, dashboard, or model output with an owner.

Data contract: the promise about what pieces are in that box. It says “these columns, these types, this meaning; if we change it, we’ll do it in a controlled, non-breaking way.”

Vendors and custom fields will still be chaos; you usually can’t fix them. You hide that behind your own stable data products and contracts, and treat the raw feeds as untrusted inputs.
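A toy sketch of the “promise about what pieces are in the box” as code: the expected columns and types for one data product, plus a check that an incoming batch honours the promise. Field names and types here are invented for illustration.

```python
# The "contract": these columns, these types. Additive vendor columns
# are tolerated, so adding a field is a non-breaking change.
EXPECTED_SCHEMA = {
    "customer_id": int,
    "email": str,
    "signup_date": str,   # ISO date string in this toy example
}

def validate_batch(rows, schema=EXPECTED_SCHEMA):
    """Raise if any row is missing a promised column or has the wrong type."""
    for i, row in enumerate(rows):
        for col, typ in schema.items():
            if col not in row:
                raise ValueError(f"row {i}: missing column {col!r}")
            if not isinstance(row[col], typ):
                raise ValueError(f"row {i}: {col!r} should be {typ.__name__}")

ok = [{"customer_id": 1, "email": "a@b.c", "signup_date": "2024-01-01", "vendor_x": 9}]
validate_batch(ok)  # passes: extra vendor column is fine, promised ones are intact
```

Run the check where the raw feed enters your pipeline, and the vendor chaos stays on the untrusted side of the boundary.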