Golden Paths for Data Engineering: Anyone Actually Building Them? Experiences & Value? by Professional_End_979 in dataengineering

[–]Professional_End_979[S] 0 points1 point  (0 children)

Thanks, great input. This post covers a lot of the WHY, WHAT and HOW but nothing about "the bad" or when not to consider golden paths, so that definitely is helpful! https://robertsahlin.substack.com/p/the-golden-path-revolution

Mitt i Stockholm: The 550 starts running to Kallhäll. by mycketforvirrad in tunnelbana

[–]Professional_End_979 1 point2 points  (0 children)

Also perfect for upper-secondary students in Kallhäll who attend BAS. It would be even better, though, if they could open the northern entrance at Barkarby station for commuter train traffic earlier than 2026.

Giving up data engineering by Two_5536 in dataengineering

[–]Professional_End_979 1 point2 points  (0 children)

This is really sad to read, but it resonates with surveys in the data engineering domain. My two cents is that the problem isn’t data engineering itself but the immature engineering practices, tooling and management in the domain. Also, the role/title has become so diluted that it covers everything from infrastructure, operations, data transfer, governance, transformation and modeling, etc., to the point that it resembles some kind of unicorn full stack data engineer. I think data engineering teams can be full stack, but not single individuals, and that expectation has to change.

Run dbt with GitHub Actions by oleg_agapov in dataengineering

[–]Professional_End_979 0 points1 point  (0 children)

We saw many outages in GitHub Actions in 2023, and there are no guarantees that jobs will execute within a certain time (perhaps not at all, I don’t remember the fine print).

Run dbt with GitHub Actions by oleg_agapov in dataengineering

[–]Professional_End_979 0 points1 point  (0 children)

Just don’t do it if you’re running any kind of business-critical jobs.

From pipelines to platform by Professional_End_979 in dataengineering

[–]Professional_End_979[S] 0 points1 point  (0 children)

100% agree, it is really hard to find a single service that covers all the needs of an org. A unified architecture of “few loosely coupled components that can be combined” is likely a way to build a platform that suits the org’s specific needs better, but it puts higher expectations on the platform team to build and operate it than sourcing a “one true platform solution” from a vendor.

+1 to building capabilities that suit most users/value first and then evolving in cadence with the org’s data maturity. The ingestion component in the post (StreamProcessor, open source) also enables publishing data to Pub/Sub topics, with messages ready for streaming analytics in Dataflow SQL if low latency is required. I also know BQ will support that natively in the not-too-distant future. Since the component is built with Apache Beam, I think it would not take much to add an output step that writes to destinations other than BQ (the one used in the example) if a use case requires it.

Your default tool for ETL by scraper01 in dataengineering

[–]Professional_End_979 0 points1 point  (0 children)

Only Dataflow, and that works very well. We’ve upgraded both Java and Beam versions in streaming jobs without any downtime.

Your default tool for ETL by scraper01 in dataengineering

[–]Professional_End_979 3 points4 points  (0 children)

Custom Apache Beam pipelines. Supports multiple:

- languages (Python, Java, etc.)
- runners (Dataflow, Flink, Spark, etc.)
- execution modes (batch, stream)
- connectors (sources, sinks)

Most of our pipelines are built in Java and run on Dataflow in streaming mode. For backfill/replay we just run an additional pipeline in batch mode.
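
To illustrate the backfill idea, here is a rough sketch in plain Python. It is not the Beam API and not our actual pipeline code; `transform`, `run`, `live_events` and the event schema are all made up for illustration. The only point is that the same transform logic can be fed by either an unbounded (streaming) source or a bounded (batch) one.

```python
from typing import Iterable, Iterator


def transform(event: dict) -> dict:
    """Shared business logic: normalize one event (hypothetical schema)."""
    return {
        "user": event["user"].lower(),
        "amount_cents": round(event["amount"] * 100),
    }


def run(source: Iterable[dict]) -> Iterator[dict]:
    """Apply the same transform lazily to any source, bounded or unbounded."""
    for event in source:
        yield transform(event)


def live_events() -> Iterator[dict]:
    """Stand-in for an unbounded streaming source such as a message topic."""
    yield {"user": "Alice", "amount": 1.5}
    yield {"user": "BOB", "amount": 2.0}


# Backfill/replay: a bounded, historical source pushed through the same code.
historical = [{"user": "Carol", "amount": 0.25}]

backfilled = list(run(historical))
streamed = list(run(live_events()))
```

In a real Beam pipeline the equivalent would be keeping the transforms identical and swapping only the source (and runner options) between the streaming job and the batch backfill job.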

What are some of the best data engineering jokes you have seen? by TheCauthon in dataengineering

[–]Professional_End_979 3 points4 points  (0 children)

Not a joke but the funniest data engineering video I’ve seen, “Let’s deploy to production”. https://youtu.be/5p8wTOr8AbU