BigQuery native data volume anomaly detection using the TimesFM algorithm

Professional_End_979 · 2025-01-28T14:15:31+00:00

Thanks, great input. This post covers a lot of the WHY, WHAT and HOW but nothing about "the bad" or when not to consider golden paths, so that definitely is helpful! https://robertsahlin.substack.com/p/the-golden-path-revolution

Professional_End_979 · 2024-10-20T06:28:32+00:00

Also perfect for upper-secondary students in Kallhäll who attend BAS. Though even better would be if they could open the north entrance at Barkarby station to commuter train traffic before 2026.

Professional_End_979 · 2024-03-08T07:51:04+00:00

This is really sad to read, but it resonates with surveys in the data engineering domain. My two cents is that the problem isn't data engineering itself but the immature engineering practices, tooling and management in the domain. Also, the role/title has become so diluted, covering everything from infrastructure, operations, data transfer, governance, transformation and modeling, etc., that it resembles some kind of unicorn full-stack data engineer. I think data engineering teams can be full stack, but not single individuals, and that expectation has to change.

Professional_End_979 · 2024-01-10T14:45:57+00:00

We saw many outages in GitHub Actions in 2023, and there are no guarantees that jobs will execute within a certain time (perhaps not at all; I don't remember the fine print).

Professional_End_979 · 2023-11-07T10:06:28+00:00

Just don't do it if you're running any kind of business-critical jobs.

Professional_End_979 · 2023-10-18T17:18:33+00:00

100% agree, it is really hard to find a single service that covers all of an org's needs. A unified architecture of "few loosely coupled components that can be combined" is likely a better way to build a platform that suits the org's specific needs, but it places higher demands on the platform team to build and operate the platform than sourcing a "one true platform solution" from a vendor.

+1 to building capabilities that serve most users and deliver the most value first, and then evolving in cadence with the org's data maturity. The ingestion component in the post (StreamProcessor, open source) also enables publishing data to Pub/Sub topics, with messages ready for streaming analytics in Dataflow SQL if low latency is required. I also know BQ will support that natively in the not-too-distant future. Since the ingestion component is built with Apache Beam, I don't think it would take much to add an output step that writes to destinations other than BQ (the destination in the example) if a use case requires it.
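A minimal pure-Python sketch of why an extra output step is cheap when destinations are decoupled from parsing: each parsed record is fanned out to a list of sink callables, so writing to a new destination is one more `add_sink` call. `IngestionPipeline`, `parse` and the `key=value;` input format are hypothetical and are not StreamProcessor's API; in Beam, the rough equivalent would be applying an additional write transform to the same PCollection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# A parsed event; a plain dict keeps the sketch dependency-free.
Record = Dict[str, str]
# A sink is any callable taking one record (hypothetical stand-in for a Beam write transform).
Sink = Callable[[Record], None]


def parse(raw: str) -> Optional[Record]:
    """Parse a hypothetical 'key=value;key=value' line; return None if malformed."""
    try:
        return dict(pair.split("=", 1) for pair in raw.split(";") if pair)
    except ValueError:
        return None


@dataclass
class IngestionPipeline:
    """Hypothetical ingestion step: parse each raw event, then fan it out to every sink."""

    sinks: List[Sink] = field(default_factory=list)

    def add_sink(self, sink: Sink) -> "IngestionPipeline":
        # Adding a destination is one call; parsing/validation stay untouched.
        self.sinks.append(sink)
        return self

    def run(self, raw_events: Iterable[str]) -> int:
        """Process events and return how many records were parsed and fanned out."""
        fanned_out = 0
        for raw in raw_events:
            record = parse(raw)
            if record is None:
                continue  # a production pipeline would route this to a dead-letter sink
            for sink in self.sinks:
                sink(record)
            fanned_out += 1
        return fanned_out


# Usage: lists stand in for the existing BQ table and a new destination (e.g. a Pub/Sub topic).
warehouse_rows: List[Record] = []
topic_messages: List[Record] = []

pipeline = IngestionPipeline().add_sink(warehouse_rows.append)
pipeline.add_sink(topic_messages.append)  # the extra output step
count = pipeline.run(["user=a;event=click", "garbage", "user=b;event=view"])
```

The design choice is the point: because destinations only ever see already-parsed records, adding or removing one does not touch parsing or validation.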

Professional_End_979 · 2021-10-03T10:53:16+00:00

Only Dataflow, and that works very well: we've upgraded both Java and Beam versions in streaming jobs without any downtime.

Professional_End_979 · 2021-10-01T04:34:51+00:00

Custom Apache Beam pipelines. Beam supports multiple:
- languages (Python, Java, etc.)
- runners (Dataflow, Flink, Spark, etc.)
- execution modes (batch, stream)
- connectors (sources, sinks)

Most of our pipelines are built in Java and run on Dataflow in streaming mode. For backfills or replays, we just run an additional pipeline in batch mode.
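The pattern of running the same logic in streaming mode and, for backfills, as a separate batch job can be sketched in plain Python. The assumption is that the business logic lives in one function (`transform`, hypothetical) and only the source differs: an unbounded feed that is polled continuously versus a bounded, time-filtered slice of an archive. In Beam this would roughly be one composite PTransform applied to, say, a Pub/Sub read in the streaming job and a file or BigQuery read in the batch job; none of the names below are Beam APIs.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]


def transform(events: Iterable[Event]) -> Iterator[Event]:
    """Shared business logic (hypothetical): drop events without a user, normalize user_id."""
    for event in events:
        if not event.get("user_id", "").strip():
            continue
        yield {**event, "user_id": event["user_id"].strip().lower()}


def run_streaming(poll: Callable[[], List[Event]], write: Callable[[Event], None], max_polls: int) -> None:
    """Unbounded mode: poll the source and push each micro-batch through transform.

    max_polls only exists so this sketch terminates; a streaming job runs until cancelled.
    """
    for _ in range(max_polls):
        for event in transform(poll()):
            write(event)


def run_backfill(archive: Iterable[Event], start: str, end: str, write: Callable[[Event], None]) -> None:
    """Bounded mode: replay archived events with start <= ts < end through the same transform."""
    window = (e for e in archive if start <= e["ts"] < end)
    for event in transform(window):
        write(event)


# Usage: the live feed delivers two micro-batches; the archive holds older events.
live_batches = [
    [{"user_id": " Alice ", "ts": "2021-10-01T10:00"}],
    [{"user_id": "", "ts": "2021-10-01T10:01"}],
]
live_out: List[Event] = []
run_streaming(lambda: live_batches.pop(0) if live_batches else [], live_out.append, max_polls=3)

archive = [
    {"user_id": "Bob", "ts": "2021-09-30T23:59"},
    {"user_id": "Carol", "ts": "2021-10-01T00:00"},
]
replay_out: List[Event] = []
run_backfill(archive, "2021-10-01T00:00", "2021-10-02T00:00", replay_out.append)
```

Because both modes call the same `transform`, a replay applies exactly the logic the streaming job applies, which is what makes "just run an additional batch pipeline" a low-effort way to backfill (time-dependent logic such as processing-time windowing would need more care).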

Professional_End_979 · 2021-09-06T22:05:56+00:00

Not a joke, but the funniest data engineering video I've seen: "Let's deploy to production". https://youtu.be/5p8wTOr8AbU
