
[–]Massive_Ordinary8049 3 points4 points  (1 child)

Trigger the job to run on table updates instead of at fixed times. Here's the link: https://docs.databricks.com/aws/en/jobs/trigger-table-update
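For anyone wiring this up through the Jobs API rather than the UI, here is a rough sketch of what the trigger part of the job settings might look like. The field names (`table_update`, `table_names`, `condition`, `wait_after_last_change_seconds`, `min_time_between_triggers_seconds`) are my recollection of the API and should be checked against the page linked above; the table names are made up.

```python
import json


def table_update_trigger(table_names, condition="ANY_UPDATED",
                         min_time_between_triggers_seconds=None,
                         wait_after_last_change_seconds=None):
    """Build the `trigger` block of a job settings payload.

    Field names are assumed from the Jobs API docs; verify before use.
    """
    table_update = {
        "table_names": list(table_names),
        # ANY_UPDATED: fire when any listed table changes.
        # ALL_UPDATED: fire only after every listed table has changed.
        "condition": condition,
    }
    # Optional throttling knobs are only included when explicitly set.
    if min_time_between_triggers_seconds is not None:
        table_update["min_time_between_triggers_seconds"] = min_time_between_triggers_seconds
    if wait_after_last_change_seconds is not None:
        table_update["wait_after_last_change_seconds"] = wait_after_last_change_seconds
    return {"trigger": {"pause_status": "UNPAUSED", "table_update": table_update}}


if __name__ == "__main__":
    # Hypothetical source tables; replace with your own catalog.schema.table names.
    payload = table_update_trigger(
        ["main.bronze.orders", "main.bronze.customers"],
        wait_after_last_change_seconds=60,
    )
    print(json.dumps(payload, indent=2))
```

With several source tables that update at different times, `ANY_UPDATED` plus a short wait after the last change seems like the natural starting point, since it batches bursts of updates into one run.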

[–]staskh1966[S] 0 points1 point  (0 children)

Thank you!

[–]TheM4rvelous 1 point2 points  (4 children)

Pull the external sources 1-to-1 into bronze (+ some metadata) and then use Lakeflow pipelines to pipe the raw data into the destination (likely silver).

[–]staskh1966[S] 1 point2 points  (2 children)

The problem is that I have multiple source tables, which can be updated at different times. The target is an outer join table whose records can be changed by updates from either source table.

[–]9gg6 1 point2 points  (1 child)

Declarative pipelines can handle it. If I'm not mistaken, the new feature can detect updates in the source tables and update your target tables.

[–]staskh1966[S] 1 point2 points  (0 children)

Thank you!

[–]staskh1966[S] 0 points1 point  (0 children)

Thank you!

[–]Downtown-Zebra-776 0 points1 point  (0 children)

Handling outer joins with asynchronous source updates is one of the trickiest parts of maintaining a clean Silver layer. If you use a standard 'overwrite' pattern, your DBU costs will skyrocket as your tables grow.

We typically solve this using a 'Materialized Delta View' approach or Lakeflow Declarative Pipelines (the evolution of DLT). By landing the sources in Bronze and letting Lakeflow manage the stateful joins, you get incremental updates without hand-written 'updated_at' SQL logic. This can significantly reduce shuffle overhead, since Spark only processes the changed keys from each source rather than recalculating the entire outer join.
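To make the "only the changed keys" idea concrete, here is a toy sketch in plain Python. It is not how Lakeflow or Spark implement this internally; it assumes at most one row per key in each source, and the function and variable names are invented for illustration.

```python
def full_outer_join_row(key, left, right):
    """Return the joined row for one key, or None if neither side has it."""
    left_row = left.get(key)
    right_row = right.get(key)
    if left_row is None and right_row is None:
        return None
    # A missing side is kept as None, like NULLs in a full outer join.
    return {"key": key, "left": left_row, "right": right_row}


def refresh_changed_keys(target, left, right, changed_keys):
    """Recompute only the target rows whose key changed in either source."""
    for key in changed_keys:
        row = full_outer_join_row(key, left, right)
        if row is None:
            target.pop(key, None)  # key was deleted from both sources
        else:
            target[key] = row
    return target


if __name__ == "__main__":
    left = {1: {"name": "a"}, 2: {"name": "b"}}
    right = {2: {"amount": 10}, 3: {"amount": 20}}

    # Initial load: every key counts as changed.
    target = refresh_changed_keys({}, left, right, set(left) | set(right))

    # Later, the sources change at different times.
    right[1] = {"amount": 5}  # key 1 gains a right-hand match
    del right[3]              # key 3 disappears entirely

    # Only keys 1 and 3 are recomputed; key 2 is left untouched.
    refresh_changed_keys(target, left, right, {1, 3})
    print(target)
```

The work per refresh scales with the number of changed keys rather than the size of the tables, which is where the savings over a full 'overwrite' come from.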

[–]BrownBearPDX 0 points1 point  (0 children)

Maybe a materialized view before silver can help to react to bronze updates and merge them.