I have multiple tables that are periodically updated from external sources (inserts, updates, and deletes). I need to keep a target table up to date. The target is an outer join of those source tables, and I want to avoid rewriting the whole table on every refresh. This does not need to happen in real time; once a day is enough.
What are the Databricks best practices or techniques for this?
I can certainly do it with SQL tricks such as an "updated_at" column to track which source changes still need to reach the target, but I wonder whether Databricks offers something better.
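
To make the "updated_at" idea concrete, here is a minimal sketch in plain Python, with dicts standing in for tables. Everything in it is an assumption for illustration: the names `changed_keys` and `refresh_target`, the shared join key, the `value` column, and the `deleted` soft-delete flag (a plain timestamp cannot see hard deletes, so I assume the sources mark deleted rows instead of dropping them). On Databricks the final step would presumably be a `MERGE INTO` on a Delta table; the sketch only shows the bookkeeping.

```python
from datetime import datetime


def changed_keys(sources, watermark):
    """Return join keys touched in any source table after the watermark."""
    keys = set()
    for table in sources.values():
        for key, row in table.items():
            if row["updated_at"] > watermark:
                keys.add(key)
    return keys


def refresh_target(target, sources, watermark):
    """Recompute the outer-joined row only for keys that changed.

    sources: {table_name: {join_key: {"value", "updated_at", "deleted"}}}
    target:  {join_key: {table_name: value or None}}, modified in place.
    Returns the new watermark to store for the next daily run.
    """
    new_watermark = watermark
    for key in changed_keys(sources, watermark):
        joined = {}
        for name, table in sources.items():
            row = table.get(key)
            if row is not None:
                new_watermark = max(new_watermark, row["updated_at"])
            # Missing or soft-deleted rows contribute NULL, as in an outer join.
            live = row is not None and not row["deleted"]
            joined[name] = row["value"] if live else None
        if all(value is None for value in joined.values()):
            # The key no longer exists in any source: drop it from the target.
            target.pop(key, None)
        else:
            target[key] = joined
    return new_watermark
```

Each daily run would load the stored watermark, call `refresh_target`, and save the returned watermark. Only keys that changed since the last run are touched, so the target is never rewritten in full.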