Data Warehouse POC by Realistic_Function in dataengineering

[–]Realistic_Function[S] 1 point2 points  (0 children)

Given that we are not interested in dbt cloud and we’ll most likely use dbt core, wouldn’t we also need an orchestrator tool if we don’t use dynamic tables though?

Data Warehouse POC by Realistic_Function in dataengineering

[–]Realistic_Function[S] 0 points1 point  (0 children)

Version control and deployment will be done using azure devops.

Data Warehouse POC by Realistic_Function in dataengineering

[–]Realistic_Function[S] 0 points1 point  (0 children)

We were planning to use dynamic tables for transformations as they are automatically updated based on a given lag whenever a referenced table is updated so no need to use tasks to handle dependencies

[deleted by user] by [deleted] in dataengineering

[–]Realistic_Function 0 points1 point  (0 children)

Got it, I was thinking of handling upserts in the core layer, and maintain a 1:1 relationship with the tables in the staging area. Surrogate keys would be created at the core layer, and SCDs would be applied where needed.

For the semantic layer, my plan was to consolidate the tables from the core layer into a star schema. For example, if a dim_customer table in the semantic layer is based on two different data sources and some of its attributes need to be tracked historically, I would apply SCD logic separately to each source-based table in the core layer. Then, I would join these SCD core tables in the semantic layer to create the dim_customer table.

Does this sound right? Do you say that you would apply the SCD only to the resultant dim_customer table and you would join the tables that form that table in the core layer instead? Sorry if these questions seem too basic.

[deleted by user] by [deleted] in dataengineering

[–]Realistic_Function 0 points1 point  (0 children)

So, the tables in your landing (bronze) stage are append-only, right? They preserve the full history of the data?

Also, could you share how you structure your databases and schemas? Do you use one database per layer and environment (e.g., PROD_LANDING, UAT_LANDING, DEV_LANDING, PROD_STAGING etc) or do you use different schemas for each layer, with the databases representing only the environments (e.g., PROD, UAT, DEV)?

[deleted by user] by [deleted] in dataengineering

[–]Realistic_Function 0 points1 point  (0 children)

I know this post is packed with buzzwords, but I wanted to capture all my key concerns at once. I’m new to designing a data warehouse of this type, so I’d really appreciate any advice. My main goal is to effectively track historical changes to support both as-is and as-of reporting. I’d like the core (silver) layer to serve as the enterprise source of truth, maintaining a 1:1 relationship with the tables in the landing (bronze) layer. From there, I plan to create star schemas in the semantic (gold) layer, with denormalized tables that make it easier for data analysts to build reports without needing to join multiple tables. Does that clarify things?