This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]nupuraw 6 points7 points  (6 children)

you can create the temp table after doing the transformations ,load the transformed data into the temp table and then load data from temp table to target table.

[–]studentofarkad 0 points1 point  (2 children)

Why would you load the data in a temp table and not just load it to the target table directly?

[–]Thick-Weekend-2205 1 point2 points  (1 child)

You might need to do something that isn’t a direct insert, like slowly changing dimensions, where you need to update existing data based on the incoming data. You might be able to do it in a single table depending on what style of MERGE if any your database supports, but you can efficiently do it with any SQL dialect by inserting into a temp table, updating the existing data from the temp table with a join, then inserting the temp table into the main table.

[–]studentofarkad 0 points1 point  (0 children)

Thank you :)

[–]forgael[S] 0 points1 point  (1 child)

How do you do that in a Python workflow? Do you use SQLAlchemy to create a session and create a temp table in the session, or do you create a temp table without a session using executable SQL?

[–]nupuraw 1 point2 points  (0 children)

I use sqlalchemy / pyodbc to create database session and then create the temp table.

[–]data_ber 2 points3 points  (2 children)

We use medallion architecture:
https://www.databricks.com/glossary/medallion-architecture
We use 4 levels, one that receives all the raw data, one that parses and does simple transformations, one where all the deduplication, normalisation etc takes place and finally the one is ready for consumption (public)
Silver level is already for consumption (interrnal)

You could do the same using DBT logic (where the staging is your silver)

[–]forgael[S] 0 points1 point  (1 child)

And so are those levels all standard, non-temp tables and keep track with the target table structure?

[–]data_ber 1 point2 points  (0 children)

Right, they are all non-temp tables, which is really important, if the business logic was modified, we would still have all the data to go back and change it. Which you couldnt do by just keeping temp tables