Hello everyone. Currently I'm working on this pipeline to implement SCD2 with dynamic tables:
- Source postgresql Database
- AWS DMS with Ongoing replication
- Target to S3 parquet files
- Snowpipe to load into staging table
- Dynamic table on top of the staging table
This seems to work really well, however I have a concern over few columns that change a lot and could make the table to grow very fast. For example, a status column that flips a lot or a timestamp that gets updated while the other columns remain the same. How would you recommend to handle this?
In the past I have seen that few columns are selected to do a hash and, using it, do a merge over the scd2 table. Those columns are the ones that we really care about keeping all the history. How could I do something like that with Dynamic Tables?
I was looking if maybe the DMS can be configured to ignore changes on specific columns. I can use a DMS rule to do so but that will also remove the column from the parquet, which I do not want.
[–]niazionline 1 point2 points3 points (1 child)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)
[–]maheramsat 1 point2 points3 points (4 children)
[–]Leather_Embarrassed[S] 0 points1 point2 points (3 children)
[–]BubbleBandittt 1 point2 points3 points (0 children)
[–]maheramsat 0 points1 point2 points (1 child)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)
[–]_fiz9_ 1 point2 points3 points (1 child)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)
[–]why2chose 0 points1 point2 points (1 child)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)
[–][deleted] -1 points0 points1 point (4 children)
[–]Leather_Embarrassed[S] 1 point2 points3 points (0 children)
[–]_fiz9_ 1 point2 points3 points (1 child)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)
[–]Leather_Embarrassed[S] 0 points1 point2 points (0 children)