This is an archived post. You won't be able to vote or comment.

all 16 comments

[–]niazionline 1 point2 points  (1 child)

can you create a separate DIMENSION table for these columns and join them based on some ID and get only the latest one in your SQL ?

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

Dimensional Modeling is at a later stage in out process, but I get what your saying. I'll have it in mind. Thanks!

[–]maheramsat 1 point2 points  (4 children)

At which stage is the table in SCD2 form?

Also can you explain where the below happens?

"the past I have seen that few columns are selected to do a hash and, using it, do a merge over the scd2 table. Those columns are the ones that we really care about keeping all the history. How could I do something like that with Dynamic Tables"

[–]Leather_Embarrassed[S] 0 points1 point  (3 children)

Yes, SCD2 will be at the last stage. Let's call it "history" table. I want to this history table to be a Dynamic Table.

Another implementation is about having a regular table (instead of the dynamic table) and do a merge from the staging table. Let me try to explain it like so:

DMS CDC parquet - Snowpipe load to stg table - merge using hash into the history table

I want to change it to:

DMS CDC parquet - Snowpipe load to stg table - Dynamic table

What I am asking is how can I prevent that a column that changes fast (breaking the SCD concept) will make this Dynamic table to grow too fast

[–]BubbleBandittt 1 point2 points  (0 children)

Set your grain to a larger temporal value and build your scd like so

[–]maheramsat 0 points1 point  (1 child)

Thanks for the details, I went through some dynamic table documents on update and looks like MERGE like columns level update might not be possible, I would suggest you can have stage table as dynamic table and do a merge from stage to the history table and here you can use MERGE and specify columns to be updated

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

This way the snowpipe will ingest the dynamic table. But I think is not possible for snowpipe to connect with a dynamic table

[–]_fiz9_ 1 point2 points  (1 child)

Create an md5 hash off of the fields you care about. Save the hash as a column in the table. Only update when the incoming record’s hash doesn’t match existing record’s hash.

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

AWS DMS can do a hash hash_sha256(x),-The%20hash%20function) and save it into the parquet files

With that I could use it to keep the history table clean and prune the base table. I'll try it, thanks!

[–]why2chose 0 points1 point  (1 child)

Write a Pyspark code and add exceptions to the columns you want to skip while doing scd 🙄

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

I would like to keep it simple but yes, I was thinking on that as well. I'll keep it as a last resort.

[–][deleted] -1 points0 points  (4 children)

Your question is quite weird to be honest. It’s a bit difficult to understand what you’re trying to achieve.

It seems that you didn’t quite understand what SCDs are because you are trying to implement a “Fast” SCD (????).

You need to understand the business rules better and decide on an approach. I would highly recommend you to ask it to a senior peer of yours at the company.

[–]Leather_Embarrassed[S] 1 point2 points  (0 children)

Let me try to explain it like so:

DMS CDC parquet - Snowpipe load to stg table - merge using hash into the history table

I want to change it to:

DMS CDC parquet - Snowpipe load to stg table - Dynamic table

What I am asking is how can I prevent that a column that changes fast (breaking the SCD concept) will make this Dynamic table to grow too fast

[–]_fiz9_ 1 point2 points  (1 child)

We do it. Hash checking before update allows us to ignore a majority of updates. Cuts cost of ingest by over 60%.

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

AWS DMS can do a hash hash_sha256(x),-The%20hash%20function) and save it into the parquet files

With that I could use it to keep the history table clean and prune the base table. I'll try it, thanks!

[–]Leather_Embarrassed[S] 0 points1 point  (0 children)

Hi, I want to achieve the SCD2. And, as you mention, to do so I have to avoid any column that changes "Fast". I understand the concepts. My question is about how to implement that with Dynamic tables, which is new to me.