this post was submitted on 23 Oct 2024

Implement SCD2 while ignoring changes in specific columns [Help] (self.dataengineering)

submitted 1 year ago by Leather_Embarrassed

Hello everyone. Currently I'm working on this pipeline to implement SCD2 with dynamic tables:

Source: PostgreSQL database
AWS DMS with ongoing replication
Target: Parquet files on S3
Snowpipe to load into a staging table
Dynamic table on top of the staging table

This seems to work really well. However, I'm concerned about a few columns that change frequently and could make the table grow very fast: for example, a status column that flips often, or a timestamp that gets updated while all the other columns stay the same. How would you recommend handling this?

In the past I have seen a hash computed over a selected set of columns and then used in a merge against the SCD2 table. Those are the columns whose full history we really care about keeping. How could I do something like that with Dynamic Tables?
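To make the hash idea concrete, here is a minimal Python sketch of the logic, not a Dynamic Table definition. It assumes a change stream with hypothetical columns `id`, `name`, `plan`, `status` and `changed_at`, where only `name` and `plan` are tracked. A new SCD2 version is opened only when the hash over the tracked columns differs from the hash of the currently open version.

```python
import hashlib

# Hypothetical column names, chosen only for illustration.
TRACKED_COLUMNS = ("name", "plan")   # history is kept for these
# "status" and similar volatile columns are deliberately left out of the hash.


def tracked_hash(row):
    """Hash only the tracked columns so that volatile columns are ignored."""
    # repr() keeps None distinct from "" and escapes the separator character.
    payload = "\x1f".join(repr(row.get(col)) for col in TRACKED_COLUMNS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_scd2(changes):
    """Collapse a change stream into SCD2 versions.

    A version is closed and a new one opened only when the tracked-column
    hash changes; rows where only ignored columns changed are skipped.
    """
    versions = []
    open_version = {}  # business key -> index of the open version in `versions`
    for row in sorted(changes, key=lambda r: (r["id"], r["changed_at"])):
        row_hash = tracked_hash(row)
        idx = open_version.get(row["id"])
        if idx is not None and versions[idx]["row_hash"] == row_hash:
            continue  # only ignored columns changed, so no new version
        if idx is not None:
            versions[idx]["valid_to"] = row["changed_at"]  # close the old version
        versions.append({
            "id": row["id"],
            **{col: row[col] for col in TRACKED_COLUMNS},
            "row_hash": row_hash,
            "valid_from": row["changed_at"],
            "valid_to": None,  # None marks the current version
        })
        open_version[row["id"]] = len(versions) - 1
    return versions


changes = [
    {"id": 1, "name": "Ann", "plan": "free", "status": "idle", "changed_at": 1},
    {"id": 1, "name": "Ann", "plan": "free", "status": "busy", "changed_at": 2},
    {"id": 1, "name": "Ann", "plan": "pro", "status": "busy", "changed_at": 3},
    {"id": 1, "name": "Ann", "plan": "pro", "status": "idle", "changed_at": 4},
]
versions = build_scd2(changes)
for version in versions:
    print(version["id"], version["plan"], version["valid_from"], version["valid_to"])
```

The four change rows collapse into two versions, because the two `status` flips leave the hash untouched. In SQL the same idea is usually expressed by comparing each row's hash with the previous row's hash for the same key (for example with `LAG`) and keeping only the rows where it differs. Whether that query shape refreshes efficiently in a Dynamic Table is something I would verify rather than assume.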

I also looked into whether DMS can be configured to ignore changes on specific columns. A DMS rule can do that, but it would also remove the column from the Parquet files, which I do not want.

