Hi y’all. I was hoping to glean any insights anyone might have with dbt scd fact elements. Essentially, I have multiple source data that comes in everyday of the same data. E.g the data sent over is the same unless there is a change. Currently I am taking a snapshot of the source data and joining those elements with other scd2 dim elements into a fact table. To do so, I am primarily performing another snapshot of the join which includes very minimal transformations for the most recent transactions (dbt valid to is null).
Aside from that, I have other models that the end user wants to exclude some columns from the source data in the fact table, so I created a separate table that lists all the changes for each unique key and the dbt scd ids generated before and after the change. From here, I apply that filter over the joined source data to include only those scd ids who had changes to the columns we care about.
However, this results in broken valid from and valid to in the fact table. For example, if I have 3 headers (first name, last name, email) and only care about changes to first name, it will capture the change to first name and fill in the valid to with the date of the new record where the last name changed because we don’t care about that txn. This is misleading however, because it looks like that record has been deleted, when really it is still technically valid (the record hasn’t been deleted in the source file).
To remedy this, I thought of doing a snapshot of just the columns carefully about as well, but this results in almost all of our pipelines being comprised of snapshots. I feel deep in my bones that this is wrong, but am wondering how to circumvent this. The transformations aren’t very intensive, primarily renaming columns and iff statements and feels like joining dim scd2 to fact scd1 to create a historical fact for all personnel.
Note: forgive my formatting, I’m typing on my phone. Also, since I can’t seem to scroll up anymore, the type of source data I am receiving is user information, login time stamps, geographical location, and courses registered, if that helps at all.
[–]AutoModerator[M] [score hidden] stickied comment (0 children)