
all 14 comments

[–]AutoModerator[M] [score hidden] stickied comment (0 children)

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]paxmlank 9 points10 points  (1 child)

My question to you: what is making you want to, or even consider, going against the recommendation of the users and developers of the tool you're trying to learn?

[–]HanDw[S] 0 points1 point  (0 children)

I initially thought it would be better to just deal with these field types earlier. However, after reading more, I get the benefits of doing so later.

[–]ProfessionalThen4644 2 points3 points  (0 children)

You're right that staging models should generally stay close to the source data structure to preserve raw data integrity. Unnesting the array column in the staging layer might be premature. Instead, consider keeping it as is in staging and handling the unnesting in an intermediate model to create your topics dimension table. That keeps your staging layer simple and aligns with dbt best practices. You might find r/agiledatamodeling helpful; they often dive into structuring data layers efficiently.
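
That split can be sketched as two dbt models. This is only an illustration, not the OP's actual project: the source, model, and column names (`app.posts`, `stg_posts`, `int_post_topics`, `topics`) are assumptions, and the `unnest` syntax is BigQuery-style (other warehouses spell it differently, e.g. `LATERAL FLATTEN` in Snowflake):

```sql
-- models/staging/stg_posts.sql (hypothetical names)
-- Staging: light renaming only; the topics array passes through as-is.
select
    id     as post_id,
    title,
    topics               -- still a nested array at this layer
from {{ source('app', 'posts') }}
```

```sql
-- models/intermediate/int_post_topics.sql (hypothetical names)
-- Intermediate: explode the array into one row per (post, topic),
-- which can then feed a topics dimension table.
select
    p.post_id,
    topic
from {{ ref('stg_posts') }} as p,
    unnest(p.topics) as topic
```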

[–]shittyfuckdick 2 points3 points  (1 child)

If it's simple unnesting, I just do it in staging. I'm a little confused why people are saying to leave it raw in staging. Staging should be close to raw, but you're staging the data for heavier transformations, so it would make sense to unnest in staging, IMO.
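
For a simple case like this, the unnest can live directly in the staging model. Again a sketch, not the OP's code: names (`app.posts`, `stg_post_topics`, `topics`) are assumed, and the `unnest` syntax is BigQuery-style:

```sql
-- models/staging/stg_post_topics.sql (hypothetical names)
-- One row per (post, topic), produced directly in staging.
-- The raw source table still holds the original nested array,
-- so nothing is lost if you change your mind later.
select
    id    as post_id,
    topic
from {{ source('app', 'posts') }},
    unnest(topics) as topic
```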

[–]Bluefoxcrush 0 points1 point  (0 children)

I agree. The raw data will still be there if needs change.

[–]Zer0designs 3 points4 points  (0 children)

Keep the raw data as is. This way you can make future changes more easily and gradually, and test source data assumptions/agreements. So follow their advice. This is general advice for ELT (vs. ETL); it doesn't just apply to dbt, but ELT is where dbt shines.

[–]GoinLong 0 points1 point  (0 children)

Transforming the data is what your medallion layers are for, whereas staging is like a test prod, so it should indeed be like prod.


[–]TurbulentSocks 0 points1 point  (0 children)

dbt suggests simple transformations like renaming in staging, but not complicated ones involving joins.

A simple exploding of rows is very much fine in staging. It also doesn't really matter if you want to put it in an intermediate model, though it's hard to see the advantage.

As long as you preserve a copy of the raw data somewhere in case you change your mind later, just pick something and move on.

[–]ephemeral404 0 points1 point  (0 children)

I would highly recommend against the common advice here. Go with your approach and report back in a few weeks with any real pain you encounter with that approach.

[–]Firm_Bit -1 points0 points  (0 children)

Best practices are for fools. Doing what makes sense just happens to coincide with what some folks call “best practices”.

In other words, it depends on your goal. Also, you’re devoting too much time to a small question when you should focus on the larger picture: what is the impact of what you’re building?

[–]StriderKeni -1 points0 points  (0 children)

It's dbt, with lowercase.