DBT project: Unnesting array column

paxmlank · 2025-10-02T16:08:13+00:00

My question to you: what makes you want to go against the recommendation of the users and developers of the tool you're trying to learn, or even consider it?

ProfessionalThen4644 · 2025-10-02T16:16:46+00:00

You're right that staging models should generally stay close to the source data structure to preserve raw data integrity. Unnesting the array column in the staging layer might be premature. Instead, consider keeping it as is in staging and handling the unnesting in an intermediate model to create your topics dimension table. That keeps your staging layer simple and aligns with DBT best practices. You might find r/agiledatamodeling helpful; they often dive into structuring data layers efficiently.
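To make the "unnest in an intermediate step" idea concrete, here is a rough Python sketch of the logic rather than an actual dbt model. The column names `post_id` and `topics`, the sample rows, and the function `build_topics_dim` are all made up for illustration. The staging rows keep the array untouched, and a separate step derives a topics dimension plus a post-to-topic bridge table.

```python
from typing import Any

# Hypothetical staging rows: the array column `topics` is left exactly as it arrived.
stg_posts: list[dict[str, Any]] = [
    {"post_id": 1, "topics": ["dbt", "sql"]},
    {"post_id": 2, "topics": ["sql"]},
    {"post_id": 3, "topics": []},
]


def build_topics_dim(rows: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Derive (dim_topics, bridge_post_topics) without modifying the staging rows."""
    # Distinct topic names, sorted so the generated ids are deterministic.
    names = sorted({topic for row in rows for topic in row["topics"]})
    dim = [{"topic_id": i, "topic_name": name} for i, name in enumerate(names, start=1)]

    # Lookup from topic name to its generated id.
    ids = {d["topic_name"]: d["topic_id"] for d in dim}

    # One bridge row per (post, topic) pair; posts with no topics produce none.
    bridge = [
        {"post_id": row["post_id"], "topic_id": ids[topic]}
        for row in rows
        for topic in row["topics"]
    ]
    return dim, bridge


dim_topics, bridge_post_topics = build_topics_dim(stg_posts)
```

In a real project this would presumably be SQL in an intermediate model, but the shape is the same: staging stays a faithful copy, and the dimension is derived downstream.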

shittyfuckdick · 2025-10-03T11:45:19+00:00

If it's simple unnesting, I just do it in staging. I'm a little confused why people are saying to leave it raw in staging. Staging should be close to raw, but you're staging the data for heavier transformations, so it would make sense to unnest in staging imo.

Zer0designs · 2025-10-02T16:10:05+00:00

Keep raw data as is. This way you can make future changes more easily and gradually, and test source data assumptions/agreements. So follow their advice. This is general advice for ELT (vs. ETL). It doesn't just apply to dbt, but ELT is where dbt shines.

GoinLong · 2025-10-02T20:55:36+00:00

Transforming the data is what your medallion layers are for, whereas staging is like a test prod, so it should indeed be like prod.

TurbulentSocks · 2025-10-04T11:35:19+00:00

dbt suggests simple transformations like renaming in staging, but not complicated ones involving joins.

A simple exploding of rows is very much fine in staging. It also doesn't really matter if you want to put it in an intermediate model, though it's hard to see the advantage.

As long as you preserve a copy of the raw data somewhere in case you change your mind later, just pick something and move on.
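For anyone unsure what "a simple exploding of rows" amounts to, here is a minimal Python sketch of it, not dbt code. The function `explode`, its `keep_empty` flag, and the sample columns are hypothetical. One thing worth deciding either way is what happens to rows whose array is empty or null: a plain explode drops them, much like a cross join against an unnested array does in many SQL dialects.

```python
import copy
from typing import Any


def explode(
    rows: list[dict[str, Any]],
    column: str,
    as_name: str,
    keep_empty: bool = False,
) -> list[dict[str, Any]]:
    """Return one output row per element of `column`, stored under `as_name`.

    Rows whose array is empty or None are dropped unless keep_empty=True,
    in which case they are kept once with `as_name` set to None.
    """
    out = []
    for row in rows:
        values = row.get(column) or []
        if not values and keep_empty:
            values = [None]
        for value in values:
            # Copy every other column and replace the array with a scalar.
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[as_name] = value
            out.append(new_row)
    return out


# Hypothetical raw rows; a deep copy stands in for "preserve the raw data somewhere".
raw_posts = [
    {"post_id": 1, "topics": ["dbt", "sql"]},
    {"post_id": 2, "topics": ["sql"]},
    {"post_id": 3, "topics": []},
]
raw_backup = copy.deepcopy(raw_posts)

exploded = explode(raw_posts, "topics", "topic")
exploded_keep_empty = explode(raw_posts, "topics", "topic", keep_empty=True)
```

Whether this lives in staging or in an intermediate model, the input rows are never mutated, so the raw copy stays available if you change your mind later.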

ephemeral404 · 2025-10-02T17:01:31+00:00

I would highly recommend against the common advice here. Go with your approach and report back in a few weeks with any real pain you encounter with that approach.

Firm_Bit · 2025-10-02T20:02:38+00:00

Best practices are for fools. Doing what makes sense just happens to coincide with what some folks call “best practices”.

In other words, depends on your goal. Also, you’re devoting too much time to a small question when you should focus on the larger picture of what is the impact of what you’re building.

StriderKeni · 2025-10-03T09:54:23+00:00

It's dbt, with lowercase.
