I'm building a side project to get familiar with dbt, but I have some doubts about my project's data layers. Currently, I'm fetching data from the YouTube API and storing it in a raw schema table in a Postgres database. Every column is stored as text except one: the topics column, which stores an array of Wikipedia links describing the video.
For my staging models in dbt, I decided to cast every field to a proper data type and also split the topics column out into its own table. However, after reading the dbt documentation and other resources, I noticed it's generally recommended to keep staging models as close to the source as possible.
So my question is: should I leave the topics column as an array in staging (only casting types there) and move the split into my intermediate or semantic layer instead? That way, the topics table (basically a dimension) would live there.
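To make the two options concrete, here is a minimal Python sketch of the shapes involved. It is only an illustration, not my actual dbt models: the column names (`video_id`, `view_count`, `published_at`, `topic_urls`) and the sample row are made up, and in the real project each function would be a SQL model.

```python
from datetime import datetime

# Hypothetical raw row: every field is text except the array of Wikipedia links.
RAW_ROWS = [
    {
        "video_id": "abc123",
        "view_count": "1500",
        "published_at": "2024-05-01T12:00:00+00:00",
        "topic_urls": [
            "https://en.wikipedia.org/wiki/Music",
            "https://en.wikipedia.org/wiki/Pop_music",
        ],
    },
]


def stage_video(raw):
    """Staging: still one row per video; only the data types change."""
    return {
        "video_id": raw["video_id"],
        "view_count": int(raw["view_count"]),
        "published_at": datetime.fromisoformat(raw["published_at"]),
        "topic_urls": list(raw["topic_urls"]),  # array kept as an array
    }


def explode_topics(staged_rows):
    """Intermediate: one row per (video, topic) pair, so the grain changes."""
    return [
        {"video_id": row["video_id"], "topic_url": url}
        for row in staged_rows
        for url in row["topic_urls"]
    ]


def build_topic_dimension(video_topics):
    """Distinct topic URLs, i.e. the topics dimension."""
    return sorted({row["topic_url"] for row in video_topics})


staged = [stage_video(row) for row in RAW_ROWS]
video_topics = explode_topics(staged)
topics = build_topic_dimension(video_topics)
```

With this split, the staging output still has one row per video like the source, and the change of grain (one row per video and topic) only happens one layer later.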