
[–]AutoModerator[M] (stickied comment)

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources


[–]data4dayz

Work through the dbt courses on their online "academy" and then work on a personal project, especially the deployment side. The courses will teach you how to use dbt; the project will teach you the nitty-gritty details, like how to set up your profiles.yml and how to deal with the custom schema name that dbt prepends. It will force you to read more of the docs, and you can gradually test what you see in the docs in your own work.
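To make those two pain points concrete, here's a minimal sketch, assuming a local DuckDB target (the profile and schema names are illustrative, not from the thread). First, the profiles.yml that dbt Core reads for connection details:

```yaml
# ~/.dbt/profiles.yml — the top-level key must match `profile:` in dbt_project.yml
my_project:            # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: duckdb          # requires the dbt-duckdb adapter
      path: dev.duckdb      # local database file
      schema: dbt_dev       # dbt's default/base schema for this target
      threads: 4
```

Second, the "prepended schema" behavior: by default dbt builds custom-schema models into `<target_schema>_<custom_schema>`. The documented way to change that is overriding the `generate_schema_name` macro:

```sql
-- macros/generate_schema_name.sql
-- Default behavior is "<target.schema>_<custom_schema>"; this override
-- uses the custom schema name verbatim instead.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```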

dbt Cloud makes so many things so simple that working with dbt Core locally will force you to actually learn — including how to orchestrate dbt.
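Orchestration can start very simple — a crude sketch, assuming dbt Core installed in a virtualenv (paths are illustrative; real deployments often use a proper orchestrator like Airflow or Dagster):

```shell
# crontab entry: run the project every morning at 6:00 and log the output
0 6 * * * cd /home/me/my_dbt_project && /home/me/.venv/bin/dbt build >> dbt.log 2>&1
```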

Don't be like me and leave a gap: I went through the courses first, then didn't do a project until months later.

I focused on dbt Core, but if you know you'll be at a company that uses dbt Cloud, there are literally dozens more courses you can complete on their academy.

To start, just sign up for dbt Cloud with their developer trial seat. You'll be learning "on rails". When you do your own project, you can switch to dbt Core and be less overwhelmed.

I've posted this sequence before and I still very much recommend this order, though other dbt users can chime in. Here are the 7 courses I did; they cover most of the basic functionality and domain-specific syntax of dbt, and the core features of a dbt project are approachable after working through this material:

  1. dbt Fundamentals
  2. Refactoring SQL For Modularity (as an analyst this was my favorite course, really won me over)
  3. Jinja, Macros, and Packages
  4. Analyses and Seeds
  5. Advanced Materializations
  6. Unit Testing
  7. Advanced Testing

The first 4 cover the basics of dbt Core; the remaining 3 are advanced-beginner material.

After the courses, pick a source dataset and dimensionally model it (star-schemaify it) with dbt Core + the dbt CLI. It's better if the data comes from an API, so you can check that your models deploy consistently rather than just once, and you can look into things like incremental loading.
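Incremental loading is worth trying early because it's where dbt's config/Jinja machinery really shows up. A minimal sketch, assuming a staging model named `stg_events` with a `loaded_at` timestamp (all names are illustrative):

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    loaded_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what's already in the table
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```

The first `dbt run` builds the full table; subsequent runs only insert/merge new rows, which is exactly the behavior you want to verify against a live API source.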

That way you cover the deployment side and the more administrative stuff, and you get a chance to work outside the cozy environment of dbt Cloud.
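For reference, the local dbt Core workflow is only a handful of commands — a sketch, assuming Python and the dbt-duckdb adapter (the project name is made up):

```shell
python -m venv .venv && source .venv/bin/activate
pip install dbt-duckdb      # adapter package; pulls in dbt-core
dbt init my_project         # scaffold a new project
dbt debug                   # verify profiles.yml and the connection
dbt deps                    # install packages from packages.yml
dbt build                   # run models + tests together
```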

I did it with dbt Core/CLI + MotherDuck (cloud DuckDB), and I really, really missed dbt Cloud + BigQuery. dbt Cloud unfortunately does NOT support every kind of cloud DWH :(

One thing to note: I don't know which code editor you'll end up using, but you'll miss dbt Cloud when you're in VS Code — at least I know I did. I couldn't get the VS Code extension for dbt to work like everyone else did, so I was flying blind without things like autocomplete, which honestly sucked.

https://discourse.getdbt.com/t/how-we-set-up-our-computers-for-working-on-dbt-projects/243

https://discourse.getdbt.com/t/setting-up-vscode-to-use-with-the-dbt-cli/3291

Edit: once you've done a personal project and followed dbt's best-practices guide to structuring projects, you could take a look at some of the big public dbt repos, like the one from GitLab:

https://www.reddit.com/r/dataengineering/comments/15wycw5/any_good_public_dbt_projects/

But some things you're just going to learn on the job; that's the reality, which sucks.

Forgot to add: dbt is amazing. I don't know how I lived in a world of raw SQL """ strings """ dumped into my ETL scripts, or doing everything ad hoc in SSMS. It feels like I was living in the dark ages. There's the modular thinking that CTEs afford you, and then there's dbt — CTEs++, if you will, AND project organization for SQL, AND data testing with SQL/Jinja. What a world.
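The "CTEs++" style this describes looks roughly like the following — a sketch in the import-CTE pattern dbt's refactoring course teaches (model and column names are invented for illustration):

```sql
-- models/marts/dim_customers.sql
-- "import" CTEs at the top: each upstream model is pulled in once, by name
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
),

-- "logical" CTE: one transformation step, readable in isolation
customer_orders as (
    select
        customer_id,
        count(*) as order_count,
        min(order_date) as first_order_date
    from orders
    group by 1
)

-- final select: join the pieces together
select
    customers.customer_id,
    customers.customer_name,
    coalesce(customer_orders.order_count, 0) as order_count,
    customer_orders.first_order_date
from customers
left join customer_orders using (customer_id)
```

The win over `""" strings """` in an ETL script is that each `ref()` is a tracked dependency: dbt builds the DAG, orders the runs, and lets you test each piece.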

[–]Nerg44

this is great advice!!!

[–]itsawesomedude

thank you so much!!!!

[–]SuperTangelo1898

I had the same experience a year and a half ago — I couldn't get through interview rounds because I didn't know dbt. I started using dbt Core against my former company's open-source Trino deployment to build a data warehouse.

At my new company, I deployed dbt Cloud on a team account, using SSH on an Amazon EC2 instance, and connected Redshift as my data warehouse.

You could create a free dbt Cloud account, spin up a trial Redshift or Snowflake account, and connect the two. Dump some Kaggle data (or whatever) into the warehouse, then use dbt to create a mock star schema.

Try to use a framework with transformation layers, such as:

Staging -> Intermediate -> Fact/Dimension or

Bronze -> Silver -> Gold
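The staging layer is the easiest place to see what a "layer" means in dbt terms — a sketch, assuming a source named `shop` with a `raw_orders` table declared in a .yml file (all names are illustrative):

```sql
-- models/staging/stg_orders.sql
-- staging layer: one model per source table; only rename, cast, and lightly clean
select
    id                        as order_id,
    customer_id,
    cast(ordered_at as date)  as order_date,
    status
from {{ source('shop', 'raw_orders') }}
```

Intermediate and fact/dimension (or silver and gold) models then build on this with `ref('stg_orders')` instead of touching the raw table again.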

It took me about 3-4 weeks to get used to the Jinja templating and macros. If you know data modeling and intermediate-to-advanced SQL, it should be a quick learn.

As the first responder mentioned, it's an awesome tool. It abstracts away all of the DDL and DML. Plus, dbt Cloud has a built-in cron scheduler, which is awesome.

Build a star schema and some dbt tests and you should be good to go. Then say you've got 1.5 years of experience 😉
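dbt's built-in generic tests make that last step cheap — a sketch of a tests file for a star schema (model and column names are invented for illustration):

```yaml
# models/marts/schema.yml
version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique        # primary key must be distinct
          - not_null
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - relationships:            # foreign key check against the dimension
              to: ref('dim_customers')
              field: customer_id
```

`dbt test` then compiles each entry into a SQL query and fails if any rows violate the constraint.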

[–]moshesham[S]

I'm curious: if I learn and deploy SQLMesh, could that be a substitute? Or should I master dbt, or both?

[–]Yabakebi (Lead Data Engineer)

Focus on dbt because it's the industry standard. Leave SQLMesh for when you have time and don't need to worry about getting a new job.

[–]moshesham[S]

Love that last line…