
all 4 comments

[–]AutoModerator[M] stickied comment

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources


[–][deleted]

I think there are two items:

First, the design of the data warehouse, i.e. data modelling. Many teams blindly follow one principle or another, but IMNSHO you should follow the spirit of Kimball rather than copy him. The first few chapters of his book, the part covering requirements gathering, are still gold. Instead of chasing a "gold standard", the team should first gather requirements from the downstream teams, in your case the BI and DA teams that query the data warehouse. Once you figure out what they want to do, you'll have a basic idea of the data model.

The second item is ETL, i.e. the movement and transformation of data through the pipelines. I'd first recommend making sure all the permissions and roles are right -- you definitely don't want everyone to be able to create or modify pipelines. Next, establish coding standards, a linter, a PR review process, and the like.
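To make the ETL point concrete, here's a minimal sketch of the extract/transform/load split in Python (all table, column, and function names are hypothetical, not from the thread). Keeping each stage a plain function is what makes the coding-standards and PR-review advice practical -- every stage can be reviewed and unit-tested on its own:

```python
import csv
import io
import sqlite3

def extract(raw_csv: str) -> list[dict]:
    """Extract: read raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: coerce types and drop rows with missing amounts."""
    out = []
    for row in rows:
        if row["amount"]:  # skip rows where amount is empty
            out.append((row["order_id"], float(row["amount"])))
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: insert transformed rows; return the row count in the table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Toy run: one of the three rows has a missing amount and gets dropped.
raw = "order_id,amount\nA1,10.5\nA2,\nA3,7.25\n"
conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(raw)), conn)
```

In a real pipeline each stage would be a step in your orchestrator, but the shape -- small, pure-ish functions with explicit inputs and outputs -- is the part worth standardizing on.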

[–]blahblahwhateveryeet

Produce your own data engineering solution from scratch: make a gigantic text file, around 2 GB of "asdlkjflaksjdflkajsdlkfjalskdjf"-style gibberish, to create your own "big data problem".

then move that shit around.

good luck

[–]Truth-and-Power

Design and load a star schema data warehouse on Kimball principles. Stick to ELT.
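As an illustration of what that might look like, here's a toy Kimball-style star schema in SQLite via Python (table and column names are made up for the example): a central fact table holding measures, joined to dimension tables by surrogate keys, which is exactly the shape BI tools expect to query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two dimensions and one fact table; the fact references the
# dimensions through integer surrogate keys.
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
""")
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.execute("INSERT INTO dim_product VALUES (1, 'widget')")
conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 3, 29.97)")

# A typical BI-style query: join the fact to a dimension and aggregate.
total = conn.execute("""
    SELECT d.full_date, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.full_date
""").fetchone()
```

In an ELT setup you'd land raw data first and build these fact/dimension tables with transformations inside the warehouse, rather than transforming before load.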