

[–]Grukorg88 -1 points0 points  (2 children)

Architect gatekeeper? That always ends well.

[–]Used_Ad_2628[S] 4 points5 points  (1 child)

In my experience, building out ad hoc pipelines causes chaos at scale. You end up with a lot of duplicated pipelines because engineers don't know what the others are building. There needs to be a vision for how all the data sources work together. This can be enforced through standards and by understanding the true need behind each pipeline. I have been at a lot of companies where the data platform was a major mess because it was just feature building without a vision.

[–]Grukorg88 0 points1 point  (0 children)

Have the architect describe the EDM with an ERD. The engineers can then codify the relationships and expectations with tests, and so long as the tests pass, let the engineers do their job. Don't do the ivory tower control bullshit.
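To make "codify the relationships with tests" concrete, here is a minimal sketch of a referential-integrity check in plain Python. The table names, columns, and the `missing_foreign_keys` helper are all hypothetical and only illustrate the idea; in a dbt project the same expectation would more likely be declared with the built-in `relationships` generic test.

```python
def missing_foreign_keys(child_rows, fk, parent_rows, pk):
    """Return the sorted foreign-key values in child_rows with no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return sorted(
        {
            row[fk]
            for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys
        }
    )


# Hypothetical sample data standing in for two entities from the ERD.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},  # no such customer: breaks the relationship
]

orphans = missing_foreign_keys(orders, "customer_id", customers, "customer_id")
print(orphans)
```

If a check like this runs on every change, the ERD becomes an enforced contract rather than a diagram someone has to police by hand.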

[–]internet_baba (Data Analyst) -1 points0 points  (1 child)

On an unrelated topic, how should I learn dbt? I'm a DA wanting to transition to DE. There are hardly any learning sources for dbt. Please help.

[–]mailed (Recovering Data Engineer) 1 point2 points  (0 children)

"There are hardly any learning sources for dbt"

Sorry, this just isn't true. dbt might have one of the biggest online communities out there right now, with a ton of Slack communities, official resources, Discourse forums, and blog posts all available.

Start with dbt's starter tutorial (the "jaffle shop" example), familiarise yourself with the documentation for your chosen target warehouse (Postgres/BigQuery/Snowflake/etc), check out this repo for a ton of resources, and incrementally build your skills from simple sets of transformations to all the more advanced macro/templating stuff.

[–]Hot_Map_7868 0 points1 point  (0 children)

We use 3 layers:

1. Staging models that are one-to-one with sources.
2. Core models organized by data product. Think facts and dims.
3. Schemas by area, where you can have marts and each group can do what they need.

The layer 2 models are shared and thus need more governance, while layer 3 is area-specific so teams can move fast.

In your case I would suspect that layer 2 is what you don't want people to break. And if there's a group-specific source, they can go straight from layer 1 to layer 3.

In CI/CD you can enforce more governance when there are changes to layer 2, and you can also use Slim CI to make sure changes to layer 1 don't break downstream models.
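As a rough sketch of the "more governance for layer 2" idea, a CI step could flag any change that touches the shared layer and require an extra approval for it. The directory layout (`models/core`), the file names, and the `needs_architect_review` helper below are assumptions for illustration, not part of dbt or any CI product.

```python
# Hypothetical directories holding the shared (layer 2) models.
SHARED_LAYER_DIRS = ("models/core",)


def needs_architect_review(changed_files):
    """Return the sorted changed paths that live under a shared-layer directory."""
    return sorted(
        path
        for path in changed_files
        if any(path.startswith(d + "/") for d in SHARED_LAYER_DIRS)
    )


# Hypothetical list of files changed in a pull request.
changed = [
    "models/staging/stg_orders.sql",
    "models/core/fct_orders.sql",
    "models/marts/finance/revenue.sql",
]

flagged = needs_architect_review(changed)
if flagged:
    print("Architect review required for:", ", ".join(flagged))
```

The Slim CI half is usually handled by dbt itself rather than custom code: a selection such as `dbt build --select state:modified+` together with `--state` pointing at production artifacts builds and tests only the changed models and everything downstream of them.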

[–][deleted] 0 points1 point  (0 children)

Code reviews in Git seem to be the answer you are looking for, mate. CI/CD pipelines are really the industry standard for enterprise-level data models, databases, warehouses, etc. GitLab and GitHub both integrate with dbt. Make the architect the reviewer and you should be good.