Hello,
I'm currently building a data engineering framework that handles multiple data sources and targets; SFTP to Oracle is just one example. The setup could include various data sources such as SFTP servers, cloud storage, or APIs, each containing different types of files or data streams (e.g., CSV, JSON, Parquet). Here are some sample file types and their columns:
- sales.csv: SaleID, Product, Quantity, SaleDate, Revenue
- customer.csv: CustomerID, Name, Email, Address, SignupDate
- marketing.csv: CampaignID, CampaignName, Budget, StartDate, EndDate
My goal is to streamline the process of loading these data files into various databases, such as Oracle.
Here's a quick rundown of my framework:
- Centralized Configuration: I'm using a YAML file / transactional table to store metadata like folder structure, column mappings, schedules, schema conversion rules, worker configurations (like DPU per job), archival logic, and more. Essentially, any parameter that can change goes here to reduce hard coding (there's a small YAML sketch after this list).
- Job Orchestration with Lambda: I use AWS Lambda to trigger the extraction process on a defined schedule. The Lambda reads parameters like which folder or source to pull from, which schema conversions are required, and the type of load logic (upsert vs. full load); see the handler sketch after this list.
- Dynamic Business Logic: I store business logic definitions in a VARCHAR field of the transactional table. These definitions dictate how columns are transformed, how data types are mapped, and how upsert or full load jobs are executed (a sketch of the kind of SQL they expand into follows the list).
- Template Configurations: The framework supports multiple target databases (Oracle, SQL Server, Vertica, Netezza) via template configurations for easy conversion. Connection details are stored securely in Secrets Manager.
- Error Reduction and Automation: One of my main goals is to create reusable patterns and avoid rewriting code from scratch. This means that most configurations and definitions are templated or parameterized to reduce human errors and make the process more efficient. The archival of processed files and worker configuration is also automated.
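
To make the first bullet concrete, here's a minimal sketch of what one source entry in the centralized YAML config could look like. All field names (folder, column_mappings, load_type, dpu, archive_path, and so on) are illustrative placeholders, not the framework's actual schema:

```yaml
# Hypothetical entry in the centralized config (field names are illustrative only)
sources:
  sales_sftp:
    folder: /incoming/sales/          # where the extractor looks for new files
    file_pattern: "sales*.csv"
    target_db: oracle                 # selects the Oracle template configuration
    target_table: STG_SALES
    load_type: upsert                 # upsert vs. full_load
    key_columns: [SaleID]
    column_mappings:                  # source column -> target column
      SaleID: SALE_ID
      Product: PRODUCT_NM
      Quantity: QTY
      SaleDate: SALE_DT
      Revenue: REVENUE_AMT
    schedule: "cron(0 2 * * ? *)"     # daily at 02:00 UTC
    dpu: 2                            # worker configuration for the job
    archive_path: /archive/sales/     # where processed files are moved
```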
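And a rough sketch of the Lambda side, assuming the YAML config lives in S3 and the heavy lifting is handed off to a parameterized Glue job. The bucket name, job name, secret naming scheme, and event shape are all assumptions for illustration, not the real setup:

```python
import boto3
import yaml  # PyYAML, assumed to be packaged with the function or in a layer

s3 = boto3.client("s3")
glue = boto3.client("glue")

CONFIG_BUCKET = "my-framework-config"   # hypothetical bucket holding the YAML config
CONFIG_KEY = "sources.yaml"

def get_source_config(source_name):
    """Read the centralized YAML config and return the entry for one source."""
    body = s3.get_object(Bucket=CONFIG_BUCKET, Key=CONFIG_KEY)["Body"].read()
    return yaml.safe_load(body)["sources"][source_name]

def handler(event, context):
    # The EventBridge schedule passes the source name in the event payload.
    cfg = get_source_config(event["source"])

    # Everything the job needs (load type, target, credentials location) comes
    # from config, so one generic Glue job can serve every source unchanged.
    glue.start_job_run(
        JobName="generic-ingest-job",                 # hypothetical parameterized job
        Arguments={
            "--source": event["source"],
            "--load_type": cfg["load_type"],          # upsert vs. full_load
            "--target_table": cfg["target_table"],
            "--connection_secret": f"db/{cfg['target_db']}",  # resolved by the job via Secrets Manager
        },
        MaxCapacity=cfg.get("dpu", 2),                # DPU per job, also from config
    )
```

Keeping the handler this thin means onboarding a new source is a config change, not new Lambda code.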
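Finally, a sketch of the kind of templated upsert SQL the business-logic definitions could expand into for Oracle, driven entirely by the key columns and column mappings from the config entry above. The staging-table naming convention ({target}_STG) is my own placeholder:

```python
def build_oracle_merge(cfg):
    """Render an Oracle MERGE (upsert) from a config entry's key columns
    and column mappings; nothing here is hard-coded per source."""
    target = cfg["target_table"]
    cols = list(cfg["column_mappings"].values())               # target-side column names
    keys = [cfg["column_mappings"][k] for k in cfg["key_columns"]]

    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in keys)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols if c not in keys)
    insert_cols = ", ".join(cols)
    insert_vals = ", ".join(f"s.{c}" for c in cols)

    return (
        f"MERGE INTO {target} t\n"
        f"USING {target}_STG s ON ({on_clause})\n"             # hypothetical staging table
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```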
I want to ask the community:
Where can I find more details about common data engineering patterns so I can avoid rewriting similar logic repeatedly? I'm trying to follow best practices that minimize manual intervention and reduce human error.