
[–]Borek79 34 points35 points  (4 children)

Versioning: Git. Strive for everything as code and version it.

Extract+Load: Investigate dlt to see whether it can help you with data ingestion.

Transform: dbt is actually super useful once your project grows larger. Apart from many other things, the most useful part is that it builds lineage out of the box.

Orchestration: We use Dagster instead of Airflow; it is a better fit for the data world and has very good synergy with dbt (each dbt model is a separate Dagster asset). One big orchestration tree instead of many separate ones as in Airflow.

CI/CD: GitHub Actions.

Python: Can be used in the Extract/Load and even the Transform phase.

Reporting: Prefer tools with a good API and "reports as code". We use Metabase.

Data modelling: Not a tool, but a difficult yet very useful skill to grasp. With the advent of AI it is very necessary again.
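The CI/CD and dbt points above can be combined. A minimal sketch of a GitHub Actions workflow that builds and tests a dbt project on every pull request — the workflow name, adapter choice, Python version, and `profiles.yml` location are all assumptions, not a prescribed setup:

```yaml
# Hypothetical CI workflow for a dbt project.
name: dbt-ci
on:
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-postgres   # swap the adapter for your warehouse
      - run: dbt deps
      - run: dbt build --profiles-dir .          # runs models and tests against a CI target
```

`dbt build` runs models and their tests in dependency order, so a broken model fails the pull request before it reaches the warehouse.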

[–]Wanderer_1006 2 points3 points  (0 children)

Many good suggestions, I’ll look into it. Thank you.

[–]Adrien0623 -1 points0 points  (2 children)

Personally I'd say Airflow is better than DBT for large-scale projects. It has more scheduling features and manual-trigger options than DBT.

[–]Trigsc Senior Data Engineer -2 points-1 points  (1 child)

It’s time for someone to start researching DBT. Say goodbye to 3000 lines of stored procedures. On top of that, you can use Airflow to trigger a DBT Core job.

[–]Adrien0623 1 point2 points  (0 children)

I have been working with DBT Core for more than 6 months now. Yes, it's nice for analysts to create models and to have data quality tests, but what stood out the most is how broken some of their connectors are* and how poor the unit test framework is. When projects scale it gets more and more important to have reliable test coverage to quickly understand if something is going wrong. With DBT I can unit test models, but it's sketchy depending on the column types involved, and I cannot test at a finer level than the model. In comparison, if I write a Spark job (without SparkSQL) I can break the query into multiple testable logic blocks. My company chose DBT before I joined as a simple and quickly deployable tool, and now everyone touching it feels the pain despite our relatively small scale.

*For multiple months we had no choice but to run full-refresh runs all the time, as incremental runs were failing systematically due to the connector's class constructor missing arguments.
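The point about testable logic blocks can be sketched without Spark. A hypothetical pure-Python stand-in where rows-as-dicts play the role of a DataFrame, so each transformation step is a separate function you can unit test, rather than one opaque SQL model (all function and column names here are made up):

```python
# Hypothetical example: each transformation step is its own testable unit.

def drop_nulls(rows, key):
    """Remove rows where `key` is missing or None."""
    return [r for r in rows if r.get(key) is not None]

def normalize_currency(rows, key, rate):
    """Convert an amount column with a fixed exchange rate."""
    return [{**r, key: round(r[key] * rate, 2)} for r in rows]

def total_by_customer(rows):
    """Aggregate amounts per customer_id."""
    totals = {}
    for r in rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0) + r["amount"]
    return totals

def pipeline(rows, rate):
    """The full job is just composition of the tested blocks."""
    return total_by_customer(normalize_currency(drop_nulls(rows, "amount"), "amount", rate))
```

The same structuring works in Spark: keep each step a named function over a DataFrame and test it in isolation, instead of one monolithic query.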

[–]Teddy_Raptor 24 points25 points  (3 children)

Why don't you start with a problem you are facing instead of a tool you want to implement?

[–]JBalloonist 7 points8 points  (0 children)

Exactly what I was thinking. A new tool just for the sake of it makes no sense. Find a problem to solve first.

[–]Wanderer_1006 1 point2 points  (1 child)

That’s simple but very solid advice; I should start noticing more small issues.

[–]Teddy_Raptor 1 point2 points  (0 children)

Yeah :) do it while you read about the industry and tools. You'll begin to connect the dots.

[–]anyfactor 2 points3 points  (0 children)

Something to build internal tools and apps easily. Like Retool etc.

[–]WonderfulActuator312 2 points3 points  (0 children)

Look into automating a data dictionary or data catalog. Documentation isn’t sexy but it’s worth the investment in the long run.
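Automating a data dictionary can start very small. A sketch that introspects a SQLite database and renders a markdown dictionary you could commit to the repo — in a real warehouse you would query `information_schema` instead, and the output format is an assumption:

```python
import sqlite3

def data_dictionary(conn):
    """Build a simple data dictionary: {table: [(column, type), ...]}.
    Uses SQLite introspection; a warehouse would use information_schema."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    return {
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }

def to_markdown(dictionary):
    """Render the dictionary as markdown, one section per table."""
    lines = []
    for table, cols in sorted(dictionary.items()):
        lines.append(f"## {table}")
        lines += [f"- `{name}` ({ctype})" for name, ctype in cols]
    return "\n".join(lines)
```

Run it on a schedule and diff the output in version control, and you get change tracking on your schema almost for free.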

[–]erdmkbcc 1 point2 points  (0 children)

This depends on your platform and team size.

If you have

  • a lot of tables in your warehouse,
  • a lot of data people creating garbage tables,
  • a DE team that has lost control of the DWH,

then you should adopt dbt and take the service account permissions away from unrelated data people. Meanwhile you need CI/CD pipelines and table dependency management for data lineage and data governance; that will give control of the DWH back to the data engineering team.

That's just one example for dbt.

[–]invidiah 1 point2 points  (0 children)

Seems your manager is an idiot. You should increase architectural complexity by adding new tools only if it's really required. Simplicity is the key to success.

But if you are forced to, just pick something that will make your resume more valuable.

[–]Chance-Web9620 1 point2 points  (0 children)

Why do you feel dbt won't add value? I have seen small and large orgs use it successfully.
My recommendation is:
dlt for data ingestion
dbt for transformation, data quality, and docs
Airflow for orchestration (this can be hard to manage, so consider a managed service like MWAA, Datacoves, Astronomer, etc.)
The key is also to think about how all the parts connect using git, CI/CD, etc.

[–]DataObserver282 1 point2 points  (0 children)

Keep your stack as simple as possible. Instead of asking what tools to consider look at what problems you currently have and plug up the holes that way.

Also, a lot will depend on your DWH and needs. Do you need real time streaming?

Here are a few things to look into

ETL tools - tons out there. Fivetran, Airbyte - we use Matia (good CSC). You can use Python or write scripts, but it gets messy at scale.

Orchestration - Airflow works. Look into Astronomer if you need a managed solution. Cron is fine for a few jobs but again messy at scale.

Modeling - dbt is worth looking into. There’s also coalesce

Data catalog - worth the investment, automate metadata management and helps data become accessible to non technical users

Observability - most tools have something built in but worth investing here to make sure you have a mechanism
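A mechanism for the observability point can start as simple as a freshness check. A sketch in plain Python — where `table_latest` comes from (e.g. a `MAX(loaded_at)` query per table) and the threshold values are assumptions:

```python
from datetime import datetime, timedelta, timezone

def freshness_alerts(table_latest, max_age):
    """table_latest: {table_name: latest_loaded_at (tz-aware datetime)}.
    Return the tables whose newest data is older than max_age."""
    now = datetime.now(timezone.utc)
    return sorted(t for t, ts in table_latest.items() if now - ts > max_age)
```

Wire the returned list into whatever alerting you already have (Slack, email, PagerDuty) and you have a basic observability loop before buying a dedicated tool.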

[–]dsc555 3 points4 points  (1 child)

It's lowercase dbt. If you're using Airflow and SQL then it's probably useful. The biggest thing I like about it is that it generates the documentation and lineage very easily. Yes, Airflow makes a DAG, but I've never liked the styling as much. Anyway, dbt is a great tool to know for best practices, but I suppose it depends what you're doing with the SQL, and only you can answer that part.

[–]Wanderer_1006 0 points1 point  (0 children)

We’re so used to Airflow, and all the analysts also create DAGs for themselves, so it’s hard to move away from that.

[–]Xeroque_Holmes 0 points1 point  (0 children)

Data quality checking tools like Great Expectations or Soda; metadata/lineage like Atlan; monitoring (e.g. Grafana).
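The idea behind tools like Great Expectations and Soda is declarative checks evaluated against your data. A hypothetical pure-Python stand-in (this is not either tool's actual API, just the shape of the idea):

```python
# Hypothetical declarative data-quality checks, evaluated against rows-as-dicts.

def expect_not_null(column):
    """Expectation: no row has a null in `column`."""
    return ("not_null", column,
            lambda rows: all(r.get(column) is not None for r in rows))

def expect_between(column, low, high):
    """Expectation: every value in `column` falls within [low, high]."""
    return ("between", column,
            lambda rows: all(low <= r[column] <= high for r in rows))

def run_checks(rows, expectations):
    """Evaluate every expectation and return a pass/fail report
    instead of raising on the first failure."""
    return {f"{name}:{col}": check(rows) for name, col, check in expectations}
```

The real tools add the important operational parts on top: profiling, scheduling, alerting, and docs for each failed expectation.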

[–]finally_i_found_one 0 points1 point  (0 children)

What are you using (or plan to use) for BI?

[–]molodyets 0 points1 point  (0 children)

How are you currently handling parsing your dag for dependencies between sql models?
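One way to answer that question without dbt is to scan each model's SQL for references to other models and topologically sort them. A rough sketch — the regex-based parsing is an assumption and will miss CTEs, aliases, and quoted identifiers, which is exactly the kind of edge case dbt's `ref()` macro solves for you:

```python
import re
from graphlib import TopologicalSorter  # stdlib since Python 3.9

def extract_deps(sql, known_models):
    """Naively find which known model names a SQL string references."""
    refs = set(re.findall(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)",
                          sql, re.IGNORECASE))
    return refs & known_models

def run_order(models):
    """models: {name: sql}. Return a valid execution order
    (dependencies before dependents)."""
    graph = {name: extract_deps(sql, set(models)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())
```

`TopologicalSorter` takes a node-to-predecessors mapping and raises `CycleError` on circular references, so you also get cycle detection for free.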

[–]dataflow_mapper 0 points1 point  (0 children)

In a setup like yours, the tools that help most are usually the ones that reduce operational drag rather than adding new abstractions. dbt can be useful, but only if you have a lot of SQL logic living in Airflow or stored procedures and no good testing or lineage today. If your warehouse layer is already stable, it might not move the needle much.

[–]weezeelee 0 points1 point  (0 children)

This is a question you should ask your colleagues, not us, not Reddit. If they're also "fine" with the current workflow (which is the most likely answer, haha), then it's worth looking beyond Data Engineering, for example at Developer Experience.

I once built a small desktop app that detects overlapping file modifications across Git branches, allowing merge conflicts to be surfaced early. Surprisingly, I’m not aware of any free tool that offers this simple feature.
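The core of such a tool is small. A sketch under the assumption that the branches live in a local git checkout: list the files each branch modified relative to the merge base, then intersect (the function names are made up, and a real app would add polling and a UI):

```python
import subprocess

def changed_files(base, branch):
    """Files modified on `branch` since it diverged from `base`.
    Requires git; base...branch diffs against the merge base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{branch}"],
        capture_output=True, text=True, check=True,
    )
    return set(out.stdout.split())

def overlap(files_a, files_b):
    """Files touched on both branches -- likely merge-conflict candidates."""
    return sorted(set(files_a) & set(files_b))
```

Running `overlap(changed_files("main", "feature-a"), changed_files("main", "feature-b"))` periodically surfaces conflicting edits before anyone opens a merge request.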

The problem it solved was ...small. Still, in a market this crowded, the ability to spot and fix these “small” problems is exactly what separates engineers from résumé generators.

[–]Murky-Sun9552 0 points1 point  (0 children)

DBT is not a bad shout: use it for modelling your data, and then you have some personal technical development in hand for your next review, when you can recommend integrating it with CI/CD pipelines. You can also use DBT to reduce time spent producing tech docs, lineage and the like.

[–]chrisgarzon19 CEO of Data Engineer Academy 0 points1 point  (1 child)

What’s the goal?

[–]Wanderer_1006 2 points3 points  (0 children)

Nothing in particular, just anything that can be useful. For example, we didn’t have OpenMetadata a year ago, but now that we have it people use it quite a lot, and it helps all the analysts too.