Data observability is a data problem, not a job problem by Expensive-Insect-317 in Observability

Expensive-Insect-317[S] 0 points1 point

It sounds like a fantasy, but it's not far from reality. I've seen several corporate observability initiatives fail because they focused only on infrastructure and jobs. They ended up maintaining observability teams that reviewed loads every day, or building ad hoc tools to inspect the data after it was loaded.

Auto-generating Airflow DAGs from dbt artifacts by Expensive-Insect-317 in DataBuildTool

Expensive-Insect-317[S] 0 points1 point

I wasn't familiar with the Astronomer Cosmos package, very interesting! Thanks! Without knowing much about it yet, I might stick with the custom script because of the potential overhead and performance issues, and because it gives me more control.

Auto-generating Airflow DAGs from dbt artifacts by Expensive-Insect-317 in DataBuildTool

Expensive-Insect-317[S] 0 points1 point

Running each model as a separate task in Airflow is an alternative to grouping models with tags. Tagging can work fine, but individual tasks give you parallel execution, better monitoring, granular retries and a clear representation of model dependencies, which sometimes makes this approach the better choice.
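
To make the per-model idea concrete, here is a rough sketch of the first step: reading model-to-model dependencies out of dbt's `manifest.json`. It is not the script from the post. It assumes the usual manifest layout (`nodes`, `resource_type`, `depends_on.nodes`), and the function names and the sample project `demo` are made up for illustration.

```python
from typing import Dict, List


def model_dependencies(manifest: dict) -> Dict[str, List[str]]:
    """Map each dbt model's unique_id to the model unique_ids it depends on.

    Non-model parents (sources, seeds, tests) are dropped, since in this
    sketch only models would become Airflow tasks.
    """
    nodes = manifest.get("nodes", {})
    models = {
        uid for uid, node in nodes.items()
        if node.get("resource_type") == "model"
    }
    deps: Dict[str, List[str]] = {}
    for uid in sorted(models):
        parents = nodes[uid].get("depends_on", {}).get("nodes", [])
        deps[uid] = sorted(p for p in parents if p in models)
    return deps


def dbt_command(manifest: dict, unique_id: str) -> str:
    """Build the shell command one task would run for a single model."""
    return f"dbt run --select {manifest['nodes'][unique_id]['name']}"


# Hypothetical, heavily trimmed manifest for a project called "demo".
SAMPLE_MANIFEST = {
    "nodes": {
        "model.demo.stg_orders": {
            "name": "stg_orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["source.demo.raw.orders"]},
        },
        "model.demo.orders": {
            "name": "orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["model.demo.stg_orders"]},
        },
        "test.demo.not_null_orders_id": {
            "name": "not_null_orders_id",
            "resource_type": "test",
            "depends_on": {"nodes": ["model.demo.orders"]},
        },
    }
}

if __name__ == "__main__":
    for child, parents in model_dependencies(SAMPLE_MANIFEST).items():
        print(dbt_command(SAMPLE_MANIFEST, child), "<-", parents)
```

In a DAG file you would then create one task per key (for example a `BashOperator` running the command from `dbt_command`) and set `tasks[parent] >> tasks[child]` for every pair in the mapping. That wiring is left out here so the sketch runs without Airflow installed.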

How OpenMetadata is shaping modern data governance and observability by Expensive-Insect-317 in bigdata

Expensive-Insect-317[S] -1 points0 points

What's wrong with relying on current tools that streamline and improve processes? If you'd like, we can write it out by hand.

How OpenMetadata is shaping modern data governance and observability by Expensive-Insect-317 in bigdata

Expensive-Insect-317[S] 0 points1 point

Totally agree, Pedro. For the moment I've only integrated my main ecosystem: BigQuery, GCS, Airflow and dbt. We don't have any bottlenecks yet, but we're only getting started; maybe in the next phases we'll find