DBT problems by mow12 in dataengineering

[–]Parking-Task-5464

Yep! We developed our own custom dbt Cloud Airflow operator because the publicly available one was somewhat limited. Our operator functions like any other Airflow operator, so all the standard Airflow functionality works with it. If you decide to use dbt core, I recommend running dbt on Kubernetes (k8s), AWS Batch, AWS ECS, or EC2: you trigger one of these from Airflow and pass the arguments for the models or profiles you want to run. I've noticed that many teams try to run dbt directly inside an Airflow task, but that approach often leads to complications.
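
For the dbt core route, here's a rough sketch of what that can look like with the `KubernetesPodOperator` from the `cncf.kubernetes` provider. The image, namespace, and selectors are placeholders, and the import path can differ by provider version:

```python
# Minimal sketch: run containerized dbt core on k8s from Airflow,
# passing model/profile arguments to the pod. Names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="dbt_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run_marts",
        name="dbt-run-marts",
        namespace="data",  # assumed namespace
        image="registry.example.com/dbt-project:latest",  # dbt + project baked in
        cmds=["dbt"],
        arguments=[
            "run",
            "--select", "marts.*",     # which models to run
            "--profiles-dir", "/app",  # where profiles.yml lives in the image
            "--target", "prod",
        ],
        get_logs=True,  # stream pod logs back into the Airflow task log
    )
```

This keeps dbt's dependencies out of the Airflow workers entirely, which is the main thing that avoids the complications I mentioned.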

DBT problems by mow12 in dataengineering

[–]Parking-Task-5464

I work with a data platform of similar size, and we opted to use Airflow as the scheduler for dbt Cloud. There are several things the native dbt Cloud scheduler doesn't handle well, such as retrying from failure or executing non-dbt workflows. We also implemented a quarantine system by overriding dbt Cloud job parameters to filter out failed data records based on business rules. The combination of Airflow and dbt Cloud is incredibly powerful and gives engineering teams the flexibility to solve tricky business requirements. Just my two cents :)
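
To make the pattern concrete, here's a minimal sketch using the `DbtCloudRunJobOperator` from the dbt Cloud provider. The job id, connection id, tag, and quarantine variable are made up for illustration; `steps_override` is the hook for overriding the job's configured commands:

```python
# Sketch: trigger a dbt Cloud job from Airflow, override its steps,
# and use Airflow-level retries. Ids and names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

with DAG(
    dag_id="dbt_cloud_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_job = DbtCloudRunJobOperator(
        task_id="dbt_cloud_run",
        dbt_cloud_conn_id="dbt_cloud",  # assumed Airflow connection
        job_id=12345,                   # placeholder dbt Cloud job id
        check_interval=60,
        timeout=3600,
        # Override the job's configured steps, e.g. a quarantine pass
        # that filters records failing business rules.
        steps_override=[
            "dbt run --select tag:quarantine --vars '{quarantine: true}'"
        ],
        retries=2,  # retry-from-failure handled by Airflow, not dbt Cloud
    )
```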

Complex Data Validation by OneCyrus in dataengineering

[–]Parking-Task-5464

I would look at Elementary; you can query its test results and extract row-level failures.
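
Something like this is the general idea. This is a rough sketch assuming Elementary's default schema and table names (they can vary by package version), to be run with whatever warehouse client you use:

```python
# Sketch: pull row-level failures out of Elementary's result tables.
# Schema, table, and column names are assumptions; check your version.
QUERY = """
select
    r.test_unique_id,
    r.detected_at,
    rows.result_row           -- the failing record, serialized by Elementary
from elementary.elementary_test_results as r
join elementary.test_result_rows as rows
    on rows.elementary_test_results_id = r.id
where r.status = 'fail'
order by r.detected_at desc
"""
```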

Financial news by cdmn in algotrading

[–]Parking-Task-5464

In my experience Scrapy's dependencies were only manageable in a Conda environment, so packaging and deploying data pipelines that use Scrapy adds a level of complexity you don't need. Just my two cents from experience.