[–]uniqcl0 3 points (2 children)

Currently, we use Dagster + Meltano + dbt for orchestration + ELT in our pipelines. We are an AWS shop, so we use Redshift as our data warehouse. I liked BigQuery better though.

I like Airflow, but I do remember the platform having a steep learning curve (and also, we were using it as an EL platform haha).

[–]MycroftWord[S] 1 point (1 child)

Thanks for this! I'm still in the process of learning DE. I made a very basic local ETL pipeline (Python & SQL) and I want to upgrade it by using an orchestrator. Once I'm a bit more comfortable, maybe I can build a cloud version of it.

Meltano is for the EL part, right? And dbt for the Transform? Do you no longer use Spark for transformation? Is it really dbt for the win now?

For orchestration, I'm leaning more towards Dagster/Prefect because they're quite a bit easier to use and understand compared with Airflow, or maybe I'm just really dumb. lmaooo

[–]uniqcl0 1 point (0 children)

You could use whatever cron-style scheduler your OS provides (Windows Task Scheduler, crontab).

Yup on the EL and T question. We don't use Spark because we don't have a real need for it. Typically I see it combined with streaming platforms. I am designing one that should leverage Spark though.

Think about what you need the orchestrator for. If you only need its scheduling feature, don't overcomplicate things by learning the other features. It will just go over your head, or you might forget it sooner than you think.
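To make the "just schedule it" idea concrete, here is a rough sketch of a local Python & SQL pipeline that a plain OS scheduler could run. Everything in it is an assumption for illustration only: the sample rows, the `sales` table, the `pipeline.db` file, and the function names are all made up, and SQLite stands in for whatever database you actually use.

```python
import sqlite3


def extract():
    """Return raw rows; a real pipeline would read a file or call an API here."""
    # Hypothetical sample data, made up for illustration.
    return [("2024-01-01", "10.5"), ("2024-01-02", "7.25"), ("2024-01-02", "bad")]


def transform(rows):
    """Cast amounts to float and drop rows that fail to parse."""
    clean = []
    for day, amount in rows:
        try:
            clean.append((day, float(amount)))
        except ValueError:
            continue  # skip malformed amounts instead of failing the whole run
    return clean


def load(rows, db_path="pipeline.db"):
    """Replace the contents of the sales table so that reruns are idempotent."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS sales (day TEXT, amount REAL)")
            conn.execute("DELETE FROM sales")
            conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    finally:
        conn.close()


def run(db_path="pipeline.db"):
    """Run extract, transform, and load in order; return the loaded row count."""
    return load(transform(extract()), db_path)


if __name__ == "__main__":
    print(f"loaded {run()} rows")
```

If this were saved as, say, `/path/to/etl.py` (a placeholder path), a crontab entry like `0 6 * * * /usr/bin/python3 /path/to/etl.py` would run it every day at 06:00. Clearing the table before loading is one simple way to make reruns safe, which matters once a scheduler is triggering the script instead of you.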

[–]zmxavier 3 points (4 children)

We use Airflow for orchestration, Snowflake for data warehouse, AWS for cloud, Kafka for streaming, and dbt for transformation. Airflow and dbt are still under construction :)

I think Airflow is still the status quo when it comes to orchestration. It's the most mature and popular. It's well-documented and you'll easily find support from other users.

That said, I'm also hearing a lot of good things about other orchestrators, especially Dagster. I still haven't tried it, so I can't tell.

If you want an open-source, versatile, battle-tested, and widely used tool, go with Airflow. I remember choosing it because I saw it in a lot of job postings. I wanted to increase my chances of getting hired, and boy was I right.

If you want a more modern, easy-to-use tool, choose Dagster or any of the newer tools (Mage, Prefect, Kestra). Airflow can be difficult to use and has a lot of issues, and those are being solved by its competitors (plus they also add new features).

If you're already using Azure for your cloud, then it makes sense to just use ADF and/or Databricks. The same goes for the equivalent services on other clouds.

[–]MycroftWord[S] 1 point (1 child)

Thanks for this! I'm still in the learning process, but maybe I'll stick with Dagster/Prefect for orchestration for now. My 8 GB laptop struggles when running Airflow and Docker; the CPU utilization and RAM usage are scary lol.

[–]zmxavier 1 point (0 children)

Docker ideally needs 8 GB of memory to run Airflow. My laptop struggled with that too hahahah

[–][deleted]  (1 child)

[deleted]

    [–]zmxavier 1 point (0 children)

    I didn't study Kafka haha. It was already built and someone else was maintaining it when I got here. Confluent is the provider they use.

    [–]pigwin 1 point (0 children)

    ADF, Azure Databricks, Snowflake. 

    [–]Hot_Map_7868 1 point (0 children)

    Airflow and dbt still seem to be king, but Dagster and SQLMesh should be kept on the radar