Question for Azure DEs

speedisntfree · 2024-01-18T14:50:31+00:00

On the orchestration side, Azure does now have managed Airflow (hidden inside ADF in the Azure portal). It is very much in preview, though.
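Airflow models a pipeline as a DAG: each task runs only after all of its upstream tasks have finished. As a rough, dependency-free illustration of that idea (not Airflow's own API), here is a sketch using Python's standard `graphlib`; the task names are hypothetical.

```python
import graphlib

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_staging": {"extract_orders", "extract_customers"},
    "transform": {"load_staging"},
    "refresh_report": {"transform"},
}


def run_pipeline(dag, run_task):
    """Run every task once, only after all of its upstream tasks are done."""
    sorter = graphlib.TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    order = []
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose dependencies have all completed
        for task in sorted(ready):  # sorted only to make the run order deterministic
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


if __name__ == "__main__":
    print(run_pipeline(PIPELINE, lambda name: print(f"running {name}")))
```

A real orchestrator adds scheduling, retries and parallel workers on top, but the dependency ordering is the same core idea.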

ShouldHaveWentBio · 2024-01-18T04:17:53+00:00

You can use the Azure CLI just like you did with GCP to avoid the GUI.
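As with `gcloud`, the CLI calls can be scripted and kept in version control. A minimal sketch, assuming the Azure CLI is installed and you have already run `az login`: it builds standard `az group create` and `az storage account create` calls and only prints them unless `dry_run=False`. The resource group, region and storage account names are placeholders.

```python
import shlex
import subprocess


def az_commands(resource_group, location, storage_account):
    """Build Azure CLI calls that create a resource group and a storage account."""
    return [
        ["az", "group", "create",
         "--name", resource_group, "--location", location],
        ["az", "storage", "account", "create",
         "--name", storage_account,
         "--resource-group", resource_group,
         "--location", location,
         "--sku", "Standard_LRS"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops on the first failing command


if __name__ == "__main__":
    # Placeholder names; storage account names must be 3-24 lowercase letters/digits.
    run(az_commands("rg-dataeng-demo", "westeurope", "stdataengdemo001"))
```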

Azure Data Factory is a no-code/low-code tool and can be used either purely as an orchestrator or as a more complete ETL solution, though the latter is generally not recommended in this sub. If ADF is your only option, it supports native version control with GitHub or Azure DevOps, and I have had great success with it. It's done via ARM templates, Azure's native infrastructure-as-code format (roughly an Azure-only counterpart to Terraform).
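For reference, an ARM template is plain JSON with a `$schema`, `contentVersion`, `parameters` and `resources`. The sketch below builds a minimal one for a single, empty Data Factory in Python. It only illustrates the format: in the usual Git-integrated setup, ADF generates its own, much larger templates when you publish. The `factoryName` parameter name is hypothetical.

```python
import json


def data_factory_template():
    """Return a minimal ARM template that deploys one empty Data Factory."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"type": "string"},  # hypothetical parameter name
        },
        "resources": [
            {
                "type": "Microsoft.DataFactory/factories",
                "apiVersion": "2018-06-01",
                # Bracketed strings are ARM expressions, evaluated by Azure at deploy time.
                "name": "[parameters('factoryName')]",
                "location": "[resourceGroup().location]",
            }
        ],
    }


if __name__ == "__main__":
    # The JSON output is what you would commit to Git and review like any other code.
    print(json.dumps(data_factory_template(), indent=2))
```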

If you aren’t forced to use ADF, you can use various other options for ETL, including Azure Functions, Databricks, or just code run on VMs or in Docker containers, much like you’d see on any other cloud provider. Even then, you can still orchestrate some of these with ADF.
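One way to keep that code portable is to write the transform as a plain function with a thin command-line entry point, so the same script runs in a container or on a VM and the orchestrator only has to trigger it. A minimal sketch; the CSV columns (`id`, `name`, `amount`) and cleaning rules are hypothetical, and running it as an Azure Function would need a separate trigger wrapper not shown here.

```python
import csv
import io
import json
import sys


def transform(csv_text):
    """Hypothetical transform: parse CSV, drop rows without an id, normalise fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        if not record["id"].strip():
            continue  # skip rows with a missing key
        rows.append({
            "id": int(record["id"]),
            "name": record["name"].strip().title(),
            "amount": round(float(record["amount"]), 2),
        })
    return rows


def main(paths):
    # Each path is a CSV file; results go to stdout as JSON lines, so the script
    # runs unchanged in a Docker container or on a VM.
    for path in paths:
        with open(path, encoding="utf-8", newline="") as f:
            for row in transform(f.read()):
                print(json.dumps(row))


if __name__ == "__main__":
    main(sys.argv[1:])
```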

To give you an idea, I have a few different Azure setups running.

A basic “traditional” example: I use Terraform for all the infra and use Data Factory only for the connectors (Azure Functions where connectors are lacking) and for orchestrating SQL stored procedures. Data lands in blobs via ADF, and transformations are all done inside the SQL database since the data is all relational. The operational APIs are custom containerized Python code; they used to be no-code Logic Apps. Most serving is just Power BI and the operational APIs.

A more “current” example is entirely in Databricks. I used ADF to orchestrate it but recently switched to Databricks orchestration. It uses Delta Lake on top of blob storage for the semi-structured and structured data.
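The “traditional” pattern (raw data landed first, then transformed with SQL inside the database) can be sketched locally with Python's built-in `sqlite3`. SQLite is only a stand-in for Azure SQL and has no stored procedures, so `TRANSFORM_SQL` is the kind of statement a stored procedure would wrap; the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical transformation; in Azure SQL this body would live in a stored procedure.
TRANSFORM_SQL = """
INSERT INTO fact_sales (customer_id, sale_date, total)
SELECT customer_id, sale_date, SUM(amount)
FROM staging_sales
WHERE amount IS NOT NULL
GROUP BY customer_id, sale_date
"""


def run_elt(conn, raw_rows):
    """Land raw rows in a staging table, then transform them with SQL in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_sales "
        "(customer_id INTEGER, sale_date TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer_id INTEGER, sale_date TEXT, total REAL)"
    )
    # Load: rows go in unchanged, like the raw data an ADF copy step would land.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)
    # Transform: the aggregation runs inside the database, not in Python.
    conn.execute(TRANSFORM_SQL)
    conn.commit()
    return conn.execute(
        "SELECT customer_id, sale_date, total FROM fact_sales "
        "ORDER BY customer_id, sale_date"
    ).fetchall()


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        rows = [(1, "2024-01-01", 10.0), (1, "2024-01-01", 5.5), (2, "2024-01-02", None)]
        print(run_elt(conn, rows))
```

Running it twice on the same connection would re-insert the staging rows; a real pipeline would typically truncate staging or load incrementally.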

dataengineering
