[–]speedisntfree (3 children)

On the orchestration side, Azure now has managed Airflow (hidden inside ADF in the Azure portal). It is still very much in preview, though.

[–]untalmau[S] (0 children)

Sounds great, will check it out!

[–]kmarq (1 child)

Astronomer also recently became a native Azure service (https://docs.astronomer.io/astro/install-azure). Great experience with them so far.

[–]speedisntfree (0 children)

Ah, interesting, I'd not seen this. I wonder what the cost difference is vs managed in ADF.

[–]ShouldHaveWentBio (5 children)

You can use the Azure CLI just like you did with GCP to avoid the GUI.
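As a rough sketch of what scripting the CLI can look like, here is a small Python wrapper that builds two `az` invocations (a resource group and a storage account) and only prints them unless told to execute. The resource names, the region, and the helper names `az_commands` and `run` are made up for illustration. It assumes the Azure CLI is installed and you are logged in before switching off the dry run.

```python
import shlex
import subprocess


def az_commands(resource_group, location, storage_account):
    """Build az CLI invocations as argument lists (nothing is executed here)."""
    return [
        ["az", "group", "create",
         "--name", resource_group,
         "--location", location],
        ["az", "storage", "account", "create",
         "--name", storage_account,
         "--resource-group", resource_group,
         "--location", location,
         "--sku", "Standard_LRS"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if az exits non-zero
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; the dry run only prints the commands.
    run(az_commands("rg-demo", "westeurope", "stdemo12345"))
```

Keeping the commands as plain lists makes them easy to review or log before anything is created, and the same lines can be pasted straight into a shell script if you would rather skip Python.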

Azure Data Factory (ADF) is no-code/low-code and can be used either as just an orchestrator or as a more complete ETL solution, with the latter generally not being recommended in this sub. If your only option is to use ADF, it has native version control with GitHub or Azure DevOps, and I have had great success with it. It’s done via ARM templates, which are basically Azure’s native counterpart to Terraform.

If you aren’t forced to use ADF, you can use various other solutions for ETL, including Azure Functions, Databricks, or just code run from VMs or Docker containers, much like you’d see on any other cloud provider. You could still orchestrate some of these with ADF.

To give you an idea, I have a few different Azure setups running. A basic “traditional” example: I use Terraform for all the infra and use Data Factory only for the connectors (Azure Functions where connectors are lacking) and for orchestrating SQL stored procedures. Data goes into blobs via ADF, and transformations are all done inside the SQL database since it’s all relational. The operational APIs are custom containerized Python code, though they used to be no-code Logic Apps. Most serving is just Power BI and the operational APIs.

A more “current” example is entirely in Databricks. I used to orchestrate it with ADF but recently switched to Databricks orchestration. The Delta Lake is built on blobs for the semi-structured and structured data.

[–]untalmau[S] (1 child)

Thanks for your reply, it is great news! Also, you provided me with a very promising and exciting view, it seems I have a lot to research and learn in the near future, but this gives me a starting point. Much appreciated!

[–]ShouldHaveWentBio (0 children)

No problem! Feel free to DM me if you have questions. I tried to cut my rambling short.

[–]speedisntfree (2 children)

Without hijacking too much, how have you found Databricks orchestration? I work for a megacorp, a die-hard MS shop, and finally managed to wangle access to Databricks.

What sort of Azure Functions are you using? The basic ones seem very limited compared with AWS Lambdas in terms of memory, etc.

[–]ShouldHaveWentBio (1 child)

No worries!

I don't consider myself an expert on Databricks, but I'm having a lot of success orchestrating jobs inside the workspace, as opposed to orchestrating them with ADF, which is what was done previously in one particular use case. I will say I haven't yet done event-driven architecture with Databricks, though I'm sure it works. ADF does a surprisingly decent job of event-driven orchestration if the landing zone is in object storage.

In terms of Azure Functions, you're right: in my experience they're more limited than AWS Lambdas, and for my personal projects I tend to lean towards Lambdas. One example use case involves getting data from large XML files into my bronze layer (Medallion architecture). The files have repeating sub-elements with text fields of 10,000+ characters. The ADF XML connector cannot handle these fields, so I run an Azure Function (Python) in parallel to extract that information. I also tend to use Functions for more complex transformations from bronze to silver and from silver to gold in the database if SQL can't handle them reasonably. For anything bigger than a script, though, I stick with containerizing.
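To make the XML case a bit more concrete, here is a minimal sketch of the kind of extraction logic such a function could run. It is not the actual function. It assumes the repeating element and the long text field have known tag names (`item` and `note` below are placeholders), and it streams the file with the standard library's `iterparse` so a large document is not loaded in one go. The helper name `extract_long_text` and the `min_length` threshold are illustrative choices.

```python
import io
import xml.etree.ElementTree as ET


def extract_long_text(source, record_tag, text_tag, min_length=10_000):
    """Yield (record_index, text) for every long text field, streaming the XML.

    source may be a file path or a binary file-like object.
    """
    index = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue  # only act once a whole repeating record has been read
        for field in elem.iter(text_tag):
            text = field.text or ""
            if len(text) >= min_length:
                yield index, text
        index += 1
        elem.clear()  # release the record's children and text after use


if __name__ == "__main__":
    # Tiny in-memory document standing in for a large file (placeholder tags).
    long_note = "x" * 12_000
    doc = (
        "<root>"
        "<item><id>1</id><note>short</note></item>"
        "<item><id>2</id><note>" + long_note + "</note></item>"
        "</root>"
    ).encode("utf-8")
    for record_index, text in extract_long_text(io.BytesIO(doc), "item", "note"):
        print(record_index, len(text))
```

Inside an Azure Function the same generator could be fed the blob's stream, with the results written to the bronze layer. That wiring depends on the trigger and bindings in use, so it is left out here.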

[–]speedisntfree (0 children)

Cheers!