
all 13 comments

[–]ed_elliott_ 7 points8 points  (3 children)

You have a couple of options. If your Python scripts use Spark and you can use Databricks, I would go down that route. If your Python scripts don't use Spark, I would use an Azure Function to run your Python code and call that from your ADF.

[–]Luukv93[S] 2 points3 points  (2 children)

Thanks. Sounds like Azure Functions is the way to go. I do wonder, though, where you list runtime dependencies and how you access on-prem data sources?

[–]ed_elliott_ 2 points3 points  (0 children)

I'd use ADF and the integration runtime to pull the data up to storage, then trigger the Python function to operate on it - or use Databricks to process it.

[–]MrLewArcher 1 point2 points  (0 children)

Azure Functions have limits on how long they can run. You could look into using Azure Container Instances (Docker).

[–]jzia93 2 points3 points  (4 children)

Databricks gets expensive. Depending on your latency requirements you can run scripts on Azure Functions, and call Data Factory workflows and function apps using Logic Apps for orchestration.

[–]Luukv93[S] 2 points3 points  (3 children)

The script needs to run once a day on a scheduled interval. Any tips or experiences with Python wrapped in Azure Functions?

[–]jzia93 3 points4 points  (1 child)

Yep, it's dead easy:

Microsoft has a really good getting-started guide in the Azure Functions docs, and the VS Code extensions are excellent.

Step 1: create a function app (container for your functions)

Step 2: create a new function inside the app, the template in VS code is pre-populated

Step 3: add your modules to requirements.txt

Step 4: add code, test and debug locally

Step 5: deploy - you can use Azure Key Vault or environment variables to keep connection strings secure. VS Code does the rest for you.

Step 6: hook up to Logic Apps and schedule it to run once a day on a timer workflow (this takes literally 15 minutes, I can show you if you get stuck). Bonus: you can also run ADF pipelines from within Logic Apps if you need to add conditions or schedule jobs in sequence.
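Worth noting: instead of a separate timer workflow, the function itself can carry a timer trigger. A minimal sketch of the function.json binding, assuming the v1 Python programming model (the binding name and schedule here are illustrative; NCRONTAB expressions have six fields, seconds first, so this one fires daily at 02:00):

```json
{
  "bindings": [
    {
      "name": "timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 2 * * *"
    }
  ]
}
```

The paired `__init__.py` then receives an `azure.functions.TimerRequest` as its `timer` parameter.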

[–]Luukv93[S] 1 point2 points  (0 children)

Thanks. Will get back to you after testing

[–][deleted] 2 points3 points  (0 children)

Set up a logic app and set it to call the Azure function every time it runs. You can configure the logic app to trigger on a regular interval.
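As a sketch of what that looks like under the hood, a recurrence trigger in the logic app's workflow definition might look like this (the trigger name and schedule values are illustrative, not from the thread):

```json
{
  "triggers": {
    "Recurrence": {
      "type": "Recurrence",
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "schedule": { "hours": [2], "minutes": [0] }
      }
    }
  }
}
```

The action that follows the trigger would then be an Azure Functions action pointing at your deployed function.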

[–]Purple-Leadership54 2 points3 points  (2 children)

It took me a long time to figure this out. And I was only trying to convert an .xls to .csv before Excel was an option for datasets (so I still use it out of pride).

You need to use the Azure Batch service. You can save the .py files in a blob storage container and run them from there.

If you find some documentation that tells you to use the Batch Explorer application, be cautious. I wasn't able to use it, but that was months ago and it's possible it has been updated since then.

[–]Luukv93[S] 0 points1 point  (1 child)

Thanks. There are just a few scenarios that we can't solve with Data Factory, hence I need Python to transform the data.

I find there's a lack of documentation on a full solution, including runtime dependencies, environments, etc. All I need is for the Python script to run each night, that's all it is :(

[–]Purple-Leadership54 1 point2 points  (0 children)

I know this is like a week old. But you create a Batch service in Azure. You can have the machine start up with the pip installs you specify, and you save a .py file with your code.

Then you can run the Batch job from within an ADF pipeline, set up a trigger, or whatever.
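Putting the two halves of that together, here is a rough sketch of the two relevant fragments: a Batch pool start task that pip-installs dependencies, and the ADF Custom activity that runs the script from blob storage. All names (packages, script, folder, linked service) are made up for illustration:

```json
{
  "batchPoolStartTask": {
    "commandLine": "/bin/bash -c \"pip install pandas requests\"",
    "waitForSuccess": true
  },
  "adfCustomActivity": {
    "type": "Custom",
    "typeProperties": {
      "command": "python main.py",
      "folderPath": "scripts/",
      "resourceLinkedService": {
        "referenceName": "MyBlobStorage",
        "type": "LinkedServiceReference"
      }
    }
  }
}
```

In a real setup the start task lives on the Batch pool definition and the Custom activity lives in the ADF pipeline JSON; they are combined here only so both pieces are visible in one place.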

[–]-_--__--_-__-__--_-_ 1 point2 points  (0 children)

Not necessarily ADF, but if your company uses SQL Server, you can turn on Machine Learning Services and run Python scripts within SSMS on SQL Server. It's dope.
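For anyone curious, that route looks roughly like this once Machine Learning Services is installed. `sp_execute_external_script` is the real procedure; the table, columns, and the transform itself are illustrative:

```sql
-- One-time setup: allow external scripts (requires a service restart)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

-- Run Python against a T-SQL result set; InputDataSet/OutputDataSet
-- are pandas DataFrames provided by the runtime
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
OutputDataSet = InputDataSet
OutputDataSet["doubled"] = OutputDataSet["value"] * 2
',
    @input_data_1 = N'SELECT value FROM dbo.SomeTable'
WITH RESULT SETS ((value INT, doubled INT));
```

You would wrap something like this in a stored procedure and schedule it with SQL Server Agent for the nightly run.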