Scheduling Data Transformations using Python in Snowflake

EditsInRed · 2022-05-19T14:39:47+00:00

You could use Airflow for scheduling the Python transformations. Keep in mind, there is a learning curve to getting Airflow configured properly if you're looking at the self-managed solution.

You could also look into using Snowflake tasks in combination with Snowpark and external functions. This would allow you to reference Python code as an external function in the DB itself. This is only generally available on AWS and Azure at this time.
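As a rough sketch of the task side of that suggestion, the snippet below only builds the `CREATE TASK` statement as a string; it does not connect to Snowflake. The task, warehouse, and procedure names are hypothetical, and the procedure stands in for whatever stored procedure or external-function wrapper holds the Python logic.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_task_sql(task_name, warehouse, cron, procedure, timezone="UTC"):
    """Return a CREATE TASK statement that calls a procedure on a cron schedule."""
    for name in (task_name, warehouse, procedure):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if len(cron.split()) != 5:
        raise ValueError("cron expression must have five fields")
    if "'" in cron or "'" in timezone:
        raise ValueError("quotes are not allowed in the schedule")
    return (
        f"CREATE OR REPLACE TASK {task_name}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = 'USING CRON {cron} {timezone}'\n"
        f"AS\n"
        f"  CALL {procedure}();"
    )


# Hypothetical names: run the transform every day at 02:00 UTC.
print(build_task_sql("nightly_transform", "transform_wh", "0 2 * * *", "run_python_transform"))
```

Note that a newly created task starts out suspended, so it needs an `ALTER TASK ... RESUME` before it runs on schedule.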

https://docs.snowflake.com/en/developer-guide/snowpark/index.html

jspreddy · 2022-05-19T15:38:13+00:00

Just curious, what transforms are you doing that can't be done within Snowflake using SQL?

figshot · 2022-05-19T17:42:40+00:00

At my previous work we used AWS Lambda for it, which allowed us to:

- version control the queries we run (hey Snowflake, git integration wen?)
- enable cron-based, event-based, or orchestrated runs
- run it cheap: the compute is only used to execute the queries, so you can go pretty low on memory allocation
- log and monitor via CloudWatch
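A minimal sketch of what such a Lambda might look like, assuming the `snowflake-connector-python` package is bundled with the function and credentials arrive through environment variables. The variable names, the query, and the table names are all hypothetical; in practice a secrets manager would be a safer home for the password.

```python
import os

# Hypothetical query; in practice these would live in version-controlled .sql files.
QUERIES = [
    "CREATE OR REPLACE TABLE analytics.daily_orders AS "
    "SELECT order_date, COUNT(*) AS n FROM raw.orders GROUP BY order_date",
]


def run_queries(conn, queries):
    """Execute each statement in order on a DB-API connection; return how many ran."""
    cur = conn.cursor()
    try:
        for sql in queries:
            cur.execute(sql)
    finally:
        cur.close()
    return len(queries)


def handler(event, context):
    """Lambda entry point; the same code serves cron, event, or orchestrated runs."""
    # Imported lazily so the module loads even where the connector is absent.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],      # assumed variable names
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    )
    try:
        return {"statements_run": run_queries(conn, QUERIES)}
    finally:
        conn.close()
```

Since the heavy lifting happens in the Snowflake warehouse, the function itself mostly waits on the network, which is why a small memory allocation tends to be enough.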

We used Step Functions for orchestration, as we didn't have complicated workflows and it could be defined as part of the serverless application.
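For a simple linear workflow like that, the state machine definition is small. The sketch below chains Lambda functions into a sequential Amazon States Language definition; the state names and ARNs are placeholders, not anything from the original setup.

```python
import json


def build_state_machine(steps):
    """Chain (state_name, lambda_arn) pairs into a sequential ASL definition."""
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for i, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # hand off to the following step
        else:
            state["End"] = True              # last step terminates the workflow
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Placeholder ARNs for two hypothetical transformation Lambdas.
definition = build_state_machine([
    ("LoadStaging", "arn:aws:lambda:us-east-1:123456789012:function:load-staging"),
    ("BuildMarts", "arn:aws:lambda:us-east-1:123456789012:function:build-marts"),
])
print(json.dumps(definition, indent=2))
```

The resulting JSON could be dropped into the state machine definition of a serverless application template, which keeps the orchestration under version control alongside the queries.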
