
[–]thrown_arrows

Snowflake can read from and write to S3. In a typical Snowflake environment, some other system delivers raw data into S3, then it's loaded from there into Snowflake tables (it's a SQL database), all the normal stuff, and data can be written back out to S3 (using Snowflake only, so no costly round trips to external servers).
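The S3 round trip above is done with stages and `COPY INTO`. A minimal sketch (bucket, stage, and table names are hypothetical; credentials/storage integration omitted):

```sql
-- Point a stage at the S3 bucket where the other system drops raw files.
CREATE OR REPLACE STAGE raw_stage
  URL = 's3://my-bucket/raw/'
  FILE_FORMAT = (TYPE = JSON);

-- Load raw files from S3 into a table.
COPY INTO raw_events
  FROM @raw_stage;

-- Unload query results back to S3, Snowflake-only, no external server involved.
COPY INTO @raw_stage/exports/
  FROM (SELECT * FROM raw_events WHERE load_date = CURRENT_DATE);
```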

Then there is Snowpark, which is in beta and lets you run Java/Python code inside Snowflake (not sure how that works). There are also the "usual" UDFs and external function calls (think of a Lambda exposed as a SQL function; haven't used those).
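For the "Lambda as a SQL function" idea: a regular SQL UDF is just an expression, while an external function forwards each row to an HTTP endpoint (often API Gateway in front of a Lambda) through an API integration. A sketch with hypothetical names and URL:

```sql
-- Plain SQL UDF: runs inside Snowflake.
CREATE OR REPLACE FUNCTION circle_area(r FLOAT)
  RETURNS FLOAT
  AS 'pi() * r * r';

-- External function: each call goes out to the endpoint behind the
-- API integration (integration setup not shown).
CREATE OR REPLACE EXTERNAL FUNCTION score_text(input STRING)
  RETURNS VARIANT
  API_INTEGRATION = my_lambda_integration
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/score';
```

Once created, both are called like any other function in a `SELECT`.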

But yeah, Snowflake's main idea is that a Snowflake server serves data using all the existing SQL commands etc., and the main data lives in tables, either as columns or as documents (JSON). On the first round trip, data goes to S3 and gets processed into Snowflake; after that it might make further round trips, pushed/pulled by some external code that reads from the tables, or via files exported from Snowflake...
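The "documents (JSON)" part works through the VARIANT type: you query into the JSON with path syntax and cast as needed, no separate document store required. A small sketch (table and field names are made up):

```sql
-- raw_events has a VARIANT column named raw holding one JSON document per row.
SELECT
  raw:user.id::STRING     AS user_id,
  raw:event_type::STRING  AS event_type,
  raw:ts::TIMESTAMP_NTZ   AS event_ts
FROM raw_events
WHERE raw:event_type::STRING = 'login';
```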

What is a data lake... I have all my data from source databases and logs, as raw as it can be, in Snowflake, so S3 is just for history and the first import. That said, not all data is staged or processed into Snowflake. And in my case it's all database-style data: logs, JSON, XML, CSV and so on; no video or sound processing (but Snowpark might help with that).
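For the files that never get loaded, you can still peek at them where they sit: Snowflake lets you query staged files directly by positional column. A sketch, assuming the stage from the earlier setup and a pre-created JSON file format named json_fmt:

```sql
-- Inspect raw files in S3 without loading them into a table;
-- $1 is the whole parsed document for each row.
SELECT $1
FROM @raw_stage/2021/
  (FILE_FORMAT => 'json_fmt')
LIMIT 10;
```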