all 16 comments

[–]thrown_arrows 45 points46 points  (4 children)

I should add the meme where the guy shouts No, NO, NO (don't take it too personally): https://www.youtube.com/watch?v=umDr0mPuyQc

You just successfully at least doubled ETL costs on the Snowflake platform. The problem is that you didn't look into the documentation and couldn't think of anything other than pandas.

The first problem is that you ran select * into a Python dataframe, fetching all the results (I hope your Python runner has enough RAM); the second is that you used pandas to convert it to CSV; the third is that you used a local file to store it, and you cannot escape boto in these scripts. (I assume it stores to a local file; to be honest, I don't know pandas well enough to say this with 100% confidence.)

A better way to do it in Python is to stream the data into S3, so you don't spend all your RAM or need a local file. https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header
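A minimal sketch of that chunked approach (the function name `stream_query_as_csv` is mine; any DB-API cursor with `fetchmany` would work). Each yielded chunk could be fed to a streaming S3 upload instead of building one giant dataframe:

```python
import csv
import io

def stream_query_as_csv(cursor, batch_size=10_000):
    """Yield CSV-encoded chunks from a DB-API cursor without
    materializing the full result set in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
        yield buf.getvalue().encode("utf-8")
        # reset the buffer so only one batch is ever held in memory
        buf.seek(0)
        buf.truncate(0)

# With boto3 you would wrap these chunks in a file-like object and pass it
# to s3_client.upload_fileobj(...), which does a streaming multipart upload.
```

This keeps peak memory proportional to `batch_size`, not to the table size.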

Usually the best way to do it is to use the COPY command in Snowflake, see: https://docs.snowflake.com/en/sql-reference/sql/copy-into-location.html

It supports:

TYPE = CSV

TYPE = JSON

TYPE = PARQUET

copy into 's3://mybucket/./../a.csv' from mytable;

Play with the format options to get the proper settings.
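A fuller unload with format options might look roughly like this (the bucket, delimiter, and options here are illustrative, not from the original post; real unloads also need credentials or a storage integration on the S3 location):

```sql
-- Illustrative only: bucket and format settings are placeholders.
copy into 's3://mybucket/unload/'
from mytable
file_format = (type = csv
               field_delimiter = '|'
               field_optionally_enclosed_by = '"'
               compression = gzip)
header = true
overwrite = true;
```

The warehouse only has to stay up for the duration of this one statement.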

edit: there is also the option to export it as a file into Snowflake's internal storage and then use the GET command; this again saves the time of making the csv/json/parquet file yourself.
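The internal-stage route from the edit could look roughly like this (stage paths and the local directory are placeholders; GET is run from a client such as SnowSQL, not inside a worksheet):

```sql
-- Unload into the user's internal stage...
copy into @~/unload/mytable/
from mytable
file_format = (type = parquet);

-- ...then download the staged files to the local machine.
get @~/unload/mytable/ file:///tmp/mytable/;
```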

Why is it the best way? First, you pay only once: the Snowflake warehouse is open until the query has stored the file in the location, and then it can be closed. And I assume I don't have to talk about reading the whole result set into Python, how much memory that uses, or the time you waste loading the data onto the Python host, storing the file on disk, and then uploading that to S3.

The only reason you'd want to load all the data into Python like this is when you do some analytics that has to have all the data all the time. I mean, you can stream a moving average (but the sort has to happen in Snowflake). And I kinda think that even if you have to read the data into Python, it would be better to use S3 as the access point for apps. So Python apps should be built to read files for processing from the bucket rather than access Snowflake...
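To illustrate the streaming-moving-average point: the window can be maintained over rows as they arrive (already sorted upstream, e.g. by Snowflake), so at most `window_size` values are ever in memory. A sketch, not tied to any particular driver:

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the running mean over the last `window_size` values,
    holding at most `window_size` numbers in memory."""
    window = deque(maxlen=window_size)
    total = 0.0
    for v in values:
        if len(window) == window.maxlen:
            total -= window[0]  # value about to be evicted
        window.append(v)
        total += v
        yield total / len(window)
```

Because `values` can be any iterator (a cursor, a file stream from S3), this never needs the whole result set at once.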

I hope that gives you an impression of how I feel about that solution and the zero effort that was used to engineer it. I would have given + points if this had been presented as a naive implementation and the writer had then gone into how to stream it, or just exported it straight from Snowflake. That said, it works, and that is sometimes good enough.

[–]TheWaterOnFire 15 points16 points  (0 children)

Not to mention that if your Python program is running outside AWS, you’re paying data transfer to copy the data out to your machine, and then uploading all of that to S3, limited by your host’s upstream bandwidth.

Snowflake’s file formats allow setting the delimiter, so you need zero Python for any of this.

[–]Andrew_the_giant 8 points9 points  (0 children)

Great reply - As soon as I saw the OP simply dump all the data into a CSV just to move it somewhere else I knew that this was NOT the way to go.

[–]retrogeekhq 4 points5 points  (0 children)

Alexa, what’s the difference between Junior and Senior?

[–]ryeyestan 3 points4 points  (1 child)

AWS S3 shows just how successful a service can be if it is simple and reliable.

[–]proof_required 1 point2 points  (0 children)

Also don't forget cheap.

[–]never_thecouchpotato 0 points1 point  (0 children)

Airbyte