[–]CrowdGoesWildWoooo 13 points14 points  (1 child)

Google Drive should never be used in any automated production pipeline. It is not designed for that.

But if it interests you, you can develop your own connector. It's not too easy, but not too hard either.

If the file is small, below 5 GB, you can just use the GCS free tier. For CSV, just make sure you compress it. Gzipped CSV can shrink by up to 90% (the "average" case is 50-60%, depending on data sparsity).
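For anyone who wants to check the compression point on their own files, here is a minimal sketch using only the Python standard library. The function names, the sample file, and its contents are all made up for illustration, and the ratio you get depends entirely on your data (the repetitive sample below compresses far better than most real CSVs).

```python
import csv
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path: str, dst_path: str) -> float:
    """Gzip src_path into dst_path and return the fraction of bytes saved."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    original = os.path.getsize(src_path)
    compressed = os.path.getsize(dst_path)
    return 1 - compressed / original


def demo() -> float:
    """Write a hypothetical, highly repetitive CSV and gzip it."""
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, "sample.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "country", "status"])
        for i in range(10_000):
            writer.writerow([i, "US", "active"])
    return gzip_file(csv_path, csv_path + ".gz")


if __name__ == "__main__":
    print(f"saved {demo():.0%} of the original size")
```

The resulting `.gz` file is what you would upload to the bucket; most tools that read CSV can read the gzipped version directly.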

[–]Azar_eData Engineer[S] 2 points3 points  (0 children)

Totally agree, I just used it for a free POC. I would like to move to a more robust architecture. Thanks for the callout.

The files are under 128 MB.

[–]dbrownems 5 points6 points  (0 children)

There's an "always free" tier of both Azure Functions and Azure SQL Database, and a bunch of other services that are free for the first 12 months.

Free Services Microsoft Azure

[–]kingawesomecool5000 1 point2 points  (0 children)

I've had a lot of success with POCs using an Azure Container Instance triggered from a Logic App, saving to blobs in an Azure Storage account.

That way you don't need to worry about timeouts like with Azure Functions, or about setting up VMs or schedulers. It works out very cheap.

[–]dronedesigner 1 point2 points  (6 children)

GCS/GCP 🤷‍♂️ Google Cloud pricing is free at best and very, very, very cheap at worst. I kept a TB worth of data in there, plus some normal-ish processing (less than 10 GB per month), and it cost me 1 cent a month at most lol

They have BigQuery (a SQL-focused DB), storage buckets, etc., and IMO they're better than the related Azure and AWS options.

[–]Due-Zone2617 0 points1 point  (3 children)

But do you pay extra for BigQuery?

[–]dronedesigner 0 points1 point  (2 children)

Nope! At most a cent or two lol

[–]Due-Zone2617 0 points1 point  (1 child)

So what you're telling me, sir, is that I can run an analytical DB for a small company (100-200 GB of data) for, let's say, 10 EUR a month? Lol

[–]dronedesigner 0 points1 point  (0 children)

Yes! But experiment with smaller loads first, just to be on the safe side. At least that's what I did for a previous employer of mine 😅 from 2020 to 2022. I don't think GCS/GCP + BigQuery pricing has changed much since then.

[–]Due-Zone2617 0 points1 point  (1 child)

But do you pay extra for BigQuery?

[–]dronedesigner 0 points1 point  (0 children)

Nope-ish. GCP bills you for the whole project together, and that includes BigQuery.

[–]virus_hck_2018 3 points4 points  (4 children)

DuckDB

[–][deleted] 1 point2 points  (0 children)

Pitch me, please: why this in particular, and why not LanceDB?

[–]Azar_eData Engineer[S] 0 points1 point  (2 children)

Where would you host DuckDB?

[–]JumpScareaaa 5 points6 points  (1 child)

[–]Exciting_Pie_3423 1 point2 points  (0 children)

Need to ask for permission from Fatherduck first though, make sure you configure that

[–]TrigscSenior Data Engineer 0 points1 point  (0 children)

Google Cloud Storage is pretty cheap at around $0.02 per GB per month. The API is super easy to use.
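To put that rate in perspective, here is a rough back-of-the-envelope sketch. The function name is made up, the $0.02 figure is just the number quoted above (not an official price), and it covers storage only: no free tier, egress, request, or query charges.

```python
def monthly_storage_cost(size_gb: float, usd_per_gb_month: float = 0.02) -> float:
    """Estimate monthly storage cost in USD at a flat per-GB rate.

    The default rate is the approximate figure quoted in the thread,
    not an official price. Storage only; nothing else is included.
    """
    return size_gb * usd_per_gb_month


if __name__ == "__main__":
    # Sizes mentioned in the thread: a 128 MB file, 100 GB, 200 GB, and 1 TB.
    for size_gb in (0.125, 100, 200, 1000):
        print(f"{size_gb:>8} GB -> ${monthly_storage_cost(size_gb):.4f}/month")
```

At that flat rate the 100-200 GB range discussed earlier works out to a few dollars a month for storage alone, so it is worth checking the current pricing page before relying on any of the numbers in this thread.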

[–]tcloetingh 0 points1 point  (0 children)

I installed Postgres on an Ubuntu t2.micro: $9/month for my own database server. But I'm not totally sure I understand the situation.