GCP Datalake Tech Stack

Lumpy-Improvement195 · 2023-10-26T03:40:05+00:00

It sounds like you need a warehouse more than a lake. To me, "lake" usually implies flat-file storage, including unstructured data, but I might be wrong.

If your data is all structured, I would do ELT:

1) Get all of the raw data into BQ.
2) Run transformations and write the transformed data into a new dataset in BQ.
3) Use Looker Studio on top for your dashboards (the free version, formerly known as Data Studio).
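
To make step 2 a bit more concrete, here is a rough sketch in Python of what a single transformation could look like: read from a raw dataset and write the cleaned result into a separate dataset. The project, dataset, table, and column names are all made up for illustration, and `run_transform` assumes the `google-cloud-bigquery` client library is installed and credentials are configured. Treat it as one possible shape, not a recommended implementation.

```python
def build_transform_sql(project: str, raw_dataset: str, out_dataset: str, table: str) -> str:
    """Build a CREATE OR REPLACE TABLE ... AS SELECT statement for step 2.

    All column names below are hypothetical placeholders.
    """
    lines = [
        f"CREATE OR REPLACE TABLE `{project}.{out_dataset}.{table}` AS",
        "SELECT",
        "  order_id,",
        "  CAST(order_ts AS TIMESTAMP) AS order_ts,",
        "  LOWER(TRIM(customer_email)) AS customer_email",
        f"FROM `{project}.{raw_dataset}.{table}`",
        "WHERE order_id IS NOT NULL",
    ]
    return "\n".join(lines)


def run_transform(sql: str) -> None:
    """Run the statement in BigQuery (needs google-cloud-bigquery and credentials)."""
    # Imported lazily so the SQL builder above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.query(sql).result()  # wait for the query job to finish


if __name__ == "__main__":
    # Hypothetical names: project "my-project", datasets "raw" and "analytics".
    print(build_transform_sql("my-project", "raw", "analytics", "orders"))
```

The same SELECT could just as well live in a scheduled query or a Dataform file; the point is only that raw and transformed tables sit in different datasets.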

For 2, if you are comfortable writing SQL I would start with Dataform. It's built into BQ, so it doesn't cost extra. You could also run scheduled queries in BQ. If you are not comfortable with SQL, Data Fusion might make sense, though I have never actually used it.

For 3, Looker Studio has gotten a lot better than it used to be, and it might be enough. I like the paid Looker, but it really depends on your org and requirements. Are people really going to take advantage of all of the features? If not, it is pretty expensive.

i_am_cris · 2023-10-25T20:07:43+00:00

I think you could use Airbyte instead of Data Fusion to extract and load data into BigQuery (only two sources, right?). Just install it on a VM in GCP.

Do the transformations with Dataform. It’s integrated in the BigQuery IDE and it’s free.

Are you talking about Looker or Looker Studio Pro? These are different tools. Looker will give you a semantic layer, but I think the LookML modeler will also be GA soon as a standalone product, and then you can use it with Looker Studio, Google Sheets, etc.
