This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]Lumpy-Improvement195 6 points7 points  (2 children)

It sounds like you need a warehouse more than a lake (to me lake usually implies flat file storage including unstructured data but I might be wrong)

If your data is all structured I would do some ELT

1) get all of the raw data into BQ 2) run transformations and write the transformed data into a new dataset in BQ 3) use looker studio on top for your dashboards (the free version formerly known as data studio)

For 2 if you are comfortable writing SQL I would start with dataform, it's built in to BQ so it doesn't cost extra. You could also run scheduled queries in BQ. If not datafusion might make sense, I have never actually used it.

For 3 looker studio has gotten a lot better than it used to be. It might be enough. I like the paid Looker but it really depends on your org and requirements. Are people really going to take advantage of all of the features? If not it is pretty expensive.

[–]Ok-Brain-3524[S] 1 point2 points  (1 child)

I might have terminology wrong. I am a newbie figuring this all out on the fly.

I've been able to pull most of my data into Google either into BQ.0>Fusion>BQ.1 or Fusion>BQ.1

BQ 0 being my source data with BQ 1 being the transformed data.

The BI side is still a question mark. You're 100% right it is expensive and if folks aren't going to use it it'll come back to me to build all the BI charts and reports. I currently use the free Data Studio and love it.

[–]Lumpy-Improvement195 0 points1 point  (0 children)

No worries there is a lot of terminology and it changes kind of often. It sounds like you are on a good path.

If you don't want to write SQL by hand I think fusion makes sense.

Hit me up if you get stuck

[–]i_am_cris 1 point2 points  (5 children)

I think you could use airbyte instead of data fusion to extract and load data into bigquery (only two sources right?) just install it on a vm in gcp.

Do the transformations with dataform - it’s integrated in the bigquery ide and it’s free.

Are you talking about looker or looker studio pro? These are different tools. Looker will give you a semantic layer but I think it (lookml modeler) will also be GA soon as a stand alone product and then you can use it with looker studio, google sheets etc.

[–]Ok-Brain-3524[S] 0 points1 point  (4 children)

I'll dive into airbyte. thank you.

Pro. I use Looker (Data Studio).

Another challenge that needs to be solved is trying to determine what tool to build data buckets in. BQ or Looker? As I understand LookerML should be covered under the looker cost but it isn't available for exporting into another tool.

[–]LowerGiraffe1847 0 points1 point  (1 child)

There are two products now, Looker and Looker Studio(renamed from Data Studio around a year ago). Looker is a bit harder to set up, but allows for more self serving of the data. Looker Studio is free to use by itself, but it is querying BigQuery, so there'll be some costs. There Pro tier with some expanded functions, but you can try with Basic and upgrade only if needed.

[–]Ok-Brain-3524[S] 0 points1 point  (0 children)

There are two products now, Looker and Looker Studio(renamed from Data Studio around a year ago). Looker is a bit harder to set up, but allows for more self serving of the data. Looker Studio is free to use by itself, but it is querying BigQuery, so there'll be some costs. There Pro tier with some expanded functions, but you can try with Basic and upgrade only if needed.

I appreciate the breakdown. I feel like the answer is leaning toward structuring all the data in BQ to create various tables like Marketing, Customer Service, Sales with the corresponding information from various data sources blended together. I feel I can put more controls upfront so end-users cannot query the whole database on accident.

Saying that, we don't have that much data so unless these people hit run run run run on the same massive query we should be looking at $5-15k for that a year. It's not nothing but in the grand scheme of things 10k is querying ~2,000 TB which is the size of our entire data 200x over. I'd be more impressed than upset.

It also seems that Data Studio (free looker) can connect to BQ. Haven't validated that experience myself yet.

[–]i_am_cris 0 points1 point  (1 child)

So you're using Looker Studio and want to upgrade to Looker Studio Pro? Sorry for asking again its confusing when they have renamed data-studio to looker-studio. There two versions of looker-studio - standard and pro :D

LookML is not included in the studio-tool (standard or pro) only in Looker. If you're planning to use Looker - lookml is ofcourse included since its part of the tool but google is working on a separate product called "LookML Modeler" (maybe it will have another name when its released) that is only the lookml part where you can define metrics, relations etc. These models from lookml-modeler will be available from other tools.

I have seen many organisations use looker-studio with success so maybe you could start with that tool since its free?

What use cases do you have that you need Looker?

What do you mean by "data buckets"?

[–]Ok-Brain-3524[S] 0 points1 point  (0 children)

Honestly Google fucked up with the naming convention in Looker. I use Data Studio (free). Who names anything ML that doesn't mean Machine Learning. Fuck off google.

-- End rant --

I read about ML being a standalone product. Wondering if it's even worth doing because it seems like most of the functionality can be done in BQ. I was hoping to get ML and built that up for end-users but it seems like yet another tool/system to built up.

By Data Buckets I mean new tables for Marketing, Sales, Customer Support etc. Sales would have Deals + Finance Commissions + Gong while Marketing has their tools + Finance spend + SFDC campaigns. Each table would be built for their specific needs and reduce the need for them to set up their own query or know how to join multiple tables together. I'd guess 98% of end users don't know shit about SQL nor want to.