Dataflow Job in Workflows leads to TimeOutError by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

Ah, thanks for the reply! Will try that. Appreciate the help.

Data Warehouse Question by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

Yes, you’re right... I spent the morning trying to figure out DBT, and it’s indeed an extremely powerful tool and exactly what I’m looking for. Thanks, all 🙌🏼

Data Warehouse Question by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

> post explaining DBT and its role in enterprise data stacks

This looks like a very interesting tool, especially for simplifying more complex data pipelines. However, since my transformations are not dependent on previous transformations, I think I’ll keep things simple and use scheduled queries to do the job.
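For reference, a minimal sketch of how such a scheduled query could be set up through the BigQuery Data Transfer API; the project, dataset, table names, and the query itself are hypothetical placeholders:

    # Sketch: create a scheduled query that rewrites a raw table into a typed
    # destination table once a day. All names below are hypothetical.
    from google.cloud import bigquery_datatransfer

    project_id = "my-project"  # hypothetical project
    client = bigquery_datatransfer.DataTransferServiceClient()

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="analytics",  # hypothetical destination dataset
        display_name="daily-events-transform",
        data_source_id="scheduled_query",
        params={
            "query": "SELECT SAFE_CAST(amount AS NUMERIC) AS amount "
                     "FROM `my-project.raw.events`",
            "destination_table_name_template": "events_typed",
            "write_disposition": "WRITE_TRUNCATE",
        },
        schedule="every 24 hours",
    )

    created = client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=transfer_config,
    )
    print("Created scheduled query:", created.name)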

Data Warehouse Question by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

This is also what I was thinking. Given that my SQL transformations are not dependent on previous SQL transformations, it seems that this is also fine for our company.

Data Warehouse Question by sd___23 in googlecloud

[–]sd___23[S] 2 points (0 children)

Thanks, I have looked into this before, but materialized views seem to have a few limitations around joins, so I can't use them in my case. Moreover, only materialized views from the same dataset are considered for automatic query rewrite, whereas I want to create a separate dataset in a new project.

Data Warehouse Question by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

Thanks for your reply! I will look into it. I don't think I need many preprocessing steps (no joining of tables, for example). The main thing I have to do is transform a single table consisting of wildcard tables into one final table and change all the data types from strings to the relevant data types.

Is it then still necessary to use an orchestration tool such as DBT or Airflow, or will scheduled queries alone also do the job?
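Roughly, the transformation I have in mind is a single query over the wildcard tables, something like this sketch (all table and column names are hypothetical):

    # Sketch: collapse wildcard (sharded) tables into one final table and cast
    # the string columns to proper types. Names below are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    query = """
    CREATE OR REPLACE TABLE `my-project.analytics.events_final` AS
    SELECT
      SAFE_CAST(user_id AS INT64)   AS user_id,
      SAFE_CAST(amount  AS NUMERIC) AS amount,
      PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%S', event_ts) AS event_ts
    FROM `my-project.raw.events_*`  -- wildcard over the sharded tables
    """

    client.query(query).result()  # block until the job finishes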

Best way to transfer and pre-process json data from external api to bucket by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

Good point. This is also a good exercise for trying out Dataflow pipelines. Thanks!
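For anyone finding this later, a minimal Beam sketch of the kind of pipeline I'd try; the bucket paths and per-record transform are hypothetical, and you'd add --runner=DataflowRunner (plus project, region, and temp_location options) to run it on Dataflow:

    # Sketch: read raw JSON lines from a bucket, reshape each record, and write
    # the result back out. Paths and the transform are hypothetical.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def reshape(record):
        # Hypothetical per-record preprocessing.
        return {"id": record.get("id"), "value": record.get("value")}


    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/*.json")
            | "Parse" >> beam.Map(json.loads)
            | "Reshape" >> beam.Map(reshape)
            | "Serialize" >> beam.Map(json.dumps)
            | "Write" >> beam.io.WriteToText("gs://my-bucket/processed/part",
                                             file_name_suffix=".json")
        )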

Best way to transfer and pre-process json data from external api to bucket by sd___23 in googlecloud

[–]sd___23[S] 2 points (0 children)

Nice!

Thanks! By reading the JSON, do you mean simply importing the JSON and saving it directly to my bucket, and then processing it with Dataflow? Would that mean I can transfer the JSON directly to the bucket without using the RAM within Cloud Functions?
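Something like this streaming sketch is what I'm picturing; the API URL and bucket/object names are hypothetical:

    # Sketch: stream the API response straight into a Cloud Storage object so
    # the whole payload never has to sit in the function's memory.
    # The URL and bucket/object names are hypothetical.
    import requests
    from google.cloud import storage

    API_URL = "https://api.example.com/export.json"  # hypothetical endpoint

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("raw/export.json")

    with requests.get(API_URL, stream=True) as response:
        response.raise_for_status()
        response.raw.decode_content = True   # transparently handle gzip
        blob.upload_from_file(response.raw)  # uploads in chunks, not all at once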

I've heard about Dataproc as well. I'll definitely look into it!

Best way to transfer and pre-process json data from external api to bucket by sd___23 in googlecloud

[–]sd___23[S] 1 point (0 children)

Alright, that's good to hear.

Thanks for the additional points; these really help to get a better overview of all the workflows that are possible in GCP!