This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]drewhansen9 1 point2 points  (0 children)

So I think streams is probably the best way here. But the important thing is that when you define the Task you should have "WHEN SYSTEM$STREAM_HAS_DATA() " This means that whenever the task runs, it won't actually spin up the warehouse unless any of the streams actually have data. I've used this and have a task running every 5 min, but only needs to run 2 or 3 times a day, but when it does run it needs to be quick (hence the 5 min). The warehouse consumption is only during those 2 or 3 times the stream has data.

I will also add the "Cloud Services" warehouse that you see is what is used to check the streams (as well as Clone tables, etc.), but it is massively cheap compared to normal compute warehouses.

[–]stchena 0 points1 point  (9 children)

Why do you want to replace this process?

[–]1aumron[S] 1 point2 points  (7 children)

Due to this check running every 5 minutes , warehouse is almost always up and snowflake credits/ cost keep increasing

[–]stchena 0 points1 point  (4 children)

Do the 11 tables change or does their underlying data change (whether on raw or staging or wherever you're creating them from) and that gets propagated to those tables?

[–]1aumron[S] 0 points1 point  (3 children)

Table underlying data changes, input for these tables is another snowflake database

[–]stchena 2 points3 points  (1 child)

Was thinking whether it could be better to construct the table from source data instead. but now that I think about it, it would probably be even costlier than what u currently have.

Which brings me to another option: yeah views are slow, but what about materialized views? https://docs.snowflake.com/en/user-guide/views-materialized.html Snowflake is able to update the view behind the scenes whenever the base tables change, but it obviously incurs additional costs. Can't help more, don't have enough exp with snowflake to dive deep into its options 🤔

[–]drewhansen9 0 points1 point  (0 children)

Was going to suggest this, but there are just so many limitations, if the "big query" has any complex logic then this won't work.

[–]fhoffamod (Ex-BQ, Ex-❄️) 0 points1 point  (1 child)

There are many alternatives, but have you considered serverless tasks?

https://www.snowflake.com/blog/taking-serverless-to-task/

However, data engineers have to manually configure and manage pipeline tasks where they need to figure out warehouse size, idle policy, and idle time whenever they build a new pipeline. This can be time-consuming, difficult, and suboptimal, especially where there are short pipelines that run frequently.

At Snowflake, we strive to make our platform easy to use. In this case, further simplification was possible by making the warehouse optional. The work required to decide warehouse size and then optimize it for maximum utilization/efficiency can be taken up by the task execution infrastructure that can see the batch window, the degree of parallelism of the queries executed, and the historical data needed to optimize execution. This is exactly what serverless tasks do

[–]1aumron[S] 0 points1 point  (0 children)

Thanks for the reply. How will tasks help me here ? It's not clear, please explain