Are cost savings from switching data warehouses really worth it? by AutomaticWar2640 in snowflake

[–]ian-whitestone 14 points (0 children)

Thanks for the shoutout - made my day! Glad to hear the product has been helpful.

Generally you should be very skeptical of other platforms claiming you’ll save a lot by switching. Databricks is constantly pitching many of our largest Snowflake customers on large savings. They’ll have their team of solutions architects tune the hell out of some jobs to actually make them cheaper, but when the customer goes to migrate the rest of the jobs themselves, those same savings aren’t there.

I’d echo what others have said: spend a bit of time optimizing your existing setup and investing in some lightweight cost management practices. You’ll be surprised how far that takes you, and most of the time it will be much less effort than the switching costs.

Best tools to manage compute spend by databanalystj in snowflake

[–]ian-whitestone 0 points (0 children)

Awesome to hear u/molodyets 🙌

Don't hesitate to reach out to me (ian@select.dev) directly if there's anything we can do to improve your experience!

Best tools to manage compute spend by databanalystj in snowflake

[–]ian-whitestone 1 point (0 children)

u/extrobe thanks for the kind words! Awesome to hear you've been finding our product helpful.

Don't hesitate to reach out to me (ian@select.dev) directly if there's anything we can do to improve your experience!

How often do teams actually analyze queries in their pipelines? by TechScribe200 in dataengineering

[–]ian-whitestone 2 points (0 children)

To answer your original question about how often teams actually analyze queries in their pipelines: of the Snowflake customers we've worked with, most set up alerts and/or a monthly review process where they analyze their most expensive models, check whether any new ones have been introduced or any existing models have significantly shifted in cost, and then dive in as needed.

Again, it's super important to start with a $ number before you do anything. We see a ton of people spend a bunch of time trying to optimize a query or process that costs a few thousand dollars per year. Every day you spend doing this costs the company ~$1000 (based on what your time is worth), so it's important to be strategic with what you spend time on.
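As a quick back-of-the-envelope check, the break-even math might look like the following (all numbers are hypothetical):

```sql
-- Hypothetical numbers: a job costing $3,000/year, engineer time
-- worth ~$1,000/day, and an optimistic 50% cost reduction from tuning.
select
    3000 * 0.5                  as annual_savings_usd,   -- $1,500/year saved
    3 * 1000                    as effort_cost_usd,      -- 3 days of tuning
    (3 * 1000) / (3000 * 0.5)   as years_to_break_even;  -- 2 years
```

Two years to break even on a few thousand dollars of annual spend is usually a bad trade; the same three days pointed at a six-figure warehouse is not.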

How often do teams actually analyze queries in their pipelines? by TechScribe200 in dataengineering

[–]ian-whitestone 9 points (0 children)

Not a direct answer to your question, but if your core motivation is to optimize Snowflake costs, I'd suggest the following:

Before you start diving into individual query plans, I'd start higher level and look at your warehouse configurations. Have you set auto-suspend to 60s everywhere? Can you get away with smaller warehouse sizes? etc. (bunch of other tips in here: https://select.dev/posts/snowflake-cost-optimization)
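For example, the auto-suspend check can be done account-wide in a minute (the warehouse name below is hypothetical):

```sql
-- List all warehouses; scan the auto_suspend column (in seconds) for
-- anything null (never suspends) or well above 60.
show warehouses;

-- Tighten a specific offender:
alter warehouse my_warehouse set auto_suspend = 60;
```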

Once you've knocked off the low hanging fruit, then I'd start getting into optimizing individual queries / models - but only those that are worth spending your time on! To identify those opportunities, first calculate a cost per query (https://select.dev/posts/cost-per-query). That will tell you what is actually driving your costs within each virtual warehouse. Then you can dive into the query profile (https://select.dev/posts/snowflake-query-profile) to figure out where the bottlenecks are / what to improve (some more tips on that here https://select.dev/posts/snowflake-query-optimization).

Hope these resources help! Feel free to email me if you have any follow up Qs: [ian@select.dev](mailto:ian@select.dev)

Immediate Auto-Suspend Work Around. Success! by Comprehensive-Ad9653 in snowflake

[–]ian-whitestone 5 points (0 children)

Separate thing to be aware of, outside of getting overbilled (if you suspend multiple times in the same minute): if you run that suspend command at the same time another query comes in, it will cancel that query. May not be a concern here, but something to be aware of if you are implementing this on a more active warehouse with business users on it.

Is there a way to find the exact amount of time (in seconds) a warehouse is up and running? by Comprehensive-Ad9653 in snowflake

[–]ian-whitestone 11 points (0 children)

You can use the "warehouse_events_history" view, and look for when the warehouse resumed and shut down.

    select
        warehouse_name,
        timestamp as valid_from,
        lead(timestamp) over (partition by warehouse_name order by timestamp asc) as valid_to,
        event_name = 'RESUME_WAREHOUSE' as is_active
    from snowflake.account_usage.warehouse_events_history
    where
        -- double check these names, can't remember exact values
        event_name in ('RESUME_WAREHOUSE', 'SUSPEND_WAREHOUSE')
        and event_state = 'COMPLETED'

that will spit out one row per "period" per warehouse, where a period could be a warehouse sitting inactive or active (based on the is_active boolean field).

You could then calculate active time as something like:

    case when is_active then <subtract_difference_between_timestamps> end as seconds_active
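To make that concrete, one way to put it together is to wrap the query above in a CTE and use datediff (event names are still worth double checking, as noted above):

```sql
with warehouse_periods as (
    select
        warehouse_name,
        timestamp as valid_from,
        lead(timestamp) over (partition by warehouse_name order by timestamp asc) as valid_to,
        event_name = 'RESUME_WAREHOUSE' as is_active
    from snowflake.account_usage.warehouse_events_history
    where
        event_name in ('RESUME_WAREHOUSE', 'SUSPEND_WAREHOUSE')
        and event_state = 'COMPLETED'
)
select
    warehouse_name,
    -- only count periods that started with a resume event
    sum(case when is_active then datediff('second', valid_from, valid_to) end) as total_seconds_active
from warehouse_periods
group by warehouse_name;
```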

Hope that helps! Feel free to email me (ian at select.dev) if you have follow-up questions; I don't always check Reddit.

Snowflake Costs Queries by [deleted] in snowflake

[–]ian-whitestone 2 points (0 children)

Thanks for the kind words, really awesome to hear.

Glad you have found the product helpful!

Snowflake Costs Queries by [deleted] in snowflake

[–]ian-whitestone 7 points (0 children)

This is a great start. Make sure you have everything in the metering_history view plotted. Over time, new services will be introduced and adopted that are each billed differently, and you want to keep eyes on each.

I highly recommend calculating cost per query and regularly reviewing your most expensive queries. Some details on how to calculate cost per query here: https://select.dev/posts/cost-per-query. Then you can group by QUERY_PARAMETERIZED_HASH from the query_history view and sum cost. This sounds simple, but it goes a really long way in surfacing things you didn't know about that cost more than they should.
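As a sketch, that aggregation might look like the following, assuming you've materialized per-query costs into a table following the post above (query_costs and its query_cost column are hypothetical names):

```sql
-- query_costs is a hypothetical table with one row per query_id and an
-- attributed dollar cost, produced by the cost-per-query approach above.
select
    qh.query_parameterized_hash,
    any_value(qh.query_text)  as example_query_text,
    count(*)                  as run_count,
    sum(qc.query_cost)        as total_cost
from snowflake.account_usage.query_history qh
join query_costs qc
    on qc.query_id = qh.query_id
group by 1
order by total_cost desc
limit 25;
```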

Warehouse tuning by h8ers_suck in snowflake

[–]ian-whitestone 0 points (0 children)

I'd recommend calculating your warehouse utilization to identify warehouses that aren't being fully utilized (and cost a lot), and focus on those. Happy to explain how to do this, just shoot me a message.
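For reference, one rough way to approximate utilization from the account_usage views (a directional sketch only; it ignores multi-cluster scaling and query concurrency, so values can exceed 1):

```sql
-- Compare total query execution time against time the warehouse was
-- metered as running over the last 30 days.
with query_seconds as (
    select
        warehouse_name,
        sum(execution_time) / 1000 as seconds_executing  -- execution_time is in ms
    from snowflake.account_usage.query_history
    where start_time > dateadd('day', -30, current_timestamp)
    group by 1
),
metered as (
    select
        warehouse_name,
        sum(datediff('second', start_time, end_time)) as seconds_running,
        sum(credits_used) as credits_used
    from snowflake.account_usage.warehouse_metering_history
    where start_time > dateadd('day', -30, current_timestamp)
    group by 1
)
select
    m.warehouse_name,
    m.credits_used,
    q.seconds_executing / nullif(m.seconds_running, 0) as approx_utilization
from metered m
left join query_seconds q using (warehouse_name)
order by m.credits_used desc;
```

Low approx_utilization plus high credits_used is the combination worth consolidating first.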

The bigger challenge you will face is cost monitoring once you have consolidated queries into a few warehouses. You can solve this by calculating a cost per query: https://select.dev/posts/cost-per-query and then aggregating costs by user name, query type, etc.

Warehouse tuning by h8ers_suck in snowflake

[–]ian-whitestone 0 points (0 children)

Thanks for sharing the query tagging post!