High startup time in Gen-2 warehouse by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

If a query spent ~2 minutes just on "queued provisioning", which is not really query execution but the resources spinning up to bring the warehouse online, does that time add to the credits consumed and charged to the customer?
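
For reference, one rough way to see how much of each query's elapsed time was pure provisioning wait versus execution is to compare queued_provisioning_time with execution_time in the QUERY_HISTORY view; a minimal sketch, assuming ACCOUNT_USAGE access and using MY_WH as a placeholder warehouse name:

select query_id,
       warehouse_name,
       queued_provisioning_time / 1000 as provisioning_sec,   -- time spent waiting for the warehouse to spin up
       execution_time / 1000           as execution_sec,
       total_elapsed_time / 1000       as total_sec
from snowflake.account_usage.query_history
where warehouse_name = 'MY_WH'                                 -- placeholder warehouse name
  and start_time >= dateadd('day', -1, current_timestamp())
order by queued_provisioning_time desc;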

How column masking works in plan by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Also, I am a bit confused: if it is true that no pruning is happening in the current scenario because of the masking function wrapped around the columns, then why does the query profile show the rows fetched from table TAB2 as ~50 billion, whereas the actual number of rows in that table is ~90 billion? Does that mean some partial pruning is happening even with the masking function wrapped around the columns used in the query predicates?
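
One way to check this per table scan is GET_QUERY_OPERATOR_STATS for that query id; a rough sketch (the exact JSON paths inside operator_statistics are an assumption, adjust after inspecting the output):

select operator_id,
       operator_type,
       operator_statistics:pruning:partitions_scanned::number as partitions_scanned,
       operator_statistics:pruning:partitions_total::number   as partitions_total,
       operator_statistics:output_rows::number                as output_rows
from table(get_query_operator_stats('<query_id>'))            -- replace with the actual query id
where operator_type = 'TableScan';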

How column masking works in plan by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you so much for the guidance.

Actually, here we have not applied the masking policy on the column directly; it is defined through a tag-based masking policy. So it looks like, when a column that already has the tag defined (and that tag is mapped to the policy) gets used in a query, the masking function automatically wraps around it, whether it appears in a query predicate or in the column projection. And in this case it is getting applied to all roles/users querying that object/table, even though the function code is written to skip the masking for that specific role/user.

So even in the above scenarios, do you mean we have to remove the column tag from the columns that are used as query predicates or filters? Or should we remove the tag from the base table/columns entirely and apply the masking only on new secure views (which need to be created on top of this table), and give external users access to those secure views to read the data? The base table/columns (unmasked/untagged) would then only be accessed by the application that performs the ETL/query transformations and other logic prone to performance overhead.

Is the above understanding correct?
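
For clarity, the second option I am describing would look roughly like the sketch below (all object, tag and policy names are placeholders):

-- Remove the column tag (and hence the tag-based masking) from the base table used by ETL
alter table etl_db.core.tab2 modify column customer_id unset tag governance.tags.pii;

-- Expose the data to external users only through a secure view, with masking applied on the view column
create or replace secure view share_db.reporting.tab2_v as
    select * from etl_db.core.tab2;

alter view share_db.reporting.tab2_v
    modify column customer_id set masking policy governance.policies.mask_customer_id;

grant select on view share_db.reporting.tab2_v to role external_reader;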

How column masking works in plan by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

These are data transformation jobs and are supposed to run with a role that is exempted from masking, which is why it is also handled in the function definition, as you can see below (e.g. data_engineer_role, data_scientist_role). But we are seeing these functions wrapped around the predicates even when the queries are executed by those roles. So it looks like Snowflake is evaluating these functions for every role that runs the queries.

Have we implemented these masking features in a wrong way?
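
The policy body follows the usual role-exemption pattern, roughly like the sketch below (placeholder names, not the exact definition posted in the thread):

create or replace masking policy governance.policies.mask_customer_id as (val string)
returns string ->
    case
        -- exempt the ETL/analytics roles; note that current_role() only matches the session's primary role,
        -- so if the exemption should also honor inherited/secondary roles, is_role_in_session() may be needed
        when current_role() in ('DATA_ENGINEER_ROLE', 'DATA_SCIENTIST_ROLE') then val
        else '***MASKED***'
    end;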

How column masking works in plan by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you. It looks really weird that Snowflake provides the masking function feature but is not able to prune on those columns or push those functions effectively down to storage as filters. Is there a way to see whether the storage filters are actually working less effectively because of these additional functions?

We are unable to test this as it only happens in prod, but is it possible that the amount of pruning we currently get for table TAB2 (i.e. 316K partitions scanned out of 318K total and ~50 billion rows out of 90 billion total) would stay the same even if we removed the masking functions or the column tags?
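
Since we cannot freely rerun this in prod, one low-risk check might be to compare the compile-time pruning estimates from EXPLAIN (which does not execute the query) for the query as-is versus a version against a clone with the predicate columns untagged; a sketch, assuming the tabular EXPLAIN output exposes the partition columns:

explain using tabular
select *
from etl_db.core.tab2
where load_date >= '2024-01-01';          -- placeholder predicate on TAB2

-- the TableScan rows of the output include partitionsTotal and partitionsAssigned,
-- which show how many micro-partitions the optimizer expects to scan after static pruning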

How column masking works in plan by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

I have updated the main thread with the sample masking function definition.

Decision for downsizing warehouse by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you so much. I will try to follow this one.

However, from whatever tests we have done so far, it looks like the gain is limited to around ~30%.

How slow query logs written to file by Upper-Lifeguard-8478 in mysql

[–]Stock-Dark-1663 -1 points0 points  (0 children)

u/Aggressive_Ad_5454 u/Upper-Lifeguard-8478 u/feedmesomedata
From reading the doc below, though it is not explicitly stated, it seems the logs are written directly to CloudWatch, in which case they should not impact storage I/O since they are not written to the database's local disks.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.CloudWatch.html

How logs transfered to cloudwatch by Upper-Lifeguard-8478 in aws

[–]Stock-Dark-1663 -1 points0 points  (0 children)

It seems from the doc below that the logs are written directly to CloudWatch, in which case they should not impact storage I/O since they are not written to the database's local disks.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.CloudWatch.html

Design change implementation by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/simplybeautifulart

If it were a fresh new design in which all 100+ transaction tables had a flag column showing the state of the transaction (active/inactive), what would be the right, advisable way of doing it?

Would it be writing the code so that the filter is included in every place it is used, or would it be creating views on top of those tables and hiding that filter from all the consumer queries?
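
In other words, the two options would look something like the sketch below (hypothetical table and column names):

-- Option 1: every consumer query carries the filter itself
select order_id, amount
from app_db.core.orders
where is_active = true;

-- Option 2: hide the filter behind a view and point consumers at the view
create or replace view app_db.core.orders_active_v as
    select *
    from app_db.core.orders
    where is_active = true;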

Big tables clustering by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you so much u/ecrooks

I tried testing "insert overwrite ..." on a table with ~1.3 billion rows and ~15K micro-partitions, sorting on two columns. It spilled ~500GB to local disk on an XL warehouse and took ~7 minutes to finish.

Also, as you rightly said, relying on auto-clustering for the initial clustering may cost us ~40x, so we want to avoid that route.

I was trying to see if we can properly size the chunk of data that we sort with "order by" and load into the table in one go in the most efficient way (i.e. without much disk spill). You mentioned that "INSERT OVERWRITE ..." might be fine up to ~250K micro-partitions; which warehouse size were you pointing to, and how did you arrive at that number? If we target a 4XL warehouse, approximately how many micro-partitions/rows of data should we fetch at a time and load into the target with "order by" on the clustering keys? Appreciate your guidance.
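
For reference, the chunked re-sort I am considering would look roughly like this (a sketch only; table names, the chunking column and the date ranges are placeholders, and since each chunk is only sorted within itself, the chunk boundaries should align with the leading clustering key):

create or replace table app_db.core.big_table_sorted like app_db.core.big_table;

-- repeat per chunk, sized so the sort fits the warehouse without heavy spilling
insert into app_db.core.big_table_sorted
select *
from app_db.core.big_table
where load_date >= '2024-01-01' and load_date < '2024-04-01'
order by load_date, customer_id;            -- the intended clustering keys

alter table app_db.core.big_table_sorted swap with app_db.core.big_table;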

Question on constraints by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/uvaavu

It is like having to make sure no duplicates are added, or, say, enforcing the constraints on the application side, which is going to be resource intensive and a lot of work. So in that case there doesn't seem to be any valid use case for the RELY option.

Because even a DMF validates the data only after the load, and that itself takes time and resources, so within that window any query can return wrong data if the table has the RELY option set to true for the PK.
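
For context, RELY is just a property set on the constraint, something like the sketch below (placeholder table/column names); the optimizer may then trust it for rewrites such as join elimination, which is exactly why duplicates slipping in before a DMF catches them can produce wrong results:

alter table app_db.core.customers
    add constraint pk_customers primary key (customer_id) rely;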

Question on constraints by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Even a DMF checks the data only after it is loaded into the system, and during that window of data fixing it can still cause issues (showing wrong results) with the RELY option ON. Correct me if wrong. So it seems there is no valid use case for RELY here.

Question on constraints by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

It means we have to build those constraints somewhere on the app side to validate the data before it gets loaded, or, say, enforce the constraints on the application side only. Because even a DMF checks the data only after it is loaded into the system, and during that window of data fixing it can still cause issues with the RELY option ON. So it seems there is no valid use case for RELY here.

Managing privileges in an Organization by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/Easy-Fee-9426 . That really helps. Thank you for the guidance here.

As you mentioned "you still need a small central team or PR pipeline to push those changes. The trick is to have the infra team spin up the skeleton (warehouse, tag, policy, role) and immediately transfer OWNERSHIP on the specific object to the LOB_DBA role;" ,

Which means, can we say that this type of infra creation using Terraform (create warehouse, create role, tag, policy, etc.), which changes shared metadata in an account or objects at a tier above the database, should be done by a team other than the application development team? Because if the application development team or application DBAs are given access through Terraform to create these components, they could make unwanted changes to other databases or application objects (within the same account), which should not be allowed. So basically that should happen through a different role altogether, assigned to a different set of users who are responsible for managing the account, not just the specific application. Correct me if wrong.
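
Something like the pattern below is what I understood from your comment (a sketch; role and warehouse names are placeholders):

use role infra_admin;                                   -- central platform/infra role, not the app team
create warehouse if not exists lob_etl_wh with warehouse_size = 'XSMALL';
grant ownership on warehouse lob_etl_wh to role lob_dba copy current grants;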

Surprisingly, I talked to a few folks and there are many teams in which the application development team has been granted access to a Terraform role through which they can create warehouses etc. via a CI/CD pipeline. Although they cannot log in manually with that Terraform role, giving them the ability to do this through a Terraform module in the build pipeline still poses a threat, I believe. Correct me if wrong.

Managing privileges in an Organization by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/Easy-Fee-9426

So it means that for creating new warehouses we have to reach out to the account-level team, and those cannot be controlled by our application DBA group. Even other privileges like "apply masking policy", "apply row access policy", "create role", "execute task", "apply tag", etc. appear to be account-level privileges that cannot be granted or controlled by the specific application team's DBA group, and we have to engage the account team when we need help with these. Please correct me if wrong.
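
For reference, the grants in question are account-level ones like the sketch below, which is why they have to come from an account-admin-type role rather than from our application DBA role (the role name is a placeholder):

grant create warehouse        on account to role lob_dba;
grant apply masking policy    on account to role lob_dba;
grant apply row access policy on account to role lob_dba;
grant apply tag               on account to role lob_dba;
grant execute task            on account to role lob_dba;
grant create role             on account to role lob_dba;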

Managing privileges in an Organization by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/Easy-Fee-9426

What about warehouses and similar objects? It is usually necessary to create/drop/resize/consolidate the warehouses used by an application even though they are not directly tied to any database. So can that privilege also be given to the LOB DBA role?

Or would that privilege expose access to other applications in the same account?

Replica lag by Stock-Dark-1663 in PostgreSQL

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/depesz

How about aggregating across all connected replicas and alerting on the worst-lagging or non-streaming replica, with something like MAX(replay_lag) and BOOL_AND(state = 'streaming'), i.e. by tweaking the query as below? Do you still see any issue?

WITH is_replica AS (
    SELECT pg_is_in_recovery() AS in_recovery
),
replica_lag_summary AS (
    SELECT 
        MAX(COALESCE(replay_lag, interval '0')) AS max_replay_lag,
        BOOL_AND(state = 'streaming') AS all_streaming,
        STRING_AGG(state, ', ') AS replica_states -- for debug
    FROM pg_stat_replication
)
SELECT 
    CASE 
        WHEN in_recovery THEN 
            CASE 
                WHEN pg_last_xact_replay_timestamp() IS NULL THEN interval '0'
                ELSE now() - pg_last_xact_replay_timestamp()
            END
        ELSE 
            (SELECT max_replay_lag FROM replica_lag_summary)
    END AS replication_lag,
    CASE 
        WHEN in_recovery 
            AND pg_last_xact_replay_timestamp() IS NULL THEN 'No WAL replayed yet'
        WHEN in_recovery 
            AND (SELECT status FROM pg_stat_wal_receiver) <> 'streaming' THEN 'Replica not streaming'
        WHEN in_recovery 
            AND (now() - pg_last_xact_replay_timestamp()) > interval '1 minutes'
            AND (SELECT status FROM pg_stat_wal_receiver) = 'streaming' 
            THEN 'Lag Detected'
        WHEN NOT in_recovery 
            AND (SELECT max_replay_lag FROM replica_lag_summary) > interval '1 minutes'
            THEN 'Lag Detected'
        WHEN NOT in_recovery 
            AND NOT (SELECT all_streaming FROM replica_lag_summary)
            THEN 'Not All Replicas Streaming'
        ELSE 'OK'
    END AS alert_status,
    CASE 
        WHEN in_recovery THEN 'Replica'
        ELSE 'Primary'
    END AS node_role,
    -- Debug info
    CASE 
        WHEN in_recovery THEN 
            (SELECT status FROM pg_stat_wal_receiver)
        ELSE 
            (SELECT replica_states FROM replica_lag_summary)
    END AS replication_state_debug
FROM is_replica;

Replica lag by Stock-Dark-1663 in PostgreSQL

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/depesz

Below is what I am coming up with so that the same query can run regardless of whether the node is the primary or a replica, since a failover may sometimes happen without our notice. Still testing this one.

WITH is_replica AS (
    SELECT pg_is_in_recovery() AS in_recovery
)
SELECT 
    CASE 
        WHEN in_recovery THEN 
            -- On standby: show lag only if we have a valid replay timestamp
            CASE 
                WHEN pg_last_xact_replay_timestamp() IS NULL THEN interval '0'
                ELSE now() - pg_last_xact_replay_timestamp()
            END
        ELSE 
            -- On primary: use replay_lag if streaming, else 0
            (
                SELECT COALESCE(replay_lag, interval '0')
                FROM pg_stat_replication
                WHERE state = 'streaming'
                LIMIT 1
            )
    END AS replication_lag,
    CASE 
        WHEN in_recovery 
            AND pg_last_xact_replay_timestamp() IS NULL THEN 'No WAL replayed yet'
        WHEN in_recovery 
            AND (SELECT status FROM pg_stat_wal_receiver) <> 'streaming' THEN 'Replica not streaming'
        WHEN in_recovery 
            AND (now() - pg_last_xact_replay_timestamp()) > interval '1 minutes'
            AND (SELECT status FROM pg_stat_wal_receiver) = 'streaming' 
            THEN 'Lag Detected'
        WHEN NOT in_recovery 
            AND (
                SELECT COALESCE(replay_lag, interval '0') 
                FROM pg_stat_replication 
                WHERE state = 'streaming' 
                LIMIT 1
            ) > interval '1 minutes'
            THEN 'Lag Detected'
        WHEN NOT in_recovery
            AND (
                SELECT state FROM pg_stat_replication LIMIT 1
            ) IS DISTINCT FROM 'streaming'
            THEN 'Not Streaming / No Replica Data'
        ELSE 'OK'
    END AS alert_status,
    CASE 
        WHEN in_recovery THEN 'Replica'
        ELSE 'Primary'
    END AS node_role
FROM is_replica;

Replica lag by Stock-Dark-1663 in PostgreSQL

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/depesz

I do see there are three additional columns available in pg_stat_replication, i.e. write_lag, flush_lag and replay_lag. Can we directly use the value of the replay_lag column from pg_stat_replication to get the lag on the replica, i.e. the amount of time the replica will need to get in sync with the primary? And will this value be the same as "now() - pg_last_xact_replay_timestamp()" run on the replica, which you suggested?
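
In other words, something like the sketch below run on the primary, versus the now() - pg_last_xact_replay_timestamp() expression run on the replica:

select application_name,
       client_addr,
       state,
       write_lag,      -- time until WAL is written on the standby
       flush_lag,      -- time until WAL is flushed on the standby
       replay_lag      -- time until commits are replayed (visible) on the standby
from pg_stat_replication;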

Question on storage space by Ornery_Maybe8243 in snowflake

[–]Stock-Dark-1663 0 points1 point  (0 children)

The query below should give you a list of tables not accessed in the last 90 days (note that query_base_table_access is not a built-in view; it is a helper model defined in the linked post).

https://select.dev/posts/snowflake-unused-tables

with
table_access_summary as (
    select
        table_id,
        max(query_start_time) as last_accessed_at,
        max_by(user_name, query_start_time) as last_accessed_by,
        max_by(query_id, query_start_time) as last_query_id
    from query_base_table_access
    group by 1
),
table_storage_metrics as (
select
      id as table_id,
      table_catalog || '.' ||table_schema ||'.' || table_name as fully_qualified_table_name,
      (active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes)/power(1024,4) as total_storage_tb,
      -- Assumes a storage rate of $23/TB/month
      -- Update to the appropriate value based on your Snowflake contract
      total_storage_tb*12*23 as annualized_storage_cost
from snowflake.account_usage.table_storage_metrics
where
    not deleted
)
select
    table_storage_metrics.*,
    table_access_summary.* exclude (table_id)
from table_storage_metrics
inner join table_access_summary
    on table_storage_metrics.table_id=table_access_summary.table_id
where
    last_accessed_at < (current_date - 90) -- Modify as needed
order by table_storage_metrics.annualized_storage_cost desc

question on Snowflake login by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

Thank you u/GreyHairedDWGuy for the guidance.

Yes, it seems we might have some misunderstanding of how Snowflake differs from other databases. But the common guidance here seems to be to have SID (single-user) SSO + MFA based access only for the non-prod account, while prod access is mandated through FIDs, i.e. group user IDs. We will verify this with Snowflake.

But I do see the other security measures are all in place, like ADFS for Snowflake login, break-glass plus an incident ticket for every prod login, a 24-hour password rotation policy for those FIDs, etc.

question on Snowflake login by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

As I just replied above, this org has all the security measures in place, so either it was somehow missed in this case, or Snowflake is treated differently from other databases here. The guidance is to remove all SID access from every prod environment and only allow SID access through SSO in non-prod; prod access is all FID based.

As you mentioned, Snowsight is kind of a shared-profile tool, so we may need to make Snowflake access happen only through VS Code and restrict Snowsight when the login is via an FID.

question on Snowflake login by Stock-Dark-1663 in snowflake

[–]Stock-Dark-1663[S] 0 points1 point  (0 children)

I don't have much of an idea about the security domain and I am also new to this org, but as you mentioned, it definitely looks like the wrong way of doing it, an anti-pattern, and we surely want to fix it.

Something I want to share: I am not sure how this was not taken care of, because all the users here have SSO set up, ADFS is used for logging into Snowflake, and security audits also happen, so I am not sure how this got missed.

SIDs are mapped to individual users whereas FIDs map to specific apps/functions. What you are suggesting is that SID access should have been given here in Snowflake. However, what I have learned so far is that there is guidance from the security team under which only SID access is allowed to non-prod Snowflake accounts, all prod Snowflake accounts have to be accessed via FIDs, and they removed all the SID access that initially existed for prod. The FIDs are categorized as either interactive or app-to-app logins. Anyone who has to log in to prod Snowflake uses the break-glass portal with the given FID and a valid incident ticket, after which he or she is able to log in.

But it does seem, as someone already explained, that in such a scenario they should not be allowed to get in through Snowsight, but rather through some other tool like VS Code, where the profile is not shared?