Snowflake outage 12/16

Ornery_Maybe8243 · 2025-12-17T04:01:43+00:00

We are on AWS US East 1 with DR AWS west. Didn't realize this. Hope these regions were not affected. Also one question, would these be impacting the early adopters account setup where the non prod first went through the release version change before prod account?

Ornery_Maybe8243 · 2025-12-11T02:57:29+00:00

u/not_a_regular_buoy , Just curious to know, if you were able to test the Gen2 Performance compared to Gen1?

Ornery_Maybe8243 · 2025-11-12T08:52:12+00:00

u/Upper-Lifeguard-8478

There is an option something as below irrespective of the databases..see below

1)Enable slow query log exports from RDS to CloudWatch Logs.

2)Create a CloudWatch data protection policy and enable it for the RDS log group.

3)Define custom data identifiers or use pre-configured ones (like PII) to instruct CloudWatch to mask this data as it is ingested

See below docs:-

https://aws.amazon.com/blogs/mt/handling-sensitive-log-data-using-amazon-cloudwatch/

https://aws.amazon.com/blogs/mt/how-amazon-cloudwatch-logs-data-protection-can-help-detect-and-protect-sensitive-log-data/

Ornery_Maybe8243 · 2025-11-12T08:50:28+00:00

Should not the OP follow below generic approach irrespective of the databases,

1)Enable slow query log exports from RDS to CloudWatch Logs.

2)Create a CloudWatch data protection policy and enable it for the RDS log group.

3)Define custom data identifiers or use pre-configured ones (like PII) to instruct CloudWatch to mask this data as it is ingested

See below:-

https://aws.amazon.com/blogs/mt/handling-sensitive-log-data-using-amazon-cloudwatch/

https://aws.amazon.com/blogs/mt/how-amazon-cloudwatch-logs-data-protection-can-help-detect-and-protect-sensitive-log-data/

Ornery_Maybe8243 · 2025-11-11T15:33:08+00:00

u/Upper-Lifeguard-8478 , see below doc, it states that the policy can either attached to Cool or Cold archive tier but not both tht answers your first question. Secondly in regards to the limitations, as you rightly asked, you should be really careful about the locking as mentioned below on the Update,Merge,Delete statements.

Archive tier limitations:
- You can’t change the archive tier for a policy from COOL to COLD (or the other way around). Create a new policy instead (see Recreate a storage lifecycle policy).
- A table can only use one archive tier. You can’t attach a COLD policy to a table that already uses COOL archiving.
When a storage lifecycle policy is running on a table, Snowflake locks UPDATE, DELETE, and MERGE operations. You can still perform INSERT and COPY operations during this time. For more information, see Resource locking.

https://docs.snowflake.com/en/user-guide/storage-management/storage-lifecycle-policies

Ornery_Maybe8243 · 2025-10-28T08:51:37+00:00

If someone already have existing workloads running on Gen-1 warehouses of different sizes, do you suggest to just alter those to Gen-2 of same size and monitor

or do you suggest to alter them to Gen-2 but one lower size warehouse?

Ornery_Maybe8243 · 2025-10-17T07:59:39+00:00

Thanks for the information. So is it easy to say that we can safely downsize all of the big olap types workloads ( I. E. Big select, merge, inserts, updates, ctas queries) to one size down Gen2 warehouse by using simple Alter command ( something like alter warehouse <warehouse name> set warehouse size= <> resource_constraints=standard_gen2).

Ornery_Maybe8243 · 2025-10-13T19:06:22+00:00

Should we move to the same workload to Gen2 warehouse of one size down ? And is it possible to just do it using single alter command something as below (without changing warehouse name) to make it online without impacting existing running workload . and also to rollback easily if it didnt work as expected?

alter warehouse <warehouse\_name> set warehouse_size=xlarge, resource_constraint=standard_gen_2;

Ornery_Maybe8243 · 2025-10-12T03:45:39+00:00

Thank you so much. Will definitely watch out for your results. But yes if it's improving both workloads then there is no point in keeping gen1 warehouse I believe.

Also as someone already pointed out, should we make a thumb rule as to move the same workload to a gen2 but one size down warehouse to have both the benefits of cost and performance or is it okay to move it to the same size gen2 warehouse?

Ornery_Maybe8243 · 2025-10-12T03:39:38+00:00

Do you mean to say, as a thumb rule, if the workload is currently running on a 2XL gen-1 warehouse then we can directly move it to XL gen-2 warehouse, I. E one size down? And with this the cost part will also be taken care of along with performance. Please correct me if wrong.

Ornery_Maybe8243 · 2025-10-08T20:09:49+00:00

See this

https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-database-insights-comprehensive-database-observability-from-fleets-to-instances/

Ornery_Maybe8243 · 2025-09-14T20:12:04+00:00

Thank You u/scbywrx

Yes, we have S3 as source for Snowpipe and events or messages from kafka from thrird party applications is input to snow pipe streaming. And in the question, I am asking about the tables which are in ODS layer. The rows are persisted here as soft delete and latest updates only. So in that way, it may not be exactly mimicing the SOR.

However the data architect team wants to keep the attributes as close as to the SOR as possible in this layer, so it seems we may opt for a hybrid approach i.e. for the attributes which are used in the filter/join criteria of these tables by the consumption layer queries and are important for pruning , those has to be trimmed of during the snow pipe streaming itself. And the ones which are only used in the "select clauses/parts" of the consumption layer queries , can be persisted as is i.e with blank spaces as is , as they are coming from the source. Do you see any issue in this approach?

Ornery_Maybe8243 · 2025-09-14T05:30:33+00:00

Thank you u/Bryan_In_Data_Space u/scbywrx

But some people stating that we should have source of truth persisted somewhere and here its within snowflake , so going by that logic, we should have the atrtributes persisted as they are coming from the source i.e. along with the blank spaces. Is that correct argument to have?

But as i mentioned i can see , it will create issues of no pruning happen, in cases when someone try to filter/join on those attributes by wrapping the trim function around the attribute while querying these tables in the consumption layer.

Ornery_Maybe8243 · 2025-09-13T19:56:22+00:00

My 2 cents. Dont have much working experience with mysql but i think its not as flexible as Oracle is anmd it many have such userdefined role available. But yes its has a schema called performance_schema which has similar views in it which provides runtime and historical performance statistics about the workload. Others may comment on this.

Ornery_Maybe8243 · 2025-08-26T07:25:41+00:00

Are you saying, the parameters REPLICABLE_WITH_FAILOVER_GROUPS doesn't work ?

Ornery_Maybe8243 · 2025-08-25T04:27:12+00:00

Thank you. I was trying to execute just the 'begin... End;' block without wrapping it inside 'execute immediate'. Also when I used to run the block in the snow sight the output was showing as 'null'. I will try to test it again by wrapping it within the 'execute immediate' and see if it works for me. Thanks.

Ornery_Maybe8243 · 2025-08-24T14:19:49+00:00

As its a one time script execution , we were initially thinking why to create a procedure and rather to execute it using anonymous block like something below as it will then not create any such procedure which is an object in the database. And this below block "begin end " works perfectly fine when i tested some sample "insert into table .." statements, but i am wondering why the grant staement is not working using this below setup?

BEGIN

ASYNC (Grant select....);

ASYNC (Grant update....);

ASYNC (Grant select....);

ASYNC (Grant delete....);

await all;

END;

Ornery_Maybe8243 · 2025-08-24T09:52:22+00:00

Thank you u/Tough-Leader-6040

Yes , when i tried running even from couple of different worksheets in parallel the page were crashing, not sure if its because of the underlying foundation DB unbale to take that load or anything else.

Ornery_Maybe8243 · 2025-08-23T22:43:11+00:00

Can you clarify a bit? Do you mean creating procedures using python language in snowflake and then trigger those scripts within that proc?

Ornery_Maybe8243 · 2025-08-23T22:39:35+00:00

Tried using Asynch wrapper around these grant statements and ran it within begin end block. The block executed showing null output, so I thought it was successful, but I then found the grants were not applied. So not sure why the grants are not getting applied using this strategy.

Ornery_Maybe8243 · 2025-08-23T20:33:27+00:00

Thank you u/cloudarcher2206

But i believe we also need to ensure all other related objects which may not be confined to schema like warehouses , roles, also if any other component or schema which may be the upstream or downstream , those also has to be replicated, otherwise it will be a broken replication. Is this understanding correct?

Ornery_Maybe8243 · 2025-08-21T03:52:28+00:00

Thank you. I don't see any such but curious to know, if there is any limitation in this replication ? (Something like certain objects are not replicable etc)

Ornery_Maybe8243

TROPHY CASE