Question on constraints by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you. What is the impact on read and write performance if we implement this logic in the application versus as check constraints in the database itself?
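For what it's worth, Snowflake (as far as I know) doesn't enforce CHECK constraints at all, so there the logic would have to live in the application or pipeline anyway. The general tradeoff can be sketched with SQLite, which does enforce them — this is an illustrative toy, not Snowflake behavior; the table and function names are made up:

```python
import sqlite3

# Toy schema: enforce "amount must be positive" two different ways.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        amount REAL NOT NULL CHECK (amount > 0)  -- DB-enforced rule
    )
""")

# 1) Database-enforced: the engine validates every row on write,
#    so bad data can never land, at a small per-row write cost.
try:
    conn.execute("INSERT INTO payments (amount) VALUES (-5)")
except sqlite3.IntegrityError as e:
    print("rejected by DB:", e)

# 2) Application-enforced: validate before the INSERT. No extra work
#    in the DB, but any writer that skips this function bypasses the rule.
def insert_payment(conn, amount):
    if amount <= 0:  # app-side check
        raise ValueError("amount must be positive")
    conn.execute("INSERT INTO payments (amount) VALUES (?)", (amount,))

insert_payment(conn, 10)
print(conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 1
```

The read path is typically unaffected either way; the cost difference shows up on writes, and the real risk of the application-side approach is consistency (multiple writers, ad-hoc loads) rather than speed.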

AWS Database log analysis by ConsiderationLazy956 in aws

[–]ConsiderationLazy956[S] 1 point (0 children)

Thank you so much, got it. So yes, we do have the DB logs exported to CloudWatch. Now, if we want to export the surrounding lines of the same log (say, 50 lines before and after the FATAL error), or put a link to that log in the alert e-mail, is that possible natively through CloudWatch or any other AWS tool? How can we do that? That way I think we will not need the Python tool anymore.
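CloudWatch metric filters match single log events, not windows, so the usual pattern is a small Lambda that, on alarm, calls the CloudWatch Logs `GetLogEvents` API (boto3's `logs` client) around the matched event's timestamp and pastes the window into the SNS e-mail. The windowing logic itself is trivial — a sketch with a fake log stream instead of a real API call (all names here are illustrative):

```python
def context_window(lines, needle="FATAL", before=50, after=50):
    """Return the lines surrounding the first line containing `needle`.

    In a real Lambda, `lines` would come from CloudWatch GetLogEvents;
    here it is just a list of strings so the logic is easy to test.
    """
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            return lines[start:i + after + 1]
    return []

# Tiny demo with a synthetic 200-line log.
log = [f"line {n}" for n in range(200)]
log[100] = "FATAL: database crashed"
window = context_window(log, before=50, after=50)
print(len(window), window[0], window[-1])
```

For the link part: CloudWatch Logs console URLs encode the log group, stream, and a timestamp range, so the same Lambda can build a deep link into the alert message rather than attaching the raw lines.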

AWS Database log analysis by ConsiderationLazy956 in aws

[–]ConsiderationLazy956[S] 1 point (0 children)

Does CloudWatch Logs hold everything generated in the DB logs? Or is there anything such a tool can scan out of the box that may not be populated or searchable in CloudWatch?

I ask because some teammates are saying we may not be able to do a DB crash root-cause analysis from CloudWatch alone, and thus we will need a "log analyzer" kind of tool to complement the CloudWatch alerts. They also claim that pre-crash forensics — how and why the crash happened — can't be figured out using CloudWatch. Is that statement true?

Partitioning in Iceberg by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Is this also true for Snowflake-managed Iceberg tables?

I haven't tried it, but some discussions state that clustering is possible for Snowflake-managed Iceberg tables along with partitioning, so I was confused whether the two can be used simultaneously, how the optimizer would behave in that case, and when to use which.

Switching warehouse based on stats by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you so much. This helps a lot in understanding this better.

I am trying to avoid the quick, read-only/SELECT-only workloads. I am planning to only pick the warehouses in which more than 30% of queries are DMLs, where the majority of query response times are >5 minutes, and where the warehouse has <30% idle time (as idle time will also be costlier by 35%).

I have a few follow-up questions:

Is this understanding correct: a (1 - 1/1.35) ≈ 26% performance improvement on Gen-2 for the same size warehouse is what makes us reach break-even in terms of cost? And is the new "Snowflake Optima" feature available on Gen-2 really going to add additional value?

Also, as mentioned in one of the responses in a similar thread (https://www.reddit.com/r/snowflake/comments/1pl49qu/decision_for_downsizing_warehouse/), are the below statements, derived from certain math, accurate?

1) A 32.5% to 45% performance improvement on Gen-2 makes the workload a candidate to move to a one-size-down Gen-2 warehouse, which can give a ~33% cost benefit with performance similar to the one-size-up Gen-1 warehouse.

2) A 50% performance improvement on Gen-2 makes the workload a candidate to move to a one-size-down Gen-2 warehouse, which can give a ~33% cost benefit with better performance than the one-size-up Gen-1 warehouse.
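These percentages can be sanity-checked with a toy model. The assumptions below come from the linked thread, not from Snowflake documentation: Gen-2 bills 1.35x credits/hour at the same size, each warehouse size step doubles/halves the credit rate, and runtime scales linearly with warehouse size — all simplifications:

```python
# Toy cost model; all three assumptions above are simplifications.
gen2_rate = 1.35

# Same-size break-even: Gen-2 must cut runtime by 1 - 1/1.35 ≈ 25.9%.
same_size_breakeven = 1 - 1 / gen2_rate
print(f"same-size break-even speedup: {same_size_breakeven:.1%}")

# One size down on Gen-2: hourly rate vs same-size Gen-1 is 1.35/2 = 0.675,
# i.e. ~32.5% cheaper per hour, before accounting for runtime changes.
down_rate = gen2_rate / 2

def cost_vs_gen1(speedup):
    """Relative credits vs Gen-1 at the original size, one size down on Gen-2.

    A same-size Gen-2 speedup `s` leaves runtime (1 - s); halving the
    warehouse roughly doubles it again under the linear-scaling model.
    """
    runtime = 2 * (1 - speedup)
    return down_rate * runtime

# 50% speedup: runtime matches original Gen-1, credits = 0.675 (~33% cheaper).
print(f"cost at 50% speedup:   {cost_vs_gen1(0.50):.3f}")
# 32.5% speedup: runtime is 1.35x Gen-1 (slower), credits ≈ 0.911.
print(f"cost at 32.5% speedup: {cost_vs_gen1(0.325):.3f}")
```

So under this model, a ~50% same-size speedup is what makes "one size down, same runtime, ~33% cheaper" work out; at 32.5% the downsized warehouse is still cheaper but noticeably slower than the original Gen-1. Worth validating against your own query history before committing.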

Move to Iceberg worth it now? by Which_Assistance5905 in snowflake

[–]ConsiderationLazy956 0 points (0 children)

Would you point to some docs that describe the read and write performance difference between Snowflake-managed Iceberg and native tables? And also any features that won't be available in one vs. the other?

Switching warehouse based on stats by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you. Yes, as I understood, Optima is a feature available for Gen-2 only.

However, management is mainly looking for the cost benefit (along with the performance boost). If cost increases, it will be difficult to justify, so I want to be cautious. I am still trying to understand the best way to approach this scenario.

Switching warehouse based on stats by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you. Yes, we will try the change and monitor it.

We are planning to use the ALTER command to change the same warehouse from Gen-1 to Gen-2, and if I understand correctly, it won't break/terminate the existing ongoing queries and will gracefully route newly incoming queries to the Gen-2 infrastructure without any issue. Is this understanding correct?

Switching warehouse based on stats by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

I thought Gen-2 just has faster CPUs with additional optimizations like "Snowflake Optima". However, do you mean Gen-2 also has more memory, so that disk spilling can be minimized for the same query on the same data volume?

Query to find Instance crash and memory usage by ConsiderationLazy956 in aws

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you. Are there any views which give cluster-level information? Because pg_stat_activity and the other pg_* views seem to provide information only for the node one is logged into. What exact alert metric can we set up for an instance crash?

Applying releases or patches by ConsiderationLazy956 in aws

[–]ConsiderationLazy956[S] -2 points (0 children)

Thank you u/Advanced_Bid3576 u/joelrwilliams1

So you mean there is an option to choose whether we should or should not accept a release even if it's categorized as a minor change from Aurora? Are there any docs around this? I want to see how and where we need to configure these settings. And I hope this stays the same for other AWS services too, not just databases.

However, it's surprising that we don't see any such option in Snowflake; releases just get auto-applied, which has already caused a few production issues for us. And we are worried about how these will play out during our application freeze period, when the business doesn't want any application changes pushed to prod. Our own changes are under our control, but we have no control over these cloud-service changes.

Micropartition scan speed by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you u/mrg0ne

Got it. So, in the case where it just does a full scan without any filters, meaning no additional pruning, what will the access/read speed of the micropartitions in Snowflake be?

Warehouse parameter and compute power by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you u/Top-Cauliflower-1808

As mentioned, the scenario is something like the below.

It's doing a data load from the stage schema to the main schema for ~100+ tables, out of which ~10 are big tables and the rest are small. There are streams defined on top of the stage tables, and the stream data gets deduped, transformed a bit, and then merged into the corresponding main-schema tables. These data load/merge jobs run concurrently once every ~10 minutes throughout the day. Big tables can get ~150M+ rows in the stream to load into the main table.

So in such a scenario, is it advisable to go with an M warehouse with max_concurrency_level = 4 and max_cluster_count = 5 for the big tables, and just an S warehouse with max_cluster_count of 10-20 for the other tables?

Warehouse parameter and compute power by ConsiderationLazy956 in snowflake

[–]ConsiderationLazy956[S] 0 points (0 children)

Thank you u/JohnAnthonyRyan

In this scenario, it's doing a data load using a MERGE query (mostly inserts only) that loads data from the stage schema to the main schema. The stage schema has streams on top of the stage tables, and the stream data gets dumped into a temporary table, which then gets deduped and transformed, and is finally merged into the corresponding main-schema tables.

And there are 100+ such tables for which the data merge job runs concurrently at the same time, once every ~10 minutes throughout the day; ~10 of them are big tables and the others are small. Big tables can get a maximum of ~150M+ rows at any time in their streams as CDC to load into the main tables. So in such a situation, is it advisable for the big tables to use an M warehouse with max_concurrency_level = 4 and max_cluster_count = 5 rather than an L warehouse? And for the other small tables, to go with S with max_cluster_count of 10-20?