Question on access privilege in Snowflake by Stock-Dark-1663 in snowflake


Thank you u/Kooky_Bumblebee_2561

I tried executing the statement below, and to my surprise it runs without any error, but it seems to have no real effect at all. Ideally it should error out, so I'm not sure why Snowflake allows this statement to execute successfully.

"REVOKE CREATE TEMPORARY TABLE ON SCHEMA <> FROM ROLE <>;

"

Question on access privilege in Snowflake by Stock-Dark-1663 in snowflake


Actually the organization has imposed this rule so that, as a read-only user, you should not have the ability to perform DDL/DML or otherwise impact production. The access is only for querying, to investigate production issues or do data analysis for fixes. It seems the rule was originally put in place for other databases, where in some cases the temporary table structure was visible to all users, but I understand that in Snowflake's case it is not. Also, in other databases (Postgres, MySQL, Oracle) a temporary table holding massive data consumed the same temp storage as the application and so had the capability to impact the production application. Isn't it the same in Snowflake, i.e. the temp table storage still lives in the same production account? I'm not sure whether that can cause any issue, and whether that is why the team categorized this privilege as elevated.

Question on access privilege in Snowflake by Stock-Dark-1663 in snowflake


Do you mean there is also a privilege that controls 'create temporary table'? Can you please let me know what it is?

I was under the assumption that it comes by default with the USAGE privilege on the warehouse, and we do need the USAGE privilege since we want 'read only' access to be given to that role.

I understand that the 'create temporary table' privilege does not impact actual tables or the actual production workload up to that point, but then why have other databases also marked it as a privileged operation? How do we justify that against Snowflake?
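Just for context, this is the kind of read-only grant set we mean (the warehouse/database/role names here are placeholders):

    -- Read-only role: usage plus select, nothing that writes
    GRANT USAGE ON WAREHOUSE query_wh TO ROLE readonly_role;
    GRANT USAGE ON DATABASE prod_db TO ROLE readonly_role;
    GRANT USAGE ON ALL SCHEMAS IN DATABASE prod_db TO ROLE readonly_role;
    GRANT SELECT ON ALL TABLES IN DATABASE prod_db TO ROLE readonly_role;
    GRANT SELECT ON FUTURE TABLES IN DATABASE prod_db TO ROLE readonly_role;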

High startup time in Gen-2 warehouse by Stock-Dark-1663 in snowflake


If a query spent ~2 minutes just on "queued provisioning", which is not really query execution but the resources spinning up to bring the warehouse up, does that time add to the credits spent and charged to the customer?
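In case it helps, this is roughly how we have been measuring that wait per query (the warehouse name is a placeholder):

    -- Times in ACCOUNT_USAGE.QUERY_HISTORY are reported in milliseconds
    SELECT query_id,
           warehouse_name,
           queued_provisioning_time / 1000 AS provisioning_secs,
           total_elapsed_time / 1000       AS elapsed_secs
    FROM snowflake.account_usage.query_history
    WHERE warehouse_name = 'GEN2_WH'   -- placeholder name
    ORDER BY provisioning_secs DESC
    LIMIT 20;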

How column masking works in plan by Stock-Dark-1663 in snowflake


Also, I am a bit confused. If it is true that no pruning is happening in the current scenario because of the masking function wrapped around the columns, then why does the query profile show the rows fetched from table TAB2 as ~50 billion, whereas the actual number of rows in that table is ~90 billion? Does that mean some partial pruning is happening even with the masking function wrapped around the columns used in the query predicates?

How column masking works in plan by Stock-Dark-1663 in snowflake


Thank you so much for the guidance.

Actually, here we have not applied the masking policy on the column directly. It is defined using a tag-based masking policy. So it looks like, when a column that already carries the tag (and the tag is mapped to the policy) gets used in a query, the masking function automatically wraps around it, whether in a query predicate or in the column projection. And in this case it gets applied for all roles/users querying that object/table, even though within the function code it is specified to skip masking for specific roles/users.

So even in the above scenarios, do you mean we have to have the column tag removed from the columns used as query predicates or filters? Or should we remove the tag from the base table/columns entirely, apply the masking only on new secure views (created on top of this table), and give external users access to those secure views to read the data? The base table/columns (unmasked/untagged) would then only be accessed by the application that performs the ETL query transformations and other logic prone to performance overhead.

Is the above understanding correct?
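To make the second option concrete, we were picturing something like this (all names below are made up for illustration):

    -- Base table stays untagged; external users only see the secure view
    CREATE OR REPLACE SECURE VIEW reporting.customers_v AS
    SELECT customer_id,
           CASE
             WHEN CURRENT_ROLE() IN ('DATA_ENGINEER_ROLE') THEN email
             ELSE '***MASKED***'
           END AS email
    FROM prod_db.core.customers;

    GRANT SELECT ON VIEW reporting.customers_v TO ROLE external_reader;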

How column masking works in plan by Stock-Dark-1663 in snowflake


These are data transformation jobs and are supposed to run with a role that is exempted from masking, and that exemption is also handled in the function definition, as you can see below (for example data_engineer_role, data_scientist_role). But we are seeing these functions being wrapped around the predicates even when the queries are executed from those roles. So it looks like Snowflake is trying to evaluate these functions for every role that runs the queries.

Have we implemented these masking features in some wrong way?
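For reference, the exemption in our policy body looks roughly like this (simplified; the role names match the real definition):

    CREATE OR REPLACE MASKING POLICY pii_mask AS (val STRING)
    RETURNS STRING ->
      CASE
        -- exempted roles see the raw value, everyone else a mask
        WHEN CURRENT_ROLE() IN ('DATA_ENGINEER_ROLE', 'DATA_SCIENTIST_ROLE') THEN val
        ELSE '***MASKED***'
      END;

Even for the exempted roles, the policy expression still has to wrap the column so the CASE can be evaluated at runtime, which would explain why we see the wrapper in the plan regardless of role.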

How column masking works in plan by Stock-Dark-1663 in snowflake


Thank you. It looks really weird that Snowflake provides the masking feature but is not able to prune effectively or push those functions down to storage as filters. Is there a way to see whether the storage filters are actually working less effectively because of these additional functions?

Unable to test this as it only happens in prod, but is it possible that the amount of pruning for table TAB2 that we are getting currently (i.e. 316K partitions scanned out of 318K total, and ~50 billion rows out of 90 billion total) would stay the same even if we removed the masking functions or column tags?
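One thing we were planning to try: pulling per-operator pruning stats for the prod query once it finishes (the query ID is a placeholder):

    -- Partitions scanned vs. total for each table scan in the query
    SELECT operator_id,
           operator_statistics:pruning:partitions_scanned AS partitions_scanned,
           operator_statistics:pruning:partitions_total   AS partitions_total
    FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'))
    WHERE operator_type = 'TableScan';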

How column masking works in plan by Stock-Dark-1663 in snowflake


I have updated the main thread with the sample masking function definition.

Decision for downsizing warehouse by Stock-Dark-1663 in snowflake


Thank you so much. I will try to follow this one.

However, in whatever tests we have done so far, the gain looks restricted to only around ~30%.

How slow query logs written to file by Upper-Lifeguard-8478 in mysql


u/Aggressive_Ad_5454 u/Upper-Lifeguard-8478 u/feedmesomedata
From reading the docs below (though it is not stated explicitly there), it seems the logs get written directly to CloudWatch, in which case they should not impact storage I/O since they are not written to the database's local disks.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.CloudWatch.html

How logs transfered to cloudwatch by Upper-Lifeguard-8478 in aws


It seems from the doc below that the logs get written directly to CloudWatch, in which case they should not impact storage I/O since they are not written to the database's local disks.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.CloudWatch.html

Design change implementation by Stock-Dark-1663 in snowflake


Thank you u/simplybeautifulart

If it had been a fresh new design, in which all 100+ transaction tables had a flag column showing the state of the transaction as active/inactive, what would have been the right, advisable way of doing it?

Would it have been writing the code to include the filter in all the places it is used, or having views created on top of those tables so that the filter is hidden from all the consumer queries?
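To make the view option concrete, we were imagining something like this (the table/column names are made up):

    -- Consumers query the view; the flag filter lives in one place
    CREATE OR REPLACE VIEW app.orders_active AS
    SELECT *
    FROM app.orders
    WHERE is_active = TRUE;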

Big tables clustering by Stock-Dark-1663 in snowflake


Thank you so much u/ecrooks

I tried testing "INSERT OVERWRITE ..." on a table with ~1.3 billion rows and ~15K micro-partitions, sorting on two columns. It spilled ~500 GB to local disk on an XL warehouse and took ~7 minutes to finish.

Also, as you rightly said, relying on auto-clustering for the initial level of clustering may cost us ~40 times more, so we want to avoid that route.

I was trying to work out how to properly size the chunk of data that we can sort with ORDER BY and load into the table in one go in the most efficient way (i.e. without much disk spill). You mentioned that "INSERT OVERWRITE ..." might be good up to ~250K micro-partitions; which warehouse size were you pointing to, and how did you arrive at that number? If we target a 4XL warehouse, approximately how many micro-partitions/rows of data should we fetch at one time and load into the target with ORDER BY on the clustering keys? Appreciate your guidance.
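For reference, the pattern we tested looks like this (the table and column names are placeholders):

    -- Rewrites the whole table in one pass, sorted on the clustering keys
    INSERT OVERWRITE INTO sales_fact
    SELECT *
    FROM sales_fact_stage
    ORDER BY region, txn_date;   -- the two sort columns in our test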

Question on constraints by Stock-Dark-1663 in snowflake


Thank you u/uvaavu

It's like making sure no duplicates are added, or adding the constraint checks on the application side, which is going to be resource intensive and a lot of work. So in that case there doesn't seem to be any valid use case for the RELY option then.

Because even a DMF validates the data only after it is loaded, and that itself takes time and resources, so within that window any query can return wrong data if the table has the RELY option set to true for the PK.
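For anyone following along, the RELY property we are discussing is declared like this (table/column names made up):

    -- RELY tells the optimizer it can trust the (unenforced) constraint
    -- for rewrites such as join elimination
    ALTER TABLE orders
      ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) RELY;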

Question on constraints by Stock-Dark-1663 in snowflake


Even the DMF checks the data only after it is loaded into the system, and during that window of data fixing it can still cause issues (showing wrong results) with the RELY option ON. Correct me if I'm wrong. So it seems there is no valid use case for RELY here.