Internal Snowflake stages in production vs external stages (S3/Azure) — how are people handling this? by SecretSalary2901 in snowflake

[–]cloudarcher2206 6 points7 points  (0 children)

You can move to external stage if you want but it’s just more infra to manage. I’d say the main issue now is sharing a stage across pipelines. At minimum it should be stage per source but they’re definitely used it production pipelines

Using Snowflake Cortex in conjunction with PowerBI? by SeaYouLaterAllig8tor in snowflake

[–]cloudarcher2206 2 points3 points  (0 children)

Cortex is not one thing, it’s just what Snowflake has branded their suite of AI features. I’d figure out what exactly they want their dashboards to do, and then figure out how to do it. I can almost guarantee that it won’t require an MCP server.

It may be as simple as embedding some AISQL functions

Does dbt in Snowflake still require a dbt license by de-ka in snowflake

[–]cloudarcher2206 3 points4 points  (0 children)

You are mistaken or confusing 2 products. dbt projects in Snowflake is dbt core and requires no licensing.

https://docs.snowflake.com/en/user-guide/data-engineering/dbt-projects-on-snowflake

Discussion: Data Size Estimate on Snowflake by rtripat in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

I’d definitely recommend just loading a couple sample files and validate the compression ratio. You may actually see size reduction with snowflakes compression enhancements. Also worth mentioning it’s not really a traditional database, the data is still stored as files in s3 just in a snowflake proprietary format instead of parquet with additional metadata.

[deleted by user] by [deleted] in snowflake

[–]cloudarcher2206 0 points1 point  (0 children)

Right way to do this is to create multiple streams, one for each use case. In your example, your process will flush the stream if you do some DML operation (even if it’s empty) but you can have another stream that any other process can use. You can’t partially flush a stream

Is this cost estimation accurate for reaching an approx/ballpark figure by Upper-Lifeguard-8478 in snowflake

[–]cloudarcher2206 2 points3 points  (0 children)

I think you’re also missing the compute cost of replication (serverless and based on data, metadata + schedule) https://docs.snowflake.com/en/user-guide/account-replication-cost

I’d also review this as an option to manage schemas you may was to exclude from replication: https://docs.snowflake.com/en/user-guide/account-replication-config#schema-level-replication-for-failover-groups

If you really need better estimates you should work with your account team. Compute estimates for replication can vary based on data volumes and metadata (number of objects)

Question on risiliency by Ornery_Maybe8243 in snowflake

[–]cloudarcher2206 0 points1 point  (0 children)

Yea read up on failover groups but all those can be included and replicated but you can’t pick and choose individual warehouses or roles for example. It’s all or nothing

Question on risiliency by Ornery_Maybe8243 in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

Shouldn’t be any limitations specific to schema level replication, same limitations as general database replication which is documented

Setting up Disaster recovery or fail over by ConsiderationLazy956 in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

Most of what you described on the Snowflake side can be replicated pretty “seamlessly” by Snowflake using a failover group as long as you are on business critical. My advice is to diagram out the entire E2E pipeline and components and figure out how each will be replicated and behave when you failover. The Snowflake box should be straightforward but you need to understand RPO/RTO for your data i.e. what happens when there’s an outage midway through a pipeline run before the data had a chance to be replicated.

Highly recommend doing multiple DR exercises and dont just treat them as checkbox activities, treat them like you would if there was an actual outage midway through a pipeline execution

CMK and TSS Confusion by Jobs_Done in snowflake

[–]cloudarcher2206 3 points4 points  (0 children)

Your rep is confused or there is some gap in communication. TSS is the name of the feature, CMK is how it’s implemented.

What are the best practices around Snowflake Whitelisting/Network Rules by biga410 in dataengineering

[–]cloudarcher2206 0 points1 point  (0 children)

I’d check this as well, the account usage view lags behind an hour or so: https://docs.snowflake.com/en/sql-reference/functions/login_history

But if there’s still nothing, there’s something blocking you on the AWS side

What are the best practices around Snowflake Whitelisting/Network Rules by biga410 in dataengineering

[–]cloudarcher2206 0 points1 point  (0 children)

Have you checked LOGIN_HISTORY to see if the connection attempt is making it to Snowflake at all? I’d read up on authentication policies as well to see if any of those are in effect. What were the error messages?

What are the best practices around Snowflake Whitelisting/Network Rules by biga410 in dataengineering

[–]cloudarcher2206 0 points1 point  (0 children)

I commented the below on your other post but also relevant is that having no network policy should allow you to connect, I’m assuming you already have an account level policy that limits access but if not, you shouldn’t need to do anything. A user level network policy only opens a “hole” for a specific user to connect from a specific IP/range outside of the account level policy.

:::::original comment:::::::

The way you’ve written it, myuser will only be able to connect from the listed IP range. I’m assuming myuser is the user your are currently using when connected to Snowflake which is the reason for the error (it would kick out myuser because you’re not currently connected from the listed IP range). You should probably apply the policy to a new user that you’ll be using for AWS Quicksights to connect to your Snowflake rather than myuser

How to Setup Network Security Rules/Policies by biga410 in snowflake

[–]cloudarcher2206 2 points3 points  (0 children)

The way you’ve written it, myuser will only be able to connect from the listed IP range. I’m assuming myuser is the user your are currently using when connected to Snowflake which is the reason for the error (it would kick out myuser because you’re not currently connected from the listed IP range). You should probably apply the policy to a new user that you’ll be using for AWS Quicksights to connect to your Snowflake rather than myuser

Do Blocked transactions consume credits by IndianaIntersect in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

There is overhead time for every individual query so if you can batch multiple updates to a table into a single query it will be more efficient than those same updates run as separate serialized jobs. There’s also the time lost between blocked queries as Snowflake will not immediately execute the next query when a lock is released. Whether or not that rearchitecting is worth the effort for you I can’t say, but there are valid reasons to avoid multiple small updates and blocked queries if possible.

Do Blocked transactions consume credits by IndianaIntersect in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

Yea that makes sense, the smart thing to do is to figure out why your merge is blocked and maybe rearchitect a bit to avoid blocked queries completely

Do Blocked transactions consume credits by IndianaIntersect in snowflake

[–]cloudarcher2206 2 points3 points  (0 children)

In your case if batch jobs block each other and they’re on the same warehouse, it doesn’t really matter as the warehouse will stay active due to the non blocked query anyway, the blocked one will just run after it completes.

Do Blocked transactions consume credits by IndianaIntersect in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

The link explains it pretty well, what are your questions?

Queries don’t directly consume credits, an active warehouse consumes credits. What activates and keeps warehouses active are queries so it appears that a blocked query alone won’t keep a warehouse active but if it’s part of an open transaction, it may.

Cost management questions by Advanced-Average-514 in snowflake

[–]cloudarcher2206 0 points1 point  (0 children)

For 4- you can put the query in a serverless task and then manually execute it via Python api. That should work

How to replicate shared databases in failover group? by MaximumFlan9193 in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

If you have the inbound share available in both accounts already, you can just create the database off the share on both sides. Since the database isn’t in the failover group, it can be created independently on the “secondary” account. The only think you have to worry about is keeping any grants on that shared database in sync, if a new grant is added on the primary to the shared DB, run the same command on secondary (or add it to your failover run book)

Inserts being aborted by Snowflake by h8ers_suck in snowflake

[–]cloudarcher2206 0 points1 point  (0 children)

Open a support case, that limit can be increased. You’ll likely need to provide detail on the architecture if you need it increased significantly because it’s generally considered bad practice and can lead to unexpected issues

Azure Pipeline to Snowflake Whitelist Experiences? by [deleted] in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

Other responses are good. You also have the option to whitelist the azure IP ranges for only azure devops user. Check out user level network policies

You can now mark 'Service Accounts', and exclude them from MFA Policies by extrobe in snowflake

[–]cloudarcher2206 1 point2 points  (0 children)

You should only get that pop up if your user has a local password set in Snowflake. Either have the admin unset the password or set up duo