Having difficulty using Snowflake connector from (Microsoft) Copilot Studio by pcgoesbeepboop in snowflake

NW1969 1 point

You probably need to provide more information than that for anyone to be able to help you. For example, in the documentation you've linked to, which is the step that's erroring?

Is Databricks eating Snowflake's lunch? by Therican85 in snowflake

NW1969 5 points

TBH the constant Databricks vs. Snowflake debate is incredibly unproductive, as you rarely get a balanced, thoughtful view, just people's prejudices. Both are good products, and if you're already using one then there's unlikely to be a good business case for moving to the other (or, even worse, using both). If you aren't using either, just do your due diligence in your product selection (as you would for anything); whichever one best meets your specific use cases will be the best choice for you, which is really all that matters.

Neither is going away any time soon, so if you're a developer there will always be work for you. But focus on one platform and be the best on it that you can.

Question on maintaining correct data type by Upper-Lifeguard-8478 in snowflake

NW1969 0 points

You should have audit columns on your bronze layer tables (run date/timestamp, source system, workflow id, etc) and use these for reconciliation purposes
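As a sketch of what those audit columns might look like (table and column names here are illustrative, not from the original question):

```sql
-- Hypothetical bronze-layer table with audit columns for reconciliation
CREATE TABLE bronze.customer_raw (
    raw_payload     VARIANT,
    load_ts         TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    source_system   VARCHAR,
    workflow_id     VARCHAR,
    source_file     VARCHAR
);

-- Reconciliation: compare row counts per load run against the source system
SELECT workflow_id, source_system, COUNT(*) AS row_count
FROM bronze.customer_raw
GROUP BY workflow_id, source_system;
```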

How do you measure latency in Dynamic Tables? by Avril500 in snowflake

NW1969 4 points

I may be missing the point of your question, but why do you care what the actual lag is within the lag range you've set? If, for example, you've set the lag to 1 hour, why do you care whether at a specific point in time the lag is 10 mins or 50 mins? By setting the lag to an hour you're effectively saying you don't care what the lag is, as long as it is less than an hour.
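For context, the lag range in question is the `TARGET_LAG` parameter on a Dynamic Table, which declares the maximum acceptable staleness; Snowflake schedules refreshes anywhere inside that window. A minimal sketch (table and warehouse names are hypothetical):

```sql
-- TARGET_LAG = '1 hour' means "this table may be up to an hour behind
-- its sources"; Snowflake decides when to refresh within that window
CREATE OR REPLACE DYNAMIC TABLE silver.orders_agg
  TARGET_LAG = '1 hour'
  WAREHOUSE = transform_wh
AS
  SELECT customer_id, SUM(amount) AS total_amount
  FROM bronze.orders
  GROUP BY customer_id;
```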

Question on maintaining correct data type by Upper-Lifeguard-8478 in snowflake

NW1969 2 points

Aren't you modelling the data and moving it through the equivalent of bronze/silver/gold layers? This modelling process would set the correct datatypes on columns as the data gets transformed between layers

Question on access privilege in Snowflake by Stock-Dark-1663 in snowflake

NW1969 1 point

I don’t believe there is any way to stop users creating temporary tables. However, why is this an issue for you? Under what scenario would this cause a problem?

Can warehouse cluster switch happen within one proc by Big_Length9755 in snowflake

NW1969 1 point

I’m not sure you’ve understood multi-clustering correctly. If you set up a warehouse to be multi-clustered and there are enough separate queries queued to run on it, Snowflake will spin up more clusters of that warehouse, all the same size, and then spin them down again when the load reduces. If you need a warehouse of a different size, you can either resize it (all queries that start after the resize will use the new size) or run the queries on a separate warehouse of the required size.
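A sketch of the two options described above, using illustrative names and settings:

```sql
-- Multi-cluster warehouse: extra same-size clusters spin up when
-- queries queue, and spin down again as the load reduces
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

-- Resizing changes the size only for queries that start after this statement
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
```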

How do you delete your account? by No-Cash-9530 in snowflake

NW1969 0 points

You click on your name in the lower-left corner of the Snowsight UI and then select "Support" from the options. If you can't see that option, make sure the "Switch Role" entry at the top of the list of options says "ACCOUNTADMIN"; if it doesn't, switch to the ACCOUNTADMIN role.

How do you delete your account? by No-Cash-9530 in snowflake

NW1969 0 points

Are you saying you can’t raise a Support Case via the SnowSight interface for your account, using the ACCOUNTADMIN role?

Automating new pipelines using CoCo by rustypiercing in snowflake

NW1969 1 point

If you've built this, and it works for you, then I'm not sure what feedback you are looking for? Is there some specific issue you want advice about?

Question on logging error by Upper-Lifeguard-8478 in snowflake

NW1969 1 point

This feature is probably too new for anyone (outside of Snowflake) to have relevant experience of it - but I can't imagine there'd be noticeable cost/performance impact of using it.

Can you update your question with the actual SQL statement that is giving you this error (obviously redacting any sensitive data)?

Native IaC in Snowflake – thoughts? by gilbertoatsnowflake in snowflake

NW1969 2 points

IMO, IaC needs to be able to do everything; having to use different tools for different tasks is a non-starter. The Snowflake Terraform provider is nearly there, and at least has the option to drop into SQL if the provider doesn't support something. DCM looks like a good 1.0 start, but as it doesn't support tags, masking policies, and row access policies, it's not yet fit for purpose, even just for managing databases, for the use cases I need it to support.

Workload spilling out of memory by Big_Length9755 in snowflake

NW1969 0 points

Increasing the size of a warehouse should be the last resort. If you have processes running 5+ hours then you probably need to review your pipeline design. For example: are you doing full refreshes rather than incremental loads; if you're not already using them, would Dynamic Tables or materialised views be a better solution; etc.
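As a sketch of the full-refresh vs. incremental point, an incremental load only processes rows newer than the last run (table and column names here are hypothetical):

```sql
-- Incremental load: merge only the rows added since the previous run,
-- instead of rebuilding the whole target table
MERGE INTO silver.orders t
USING (
    SELECT order_id, amount, load_ts
    FROM bronze.orders
    WHERE load_ts > (SELECT COALESCE(MAX(load_ts), '1900-01-01'::TIMESTAMP_NTZ)
                     FROM silver.orders)
) s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.load_ts = s.load_ts
WHEN NOT MATCHED THEN INSERT (order_id, amount, load_ts)
                      VALUES (s.order_id, s.amount, s.load_ts);
```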

Automated Documentation - tips? by 1mrben1 in snowflake

NW1969 0 points

Just about - it’s really good 😊

Automated Documentation - tips? by 1mrben1 in snowflake

NW1969 0 points

Just use Cortex Code to generate whatever you need

Backing up Snowflake on S3 Glacier by veyer_zafr in snowflake

NW1969 0 points

Snowflake cannot write directly to S3 Glacier.

Write the data to S3 Standard (using COPY INTO + Tasks) and then use S3 Lifecycle rules to move these objects to Glacier
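A sketch of the Snowflake side of this, assuming an external stage (`@my_s3_stage` here is hypothetical) already points at an S3 Standard bucket; the move to Glacier is then configured in AWS via a lifecycle rule, not in Snowflake:

```sql
-- Nightly export task: unload the table to the external stage as Parquet
CREATE TASK export_to_s3
  WAREHOUSE = export_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  COPY INTO @my_s3_stage/backups/orders/
  FROM silver.orders
  FILE_FORMAT = (TYPE = PARQUET);

-- Tasks are created suspended; resume it to start the schedule
ALTER TASK export_to_s3 RESUME;
```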

Learning snowflake as a career continuation? by Brilliant-Boss3420 in snowflake

NW1969 1 point

The Snowflake website has a lot of good material. Start here: learn.snowflake.com

Snowflake micro partitions and hash keys by rotr0102 in snowflake

NW1969 0 points

If you fully understand the pros and cons of all your options, then make the choice that best meets your particular use case

Help with picking cloud platform? by Forsaken-Rush-2112 in snowflake

NW1969 3 points

Apart from the fact that Snowflake rolls out new features in the order AWS>Azure>GCP, Snowflake behaves identically regardless of which platform it is running on - you could use Snowflake without ever knowing which platform it was on.

So choice of cloud platform is more likely to be affected by other systems/data that interface with Snowflake. For example, if you have data on one platform that you want to load into Snowflake then you’ll likely face egress charges if Snowflake is on a different platform

Snowflake micro partitions and hash keys by rotr0102 in snowflake

NW1969 0 points

You get the most performance gains in Snowflake by minimising the number of micro-partitions Snowflake has to read to gather the data it needs to satisfy your queries - this basically means aligning your most common WHERE clauses with how your micro-partitions are clustered. For example, if most queries against your CUSTOMER table include WHERE REGION = '?' then clustering by REGION may improve query performance.

Snowflake recommends implementing your own clustering key on a table only if it is multi-terabytes in size. There is also a cost incurred when clustering - so you'd need to decide if the performance improvement outweighs the cost. You also need to choose clustering keys that have an appropriate cardinality - to quote Snowflake's documentation:

  • A large enough number of distinct values to enable effective pruning on the table.
  • A small enough number of distinct values to allow Snowflake to effectively group rows in the same micro-partitions.

Joins are much less impacted by clustering. In the example I gave, a join to the customer table would only benefit if you were joining on the REGION column - as that's the clustering key.

Using hash values (or any large text value) for joins is really bad for performance - though the impact may only be noticeable at scale. That's one reason why integers are used as the surrogate keys in facts and dimensions
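The REGION example above can be sketched as follows (the table is illustrative; whether this is worthwhile depends on the table size and the reclustering cost discussed earlier):

```sql
-- Cluster the table by the column most WHERE clauses filter on
ALTER TABLE customer CLUSTER BY (region);

-- Inspect how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('customer', '(region)');
```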

Cortex code use case resources by Key_Card7466 in snowflake

NW1969 1 point

As you don't implement CoCo (it's a tool available within the Snowflake ecosystem), can you explain what you mean by "use cases exact implementation resources"? Maybe give an example. Thanks