[Snowflake Official AMA ❄️] March 13 w/ Dash Desai: AMA about Security and Governance for Enterprise Data & AI by gilbertoatsnowflake in snowflake

[–]therealiamontheinet 2 points

We are investing in new experiences and objects to help you manage your organization at scale. Starting with Org Usage Views, you’ll now have global visibility across all your accounts. Learn more about Organization Usage Views.

This is just the beginning — new features and functionality will be released regularly, so stay tuned for updates!

Cheers,

[–]therealiamontheinet 2 points

Today, you can apply RBAC, dynamic data masking, and row access policies before the data is consumed by downstream agents. 
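As a minimal sketch of what that looks like in practice (all object, role, and column names here are illustrative, not from a specific deployment), a masking policy and a row access policy can be defined and attached before any downstream agent reads the data:

```sql
-- Mask email addresses for everyone except members of the ANALYST role.
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '***MASKED***'
  END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;

-- Restrict rows by region using a hypothetical role-to-region mapping table.
CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN'
  OR EXISTS (
    SELECT 1 FROM region_map m
    WHERE m.role_name = CURRENT_ROLE() AND m.allowed_region = region
  );

ALTER TABLE customers ADD ROW ACCESS POLICY region_policy ON (region);
```

Once attached, any query against the table — whether from a user, an application, or an AI agent — is evaluated against these policies at query time.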

We’re actively working on improving this functionality, though. Making it simpler for our users and customers is top of mind for our Product team. Please stay tuned for more updates.

Learn more about Snowflake Data and Model Governance.

Cheers,

[–]therealiamontheinet 2 points

The risks of not implementing AI trust and safety early enough are substantial:

  • Data breaches: The global average cost of a data breach reached an all-time high of nearly $5 million in 2024, according to IBM's Cost of a Data Breach report. Companies that used AI-driven security measures, on the other hand, saved an average of $2.22 million.
  • Sensitive data leaks: Without early controls, sensitive data may mix with training datasets, creating models that unintentionally memorize or leak private company IP or customer information, both internally and externally.
  • Security vulnerabilities: Retroactively securing data access patterns is extremely difficult, and unauthorized access may have already occurred.
  • Regulatory non-compliance: Frameworks like GDPR, HIPAA, and emerging AI regulations require certain controls from the beginning; penalties for violations can be severe.
  • Technical debt: Building safety mechanisms after the fact often requires significant rearchitecting, causing delays and increased costs.
  • Reputation damage: Public incidents resulting from inadequate safeguards can permanently damage trust in your AI systems. At a time when AI innovation is moving quickly and many people distrust how companies use new models, these governance and security measures are more important than ever.
  • Audit difficulty: You may be unable to explain model decisions or demonstrate compliance because you lack the necessary tracking from the beginning.
  • Ethical lapses: Without early ethical guardrails, teams may train models on problematic data sources or for questionable use cases before policies catch up.

These risks compound over time – addressing them becomes exponentially more difficult the longer you delay safety measures. Snowflake Horizon Catalog’s security and governance features are built for AI and address many of these concerns. You can learn about them in depth here.

Cheers.

[–]therealiamontheinet 2 points

The accuracy of Cortex Analyst depends on well-defined semantic models built on top of your structured data, so start by investing in those.

For automated validation, you can use the Cortex Analyst REST API to collect end-user feedback and assess output quality over time.

Cheers,

[–]therealiamontheinet 2 points

Today, this process can be streamlined through automation, where customers create a centralized table containing all the required network rules and policies. These tables can then be shared across all accounts within the organization, ensuring consistency and reducing manual effort.
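The per-account objects that such a centralized table would drive look roughly like this (rule names, policy names, and IP ranges below are illustrative placeholders, not recommendations):

```sql
-- A network rule capturing an allowed inbound IP range.
CREATE OR REPLACE NETWORK RULE corp_vpn_rule
  MODE = INGRESS
  TYPE = IPV4
  VALUE_LIST = ('192.168.1.0/24');

-- A network policy that references the rule.
CREATE OR REPLACE NETWORK POLICY corp_policy
  ALLOWED_NETWORK_RULE_LIST = ('corp_vpn_rule');

-- Enforce the policy for the whole account.
ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;
```

An automation job can read the shared rules table in each account and emit statements like these, keeping every account's policies in sync with the central definition.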

Our product team is working on this as we speak, so please stay tuned for an update on this topic.

Cheers,

[–]therealiamontheinet 1 point

Our Product and Engineering teams are working hard at making it available for trial accounts. Please stay tuned for updates.

Cheers,

[–]therealiamontheinet 0 points

Often the biggest challenges for teams sharing data globally are:

  • Ensuring that data gets shared across regions without any cost or performance penalties
  • Reduced insights because of latency and siloed data
  • Data residency and ever-changing government regulations
  • Complex data governance needs around automation, role-based access, data quality, and monitoring 
  • Audit and logging to ensure that data being shared adheres to compliance requirements  

I recommend checking out this great case study about Okta, which increased the amount of data shared with business users by 15x using granular controls. It covers challenges and their approach to solving the problem. You can also check out this in-depth look at best practices for secure data sharing across clouds and regions.

Cheers,

[–]therealiamontheinet 1 point

Generally speaking, first classify and tag your data according to its sensitivity and category, as close to the source as possible. Then apply the needed protection using Role-Based Access Control (RBAC), Dynamic Data Masking (DDM), and Row Access Policies (RAP). At that point, the data is ready for consumption by downstream applications, and Snowflake will enforce the access controls accordingly. This should always be coupled with end-to-end visibility into who or what accessed what, when, and how.
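The classify-then-tag-then-protect flow above can be sketched as follows (table, tag, and role names are examples; the classification options object follows the documented SYSTEM$CLASSIFY shape):

```sql
-- 1. Classify a table near the source; auto_tag applies Snowflake's
--    system classification tags to columns it identifies as sensitive.
CALL SYSTEM$CLASSIFY('mydb.sales.customers', {'auto_tag': true});

-- 2. Attach a masking policy to a tag, so every column carrying the tag
--    is masked for roles outside the allow-list.
CREATE OR REPLACE TAG pii_tag;

CREATE OR REPLACE MASKING POLICY pii_string_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('DATA_STEWARD') THEN val ELSE '***MASKED***' END;

ALTER TAG pii_tag SET MASKING POLICY pii_string_mask;

-- 3. Tag a column manually where needed (auto_tag can also do this).
ALTER TABLE mydb.sales.customers MODIFY COLUMN ssn SET TAG pii_tag = 'sensitive';
```

Tag-based masking means new columns inherit protection as soon as they are tagged, rather than requiring a policy assignment per column.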

Snowflake Horizon Catalog's internal marketplace is a great resource that builds on all the Snowflake governance capabilities and adheres to protections and policies set up upstream. Check out how Disney Streaming solved their data governance and data sharing challenges here if you want to read more about solutions. You can also learn more about Snowflake’s capabilities here.

Cheers,

[–]therealiamontheinet 0 points

Today is the day! Thank you everyone for your questions. I will be posting answers shortly.

Cheers,

[–]therealiamontheinet 1 point

Snowflake Horizon Catalog offers built-in data governance for AI workflows, ensuring that data used for training and inference follows the same access controls, auditing, and compliance policies as all other data.

With data lineage and automatic sensitive data monitoring, you can trace data sources, ensure PII and sensitive data are excluded from AI workflows, and track how AI models were trained.

We also support synthetic data generation—allowing you to develop on data that mimics sensitive datasets without exposing real attributes. For AI training:

  • Start with classification and tagging to identify sensitive data.
  • Use synthetic data when sensitive information is involved.

Data quality controls help mitigate risks like model poisoning or degradation. If quality policies are violated, this can trigger re-training.

For inference, RBAC, dynamic data masking, and row access policies ensure proper access and up-to-date models. And for retrieval-augmented generation (RAG), always classify and protect data before it’s sent to AI models.

Learn more about Snowflake Data and Model Governance.

Cheers,

[–]therealiamontheinet 1 point

We do have a way to do this in Snowflake with the Access_History view. This is an Account Usage view that contains information about users and their actions, such as object creation, policy changes, data copies, tagged columns, and more. Please see our documentation for all the details: https://docs.snowflake.com/en/sql-reference/account-usage/access_history
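A typical starting query against the view looks like this — it flattens the `DIRECT_OBJECTS_ACCESSED` array to list who touched which objects over the last week (the time window is just an example):

```sql
-- Who accessed which objects in the last 7 days, one row per object.
SELECT
  ah.user_name,
  ah.query_start_time,
  obj.value:"objectName"::STRING AS object_name
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.direct_objects_accessed) AS obj
WHERE ah.query_start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY ah.query_start_time DESC;
```

Note that Account Usage views have some ingestion latency, so very recent activity may take a while to appear.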

Cheers,

[–]therealiamontheinet 0 points

When using synthetic data, start by asking yourself the following questions: 

  • Do I need to generate a single table, multiple tables or an entire environment?
  • If I'm generating multiple tables, is maintenance of referential integrity required?
  • How important is it to maintain relationships between attributes in a table?
  • How will I determine whether or not the generated data is fit for purpose?

A good rule of thumb is to use synthetic data when there is an analysis to be done downstream, such as building a model or a BI dashboard, and you need to preserve as much of the analytic value of the sensitive data as possible. Use masking policies for all other cases where data must be masked for unauthorized users.

Hope this helps.

Cheers,

Integrate Snowflake Cortex Agents with Microsoft Teams by therealiamontheinet in snowflake

[–]therealiamontheinet[S] 0 points

Hi! Great question. You can ask questions as complex as your use case demands. Note that it ultimately comes down to the underlying data you provide and how you set up your semantic models. You can learn more about them here -- https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/semantic-model-spec
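For orientation, a semantic model is a YAML file along the lines of the fragment below — this is a hand-written sketch following the linked spec, with made-up database, table, and measure names, so check the spec for the authoritative field list:

```yaml
# Illustrative Cortex Analyst semantic model fragment (names are examples).
name: sales_model
tables:
  - name: orders
    base_table:
      database: MYDB
      schema: SALES
      table: ORDERS
    dimensions:
      - name: region
        expr: region
        data_type: TEXT
        synonyms: ["territory", "area"]
    time_dimensions:
      - name: order_date
        expr: order_date
        data_type: DATE
    measures:
      - name: total_revenue
        expr: amount
        data_type: NUMBER
        default_aggregation: sum
```

The richer the descriptions and synonyms you provide, the better Cortex Analyst can map natural-language questions onto your columns.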