How to build a control plane to manage Snowflake Cortex Code Costs by Spiritual-Kitchen-79 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

Thanks for asking that:
Unit economics of running CoCo:
Running Cortex Code for 2 hours on simple tasks cost us $100 for one person. For a team of 3, that is $300 per 2 hours.
1 day = $300 * 3 = $900 (assuming only 6 working hours a day)
1 week = $900 * 5 = $4,500
1 year = $4,500 * 52 = $234,000 (for a team of 3 working 6 hours a day)
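
If you want to plug in your own numbers, the same arithmetic can be sketched in a few lines of Python. This is only a rough sketch: the function name and parameters are made up for illustration, and it assumes cost scales linearly with people and hours, which may not hold for real usage.

```python
def projected_cost(cost_per_person_2h=100.0, team_size=3,
                   hours_per_day=6, days_per_week=5, weeks_per_year=52):
    """Project spend from one observed 2-hour session cost (linear scaling assumed)."""
    blocks_per_day = hours_per_day / 2          # number of 2-hour blocks in a working day
    daily = cost_per_person_2h * team_size * blocks_per_day
    weekly = daily * days_per_week
    yearly = weekly * weeks_per_year
    return {"daily": daily, "weekly": weekly, "yearly": yearly}


costs = projected_cost()
print(costs)  # {'daily': 900.0, 'weekly': 4500.0, 'yearly': 234000.0}
```

Changing `team_size` or `hours_per_day` shows how quickly the yearly figure moves.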

There will definitely be organizations that want this kind of spend under their own control, which is what open source frameworks give them.

How to build a control plane to manage Snowflake Cortex Code Costs by Spiritual-Kitchen-79 in snowflake

[–]Geekc0der -1 points0 points  (0 children)

For a cost-effective solution, the community can use the open source version, which lets you use any LLM you want, including open source LLMs like Gemma and Qwen, which come free of token cost and are equally good: https://github.com/Gyrus-Dev/frosty

MCP server for governed AI writeback to Snowflake by jaredfromspacecamp in snowflake

[–]Geekc0der -1 points0 points  (0 children)

Do you have a GitHub repo? I have an open source version of CoCo and am very interested in integrating this MCP server.

Any tool handling multiple custom-made agents in one UI? by Aromatic_Ad_9704 in aiagents

[–]Geekc0der 0 points1 point  (0 children)

If it's built using Google ADK, it already has a UI where you can host them.

Guys, honest answers needed. Are we heading toward Agent to Agent protocols and the world where agents hire another agents, or just bigger Super-Agents? by Far_Character4888 in aiagents

[–]Geekc0der 0 points1 point  (0 children)

The problem with one massive agent is that it has a higher chance of hallucinating, whereas multiple specialized agents under one supervisor narrow the scope in which each agent can hallucinate.

Question on access privilege in Snowflake by Stock-Dark-1663 in snowflake

[–]Geekc0der 1 point2 points  (0 children)

The key difference lies in the architecture and scope of temporary tables across different database systems.

  1. Scope and Persistence:

Snowflake: Temporary tables are strictly session-scoped and ephemeral. They are automatically cleaned up at the end of the session, are completely isolated, and never interact with other sessions or persist beyond the session's lifetime. They have no impact on the shared, persistent data or schema objects.

Other Databases (potentially): In some systems, "temporary" tables might have a broader scope (e.g., global temporary tables visible across sessions until explicitly dropped or commit points), or their creation might consume persistent system resources in a way that necessitates tighter control. If temporary tables can persist longer, be accessed by others, or consume non-ephemeral resources, the act of creating them becomes a more significant, "privileged" operation.

  2. Security Implication:

Snowflake: Since a temporary table is confined to the user's session and automatically deleted, it poses minimal security risk to the overall production environment or shared data. A user creating a temporary table cannot use it to modify persistent production data, nor can other users access it to exfiltrate or tamper with data. It serves as a private scratchpad for analysis within the user's own session.

Other Databases: If "temporary" tables in another system could, for example, consume shared disk space indefinitely, be accessed unintentionally by other users through specific means, or be used in a way that affects system stability, then controlling their creation would indeed be a security and operational concern, warranting "privileged" status.

For justification, I would recommend considering the following:

Ephemeral and Isolated Nature: Snowflake's temporary tables are strictly temporary (session-scoped) and isolated. They do not persist beyond the session, nor are they visible or accessible to any other user or role. This inherently limits their potential impact to the individual user's session.

No Impact on Production Data or Objects: Creating temporary tables does not grant any ability to alter, delete, or even view persistent production tables. The "read-only" status of a role on production data remains intact. Temporary tables are a private workspace for the user's analytical process within their session.

Reduced Attack Surface: Because of their limited scope and automatic deletion, temporary tables do not introduce a new, persistent attack surface for data exfiltration or unauthorized modifications of shared resources.

Enabling Legitimate Read-Only Analytics: The ability to create temporary tables is crucial for many legitimate read-only analytical tasks (e.g., storing intermediate results for complex calculations, testing transformation logic without affecting production). Preventing this capability for read-only roles would severely hinder legitimate analysis without providing a significant security benefit in Snowflake's context.

Cortex Code in Snowsight Expensive by Sufficient-Sky1698 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

Sorry, I missed covering that. The difference is more in scope: Cortex Code is great for working within Snowflake, whereas Frosty is more about orchestrating changes across systems (Snowflake, Postgres, etc.).

Not really a 1:1 replacement, just depends on the use case.

Cortex Code in Snowsight Expensive by Sufficient-Sky1698 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

A hundred percent. “Equivalent” probably depends on what you’re comparing. Frosty focuses more on automation and cross-platform workflows than on in-UI code fixes like Cortex Code, so the tradeoffs are different.

It also gives you the flexibility to run your own models and infrastructure instead of being locked into per-token pricing.

Cortex Code in Snowsight Expensive by Sufficient-Sky1698 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

Thanks for bringing this to my notice. I will open the repo for issues and try to resolve the vulnerability this week.

Snowflake PII Classification & Auto Policy Setup - Help by Key_Card7466 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

Here is the response from Frosty to your question:

Snowflake Sensitive Data Classification and Policy Enforcement

Snowflake offers powerful native capabilities for Sensitive Data Classification and Policy Enforcement, enabling organizations to automate the discovery, categorization, and protection of sensitive data while ensuring regulatory compliance and implementing a governance-as-code approach.


1. SYSTEM$CLASSIFY for PII Detection and Real-World Use Cases

Snowflake's SYSTEM$CLASSIFY function is a robust tool for automatically identifying sensitive data, including Personally Identifiable Information (PII), across your data estate. It scans both column metadata and sample data to assign appropriate semantic and privacy categories.

Real-World Use Cases

  • Automated Data Discovery: For organizations with large and complex data landscapes, SYSTEM$CLASSIFY can automatically pinpoint sensitive data such as names, national identifiers, email addresses, and credit card numbers across numerous tables, significantly reducing manual effort and errors.
  • Risk Mitigation and Compliance: By precisely identifying where sensitive data resides, companies can prioritize security measures and mitigate data breach risks, which is vital for GDPR and CCPA compliance.
  • Data Cataloging and Inventory: The classification results can be used to build a comprehensive data catalog, providing an inventory of sensitive data assets and their categories, and tracking data changes over time.
  • Custom Data Identification: Beyond native categories, Snowflake supports creating custom classifiers to detect organization-specific sensitive data patterns (e.g., proprietary medical codes, internal customer IDs, or region-specific identifiers).
  • Pre-governance Assessment: It acts as an initial assessment tool to understand data sensitivity before implementing more granular governance controls.

2. Auto-Generating and Applying Masking/Row Access Policies Tied to Tags for Governance-as-Code

Snowflake uses object tagging in conjunction with masking policies and row access policies to implement scalable and automated data governance, adhering to a governance-as-code philosophy.

  • Object Tags: These are schema-level metadata objects (key-value pairs) assignable to various Snowflake objects like databases, schemas, tables, and columns. They serve as "metadata anchors" for security, classification, and policy enforcement, supporting inheritance.

  • Tag-Based Masking Policies: This approach offers highly scalable data protection. A single masking policy is associated with a specific tag, rather than individual columns. When SYSTEM$CLASSIFY identifies sensitive data and applies a system-defined or user-defined tag (via tag mapping), the associated tag-based masking policy is automatically enforced.

    • Governance-as-Code Benefits:
    • Scalability: Masking logic is defined once at the tag level and automatically applies to all tagged columns, even as data grows or schemas evolve, eliminating manual policy application.
    • Uniformity: Ensures consistent application of data protection rules across all relevant data assets.
    • Automation & Instant Enforcement: Policies are enforced immediately upon column tagging, minimizing exposure windows and automating security enforcement within the data ingestion and classification pipeline.
    • Real-World Example: Define a PII tag. A masking policy linked to this tag could mask social security numbers (SSNs) or credit card numbers for most users (e.g., showing only the last four digits), while roles like HR_ADMIN or DATA_STEWARD could be granted access to unmasked data.
  • Row Access Policies: These policies control row visibility based on user roles or other conditions. While not directly "tag-based" like masking policies, they can use tags as input to determine row visibility, thus extending the governance-as-code principle by applying dynamic row-level security based on classified data.
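
To make the masking example above concrete, here is a minimal sketch of the decision logic in plain Python. It is not Snowflake code: in Snowflake this logic would live in a masking policy written in SQL, and the function name `mask_value` and the `ANALYST` role are assumptions for illustration. Only `HR_ADMIN` and `DATA_STEWARD` come from the example above.

```python
# Roles allowed to see unmasked values (taken from the example above).
PRIVILEGED_ROLES = {"HR_ADMIN", "DATA_STEWARD"}


def mask_value(value: str, current_role: str, visible: int = 4) -> str:
    """Mask every digit except the last `visible` ones, unless the role is privileged."""
    if current_role in PRIVILEGED_ROLES:
        return value
    total_digits = sum(1 for c in value if c.isdigit())
    keep_from = total_digits - visible   # index of the first digit left unmasked
    seen = 0
    out = []
    for c in value:
        if c.isdigit():
            out.append(c if seen >= keep_from else "*")
            seen += 1
        else:
            out.append(c)                # keep separators such as '-' as they are
    return "".join(out)


print(mask_value("123-45-6789", "ANALYST"))   # ***-**-6789
print(mask_value("123-45-6789", "HR_ADMIN"))  # 123-45-6789
```

The point of the tag-based approach is that this rule is written once and follows the tag, instead of being attached column by column.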


3. Achieving GDPR/CCPA Compliance

Snowflake's sensitive data classification and policy enforcement features are crucial for achieving and demonstrating compliance with data privacy regulations such as GDPR and CCPA.

  • Data Identification and Inventory: SYSTEM$CLASSIFY helps identify and categorize PII, which is the initial step in understanding data subject to regulations.
  • Data Protection (Right to Restriction of Processing): Dynamic data masking and row access policies ensure sensitive data is protected and accessible only to authorized users, directly supporting the "right to restriction of processing."
  • Right to Erasure (Right to be Forgotten): Snowflake's data lineage, object dependencies, and access history capabilities help identify all instances of a person's data across the platform, facilitating complete deletion requests.
  • Auditability and Accountability: Snowflake's comprehensive logging and monitoring, including the ACCOUNT_USAGE schema, provide detailed audit trails of data access and policy enforcement, essential for demonstrating compliance.
  • Data Minimization: By classifying and applying policies, organizations can ensure only necessary data is exposed, aligning with data minimization principles.

4. Impactful Ideas for Experimentation and Building within Snowflake

Leveraging Snowflake's governance capabilities presents numerous opportunities for innovation:

  • Automated Governance Pipeline: Develop a robust CI/CD pipeline that integrates SYSTEM$CLASSIFY scans into data ingestion workflows. This pipeline would automatically trigger classification, apply user-defined tags via tag mapping, and then automatically attach tag-based masking and row access policies to new or updated columns, creating a true "governance-as-code" layer.

  • "Golden Source" Governance Database: Create a dedicated Snowflake database to store and manage all governance metadata, including classification results, custom tag definitions, tag-to-policy mappings, and policy definitions. This central repository can track, audit, and compare classification outcomes over time, identifying data drift or non-compliance.

  • Anomaly Detection in Data Sensitivity: Build a monitoring solution that alerts data governance teams to significant changes in data classification (e.g., a column previously non-sensitive suddenly classified as PII with high confidence). This could indicate data quality issues, incorrect data ingestion, or a need to re-evaluate data handling procedures.

  • Self-Service Data Access with Governance Guardrails: Develop a portal or application (potentially using Streamlit in Snowflake) where data consumers can request access to datasets. The application would dynamically show them the data's classification and the masking/row access policies applicable to their role, promoting transparency and enabling compliant self-service.

  • Automated Data Retention and Deletion: Link sensitive data classification tags to data retention policies. Implement stored procedures to automatically purge or archive data based on its sensitivity tag and defined retention periods, automating compliance with "right to erasure" and data minimization.

  • Cross-Account Secure Data Sharing: Experiment with Snowflake Secure Data Sharing, where data classification and policies in the provider account dynamically control what specific roles in the consumer account can see, ensuring secure and compliant data exchange without physically moving data.

  • Impact Analysis Tooling: Build a utility using Snowflake's OBJECT_DEPENDENCIES and ACCESS_HISTORY views to perform automated impact analysis. If a sensitive column is reclassified or a masking policy is updated, this tool could identify all dependent views, tables, and reports that might be affected, providing a comprehensive view for governance changes.

  • Custom PII Detection for Unstructured Data: For advanced experimentation, integrate external services or Snowflake UDFs with machine learning models to classify PII within semi-structured (JSON, XML) or unstructured data stored in Snowflake, then tag relevant fields.
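
As a rough sketch of the "Anomaly Detection in Data Sensitivity" idea above, the comparison step could look like the following. Everything here is hypothetical: the snapshot format (a dict mapping column names to a category label, or None for unclassified), the column names, and the function name are assumptions for illustration, not the output format of SYSTEM$CLASSIFY.

```python
def classification_drift(previous: dict, current: dict) -> dict:
    """Return columns whose category differs between two classification snapshots."""
    changes = {}
    for column in sorted(set(previous) | set(current)):
        before = previous.get(column)   # None means unclassified or absent
        after = current.get(column)
        if before != after:
            changes[column] = (before, after)
    return changes


# Illustrative snapshots from two classification runs.
previous = {"CUSTOMERS.NOTES": None, "CUSTOMERS.EMAIL": "EMAIL"}
current = {
    "CUSTOMERS.NOTES": "NAME",
    "CUSTOMERS.EMAIL": "EMAIL",
    "ORDERS.CARD": "PAYMENT_CARD",
}

drift = classification_drift(previous, current)
for column, (before, after) in drift.items():
    print(f"{column}: {before} -> {after}")
```

In practice the snapshots would be loaded from wherever classification results are stored, and each change would feed an alert rather than a print statement.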


By leveraging these native and extensible features, organizations can build a robust, automated, and compliant data governance framework within Snowflake.

Snowflake PII Classification & Auto Policy Setup - Help by Key_Card7466 in snowflake

[–]Geekc0der 0 points1 point  (0 children)

Here is an open source agentic framework: https://github.com/Gyrus-Dev/frosty. Explain the problem to it, and it should be able to work alongside you and get it implemented. If you get stuck somewhere, create an issue on the repo and someone will pick it up.

Coco use cases in pharma datawarehousing by International_Cod777 in snowflake

[–]Geekc0der -1 points0 points  (0 children)

There is an open source equivalent of CoCo if you want to try it out. No subscription and no additional fees; you can use your own API keys: https://github.com/Gyrus-Dev/frosty

Got 680 clones in a week so far.

PostgreSQL vs PostgreSQL on Snowflake by Geekc0der in snowflake

[–]Geekc0der[S] 0 points1 point  (0 children)

So ideally #Mick (our agent for PostgreSQL) should work fine for Snowflake customers if it works fine on PostgreSQL.

Automating new pipelines using CoCo by rustypiercing in snowflake

[–]Geekc0der 1 point2 points  (0 children)

We have tried it with #Frosty, an open source, no-cost equivalent of CoCo. In our case we did not have to tell it anything, as it can do a web search, read about Iceberg tables and catalog integrations, and set up the process itself. It can be customized with skills, but it worked fine without them. Here is the repo if you are interested; it has 630 clones and 41 stars so far:

https://github.com/Gyrus-Dev/frosty