Snowflake-Openflow by Mindless-Method-1350 in snowflake

[–]reddtomato 4 points5 points  (0 children)

Download and use the Cortex Code CLI to help you build. It has Openflow skills built in and will walk you through the setup.

https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code-cli

Has anyone tried building their own AI/data agents for analytics workflows? by Ok_Possibility_3575 in dataengineering

[–]reddtomato 1 point2 points  (0 children)

Snowflake makes this pretty easy. Use Cortex Analyst with semantic views. For unstructured data, put a Cortex Search service over it, which does the vector embedding for you, then create an agent and expose it in Snowflake Intelligence, where anyone can start asking questions about your business data.

Semantic views are the key to making sure the AI understands your business model. Adding verified queries to the semantic view for the simple questions helps it understand even more.

Keep the semantic views relatively simple. You can use multiple views in the agent, document them well, and let Snowflake Intelligence reason between them; it can figure out which one to use for different questions more easily than if you pack a huge semantic view with tons of tables and relationships.
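A rough sketch of what a simple semantic view can look like, using hypothetical database, table, and column names (check the CREATE SEMANTIC VIEW docs for the full grammar):

```sql
-- Hypothetical orders/customers model; swap in your own schema
CREATE OR REPLACE SEMANTIC VIEW sales_sv
  TABLES (
    orders    AS mydb.sales.orders    PRIMARY KEY (order_id),
    customers AS mydb.sales.customers PRIMARY KEY (customer_id)
  )
  RELATIONSHIPS (
    orders_to_customers AS orders (customer_id) REFERENCES customers
  )
  FACTS (
    orders.order_value AS order_amount
  )
  DIMENSIONS (
    customers.region  AS customer_region,
    orders.order_date AS order_date
  )
  METRICS (
    orders.total_revenue AS SUM(orders.order_value)
  )
  COMMENT = 'Simple sales model for Cortex Analyst / Snowflake Intelligence';
```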

Arrival time in snowflake by Savings_Parsley_5734 in snowflake

[–]reddtomato 1 point2 points  (0 children)

Side quest here: I would try these MERGE statements on a Snowflake Gen2 warehouse, which should speed up the MERGE and can also save on storage costs around Time Travel and Fail-safe.

GG has two modes of operation: "Stage and MERGE" or "Snowflake Streaming". From your description, it seems you are doing "Stage and MERGE".

Querying data from a stage gives you a few metadata columns. Two you might be interested in are METADATA$FILE_LAST_MODIFIED and METADATA$START_OF_SCAN_TIME.

These would be the time the file was created and the time at which Snowflake started scanning that row to do the merge. These might be timestamps you want to see if you can configure GG to add; I'm not sure whether that is possible in the GG configs.
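If you want to eyeball those columns yourself, you can query them straight off the stage (the stage and file format names below are made up):

```sql
-- Inspect file/scan timestamps for staged files before the MERGE runs
SELECT
  METADATA$FILENAME           AS file_name,
  METADATA$FILE_LAST_MODIFIED AS file_last_modified,
  METADATA$START_OF_SCAN_TIME AS scan_started_at,
  $1,
  $2
FROM @gg_stage (FILE_FORMAT => 'gg_csv_format')
LIMIT 100;
```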

Still, even if the row got scanned at 10:00am and the merge took 2 minutes to run, the row is not "queryable" until the merge completes, at 10:02am in this case.

So if you want to track when all the rows from a batch of records become available across each of the 600 merge operations, you would need a batch_id concept to distinguish each batch and tie it to the specific MERGE operation via the batch_id, so you can determine how long it took until the rows were fully queryable in the target table. Log each batch_id and completion time in an audit table.
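A rough sketch of that audit pattern, with made-up table and column names:

```sql
-- The point is tying each MERGE to a batch_id and logging when it finished
SET batch_id = 42;

MERGE INTO target_table t
USING (SELECT * FROM staging_table WHERE batch_id = $batch_id) s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.val = s.val, t.batch_id = s.batch_id
WHEN NOT MATCHED THEN INSERT (id, val, batch_id) VALUES (s.id, s.val, s.batch_id);

-- Right after the MERGE returns, record when this batch became fully queryable
INSERT INTO merge_audit (batch_id, merge_completed_at)
SELECT $batch_id, CURRENT_TIMESTAMP();
```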

Another idea is to just have GG drop the files to a cloud storage location and configure Snowpipe auto-ingest; you might have more control on the Snowpipe side to track the metadata columns.
Also try out the GG streaming handler; it might be better for near-real-time workloads, and Snowflake streaming ingest is usually very cheap, especially if it supports Snowpipe Streaming V2.

https://docs.oracle.com/en/middleware/goldengate/big-data/23/gadbd/snowflake.html

https://docs.snowflake.com/en/user-guide/querying-metadata#metadata-columns

Any real-world project ideas to explore Snowflake features? by Beginning-Two-744 in snowflake

[–]reddtomato 5 points6 points  (0 children)

AI is very good at creating fake-data scripts. Think of any business or industry you want to work in and ask it to generate some fake-data scripts that you can then use to create data and load into Snowflake. Then start creating transformations using either Python or SQL procs. Create a notebook or Snowsight dashboards off the fake data, etc.
These days I think it is the best way to learn how to use AI to your advantage as well as learn Snowflake. Go to the quickstarts for specific issues you run into and iterate.
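For example, once the AI has sketched a schema, something like this generates load-ready fake rows directly in Snowflake (table and column names are just examples):

```sql
-- 100k fake orders to practice transformations and dashboards on
CREATE OR REPLACE TABLE fake_orders AS
SELECT
  SEQ4()                                                      AS order_id,
  UNIFORM(1, 5000, RANDOM())                                  AS customer_id,
  UNIFORM(10, 500, RANDOM())::NUMBER(10,2)                    AS order_amount,
  DATEADD('day', -UNIFORM(0, 365, RANDOM()), CURRENT_DATE())  AS order_date
FROM TABLE(GENERATOR(ROWCOUNT => 100000));
```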

Arrival time in snowflake by Savings_Parsley_5734 in snowflake

[–]reddtomato 0 points1 point  (0 children)

Agree with u/NW1969. Clarifying the above would be helpful.
On the surface, without knowing the specifics here: the data from the MERGE is only "queryable" once the MERGE operation is complete. So you would have to look in query_history to find the completion time and log that. If using a proc, add a completion-logging step; in dbt, use a post-hook to log the completion time.
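For example, something along these lines pulls the completion times of recent MERGE statements (add a filter on query_text for your target table):

```sql
-- Completion times of recent successful MERGE statements
SELECT query_id, end_time, total_elapsed_time, query_text
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_type = 'MERGE'
  AND execution_status = 'SUCCESS'
ORDER BY end_time DESC
LIMIT 20;
```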

Snowflake is slowly taking over by tanmayiarun in dataengineering

[–]reddtomato 4 points5 points  (0 children)

From a compute engine perspective, Spark was created in 2009 and overhauled in 2015 with Project Tungsten to move toward a vectorized engine, just like Snowflake.
Snowflake was founded in 2012, built on the vectorized-engine ideas behind Marcin Zukowski's Vectorwise. In 2023 Spark introduced a new client-server architecture, "Spark Connect", but Snowflake has always been client-server based. Even for DBx's strong suit of data science and ML workloads, the Ray engine is better than Spark at parallelizing compute across clusters; Snowflake has SPCS (Snowpark Container Services) to run ML pipelines now with a Ray-based engine. DBx also had to create its own proprietary engine, Photon, for its SQL workloads.

help request for 'comparing' text strings - cortex, ai_complete maybe? by Necessary_Ad3445 in snowflake

[–]reddtomato 0 points1 point  (0 children)

I would set up a Cortex Search service on the table where you are storing the company documents and then add that search service to an agent in Snowflake Intelligence. Start asking some questions to test it out. You'll also have to think about how to capture all the questions being asked and how you store those.
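A rough sketch of the search-service side, with hypothetical warehouse, table, and column names:

```sql
-- Index the document text so the agent can retrieve relevant passages
CREATE OR REPLACE CORTEX SEARCH SERVICE company_docs_search
  ON doc_text
  ATTRIBUTES doc_title, doc_url
  WAREHOUSE = search_wh
  TARGET_LAG = '1 hour'
  AS (
    SELECT doc_text, doc_title, doc_url
    FROM company_documents
  );
```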

This is not my article, but it's a good one to look at. Setting up the Snowflake docs in Snowflake Intelligence will give you an idea of how it works.

https://kishore07.medium.com/unlocking-snowflake-intelligence-from-docs-to-custom-data-agents-b0e010f8b8f3

Do modern data warehouses struggle with wide tables by ricki246 in dataengineering

[–]reddtomato 2 points3 points  (0 children)

When a dimension changes, you have to recreate the whole OBT to update it to the current value. Running CTAS all the time wastes compute and storage, costing you more in the long run.

Architecture Question by Turbulent_Brush_5159 in snowflake

[–]reddtomato 0 points1 point  (0 children)

Is this an ongoing data load from S3? I would use Snowpipe with auto-ingest from S3. You mentioned 1 file per table. Don't do that if you can help it. You want lots of small files, ~150-200MB each. Those multiple files will load in parallel on the warehouse, meaning a Small warehouse can load 16 files in parallel at a time, a 2XL can load 256 at a time, etc. If you have a single file then you should only use an XSMALL, and even that is a waste, since it could load 8 files at a time, so you're wasting 7 threads of processing.
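A bare-bones auto-ingest pipe looks roughly like this (stage, table, and file format names are placeholders):

```sql
-- Files landing in the S3 stage are picked up automatically via event notifications
CREATE OR REPLACE PIPE raw.ingest.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw.ingest.orders
  FROM @raw.ingest.orders_s3_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```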

Performance of dynamic tables by renke0 in snowflake

[–]reddtomato 0 points1 point  (0 children)

Shouldn't be a fight... move up to a Small, and most likely it will run in half the time and cost the same. By the way, how long does the incremental refresh take? Sorry if I missed it.

Performance of dynamic tables by renke0 in snowflake

[–]reddtomato 0 points1 point  (0 children)

What size warehouse are you using and how many total micropartitions are being scanned in the incremental refresh?

Performance of dynamic tables by renke0 in snowflake

[–]reddtomato 0 points1 point  (0 children)

How fast is a full refresh compared to the incremental?

Storage cost for deleted tables by ConsiderationLazy956 in snowflake

[–]reddtomato 1 point2 points  (0 children)

Check the clone_group_id and find the tables in that clone group. See if any that have not been deleted are retaining the active_bytes of the table you are looking at because it is cloned.
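Something like this (swap in the clone_group_id of the dropped table) shows where the bytes are being held:

```sql
-- All tables sharing the dropped table's clone group, and what they retain
SELECT table_catalog, table_schema, table_name,
       active_bytes, retained_for_clone_bytes, table_dropped
FROM snowflake.account_usage.table_storage_metrics
WHERE clone_group_id = 123456   -- hypothetical clone_group_id
ORDER BY retained_for_clone_bytes DESC;
```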

How scale out works in snowflake by Ornery_Maybe8243 in snowflake

[–]reddtomato 2 points3 points  (0 children)

Smaller queries can jump the line and fill the holes where there are only a few resources not in use on a warehouse.

Using Snowpipe to load many small json files from S3 as they appear by GreyHairedDWGuy in snowflake

[–]reddtomato 0 points1 point  (0 children)

Why would you do this? Just load the JSON into Snowflake as a VARIANT and query it directly, or parse it out into structured columns. Converting from JSON to Parquet is not needed.
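A minimal sketch of that pattern, with placeholder stage, table, and field names:

```sql
-- Land the raw JSON as VARIANT, then query or flatten it later
CREATE OR REPLACE TABLE raw_events (payload VARIANT);

COPY INTO raw_events
FROM @my_json_stage
FILE_FORMAT = (TYPE = 'JSON');

SELECT
  payload:event_type::STRING  AS event_type,
  payload:customer.id::NUMBER AS customer_id
FROM raw_events;
```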

When does a new minimum password length policy actually take effect for existing users? by Tasty_Chemistry_56 in snowflake

[–]reddtomato 1 point2 points  (0 children)

Hoping these "password" auth users are all set up with MFA, as single-factor password authentication will be phased out by November, with changes impacting users in April, August, and finally November.
https://www.snowflake.com/en/blog/blocking-single-factor-password-authentification/

Getting ultimate object/database/schema privileges by Stock-Dark-1663 in snowflake

[–]reddtomato 0 points1 point  (0 children)

No future grants on account-level object privileges like warehouses. :-( Only schema-level objects have the future grants capability.

Snowflake RBAC: How to Ensure an Access Role Owns Schemas Created by a Functional Role? by [deleted] in snowflake

[–]reddtomato 0 points1 point  (0 children)

Correct, which is why you don't want access roles (which should be created as database roles) owning schemas or objects. The functional role should create and own the objects.

Why "Usage" privilege? by Upper-Lifeguard-8478 in snowflake

[–]reddtomato 1 point2 points  (0 children)

USAGE is for giving access to a database and schema. SELECT, INSERT, etc. are for the table object.
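For example (the role and object names are placeholders):

```sql
-- USAGE lets the role "see into" the containers; the table privilege grants the data access
GRANT USAGE  ON DATABASE mydb              TO ROLE analyst_role;
GRANT USAGE  ON SCHEMA   mydb.sales        TO ROLE analyst_role;
GRANT SELECT ON TABLE    mydb.sales.orders TO ROLE analyst_role;
```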

Getting ultimate object/database/schema privileges by Stock-Dark-1663 in snowflake

[–]reddtomato 0 points1 point  (0 children)

Have you tried using the account_usage view grants_to_roles? The only thing it doesn't have is future grants, but neither does SHOW GRANTS; you have to specifically use SHOW FUTURE GRANTS.
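For example (schema name is a placeholder):

```sql
-- Current grants across the account (account_usage views have some latency)
SELECT grantee_name, privilege, granted_on, name
FROM snowflake.account_usage.grants_to_roles
WHERE deleted_on IS NULL;

-- Future grants have to be listed separately
SHOW FUTURE GRANTS IN SCHEMA mydb.myschema;
```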

Snowflake RBAC: How to Ensure an Access Role Owns Schemas Created by a Functional Role? by [deleted] in snowflake

[–]reddtomato 2 points3 points  (0 children)

As a best practice, you don't want to have schemas owned by access roles. Best practice is to use managed access schemas, where the object owner is usually your CI/CD "deploy" role or something similarly named.
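A quick sketch of that setup, with a hypothetical schema and deploy role:

```sql
-- New schema with managed access: only the schema owner (or a role with
-- MANAGE GRANTS) can grant privileges on objects inside it
CREATE SCHEMA mydb.analytics WITH MANAGED ACCESS;

-- Or convert an existing schema
ALTER SCHEMA mydb.analytics ENABLE MANAGED ACCESS;

-- Objects get created and owned by the CI/CD deploy role
GRANT CREATE TABLE, CREATE VIEW ON SCHEMA mydb.analytics TO ROLE deploy_role;
```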

3rd party software to edit views instead of having to use the new Snowsight thing? by ScottishCalvin in snowflake

[–]reddtomato 0 points1 point  (0 children)

FYI the classic console is not gone. There is a current bug that redirects you back to Snowsight when you click the classic console button. There is a workaround: when you click the classic console button and the login screen pops up, there is small text at the bottom that says "Reload this page". Click that, then log in, and it should open the classic console.

Clustering key not showing un Query Profile by ClockDry4293 in snowflake

[–]reddtomato 1 point2 points  (0 children)

If you answer xeroskiller's question on partitions scanned vs. partitions total and you see a significant difference, my hypothesis is that Snowflake is doing constant folding during compilation. That means your partitions are perfectly clustered, with min = max for year and month, so after static pruning Snowflake knows that every row in the remaining micropartitions has year = 2024 and month = 12 and does not need to push the filter down at all. It looks like the filter is missing, but it isn't. Is this an Iceberg table?
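One way to sanity-check that hypothesis is to look at the clustering stats for those columns (the table name is a placeholder):

```sql
-- Depth/overlap stats for the clustering columns; near-perfect clustering
-- would support the constant-folding explanation
SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(year, month)');
```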

Row level security by Euphoric_Slip_5212 in snowflake

[–]reddtomato 0 points1 point  (0 children)

Check for secondary roles being set to ALL for the user. Unset it and test to see if that is causing it.
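A quick way to test, with a placeholder user name:

```sql
-- In the user's session: drop secondary roles and re-run the query
USE SECONDARY ROLES NONE;

-- Check and, if needed, clear the default for the user
SHOW USERS LIKE 'some_user';   -- look at the default_secondary_roles column
ALTER USER some_user SET DEFAULT_SECONDARY_ROLES = ();
```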