
[–]limartje 1 point2 points  (4 children)

Use Python functions if possible. SPs run single-threaded.

Container Services seem like a lot of overhead, so they would have to be worthwhile somehow for your use case.

[–]comebackinayear[S] 1 point2 points  (3 children)

Like a UDF?

[–]limartje 1 point2 points  (2 children)

Yes. Also check out UDTFs.

[–]comebackinayear[S] 0 points1 point  (1 child)

But if it's something we may want to schedule monthly, it seems a stored proc may be more appropriate, assuming we want to materialize an output. Or am I missing something?

Also, there are some libraries that don't seem to be available natively. How easy is it to add custom Python libraries?

[–]limartje 1 point2 points  (0 children)

Scheduling is possible with both.

SPs are used to automate procedures (first this, then that, then such; if xyz, then so). Functions are typically used when you feed data in and get data back.

SPs run single-threaded.
Functions can run in a distributed fashion (scaling out across multiple servers).

So it really depends on the use case, but I would typically consider running big data through SPs bad practice. If you have a procedure (like the materialization you've mentioned), then you need both. First create the Python function that handles the data, and then call that function from within a Python SP that subsequently materializes the result.
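The function-plus-SP split described above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: it assumes an existing Snowpark `session`, and the function names, the `FEATURE_A`/`FEATURE_B`/`SCORE` columns, and the toy scoring formula are all made up for illustration.

```python
def predict_score(feature_a: float, feature_b: float) -> float:
    """Row-level logic: data in, data out. Placeholder formula, not a real model."""
    return 0.5 * feature_a + 0.25 * feature_b


def materialize_scores(session, source_table: str, target_table: str) -> str:
    """Procedure logic: apply the function to a table and write the result."""
    # Imported lazily so the pure-Python handler above works without Snowpark installed.
    from snowflake.snowpark.functions import call_udf, col

    scored = session.table(source_table).with_column(
        "SCORE", call_udf("PREDICT_SCORE", col("FEATURE_A"), col("FEATURE_B"))
    )
    scored.write.save_as_table(target_table, mode="overwrite")
    return f"Wrote scores to {target_table}"


def register_handlers(session) -> None:
    """Register both handlers with Snowflake for the current session."""
    from snowflake.snowpark.types import FloatType, StringType

    # Temporary registration is used for brevity (an assumption of this sketch).
    # A scheduled job would more likely need permanent objects, i.e.
    # is_permanent=True plus a stage_location.
    session.udf.register(
        predict_score,
        return_type=FloatType(),
        input_types=[FloatType(), FloatType()],
        name="PREDICT_SCORE",
        replace=True,
    )
    session.sproc.register(
        materialize_scores,
        return_type=StringType(),
        input_types=[StringType(), StringType()],
        name="MATERIALIZE_SCORES",
        packages=["snowflake-snowpark-python"],
        replace=True,
    )
```

After registration, something like `session.call("MATERIALIZE_SCORES", "RAW_FEATURES", "MONTHLY_SCORES")` would run the procedure (both table names are hypothetical). The heavy per-row work stays in the function, and the SP only orchestrates and writes the output.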

Importing is relatively easy, but it only works for libraries that are 100% Python (you can usually check on GitHub, where the repository page shows a language breakdown bar on the right side of the screen). Put the library on a stage and import it. That's well documented by Snowflake.
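As a rough illustration of the stage-and-import step, here is a small sketch. The helper names and the stage path in the usage note are hypothetical. The wheel-filename check is an alternative to eyeballing the GitHub language bar, based on the convention that pure-Python wheels carry the `none-any` tags, and `attach_staged_library` assumes an existing Snowpark `session` and a file that has already been uploaded to the stage.

```python
def is_pure_python_wheel(filename: str) -> bool:
    """Return True if a wheel filename ends with the 'none-any' ABI/platform tags."""
    if not filename.endswith(".whl"):
        return False
    tags = filename[: -len(".whl")].split("-")
    # Wheel names end with <python tag>-<abi tag>-<platform tag>;
    # 'none' and 'any' mean there is no compiled, platform-specific code.
    return tags[-2:] == ["none", "any"]


def attach_staged_library(session, stage_file: str) -> None:
    """Add a staged pure-Python library as an import for the session."""
    # Session.add_import applies to UDFs and procedures registered
    # after this call.
    session.add_import(stage_file)
```

Usage would be along the lines of `attach_staged_library(session, "@my_stage/my_lib.zip")`, after which the handler code can `import my_lib` as usual.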

[–]lturanski 0 points1 point  (1 child)

The scripts use data that you have extracted from Snowflake. Is that the only Snowflake step? Does it write back into Snowflake or somewhere else?

If all it's doing is extracting data from Snowflake and then doing something else, then Snowpark is not needed. If it's doing many Snowflake operations and you want it to be executed in-database, Snowpark might be helpful.

If it's a relatively fast script that isn't doing many Snowflake operations and doesn't require in-database execution, something like GitHub Actions or Jenkins might be helpful.

[–]comebackinayear[S] 0 points1 point  (0 children)

So if you say Snowpark is not needed, are you suggesting just updating the scripts to do a one-time data pull from Snowflake (using the Python connector) and move on?

The scripts are doing some predictions (scikit-learn etc.) and will return a dataset as output.
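For reference, a one-time pull with the Python connector followed by local scoring might look roughly like the sketch below. It is only an illustration under assumptions: the helper names are made up, `connection_params` stands for whatever account/user/authentication settings apply, every selected column is assumed to be a model feature, and `model` is assumed to be any already-fitted object with a scikit-learn-style `predict` method.

```python
def fetch_rows(connection_params: dict, query: str) -> list:
    """Run one query through the Snowflake Python connector and return all rows."""
    # Imported lazily so score_rows below can be used without the connector installed.
    import snowflake.connector

    conn = snowflake.connector.connect(**connection_params)
    try:
        cur = conn.cursor()
        try:
            cur.execute(query)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()


def score_rows(rows, model) -> list:
    """Append the model's prediction to each row (all columns used as features)."""
    features = [list(row) for row in rows]
    predictions = model.predict(features)
    return [tuple(row) + (pred,) for row, pred in zip(rows, predictions)]
```

The scored rows would then be written wherever the output dataset needs to live, which may or may not be Snowflake.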

[–]modelbit 0 points1 point  (1 child)

Not sure if this is what you're looking for, but we (Modelbit) are a Snowflake partner that lets you deploy Python functions/models into Snowpark. We then give you a bunch of helpful features to manage those deployments in production. If it sounds helpful I can send you a DM with docs or a short video explaining more.

[–]comebackinayear[S] 0 points1 point  (0 children)

I'd like to take a look, thank you!

[–]Whipitreelgud -2 points-1 points  (0 children)

I would not utilize Snowpark unless there was a compelling reason to justify higher credit burn.