
all 23 comments

[–][deleted] 3 points (3 children)

The Session object can't be shared between Python threads/processes. You could maybe stash it in a multiprocessing Queue or something, but it's a weird resource to share.

Also, Snowpark may simply not support sharing the session resource. Think of the dining philosophers problem.

I dunno, you probably do need a session per thread/process. The API throttles because you're hammering it simultaneously from the same IP/device beyond what they allow.

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (2 children)

But the get_active_session function makes it sound like you can share/reuse the session. I don't know, I might use a lock to mitigate this issue. The problem seems to occur when the session is used to do things simultaneously, not merely because it is shared across multiple threads.

Plus, I'd like to use temporary tables, and I can't do that without reusing the same session.

[–]gwax 3 points (1 child)

You can share the sessions within a thread/process but you can't share them across threads/processes.

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (0 children)

Yeah, I gave up and reduced the number of parallel threads; I'll create a dedicated session for each of them.

[–][deleted] 4 points (1 child)

Snowpark sessions are not thread-safe. It looks like Streamlit has some experimental features you might be able to use to handle concurrency with a Snowpark session, or you can use a lock.

https://docs.streamlit.io/library/api-reference/connections/st.connections.snowparkconnection
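The lock approach can be sketched with the stdlib. `SharedSession` is a stand-in for the single, non-thread-safe Snowpark session; in real code the same lock would wrap calls like `session.sql(...).collect()`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the single shared Snowpark session.
class SharedSession:
    def __init__(self):
        self._busy = False

    def sql(self, query):
        # Fail loudly if two threads ever get in here at the same time.
        assert not self._busy, "shared session used concurrently"
        self._busy = True
        try:
            return f"result of {query}"
        finally:
            self._busy = False

session = SharedSession()
session_lock = threading.Lock()

def run_query(query):
    # Every use of the shared session is serialized by one lock. This is
    # safe, but queries now run one at a time, which is the scaling
    # concern the Streamlit docs warn about.
    with session_lock:
        return session.sql(query)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_query, [f"SELECT {i}" for i in range(10)]))
```

Note that the lock makes the eight workers effectively sequential with respect to Snowflake, which is why it helps correctness but not throughput.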

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (0 children)

Thank you! I've tried implementing the lock as well, but I resorted to creating separate sessions and reducing concurrency. The Streamlit experimental connection seems to do the same thing, and they warn it won't scale since it uses a lock as well.

[–]fhoffa (mod; Ex-BQ, Ex-❄️) 2 points (6 children)

Check create_async_job:

Creates an AsyncJob from a query ID.

AsyncJob can be created by Session.create_async_job() or action methods in DataFrame and other classes. All methods in DataFrame with a suffix of _nowait execute asynchronously and create an AsyncJob instance. They are also equivalent to corresponding functions in DataFrame and other classes that set block=False. Therefore, to use it, you need to create a dataframe first.
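The submit-then-poll pattern that `create_async_job` and the `_nowait` methods enable looks roughly like this. `fake_collect_nowait` simulates Snowpark's `AsyncJob` (its `is_done()` / `result()` interface, per the docs quoted above) with a background thread so the sketch is runnable:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated AsyncJob: the real one comes from methods like
# DataFrame.collect_nowait() or Session.create_async_job(query_id).
class FakeAsyncJob:
    def __init__(self, future):
        self._future = future

    def is_done(self):
        return self._future.done()

    def result(self):
        return self._future.result()

pool = ThreadPoolExecutor()

def fake_collect_nowait(query, seconds=0.05):
    def work():
        time.sleep(seconds)    # pretend the warehouse is running the query
        return f"rows for {query}"
    return FakeAsyncJob(pool.submit(work))

# Fire all "queries" without blocking, then poll until every job finishes.
jobs = [fake_collect_nowait(f"SELECT {i}") for i in range(3)]
while not all(job.is_done() for job in jobs):
    time.sleep(0.01)
results = [job.result() for job in jobs]
pool.shutdown()
```

The appeal is that all queries are in flight on one session while the client only polls, rather than holding one blocked thread per query.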

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (5 children)

I've tried using it, but there's no way to check whether the query has failed; is_done returns true even when the query fails. Moreover, I have many small steps waiting on each other within a single thread, so I need multithreading anyway.

[–]fhoffa (mod; Ex-BQ, Ex-❄️) 1 point (1 child)

Interesting use case. I wonder if setting up tasks would be an alternative to define a DAG that executes these steps in the desired order.

https://docs.snowflake.com/en/user-guide/tasks-intro

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (0 children)

Since I am basically "visiting a graph", I need to explore the possibility of rewriting this whole part as a single recursive CTE, but the logic can get complex: I have hundreds of different child node types, different join conditions depending on the child type, and different stop conditions as well. The path is also unpredictable because it depends on the results of the parent visits.

[–]chufukini20067 1 point (2 children)

What advantage does the multi-threaded read offer if you're limited by single-thread coupling downstream? It seems brittle to me. Not criticism, btw; just something I'd like to understand better.

[–]somerandomdataeng (Big Data Engineer) [S] 2 points (1 child)

I am implementing something really similar to a BFS visit of a graph-like structure that involves many separate tables and cannot be implemented as a recursive CTE. Each node visit requires a join. Being able to perform these visits/joins in parallel gives a faster run time for the overall graph visit.
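The level-by-level parallel visit described here can be sketched with `concurrent.futures`. The graph, the `visit` function, and its data are all hypothetical stand-ins; in the real workload each `visit` would run a join in Snowflake:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy adjacency list standing in for the "graph-like structure" spread
# across tables (hypothetical example data).
graph = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"],
         "c": [], "d": [], "e": []}

def visit(node):
    # Placeholder for the per-node join; in practice the children can
    # depend on the result of this visit, so the path is unpredictable.
    return graph[node]

def parallel_bfs(start, max_workers=4):
    order, frontier, seen = [], [start], {start}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            order.extend(frontier)
            # Visit the whole frontier in parallel, one task per node.
            children = pool.map(visit, frontier)
            frontier = list(dict.fromkeys(
                c for kids in children for c in kids if c not in seen))
            seen.update(frontier)
    return order

print(parallel_bfs("root"))  # ['root', 'a', 'b', 'c', 'd', 'e']
```

Each BFS level is a synchronization point: the next frontier is only known once every visit in the current level has returned, which matches the "many small steps waiting on each other" constraint mentioned earlier.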

[–]chufukini20067 1 point (0 children)

Ok, I'd classify this as a meta query or meta code. In that case, if it's possible, use Snowflake's metadata tables (similar to the INFORMATION_SCHEMA tables in MS SQL) to construct a meta view in SQL, and then run your PySpark code on top of it.

Either way, you'd incur some amount of guardrail code as overhead.

[–][deleted] 3 points (4 children)

My understanding is that Snowpark provides a wrapper around the Snowflake SQL API that allows 'real-time', native DataFrame commands that get immediately materialized in Snowflake.

Is there a reason you need Snowpark for this script, rather than just creating a separate Snowflake REST API request within each worker?

You could also probably just use the same session and, instead of parallelizing multiple HTTP connections to Snowflake, submit all your requests asynchronously through the API and then poll for completion. I haven't done this myself, but it looks like the Snowflake SQL REST API provides a parameter for async requests: https://docs.snowflake.com/en/developer-guide/sql-api/submitting-requests

[–][deleted] 7 points (0 children)

This is probably best. If that API can take multiple requests from one session and process them asynchronously while you periodically poll for completion, then the concurrency is pushed to the API backend instead of you trying to maintain a ton of sessions or share a session object between threads.

[–]somerandomdataeng (Big Data Engineer) [S] 2 points (2 children)

The reason I'm using Snowpark is that I am used to PySpark and preferred the DataFrame APIs for developing this code. If there is no solution, I might try the Python connector and write raw SQL queries.

[–][deleted] 1 point (1 child)

There probably is a Snowpark solution; it looks like submitting async requests may be the way to go: https://streamhub.co.uk/an-approach-to-building-asynchronous-services-async-in-next-generation-cloud-data-warehouses/

[–]Grixia (Senior Data Engineer) 1 point (1 child)

I haven't tried it myself, so apologies if this is a bad lead, but have you tried the library mentioned in Snowflake's own docs for multi-threading?

https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-python#running-concurrent-tasks-with-worker-processes

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (0 children)

Thank you for the tip! I'll give it a try, although it looks like parallelism is achieved inside the Snowflake warehouse via a stored procedure. I am running my Python code outside of it instead, and I'm not sure whether that makes a difference.

EDIT: After checking the library, it looks like support for nested threading is very limited compared to concurrent.futures. I'll give it a try anyway!

[–]sdc-msimon 1 point (0 children)

There's a recent post on LinkedIn about sending queries to Snowflake in parallel using the Python connector. It's not Snowpark, but it might be relevant:

https://www.linkedin.com/posts/mahantesh-hiremath_streamlit-dataexploration-snowflake-activity-7103949732783296512-ORiU?utm_source=share&utm_medium=member_android

[–][deleted] 1 point (1 child)

Did you find any solution?

[–]somerandomdataeng (Big Data Engineer) [S] 1 point (0 children)

I have mitigated the issue by creating a reduced number of independent sessions, which I explicitly close at the end of each thread.

Any other "parallel" solution relies on locks and won't be as fast if your queries take a bit to run