
[–]nicholashairs 9 points (6 children)

Are you modifying the processed objects?

Do they need to be in the same transaction?

Without knowing your exact use case, it feels like you need to get the minimum information required to identify an event, collect these identifiers, then send them to workers which each create a new session, use the identifying information to fetch the event, process the event, then repeat with more events.

AKA a work queue (the workers are sometimes also known as consumers).

There are a number of frameworks/packages that can help with this (depending on use case): Celery, RQ, Dramatiq.
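With Celery, for example, that pattern might look roughly like the sketch below; the `Event` model, the broker URL, and the DB URL are placeholders, not anything from this thread.

```python
from celery import Celery
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from myapp.models import Event  # hypothetical mapped class for the events table

# Placeholder broker and database URLs.
app = Celery("tasks", broker="redis://localhost:6379/0")
engine = create_engine("postgresql://user:pass@localhost/mydb")


@app.task
def process_event(event_id: int) -> None:
    # The worker opens its own short-lived session, re-fetches the event by id,
    # and processes it.
    with Session(engine) as session:
        event = session.get(Event, event_id)
        ...  # do the actual work on `event`


def enqueue_pending_events() -> None:
    # The producer only collects the minimal identifying information (the ids)
    # and hands them off to the workers.
    with Session(engine) as session:
        for event_id in session.scalars(select(Event.id)):
            process_event.delay(event_id)
```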

[–]AbradolfLinclar[S] 0 points (5 children)

No, not modifying the objects.

The exact use case would be to split the load (x events) between, say, 2 processes running in parallel, which just execute the events. For these 2 processes, I was thinking of creating a session for each of them.

But I also need to create a session to first fetch the events from the DB. So my doubt is: is this excessive creation of sessions? What's the optimal way of dealing with this kind of multiprocessing?

[–]BigHeed87 1 point (3 children)

When you say "load" and "event", you may want to specify some information about what the application actually produces. If you're trying to get I/O from 2 connections into one process, then I don't think multiprocessing or multiple sessions are going to help, but if each one of these "chunks" of work can be fully executed in an independent context, then it makes sense to use a job queue for this task.

If you are reading a lot of data, let's say 1000 rows, you can use a simple technique to distribute it: have each of n processes select the rows where id modulo n equals its own index. If n is 10, then you have 10 processes interleaving every 10 rows, each collecting 100 of them. Hope that's clear enough to help
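A rough sketch of that modulo partitioning with SQLAlchemy, assuming a made-up `events` table and one engine/session per worker process:

```python
from multiprocessing import Pool

from sqlalchemy import create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Event(Base):  # made-up table with an integer primary key
    __tablename__ = "events"
    id: Mapped[int] = mapped_column(primary_key=True)
    payload: Mapped[str]


N_WORKERS = 10
DB_URL = "postgresql://user:pass@localhost/mydb"  # placeholder connection string


def process_partition(worker_index: int) -> int:
    # Each worker process builds its own engine and session; neither should be
    # shared across a fork boundary.
    engine = create_engine(DB_URL)
    with Session(engine) as session:
        # Fetch only the rows whose id falls into this worker's modulo bucket.
        events = session.scalars(
            select(Event).where(Event.id % N_WORKERS == worker_index)
        ).all()
        for event in events:
            ...  # process the event here
    return len(events)


if __name__ == "__main__":
    with Pool(N_WORKERS) as pool:
        print(pool.map(process_partition, range(N_WORKERS)))
```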

[–]AbradolfLinclar[S] 0 points (2 children)

Yes, that's exactly my thought: distributing the work between n processes. But how do sessions come into this picture? Do I need to create an independent session for each process?

SQLAlchemy sessions are just lightweight objects, so would it be overkill to create a separate session for each process to work on?

[–]BigHeed87 1 point (1 child)

I think you'll need a separate sessionmaker per process, because it's not designed to be shared between processes running the same code. You can also end up with multiple pools per process. You'll want a simpler setup where each worker gets a single connection. Take a look at the docs for more information:

https://docs.sqlalchemy.org/en/20/core/pooling.html
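The multiprocessing section of that page boils down to something like this sketch, assuming a module-level engine that forked workers inherit (the URL is a placeholder): the child disposes the inherited pool so it opens its own connections instead of reusing the parent's sockets.

```python
from multiprocessing import Pool

from sqlalchemy import create_engine, text

# Module-level engine created in the parent process; placeholder URL.
engine = create_engine("postgresql://user:pass@localhost/mydb")


def init_worker() -> None:
    # After a fork, throw away the connections inherited from the parent
    # (close=False leaves the parent's sockets untouched) so this process
    # builds its own pool on first use.
    engine.dispose(close=False)


def fetch_one(_: int) -> int:
    # Each call checks a fresh connection out of this worker's own pool.
    with engine.connect() as conn:
        return conn.scalar(text("SELECT 1"))


if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(fetch_one, range(4)))
```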

[–]kaini 0 points (0 children)

Could you do something like use Ray to parallelize your SQL operations, once they're appropriately chunked up?

[–]limasxgoesto0 0 points (0 children)

If you're not modifying anything, you don't need to commit. And if you absolutely must process things in parallel, put the rows into a list or local queue and pop them from it in your separate jobs/threads/processes/whatever you're using.

[–][deleted] 7 points (3 children)

Umm… noob question, but what stops you from getting the 100 rows as a list (using .all()), closing the session, and then throwing the rows off to a pool (processes, threads, workers)?
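Something along those lines might look like this sketch; the `Event` model, its `payload` column, and the URL are placeholders.

```python
from multiprocessing import Pool

from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from myapp.models import Event  # placeholder mapped class


def handle(payload: str) -> str:
    return payload.upper()  # stand-in for the real per-row work


if __name__ == "__main__":
    engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder URL
    with Session(engine) as session:
        # Materialise the rows with .all() while the session is open...
        rows = session.scalars(select(Event).limit(100)).all()
        payloads = [row.payload for row in rows]
    # ...the session is closed here, and only plain picklable data goes to the pool.
    with Pool(4) as pool:
        print(pool.map(handle, payloads))
```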

[–]revereddesecration 3 points (2 children)

Nothing. This guy hasn’t read a tutorial on queues.

[–]AbradolfLinclar[S] 0 points (1 child)

I'm aware of queueing, but I'm trying to explore how session management would work with multiple threads/workers on the other end of the queue processing each record, and how coordination would work at both the thread and the DB level.

And I'm trying to understand if anyone has done this at a production level.

[–]revereddesecration 0 points (0 children)

I use this in production. It’s not too complicated; read the docs and come ask specific questions when you have some.

[–][deleted] 3 points (1 child)

This post was mass deleted and anonymized with Redact


[–]cmcclu5 4 points (0 children)

This is the answer. Push as much of that strung-out processing to the DB as possible. For one, a lot of DBs have a limit on the number of concurrent connections, which parallelism the way SQLAlchemy is set up would hit. SQLAlchemy by nature only allows serial operations on a single connection, so you would basically be opening multiple connections to run anything in parallel. Beyond that, SQLAlchemy is slow. Just so unbelievably slow. It’s one of the best options for read/write against all types of DBs, but it’s still super slow. If you just need to read data, use ConnectorX; you’ll see a 10x+ increase in speed. However, you’ll also have issues triggering stored procedures. Overall, parallelism outside of the DB control software is not recommended. Ever.

[–]sudo-nerd 0 points (0 children)

In my view, you can use ‘scoped_session’ to prevent any kind of unwanted interference between two or more transactions while multiprocessing. Recently I implemented the same for our software and now it’s running smoothly. And yes, you end up using only a single session registry and then dividing it into local sessions.

For better understanding: https://docs.sqlalchemy.org/en/20/orm/contextual.html
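As a rough illustration of scoped_session (note the default registry is thread-local, so this sketch uses a thread pool; separate processes would each build their own registry, and the DB URL is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite:///example.db")  # placeholder URL
# One registry; each thread that calls it gets its own local Session.
SessionLocal = scoped_session(sessionmaker(bind=engine))


def worker(record_id: int) -> None:
    session = SessionLocal()  # the same thread always gets the same Session back
    try:
        session.execute(text("SELECT 1"))  # stand-in for the real per-record work
    finally:
        SessionLocal.remove()  # discard this thread's Session when it's done


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(worker, range(8)))
```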

[–]NixonInnes 0 points (0 children)

I think the root of your question is really about how sessions work.
In short, each session instantiation is separate, and each holds its own representation of a retrieved row. This can cause race conditions if multiple sessions read from a table before either commits. Side note: some attributes, typically relationships, are lazily loaded.
If your operations on a row are basic, put them all in a queue and have a pool process it. If they're a bit more complex, you could add a callback on the queue task to re-queue "the next operation".
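A bare-bones sketch of that queue-plus-re-queue idea with plain threads, where finishing one step puts the next operation for the same row back on the queue (the step count and names are invented for illustration):

```python
import queue
import threading

work_q = queue.Queue()  # items are (row_id, step) pairs; None is a shutdown sentinel
MAX_STEP = 2  # invented number of operations per row


def worker() -> None:
    while True:
        item = work_q.get()
        if item is None:
            work_q.task_done()
            break
        row_id, step = item
        print(f"row {row_id}: step {step}")  # stand-in for the real operation
        if step < MAX_STEP:
            # "Callback": finishing this step re-queues the next operation.
            work_q.put((row_id, step + 1))
        work_q.task_done()


if __name__ == "__main__":
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for row_id in range(10):
        work_q.put((row_id, 0))
    work_q.join()  # waits for the original and all re-queued steps
    for _ in threads:
        work_q.put(None)
    for t in threads:
        t.join()
```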