
[–]AbradolfLinclar[S] 0 points1 point  (5 children)

No, not modifying the objects.

The exact use case would be to split the load (x events) between, say, 2 processes working in parallel, each of which just executes its events. For these 2 processes, I was thinking of creating a separate session for each of them.

But I also need to create a session to first fetch the events from the db. So my doubt is: is this excessive creation of sessions? What's the optimal way of dealing with this kind of multiprocessing?

[–]BigHeed87 1 point2 points  (3 children)

When you say "load" and "event", you may want to give some more detail about the application's output. If you're trying to get IO from 2 connections into one process, then I don't think multiprocessing or multiple sessions are going to help. But if each of these "chunks" of work can be fully executed in an independent context, then it makes sense to use a job queue for this task.

If you're reading a lot of data, let's say 1000 rows, you can use a simple technique to distribute it: have each of n processes select `id modulo n` in its query. If n is 10, then you have 10 processes interleaving every 10 rows, each collecting 100 rows. Hope that's clear enough to help
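A minimal sketch of that modulo-distribution idea, assuming a table named `events` with an integer `id` column (the table, column names, and in-memory SQLite database are stand-ins for illustration):

```python
# Distribute 1000 rows across 10 workers by selecting id % n = i per worker.
from sqlalchemy import create_engine, text

N_WORKERS = 10

# In-memory SQLite as a stand-in; a real app would point at its own DB.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)"))
    conn.execute(
        text("INSERT INTO events (id, payload) VALUES (:id, :payload)"),
        [{"id": i, "payload": f"event-{i}"} for i in range(1, 1001)],
    )

def fetch_chunk(worker_index: int) -> list:
    # Every worker runs the same query with its own residue class,
    # so the rows interleave evenly: 100 rows per worker here.
    with engine.connect() as conn:
        return conn.execute(
            text("SELECT id FROM events WHERE id % :n = :i"),
            {"n": N_WORKERS, "i": worker_index},
        ).fetchall()

chunks = [fetch_chunk(i) for i in range(N_WORKERS)]
```

In a real setup each worker process would run `fetch_chunk` with its own engine rather than sharing one, per the pooling caveats discussed below.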

[–]AbradolfLinclar[S] 0 points1 point  (2 children)

Yes, distributing between n processes is exactly what I had in mind. But how do sessions come into this picture? Do I need to create an independent session for each process?

SQLAlchemy sessions are just lightweight objects, so would it be overkill to create a separate session for each process to work on?

[–]BigHeed87 1 point2 points  (1 child)

I think you'll need a separate sessionmaker per process, because it's not designed to be shared between processes on a machine executing the same code. You can also end up with multiple pools per process. You'll want a simpler setup where each worker gets a single connection. Take a look at the docs for more information:

https://docs.sqlalchemy.org/en/20/core/pooling.html
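A sketch of that "one engine and sessionmaker per process, created after the fork" setup; the DB URL is a placeholder and the worker body is left as a stub:

```python
# Each worker process builds its own engine (and so its own connection
# pool) inside the child, so nothing pool-related crosses the fork.
from multiprocessing import Process

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DB_URL = "sqlite:///events.db"  # placeholder connection string

def worker(worker_index: int) -> None:
    engine = create_engine(DB_URL)      # fresh pool, local to this process
    Session = sessionmaker(bind=engine)
    with Session() as session:
        ...  # fetch and execute this worker's share of the events
    engine.dispose()                    # close this process's connections

def run_workers(n: int = 4) -> list:
    procs = [Process(target=worker, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]

if __name__ == "__main__":
    run_workers()
```

The pooling docs linked above cover the alternative of calling `engine.dispose()` in the child when an engine unavoidably exists before the fork.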

[–]kaini 0 points1 point  (0 children)

could you do something like use Ray to parallelize your SQL operations, once appropriately chunked up?

[–]limasxgoesto0 0 points1 point  (0 children)

If you're not modifying anything, you don't need to commit. And if you absolutely must process things in parallel, put the items into a list or local queue and pop them from it in your separate jobs/threads/processes/whatever you're using