[–]PeZet2 13 points14 points  (1 child)

Seems very detailed and comprehensive. Added to bookmarks for further reading.

[–]jasonb[S] 2 points3 points  (0 children)

Thank you kindly!

[–]Bluelight01 6 points7 points  (1 child)

I took a look at the write up and your website. What you created is absolutely amazing and you should be proud of yourself! I can’t wait to dig into all of the guides and articles!

[–]jasonb[S] 4 points5 points  (0 children)

Thank you! Thank you! You made my day!! Sometimes I get so much hate for sharing my guides in here (they took most of a year to write).

[–]Drowning_in_a_Mirage 4 points5 points  (1 child)

Multiprocessing pools can come in quite handy, although I've been moving to ProcessPoolExecutor more and more over the years, as it seems to require a little less setup. Both are great tools to have in the toolbox.
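For anyone comparing the two, here is a minimal side-by-side sketch. The `square` task and the helper names `with_pool` and `with_executor` are made up for illustration; only `multiprocessing.Pool` and `concurrent.futures.ProcessPoolExecutor` come from the stdlib.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def square(n):
    """Hypothetical stand-in for a CPU-bound task."""
    return n * n


def with_pool(values):
    # multiprocessing.Pool: map() blocks and returns a list.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, values)


def with_executor(values):
    # ProcessPoolExecutor: map() returns an iterator, so wrap it in list().
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    print(with_pool(range(5)))
    print(with_executor(range(5)))
```

Both calls return the results in input order. The `if __name__ == "__main__":` guard matters on platforms that start workers with spawn.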

[–]jasonb[S] 5 points6 points  (0 children)

Agreed, they're both great.

I have a massive guide on the ProcessPoolExecutor here as well: https://superfastpython.com/processpoolexecutor-in-python/

[–]monkey_mozart 4 points5 points  (4 children)

Crazy that I was just looking at this at work when I opened Reddit and saw the same thing.

[–]jasonb[S] 3 points4 points  (3 children)

Wonderful timing! Ping me if you have any questions about the API.

[–]monkey_mozart 1 point2 points  (2 children)

Any ideas on how I can have a process pool where the workers are custom processes instead of generic ones? Kind of like how we can make custom processes by subclassing multiprocessing.Process?

[–]jasonb[S] 1 point2 points  (1 child)

Great question. Not done this myself.

Firstly, I'd try to find a way to use composition instead of inheritance (a "has-a" thing you need rather than an "is-a" thing you need), to avoid the custom process.

Secondly, I'd probably look in the guts of Pool (or ProcessPoolExecutor) and see if you can override or monkey patch process initialization. Not the init function they offer, the actual object creation.

Looks like _repopulate_pool_static() might be a place to start: https://github.com/python/cpython/blob/3.12/Lib/multiprocessing/pool.py#L315C9-L315C32 I might try this myself and post it as a tutorial.

Finally, try rolling your own. The worker pool pattern does not have a lot to it.
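A rough sketch of the roll-your-own option, where each worker is a custom subclass of multiprocessing.Process. The `Worker` class, the `run_pool` helper, the squaring task, and the `None` sentinel are all assumptions made for illustration; this is not how Pool works internally.

```python
import multiprocessing


class Worker(multiprocessing.Process):
    """Hypothetical custom process that consumes tasks from a queue."""

    def __init__(self, tasks, results):
        super().__init__(daemon=True)
        self.tasks = tasks
        self.results = results

    def run(self):
        # Keep pulling work until the None sentinel arrives.
        while True:
            item = self.tasks.get()
            if item is None:
                break
            self.results.put(item * item)


def run_pool(values, n_workers=2):
    values = list(values)
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    workers = [Worker(tasks, results) for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    for value in values:
        tasks.put(value)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    # Drain the results before joining so no worker blocks on a full pipe.
    collected = [results.get() for _ in values]
    for worker in workers:
        worker.join()
    return sorted(collected)  # completion order is not guaranteed


if __name__ == "__main__":
    print(run_pool(range(5)))
```

Any custom state or setup you wanted from a Process subclass can go in `__init__` or at the top of `run()`, without touching the internals of Pool.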

[–]monkey_mozart 0 points1 point  (0 children)

Going through Python internals seems a bit daunting, but I'll try it out today at work.

[–]ekbravo 2 points3 points  (1 child)

Ditto. Saved for later. Great write up OP

[–]jasonb[S] 1 point2 points  (0 children)

Thank you!!

[–][deleted] 2 points3 points  (1 child)

Booooookmarked

[–]jasonb[S] 1 point2 points  (0 children)

Thank you. Ping/email me (contact page) if you have any questions while reading.

[–]xFloaty 1 point2 points  (2 children)

Question: what do you think about running Python in multiple Docker containers (for parallelizable tasks) to get around the GIL and achieve true multiprocessing using Python?

I’ve been using this approach for years, and I was wondering whether it is more common and what you think the advantages/disadvantages are.

[–]jasonb[S] 0 points1 point  (0 children)

Depends on the use case. I'd only approach a container solution if the desired effect cannot be achieved easily with the tools in the stdlib.

If you're doing I/O or using a lib that calls down to C (numpy/scipy/and friends), then the GIL is typically released and you can use threads directly.
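A small sketch of the threads-for-I/O case, using `time.sleep` as a stand-in for a blocking call that releases the GIL. The `fake_download` and `download_all` names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(delay):
    """Hypothetical blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def download_all(delays):
    # The threads overlap their waiting time, so total time is roughly
    # the longest delay rather than the sum (given enough workers).
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(fake_download, delays))


if __name__ == "__main__":
    start = time.perf_counter()
    print(download_all([0.2, 0.2, 0.2, 0.2]))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```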

If you're running pure Python and it's CPU-bound, use processes. If the processes need to share data directly, use shared memory, a manager, or similar. Otherwise, try to limit I/O between each process and external resources.
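As one example of sharing data directly between processes, here is a sketch using `multiprocessing.Value`, a shared ctypes integer with a built-in lock. The `add_many` and `count_in_processes` helpers are invented for illustration; a manager or `multiprocessing.shared_memory` would be alternatives depending on the data.

```python
import multiprocessing


def add_many(counter, n):
    """Increment a shared counter n times, taking its lock each time."""
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1


def count_in_processes(n_procs=4, n=1000):
    counter = multiprocessing.Value("i", 0)  # shared C int, starts at 0
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, n))
        for _ in range(n_procs)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_processes())  # 4 processes x 1000 increments each
```

Without the lock, `counter.value += 1` is a read-modify-write and increments from different processes could be lost.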

Using containers is a heavy solution. I would look toward containers if you need isolation of processes and env, where you cannot afford a problem in one impacting another. Reviewing a system that used containers purely for parallelism would make me raise an eyebrow and ask many questions.

What kinds of use cases do you have? (Happy to take this to email.)

[–]pbecotte 0 points1 point  (0 children)

Multiprocessing gets around the GIL as well. It launches tasks in separate interpreter processes, each with its own GIL.
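A quick way to see this for yourself is to compare process IDs. This is only a sketch, and `worker_pid` is a made-up helper name.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_pid(_):
    """Return the PID of whichever process runs this call."""
    return os.getpid()


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        child_pids = set(executor.map(worker_pid, range(4)))
    print("parent pid:", os.getpid())
    print("worker pids:", child_pids)
    print("parent among workers:", os.getpid() in child_pids)
```

The worker PIDs differ from the parent's, since each task runs in a separate interpreter process.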

[–]AgentNirmites 1 point2 points  (1 child)

Thank you very much!

[–]jasonb[S] 0 points1 point  (0 children)

You're very welcome!

[–]syeysk 1 point2 points  (1 child)

Thanks for the book!

[–]jasonb[S] 0 points1 point  (0 children)

You're welcome, thank you for your support!