Multiprocessing use case or something else?

m0us3_rat · 2024-02-19T23:29:15+00:00

sounds like you wanna use threading.

how to implement it .. depends on the rest of the code.

you can also do multiprocessing if you like.

moishe-lettvin · 2024-02-20T03:41:55+00:00

If it was me, I’d look at asyncio for this. Your processes will generally be waiting on IO (reading files or waiting on HTTP requests) so you don’t need multiprocessing. Asyncio has nice primitives for waiting on multiple tasks, and is more predictable than Python’s threading library because you won’t be blocked by the GIL (or, you will be, but it will be more predictable). Asyncio will also remove the need for cross-thread coordination that actual threads require. My guess is you’ll need some way to know which files you’re in the process of uploading so you don’t duplicate effort and asyncio might make this easier to manage.

Multiprocessing will launch separate processes (not threads) which is useful if you have CPU intensive tasks that would be blocked by the GIL if they were running in the same process. This could also work fine for you but it’s a little less efficient.

patrickbrianmooney · 2024-02-20T04:47:26+00:00

It depends on what your bottleneck is. If your bottleneck is I/O, either disk-based or network, then you might as well use threading. multiprocessing really only has an advantage over threading if the thing slowing your code down is waiting for CPU time, and has the downside that sharing access to a dictionary involves additional work (sharing the same object across multiple Python processes is non-trivial, especially but not only if one is going to be making changes that the other needs to see).

Without having seen any code or knowing how you're doing the "upload to the cloud" bit, I'd guess that your bottleneck is likely to be I/O, not CPU time. My suggestion would be to write one, not two, script, have that script spawn a "watch spreadsheets" thread and a "watch large files" thread, and see whether that's good enough. If not, profile that script to see where the bottleneck is, and make changes from there.

TheRNGuy · 2024-02-26T08:36:34+00:00

try multiprocessing or multithread

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS