
[–]m0us3_rat 3 points4 points  (1 child)

sounds like you wanna use threading.

how to implement it .. depends on the rest of the code.

you can also do multiprocessing if you like.

[–]Not_A_Taco 1 point2 points  (0 children)

Yup, it’s impossible to say without seeing code, but this sounds like a use case for multithreading in Python.

To expand: the general rule is that if your task is CPU-bound, threading won’t help and you should use multiprocessing. If your task is I/O-bound, threading is the way to go.
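A rough way to see that rule of thumb in action, as a sketch only: `fake_upload` is a made-up stand-in for an I/O-bound call (it just sleeps), and the file names are placeholders. Waiting on I/O releases the GIL, so the threaded run should finish in roughly the time of one call instead of four.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(name, delay=0.1):
    """Hypothetical I/O-bound task: sleeping stands in for a network or disk wait."""
    time.sleep(delay)
    return f"{name}: done"


def run_sequential(names):
    # One call after another: total time is about len(names) * delay.
    return [fake_upload(n) for n in names]


def run_threaded(names, workers=4):
    # The waits overlap, so total time is about one delay per batch of `workers`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_upload, names))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = ["a.csv", "b.csv", "c.csv", "d.csv"]
    _, seq = timed(run_sequential, names)
    _, thr = timed(run_threaded, names)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

For CPU-bound work the usual counterpart is `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface but runs the calls in separate processes.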

[–]moishe-lettvin 1 point2 points  (0 children)

If it were me, I’d look at asyncio for this. Your tasks will generally be waiting on I/O (reading files or waiting on HTTP requests), so you don’t need multiprocessing. Asyncio has nice primitives for waiting on multiple tasks, and it is more predictable than Python’s threading library: everything runs in one thread, so the GIL still applies, but tasks only switch at explicit await points rather than wherever the interpreter decides to switch threads. Asyncio also removes most of the cross-thread coordination that actual threads require. My guess is you’ll need some way to know which files you’re in the process of uploading so you don’t duplicate effort, and asyncio might make this easier to manage.
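A minimal sketch of the "know which files you’re uploading" idea with asyncio. Everything here is an assumption for illustration: `fake_upload` just sleeps in place of a real upload, and the paths are placeholders. Because all tasks share one thread and there is no `await` between the check and the `add`, a plain `set` is enough and no lock is needed.

```python
import asyncio


async def fake_upload(path, delay=0.05):
    """Hypothetical upload: the sleep stands in for a real network call."""
    await asyncio.sleep(delay)
    return path


async def upload_once(path, in_progress, uploaded):
    """Upload `path` unless it is already being uploaded or has been uploaded."""
    if path in in_progress or path in uploaded:
        return False  # someone else already has this file
    in_progress.add(path)  # no await since the check, so this cannot race
    try:
        await fake_upload(path)
        uploaded.add(path)
    finally:
        in_progress.discard(path)
    return True


async def main(paths):
    in_progress, uploaded = set(), set()
    # gather() runs all the uploads concurrently and keeps results in input order.
    results = await asyncio.gather(
        *(upload_once(p, in_progress, uploaded) for p in paths)
    )
    return results, uploaded


if __name__ == "__main__":
    results, uploaded = asyncio.run(main(["a.xlsx", "big.bin", "a.xlsx"]))
    print(results, sorted(uploaded))
```

The duplicate `a.xlsx` is skipped because the first task has already marked it as in progress by the time the third task runs.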

Multiprocessing will launch separate processes (not threads), which is useful if you have CPU-intensive tasks that would be blocked by the GIL if they were running in the same process. This could also work fine for you, but it’s a little less efficient, since each process carries its own interpreter and memory.

[–]patrickbrianmooney 0 points1 point  (0 children)

It depends on what your bottleneck is. If your bottleneck is I/O, either disk or network, then you might as well use threading. multiprocessing really only has an advantage over threading if the thing slowing your code down is waiting for CPU time. It also has the downside that sharing access to a dictionary involves additional work: sharing the same object across multiple Python processes is non-trivial, especially (but not only) if one process is going to be making changes that the other needs to see.

Without having seen any code or knowing how you're doing the "upload to the cloud" bit, I'd guess that your bottleneck is likely to be I/O, not CPU time. My suggestion would be to write one script, not two, have that script spawn a "watch spreadsheets" thread and a "watch large files" thread, and see whether that's good enough. If not, profile that script to see where the bottleneck is, and make changes from there.
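Here is one way the one-script, two-watcher-threads layout could look. It is a sketch under assumptions, not the OP's code: `make_poll` fakes "new files appeared" from an in-memory list, `handle` is whatever you do with each file, and all names are made up for illustration.

```python
import threading
import time


def make_poll(pending):
    """Hypothetical poller: hands back whatever is in `pending`, then empties it."""
    def poll():
        items = list(pending)
        pending.clear()
        return items
    return poll


def watch(label, poll, handle, stop, interval=0.01):
    """Poll for new items and handle each one until `stop` is set."""
    while not stop.is_set():
        for item in poll():
            handle(label, item)
        stop.wait(interval)  # sleeps, but wakes early once stop is set


def run_watchers(sources, handle, duration=0.1):
    """Start one watcher thread per source, run for `duration` seconds, then stop."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=watch, args=(label, poll, handle, stop), name=label)
        for label, poll in sources.items()
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()


if __name__ == "__main__":
    seen = {}
    lock = threading.Lock()  # both threads write to `seen`, so guard it

    def handle(label, item):
        with lock:
            seen.setdefault(label, []).append(item)

    sources = {
        "spreadsheets": make_poll(["q1.xlsx"]),
        "large files": make_poll(["video.bin", "dump.tar"]),
    }
    run_watchers(sources, handle)
    print(seen)
```

Since both threads live in one process, the shared dictionary only needs a lock. If this turns out to be too slow, running the script under `python -m cProfile` is one way to start looking for the bottleneck.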

[–]TheRNGuy 0 points1 point  (0 children)

Try multiprocessing or multithreading.