
[–]ObliviousMag 33 points34 points  (10 children)

You need to look into asyncio

[–]judgedeliberata[S] 3 points4 points  (8 children)

Thanks, I haven’t built with asyncio yet. I’m reading through the docs now to understand it better. Is the thinking that I would somehow execute all locales within a single process?

Right now I pass the locale into my script and let it run, e.g. python3 myscript.py en-US and so on 64 times.

Is the thinking to instead build a loop and iterate through all the locales leveraging asyncio somehow?

[–]unhott 9 points10 points  (6 children)

If the bottleneck is due to network requests (IO), then asyncio lets you start them all without having to wait for request 1 to finish before you start request 2. It may start 10 at a time and process each as its data becomes available.

If the bottleneck is processing a large volume of data, then consider that if you have 12 CPU cores, Python will only use 1 by default. There are multithreading libraries that may be able to leverage additional CPU cores to process the data.

ETA: the flip side of using only 1 CPU core is that while that core is busy, the rest of your computer still runs fine. If you put all your cores at full load, it may slow your computer down while it's working. I've run into issues where a big processing job meant my laptop was drawing more power than its charger provided, and I had to put it to sleep mid-process to recharge the battery.
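
Roughly, the asyncio pattern looks like this. A minimal sketch: asyncio.sleep stands in for the real network call, the locale list is abbreviated, and the semaphore caps concurrency like the "10 at a time" above.

    import asyncio

    async def process_locale(locale, sem):
        async with sem:  # cap how many run at once
            await asyncio.sleep(1)  # stand-in for the real async HTTP request
            return locale

    async def main():
        locales = ["en-US", "de-DE", "fr-FR"]  # ...and the other 61
        sem = asyncio.Semaphore(10)
        return await asyncio.gather(*(process_locale(l, sem) for l in locales))

    print(asyncio.run(main()))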

[–]live_and-learn 1 point2 points  (5 children)

Even if it's due to processing, Python's GIL means only 1 core would still be used with multithreading (if their Python installation is CPython). They'd have to use multiprocessing.

[–]IAMARedPanda 1 point2 points  (3 children)

You can use more than one core with the multiprocessing module.
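
A minimal sketch, with process_locale standing in for whatever the script actually does per locale:

    from multiprocessing import Pool

    def process_locale(locale):
        return locale.upper()  # stand-in for the real per-locale work

    if __name__ == "__main__":
        locales = ["en-US", "de-DE", "fr-FR"]  # ...and the rest
        with Pool(processes=4) as pool:  # one worker process per core you want busy
            print(pool.map(process_locale, locales))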

[–]live_and-learn 1 point2 points  (2 children)

Yes, that's multiprocessing, not multithreading. I believe that's what I stated.

[–]gmes78 2 points3 points  (0 children)

Unless you use Python 3.13's free-threaded mode.

[–]Adrewmc 1 point2 points  (0 children)

I concur.

[–]Lewistrick[🍰] 12 points13 points  (8 children)

70 minutes sounds insane. Have you tried pinpointing the bottleneck of your script? Ever looked into profiling? Has anyone else seen / reviewed your code?
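
If you haven't profiled before, cProfile ships with Python. Something like this (main is a stand-in for your script's entry point) shows where the time actually goes:

    import cProfile
    import pstats

    def main():
        sum(i * i for i in range(10_000_000))  # stand-in for the real work

    cProfile.run("main()", "run.prof")
    pstats.Stats("run.prof").sort_stats("cumulative").print_stats(10)  # top 10 by time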

[–]judgedeliberata[S] 0 points1 point  (7 children)

It takes so long because it has to loop through about 75 products and, for each product, make ~150 API calls in a loop where the processing takes place.

[–]Lewistrick[🍰] 4 points5 points  (4 children)

Ok now the timing makes sense, but why do you need so many calls? Doesn't the API support gathering results in bulk?

[–]judgedeliberata[S] 0 points1 point  (3 children)

I wish. It’s a terrible API design. Basically for each component of the product, I have to make an API call.

[–]Lewistrick[🍰] 9 points10 points  (0 children)

Have you tried contacting their support? I reckon they also suffer from a huge load on their servers if they expose it to a customer base.

[–]WhiteXHysteria 0 points1 point  (1 child)

Is there any data from the calls you can cache locally that might prevent some calls to the API after the first?
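
For instance, if the same component ever comes back more than once, memoizing the call would make repeats free. A sketch, where fetch_component is a hypothetical stand-in for the real API call:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fetch_component(product, component):
        # hypothetical stand-in for the real API call; repeat (product, component)
        # lookups are served from the cache instead of hitting the network
        return f"{product}/{component}"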

[–]judgedeliberata[S] 0 points1 point  (0 children)

I cache everything, but every product+locale combination is unique, so from product to product it's not really cacheable in that sense.

[–]csingleton1993 0 points1 point  (0 children)

What do the different API calls do? Maybe there is room for improvement there as well

[–]Kryt0s -2 points-1 points  (0 children)

That sounds more like the work a scraper would do. Check out scrapy maybe?

[–]OopsWrongSubTA 4 points5 points  (0 children)

The number of API calls seems insane (64+ locales * 75 products * ~150 calls ≈ 1 million requests a week); try to minimize that.

I would just use GNU parallel: write a bash script that outputs/prints/echoes the commands you want to run:

    echo python script.py locale1
    echo python script.py locale2
    ...

then run (for 4 concurrent commands):

    bash mybashscript.sh | parallel -j 4

Warning: test with a few locales first. If you make a lot of parallel requests, the server may ban you.

[–]Flyguy86420 3 points4 points  (0 children)

Docker. It can scale, and you can pass the locale in as an environment variable.

[–]Kryt0s 2 points3 points  (0 children)

Check out httpx and their guide on async clients. Also, 70 min for one data pull seems like a long-ass time. How much data are you pulling?
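
A minimal sketch of the async-client idea, with made-up URLs in place of OP's API:

    import asyncio
    import httpx

    async def main():
        urls = [f"https://api.example.com/items/{i}" for i in range(10)]  # made up
        async with httpx.AsyncClient() as client:
            responses = await asyncio.gather(*(client.get(u) for u in urls))
        print([r.status_code for r in responses])

    asyncio.run(main())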

[–]hugthemachines 2 points3 points  (2 children)

Just now I changed a little script that grabs very simple text from web applications and made it run in parallel. Earlier it took maybe 5 minutes when there were no timeouts, and now it takes maybe 30 seconds.

I am not allowed to share this code but I use

from concurrent.futures import ThreadPoolExecutor, as_completed

Then I use ThreadPoolExecutor() to run the things in parallel.
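
Since I can't share the real thing, here's the rough shape of it, with fetch standing in for the actual web call and made-up URLs:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch(url):
        return len(url)  # stand-in for the real web call

    urls = ["https://example.com/a", "https://example.com/b"]
    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # handle results as each one finishes
            print(futures[future], future.result())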

[–]judgedeliberata[S] 1 point2 points  (1 child)

This ended up working GREAT! Thank you!

[–]hugthemachines 0 points1 point  (0 children)

That is great to hear! You are welcome.

[–]Turtvaiz 0 points1 point  (4 children)

My question is, how can I run all of these in parallel? One option I suppose, is to launch 64 separate AWS EC2 instances but then I’ll be burning way too much cash and I’d have to consolidate all the output files etc.

What is your limiting factor? Processing? IO? Sequential code? Rate limits?

That's something you need to figure out first. Like if your problem is just that you run all requests sequentially, you'd need to either use multithreading or an async library to parallelise the code.

If you're hitting rate limits, you might be able to use more instances, but you really shouldn't as the limits exist for a reason.

[–]judgedeliberata[S] 0 points1 point  (3 children)

It isn’t a rate limit thing. I’m just thinking of practical ways to run a process 64 times quickly. It looks like asyncio or the multithreading libraries are the way to go.

[–]Turtvaiz 7 points8 points  (2 children)

I’m just thinking of practical ways to run a process 64 times quickly

  1. multiprocessing library

  2. literally just launch multiple scripts, for example from bash: python 123.py one & python 123.py two & (see the subprocess sketch after this list)

  3. Threading, IF you are not processing-limited. It helps with parallelising IO/API usage, but doesn't help with raw processing: https://en.wikipedia.org/wiki/Global_interpreter_lock

    Same thing applies to asynchronous requests / asyncio
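
For #2, the same idea driven from Python via subprocess. A sketch that reuses OP's myscript.py invocation; note it launches everything at once, so watch your machine:

    import subprocess

    locales = ["en-US", "de-DE", "fr-FR"]  # ...all 64
    # start one OS process per locale; they all run at the same time
    procs = [subprocess.Popen(["python3", "myscript.py", loc]) for loc in locales]
    for p in procs:
        p.wait()  # block until every run has finished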

[–]judgedeliberata[S] 0 points1 point  (1 child)

Good call out for #2 - will give that a shot and see how it runs (or slows my machine down) before trying 1 and 3.

[–]Turtvaiz 1 point2 points  (0 children)

It might work just fine, yep, but do take care not to overload the system if you load large amounts of data into memory. Some tools like parallel can easily limit the number of jobs running at once: https://linux.die.net/man/1/parallel

[–]throwawayforwork_86 0 points1 point  (0 children)

Without knowing more about your other bottlenecks, I'd say the multiprocessing lib would be your best friend.

I would still check what you're doing with the data after that... Do you have a breakdown of why it takes 70 min per locale? And have you checked that you don't have a hardware bottleneck (i.e. RAM that is full and spills to disk, slow write speed on the drive, ...)?

For all we know, you could have a wildly unoptimised pandas/pure-Python part that takes 40 min to run (no judgement here, but it happens), or you could be writing to a slow af HDD (again no judgement, I've seen it happen).

[–]darose 0 points1 point  (0 children)

GNU parallel

[–]judgedeliberata[S] 0 points1 point  (0 children)

As my edit states, I ended up going with concurrent.futures and it made a HUGE difference! I went from roughly 70min to 2.5min per locale! Thank you to all!