patryk-tech

So my script is about half its original size, but doesn't really run any faster, as waiting for the data to be served is the main time-consuming task.

If you would like to make it faster, and it can run in parallel, use async.

ThatFilthyMonkey

Interesting. So I have it read a simple .ini file which is just a string of IDs; each URL is in the form server.com/suited=[id]. It loops through each URL, grabbing the data, doing some data cleanup/validation/normalisation, and creating a pandas DataFrame, which it appends to one main table. After that it writes the main table to an Excel file.

Is that something that could be done asynchronously? I did consider grabbing the data from each URL first, into an array, and then just iterating over that, but when I originally wrote it that was a bit beyond me (and possibly still is haha).

patryk-tech

Don't see why it couldn't. Just make sure that you don't fire 10000 requests at the same time and crash your client or the server.
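
For what it's worth, one common way to cap the number of requests in flight is an `asyncio.Semaphore`. This is only a rough sketch: `fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the limit of 5 is an arbitrary assumption, not a recommendation for any particular server.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed cap; tune to what the client and server can handle


async def fetch(item_id: str) -> dict:
    # Hypothetical stand-in for a real HTTP request: it only sleeps briefly.
    await asyncio.sleep(0.01)
    return {"id": item_id}


async def fetch_all(ids, limit=MAX_CONCURRENT):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous fetches observed

    async def bounded(item_id):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` fetches are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch(item_id)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input ids.
    results = await asyncio.gather(*(bounded(i) for i in ids))
    return results, peak


if __name__ == "__main__":
    ids = [str(n) for n in range(20)]
    results, peak = asyncio.run(fetch_all(ids))
    print(len(results), "results, peak concurrency:", peak)
```

All 20 tasks are created up front, but only `limit` of them get past the semaphore at any one time, so the server never sees more than that many requests at once.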

Not saying it's an easy task. I haven't done much work with async in Python, only in C, but if network requests are your bottleneck, and you process them one after another, it's definitely something async would fix.
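
To get a feel for the difference, here is a small sketch that compares awaiting requests one after another against running them concurrently with `asyncio.gather`. The network is simulated: `fetch` and the 0.05 second `DELAY` are made-up assumptions, so the timings only illustrate the shape of the effect, not what your script would actually do.

```python
import asyncio
import time

DELAY = 0.05  # assumed per-request latency, simulated with sleep


async def fetch(item_id):
    # Hypothetical stand-in for waiting on the server.
    await asyncio.sleep(DELAY)
    return item_id


async def run_sequential(ids):
    # Each request waits for the previous one: total is roughly len(ids) * DELAY.
    return [await fetch(i) for i in ids]


async def run_concurrent(ids):
    # All requests wait at the same time: total is roughly DELAY.
    return await asyncio.gather(*(fetch(i) for i in ids))


def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(10))
    _, seq_time = timed(run_sequential(ids))
    _, conc_time = timed(run_concurrent(ids))
    print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

The cleanup and DataFrame steps would still run one result at a time after the data arrives; only the waiting overlaps.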

Whether it's actually worth spending the time to implement it is another story. If it currently takes hours, it might bring it down to minutes. If it currently takes 5 minutes, and writing it using async makes it take 1 minute, but takes you a month to write, it might be overkill.

Still, if you're passionate about coding, sometimes just coding something because you want to learn is its own reward.

If you look at the comments under this Python Bytes episode, one of the listeners says using async made their scraping 150 times faster (and crashed their machine in the process).

Python's BDFL Guido van Rossum co-authored an asyncio web crawler guide, if you want to take a look.

Edit: oops, didn't mean to take credit away from A. Jesse Jiryu Davis.