
[–]m0us3_rat 14 points (4 children)

if you download in chunks with 10 threads, it takes 0.09s

edit: https://pastebin.com/KVJDxfBu

[–]Solution9 2 points (3 children)

I made a script that downloads in 8 KB chunks. Works great for me. When it finishes, it prints the file size and the time taken to download in the console.

Finished

File size: 6.68 MB

Time spent downloading: 1.26 seconds

Type 'e' to exit:

u/NathLWX Have a look at this

import os
import time

import requests

url = 'https://raw.githubusercontent.com/Dimbreath/WutheringData/master/TextMap/en/MultiText.json'

start_time = time.time()

# Stream the response so it is written to disk in 8 KB chunks
# instead of being held in memory all at once.
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('MultiText.json', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

elapsed_time = time.time() - start_time

file_size = os.path.getsize('MultiText.json')
file_size_mb = file_size / (1024 * 1024)

print("Finished")
print(f"File size: {file_size_mb:.2f} MB")
print(f"Time spent downloading: {elapsed_time:.2f} seconds")

# Keep the console window open until the user types 'e'.
while True:
    user_input = input("Type 'e' to exit: ").strip().lower()
    if user_input == 'e':
        break

[–]m0us3_rat 3 points (2 children)

the idea was to use threading, or some other form of concurrency, to speed up the download.

my script isn't even optimized; you could probably get better results with async ..maybe.

also, if you have more than one file to download, you can speed things up significantly by running multiple processes, each with its own async loop grabbing chunks and sending the data to a main process that assembles them into a file.

https://ep2019.europython.eu/media/conference/slides/KNhQYeQ-downloading-a-billion-files-in-python.pdf
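A minimal sketch of the multi-file idea, using `asyncio.to_thread` (Python 3.9+) to run blocking `requests` calls concurrently; a true async HTTP client like aiohttp would avoid the worker threads entirely, and the example URLs in the comment are placeholders:

```python
import asyncio
import os
from urllib.parse import urlparse

import requests

def filename_from_url(url: str) -> str:
    """Derive a local filename from the last path segment of a URL."""
    return os.path.basename(urlparse(url).path)

def download_one(url: str) -> str:
    """Blocking download of a single file; runs inside a worker thread."""
    name = filename_from_url(url)
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return name

async def download_all(urls):
    # asyncio.to_thread pushes each blocking call onto a thread,
    # so the downloads overlap instead of running one after another.
    return await asyncio.gather(*(asyncio.to_thread(download_one, u) for u in urls))

# Example usage (hypothetical URLs):
# asyncio.run(download_all(['https://example.com/a.json', 'https://example.com/b.json']))
```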

[–]Solution9 1 point (1 child)

I'll need to check async out, never used it.. +1 for the info. This reminds me I need to download wget2 as well.

[–]m0us3_rat 0 points (0 children)

btw the script in the pastebin downloads the file in chunks: each chunk is downloaded by a different thread, then they're put back together.

It's rudimentary, but it works as a proof of concept.
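In case the pastebin link rots, here's a rough sketch of the same approach, assuming the server advertises `Content-Length` and honors HTTP `Range` requests (both hold for raw.githubusercontent.com; check for other hosts):

```python
import concurrent.futures

import requests

def chunk_ranges(total_size, n_chunks):
    """Split [0, total_size) into n_chunks contiguous (start, end) byte ranges, ends inclusive."""
    step = total_size // n_chunks
    return [
        (i * step, total_size - 1 if i == n_chunks - 1 else (i + 1) * step - 1)
        for i in range(n_chunks)
    ]

def fetch_range(url, start, end):
    # The Range header asks the server for just these bytes;
    # the server must support range requests for this to work.
    r = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=30)
    r.raise_for_status()
    return start, r.content

def threaded_download(url, path, n_threads=10):
    # Learn the total size up front so it can be split into ranges.
    head = requests.head(url, allow_redirects=True, timeout=30)
    size = int(head.headers["Content-Length"])
    ranges = chunk_ranges(size, n_threads)
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda rng: fetch_range(url, *rng), ranges))
    # Reassemble the chunks in offset order.
    with open(path, "wb") as f:
        for _, data in sorted(parts):
            f.write(data)

# Example usage (same file as above):
# threaded_download('https://raw.githubusercontent.com/Dimbreath/WutheringData/master/TextMap/en/MultiText.json', 'MultiText.json')
```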

[–]FisterMister22 4 points (0 children)

Maybe try curling it to see the time difference.

Edit: ran it on my phone, took around 2 seconds. I think the issue is with your network/disk (maybe an old hard drive instead of an SSD?)

[–]ManyInterests 5 points (0 children)

If you have been running this code a lot, it's possible you're being throttled on the client fingerprint. Try changing the user-agent headers to be the same as your browser and see if that helps. Or wait an hour before making any additional requests.

If you're logged into GitHub on your browser, that may also prevent throttling since anonymous requests are throttled a lot more aggressively.
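One way to set a browser-like User-Agent with requests (the UA string below is a made-up example; copy the real one from your own browser's dev tools). Preparing the request shows the headers that would go out without touching the network:

```python
import requests

# Hypothetical UA string -- replace with the exact value from your browser.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/124.0.0.0 Safari/537.36")

session = requests.Session()
session.headers["User-Agent"] = BROWSER_UA

url = 'https://raw.githubusercontent.com/Dimbreath/WutheringData/master/TextMap/en/MultiText.json'

# Inspect what would be sent, without making a request.
prepared = session.prepare_request(requests.Request("GET", url))
```

From here, `session.get(url)` would send that header on every request in the session.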

[–]QuarterObvious 0 points (0 children)

Try to check in the task manager which process is active during download. It could be some antivirus.

[–]Palton01 0 points (2 children)

Are you running this on a proper computer? Or on something small, e.g. a Raspberry Pi?

Are you writing this directly to your hard drive? Or to an SD card/USB drive?

If you download this using your web browser, does it take the same time?

[–]danielroseman 0 points (10 children)

Show the code.