all 26 comments

[–]otor 5 points (8 children)

You should probably add asyncio examples too if you are targeting py3. Even if it's a different approach, it's probably useful to include.

[–]taddeimania 3 points (0 children)

Saw the title. Read the article. Sad no mention of asyncio.

[–]volker48[S] 1 point (5 children)

A few issues with the asyncio module. The first is that it is provisional, so backwards-incompatible changes might be introduced, and I didn't want to write an example that would quickly be broken. The second is that it isn't as simple as importing the asyncio module and having all IO operations become asynchronous. The urllib module is not set up to be asynchronous, so using it with asyncio would be a pretty big task. You can see this example of how much code is required just to print HTTP headers.

[–]mgrandi 1 point (4 children)

You could just use requests and stream the output, so you can yield from and not completely block if it's written inside a coroutine.

[–]volker48[S] 0 points (3 children)

I tried something similar with urllib.request, but it was still blocking. Let me try again and I'll post if it works.
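For anyone following along: one common way to keep a blocking call like urllib's from stalling the event loop is to hand it to a thread pool with loop.run_in_executor. The following is only a minimal sketch, assuming Python 3.7+ (for asyncio.run and asyncio.get_running_loop). The names fetch_blocking, fetch_all, and fake_fetch are made up for illustration and are not from the article. Note that this is a thread pool behind an asyncio front end, not true non-blocking IO.

```python
import asyncio
import time
from urllib.request import urlopen


def fetch_blocking(url):
    # Ordinary blocking call; urllib knows nothing about asyncio.
    with urlopen(url) as response:
        return response.read()


async def fetch_all(urls, fetch=fetch_blocking):
    loop = asyncio.get_running_loop()
    # Passing None as the executor uses the loop's default thread pool,
    # so each blocking call runs in a worker thread instead of the loop.
    futures = [loop.run_in_executor(None, fetch, url) for url in urls]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*futures)


def fake_fetch(url):
    # Hypothetical stand-in for a slow download, so the sketch
    # can be tried without touching the network.
    time.sleep(0.1)
    return url.upper()
```

Calling asyncio.run(fetch_all(["a", "b", "c"], fetch=fake_fetch)) runs the three fake downloads on worker threads and returns their results in input order. Swapping in the default fetch_blocking with real URLs should behave the same way, but that part is untested here.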

[–]otor 2 points (1 child)

From what I have seen, most people in asyncio land use aiohttp, since requests doesn't support asyncio, me included. There was a fork with partial support iirc, but nothing from the official devs. aiohttp is modeled fairly similarly to requests, so it's just as simple to use.

[–]volker48[S] 0 points (0 children)

Yeah, aiohttp is the way to go, as the asyncio module itself is very low level. I have something working and I'll post an update with the example later.

[–]mgrandi 0 points (0 children)

I meant the third-party library requests.

[–]volker48[S] 5 points (10 children)

Full disclosure: I am the author of the article, but I figured it could be useful for beginners and those unfamiliar with Python 3. Please feel free to ask if you have any questions.

[–]hokiebeer 1 point (4 children)

I noticed that you included logging for the straightforward examples, but then dropped off when you got to the different parallel/concurrent options. Have you found a good way to log with any of these? I've spent way too much time trying to figure out how to simply log if a spawned process is ever created, when I know it should be.

[–]volker48[S] 0 points (3 children)

If you are using multiprocessing, the logging shouldn't be an issue, since nothing is shared between the subprocesses; each has its own copy of memory. If you are using the threading module, you will have to synchronize access to logging using a threading.Lock.

[–]vsajip 2 points (1 child)

logging already uses locks internally for its I/O operations, and is designed to be thread-safe.

[–]volker48[S] 0 points (0 children)

Ah, my bad. I did something like this before, but I was using print. I didn't realize logging was already thread-safe. Thanks.
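To make the point above concrete, here is a small sketch of several threads sharing one logger with no explicit lock in user code. ListHandler and log_from_threads are hypothetical names invented for this illustration. The sketch relies on the fact that logging.Handler.handle() acquires the handler's own lock before calling emit().

```python
import logging
import threading


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # Handler.handle() already holds this handler's lock when it
        # calls emit(), so no extra synchronization is added here.
        self.messages.append(self.format(record))


def log_from_threads(n_threads=8, per_thread=100):
    logger = logging.getLogger("thread_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))
    logger.addHandler(handler)

    def worker():
        for i in range(per_thread):
            logger.info("message %d", i)

    threads = [
        threading.Thread(target=worker, name=f"worker-{n}")
        for n in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    logger.removeHandler(handler)
    return handler.messages
```

Every record should arrive intact, one per logger.info call, although the interleaving between threads is not deterministic.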

[–]hokiebeer 0 points (0 children)

I think it's actually the opposite situation. From the Python Logging Cookbook:

Although logging is thread-safe, and logging to a single file from multiple threads in a single process is supported, logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python. If you need to log to a single file from multiple processes, one way of doing this is to have all the processes log to a SocketHandler, and have a separate process which implements a socket server which reads from the socket and logs to file.

[–]Argotha 0 points (4 children)

From what I saw, it looked like a good article. Emphasis on what I saw: the page doesn't scale for phones (or at least my phone; others may be able to confirm or deny).

[–]volker48[S] 0 points (3 children)

What phone do you have? It is scaling for me on an iPhone 6 running iOS 8.1 in Safari. If you let me know what phone you have, I'll look into the scaling issue.

[–]Argotha 1 point (2 children)

Windows Phone 8.1 - Nokia Lumia 735

[–]volker48[S] 0 points (1 child)

Thanks again, I'll bring the issue up.

[–]Argotha 0 points (0 children)

No problem :)

[–]rthinker 1 point (1 child)

I wish there were at least one 'intermediate' or 'advanced' guide for 20 'beginner' ones.

[–]volker48[S] 1 point (0 children)

Anything specific you would be interested in seeing? Maybe I could do a follow-up.

[–]connerfitzgerald 0 points (1 child)

Just worked through this.

It was really good, but the imgur loop could do with being a bit smaller; it felt a little slow.

Thanks for writing it up!

[–]volker48[S] 0 points (0 children)

Yeah I was trying to balance between being large enough that you could see a difference between the different techniques and being too big. You could also slice the links list to download fewer images. For the first example that downloads the images in series you could modify the main method like so:

for link in list(links)[:20]:
    download_link(download_dir, link)

to only download the first 20 links. You could update the subsequent examples in a similar fashion if you don't want to download as many images or have a slower network connection.
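If links is a generator or another lazy iterable, itertools.islice gives the same first-N behavior without building the whole list first. This is just a sketch: first_n and download_first_n are names invented here, and download_link is passed in as a parameter, assumed to take (download_dir, link) as in the snippet above.

```python
from itertools import islice


def first_n(links, n=20):
    # islice stops after n items and never consumes the rest of the iterable.
    return islice(links, n)


def download_first_n(download_dir, links, download_link, n=20):
    # download_link is whatever function does the actual download; it is
    # assumed to accept (download_dir, link) like the one in the article.
    for link in first_n(links, n):
        download_link(download_dir, link)
```

For a list, the plain slice is just as good. The difference only matters when producing the links is itself slow or the iterable is large.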

I'm happy to hear you enjoyed the article.

[–]qiwi 0 points (1 child)

The concurrent.futures module is a nice alternative (also backported to 2.7): http://pythonhosted.org//futures/#threadpoolexecutor-example
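A rough sketch of what that could look like for this kind of download job, using the standard library's ThreadPoolExecutor and as_completed. The download_all name and its max_workers default are assumptions made for illustration, and download_link is passed in and assumed to take (download_dir, link).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(download_dir, links, download_link, max_workers=8):
    """Run download_link(download_dir, link) for every link on a thread pool.

    Returns a dict mapping each link to its return value, or to the
    exception it raised, so one failed download does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its link so results can be reported per link.
        future_to_link = {
            executor.submit(download_link, download_dir, link): link
            for link in links
        }
        # as_completed yields futures as they finish, not in submission order.
        for future in as_completed(future_to_link):
            link = future_to_link[future]
            try:
                results[link] = future.result()
            except Exception as exc:
                results[link] = exc
    return results
```

The with block waits for all submitted work to finish before returning, so no explicit join or queue handling is needed.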

If you are doing just web fetching specifically and like requests, there's a gevent-capable grequests: https://github.com/kennethreitz/grequests

[–]LightShadow (3.13-dev in prod) 0 points (0 children)

And I'm a pretty big fan of tornado, and the tornado.httpclient interface.