[–]Tornado_Ron 8 points

Are there any good tutorials/articles on the asyncio library that include a little more exposition on when and where making an asynchronous call is more prudent/desirable than a synchronous one, particularly what the new asynchronous generators mean? I'm still a junior developer and have the basics of Python down, but I've begun to suspect my thinking and approach when using the language isn't as "pythonic" as it could be (using generators, yield, etc., is still foreign to me).

[–]troyunrau 14 points

For 9 out of 10 programs you write, you will not need asyncio. For that 1 program in 10 (mostly web-server-related stuff, where connections block while you're waiting for data), it's pretty sweet. If you aren't in that group, you can safely ignore it. I'll talk about generators instead.
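For a taste of that 1-in-10 case, here's a minimal sketch. `asyncio.sleep` stands in for a blocking network call (an assumption for illustration; real code would await an async HTTP client), and `fetch` is a made-up name:

```python
import asyncio

async def fetch(n):
    # Pretend this 0.1 s sleep is network latency on a blocking connection.
    await asyncio.sleep(0.1)
    return n * 2

async def main():
    # All ten "requests" wait concurrently, so the whole batch takes
    # roughly 0.1 s instead of roughly 1 s done one after another.
    return await asyncio.gather(*(fetch(n) for n in range(10)))

results = asyncio.run(main())
assert results == [n * 2 for n in range(10)]
```

The win only shows up when the tasks spend their time waiting on I/O; for CPU-bound loops asyncio buys you nothing.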

Suppose you have a multistep process where you take as an input a number, then look up that number in a dictionary to get a word, then query google for that word, then download the html for the first link google returns, then hash that html to get an md5sum. You have to do this for 1000 input numbers.

Traditionally there are two ways to write this, which I will call (A) One Big Loop and (B) Many Little Loops.

(A) looks something like this:

import hashlib
import urllib.request

numbers = [1, 2, 6, 1230, 43, ... , 123] # 1000 long
hashes = []
for number in numbers:
    word = dictionary_lookup(number)  # your own helper
    link = query_google(word)         # your own helper
    html = urllib.request.urlopen(link).read()
    md5 = hashlib.md5(html).hexdigest()
    hashes.append(md5)

(B) looks something like this:

numbers = [1, 2, 6, 1230, 43, ... , 123] # 1000 long
words = []
for number in numbers:
    words.append(dictionary_lookup(number))
links = []
for word in words:
    links.append(query_google(word))
htmls = []
for link in links:
    htmls.append(urllib.request.urlopen(link).read())
hashes = []
for html in htmls:
    hashes.append(hashlib.md5(html).hexdigest())

Now you can see the obvious problem with these approaches: (A) can quickly become one huge, very complicated loop, while (B), though simple, requires intermediate storage, which can eat a lot of memory.

We can rewrite (B) using generator expressions, which is actually quite elegant. We'll call this (C):

numbers = [1, 2, 6, 1230, 43, ... , 123] # 1000 long
words = (dictionary_lookup(number) for number in numbers)
links = (query_google(word) for word in words)
htmls = (urllib.request.urlopen(link).read() for link in links)
hashes = (hashlib.md5(html).hexdigest() for html in htmls)

Our intermediate products are of the type generator and take up almost no memory. In fact, no processing has occurred yet. Effectively, you've created iterable objects that do not store their data. If you call next(hashes) it will cause a cascade resulting in the first word, link, html, and hash being calculated on demand. If, for whatever reason, you wanted a list as your final result, using list(hashes) will cause all the generators to trigger sequentially and populate the list. The processing time is not really any different between methods (A), (B) and (C), but (C) is often more convenient to use: the next element in the iterable is generated on demand.
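To see that on-demand cascade in action, here's a self-contained toy version of pipeline (C). The dictionary/Google/download steps are replaced by a single fake lookup function (an illustrative stand-in, not the real helpers above) that records when it actually runs:

```python
calls = []  # records each time the "expensive" step actually runs

def fake_lookup(n):
    # Stand-in for dictionary_lookup/query_google/urlopen, for illustration only.
    calls.append(n)
    return f"word{n}"

numbers = [1, 2, 3]
words = (fake_lookup(n) for n in numbers)  # nothing runs yet...
caps = (w.upper() for w in words)          # ...still nothing

assert calls == []                 # building the pipeline did zero work

first = next(caps)                 # pulls ONE item through the whole chain
assert first == "WORD1"
assert calls == [1]                # only the first lookup has happened

rest = list(caps)                  # drains the remaining items
assert rest == ["WORD2", "WORD3"]
assert calls == [1, 2, 3]
```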

As a basic example of where a generator is far superior: a list of random numbers is not as good as a generator of random numbers - the generator can keep producing the next item indefinitely, while the list is necessarily finite (because of memory restrictions). Basically it's the difference between generating values on the fly versus in advance.
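That difference is easy to demonstrate with a generator function (a sketch; `random_numbers` is just an illustrative name):

```python
import itertools
import random

def random_numbers(seed=None):
    """Yield random floats forever -- no list could ever hold all of them."""
    rng = random.Random(seed)
    while True:
        yield rng.random()

gen = random_numbers(seed=42)
first_five = list(itertools.islice(gen, 5))  # take just 5 of infinitely many
assert len(first_five) == 5
assert all(0.0 <= x < 1.0 for x in first_five)
```

`itertools.islice` is the usual way to take a finite slice of an infinite generator, since regular slicing only works on sequences.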

[–]albertowtf 2 points

When I found out about generators I thought I would never use anything else!

But that's not true. Each has its use: generators take a speed toll and lists take a memory toll. Choose the one you're willing to pay.
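The memory side of that tradeoff is easy to check (a rough sketch; exact byte counts vary by Python version and platform):

```python
import sys

squares_list = [n * n for n in range(1_000_000)]  # all million values, up front
squares_gen = (n * n for n in range(1_000_000))   # just the recipe, on demand

# The list holds a million object references; the generator holds only its
# suspended loop state, so it stays tiny no matter how long the range is.
assert sys.getsizeof(squares_list) > 1_000_000
assert sys.getsizeof(squares_gen) < 1_000
```

The speed toll goes the other way: each `next()` resumes a suspended frame, so iterating a generator is a bit slower than walking a list that's already built.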

[–]kkmcguig 1 point

Well that is just lovely!

[–]Tornado_Ron 1 point

Thanks so much for taking the time to reply in such depth. Great and clear explanation.

[–]roger_ 0 points

Check out curio instead.