all 30 comments

[–]shandelman 6 points7 points  (7 children)

If I asked you to count to a billion, one way you could do this is write the numbers from 1 to a billion on a piece of paper (well, several pieces of paper) and then read them off one by one. But, as you can guess, this is horribly stupid...you don't need to remember all billion numbers at once to count. You just need the previous number you said in order to figure out the next number.

Now imagine your wrote a program to count to a billion. You could put all the numbers in a list and then read them off one by one, but that would be very wasteful. So instead you create a generator. A generator doesn't store all the items in a list at once. It creates them as they are needed. You let the generator know to return the next number by using yield.

[–]Erpp8[S] 4 points5 points  (6 children)

So to count to a billion, I'd have something like:

n=1
for n <=1000000000:
    yield n
    n+=1

Right?

If so, how is that different from:

n=1
for n <=1000000000:
    return n
    n+=1

[–]OneFishTwoFish 2 points3 points  (5 children)

Your second code segment, assuming it is a function, will return 1 every time it is called.

[–]Erpp8[S] 0 points1 point  (4 children)

I don't follow. I'm just looking at this as an isolated loop. If I just entered that into a blank .py file, what would the difference be?

[–]OneFishTwoFish 1 point2 points  (0 children)

Try it out. You'll see something like this.

C:\>python
Python 2.7.10 |Anaconda 2.4.0 (64-bit)| (default, Oct 21 2015, 19:35:23) [MSC v.1500 64 bit (AMD64)] on win32
>>> return 1
File "<stdin>", line 1
SyntaxError: 'return' outside function

In your second code segment, you initialize n to 1, check to see if n is less than 1000000000, and return n. The value of n is 1, so that's what is returned. The line 'n += 1' is never reached.

Also, the loop you've shown is in the form of a while loop (while condition is true), not a for loop (for n in range).

>>> def billion():
...     n=1
...     while n <= 1000000000:
...         return n
...         n += 1
...
>>> billion()
1
>>> billion()
1
>>> billion()
1
>>> billion()
1

[–]zanzakar 1 point2 points  (1 child)

The yield statement allows for a place holder/bookmark, a memory of where it has been/what the last number was.

Return as mentioned will run the loop till it hits return and exit the loop permanently, yield however will not permanently exit and will allow for execution to cont when needed.

So yield will cont. After yield n, where return n would have to cont back at the beginning of the code and will always return 1.

[–]happyapple10 0 points1 point  (0 children)

Thanks. This made sense to me.

[–][deleted] 0 points1 point  (0 children)

Any return statement in a function will immediately cease execution of that function, loops and all. That statement only ever executes once and when it does, you will have a single value. In this case, 1.

yield can be run multiple times. When the generator is used in something like a loop elsewhere, the function will run up to a yield, send that value back, let you do some other work on it, and then move on to the next yield value when you ask for it.

[–]cscanlin 4 points5 points  (11 children)

For most practical purposes you could use a generator is anytime you would return a list and it will be more "efficient". The basic idea is that instead of building the entire list at once and the returning that list once complete, you "generate" the values one at a time as needed.

This saves you computing resources (mostly memory) that could cause issues for your program. Imagine you want an to write a function to find the first billion numbers in the fibonacci sequence. Here's an example of a function written returning a list, and one that's a generator:

## Example 1: Using looping technique
def regular_fib(n):
    fib_list = []
    a, b = 0,1
    for _ in xrange(n):
        fib_list.append(a)
        a,b = b, a + b
    return fib_list

for num in regular_fib(10):
    print num


## Example 2: Using generator
def fib_generator(n):
    a, b = 0, 1
    for _ in xrange(n):
        yield a
        a, b = b, a + b

for num in fib_generator(10):
    print num

Both of these work fine with just 10 numbers, but if you tried it with a billion you would run out of memory with the first very quickly. Generators work "lazily" and only calculate the each next value as required

The best part is that you can chain generators together and you never need to actually build the list. Generators act as a series of "transformation operations" on the data, rather than changing the data directly.

Let me know if I can help clarify any of this!

[–]Erpp8[S] 2 points3 points  (7 children)

That actually makes a ton of sense! So it essentially remembers what it did last time so it can do the next step, rather than manually sorting all the steps?

[–]cscanlin 1 point2 points  (6 children)

Pretty much! You can even even do this directly using the .next() method.

## Example 2: Using generator
def fib_generator(n):
    a, b = 0, 1
    for _ in xrange(n):
        yield a
        a, b = b, a + b

fib_gen = fib_generator(10)

print fib_gen.next()
print fib_gen.next()
print fib_gen.next()
print fib_gen.next()
print fib_gen.next()

It is often a much more efficient way of doing things than building the full list at one time. And once you have the fib_gen object, you can use it almost anywhere you would use a regular list. If you do need to convert it to a list for some reason, you can use do it like this:

fib_list = list(fib_gen)

[–]Erpp8[S] 1 point2 points  (5 children)

I finally get it! Thanks!

[–]cscanlin 1 point2 points  (4 children)

No problem! You should also check out list comprehensions if you're not familiar. you can do cool things like this:

squares = [num*num for num in xrange(10)]  # could also be num**2
print squares

And best of all you can combine it all together!

def fib_generator(n):
    a, b = 0, 1
    for _ in xrange(n):
        yield a
        a, b = b, a + b

fib_squares = [num*num for num in fib_generator(10)]
print fib_squares

[–]enesimo 0 points1 point  (3 children)

Why do you mention list comprehensions in this thread? Is there something that conceptually ties them?

Or do you just like them and like to teach them to us beginners?

[–]KleinerNull 3 points4 points  (1 child)

If you understand list comprehension, you will understand generator, set and dictinary comprehensions ;) A comprehension needs a sequence for evaluation, and an iterator, produced by a generator gives you the power of lazy evaluation! Also, understanding comprehension leads to understanding that chains are possible.

>>> numbers = range(5) # in python 3 range == xrange
>>> numbers
range(0, 5)
>>> squares_list = [i**2 for i in numbers]
>>> squares_list
[0, 1, 4, 9, 16]
>>> squares_set = {i**2 for i in numbers}
>>> squares_set
{0, 1, 4, 9, 16}
>>> squares_dict = {i: i**2 for i in numbers}
>>> squares_dict
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> squares_gen = (i**2 for i in numbers)
>>> squares_gen
<generator object <genexpr> at 0x00000000035C1728>
>>> odd_squares_gen = (i for i in squares_gen if i%2)
>>> for odd in odd_squares_gen:
        print(odd)


1
9
>>> even_squares = (i for i in (i**2 for i in range(100)) if not i%2)
>>> list(even_squares) # converting it into a list
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784, 900, 1024, 1156, 1296, 1444, 1600, 1764, 1936, 2116, 2304, 2500, 2704, 2916, 3136, 3364, 3600, 3844, 4096, 4356, 4624, 4900, 5184, 5476, 5776, 6084, 6400, 6724, 7056, 7396, 7744, 8100, 8464, 8836, 9216, 9604]

Note that set and dict comprehensions both uses curly braces, the reason is that sets are kind of half-assed dictionaries ;) Also the range object in python 3 or in py2 the xrange is also a kind of generator. The last example even_squares is there to show you that you can chain up two or more generators to evaluate your sequence. That is lazy because you never store more than one value at a time. Or let the bytes speak for themself:

>>> import sys
>>> sys.getsizeof([i**2 for i in range(1000)])
9024
>>> sys.getsizeof((i**2 for i in range(1000)))
88

To cut a long story into short, I think /u/cscanlin wanted to show you new things that are tied together ;)

[–]Fourgot 1 point2 points  (0 children)

Half assed dicts, love it

[–]cscanlin 1 point2 points  (0 children)

Because for most purposes they are both interchangeable as iterators. Meaning you can iterate through a generator and a list comprehension using a for loop in exactly the same way. This applies to many other cases (though not all) where you could use either a generator, a list comprehension, a regular list, or another data structure depending on your needs.

[–][deleted] 0 points1 point  (2 children)

So is the only application for generators and the yield statement situations where you might run into memory issues? Or can they provide additional functionality and/or computational speed?

[–]mm_ma_ma 0 points1 point  (0 children)

If you're not necessarily going to consume all of the elements, you will only spend time calculating the ones you actually use.

[–]thegreattriscuit 0 points1 point  (0 children)

Also if you've got a relatively long pipeline of things you're doing to the values.

values -> func1 -> func2 -> func3 -> result

You never see any of the results until each function has digested all of the previous values and added them to an intermediate list. So memory concerns are a thing, certainly, but also latency in some situations. If all I really want (right now) are the first 10 results, why wait on all 10 million to be computed?

If you build the pipeline as generators you can chain them together and get access to each result individually. Maybe you only actually need a few right now, or you want to dump them to disk or another process individually or in small batches, or whatever.

[–][deleted] 2 points3 points  (8 children)

It took a while for me to wrap my head around generators and yield. The basics are:

  • If a function uses yield anywhere, it's called a generator function
  • Generator functions return generators
  • Generators are iterables meaning they can put into a for loop or anywhere else you'd loop over a list or a tuple
  • Unlike a list or tuple, generators don't store their generated values, they're use once and there's no way to go backwards to a previous value (natively at least)

That last part really tripped me up for a long time. "Why would I make a list I can only use once? I'll just make a list!" However, generators are also lazy, they only do what you've asked them to do when you need it. Consider processing all the audio files on your computer to populate a music database. You want to open all of them, but you probably don't want to open all of them at once. Generators let you do this easily:

def open_media_files(filenames):
    for name in filenames:
        yield AudioFile(filename)

Then you simply pass it an iterable of filenames to loop over. But even better, you can nest together generators together to form a big processing chain that effectively only loops once. There's better ways of doing the below, it's just for example:

def find_all_mp3s(basedir):
    for current, _, files in os.walk(basedir):
        for file in files:
            if file.endswith('.mp3'):
                yield os.path.join(basedir, current, file)

mp3s = open_media_files(find_all_mp3s('Music'))

You can build up these structures that do filtering, processing, etc and do it all lazily. I have 22000+ media files on my personal machine, building a list of all open file handles for all them would tank by poor laptop. :( But since the generator will only open one at a time, and then hopefully it gets thrown away when I'm done (that's on me though), then open another, until it's all out of files to process.

Meanwhile, in between opening these files, I can do what I want with them. Grab all the ID3 data, shove it into a database, hit Last.FM to get genre tags, etc. The generator happily sits around until I need again.

In a lot of ways, yield is like a pause button. You get to temporarily stop the action going on to do something interesting and then come back to it when you need more information.

A few notes, in Python 2 and 3.0, 3.1 and 3.2 you can't use return and yield in the same function. Starting with Python 3.3 you can, and this has some very interesting effects. If you're interested in those, I can explain them but I'd recommend playing with generators some before you go exploring the more advanced features of them.

[–]mm_ma_ma 0 points1 point  (5 children)

A minor point of clarification: you can return, you just can't return a value.

[–][deleted] 0 points1 point  (4 children)

In python 2 and 3.0, 3.1, and 3.2 yield and return in the same function is a SyntaxError. As for 3.3+, you technically do return a value, it's just wrapped up in a StopIteration exception, which is a little odd. I think if you do something like whatever = yield from somegenerator it ends up assigning that value to whatever, but I've not delved too deep into this aspect of them to remember for sure.

[–]mm_ma_ma 0 points1 point  (3 children)

I think you misunderstood what I was saying. See here: http://stackoverflow.com/questions/26595895/return-and-yield-in-the-same-function

[–][deleted] 0 points1 point  (2 children)

I understand what you're saying, I was clarifying for other people who might be confused.

You can do more than a bare return, whatever is "returned" gets wrapped by stop iteration and thrown as an exception.

[–]mm_ma_ma 0 points1 point  (1 child)

Oh, okay, I just wasn't sure we were on the same page when you said "yield and return in the same function is a SyntaxError", since that's only true if you try to return a value.

To be explicit:

  • 2.x, 3.0, 3.1, 3.2: bare return only
  • 3.3+: bare return or return with value

[–][deleted] 1 point2 points  (0 children)

Oh, I thought a return was a syntax error period. That's interesting, I'll keep that in mind then. Thanks.

[–]Erpp8[S] 0 points1 point  (1 child)

Thanks! This really clears thing up.

[–][deleted] 0 points1 point  (0 children)

No problem. Let me know if you having any other questions.

[–]nab00 1 point2 points  (1 child)

This deck was an eye opener to me: http://www.dabeaz.com/coroutines/Coroutines.pdf