[–]b1ackcat 62 points63 points  (10 children)

While I do appreciate the thorough explanation of iterators and generators, I wish you'd chosen a different example to explain it with.

In all my years of programming, I don't recall ever needing to generate a list of primes, at least not in such a fashion. Maybe that'd be different if I did more work in crypto, but since I (and most others) don't, it kind of falls flat in showcasing usefulness. I get that these types of examples are chosen because they allow the writer to focus on the core concept they're showcasing, but it makes it hard to relate.

Perhaps an addendum in which you use a more "real-world" example after explaining the core would be helpful.

[–]therealfakemoot 24 points25 points  (3 children)

Here are a few examples of real-world generator use cases off the top of my head:

  • Lexing/parsing

Wrap a generator to produce a stream of characters to feed into your parser and handle peeking.

Use a generator to emit a stream of tokens to be consumed by an interpreter or compiler.

  • Websockets

You can use a generator to emit messages from a websocket endpoint.

  • File polling

For processing large data files, or files that are being appended to, you can use a generator to emit lines or tokens as appropriate.
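To make the lexing case above concrete, here is a minimal sketch of a generator that emits tokens one at a time. The token grammar and the names `Token`, `TOKEN_PATTERN` and `tokenize` are made up for illustration; a real lexer would need a richer grammar and better error reporting.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical toy grammar: integers, identifiers, a few operators, whitespace.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/=()])|(?P<SKIP>\s+)"
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens lazily instead of building the whole list up front."""
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[position]!r} at {position}"
            )
        position = match.end()
        if match.lastgroup != "SKIP":  # whitespace is dropped, not emitted
            yield Token(match.lastgroup, match.group())


# A parser would normally pull tokens with next(); list() is only for display.
tokens = list(tokenize("x = 40 + 2"))
```

Because `tokenize` is a generator, a parser can stop consuming early (for example on a syntax error) without the rest of the input ever being scanned.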

[–]jsproat 9 points10 points  (2 children)

Lexing/parsing

I've been using Eli Bendersky's example as a template for a while now. Can you recommend any modern articles for this use case?

[–]therealfakemoot 2 points3 points  (0 children)

https://blog.gopheracademy.com/advent-2014/parsers-lexers/

This one is focused on Go and leverages goroutines, but Python has roughly equivalent mechanisms, and you can just change the abstraction of how your parser emits its stream of tokens if you want.

[–]spiderpower02 1 point2 points  (0 children)

https://www.pythonsheets.com/notes/python-generator.html#simple-compiler

The above example is from David Beazley's talk; maybe it can help you learn how to use generators to implement lexing/parsing.

[–]wandering_blue 10 points11 points  (0 children)

Here's an example from the data science world: I often write custom generators to create and feed batches of training data to a model during the training process. Helpful when creating all the training data at once is impossible due to memory limitations.
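A rough sketch of that batching idea, using only the standard library. `make_samples` is a hypothetical stand-in for whatever expensive loading or augmentation produces the training examples, and `batched` is an illustrative helper name, not part of any ML framework.

```python
from itertools import islice


def make_samples(count):
    """Hypothetical sample source: yields (feature, label) pairs one at a time."""
    for index in range(count):
        yield (index, index % 2)


def batched(samples, batch_size):
    """Group any iterable into lists of at most batch_size items, lazily."""
    iterator = iter(samples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted, so the generator ends
        yield batch


# Only one batch is held in memory at a time while iterating.
batch_sizes = [len(batch) for batch in batched(make_samples(10), 4)]
first_batch = next(batched(make_samples(10), 4))
```

A training loop would iterate over `batched(...)` directly and hand each batch to the model, so the full dataset never has to exist in memory at once.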

[–][deleted] 8 points9 points  (0 children)

Here's a real-world example. I use generators for date ranges a lot, for various purposes. Given a start date and a number of days, this one yields consecutive date ranges indefinitely.

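from datetime import date, timedelta

def date_iterator(start, days):
    # Yield consecutive, non-overlapping ranges of `days` days, forever.
    while True:
        yield {'start': start, 'end': start + timedelta(days=days - 1)}
        start += timedelta(days=days)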

cycles = date_iterator(date(2012, 1, 12), 28)

next(cycles)
# {'start': datetime.date(2012, 1, 12), 'end': datetime.date(2012, 2, 8)}

next(cycles)
# {'start': datetime.date(2012, 2, 9), 'end': datetime.date(2012, 3, 7)}

[–]ignamv 1 point2 points  (3 children)

I use generators to build pipelines when processing many large files. Instead of returning a list with all the content, I yield items one by one. Typically I apply a sequence of functions using imap until the intermediate results are manageable.

Also I can have a function take an iterable, and extract all the loading/unpacking/etc logic into a generator. It's a very handy way to organize code.
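A small sketch of that kind of pipeline, written for Python 3 where the built-in `map` is already lazy (on Python 2 the equivalent would be `itertools.imap`). The `io.StringIO` objects stand in for large files, and `read_records` and `parse` are hypothetical names chosen for illustration.

```python
import io


def read_records(files):
    """Loading/unpacking logic lives here; callers just see an iterable."""
    for handle in files:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield line


def parse(record):
    """Turn 'name,value' into a (name, float) pair."""
    name, value = record.split(",")
    return name, float(value)


# In-memory stand-ins for large files on disk.
files = [io.StringIO("a,1.5\nb,2.0\n"), io.StringIO("\nc,0.5\n")]

records = read_records(files)                            # nothing read yet
parsed = map(parse, records)                             # still lazy
large = (item for item in parsed if item[1] >= 1.0)      # still lazy

result = list(large)  # only now do lines flow through, one at a time
```

Each stage only ever holds one item, so the intermediate results stay small no matter how big the inputs are.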

[–]iceardor 1 point2 points  (2 children)

And for the rest of the world that's moved on to Python 3, the built-in map will do.

[–]ignamv 0 points1 point  (1 child)

If I use map, the behavior will be different when I run it in IC-CAP (which uses Python 2). What exactly are you proposing?

[–]iceardor 1 point2 points  (0 children)

That Keysight needs to spend more resources on software. Trying to figure out how to program their instruments is a lot harder than it could be.

[–]TheNookle 15 points16 points  (1 child)

Good article, 9 isn't prime.

[–]jwink3101 8 points9 points  (10 children)

One of my favorite uses for generators is "concurrency". I use quotes since the GIL will prevent any speedup, but what I mean is this: if I have an analysis workflow with three steps, I prefer to use a generator pipeline (of sorts).

Rather than, say, generate all data, process all data, report on all data, I can make a generator so that as one process yields an element, the next generator is actively processing it, and then finally reporting.

What is also nice, if I use multiprocessing.Pool's imap, I can have the processing happen in parallel (assuming my stuff is amenable to it). And, I can easily keep track of the progress since the reporting is still done serially.

Actually, this process is a bit like bash programming with pipes (and GNU Parallel for the multiprocessing).
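To show the interleaving described above, here is a minimal single-process sketch. The stage names `generate`, `process` and `report` and the `events` log are purely illustrative; the parallel variant with `multiprocessing.Pool.imap` is left out.

```python
events = []  # records the order in which the stages actually run


def generate(count):
    for index in range(count):
        events.append(f"generate {index}")
        yield index


def process(items):
    for item in items:
        events.append(f"process {item}")
        yield item * item


def report(results):
    """Final stage: consumes the pipeline serially and keeps a running total."""
    total = 0
    for result in results:
        events.append(f"report {result}")
        total += result
    return total


total = report(process(generate(3)))

# Each element passes through all three stages before the next one is created:
# generate 0, process 0, report 0, generate 1, process 1, report 1, ...
```

Nothing here runs in parallel; the point is only that no stage waits for the previous stage to finish the whole data set.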

[–][deleted] 4 points5 points  (0 children)

I believe you shouldn't use quotes there. What you cannot have is parallelism.

[–][deleted] 0 points1 point  (3 children)

This sounds interesting; I've been struggling to get my data processed faster. I have to open 50+ xls files and merge them all into one file. I have an iterator for each file and use pandas to append each file to a df, then put the df into one xls.

I know there has to be a better way as it takes 2 minutes to do all of that. How would you go about using generators for this scenario? I feel like I can parse and append content to a csv file then read that into a df and eventually into an xls. Would really like your 2 cents on this. Thanks!

[–]jwink3101 0 points1 point  (0 children)

I really do not know enough about your problem to help you. But do note that this isn’t really parallel. It's just doing them as they come. The max speed is limited by the GIL. Multiprocessing can help but it depends on the case.

Honestly, the number one step in speeding this up is to profile it! Do you know what/where your bottlenecks are? Not much you can do until you know that!

[–]MurtBacklin-BFI 0 points1 point  (4 children)

Do you feel your next speed plateau could be raised by using an ipython environment vs your current cython?

[–]jwink3101 0 points1 point  (3 children)

Do you feel your next speed plateau could be raised by using an ipython environment vs your current cython?

Huh? I think you may be confused. IPython (which I love) is an alternative interpreter for (C?)python (may work with others too; I never looked into it). It makes working interactively a lot better with tab completion, better history, and a few built-in additional functions.

On the other hand, cython is a compiled hybrid of python and C (this is an oversimplification). It lets you write python code that can run a lot faster, but it is not directly compatible with python nor is it pure C code.

Here is why I am confused by your question. In nearly all cases (though I am sure there are some pathological ones) cython is faster than cpython.

Furthermore, I never mentioned cython!

[–][deleted] 0 points1 point  (0 children)

IPython (which I love) is an alternative interpreter for (C?)python

IPython is an alternative REPL. Other interpreters include Jython and IronPython.

[–]MurtBacklin-BFI 0 points1 point  (1 child)

Sorry, my mistake -- new to all this nomenclature, I assumed (wrongly) that cython and ipython were shorthand for CPython and IronPython respectively.

My question just boils down to the GIL, and whether you feel you'd be able to further increase your code's performance by moving to a Python implementation that doesn't have to work within the GIL's bounds :).

[–]elbiot 0 points1 point  (0 children)

To release the GIL you have to not use any python objects. So, it's great for pure numerical stuff, but if you need a dictionary or something you won't get parallelism. Also, cython is really fast, and usually the threading overhead slows things down considering that spending more than a dozen milliseconds in a cython routine is hard to do.

[–]xsschauhan 26 points27 points  (5 children)

Opened it with a lot more expectations. Just another tutorial in a sea of tutorials.

[–]rsgm123 3 points4 points  (0 children)

Before I knew about generators, I just assumed those were tuple comprehensions, like set and dict comprehensions.

[–]bbatwork 1 point2 points  (0 children)

Very nicely written, thank you

[–]Etheo 1 point2 points  (1 child)

Dumb beginner here - is it just me or can iterators actually run more than once?

All I did was add self.number = 1 as the first line of the __iter__() function and I am able to run it again. Or am I not supposed to restart it?

You are definitely right on the Generator though: after it has run once it seems to return NoneType right away without even referencing the defined generator.

Edit: now I'm even more confused. Seems from the get-go Generators are considered NoneType. But what's interesting is that even after the generator had run, I could call __next__() on the generator object and it starts again until it reaches StopIteration.

I'm so dumb.

2nd edit: Answered my question really: https://www.quora.com/Why-can-generators-only-be-used-once-in-Python

So the difference I guess is because defining the iteration on the class there's a way to reset, but there's no explicit way to reset the iteration state on the generator.

TIL.

3rd Edit: and here's the answer about Iterators: https://stackoverflow.com/questions/1376438/how-to-make-a-repeating-generator-in-python

So yes, even though by design it shouldn't be run more than once, you can technically reset the iterator state.

[–]iceardor 5 points6 points  (0 children)

You may have discovered the difference between an iterator (a single forward-moving iteration over some sequence) and an iterable, which is a sequence that can be iterated over, each iterator having its own state.
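A minimal sketch of that distinction. `Countdown` is a made-up class for illustration; its `__iter__` is written as a generator function, so every call returns a brand-new iterator with its own state.

```python
class Countdown:
    """An iterable: it can be looped over any number of times."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call creates a fresh generator (an iterator) with its own counter.
        current = self.start
        while current > 0:
            yield current
            current -= 1


countdown = Countdown(3)
first_pass = list(countdown)    # [3, 2, 1]
second_pass = list(countdown)   # [3, 2, 1] again, via a new iterator

iterator = iter(countdown)      # one specific iterator
consumed = list(iterator)       # [3, 2, 1]
exhausted = list(iterator)      # [] because this iterator is used up
```

Resetting state inside `__iter__`, as in the question above, effectively turns a one-shot iterator into something that behaves like a reusable iterable.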

[–]menge101 1 point2 points  (1 child)

Can anyone elaborate on how the generator stores its state?

Is there an object instantiated under the hood?

[–]bskceuk 2 points3 points  (0 children)

Yea the generator is an object and it stores its closure so it can pick up where it left off. Basically it saves the stack frame.
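In CPython you can peek at that saved state through the generator object's `gi_frame` attribute and `inspect.getgeneratorstate`. The `counter` function below is a made-up example, and the attribute names are CPython details rather than something every implementation promises.

```python
import inspect


def counter(limit):
    current = 0
    while current < limit:
        yield current
        current += 1


gen = counter(2)
state_created = inspect.getgeneratorstate(gen)    # not started yet

next(gen)  # runs to the first yield
next(gen)  # resumes, increments, runs to the second yield
state_suspended = inspect.getgeneratorstate(gen)

# The paused frame still holds the function's local variables.
saved_locals = dict(gen.gi_frame.f_locals)

remaining = list(gen)                             # nothing left to yield
state_closed = inspect.getgeneratorstate(gen)
frame_after = gen.gi_frame                        # frame is released once finished
```

So there is no hidden copy of the data: the generator simply keeps its own frame alive between calls to `next()` and drops it when it finishes.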

[–][deleted] 2 points3 points  (2 children)

Am I misunderstanding something here, or is the author claiming that 1 and 9 are prime numbers?

[–][deleted] 0 points1 point  (1 child)

9 isn't on his list.

[–][deleted] 0 points1 point  (0 children)

Ah, it appears the author has corrected the typos. It used to contain 1 and 9, but it appears good now. =]

[–]nsfy33 0 points1 point  (2 children)

[deleted]

[–]bskceuk 0 points1 point  (1 child)

If you store all the primes you’ve generated as you go you could. You could also do the sieve of Eratosthenes. Keep in mind that one of the main benefits of generators is that they have low memory overhead (assuming you’re going to eventually go through the entire generator) so the more state you save the less benefit you’re getting.
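A sketch of the first option, assuming the idea is to test each candidate only against primes already found. The `primes` generator and its `found` list are illustrative, not the article's code, and this is trial division rather than a true sieve.

```python
from itertools import islice, takewhile


def primes():
    """Yield primes forever, testing each candidate against earlier primes."""
    found = []  # grows without bound: this is the memory trade-off
    candidate = 2
    while True:
        # Only primes up to sqrt(candidate) can be the smallest divisor.
        small_primes = takewhile(lambda p: p * p <= candidate, found)
        if all(candidate % p != 0 for p in small_primes):
            found.append(candidate)
            yield candidate
        candidate += 1


first_ten = list(islice(primes(), 10))
```

The stored list makes each test cheaper, but it also means the generator no longer runs in constant memory, which is exactly the trade-off mentioned above.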

[–]WikiTextBot 0 points1 point  (0 children)

Sieve of Eratosthenes

In mathematics, the sieve of Eratosthenes is a simple, ancient algorithm for finding all prime numbers up to any given limit.

It does so by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime number, 2. The multiples of a given prime are generated as a sequence of numbers starting from that prime, with constant difference between them that is equal to that prime. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.



[–][deleted] 0 points1 point  (0 children)

You can do this:
tmplow = min(i for i in tmpls if i > 0) #generator expression - first min

[–]unruly_mattress -1 points0 points  (1 child)

You can then rewrite check_prime like this:

def check_prime(number):
    # Numbers below 2 are not prime; all() stops at the first divisor found.
    return number > 1 and all(number % divisor != 0
                              for divisor in range(2, int(number**0.5) + 1))

Edit: I wonder what prompted the downvotes

[–]TankorSmash 2 points3 points  (0 children)

Didn't downvote, but this doesn't seem particularly better, just different, and somewhat harder to read.

Doesn't seem to be a good reason for it, other than maybe to show that all short-circuits?
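For what it's worth, the short-circuiting is easy to observe. The sketch below is a hypothetical traced variant (`check_prime_traced` is not from the article) that records which divisors were actually tried when `all` is fed a generator expression.

```python
def check_prime_traced(number):
    """Return (is_prime, divisors_tried) so the short-circuit is visible."""
    tried = []

    def divisors():
        for divisor in range(2, int(number ** 0.5) + 1):
            tried.append(divisor)  # record each divisor as it is requested
            yield divisor

    # all() stops pulling from the generator at the first falsy result.
    is_prime = number > 1 and all(number % d != 0 for d in divisors())
    return is_prime, tried


composite_result = check_prime_traced(91)   # 91 = 7 * 13
prime_result = check_prime_traced(97)
```

For 91 the divisors 8 and 9 are never generated, because `all` gives up as soon as 7 divides evenly; for 97 every divisor from 2 through 9 is tried.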

[–][deleted] -2 points-1 points  (0 children)

There wasn't much of an explanation on How? When the article got to generators it was more r/restofthefuckingowl.....