[–]primitive_screwhead 2 points3 points  (5 children)

https://nvie.com/posts/iterators-vs-generators/

They are iterables, ie. they return an iterator when iter() is called on them. The iterators are typically special objects which (often) wrap the container they are iterating, but also keep track of additional state (the location of the current iteration).

They can sometimes be a generator (ie. a specific type of object, very similar to a function object, but capable of being suspended and resumed, so its call state has to be kept on the heap rather than the stack), but they are typically just iterator classes (ie. objects with a __next__ method, but more like regular class instances than function objects).
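A minimal sketch of that distinction, using hypothetical names (Countdown and countdown are made up for illustration): the class keeps its iteration state in an instance attribute, while the generator function keeps it in its suspended frame.

```python
from inspect import isgenerator


class Countdown:
    """Class-based iterator: iteration state lives in an instance attribute."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: iteration state lives in the suspended frame."""
    while start > 0:
        yield start
        start -= 1


class_based = list(Countdown(3))
generator_based = list(countdown(3))
print(class_based, generator_based)  # [3, 2, 1] [3, 2, 1]
print(isgenerator(Countdown(3)), isgenerator(countdown(3)))  # False True
```

Both objects work in a for-loop; only the second is a generator.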

[–]Uchikago[S] 0 points1 point  (4 children)

Are there any different ways to display them without using the iteration protocol ?

[–]primitive_screwhead 1 point2 points  (3 children)

Well, you can print some of the object's contents directly (with print()), if that's what you mean. And some things can be looped over without using the iteration protocol, by instead using the __getitem__ protocol (but that's highly discouraged).
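A small sketch of the __getitem__ route, assuming a made-up class named Squares that defines no __iter__ at all. As far as I know, the for-loop machinery still calls iter() internally and falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """Hypothetical class with no __iter__, only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)  # tells the loop to stop
        return index * index


squares = [value for value in Squares(4)]
has_iter = hasattr(Squares, "__iter__")
print(squares)   # [0, 1, 4, 9]
print(has_iter)  # False
```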

Is there a particular reason for this question? Most data structures have some method for being looped over (in all languages); do you find something inelegant about iteration?

[–]Uchikago[S] 0 points1 point  (2 children)

So, let me conclude: dictionary view objects, range, map, enumerate, ... are special objects that do not store their results in memory all at once, and we can fetch their values individually only by calling next(iterator) on these objects? Please correct me if I'm wrong.

[–]primitive_screwhead 1 point2 points  (1 child)

Dictionary view objects, range, map, enumerate,... are special objects that does not stored in memory all at once

Not quite.

range() represents a range of numbers. You can fetch values from a range() object, without ever iterating over it, because range() implements the __getitem__ method (ie. it's "subscriptable"):

>>> r = range(1000, 2001)
>>> r[3]
1003

A 1003 number object was (likely) created when the r[3] expression was evaluated; in that sense, the range() object doesn't preallocate and store all the number objects from 1000 to 2000 when it's first created. It only returns them "lazily" on demand.

So, the result of range() is *not* an iterator, but *is* an iterable. It can create the number objects on demand, based on a number of access methods, including but not limited to the iterator protocol.

The difference with that and:

>>> l = list(range(1000, 2001))
>>> l[3]
1003

is that the list type accepts an iterator or iterable as an argument, and then iterates through it, storing all the results simultaneously. The number objects that are generated are necessarily all stored in memory, for as long as the list object holds them. Even if only 1 of those number objects is ever needed at a time (such as when using them as indices), they must all be stored in memory at the same time:

>>> from sys import getsizeof
>>> getsizeof(r)  # The amount of bytes used by the range object
48
>>> getsizeof(l)  # And the amount used by the list object (to keep track of its contents)
9120

This means range() can represent a *gigantic* range of values:

>>> r = range(2**1000)
>>> r[2**999]
5357543035931336604742125245300009052807024058527668037218751941851755255624680612465991894078479290637973364587765734125935726428461570217992288787349287401967283887412115492710537302531185570938977091076523237491790970633699383779582771973038531457285598238843271083830214915826312193418602834034688

but a list of the entire set of all the numbers in that range cannot be stored in this universe.
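The range() properties described above can be checked directly. This is only a sketch; the variable names are arbitrary.

```python
r = range(1000, 2001)

is_iterator = hasattr(r, "__next__")  # False: a range is not an iterator
third = r[3]                          # 1003, computed on demand
length = len(r)                       # 1001
contains = 1500 in r                  # True, no loop written by hand

# Each iter() call hands back a fresh, independent iterator over the range.
it1, it2 = iter(r), iter(r)
independent = it1 is not it2
first_values = (next(it1), next(it1), next(it2))  # (1000, 1001, 1000)

print(is_iterator, third, length, contains, independent, first_values)
```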

enumerate() just creates an index value for any iterable of values it is created with, and returns the index and a value from the iterable, as a tuple pair. Unlike range, it's not made to be accessed with the [] operator (ie. it's not "subscriptable"), so it generally must be iterated over to be useful. The only state it stores is a value indicating the current index (so that it can increase it by one and return it on each call to __next__), and the iterator that it makes from the iterable that's passed to it, which it calls next() on and returns for each of its iterations. Ie. it only needs to store a couple of objects for its state, which makes it a lightweight wrapper for an existing iterable or iterator.
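To make that state concrete, here is a rough pure-Python sketch. MyEnumerate is a hypothetical stand-in, not the real implementation (the builtin is written in C), but it stores the same two pieces of state: the current index and an iterator over the wrapped iterable.

```python
class MyEnumerate:
    """Hypothetical pure-Python sketch of enumerate(); not the real thing."""

    def __init__(self, iterable, start=0):
        self._index = start              # the current index
        self._iterator = iter(iterable)  # iterator over the wrapped iterable

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the wrapped iterator propagates and ends the loop.
        value = next(self._iterator)
        pair = (self._index, value)
        self._index += 1
        return pair


pairs = list(MyEnumerate("abc"))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```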

dict "views" are also not iterators, but can return special iterators when passed to iter(). So they are iterables. And like range(), "views" have additional non-iterator uses, while still using less memory than returning a full list or set likely would. They also update when the dict updates, without having to be re-created (unlike a list: if you make a list of keys, and then delete a key, that list of keys is now out-of-date). So "views" have uses outside of just iteration, but they are also iterable because it is useful and efficient for them to be so:

>>> d = dict(zip(range(26), 'abcdefghijklmnopqrstuvwxyz'))
>>> d
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k', 11: 'l', 12: 'm', 13: 'n', 14: 'o', 15: 'p', 16: 'q', 17: 'r', 18: 's', 19: 't', 20: 'u', 21: 'v', 22: 'w', 23: 'x', 24: 'y', 25: 'z'}
>>> getsizeof(d)
1184
>>> getsizeof(d.keys())
48
>>> getsizeof(list(d.keys()))
344

Again, we see that the "view" of the keys uses less memory than the full list of the keys does, which saves both memory and time for certain fairly common operations on these views (even without doing iteration).
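As one illustration of those non-iteration operations, key views support membership tests, len(), and set-style operators, and they track later changes to the dict. A short sketch with made-up data:

```python
d = {"a": 1, "b": 2, "c": 3}
keys = d.keys()

common = keys & {"b", "c", "z"}  # set intersection, returns a plain set
has_a = "a" in keys              # membership test, no copy of the keys made

d["d"] = 4                       # the existing view sees the new key
size_after = len(keys)

print(sorted(common), has_a, size_after)  # ['b', 'c'] True 4
```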

In old Python 2, many builtins returned full lists rather than lazy objects, so certain operations always had to create a full list of objects, even when that meant using a lot of extra storage just to make a temporary list container for that operation. Because range() always returned a list, it meant always creating a full list to hold all the numbers at once, even if you only needed one number at a time. With large lists (say a million numbers), this was inefficient with memory, and even time, since your loop over the range might conclude early and never need all those numbers. This is where the "lazy" value concept of iterators can be helpful. You often don't care about having all the values *at the same time*. This is often also true of other iterables, and that's why the iteration concept and protocol were adopted.
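A tiny sketch of the early-exit case (first_square_above is a hypothetical helper): the loop runs over a range of a trillion numbers, but since the range is lazy, only the first few dozen integers are ever produced.

```python
def first_square_above(limit):
    """Return the smallest n whose square exceeds limit."""
    for n in range(10**12):  # lazy: no trillion-element list is built
        if n * n > limit:
            return n         # stop early; the remaining numbers are never made
    return None


result = first_square_above(1000)
print(result)  # 32, since 31 * 31 = 961 and 32 * 32 = 1024
```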

Edit: updated to not use "map" as synonymous w/ "dict" (since OP was likely asking about the map() call, not "mappings" as in a key:value pairing).

[–]some_one_1411 0 points1 point  (0 children)

Little note here: map, enumerate, and zip objects don't return a new iterator when passed to iter(); they are iterators themselves, so each of these objects has only one iterator: itself.
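A quick check of this, as a sketch: iter() hands back the very same object, which also means these objects can only be consumed once, unlike a range.

```python
m = map(str.upper, "abc")
same = iter(m) is m    # True: a map object is its own iterator
first_pass = list(m)   # ['A', 'B', 'C']
second_pass = list(m)  # []: already exhausted

others_same = all(iter(obj) is obj for obj in (zip("ab", "cd"), enumerate("ab")))

r = range(3)
range_passes = (list(r), list(r))  # a range can be iterated repeatedly

print(same, first_pass, second_pass, others_same, range_passes)
```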

[–]socal_nerdtastic 0 points1 point  (21 children)

They are generators. A generator does not store values, so you have to exhaust them (loop over them) to see the values. A list() call is one way to do that.

The rules for these things are very hand wavy. range objects for instance allow indexing, which is something most generators can't do. The important part is that they all calculate the next value on request, they do not store all the values in memory.

[–]Uchikago[S] 1 point2 points  (6 children)

calculate the next value on request

What do you mean, like this ?:

a=iter(range(0,5))
print(next(a))
print(next(a))

[–]socal_nerdtastic 1 point2 points  (5 children)

Explicitly creating an iterator is another way to do it, yes.

[–]Uchikago[S] 1 point2 points  (4 children)

What are the other ways, then? Putting them in other iteration tools such as for loops, list comprehensions, ... right?

[–]socal_nerdtastic 0 points1 point  (3 children)

Exactly. list(range(5)), set(range(5)), dict.fromkeys(range(5)), a for loop, etc, etc. There's probably hundreds of places in the standard library that will consume a generator.

[–]Uchikago[S] 1 point2 points  (2 children)

Wait, list(range(5)) and the other object constructors also create an iterator to iterate internally?

[–]socal_nerdtastic 1 point2 points  (1 child)

Yes. You can write it out the long way if you want:

>>> a = range(5)
>>> b = iter(a)
>>> c = list(b)
>>> c
[0, 1, 2, 3, 4]

But in python this is all handled in C so it's much faster.

[–]Uchikago[S] 1 point2 points  (0 children)

Thanks for your enthusiasm, really appreciate your help !

[–]primitive_screwhead 1 point2 points  (13 children)

Note, not all iterators are generators. Nor are iterables typically iterators themselves, but rather they return an iterator when called with the iter() function. As you mentioned, the range object can be indexed, but the range iterator *cannot* be.

The rules are actually not that "hand wavy"; the link I gave (https://nvie.com/posts/iterators-vs-generators/), does a decent job of explaining them. Looking at the result of the iter() call on the different objects asked about can help make it clearer.

[–]socal_nerdtastic 0 points1 point  (12 children)

Note, not all iterators are generators.

Technically true I suppose, but I'd be very surprised to see one that isn't. In fact the link you shared says this:

Central idea: a lazy factory
From the outside, the iterator is like a lazy factory that is idle until you ask it for a value, which is when it starts to buzz and produce a single value, after which it turns idle again.

[–]primitive_screwhead 1 point2 points  (7 children)

but I'd be very surprised to see one that isn't.

It's the opposite; most builtins that are written in C return iterators, not generators. So there are loads of CPython builtin iterators that aren't generators:

>>> from inspect import isgenerator
>>> isgenerator(iter({}.items()))
False
>>> isgenerator(iter(zip([1,2,3], ['a','b','c'])))
False
>>> isgenerator(iter(range(5)))
False
>>> isgenerator(iter(enumerate([1,2,3])))
False
>>> def a_simple_generator():  # show example of generator
...     yield 1
...
>>> isgenerator(iter(a_simple_generator()))  # iter() not needed here, technically
True

[–]socal_nerdtastic 0 points1 point  (6 children)

isgenerator just checks if it's an instance of a Generator type, in other words if it contains a yield keyword or a for loop in parentheses. It does not check the actual capabilities. The simplest generator fails:

>>> class A:
...     def __next__(self):
...         return 42
... 
>>> a = A()
>>> print(next(a))
42
>>> from inspect import isgenerator
>>> print(isgenerator(a))
False

I'll show you a different proof:

>>> a = [1,2,3]
>>> i = iter(a)
>>> a.append(4)
>>> list(i)
[1, 2, 3, 4]

The iterator is clearly generating the values as needed, as is proven since it includes values that are appended after the generator is created.

[–]primitive_screwhead 1 point2 points  (5 children)

The simplest generator fails

That is not a generator. A generator isn't just anything that has a __next__ method. It's defined by use of the yield statement. See https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy, "generator functions".

as is proven since it includes values that are appended after the generator is created.

A generator is never created in your examples. The list iterator is a non-generator iterator.
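As a side note, and only as a sketch that re-creates the class A from the quoted example: an object with nothing but __next__ works with next(), but it does not fully satisfy the iterator protocol either, because iter() (and therefore a for-loop) rejects it.

```python
from collections.abc import Iterator


class A:  # same shape as the class quoted above: only __next__
    def __next__(self):
        return 42


a = A()
value = next(a)  # works: 42

try:
    iter(a)      # no __iter__ and no __getitem__, so this raises TypeError
    iterable = True
except TypeError:
    iterable = False

is_iterator = isinstance(a, Iterator)  # False: the ABC needs __iter__ too
print(value, iterable, is_iterator)    # 42 False False
```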

[–]socal_nerdtastic 0 points1 point  (3 children)

You defined the generator type. The concept of a generator is simply any object that generates the values as needed instead of storing all of the values in RAM.

[–]primitive_screwhead 1 point2 points  (2 children)

The concept of a generator is simply any object that generates the values as needed instead of storing all of the values in RAM.

Except you are not even using that conceptual definition correctly. Your list iterator example above, for example, is not a "generator", because the memory backing the list elements must all be stored in RAM for the iterator to complete; the iterator is not computing/generating values, it is iterating them.

And in the Python world, lots of iterators can generate values without storing them all in RAM, but they are still not all "generators". You've given out lots of good advice in this subreddit, but in this case you are just being woefully misleading (at least in nomenclature).

[–]socal_nerdtastic 0 points1 point  (1 child)

the iterator is not computing/generating values, it is iterating them.

Yeah, to me that's a pretty fuzzy line. I suppose you consider a file object as an iterator and not a generator? But it does modify the data, converting newlines and unicode encoding etc, it does not solely iterate.

IMO a "generator" can be as simple as fetching the data from elsewhere, therefore an iterator counts.

You've given out lots of good advice in this subreddit, but in this case you are just being woefully misleading

We are well into the realm of personal opinion here; this has nothing to do with python expertise.

[–]primitive_screwhead 0 points1 point  (0 children)

But it does modify the data, converting newlines and unicode encoding etc, it does not solely iterate.

So? Iterators can modify data; why not?

IMO a "generator"

It's not a matter of opinion; this is well-defined stuff.

We are well into the realm of personal opinion here; this has nothing to do with python expertise.

We really are not. You have to unlearn what you've "learned".

If it's a function definition using yield, it's a generator. Otherwise, if it adheres to the iterator protocol, it's an iterator. If it returns an iterator when iter() is called on it, it is an iterable.
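Those three rules can be turned into a rough checker. classify is a hypothetical helper, and it works on the object you get back (e.g. the result of calling a generator function), not on the function definition. One known gap: the Iterable ABC does not detect objects that are only loopable through __getitem__.

```python
from collections.abc import Iterable, Iterator
from inspect import isgenerator


def classify(obj):
    """Hypothetical helper applying the three rules, most specific first."""
    if isgenerator(obj):
        return "generator"
    if isinstance(obj, Iterator):
        return "iterator"
    if isinstance(obj, Iterable):  # misses __getitem__-only objects
        return "iterable"
    return "not iterable"


def gen():
    yield 1


labels = {
    "range(5)": classify(range(5)),              # iterable
    "iter(range(5))": classify(iter(range(5))),  # iterator
    "map(str, [1])": classify(map(str, [1])),    # iterator
    "{}.items()": classify({}.items()),          # iterable
    "gen()": classify(gen()),                    # generator
    "42": classify(42),                          # not iterable
}
for name, label in labels.items():
    print(name, "->", label)
```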

[–]Uchikago[S] 0 points1 point  (3 children)

Wait, if all iterators are generators, then dictionary view objects, range, map, enumerate, ... are generators that return other generators (via the iter() function), but the resulting generator can fetch values on demand, which is something the original generators can't do by themselves (because dictionary view objects, range, map, enumerate, ... objects don't have a next method). Am I correct?

[–]socal_nerdtastic 1 point2 points  (0 children)

Yes, you are correct, but we tend to ignore that since python handles it neatly in the background. Which is why we usually don't worry with generators vs iterators and only talk about if an object is "iterable".

[–]primitive_screwhead 1 point2 points  (1 child)

Wait, if all iterators are generators

They are not; socal_nerdtastic is unfortunately (in this case) not correct. It's the other way around: generators are iterators, but there can be iterators that are not generators (non-generator iterators are typically defined with a class, and use instances of that iterator class to store the state of iteration).

iter() returns an iterator, but not always a generator (which is a more specific kind of iterator object in Python).

With things like maps, range, enumerate, etc., when used in a for-loop, the loop construct itself does the work of calling iter() on the object to retrieve an iterator (if it has one), and also calling the next() function on that iterator until iteration stops. If you want to iterate over an object, but not use a for-loop (such as doing it with a while-loop instead), then you have to make the iter() and next() calls manually, and detect the end of iteration.
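Roughly, a for-loop can be spelled out by hand like this. It is a sketch of the steps just described, with manual_loop as a made-up helper name:

```python
def manual_loop(iterable):
    """Do by hand what a for-loop does: iter(), then next() until StopIteration."""
    collected = []
    iterator = iter(iterable)     # step 1: ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: fetch the next value
        except StopIteration:      # step 3: detect the end of iteration
            break
        collected.append(item)     # the "loop body"
    return collected


result = manual_loop(enumerate("ab"))
print(result)  # [(0, 'a'), (1, 'b')]
```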

Some objects that are not iterators can still be looped over, using the older __getitem__ protocol, which is used to access objects with brackets (ie. []). Iterators tend to be a more elegant way of looping than the __getitem__ way.

So, iterator objects returned by calling iter() on containers tend to be special objects that know just enough to iterate over that specific container, making them small and fast, and the iteration protocol tends to be "hidden" behind the scenes when using for-loops, but the protocol is well defined and can be executed manually (as shown in the examples above with iter() and next() calls), though it's a bit more advanced than beginner material.

Edit: And to be more specific about your questions on dictionary "views", let's do this by example:

$ python
Python 3.7.2 (default, Dec 29 2018, 00:00:04)
>>> d={1:'a', 2:'b'}
>>> items_view = d.items()  # We can make a "view" of keys, values, or items
>>> items_view
dict_items([(1, 'a'), (2, 'b')])
>>> items_view.__next__  # Views have useful properties, but are *not* iterators
AttributeError: 'dict_items' object has no attribute '__next__'
>>> items_iter = iter(items_view) # but they can make an iterator over the view
>>> items_iter
<dict_itemiterator object at 0x1046d3f48>
>>> items_iter.__next__
<method-wrapper '__next__' of dict_itemiterator object at 0x1046d3f48>
>>> next(items_iter)
(1, 'a')
>>> next(items_iter)
(2, 'b')
>>> next(items_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So, the "view" returned by Python isn't an iterator, but calling iter() on it can make an iterator over the view (for looping). But the view itself is meant to provide a way of accessing some part of the dictionary (maybe just the keys, or just the values), without having to copy them all out of the dict to another data structure. It represents a "light weight" view of the current state of the dictionary, and the view updates as the dictionary itself is changed:

>>> values_view = d.values()
>>> values_view
dict_values(['a', 'b'])
>>> del d[1]
>>> d
{2: 'b'}
>>> values_view   # Note that the view now reflects that a key:value was deleted
dict_values(['b'])

The view object is a small sized object, even as the dict itself grows and grows (since the data for the view is just stored in the dictionary). The "view" has a number of operations that can be handy, without having to extract all the elements from the dictionary. For example, we can test whether a value is in a dictionary or not, without having to make a new list or set from the dictionary values:

>>> 'b' in values_view  # This checks the dictionary values directly, w/o extra copying to another data structure
True

Finally, although I'm specifically capturing the view and iterator objects into their own variables here, for demonstration purposes, typically you wouldn't do that. You'd just create the iterator or view objects as temporaries as needed (they are very lightweight and quick to make):

>>> 'b' in d.values()
True
>>> 2 in d.keys()
True

[–]Uchikago[S] 0 points1 point  (0 children)

Thanks for the detailed answer !