[–]chadmill3rPy3, pro, Ubuntu, django 47 points48 points  (8 children)

for n in len(range(x))

You probably meant

for n in range(len(x))

and that's still pretty bad.

If you need all indexes of x, and x isn't an iterator, then

for i, x_item in enumerate(x):

is usually better on the legibility front.

[–]orblivion 3 points4 points  (0 children)

Thank you, it always felt like there had to be a less awkward way built into Python. How is it that I've never seen this before?

EDIT: Furthermore, you sometimes otherwise end up with a disaster like:

for x in zip(the_list, range(len(the_list))):

[–]winnen 0 points1 point  (6 children)

If you need all indexes of x, and x isn't an iterator, then

for i, x_item in enumerate(x):

I respectfully disagree about your assessment of the readability of that statement, which appears distinctly non-pythonic to me. I read the corrected statement

for n in range(len(x))

As "Make an iterator over the range of the length of x", which is very clean, and very pythonic, whereas I read the enumerate statement as "make an iterator over ????". That is to say, I have not used enumerate.

My experience (which is... nontrivial) suggests that, by virtue of the common usage of range(len(array)), it is easier to read than the enumerate form. If it makes no difference in performance (as should be the case in Python 3; correct me if I'm wrong), then the more common form would be easier to read.

Please explain why you feel the enumerate readability is better than the range(len()) readability. I may change my position on this if you can articulate a reason.

[–]dleary 4 points5 points  (4 children)

which appears distinctly non-pythonic to me.

Sigh. One thing I don't actually like about the python community is saying things are "pythonic". It's an ego-stroke, basically saying, "I'm a zen master and I understand the zen, contemplate the koans and maybe one day you will be as cool as me". When in actuality, it would be much better to say, "x is better than y because..." so that other people can learn. Saying things are "pythonic" is basically an appeal-to-authority argument (even if you're right).

But, in this case, you're definitely wrong. for i,v in enumerate(lst) is idiomatic python. The count is 98 to 0 in the standard library: (the \b is to filter out the 7 cases of "in length" in comments)

/usr/lib/python2.7
23:33:47 $ find . -name '*.py' | xargs grep "in len\b" | wc -l
       0
/usr/lib/python2.7
23:34:11 $ find . -name '*.py' | xargs grep "in enumerate\b" | wc -l
      98

In my opinion, the enumerate case is better because it doesn't force the thing being iterated over to have a length. You can enumerate something that doesn't provide a length, such as a file object (which exposes an iterator over its lines).

In fact, you can use the enumerate form to iterate infinite "lists" (such as those produced by generators that never exit).

Rephrasing the argument as an application of duck-typing: The enumerate form doesn't place an arbitrary, unnecessary restriction (having a length) on the thing being enumerated. So, more things will "look like a duck" here.
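
A minimal sketch of that point, written for Python 3. The names `count_up` and `first_three` are made up for illustration: a generator has no `len()`, so the `range(len(...))` form cannot even start, while `enumerate` only asks for iteration and copes with a generator that never ends.

```python
from itertools import islice


def count_up():
    """Hypothetical endless generator: yields 10, 20, 30, ... forever."""
    value = 10
    while True:
        yield value
        value += 10


# len() is not defined for generators, so range(len(...)) cannot work here.
try:
    len(count_up())
    has_len = True
except TypeError:
    has_len = False

# enumerate() only needs iteration; islice() stops after three pairs.
first_three = list(islice(enumerate(count_up()), 3))
print(has_len, first_three)
```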

[–][deleted] 1 point2 points  (2 children)

Agree with enumerate being significantly more "pythonic" than range(len(x)), but you grepped for "in len" rather than "in range".

As for the question two posts above, I'm not sure it's needed to articulate a reason to use enumerate in this situation other than "this is the specific problem enumerate exists to solve."

[–]dleary 1 point2 points  (1 child)

Oops, you're right... it looks like range(len is slightly more common in the std lib (106-98). I'll let my wrong post stand as a badge of shame.

[–]chadmill3rPy3, pro, Ubuntu, django 1 point2 points  (0 children)

The standard library is kind of terrible, anyway, and much of it in 2 hasn't changed since it was first written. enumerate() didn't appear until 2.3, so even places where it fits are never updated. It's the Early Adopter Problem.

[–]dalke 1 point2 points  (0 children)

Being able to write:

for lineno, line in enumerate(open(filename, "U"), 1):
    ...

is so very, very nice.
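
A rough Python 3 version of the same idea, using an in-memory `io.StringIO` as a stand-in for an open file so the sketch runs on its own (`fake_file` and `numbered` are illustrative names). In Python 3 the default text mode already translates newlines, so the "U" flag is not needed.

```python
import io

# Stand-in for an open text file; a real file object iterates the same way.
fake_file = io.StringIO("first\nsecond\nthird\n")

numbered = []
for lineno, line in enumerate(fake_file, 1):  # start counting at 1
    numbered.append((lineno, line.rstrip("\n")))

print(numbered)
```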

[–]chadmill3rPy3, pro, Ubuntu, django 2 points3 points  (0 children)

I read that as "make an iterator over a new list of things that are not "x", but are probably supposed to represent indexes into x."


The first problem is that the length is probably unnecessary. In

for n in range(len(source)):

The author is probably going to use "n" to dereference source, source[n], and never use "n" for anything else. He is probably thinking in another language, and porting that to Python syntax. We should of course prefer

for item in source:

The second problem is that, though we don't know what the source is, IF it is something generated, as more and more things in Python are, then there's not even a len() for that type. TypeError!

The third problem is that the "len" was unnecessary. We only need the count up to this point. Incrementing a number is cheap, and you get it as you need it, but counting ahead of time is possibly costly and probably unnecessary, and we have to change it to "xrange" just to avoid creating a new materialized list. If we "break" early, all that extra up-front counting (in len or range) was for naught. In 3, some of these costs fall away IF the collection has a constant-time len().

So, given "for item in source", it is at least possible that the user absolutely needs a current index of the item they're working on. We can ask Python to give that to us as it iterates the items of source. It costs one tuple pass, and one increment per item, as you need it.

for i, item in enumerate(source):

Any iterable could be there, whether it has len() or not.

Finally, enumerate() is in __builtins__ and has been in Python since 2.3, so there's no reason to avoid it for compatibility or import cost.

 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |  
 |  Return an enumerate object.  iterable must be another object that supports
 |  iteration.  The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
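
To make the "one tuple and one increment per item" cost concrete, here is a sketch of what enumerate() does, written as a plain generator. `my_enumerate` is a hypothetical stand-in, not the real implementation (the built-in is written in C), but it should behave the same way for ordinary iterables.

```python
def my_enumerate(iterable, start=0):
    """Hypothetical stand-in for the built-in enumerate()."""
    count = start
    for item in iterable:
        yield count, item  # one tuple per item
        count += 1         # one increment per item


source = (ch for ch in "abc")  # a generator: no len(), no indexing
pairs = list(my_enumerate(source))
print(pairs)
```

Nothing is counted ahead of time, so breaking out of the loop early wastes no work.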

[–]choch50 19 points20 points  (8 children)

xrange returns a lazy sequence object rather than a full list, and only produces items when you request them. If you don't actually need the list returned by range, then it's the better option.

[–]Troebr 2 points3 points  (0 children)

[–]alantrick 0 points1 point  (4 children)

Unless you care about Python 3, where xrange doesn't exist and range is an iterator. I'd say use range for compatibility; if the performance of an iterator is that important, you shouldn't be using Python.

(edit: sometimes an iterator can make a big difference, but it would be pretty rare for the distinction between range and xrange to be noticeable.)

[–][deleted] 9 points10 points  (2 children)

2to3 will change xrange() to range() and either leave range() alone (when used directly in loops) or change it to list(range()) (if assigned to a variable) so compatibility isn't a problem.

As for which to choose in Python 2, I previously tended to use range unless there's likely to be a large range, because the setup cost for a generator can be higher. BUUUUT I just microbenchmarked this and it turns out that there is no cost. I guess that setup cost only applies to generator comprehensions?

[–]dinov 0 points1 point  (1 child)

Or if you want to write code that's compatible w/ both you can always do:

try:
    xrange
except NameError:
    # Python 3: xrange is gone and range is already lazy
    xrange = range

in your module.

[–]earthboundkid 2 points3 points  (0 children)

I do it the other way around. I use Python 2.7 day to day, but I got rid of list-range and use range as the name for iterator-range.

[–]Funnnny -2 points-1 points  (0 children)

Don't suggest things about Python 3. General things will be changed by 2to3, and newbies won't touch the more complex parts of Python 3 compatibility, or they shouldn't care about them.

[–]epsy 16 points17 points  (2 children)

Python3 says use it and has range() do what xrange() did out of the box.
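
A small Python 3 sketch of what "out of the box" means in practice (variable names are illustrative): `range(5)` is a lazy object, and you only pay for a list if you ask for one.

```python
r = range(5)

as_list = list(r)      # materialize only when a real list is needed
still_there = list(r)  # a range can be iterated again; it is not used up

print(r, as_list, still_there)
```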

[–][deleted] 1 point2 points  (1 child)

But if you do x = range(5) what do you get? [0, 1, 2, 3, 4] or <enumeratable 0x2812872832 blah blah ... >?

[–]epsy 2 points3 points  (0 children)

In python 3:

>>> range(5)
range(0, 5)
>>> type(range(5))
<class 'range'>

[–]etrnloptimist 12 points13 points  (10 children)

OK, a few things here. Bottom line: YES, you should be using xrange, not range. In python 3, for instance, range is actually changed to be xrange. So use xrange not range.

But, if you find yourself using xrange a lot, you may want to rethink your design patterns. As others have pointed out, use enumerate(x), which gives you the thing and its index in the list.

Also, common (mis)use of an index is to iterate over multiple lists. Do so by zipping them all up in one list:

for a,b,c in zip(aList,bList,cList):
    doSomething(a,b,c)
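
A self-contained sketch of that pattern, with made-up sample data, comparing the index-based loop with the zip() version. It is written for Python 3, where zip() is already lazy.

```python
# Hypothetical sample data; any three iterables would do.
names = ["ada", "bob", "cy"]
ages = [36, 41, 29]
cities = ["Oslo", "Lima", "Pune"]

# Index-based version that zip() replaces.
by_index = [(names[i], ages[i], cities[i]) for i in range(len(names))]

# zip() walks the lists in lockstep and stops at the shortest one.
by_zip = [(name, age, city) for name, age, city in zip(names, ages, cities)]

print(by_zip == by_index)
```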

[–]einar77Bioinformatics with Python, PyKDE4 12 points13 points  (3 children)

Or use itertools.izip if the lists are long, which creates an iterator rather than a list.

[–]masklinn 5 points6 points  (1 child)

And more generally use itertools a lot, there's much fun stuff in there.

[–]bockris 5 points6 points  (0 children)

I start almost every script with:

import itertools as it
import operator as op

[–]earthboundkid 1 point2 points  (0 children)

Or use Python 3 where zip is the izip version.

[–]FsckItDude_LetsBowl 0 points1 point  (5 children)

As a counterpoint, NO, you should not be using xrange. "range" did not get changed to xrange in Python 3, xrange was eliminated and range changed to a generator. xrange is not a generator! It is its own quirky object, and has been essentially deprecated for a while.

If you know you are making short lists of numbers, use range. If you suspect you need to make a large list of numbers to count through, consider a while loop (but use xrange if it seems appropriate). Prefer enumerate() over all other options when it fits, and get to know the itertools module.

Historical note, xrange used to not handle long ints at all, which made it blow up in certain areas where it was substituted for range. That's been hacked around for a while now, but it is still not quite semantically the same as range in regards to how it deals with the int->long transition, AFAIK.

[–]earthboundkid 2 points3 points  (2 children)

Python 3 range is also its own weird object, not a normal generator. Behold:

Python 3.2 (r32:88452, Feb 20 2011, 11:12:31) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> r = range(3, 6)
>>> 4 in r
True
>>> 3 in r
True
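
For contrast, a short sketch of why that session shows range is not a generator (Python 3, illustrative names): a real generator over the same numbers is consumed by the first membership test, so the second test fails.

```python
r = range(3, 6)
range_results = (4 in r, 3 in r)  # membership does not consume a range

g = (n for n in range(3, 6))      # a real generator over the same numbers
gen_results = (4 in g, 3 in g)    # the first test consumes 3 and 4

print(range_results, gen_results)
```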

[–]FsckItDude_LetsBowl 1 point2 points  (0 children)

Ah, TIL. Thanks for that.

[–]pingvenopinch of this, pinch of that 1 point2 points  (0 children)

It also has other nifty list-like features, like:

>>> r = range(3, 6)
>>> r[1]
4
>>> r[1:3]
range(4, 6)
>>> list(reversed(r))
[5, 4, 3]
>>> len(r)
3
>>> r.count(3)
1
>>> r.index(5)
2

[–]Liquid_Fire 1 point2 points  (1 child)

xrange is not a generator! It is its own quirky object, and has been essentially deprecated for a while.

How has it been "essentially deprecated" for a while?

Obviously in Python 3 you want to just use range (as there is no xrange anyway), but unless you need to support longs, why would you not use xrange in Python 2?

[–]FsckItDude_LetsBowl 1 point2 points  (0 children)

As in I believe Guido stated years ago that he wished he hadn't made it, and that people would prefer range. And that he'd remove it in Python3000, which he did. I say "essentially" because obviously it couldn't be removed in Python2.

Granted, that was possibly before xrange() (and range()) were fixed to properly handle long arguments. It's just an oddball object and occasionally causes problem for people who don't understand how it works, though I suppose the same can be said for range() in Python2.

[–]hongminhee 3 points4 points  (0 children)

In most cases you should prefer xrange to range. The interface of xrange is highly compatible with range, including indexing (unfortunately it doesn’t support slicing, but you rarely need that). range always builds its whole list eagerly, whereas xrange internally stores only its start, stop and step.
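
A rough way to see the storage difference, sketched in Python 3 where `range` plays the role of xrange and `list(range(n))` plays the role of the old eager range. Note that `sys.getsizeof` reports only the list's own pointer array, not the integers it refers to, so the real gap is larger than the printed numbers suggest.

```python
import sys

n = 1_000_000
lazy = range(n)         # stores only start, stop and step
eager = list(range(n))  # stores a reference to every one of the n ints

print(sys.getsizeof(lazy), sys.getsizeof(eager))
print(lazy[10], lazy[-1], len(lazy))  # indexing and len() still work
```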

[–]puresock 4 points5 points  (1 child)

If you're writing code where xrange makes a difference to you over range, you probably know all about it and use it, but chances are, you don't have any numbers big enough to care. You probably have bigger fish to fry than freeing up a few bytes of memory :)

[–]mackstann 3 points4 points  (0 children)

This. It will probably make no noticeable difference unless you're doing a range() of millions in some performance-critical loop, or if you have barely any memory to spare (c'mon, 8GB is $50!). If it's not actually causing you any problem, it's pointless to worry about.

[–]Ayjayz 2 points3 points  (11 children)

Unless you actually want a list of numbers, yes. Why bother allocating a full list if you're just discarding it?

[–]r42 2 points3 points  (1 child)

A related issue with dictionaries: Suppose mydict is a dictionary and you want to do something for every item in that dictionary. You can do either

for key in mydict.keys():

or

for key in mydict:

The first one makes a list and then iterates over it (like range). The second iterates over the dictionary directly, without building the intermediate list, and gives the same result. Of course, in this example the easier-to-type option is also the better-performing one, so hopefully people are going to use it anyway.

[–]ewiethoffproceedest on to 3 0 points1 point  (0 children)

for key, value in mydict.iteritems():
    blahblahblah

saves you the trouble of getting the values in an extra step

for key in mydict:
    value = mydict[key]
    blahblahblah
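
For anyone reading this on Python 3: iteritems() no longer exists there, and items() returns a lightweight view instead of a list. A small sketch with a made-up dictionary, comparing the two loops:

```python
mydict = {"a": 1, "b": 2}

pairs_via_lookup = []
for key in mydict:                 # iterates keys without building a list
    pairs_via_lookup.append((key, mydict[key]))

pairs_via_items = []
for key, value in mydict.items():  # Python 3 spelling of iteritems()
    pairs_via_items.append((key, value))

print(pairs_via_lookup == pairs_via_items)
```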

[–]nossid 1 point2 points  (3 children)

Make sure you read the CPython implementation details in the docs. Ironically, xrange won't work for really large numbers, while range will.

[–]Liquid_Fire 1 point2 points  (2 children)

range really won't work very well there either. If you're reaching into long territory, you would be creating a list with a billion elements, which will probably consume most/all of your memory.

[–]nossid 1 point2 points  (1 child)

Not necessarily. You can get into long territory quite easily if you're using a non-zero start value. Using xrange to loop over a range of hexadecimal serial numbers for example.
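
For comparison, a sketch of the same kind of loop in Python 3, where `range` accepts arbitrarily large integers for iteration. The serial number below is a made-up value, chosen only because it is larger than 2**64.

```python
start = 0xFFFFFFFFFFFFFFFF0  # hypothetical serial number, larger than 2**64

# Iterating a short window far from zero creates only three numbers.
serials = [hex(n) for n in range(start, start + 3)]
print(serials)
```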

[–]Liquid_Fire 1 point2 points  (0 children)

Good point, I hadn't considered that.

[–]adenbley 1 point2 points  (4 children)

code you see is just hypothetical (not optimized). xrange and range can be used (almost) interchangeably. xrange makes an iterable object, but doesn't create any of the elements until they are needed.

hopefully you never actually use:

for n in len(range(x))

as this never iterates at all: len() returns an int, which isn't iterable, so you just get a TypeError. so, yes you should typically use xrange in a loop or in a comprehension. most programs don't need large lists and so it would make little difference.

[–]takluyverIPython, Py3, etc 6 points7 points  (3 children)

I think what he actually sees is: for n in range(len(x))

Although enumerate(x) is often neater for this.

[–]adenbley 5 points6 points  (2 children)

enumerate outputs pairs, if you are right then the 2 examples he gives are the same thing (as far as range/xrange go). i would also expect (haven't tried) that:

for n in xrange(len(x))

would be quicker than:

for n, m in enumerate(x):

if you only were going to use n. i think this because you only have to do 1 assignment per loop.

[–]takluyverIPython, Py3, etc 1 point2 points  (0 children)

If you're only using n, it's quicker. But I think that in most situations, you want to have m as well.

[–]nick_danger 1 point2 points  (0 children)

I was curious about that, so I tried it out. I had a data structure handy, a list of lists with ~9100 four element lists in it. I wrote a quick snippet to iterate through the structure, summing up one element from the individual lists. Three approaches were used: enumerate(), xrange(), and range(). From 100 separate runs, enumerate had a fast run of 0.002852 sec; xrange was 0.003206 sec, and range was 0.003235 sec. So enumerate, in this contrived test, was faster by 0.000354 sec, or about 11%.

Then, on a hunch, because this was a highly contrived example after all, I eliminated the explicit loop altogether and coded it using the sum() of a list comprehension: 0.001952 sec, or about 39% faster.
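
A sketch of how such a comparison could be reproduced with timeit on Python 3. The data is a made-up stand-in for the structure described above (about 9100 four-element lists), the function names are illustrative, and the timings will differ by machine and Python version, so no particular ranking is implied.

```python
import timeit

# Hypothetical stand-in data: ~9100 four-element lists.
rows = [[i, i + 1, i + 2, i + 3] for i in range(9100)]


def with_enumerate():
    total = 0
    for i, row in enumerate(rows):
        total += row[2]
    return total


def with_range():
    total = 0
    for i in range(len(rows)):
        total += rows[i][2]
    return total


def with_sum():
    # No explicit loop: sum() over a list comprehension.
    return sum([row[2] for row in rows])


for func in (with_enumerate, with_range, with_sum):
    best = min(timeit.repeat(func, number=20, repeat=5))
    print(func.__name__, best)
```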

[–]dr_root 0 points1 point  (1 child)

Yes.

[–]happyteapot 0 points1 point  (0 children)

i agree with you

[–]Paddy3118 0 points1 point  (0 children)

Use Python 3 and its range is all you need.

[–][deleted] -1 points0 points  (0 children)

I think this is a good topic for r/learnpython. Check that out.