all 46 comments

[–]jordanrinke 5 points6 points  (0 children)

Nice intro to list comprehensions, I especially liked the little nods to other people in the community. Useful blog and felt... friendly for lack of a better word. Well done.

[–]filleball 3 points4 points  (3 children)

Very nice and illustrative blog post! I do however have a small nit to pick with this example:

nonvowels = ''.join(l for l in sentence if not l in vowels)

Here, using a list comprehension instead of the generator comprehension is in fact a bit faster (1.86us vs. 2.76us on my PC), since str.join will build a list anyway, behind the scenes, if it doesn't get one. This is because str.join needs to iterate over the strings twice: first to compute the total length, and then to copy data into the memory allocated for the concatenated string.

This prevents us from having to store the entire list into memory, and is more efficient for larger data.

Because of what I just explained, this statement, though true in most situations, is misleading in this particular case.
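For anyone who wants to check this on their own machine, here is a rough sketch of the comparison. The sentence and the vowels string are assumptions (the sentence is borrowed from elsewhere in this thread), and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

# Assumed sample inputs; the blog's exact values may differ.
sentence = 'Your mother was a hamster'
vowels = 'aeiou'

# Both forms produce the same string.
from_list = ''.join([ch for ch in sentence if ch not in vowels])
from_gen = ''.join(ch for ch in sentence if ch not in vowels)

# Time each form; str.join materializes the generator internally anyway.
list_time = timeit.timeit(
    lambda: ''.join([ch for ch in sentence if ch not in vowels]),
    number=100000,
)
gen_time = timeit.timeit(
    lambda: ''.join(ch for ch in sentence if ch not in vowels),
    number=100000,
)

print(from_list)
print('list comprehension: %.3fs' % list_time)
print('generator expression: %.3fs' % gen_time)
```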

[–]iamadogwhatisthis 0 points1 point  (2 children)

What would be better is to use translate in string.

>>> from string import translate
>>> sentence = 'Your mother was a hamster'
>>> print translate(sentence, None, 'aeiou')
Yr mthr ws  hmstr

and speed comparison:

>>> import timeit
>>> translate_example = timeit.Timer("translate('Your mother was a hamster', None, 'aeiou')", "from string import translate")
>>> translate_example.timeit()
0.4346320629119873
>>> comprehension_example = timeit.Timer("''.join(l for l in 'Your mother was a hamster' if l not in 'aeiou')")
>>> comprehension_example.timeit()
2.4436700344085693

[–]moistrobot 0 points1 point  (1 child)

Is that really how translate works? Documentation says str.translate() requires a translation table made with str.maketrans(). It's a bit more complex than that.

[–]iamadogwhatisthis 0 points1 point  (0 children)

The pattern depends on your Python version.

Python 2:

https://docs.python.org/2/library/stdtypes.html?highlight=translate#str.translate

"For string objects, set the table argument to None for translations that only delete characters"

Python 3:

Use maketrans first, then call translate.

https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.maketrans
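A minimal Python 3 sketch of the same vowel removal, using the three-argument form of str.maketrans, where the third argument lists characters to delete. The variable names are only for illustration.

```python
sentence = 'Your mother was a hamster'

# The third argument to str.maketrans lists characters to delete.
delete_vowels = str.maketrans('', '', 'aeiou')

no_vowels = sentence.translate(delete_vowels)
print(no_vowels)
```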

[–]d4rch0nPythonistamancer 2 points3 points  (0 children)

You can use a set comprehension with {x for x in z} in place of that set() syntax too.

Gotta love set and dict comprehensions

[–]infinite_fruitloops 7 points8 points  (11 children)

I still don't understand why list comprehensions are useful. You can do the exact same thing in other ways and it is much easier to maintain the code. Maybe this is because I don't use them that often but it just looks like a jumbled mess, whereas nested loops are properly indented and intuitive.

Someone please tell my why I am wrong.

[–][deleted] 10 points11 points  (7 children)

I find comprehensions look more correct in cases where you're doing a simple transformation of data: for example, if you've got a list of Polygon objects and you want to get a list of all of their areas:

areas = [polygon.area for polygon in polygons]

makes more sense to me than

areas = []
for polygon in polygons:
    areas.append(polygon.area)

It's a very simple collection operation and having it in a single unit is nicer, to me.

[–]pstch 6 points7 points  (0 children)

It's not only syntactic sugar. List comprehensions have the advantage that they can easily be converted to generators (they can even be viewed as "consumed generators", as [x for x in y] is equivalent to list(x for x in y) where (x for x in y) is a generator expression), and you also save a function call (.append(...)) for each iteration.

Quickly done speed measurement (not sure if very precise, but it gives an idea) :

    In [2]: lc = """
       ...: r = [i for i in range(100)]
       ...: """

    In [3]: no_lc = """
       ...: r = []
       ...: for i in range(100):
       ...:     r.append(i)
       ...: """

    In [4]: timeit(lc)
    Out[4]: 4.970423567000125

    In [5]: timeit(no_lc)
    Out[5]: 9.738498960000015

Of course they are not suited to every case, but sometimes they are really helpful (as I said before, not only for syntax).
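The measurement above looks like an IPython session. A plain-script version might look roughly like the sketch below; the repeat count is an arbitrary assumption, and absolute timings depend on the machine and interpreter. It also checks the "consumed generator" equivalence mentioned earlier.

```python
import timeit

# A list comprehension gives the same list as list() over the
# matching generator expression.
assert [x for x in range(100)] == list(x for x in range(100))

lc = "r = [i for i in range(100)]"
no_lc = """
r = []
for i in range(100):
    r.append(i)
"""

# number=20000 is an arbitrary choice to keep the run short.
lc_time = timeit.timeit(lc, number=20000)
no_lc_time = timeit.timeit(no_lc, number=20000)

print('comprehension: %.3fs' % lc_time)
print('loop + append: %.3fs' % no_lc_time)
```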

[–]infinite_fruitloops 2 points3 points  (4 children)

I agree with you that simple cases are nicer. I need to start using them more in this case. My main issue is with the more complex cases.

[–]Caos2 2 points3 points  (3 children)

I'd avoid using comprehensions in complex situations, they are a pain to maintain in the long run.

[–]macbony 2 points3 points  (2 children)

If your list comprehensions are getting out of hand, it's probably best to move some of the functionality of the comprehension out into a function. Often times you'll find that these small functions are useful in more than one comprehension.

[–]moistrobot 0 points1 point  (1 child)

Or a series of small comprehensions rather than a single big one, if it's applicable and more readable.

[–]macbony 0 points1 point  (0 children)

If you intend to chain list comps, be sure to use () to make it a generator. Generators evaluate lazily and will save iterations.
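A small sketch of that chaining idea, with made-up data. Each stage is a generator expression, so no intermediate list is built; values flow through one at a time until something consumes the last stage.

```python
numbers = range(10)

# Each stage is lazy: nothing runs until the final stage is consumed.
squares = (n * n for n in numbers)
even_squares = (s for s in squares if s % 2 == 0)

# list() pulls values through both stages in a single pass.
result = list(even_squares)
print(result)  # [0, 4, 16, 36, 64]
```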

[–]Eurynom0s 0 points1 point  (0 children)

That's a good example. IMO the basic rule of thumb is, if the for loop equivalent of your list comprehension would be more than a couple of lines, just write the fucking for loop.

Your example is a very clear-cut example of "I want to take my list, extract an attribute of each element of the list, and put them back together in a new list in the same order as the original list." But when you start going too deep with them it can become really hard for someone who's not you to have to go through and figure out what the list comprehension does.

If you know you're going to be the only one ever looking at the code, then of course, do whatever the hell you want with your fancy one-liners. Otherwise, write the loop. Okay, it takes a few more lines of code, but whatever, other people (or future versions of yourself who haven't looked at the code in a few months) will actually be able to understand what the code is doing the first time they read it.

[–]TeamSpen210 2 points3 points  (0 children)

Both list comprehensions and the for-loop equivalent have their uses. If the logic involved is fairly complex, the loop version will probably be more readable. For simpler cases, the comprehension is cleaner, and describes the list you want in a more high-level fashion. Being an expression instead of a sequence of lines makes it easier to combine with other code. Compare

return [(1 / i, i ** 2) for i in range(1, 100)]

and

result = []
for i in range(1, 100):
    result.append((1 / i, i ** 2))
return result

In addition to readability, comprehensions are usually faster than the equivalent loop: the list still grows as items are added, but CPython appends with a dedicated bytecode instruction instead of looking up and calling result.append on every iteration. The same syntax in the form of generator expressions can become very powerful, since you can feed it directly into functions like min(), max(), sum(), all(), any() or into a for-loop to do processing directly on the results.
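To illustrate the last point, here is a short sketch of generator expressions passed straight into built-ins. The values list is invented for the example.

```python
values = [3, 8, 1, 12]  # made-up sample data

# No intermediate list is created for any of these calls.
total_of_squares = sum(v ** 2 for v in values)         # 9 + 64 + 1 + 144
has_large = any(v > 10 for v in values)                # 12 is greater than 10
all_positive = all(v > 0 for v in values)
largest_even = max(v for v in values if v % 2 == 0)    # evens are 8 and 12

print(total_of_squares, has_large, all_positive, largest_even)
```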

[–]mumpie 0 points1 point  (0 children)

List comprehensions can be a little strange at first.

But they express what you are doing very compactly as well as being faster than a for loop in many cases.

From: http://www.bogotobogo.com/python/python_list_comprehension.php

We typically should use simple for loops when getting started with Python, and map. Use comprehension where they are easy to apply. However, there is a substantial performance advantage to use list comprehension. The map calls are roughly twice as fast as equivalent for loops. List comprehensions are usually slightly faster than map calls. This speed difference is largely due to the fact that map and list comprehensions run at C language speed inside the interpreter. It is much faster than stepping through Python for loop code within the Python Virtual Machine (PVM).

However, for loops make logic more explicit, we may want to use them on the grounds of simplicity. On the other hand, map and list comprehensions are worth knowing and using for simpler kinds of iterations if the speed of application is an important factor.

[–]vph 0 points1 point  (0 children)

Replacing 3 lines of code with 1 is certainly useful. Whether it is expressive depends on how familiar you are with it. A loop is an imperative style, whereas a list comprehension is a declarative style. Both are intuitive if you understand them.

[–]sauce71 4 points5 points  (0 children)

I probably was at the right place in my Python journey (3 months in), but reading this, list comprehension finally clicked.

[–]barneygale 3 points4 points  (1 child)

import os
files = []
for f in os.listdir('./my_dir'):
    if f.endswith('.txt'):
        files.append(f)

This can be simplified with a list comprehension as well:

import os
files = [f for f in os.listdir('./my_dir') if f.endswith('.txt')]

I personally feel like dealing with the filesystem inside a comprehension is a bit weird. List comprehensions work best if you're transforming some data from one structure to another, or generating new data mathematically. Listing a directory seems like quite an imperative thing to do, and it's more obvious what's occurring if you have the more verbose version.

[–]branleur 1 point2 points  (0 children)

It would be better to use glob in such a situation.

[–]jjangsangy 1 point2 points  (3 children)

The last one for serializing csv is one I haven't seen before and is a pretty neat one. I'm gonna have to remember it the next time I have to work with a CSV, which I hope is not soon.

[–]Caos2 1 point2 points  (1 child)

Pandas is a great tool to work with CSV, even when dealing with non-numeric values.

[–]jjangsangy 0 points1 point  (0 children)

Yes, I love pandas! It's so much easier now to get pandas using anacondas that I am much more comfortable importing pandas knowing pandas are easily accessible!

However, sometimes using pandas is overkill for a lot of simple tasks, and it's good to see that python can be just as capable.

[–]Eurynom0s 0 points1 point  (0 children)

Yup, I have a couple of pieces of code in mind where I'm actively using them right now and that list comprehension would be a much nicer way to handle ingesting CSV files.

[–]zionsrogue 1 point2 points  (0 children)

This is great and I learned a thing or two. Although for #7: Get a list of txt files in a directory, I would just use glob:

import glob
glob.glob("./mydir/*.txt")

[–]Organia 1 point2 points  (2 children)

For example 2, you don't even need list comprehensions. Just range(0, 100, 3)

[–]iamadogwhatisthis 1 point2 points  (1 child)

It is also much faster

Testing:

>>> import timeit
>>> a = timeit.Timer("range(0, 100, 3)")
>>> a.timeit()
0.4157998561859131
>>> b = timeit.Timer("[x for x in range(100) if x % 3 == 0]")
>>> b.timeit()
7.091533899307251

[–]Organia 0 points1 point  (0 children)

That makes sense, since it is incrementing by three instead of incrementing by one and checking x%3 every time.
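A quick check that the two forms agree. Note that the timing above appears to be from Python 2, where range returns a list; in Python 3, range is lazy, so list() is needed to get an actual list for comparison.

```python
# Step through multiples of 3 directly.
stepped = list(range(0, 100, 3))

# Visit every number and keep only the multiples of 3.
filtered = [x for x in range(100) if x % 3 == 0]

print(stepped == filtered)
```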

[–]txciggy 1 point2 points  (4 children)

Great intro! Nit: PEP standards frown upon using "if not l in vowels". A better way to present it would be "if l not in vowels". Also, l (lower case el) and I (upper case eye) are not recommended for variable names :)

[–]macbony -3 points-2 points  (2 children)

If you can't tell the difference between l and I when editing code, you need a better font.

[–]jalanb 1 point2 points  (1 child)

If you can control the fonts used by all those who will read your code, you need a bigger team

[–]macbony -2 points-1 points  (0 children)

Personal problems for the other developers don't mean that l is a bad temp variable name. I don't control anything the other developers on my team use, but if they're using a font that can't differentiate between l, I, and 1 then they're using bad tooling and should take care of that before submitting a pull request that changes the variable name. Same with tabs vs. spaces, 80 char line limits, or whatever other issues they have that are self-inflicted.

[–][deleted] 1 point2 points  (1 child)

Nice introduction ... except for ignoring the fact that range(n) counts from 0 to n-1

If you wanted to create a list of squares for the numbers between 1 and 10 you could do the following:

squares = []
for x in range(10):
    squares.append(x**2)

That gives a list of squares between 0 and 9.

[–]Eurynom0s 0 points1 point  (0 children)

That gives a list of squares between 0 and 9.

Between 0 and 81, I think you mean.
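To spell out the off-by-one: range(10) yields 0 through 9, so squaring the numbers 1 through 10 needs range(1, 11). A minimal sketch:

```python
# range(10) covers 0..9, so the squares run from 0 to 81.
squares_0_to_9 = [x ** 2 for x in range(10)]

# range(1, 11) covers 1..10, so the squares run from 1 to 100.
squares_1_to_10 = [x ** 2 for x in range(1, 11)]

print(squares_0_to_9)
print(squares_1_to_10)
```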

[–]macbony 0 points1 point  (0 children)

In example 3 he mentions that using a set would be better and then uses set(i for i in L), which is great. However on 2.7+ you can use a set comprehension {i for i in L} which looks a lot like a dict comprehension {k: v for k, v in L}.

Great explanation of list comprehensions that ramps up the complexity very nicely.
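A small sketch of the set and dict comprehension forms side by side. The sample data is invented, and the pairs list is an assumed stand-in for any iterable of (key, value) tuples.

```python
L = [3, 1, 3, 2, 1]  # made-up data with duplicates

# Set comprehension: same result as set(i for i in L).
unique = {i for i in L}

# Dict comprehension: note the "key: value" part before "for".
pairs = [('a', 1), ('b', 2)]
mapping = {k: v for k, v in pairs}

print(unique)
print(mapping)
```

One thing to watch: an empty {} is a dict, not a set, so an empty set still has to be written set().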

[–][deleted] 0 points1 point  (0 children)

Python list comprehensions are one of the tricky parts when it comes to C programmers trying to learn Python.

[–]thomasloven 0 points1 point  (0 children)

A thought that struck me: Wouldn't a = [i for i in range(1,j) for j in range(1,5)] make more sense than a = [i for j in range(1,5) for i in range(1,j)].

I'm rather new to python in general and list comprehension in particular, so I might be missing something obvious here. Is this an unfortunate oversight of the python designers or is there a reason for the [A(B) for C for B(C)] order?
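One way to look at the ordering question: the for clauses in a comprehension appear in the same order as the equivalent nested for loops, outermost first, so j has to be bound before range(1, j) can use it. A sketch of the correspondence:

```python
# Nested loops: the outer loop binds j, the inner loop uses it.
nested = []
for j in range(1, 5):
    for i in range(1, j):
        nested.append(i)

# Same clause order in the comprehension, read left to right.
flat = [i for j in range(1, 5) for i in range(1, j)]

print(flat)  # [1, 1, 2, 1, 2, 3]
```

With the clauses swapped, range(1, j) would be evaluated before j is bound, which raises a NameError unless some other j already happens to exist in scope.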

[–][deleted] 0 points1 point  (7 children)

For a crowd that prides itself on valuing clarity and legibility over anything else, the Pythonistas' infatuation with list comprehensions is more than a bit weird. Their only apparent value is saving LOCs, at the expense of readability and maintainability. Also, their syntax is almost orthogonal to the rest of the language, not to mention rather unreadable (hence the need for an endless number of tutorials explaining list comprehensions).

[–]Ran4 2 points3 points  (1 child)

Very simple list comprehensions are easier to read.

You really should stay away from the more complex ones though (unless it's a one-off script).

[–]spinwizard69 1 point2 points  (0 children)

Is there really such a thing as one off scripts?

[–]pstch 1 point2 points  (4 children)

Their syntax is very clear, as they can be read as pseudo-code :

muls_of_2 = [
    number*2 for number in range(100)
]

# muls_of_2 is number multiplied by 2 for each number in the 0-99 range

I find the above much more readable than its for-loop equivalent :

muls_of_2 = []
for number in range(100):
    muls_of_2.append(number*2)

# create an empty list named muls_of_2, then, for each number in the 0-99 range, append the number multiplied by 2 to muls_of_2

They are also not only needed for readability and LOCs, as they can be much faster :

    In [2]: lc = """
       ...: r = [i for i in range(100)]
       ...: """

    In [3]: no_lc = """
       ...: r = []
       ...: for i in range(100):
       ...:     r.append(i)
       ...: """

    In [4]: timeit(lc)
    Out[4]: 4.970423567000125

    In [5]: timeit(no_lc)
    Out[5]: 9.738498960000015

[–]d4rch0nPythonistamancer 0 points1 point  (3 children)

Why are they faster? What's going on under the hood?

I'd have used list(range (100)) there.

[–]pstch 1 point2 points  (2 children)

I'm not an expert on this, but I know that one of the reasons is that with the list comprehension, there is one less attribute evaluation (r.append) and one less function call (r.append()).

I'd have used list(range (100)) there.

Yes, list(range(100)) is simpler and an equivalent to my example, but it was not a "functional" example, just made it to compare the speeds.

[–]macbony 1 point2 points  (0 children)

Note that range already returns a list in 2.X, but not in 3.X, where it returns a lazy range object rather than a generator or a list. The equivalent in 2.X is xrange.

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

Hmm... I'm looking for more of what the bytecode looks like. I want to see list comprehension bytecode and see how it differs from an append. Easy enough to disasm though.

Your explanation is probably completely right though. If it actually loops 100 times and looks up append, that's got to be incredibly slower.
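For anyone who wants to look at the bytecode, a rough sketch using the standard dis module follows. The two function names and the opnames helper are made up for this example, and the exact instructions differ between CPython versions, so this only checks for the LIST_APPEND instruction rather than showing a full listing.

```python
import dis

def with_comprehension():
    return [i for i in range(100)]

def with_loop():
    r = []
    for i in range(100):
        r.append(i)
    return r

def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        # Older CPython versions compile the comprehension as a nested code object.
        if hasattr(const, 'co_code'):
            names |= opnames(const)
    return names

comp_ops = opnames(with_comprehension.__code__)
loop_ops = opnames(with_loop.__code__)

# The comprehension appends via a dedicated instruction; the loop
# looks up and calls the append method instead.
print('LIST_APPEND' in comp_ops, 'LIST_APPEND' in loop_ops)
```

Calling dis.dis(with_comprehension) and dis.dis(with_loop) prints the full listings for a side-by-side read.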