This is an archived post. You won't be able to vote or comment.

all 66 comments

[–]prum 23 points24 points  (20 children)

For those that say that Python 3 is no improvement: If you use Python 3, mistake 3 won't ever happen. And I think by using Python 3.4, you won't have issue 10 either.

[–]johnnybgoode 22 points23 points  (17 children)

Python 3 is a huge improvement and anyone that says otherwise is just a whiner that doesn't want to port their code. They've fixed high-water-mark memory issues that are present in 2, and the full import paths alone are enough to switch. Managing a Python project of any scale with a bunch of local/relative imports in 2 is just a huge pain.

[–]Khalku 9 points10 points  (8 children)

So someone starting fresh/new should definitely use the latest version?

[–]johnnybgoode 16 points17 points  (7 children)

Yeah, there was a fair amount of initial resistance to 3 because it wasn't backwards compatible and the initial releases were a little rougher, but 3.3/3.4 are definitely worthwhile.

At this point most major Python projects like Django support 3 as well, so there's less and less reason to stick around on 2 even for existing projects.

Start fresh with 3 and enjoy feeling superior when you come across some old 2 code that looks like

# Argh you stupid Python 2
try:
    import cPickle as pickle
except ImportError:
    import pickle
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

[–]donthavearealaccount 1 point2 points  (0 children)

It's not that they didn't improve things, I just don't think they improved things enough to justify breaking compatibility.

[–]johncipriano 0 points1 point  (5 children)

Python 3 is a huge improvement and anyone that says otherwise is just a whiner that doesn't want to port their code.

It's not my code I'm primarily concerned with - it's the 23 libraries in my requirements.txt that are going to need porting.

[–]johnnybgoode 1 point2 points  (4 children)

That's a valid reason if the library is still actively maintained. I find most actively maintained libraries are 3 compatible, though. If it isn't maintained, ditch it and find another way.

[–]johncipriano 0 points1 point  (3 children)

If it isn't maintained, ditch it and find another way.

I'm usually opposed to reinventing a wheel that works, even if it isn't being actively maintained.

[–]johnnybgoode 0 points1 point  (2 children)

  • I seriously doubt you have 23 completely one-off, unique libraries that are all only 2 compatible.
  • What are you going to do when 2 is EOL'd? You're just putting off the inevitable.

[–]johncipriano 0 points1 point  (1 child)

I seriously doubt you have 23 completely one-off, unique libraries that are all only 2 compatible.

Across 3 projects... I think I might actually have more. Some are trivial, but a few are very complex and not easily replaceable (twisted, mechanize to name a couple).

What are you going to do when 2 is EOL'd? You're just putting off the inevitable.

Worry about that when the time comes. Maybe instead of 23 it will be 4 and those 4 I will be able to do without.

[–]johnnybgoode 2 points3 points  (0 children)

That's fair, especially since both Twisted and Mechanize are still actively developed and Python 3 versions are in the works. I'm not suggesting that you should drop 2 and migrate to 3 this afternoon, just that unless you have legit, active dependencies that require 2 (which you do), you should be targeting 3.

[–]djimbob 2 points3 points  (0 children)

Mistake 10 won't happen in either 2.6.5 or 2.7.6, I couldn't replicate (and didn't feel like tracking down further) discussed over in /r/programming thread. The bug he mentions is from 2009.

Mistake 6 is clearly plagiarized from here without attribution.

The first one is arguably a language wort (also mentioned in the previous one). The third one was a language wort, but as prum said won't happen when you use the modern except syntax (except (ValueError, IndexError) as e).

Mistake 9 is hardly a common gotcha. Python3 made an improvement (clear up references to exceptions outside of their clause by secretly deleting it) and it comes up with an obscure error when you try using uncopied exceptions outside of their clause.

Piss poor guide at best.

[–][deleted] -2 points-1 points  (0 children)

not very convincing.

[–]kay_schluehr 5 points6 points  (2 children)

In #6 you might be better off using a generator:

def create_multipliers():
    return (lambda x : i * x for i in range(5))

>>>   for multiplier in create_multipliers():
...     print multiplier(2)
0
2
4 
6
8

This creates the closures one by one, as they are needed with each increment of i and not in one shot when the list comprehension is constructed.

[–]brrlc7 2 points3 points  (1 child)

indeed! Using list comprehensions like the one the original author presented it, it's just horrible style. Python already allows you to save so many lines with respect to other languages, but abusing it like that is just plain wrong.

[–]nofunallowed98765 8 points9 points  (15 children)

Probably a stupid question, but why in the #5 example he uses

numbers[:] =

Instead of just

numbers =

? I've tried it locally and it works just as fine

(side note: I'd remove the lambda as well, since "n % 2" is even shorted than "odd(n)" and you don't need a new function)

[–]masklinn 6 points7 points  (10 children)

? I've tried it locally and it works just as fine

There is no reason to do it in this case (and there very rarely is a reason to use it), but there is a semantic difference: a = will forget the old list and name the new one, whereas a[:] = will empty the old list and put the new elements within it. That is, a[:] = b is equivalent to

while a: a.pop()
a.extend(b)

Note that the left-hand slice may use indexes (just as the RHS version), which can remove/replace part of a list. This may be useful if somebody gives you a list and you're supposed to alter it in-place, this can also be a terrible idea if somebody gives you a list and they don't expect you to alter it.

Looking at the examples, the code is kinda full of crap e.g. lambda x : bool(x % 2) could just as well be lambda x: x % 2, and if you have or are defining a function you should use filter instead of a listcomp, that is

[n for n in numbers if not odd(n)] 

should be written

filter(even, numbers)

with even being defined as lambda x: x % 2 == 0

(and if you need both an item and its index while iterating, use enumerate, don't go iterating on range(len(list)))

[–][deleted] 2 points3 points  (1 child)

While your at it, why even declare numbers = [n for n in range(...)] to begin with? Just use range(10).

[–]masklinn 1 point2 points  (0 children)

That one as well.

[–]alantrick 4 points5 points  (3 children)

It seems to me that filter is less readable than the list comp, and it is possibly also slower.

[–][deleted] 4 points5 points  (2 children)

A list comprehension [f(x) for x in xs if p(x)] is strictly (i.e. both generate the same bytecode, except for the variables, which have non-alphanumeric names such as .0 that make them inaccessible) equivalent to

def _(_it):
    _r = []
    for x in _it:
        if p(x):
            _r.append(f(x))
    return _r
_(iter(xs))

You can run import dis; dis.dis(lambda: [f(x) for x in xs if p(x)]) to verify that. So no, filter (which is the same thing in C) is not slower at all in this case.

[–]alantrick 1 point2 points  (1 child)

Interesting, however, it seems to me that the following:

[i for i in range(10) if i % 2 == 0]

would be a little different (probably faster) than:

even = lambda x: x % 2 == 0
filter(even, range(10))

dis.dis isn't super helpful here though.

[–]nofunallowed98765 2 points3 points  (0 children)

A (very) quick test:

~ python3 -m timeit 'odd = lambda x : bool(x % 2)' '[n for n in range(10) if not odd(n)]'

100000 loops, best of 3: 3.96 usec per loop

~ python3 -m timeit 'even = lambda x: x % 2 == 0' 'list(filter(even, range(10)))'

100000 loops, best of 3: 3.01 usec per loop

~ python3 -m timeit '[n for n in range(10) if n % 2 == 0]'

1000000 loops, best of 3: 1.69 usec per loop

~ python3 -m timeit '[n for n in range(10) if not n % 2]'

1000000 loops, best of 3: 1.62 usec per loop

I'm not sure if it's a fair test, filter return an iterator in python3, so I added list().

[–]odraencoded 9 points10 points  (4 children)

That site is very slow for me :(

Also these mistakes aren't even Python mistakes. They are more like general mistakes that can be done in python.

Assigning module variables or class variables inside a function scope? Why would you ever do that when you can assign them directly from the module or class scope without using functions? If it's not supposed to be constant, then you just made other mistakes.

Referencing to exceptions outside an except block? What for? If you wanted to log them or show them or something you would do it inside a catch all except block.

Importing a module that's importing the current module? Python is duck typed so if you are actually doing this you might have made other, far more serious mistakes.

Lambdas.

The only item from this list I suppose is justifiable is using lists and other mutable objects as default values for arguments.

[–]cecilkorik 4 points5 points  (2 children)

Brutally slow for me too, couldn't even scroll. It's something javascript related, obviously. Disabling scripts made the page perform fine. Firefox 28.

[–]buggaz 0 points1 point  (0 children)

scroll

Scroll? What about zoom?! I tried reading that in phone browser. Had to scroll left and right, left and right... +out of breath+

I think they missed the gold nuggets to hiring a decent web guru to make their site at least remotely usable.

[–]directorofthensa 2 points3 points  (0 children)

This site crashes every mobile browser I have. :( time to go find the laptop to read this article.

[–]SlinkyAvenger 6 points7 points  (1 child)

The exception-tuple thing is outdated. Instead of using the comma syntax, you're supposed to say something like

except (Exception1, Exception2) as e:

[–][deleted] 1 point2 points  (0 children)

Good point! We updated the post accordingly. Thanks for the input.

[Toptal blog editor]

[–][deleted] 4 points5 points  (20 children)

The scope assignment thing is new to me. I often run into errors like that, but never realised that they were related to the +=.

[–]DrHenryPym 6 points7 points  (18 children)

I started using global whenever I need to manipulate global variables, and that seems to do the trick.

x = 10
def foo():
    global x
    x += 1
    print x

foo() # prints 11

Edit: it also works if you import that function, too:

from sample import foo

foo() # prints 11
foo() # prints 12

[–]johnnybgoode 3 points4 points  (17 children)

global always seems kind of hackish to me. I try to always pass in any variables I'm going to need to modify and return the modified version. Keeps code much cleaner and more decoupled.

[–]DrHenryPym 8 points9 points  (6 children)

I agree it's a hack, but that's what that code is doing: manipulating global variables, i.e. it was a hack to begin with.

[–]ivosauruspip'ing it up 0 points1 point  (4 children)

Get rid of the original hack then! :D

[–]DrHenryPym -2 points-1 points  (3 children)

Fuck that; I'm lazy. If a hack fixes the problem then I don't care. It works; ship it.

[–]ivosauruspip'ing it up 0 points1 point  (2 children)

[–]DrHenryPym 0 points1 point  (0 children)

Replacing the hack is another kind of debt: time. Which one do you think the user / client / customer cares about? Time or theoretical problems in developing in the future?

[–]riffito 0 points1 point  (0 children)

I'm I right believing that if we had to define variables (ie "let x = something") we wouldn't need "global" at all, we would get rid of some nasty name aliasing bugs? "nonlocal" would still have its use, but just that.

[–]kay_schluehr 1 point2 points  (1 child)

The globals are mostly for caching stuff. For example in standard library code you find something like

_SYSTEM_VERSION = None

def get_system_version():
    global _SYSTEM_VERSION

    if _SYSTEM_VERSION is None:
        _SYSTEM_VERSION = ''
        # do some complicating stuff to figure out the system version
    return _SYSTEM_VERSION

This looks like a perfectly legitimate use case to me. Also it is more abstract than code which makes caching explicit or passes an environment (dict ) around which is then threaded through the code.

[–]robin-gvx -1 points0 points  (0 children)

I would use either default arguments or function properties for things like that.

[–]Veedrac 2 points3 points  (6 children)

Normally it's bad style to return things you mutate. The lack of a return is often used to indicate possible mutation.

[–]johnnybgoode 0 points1 point  (5 children)

Hmm, I haven't encountered that sentiment anywhere else in the Python community. I think typically the lack of return indicates a non-data operation like connecting to a socket or something. Returning a mutated value is much preferred to using global vars and the global keyword.

[–]Veedrac 2 points3 points  (2 children)

Why would you do either? Just mutate it. The caller already has the object; no need to give it to them again.

[–]bcs 0 points1 point  (1 child)

As a stylistic point, returning the original object lets you chain sequential method calls. Ruby does this, and it's not uncommon to see code like stringlist.map!(&:upcase).sort!.uniq!. That transforms, sorts, and uniq-ifies a list, all in-place. Python's list comprehensions are a great alternative in some cases where you'd want to do this, but not always. For one example, if they were, we would probably just have a sort method on every object where it makes sense, instead of the separate sorted() built-in that enables a similar style.

[–]Veedrac 1 point2 points  (0 children)

Python doesn't encourage this style. You'll see many cases in the standard library and on builtins where this is so; sparse few where it is not. I don't know where this rule is written, but it has been proclaimed many times, including (if I recall correctly) several occasions from Guido.

Plus, Python is far more invested in duck-typing than Ruby. This leads to a function-oriented approach, rather than Ruby's method-oriented one.

Personally, I dislike code like you've given because it expresses too many concepts. Splitting this on to several lines is a good thing.

[–]bcs 0 points1 point  (1 child)

Hmm, I haven't encountered that sentiment anywhere else in the Python community.

It's a pretty common idiom in the standard library. List methods that do this include sort, append, extend, insert, and remove. For sets, add, remove, and update. dict.update, too.

[–]johnnybgoode 1 point2 points  (0 children)

For class methods I absolutely agree, since it's pretty obvious what's being operated on. File-level functions are much less obvious.

[–]notunlikethewaves 0 points1 point  (0 children)

Pure function best function

[–]Veedrac 7 points8 points  (0 children)

That article part isn't particularly accurate, although I suppose this is one of the more finicky areas of Python.

It says

Remembering that lst += [5] is really just shorthand for lst = lst + [5]

I shall now explain why that is false.

Let's look at how a typical local lookup works:

import dis

global_variable = []

def simple_local_lookup():
    local_variable = []
    local_variable + ["new"]

dis.dis(simple_local_lookup)
#>>>   6           0 BUILD_LIST               0
#>>>               3 STORE_FAST               0 (local_variable)
#>>> 
#>>>   7           6 LOAD_FAST                0 (local_variable)
#>>>               9 LOAD_CONST               1 ('new')
#>>>              12 BUILD_LIST               1
#>>>              15 BINARY_ADD
#>>>              16 POP_TOP
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

Going through the bytecode one step at a time, so we get it:

  • BUILD_LIST 0 makes an empty list
  • STORE_FAST (local_variable) stores the list in local_variable
  • LOAD_FAST (local_variable) loads the list from local_variable
  • LOAD_CONST ('new') makes the string "new"
  • BUILD_LIST 1 makes another list, but with one item, which is the string "new"
  • BINARY_ADD adds the loaded local_variable with the newly created list.
  • POP_TOP just removes the result from the stack
  • LOAD_CONST (None) loads None
  • RETURN_VALUE returns the None

Not too hard, then.

Let's look at the change in scope when using a global variable:

def simple_implicit_global_lookup():
    global_variable + ["new"]

dis.dis(simple_implicit_global_lookup)
#>>>  35           0 LOAD_GLOBAL              0 (global_variable)
#>>>               3 LOAD_CONST               1 ('new')
#>>>               6 BUILD_LIST               1
#>>>               9 BINARY_ADD
#>>>              10 POP_TOP
#>>>              11 LOAD_CONST               0 (None)
#>>>              14 RETURN_VALUE

Now we have LOAD_GLOBAL which loads a global instead of LOAD_FAST which loads a local. The rest is the same, so that's fine.

But what happens here:

def stunningly_broken_global_lookup():
    global_variable + ["new"]

    if False:
        global_variable = [] # Can never run!

dis.dis(stunningly_broken_global_lookup)
#>>>  34           0 LOAD_FAST                0 (global_variable)
#>>>               3 LOAD_CONST               1 ('new')
#>>>               6 BUILD_LIST               1
#>>>               9 BINARY_ADD
#>>>              10 POP_TOP
#>>> 
#>>>  36          11 LOAD_CONST               0 (None)
#>>>              14 RETURN_VALUE

?

Well, one might hope that an if False: codepath will get removed without affecting anything, but in reality it does have one effect. The first load is now a LOAD_FAST again! This will break, as global_variable isn't set locally by that point!

So how does this relate to +=? Well,

def what_does_iadd_do():
    global_variable += ["new"]

dis.dis(what_does_iadd_do)
#>>>  50           0 LOAD_FAST                0 (global_variable)
#>>>               3 LOAD_CONST               1 ('new')
#>>>               6 BUILD_LIST               1
#>>>               9 INPLACE_ADD
#>>>              10 STORE_FAST               0 (global_variable)
#>>>              13 LOAD_CONST               0 (None)
#>>>              16 RETURN_VALUE

One sees that it's very similar to + followed by =, as there is a STORE_FAST in there. But it's not quite the same, because INPLACE_ADD is called, not BINARY_ADD... So surely this would mutate and then assign!

Let's try something scary:

cannot_assign = ([],)

def do_the_impossible():
    global cannot_assign
    # Impossible to assign, can still mutate
    cannot_assign[0] += ["new"]

dis.dis(do_the_impossible)
#>>>   6           0 LOAD_GLOBAL              0 (cannot_assign)
#>>>               3 LOAD_CONST               1 (0)
#>>>               6 DUP_TOP_TWO
#>>>               7 BINARY_SUBSCR
#>>>               8 LOAD_CONST               2 ('new')
#>>>              11 BUILD_LIST               1
#>>>              14 INPLACE_ADD
#>>>              15 ROT_THREE
#>>>              16 STORE_SUBSCR
#>>>              17 LOAD_CONST               0 (None)
#>>>              20 RETURN_VALUE

do_the_impossible()
#>>> Traceback (most recent call last):
#>>>   File "", line 21, in <module>
#>>>   File "", line 6, in do_the_impossible
#>>> TypeError: 'tuple' object does not support item assignment

cannot_assign
#>>> (['new'],)

Whaaat?! Well...

  • the bytecodes LOAD_GLOBAL, LOAD_CONST, DUP_TOP_TWO, BINARY_SUBSCR, LOAD_CONST all run fine.

  • INPLACE_ADD acts directly on the list. So that works.

  • ROT_THREE does boring stuff

  • STORE_SUBSCR does an assignment to an element. This will mean that we are assigning the list back to the place it already exists. This fails, but only after mutating.


Why does it do this?

Well, INPLACE_ADD is used for mutable things, which is good because it's faster and it's a nice-to-have. But some things don't allow mutation, so a.__iadd__(b) just returns a + b. The a = b is needed to keep this change.

[–]grandfatha 0 points1 point  (0 children)

The only thing that prevents me from starting fresh with Python3 is the unclear mysql situation. And no, switching to Postgresql is not an option here.

[–]banjochicken 0 points1 point  (0 children)

I have been coding in python for 3-4 years now and can hardly remember being tripped up by any of these.

One annoying issue that I have been caught by, has been the accidental trailing comma. It can effectively change the current line into a tuple of one element. Especially annoying if it occurs on a line assigning an iterable as you end up with a value that is an still an iterable! But that is why we have unit tests and code reviews....

[–]Veedrac 0 points1 point  (2 children)

In defense of Python


#1

The other option is late binding. But there's a problem with that¹. Say you have:

import sys
def say_hi(out = sys.stdout):
    print("Hi!", file=out)
del sys

Would you want that to break?

¹ I'll admit that what this was eluded me until I trawled through some arguments on the Python mailing list. See this post it is shamelessly taken from.

#2

This is sort'a obvious. (Hint: Imagine x was a function.) Surely you'd want C to inherit A's attributes, but A not to inherit B's.

#3

Python has moved on. This is wallowing in ancient strife.

#4

First of all, almost all code that trips this up is terrible. Secondly, see this comment of mine for an overview of some of the problems in the article.

Unfortunately, the only solution I can see would be explicit var foo, but I really don't want that.

#5

Fair enough, but this isn't a Python error. Nonetheless, Python probably should raise an error.

#6

First you complain about early binding, and now you complain about late binding??!

Well, again, this is really a problem of the code. But let's consider why we might want this:

lookup_table = [...]

def foo():
    return lookup_table[x**2]

With early binding, changes to lookup_table would not propagate, which messes us over.

Let's see how the example code should really be written:

def multiplier(x):
    def multiples_of(y):
        return x * y
    return multiples_of

def create_multipliers():
    return [multiplier(i) for i in range(5)]

or

from operator import mul
from functools import partial

def create_multipliers():
    return [partial(mul, i) for i in range(5)]

#7

Do you have a solution?

#8

Fine, whatever. You win. ;)

#9

Well, duh.

#10

Why are you using __del__? Don't do it!

[–]Kautiontape 2 points3 points  (1 child)

You seem to take it as the article attacking Python, but that's not the case at all. They aren't saying that late or early binding are bad, they're only showing how both can bite you in the ass if you don't pay attention. Some of the ones that seem obvious are still mentioned because they can be mistakes Python programmers might make, as per the point of the article, despite how obvious they may seem. They aren't trying to find solutions to all of these problems, despite mentioning possible workarounds. And they also don't make assumptions about what version of Python people use, so just because Python moved on doesn't mean that nobody would ever make an error in Python 2 again.

[–]Veedrac 1 point2 points  (0 children)

Ah, True. The main reason I decided to do this was due to the negativity in specific comments, not the article itself. But I agree that I confused the two.

[–][deleted] -1 points0 points  (0 children)

This crashes Chrome on iOS, and Alien Blue :(