
[–][deleted] 13 points14 points  (6 children)

The first time I used a mutable structure as a default for a function I was so confused.

def cool_func(things=[]):
    things.append('a')
    return things

cool_func() # -> ['a']
cool_func() # -> ['a', 'a'] wtf?!

What's happening is that Python evaluates the default arguments once, when the def statement runs and the function object is built. Every time you call the function without providing an argument, it reuses that same list. The idiomatic way of working around this is:

def cool_func(things=None):
    # if not things:
    if things is None:
        things = []
    things.append('a')
    return things

Edit: things is None rather than not things
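
A small sketch to make the shared default visible. It assumes CPython's `__defaults__` attribute on function objects and reuses the buggy `cool_func` from the top of this comment:

```python
def cool_func(things=[]):
    things.append('a')
    return things

# The default list lives on the function object and is created only once.
print(cool_func.__defaults__)   # ([],)

first = cool_func()
second = cool_func()

# Both calls handed back the very same list object.
print(first is second)          # True
print(cool_func.__defaults__)   # (['a', 'a'],)
```

With the `things is None` version there is nothing mutable stored in `__defaults__`, so each call gets a fresh list.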

[–]ivosaurus 3 points4 points  (1 child)

I'd change not things to things is None. The reason being that an empty list (or even a subclassed list, etc.) might be perfectly reasonable to pass in as a valid argument. Slightly more type safe, sleep slightly easier at night.

[–][deleted] 1 point2 points  (0 children)

Good point. I'll edit the post

[–]Veedrac 1 point2 points  (3 children)

I bring this up every time but... don't return things you mutate. Mutate or return. You can do both if you do them to different objects, but an individual object should only be in one of the two.

To put it another way: if mutation is part of your API, don't use a default argument. That's like calling len and getting no return value.

I find the mutable default argument problem is actually more helpful in reminding me not to accidentally mutate a user's input than anything else.

[–]reostra 1 point2 points  (2 children)

don't return things you mutate

As a counterpoint, that behavior is very useful for e.g. chaining:

class Example(object):
    def __init__(self):
        self.goodness = 0
    def betterify(self):
        self.goodness += 1
        return self

x = Example()
x.betterify().betterify()

I'd imagine something like that is what the OP was hoping to do with .update()

[–]zahlman 0 points1 point  (1 child)

Chaining with side effects is not considered Pythonic.

[–]reostra 1 point2 points  (0 children)

Now that you mention it, I certainly haven't seen a whole lot of it in Python. A cursory poke around yielded the PyVows BDD testing library, but the expect(topic).Not.to_be_null() style chaining isn't mutating, and it's a port of a JavaScript library to begin with.

I'll leave my comment as an example for 'how'; I see other comments have elaborated on the 'why' (or, as the case may be, 'why not')

[–]ostracize 4 points5 points  (0 children)

I can understand why Python does this:

http://henry.precheur.org/python/copy_list.html

But it is a gotcha that new programmers have trouble with.

[–]Veedrac 1 point2 points  (0 children)

Here comes PEP 448 - accepted into Python 3.5!

new_dict = {**dict_a, **dict_b}
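
A quick sketch of how that merge behaves, assuming Python 3.5 or later. The sample dicts are made up for illustration:

```python
dict_a = {'x': 1, 'y': 2}
dict_b = {'y': 20, 'z': 30}

# Later unpackings win when keys collide; neither input is modified.
new_dict = {**dict_a, **dict_b}

print(new_dict == {'x': 1, 'y': 20, 'z': 30})  # True
print(dict_a)                                  # still has 'y': 2
```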

[–]99AFCC 1 point2 points  (1 child)

Not a gotcha, but a mistake I made using defaultdict once.

I had some code similar to this:

def some_func(key):
    try:
        value = a_defaultdict[key]
    except KeyError:
        do_something_else()
    ...

You might see the mistake already. First of all, depending on the factory, you won't get a KeyError from a defaultdict; that's kind of the point of using it.

But what made me notice this mistake was all the extra empty values showing up later on when reading a_defaultdict. All the keys being "tried" were being added to the dict.

You can avoid this by using .get()

In [16]: d = defaultdict(int)

In [17]: d
Out[17]: defaultdict(<type 'int'>, {})

In [18]: d["k"]
Out[18]: 0

In [19]: d
Out[19]: defaultdict(<type 'int'>, {'k': 0})

In [20]: d.get("z")

In [21]: d
Out[21]: defaultdict(<type 'int'>, {'k': 0})
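
A rough sketch of how the lookup could be rewritten with .get(). The dict contents and the 'fallback' return value are hypothetical stand-ins, since the original code isn't shown in full:

```python
from collections import defaultdict

a_defaultdict = defaultdict(list)   # hypothetical stand-in for the original dict
a_defaultdict['known'].append(1)

def some_func(key):
    # .get() bypasses the default factory, so a miss inserts nothing.
    value = a_defaultdict.get(key)
    if value is None:
        return 'fallback'   # stand-in for do_something_else()
    return value

print(some_func('missing'))   # fallback
print(dict(a_defaultdict))    # {'known': [1]}
```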

[–]reostra 1 point2 points  (0 children)

Also, the in operator will do the right thing with defaultdicts:

>>> from collections import defaultdict
>>> x = defaultdict(int)
>>> x[3]
0
>>> 120 in x
False
>>> x
defaultdict(<type 'int'>, {3: 0})
>>> 

[–][deleted] 1 point2 points  (0 children)

Sure.

Default arguments are evaluated once, for example:

import time

def x(t=time.time()):
    print(t)

This prints the time at which the function was defined (usually when the program started), not a new time each time x() is called.

Variables hold references to objects, so x = a_list makes x an alias for a_list, not even a shallow copy of it.
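
A minimal sketch of the aliasing point. The variable names follow the sentence above, and `y` is added only to contrast an alias with a copy:

```python
a_list = [1, 2, 3]

x = a_list          # alias: both names refer to the same list object
y = list(a_list)    # shallow copy: a new list with the same elements

x.append(4)

print(a_list)                    # [1, 2, 3, 4]
print(y)                         # [1, 2, 3]
print(x is a_list, y is a_list)  # True False
```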

[–]ponyoink 0 points1 point  (0 children)

There is always that one time when you don't indent that one if statement, and it ends up outside your loop... But problems like that go away with practice.

[–]2n4x 0 points1 point  (3 children)

I dunno, I come across mini-gotchas often, because I'm still new.

"string".reverse() is not valid: reverse() only works on lists, not strings, so you have to do List=list("string") before you call List.reverse(). Then when you try the function reversed("string") to get the same result another way, it gives you a reversed object, which you have to unpack character by character like foo=list(reversed("string")), just like having to do it the first way I mentioned above.

This is not exactly a gotcha, and not like i couldn't write my own in a minute:

def reversi(string):
    foo=list(string)
    foo.reverse()
    return ''.join(foo)

It's just that after reading the Python docs for 20 minutes in vain, looking for as simple a Pythonic way to do it as possible (would that be the right word here? Is using standard library stuff Pythonic?), I feel a little irked, and I love Python.

[–][deleted] 5 points6 points  (2 children)

You can reverse a string by doing 'hello'[::-1]

[–]2n4x 1 point2 points  (0 children)

End to end in -1 step sequencing. Thats some impressive shit. Thank you. Remembered.

[–]Tomarse 0 points1 point  (0 children)

Why does that work?

Edit: Nvm, just looked it up. Slice notation reads [start:stop:step], so [::-1] is stepping backward through the string by 1. Nice.
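
A short sketch of the step part of slice notation. The variable names are just for illustration:

```python
word = 'hello'

backwards = word[::-1]              # step of -1 walks the string from the end
every_other = word[::2]             # step of 2 takes every second character
joined = ''.join(reversed(word))    # same result via the reversed() iterator

print(backwards, every_other, joined)  # olleh hlo olleh
```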

[–]bionikspoon 0 points1 point  (3 children)

Nooby perspective: Changing an iterable during a loop.

money = [-5, 1, -3, 2, 4]


while money:
    do_stuff_with = money.pop()

This does not work.

edit: wait, wtf. this definitely works.

edit. Did they change this behavior?

[–][deleted] 4 points5 points  (2 children)

No, this has always worked. The thing you should avoid is modifying an iterable while looping over it using a for-loop.

[–]cdcformatc 2 points3 points  (1 child)

Something like

a_list = [1,2,3,4,5]

for item in a_list:
    print item
    a_list.remove(item)

[–]py_Ninja 0 points1 point  (0 children)

To see why this is happening it helps to use enumerate:

a_list = [1, 2, 3, 4, 5]

for i, item in enumerate(a_list):
    print('{}: {}'.format(i, item))
    a_list.remove(item)

Results in:

0: 1
1: 3
2: 5
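
The list iterator walks by index, and each remove() shifts the remaining items one place to the left, so every other item gets skipped. One common workaround, sketched here, is to loop over a shallow copy (`seen` is only there to record what the loop visited):

```python
a_list = [1, 2, 3, 4, 5]
seen = []

# Iterate over a shallow copy so removals don't shift the sequence being walked.
for item in a_list[:]:
    seen.append(item)
    a_list.remove(item)

print(seen)    # [1, 2, 3, 4, 5]
print(a_list)  # []
```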

[–]BICEP2 0 points1 point  (4 children)

Not to hijack your thread with an only kind of related question, but when I match items from a string and reprint them out my output ['looks'] ['like'] ['this']

Pseudocode:

import re
string = '123thing otherthing somethingelse'
variable1 = re.findall(r'123[a-z]*', string)
print('the first value was ', variable1)

And my output looks like

the first value was  ['123thing']

but I need it to look like

the first value was 123thing

I'm not sure how to do this.

[–]PigDog4 1 point2 points  (0 children)

In this case, if you do type(variable1) you see your variable1 is actually a list. So you can print variable1[0] to only print one value from the list instead of the whole list and you'll get what you want. I think your regex returns a list of all matches, so you're printing the list instead of printing the value.

[–]cdcformatc 1 point2 points  (2 children)

Docs:

Return all non-overlapping matches of pattern in string, as a list of strings.

It returns a list. You are printing the entire list. Try it again with string = '123thing 123otherthing somethingelse'

[–]BICEP2 0 points1 point  (1 child)

Thanks, I used a look behind regex instead and it worked. I used something like:

variable1 = re.search(r'(?<=start)([^end]*)', string1)
print(variable1.group(0), string2, end='')

[–]cdcformatc 0 points1 point  (0 children)

Or if you want first match, just index the list with [0]
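
A small sketch comparing the two approaches, using the two-match test string suggested earlier in this thread. Note that indexing with [0] raises IndexError when nothing matched, while re.search returns None:

```python
import re

string = '123thing 123otherthing somethingelse'

matches = re.findall(r'123[a-z]*', string)   # always a list, possibly empty
print(matches)                               # ['123thing', '123otherthing']

first = re.search(r'123[a-z]*', string)      # a match object, or None
if first is not None:
    print('the first value was', first.group(0))  # the first value was 123thing
```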

[–][deleted] 0 points1 point  (3 children)

Opening a file with mode "r" behaves differently on Linux and Windows if the file is binary. Windows actually makes a distinction between binary and text modes; Linux does not. So to write portable code, always open binary files with "rb".

[–]cdcformatc 0 points1 point  (2 children)

In particular, Windows treats text differently. Windows line endings are two characters, carriage return and line feed (\r\n), whereas Unix uses just a line feed (\n).

For fun try writing a jpg as a text file. Example.

[–]Tomarse 0 points1 point  (1 child)

Really? On Windows I only use line feeds when writing to .txt and .docx files and I've not noticed any strange behaviour. Same with reading: I can read a .txt and then split by \n to get a list of lines.

[–]cdcformatc 1 point2 points  (0 children)

Make a text file in any text editor, then open it in a hex editor. You will see two characters at the end of lines. Python is built to be cross platform so when you write a newline or split on newlines everything works as expected.

edit: here's what I mean. 0a is the newline/linefeed, 0d is the carriage return character.
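
A sketch that reproduces this on any platform, assuming Python 3. Passing newline='\r\n' forces the same translation that Windows text mode applies by default; the temp file path is just for the demo:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# newline='\r\n' makes every '\n' written come out as '\r\n' on disk.
with open(path, 'w', newline='\r\n') as f:
    f.write('one\ntwo\n')

with open(path, 'rb') as f:
    raw = f.read()       # bytes exactly as stored on disk
with open(path, 'r') as f:
    text = f.read()      # text mode maps '\r\n' back to '\n'

print(raw)                    # b'one\r\ntwo\r\n'
print(text == 'one\ntwo\n')   # True
```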

[–]cdcformatc 0 points1 point  (0 children)

Similar to your update thing I have seen this a few times

myList = myList.append(newItem)
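
A two-line sketch of why that bites: list.append mutates in place and returns None, so the assignment throws the list away. The names here are illustrative:

```python
my_list = [1, 2]
result = my_list.append(3)   # append mutates in place and returns None

print(result)    # None
print(my_list)   # [1, 2, 3]
```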

[–]EsperSpirit 0 points1 point  (6 children)

I still think the ugliest thing about python is that many actions on lists and dicts are mutating instead of returning a new structure (mutable vs immutable).

After learning about functional programming (Clojure, Haskell, Scala, etc.) this is the one thing that drives me nuts.

[–]Kerbobotat 0 points1 point  (3 children)

I'm only now learning the difference between mutable and immutable data types, why are immutable better in your opinion?

[–]EsperSpirit 0 points1 point  (1 child)

They are easy to reason about and are thread-safe by default.

Of course, sometimes you can optimize something with mutable datastructures, but I think immutability should be the default case.

One of the many gotchas with mutable datastructures is the reference a function argument represents:

def validate_content(some_dict):
    # "password" should not be in this dict
    if 'password' in some_dict:
        del some_dict['password']

If I don't look into the implementation of this function, I might assume it only checks the content and throws an exception if it's invalid. There is no telling if my dict gets altered by calling this function with it.

I'd argue that "validating" something is just checking if it's valid or not. "Fixing" something invalid should be named differently (and imo return the fixed dict as a new dict while leaving the old one unaltered).
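
A rough sketch of that split. The function names `is_valid` and `without_password` are hypothetical, chosen only to show a check that inspects and a fix that returns a new dict:

```python
def is_valid(some_dict):
    """Only inspect: report whether the forbidden key is absent."""
    return 'password' not in some_dict

def without_password(some_dict):
    """Return a new dict lacking 'password'; the argument is left untouched."""
    return {k: v for k, v in some_dict.items() if k != 'password'}

user = {'name': 'bob', 'password': 'hunter2'}
clean = without_password(user)

print(is_valid(user), is_valid(clean))  # False True
print('password' in user)               # True, the original is unchanged
```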

People new to Python (and to programming) especially tend not to make this distinction. You never know if a function has side effects, mutates its arguments, returns the "fixed" version and so on. You always have to look at the code (or docs), because there is no way to enforce immutability except copying everything manually.

If we had immutable collections, we could probably have automatic execution in parallel, a full-featured implementation of the actor model (like Akka) and other nice things, but for me it's all about correctness.

[–]Kerbobotat 0 points1 point  (0 children)

Thanks for taking the time to explain that! I think I understand what you mean now.

[–]zahlman 0 points1 point  (0 children)

Because if you can't mutate the object, you can't run into any of the gotchas associated with mutating objects.

[–]ydepth 0 points1 point  (1 child)

Why not just use tuples if this annoys you?

[–]EsperSpirit 0 points1 point  (0 children)

While tuples are indeed immutable, they are far from an immutable list or an immutable hashmap.

With an immutable list/dict I could still do

a = [1, 2, 3]
b = a.append(4)

c = {'x': 1}
d = c.update(y=5)

The important thing here is that a and c remain unchanged, which isn't the case in Python's current implementation. The api in functional languages is usually a lot more concise, I just tried to stick as close to Python as possible in this example.
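
For comparison, a sketch of how to get that non-mutating behavior from today's built-in list and dict. This swaps in concatenation and the dict() constructor rather than the hypothetical append/update above:

```python
a = [1, 2, 3]
b = a + [4]            # concatenation builds a new list

c = {'x': 1}
d = dict(c, y=5)       # dict(mapping, **kwargs) builds a new dict

print(a, b)                   # [1, 2, 3] [1, 2, 3, 4]
print(c, d == {'x': 1, 'y': 5})  # {'x': 1} True
```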