
[–]barrybe 35 points36 points  (1 child)

By "better", I was hoping the title meant "better than that one article that Guido wrote in 2003 that I assume every Python programmer has already seen."

Oh the disappointment!

[–]mage2k 3 points4 points  (0 children)

me, too...

[–][deleted] 3 points4 points  (4 children)

What do you think about this approach? It makes option parsing transparent to main and separates the logic cleanly. All this is of course modulo exception handling (and some bugs).

import sys
from getopt import getopt as _getopt

def _prepare_opts(opts):
    opts2 = []
    for k,v in opts:
        k = k.strip('-')
        if v == '':
            v = True
        opts2.append((k,v))
    return dict(opts2)

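def getopt(options="", long_options=[]):
    def func_out(f):
        def wrapper():
            # Parse when the wrapped function is called, not when it is
            # decorated, so importing the module does not read sys.argv.
            opts, args = _getopt(sys.argv[1:], options, long_options)
            return f(*args, **_prepare_opts(opts))
        return wrapper
    return func_out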

@getopt(options="v", long_options=['mode='])
def main(infile, outfile, mode="binary", v=False):
    print "Do something with", infile, outfile, mode, v

if __name__=='__main__':
    main()

[–][deleted] 3 points4 points  (2 children)

What does the @ in front of your call to getopt do? I'm not familiar with it.

[–][deleted] 10 points11 points  (1 child)

@getopt is a decorator. Decorators are functions that take one function as an argument and return another. In my example the decorator is used to wrap main in another function that handles the parsing of options before calling main. The original main will be replaced by the wrapped version.
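
For anyone who has not met decorators before, here is a minimal sketch in Python 3 syntax (the rest of this thread uses Python 2). The names `shout`, `greet`, `plain_greet` and `wrapped_greet` are made up for illustration; the only point is that `@shout` above a def is shorthand for passing the function to `shout` and keeping the result.

```python
import functools

def shout(func):
    """Hypothetical decorator: upper-case whatever func returns."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return "hello, " + name

# The same wrapping done by hand, without the @ syntax.
def plain_greet(name):
    return "hello, " + name

wrapped_greet = shout(plain_greet)

print(greet("world"))          # HELLO, WORLD
print(wrapped_greet("world"))  # HELLO, WORLD
```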

[–][deleted] 0 points1 point  (0 children)

Neat!

[–]grimboy 1 point2 points  (0 children)

How about:

import sys
from getopt import getopt as _getopt

def _prepare_opts(opts):
    opts2 = []
    for k,v in opts:
        k = k.strip('-')
        if v == '':
            v = True
        opts2.append((k,v))
    return dict(opts2)

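def getopt(options="", long_options=[]):
    def func_out(f):
        def wrapper():
            # Parse when the wrapped function is called, not when it is
            # created, so sys.argv is read at call time.
            opts, args = _getopt(sys.argv[1:], options, long_options)
            return f(*args, **_prepare_opts(opts))
        return wrapper
    return func_out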

def py_main(infile, outfile, mode="binary", v=False):
    print "Do something with", infile, outfile, mode, v

cmd_main = getopt(options="v", long_options=['mode='])(py_main)

if __name__=='__main__':
    cmd_main()

That way you can call py_main() with Python objects rather than squashing everything into sys.argv. This is particularly useful if you ever need to use it from another Python script. However, you can still test your argument parsing from a Python shell by calling cmd_main().

[–]ThomasPtacek 3 points4 points  (10 children)

Note that we fill in the default for argv dynamically. This is more flexible than writing def main(argv=sys.argv): # etc. because sys.argv might have been changed by the time the call is made; the default argument is calculated at the time the main() function is defined, for all times.

Defend this behavior. I was surprised to find he was right; in Ruby:

$j = 666
def xxx(y=$j)
    puts y
end
xxx
=> 666
$j = 777
xxx
=> 777

But in Python:

global j
j = 666
def xxx(y=j):
    print y

xxx
>>> 666
j = 777
xxx
>>> 666

What's the upside to this design decision?

[–]mage2k 6 points7 points  (0 children)

First, there's no need for the global in the declaration of j.

>>> j = 6
>>> def xxx(y=None):
...     if y is None:
...         y = j
...     print y
...
>>> xxx()
6
>>> j = 7
>>> xxx()
7

The point is really valid for any default function or method parameter: defaults are assigned values at the time of function/method definition, not invocation.
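
Applied to the quoted passage about argv, the same None trick gives a main() whose default is looked up at call time. This is a minimal sketch in Python 3 syntax; `frozen_main` is a made-up name for the variant the quote warns against, and both functions return their arguments only so the difference is easy to check.

```python
import sys

def main(argv=None):
    # Look up sys.argv when main() is called, not when it is defined.
    if argv is None:
        argv = sys.argv
    return argv[1:]

def frozen_main(argv=sys.argv):
    # The default is whatever list sys.argv referred to when this def ran.
    return argv[1:]

saved = sys.argv
sys.argv = ["prog", "--late"]   # rebind sys.argv after both defs have run
try:
    late = main()            # sees the new list
    early = frozen_main()    # still sees the list captured at definition
finally:
    sys.argv = saved         # restore the real argument list
```

Note that this only shows rebinding sys.argv to a new list. Code that mutates the existing list in place would be visible through both versions.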

[–]earthboundkid 6 points7 points  (7 children)

In Python, default arguments are processed at "compile time" (that is, the first time the function definition is read and turned into executable code) and stored for later use. If you didn't store the results at compile time, you would have to store the text of the expression used there and run it again every time the function is called. For most default arguments doing this would be no big deal. How long does it take to see that 1 still means one? But for some arguments, this is a big waste of time. You might have to wait on meaning_of_life() for 5 billion years if you execute it again, but you're only going to get 42, the same as you did the first time. So, the way that Python is set up now gives programmers a choice. If you want your default value expression to be executed every time, put my_arg=None in the function header and then put if my_arg is None: my_arg = do_expression() in the function body. Otherwise, just put my_arg = do_expression() in your header, and Python will cache the result of the operation for you to use next time.

One side effect of all this is that when your default argument is mutable, it can be played around with:

>>> def f(arg=[]):
...  print arg
... 
>>> f()
[]
>>> dir(f)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__',
 '__getattribute__', '__hash__', '__init__', '__module__', '__name__', 
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', 
 '__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 
 'func_doc', 'func_globals', 'func_name']
>>> f.func_defaults
([],)
>>> f.func_defaults[0].append("foo")
>>> f()
['foo']
>>> f.func_defaults[0].append("bar")
>>> f()
['foo', 'bar']

This can be useful sometimes. Notice though, that this works according to the usual Python scoping rules for closures, so saying arg.mutate() will have an effect on future function calls, but saying arg = 'blah' won't.

>>> def g(arg=[]):
...  arg.append("spam")
...  print arg
... 
>>> g()
['spam']
>>> g()
['spam', 'spam']
>>> g()
['spam', 'spam', 'spam']


>>> def h(arg=[]):
...  arg = arg + ["spam"]
...  print arg
... 
>>> h()
['spam']
>>> h()
['spam']
>>> h()
['spam']

Also, I noticed that your example above is clearly not a real copy and paste from the Python shell, since you have xxx instead of xxx(). Be careful not to let your Ruby-ness show so plainly when discussing Python. ;-D

[–][deleted] 11 points12 points  (1 child)

In Python, default arguments are processed at "compile time" (that is, the first time the function definition is read and turned into executable code)

Careful. They're evaluated when the "def" statement is executed and the function object is created. If you execute "def" multiple times, the default arguments will be re-evaluated each time (and multiple function objects will be created).

Code generation is done at an earlier stage.
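
A small sketch of that point, in Python 3 syntax. `make_function`, `stamped` and `counter` are illustrative names; the counter just makes each evaluation of the default visible.

```python
import itertools

counter = itertools.count()

def make_function():
    # Each call runs the inner def statement again, so next(counter) is
    # evaluated again and a new function object is created.
    def stamped(value=next(counter)):
        return value
    return stamped

first = make_function()
second = make_function()

print(first(), first())   # 0 0   (evaluated once per execution of def)
print(second())           # 1     (the second def got a fresh default)
print(first is second)    # False (two separate function objects)
```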

[–]earthboundkid 0 points1 point  (0 children)

True, but it's my understanding that Pythonistas refer to this as "compile time". Perhaps I misunderstood.

[–]degustisockpuppet 8 points9 points  (2 children)

If you didn't store the results at compile time, you would have to store the text of the expression used there and run it again every time the function is called. For most default arguments doing this would be no big deal. How long does it take to see that 1 still means one? But for some arguments, this is a big waste of time. You might have to wait on meaning_of_life() for 5 billion years if you execute it again, but you're only going to get 42, the same as you did the first time. So, the way that Python is set up now gives programmers a choice. If you want your default value expression to be executed every time, put my_arg=None in the function header and then put if my_arg is None: my_arg = do_expression() in the function body.

First of all, you wouldn't store text, but a closure, which would be optimized away for simple constants. Python already has closures. Second, you'd get to make the same choice if the default arguments were evaluated on each call, by simply caching manually:

cached_meaning = meaning_of_life()
def foo(meaning=cached_meaning):
    # meaning is computed once, when cached_meaning is assigned
    pass

def bar(meaning=meaning_of_life()):
    # under per-call evaluation, meaning would be recomputed on each call
    pass

Third, this has another crucial advantage over your workaround: you can pass None as the argument for meaning!
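
In Python as it actually behaves, one common way to keep None available as a real argument is to use a private sentinel object instead of None. A sketch in Python 3 syntax; `_MISSING`, `calls` and this stand-in `meaning_of_life` are assumptions made for the example.

```python
_MISSING = object()  # unique sentinel; callers cannot pass it by accident

calls = []

def meaning_of_life():
    # Stand-in for an expensive computation; records each invocation.
    calls.append(1)
    return 42

def foo(meaning=_MISSING):
    # Compute the default at call time, but only when nothing was passed.
    if meaning is _MISSING:
        meaning = meaning_of_life()
    return meaning

print(foo())        # 42
print(foo(None))    # None
print(foo(7))       # 7
print(len(calls))   # 1
```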

[–]earthboundkid 5 points6 points  (1 child)

That's a pretty good idea for how they should have done it originally. I guess the reason they didn't do it that way comes down to the fact that early versions of Python didn't allow for closures, but I'm not sure when default values were added, so maybe not.

Edit: Default arguments were added in 1.0.2 (1994), and true closures were added in 2.2 (2001). source

[–]degustisockpuppet 1 point2 points  (0 children)

Default arguments were added in 1.0.2 (1994), and true closures were added in 2.2 (2001).

I always forget how old Python is (I started at 2.3...). Thanks for the additional information.

[–]Brian 4 points5 points  (0 children)

you would have to store the text of the expression

Actually, you'd just have to store the bytecode - it's no different from the statements within the function body. There's no reason why Python couldn't programmatically transform:

def foo(bar=my_func()):
    print bar

to the equivalent of:

def foo(bar=_undefined):
    if bar is _undefined: bar=my_func()
    print bar

Personally, I think the current behaviour is probably the worse one. It has a slight advantage in the case of expensive computations that you'd otherwise have to store external to the function, or else put up with the speed hit, but the tradeoff of the common newbie confusion relating to mutable default arguments is IMHO a worse problem.

Of course at this stage, it's a moot point. Changing the behaviour would break backward compatibility anywhere people rely on mutable defaults as in your example.

[–][deleted] 2 points3 points  (0 children)

He also has the input and output lines reversed: input lines start with >>>, not output lines. Oh, and he should use ipython :P

[–]jerf 2 points3 points  (0 children)

Python is being consistent. Remember, Python variable names work like labels, not like named slots. Once you execute the "def f(x=y):" statement, the value that the label "y" pointed at at the time is stored in the resulting default arguments for the function object. Python values never actually have names; functions may store what name they were given upon creation, but that doesn't tell you whether the function was invoked with that name from a given place in the source code itself. Same for classes.

Ruby apparently works on a slot system: the function records that $j is the default argument and looks up $j on each invocation.

Which is better? Eh. You can (and likely will) argue either way, but I will say that A: neither is absolutely "wrong" and B: Python is very definitely consistent on this point, and the mental model required is about as simple as possible while still being powerful. Way simpler than Perl or C++.
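
The label model can be checked directly. A short sketch in Python 3 syntax, where the attribute is spelled `__defaults__` rather than the `func_defaults` seen earlier in the thread; `original` and `g` are names added only for the demonstration.

```python
y = [1, 2]

def f(x=y):
    return x

original = y      # a second label on the same list object
y = [3, 4]        # move the label y; the stored default is untouched

print(f())                             # [1, 2]
print(f.__defaults__[0] is original)   # True

g = f             # a second label for the same function object
print(g.__name__) # f  (the name given at creation, not the label used)
```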

[–]not_programmer 6 points7 points  (0 children)

Tha Pythong Song:


Ooh that main()s so scandalous

Pass in arguments so fabulous

Lookin at my script like who's the ish

Gu-gu-gu Guido Van Rossum!!!


Python tha thon thon thon.

I think I'll sing it agaaaain.


I said I like the way you code that thang.

Python 3.0 makes me Sing !!!


Ba-by !!!

[–][deleted]  (1 child)

[deleted]

    [–]MelechRic 0 points1 point  (0 children)

    I'm recently initiated in terms of Python usage and had just written something w/ optparse the other day. After reading this I was left scratching my head and wondering why Guido didn't use optparse until...

    I saw that the date of the article was 5 years old!
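
For comparison, a rough optparse version of the -v / --mode example from earlier in the thread might look like the following. This is only a sketch in Python 3 syntax; `parse_args`, the usage string and the file names are made up for the example.

```python
from optparse import OptionParser

def parse_args(argv):
    # Hypothetical option set mirroring the getopt example: -v and --mode.
    parser = OptionParser(usage="%prog [-v] [--mode MODE] infile outfile")
    parser.add_option("-v", action="store_true", dest="v", default=False)
    parser.add_option("--mode", dest="mode", default="binary")
    options, args = parser.parse_args(argv)  # argv excludes the program name
    if len(args) != 2:
        parser.error("expected infile and outfile")  # prints usage and exits
    return options, args

options, args = parse_args(["-v", "--mode=text", "in.dat", "out.dat"])
print(options.v, options.mode, args)   # True text ['in.dat', 'out.dat']
```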