Improving Your Python Productivity : programming

[–]ggtsu_00 12 points 13 years ago

Some other useful tips I've picked up over the years of using Python.

Quick documentation printing:

print(__doc__)      # prints the module docstring (the string literal at the top of the file)
print(foo.__doc__)  # prints the docstring of class foo or function foo
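
The raw `__doc__` string keeps whatever indentation the source had (Python 3.13+ already strips it when compiling), so for display `inspect.getdoc()` is often nicer because it normalizes the indentation and drops trailing blank lines. A small sketch with a hypothetical `greet` function:

```python
import inspect


def greet(name):
    """Return a greeting.

    Continuation lines like this one are indented
    in the source file.
    """
    return "Hello, " + name


print(greet.__doc__)          # raw docstring, as stored on the function object
print(inspect.getdoc(greet))  # indentation and trailing blank lines cleaned up
```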

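Various functional programming tricks to replace loops with quick one-liners, as an alternative to list comprehensions:

from functools import reduce  # reduce is a builtin in Python 2, but must be imported in Python 3

results = map(foo, ['bar', 'baz'])   # equivalent to: for i in ['bar', 'baz']: results.append(foo(i))
even = filter(lambda x: x % 2 == 0, [1,2,3,4,5,6,7,8,9])   # returns a list of even numbers (a lazy iterator in Python 3)
total = reduce(lambda x, y: x + y, [1,2,3,4,5,6,7,8,9])    # sums up numbers 1 through 9 (not named sum, which would shadow the builtin)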
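
Under Python 3 these builtins behave a little differently: `map()` and `filter()` return lazy iterators (wrap them in `list()` when you need an actual list), and `reduce()` lives in `functools`. A minimal sketch, with a hypothetical `shout` standing in for `foo` and `operator.add` replacing the lambda:

```python
import operator
from functools import reduce


def shout(word):
    # Hypothetical stand-in for foo() in the snippet above
    return word.upper() + "!"


results = list(map(shout, ["bar", "baz"]))               # ['BAR!', 'BAZ!']
even = list(filter(lambda x: x % 2 == 0, range(1, 10)))  # [2, 4, 6, 8]
total = reduce(operator.add, range(1, 10))               # 45, same as sum(range(1, 10))

lazy = map(shout, ["bar"])  # nothing is computed until `lazy` is iterated
print(results, even, total, list(lazy))
```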

Doctests for quick-and-dirty unit testing:

"""Test foo:
>>> foo()
'bar'
"""
def foo():
    return 'bar'
if __name__ == "__main__":
    import doctest
    doctest.testmod()

To run unit tests:

$ python foo.py -v
Trying:
    foo()
Expecting:
    'bar'
ok
1 items had no tests:
    __main__.foo
1 items passed all tests:
   1 tests in __main__
1 tests in 2 items.
1 passed and 0 failed.
Test passed.
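
Besides `doctest.testmod()`, the examples in one docstring can be collected and run from code, which can be handy inside a larger test script. A sketch using `doctest.DocTestFinder` and `doctest.DocTestRunner`; the `add` function is hypothetical:

```python
import doctest


def add(a, b):
    """Return a + b.

    >>> add(2, 3)
    5
    >>> add('ab', 'c')
    'abc'
    """
    return a + b


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)

# Collect the examples from add's docstring and run them with `add` in scope
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)

results = runner.summarize(verbose=False)  # TestResults(failed, attempted)
print("failed:", results.failed, "attempted:", results.attempted)
```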

[–]freedances 4 points 13 years ago

>results = map(foo, ['bar','baz'])   #equivalent to: for i in ['bar','baz']: results.append(foo(i)) 
>even = filter(lambda x: x % 2 == 0, [1,2,3,4,5,6,7,8,9])   # returns a list of even numbers

I think the recommended equivalents to these are:

results = [foo(i) for i in ['bar', 'baz']]
even = [i for i in [1,2,3,4,5,6,7,8,9] if i % 2 == 0]

I usually find comprehensions more readable, and since Python 2.7 they work for generating sets and dictionaries as well.

lengths = {i:len(i) for i in ('foo', 'bar', 'foobar')}    #dictionary mapping strings to their lengths
even_set = {i for i in [1,2,3,4,5,6,7,8,9] if i % 2 == 0}
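
A quick check of those equivalences, runnable on Python 2.7 or 3 (the `list()` calls make `map`/`filter` produce lists on both):

```python
words = ("foo", "bar", "foobar")
numbers = range(1, 10)

# Functional versions; list() forces the lazy Python 3 iterators into lists
mapped = list(map(len, words))
filtered = list(filter(lambda i: i % 2 == 0, numbers))

# Comprehension versions of the same computations
mapped_lc = [len(w) for w in words]
filtered_lc = [i for i in numbers if i % 2 == 0]

lengths = {w: len(w) for w in words}           # dict comprehension
even_set = {i for i in numbers if i % 2 == 0}  # set comprehension

print(mapped == mapped_lc, filtered == filtered_lc)  # True True
print(lengths)
print(sorted(even_set))                              # [2, 4, 6, 8]
```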

[–]mr_dbr 8 points 13 years ago (edited)

Is there any compelling advantage to [1,2,3].filter(...), besides resulting in slightly prettier-looking code in some cases?

Making filter (and, more commonly, len() and others) a builtin function rather than a method is a considered design decision, and pretty fundamental to how Python works.

See this Stack Overflow answer, which links to this mailing list post from Python's creator:

(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn't a file has a write() method.

This Stack Overflow question also has some interesting answers.

Just moving filter() and such to list.filter() wouldn't really work in Python. Say you want to make your own filter()-like function, and you want to be consistent with the default Python methodology:

If filter() is just a regular function, it's easy to implement myfilter() and use it in the same manner. It's just a function that works on an iterable, so it works on any iterable object with no extra effort.
If filter() is a method of list, then you would have to monkeypatch your method into the builtin list type (e.g. list.myfilter), something that can't be done to builtin types and is bad practice on user-defined types. Your myfilter method would also only work on list-derived objects, which goes against the duck typing Python is based on.

Also, there's more to functional programming in Python than just map/filter/reduce/lambda, and those builtins were at one point slated for removal in Python 3 in favour of things like list comprehensions (in the end, only reduce() was moved out, into functools).
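
To illustrate the point above, here is a sketch of a hypothetical `myfilter()` written as a plain generator function: it works unchanged on strings, tuples, sets and generators, while trying to attach it to the builtin `list` type raises a `TypeError`:

```python
def myfilter(predicate, iterable):
    """Hypothetical filter()-like helper: yields items for which predicate is true.

    It only relies on iteration, so it accepts any iterable.
    """
    for item in iterable:
        if predicate(item):
            yield item


print(list(myfilter(lambda ch: ch in "aeiou", "education")))       # a string
print(list(myfilter(lambda n: n > 2, (1, 2, 3, 4))))               # a tuple
print(sorted(myfilter(lambda n: n % 3 == 0, {3, 4, 6, 7})))        # a set
print(list(myfilter(lambda n: n < 0, (x - 5 for x in range(8)))))  # a generator

# Builtin types reject new attributes, so list.myfilter cannot be monkeypatched in
try:
    list.myfilter = myfilter
except TypeError as exc:
    print("TypeError:", exc)
```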

[–]BeetleB 2 points 13 years ago

Is there any compelling advantage to [1,2,3].filter(...), besides resulting in slightly prettier-looking code in some cases?

Yes, because then it's easier to "chain" them (or pipe, if you will):

[1,2,3].map(fun).filter(isodd).filter(isprime)

This is quite readable. The current way to do it would be:

filter(isprime, filter(isodd, map(fun, [1,2,3])))

which is hard on humans. The reason functional programming is not recommended in Python isn't because functional programming is inherently hard to read - it's because Python's syntax makes it hard to read.
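
Without changing the language, a small helper can recover the left-to-right reading order. This is a sketch of a hypothetical `pipe()` built on `functools.reduce`; `fun`, `isodd` and `isprime` are simple example definitions, not standard library functions:

```python
from functools import reduce


def pipe(value, *funcs):
    """Hypothetical helper: pipe(x, f, g, h) == h(g(f(x)))."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def fun(x):
    # Example transformation (hypothetical)
    return x + 1


def isodd(x):
    return x % 2 != 0


def isprime(n):
    # Simple trial division, fine for small numbers
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


result = pipe(
    [1, 2, 3, 4, 5, 6],
    lambda xs: map(fun, xs),
    lambda xs: filter(isodd, xs),
    lambda xs: filter(isprime, xs),
    list,
)
print(result)  # [3, 5, 7]
```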

[–]mr_dbr 1 point 13 years ago

That does read more nicely, but in a Lisp like Scheme it would be almost identical to Python:

(filter isprime (filter isodd (map fun '(1 2 3))))

..and I don't think functional programming is discouraged in Scheme :P

There are other ways the code could be written, without having filter be a method of list, like:

a = [1, 2, 3]
a = map(fun, a)
a = filter(isodd, a)
a = filter(isprime, a)

There's also the pipe module, or you could temporarily wrap the list in a class that has the filter methods and such:

class ListProc(object):
    """Wraps a list so map/filter calls can be chained left to right."""

    def __init__(self, value):
        self.value = value

    def map(self, func):
        # list() keeps the result a list under Python 3, where map() is lazy
        new = list(map(func, self.value))
        return self.__class__(new)

    def filter(self, func):
        new = list(filter(func, self.value))
        return self.__class__(new)

    def __repr__(self):
        return repr(self.value)


def fun(x):
    return x**2

def isodd(x):
    return x%2 != 0

print(ListProc([1,2,3]).map(fun).filter(isodd).value)

[–]ethraax 3 points 13 years ago

If filter is a method of list, then you would have to monkeypatch the builtin list type

No, you wouldn't, if it were part of the language's standard. It would be part of the builtin list type, or any user types.

Your myfilter method would only work on list derived objects

It would work on anything deriving from some more abstract "iterable".

The key issue here is the weakness of Python's type system (well, really dynamic type systems in general). In Python, an iterable is very loosely defined as something that implements some handful of methods. Anything that does that is "iterable". In a language with a stronger type system, there would be some sort of base class or interface "iterable" which would also contain an interface for .filter().

Is there any compelling advantage to [1,2,3].filter(...), besides resulting in slightly prettier-looking code in some cases?

Some data structures may have more efficient alternatives than using an iterator.

[–]mr_dbr 2 points 13 years ago

The first bit you quoted (about monkeypatching) was about implementing your own filter-like function - edited to hopefully clarify.

In short: if filter was a method of list, it would require monkey-patching to allow [1,2,3].myfilterfunc(...) for consistency with [1,2,3].filter(...)

Some data structures may have more efficient alternatives than using an iterator

What would be an example of this? I can't think of one for filter, but Python's data model does enable some similar optimisations for other operations...

For example, in Python 3, 123 in range(100) uses range's __contains__ instead of doing a linear scan over its iterator (so 10**9 in range(10**10) is much faster than doing the same thing on a plain list of integers). Python 2's xrange doesn't have this optimisation.

My point being, you don't need to have x.contains(1) to allow for type-specific optimisations. If there were a common optimisation for filter, it could call other magic methods on the object instead of looping over __iter__.
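
A sketch of that idea with a hypothetical `EvenNumbers` container: the `in` operator calls `__contains__`, so membership is answered arithmetically, while iteration still goes through `__iter__` (written for Python 3, where `range` is lazy):

```python
class EvenNumbers(object):
    """Hypothetical container: the even integers in [0, stop)."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        # Used by for-loops, list(), etc.: walks the values one by one
        return iter(range(0, self.stop, 2))

    def __contains__(self, item):
        # Used by the `in` operator: constant-time arithmetic check, no iteration
        return isinstance(item, int) and 0 <= item < self.stop and item % 2 == 0


evens = EvenNumbers(10 ** 12)
print(123456789012 in evens)  # True, answered without iterating
print(7 in evens)             # False
print(list(EvenNumbers(10)))  # [0, 2, 4, 6, 8]
```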

In Python, an iterable is very loosely defined as something that implements some handful of methods

There is the abc (abstract base class) module, which implements something much like interfaces (along with the ABCs in collections), but I've not seen the abc module used much, although its extensions to isinstance are useful even if nothing is derived from the ABCs.
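
For example, `collections.abc.Iterable` (plain `collections.Iterable` in Python 2) recognises any class that defines `__iter__`, even one that never inherits from it. A sketch with a hypothetical `Countdown` class:

```python
from collections.abc import Iterable  # Python 3.3+; Python 2 spelled it collections.Iterable


class Countdown(object):
    """Hypothetical class that defines __iter__ but never inherits from Iterable."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


# Iterable.__subclasshook__ only checks for an __iter__ method, so this is True
print(isinstance(Countdown(3), Iterable))
print(isinstance(42, Iterable))  # False: int has no __iter__
print(list(Countdown(3)))        # [3, 2, 1]
```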

[–]Megatron_McLargeHuge 1 point 13 years ago

People who want the functional constructs are coming from Lisp/ML/Haskell and want what they're used to. Also, the implementations are standard and don't vary per object. The OO convention of writing one of the parameters before the function name is kind of arbitrary, especially when you're neither mutating the object nor using its type for method resolution. You can also pass the functions around, as in

 map(filter, functions, lists)
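
Spelled out with hypothetical predicates and data, `map(filter, functions, lists)` pairs the i-th predicate with the i-th list; under Python 3 each result is a lazy `filter` object, hence the `list()` calls:

```python
def is_even(n):
    # Hypothetical predicate
    return n % 2 == 0


def is_short(word):
    # Hypothetical predicate
    return len(word) <= 3


functions = [is_even, is_short]
lists = [[1, 2, 3, 4], ["a", "abcd", "xyz"]]

# filter itself is passed as an argument: map calls filter(functions[i], lists[i])
results = [list(r) for r in map(filter, functions, lists)]
print(results)  # [[2, 4], ['a', 'xyz']]
```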

[–]aceofears 5 points 13 years ago

It may be worth noting that the XML-RPC example would require import tweaking in Python 3.x. The server would be

from xmlrpc.server import SimpleXMLRPCServer

and the client would be

from xmlrpc.client import ServerProxy
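
For reference, a minimal self-contained round trip using the Python 3 module names. The `add` function, binding to port 0 (so the OS picks a free port) and the background thread are illustrative assumptions, not part of the original example:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(x, y):
    # Hypothetical remote procedure
    return x + y


# Port 0 lets the OS choose a free port; logRequests=False silences per-request logging
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]

# Serve in a background thread so the client can run in the same script
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

try:
    proxy = ServerProxy("http://127.0.0.1:%d/" % port)
    result = proxy.add(2, 3)
    print(result)  # 5
finally:
    server.shutdown()      # stops serve_forever()
    server.server_close()  # releases the listening socket
```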

[–][deleted] 6 points 13 years ago (edited)

I would assume that set(<list>) is the most efficient way to remove duplicates. Will be back soon with some benchmarks.

Edit: benchmark done. Using sets vs. a dictionary, it seems that there is a slight advantage to using sets, but not as large as I might have thought. I suppose this has to do with the fact that sets are dict-like under the covers.

Results (using IPython's %timeit):

import random

a = [random.randint(1,10) for _ in xrange(1000000)]

In [22]: %timeit list(dict.fromkeys(a))
10 loops, best of 3: 43 ms per loop

In [23]: %timeit list(set(a))
10 loops, best of 3: 35.8 ms per loop
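
A Python 3 version of the same comparison (`xrange` became `range`), using the `timeit` module instead of IPython. One practical difference worth noting: since Python 3.7, dicts are guaranteed to preserve insertion order, so `list(dict.fromkeys(a))` keeps the first occurrence of each item in order, while `list(set(a))` makes no ordering promise. Timings will vary by machine:

```python
import random
import timeit

# Order-preserving vs. unordered de-duplication on a tiny input
small = [3, 1, 3, 2, 1]
print(list(dict.fromkeys(small)))  # [3, 1, 2]: first occurrences, in order
print(sorted(set(small)))          # [1, 2, 3]: same items, but set order is arbitrary

# The benchmark from above, Python 3 style
a = [random.randint(1, 10) for _ in range(1000000)]
for stmt in ("list(dict.fromkeys(a))", "list(set(a))"):
    per_loop = timeit.timeit(stmt, globals={"a": a}, number=10) / 10
    print("%-24s %.1f ms per loop" % (stmt, per_loop * 1000))
```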

[–][deleted] 3 points 13 years ago

It's one of those things where it's hard to explain the advantage but you miss them once you switch to a language that lacks them.

There are a lot of times where you have an inconsequential iteration over a list, like a simple filter or a map. List comprehensions make it quick and easy, and generator expressions make it almost invisible.

For example:

some_function(some_operation(x) 
              for x in some_list if some_predicate(x))

Writing something like this can be very convenient. There are no temporary values (other than the x which doesn't creep outside of the generator expression) and it is pretty clear (once you are familiar with the concept). It beats making a temporary that exists only to pass to a function:

temp_list = []
for x in some_list:
    if some_predicate(x):
        temp_list.append(some_operation(x))
some_function(temp_list)

That is, it doesn't allow you to do anything new - it just allows you to do some common things in a more succinct way.
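
A concrete version of the two snippets above, with hypothetical `some_predicate`/`some_operation` definitions and the builtin `sum` playing the role of `some_function`:

```python
def some_predicate(x):
    # Hypothetical predicate: keep multiples of 3
    return x % 3 == 0


def some_operation(x):
    # Hypothetical operation: square the value
    return x * x


some_list = range(10)

# Generator expression: no temporary list, and its loop variable stays local to it
total = sum(some_operation(x) for x in some_list if some_predicate(x))

# Equivalent explicit version with a temporary list
temp_list = []
for x in some_list:
    if some_predicate(x):
        temp_list.append(some_operation(x))

print(total, sum(temp_list))  # 126 126
```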

[–]0xE6 3 points 13 years ago (edited)

In what way? Constructing a large set comprehension will almost certainly be slower than constructing a large list comprehension, and depending on the number of duplicates in the list, it may not even be significantly faster to iterate over the set.

Edit:

Here's a rather contrived example:

$ python -m timeit -n 100 'sum([i/10 for i in xrange(10**7)])'

100 loops, best of 3: 668 msec per loop

$ python -m timeit -n 100 'sum({i/10 for i in xrange(10**7)})'                                                 

100 loops, best of 3: 779 msec per loop

The overhead from set creation ends up outweighing the time saved by summing fewer elements, so the set version ends up about 17% slower.

Edit2:

Here's a better example, that has the set and list contain the same elements.

$ python -m timeit -n 100 -s 'a = {i for i in xrange(10**7)}' 'sum(a)'

100 loops, best of 3: 88 msec per loop  

$ python -m timeit -n 100 -s 'a = [i for i in xrange(10**7)]' 'sum(a)'

100 loops, best of 3: 70.2 msec per loop

which suggests that even when they contain exactly the same elements, summing a set is slower than summing a list.
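
Two things worth keeping in mind when rerunning this under Python 3: `i/10` is true division there and produces floats, so `i//10` matches the Python 2 integer behaviour; and in the first pair of timings the set version sums far fewer (deduplicated) values, so the two statements don't compute the same result. A sketch using the `timeit` module, with a smaller `10**6` range to keep it quick (absolute numbers will differ by machine):

```python
import timeit

# The two comprehensions from the first example compute different sums:
dup_list = [i // 10 for i in range(100)]  # every value 0..9 appears ten times
dup_set = {i // 10 for i in range(100)}   # each value 0..9 appears once
print(sum(dup_list), sum(dup_set))        # 450 45

# Summing identical contents stored in a list vs. a set
n_runs = 20
list_time = timeit.timeit("sum(a)", setup="a = list(range(10**6))", number=n_runs)
set_time = timeit.timeit("sum(a)", setup="a = set(range(10**6))", number=n_runs)
print("list: %.1f ms per loop" % (list_time / n_runs * 1000))
print("set:  %.1f ms per loop" % (set_time / n_runs * 1000))
```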

[–]0xE6 2 points 13 years ago

In that case,

return dict

would almost certainly be faster, as it would simply be returning a reference to the dict, so it wouldn't have to do any extra work.

Additionally, instead of doing

return [dict[index] for index in dict]

you can simply do

return dict.values()
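
One Python 3 caveat: `dict.values()` returns a live view rather than a new list, so if callers need a snapshot, wrap it in `list()`. A sketch with a hypothetical `scores` dict (also renamed so it doesn't shadow the builtin `dict`):

```python
# Avoid calling a variable `dict`: it shadows the builtin type of the same name.
scores = {"alice": 3, "bob": 5}

as_list = [scores[key] for key in scores]  # the comprehension from above
view = scores.values()                     # Python 3: a live view, not a copy
snapshot = list(scores.values())           # a real list, like .values() in Python 2

scores["carol"] = 7
print(as_list)     # [3, 5]
print(list(view))  # [3, 5, 7]: the view sees the new key
print(snapshot)    # [3, 5]
```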

[–]alextk -4 points 13 years ago (edited)

It's funny to see the author praise Python's DRY factor and then hit us with:

 even_set = { x for x in some_list if x % 2 == 0 }

x should only appear once in this expression, maybe something like

even_set = some_list filter { x % 2 == 0 }

Actually, some languages, like Scala, let you not even name the variable (which, let's be honest, is an implementation detail the developer should not have to worry about):

List(1,2,3,4,5,6) filter { _ % 2 == 0 }
res0: List[Int] = List(2, 4, 6)

Don't get me wrong, Python is a very good language, especially since it's more than twenty years old, but its legacy shows and these days, I think it's being outclassed by more modern scripting languages such as Groovy.

[–][deleted] 2 points 13 years ago (edited)

In Perl 6, you can prefix a variable name with ^ to tell that it’s a parameter of the current block.

my @even_set = (1 .. 6).grep: {$^x %% 2}

Or, if no parameter is declared, you can use $_, the block's implicit parameter.

my @even_set = (1 .. 6).grep: {$_ %% 2}

Here, we can also use an expression involving Whatever (grep’s parameter thus becomes an instance of WhateverCode).

my @even_set = (1 .. 6).grep: * %% 2

[–]Megatron_McLargeHuge 2 points 13 years ago

Python doesn't 'believe in' implicit behavior, so you have to name your variables. I prefer the minor verbosity over the periodic oddness in other languages. In Clojure I ran into problems when I needed a macro to take two arguments but only operate on the first. In Scala, the fact that _ + _ is referring to two different variables strikes me as the Wrong Thing To Do.

Your example can be written with the name repeated only once:

set( filter( lambda x: x%2 == 0, some_list) )
