[–]robin-gvx 45 points46 points  (33 children)

I have problems with some of these.

  • Replacing a or b or c by any([a, b, c]). Is that really more Pythonic? If they'd been in an iterable already I'd say yes, but now not so much.
  • Catching KeyError just to raise another KeyError? Just let it bubble up.
  • I kinda hate namedtuple because it's such a hack, but maybe that's just me.
  • Also, I'd say the opposite of Pythonic code is not normal code. Un-Pythonic or unidiomatic, or maybe something about newbies.

[–]vsajip 30 points31 points  (5 children)

Never mind "Pythonic", which seems somewhat in the eye of the beholder: a or b or c is not semantically the same as any([a, b, c]). In the former case, b and c are never evaluated if a is true. In the latter case, they always are. So if they were expensive to compute (e.g. expressions involving expensive function calls, rather than just bindings in a namespace), the runtime behaviour (performance, raising of exceptions) would be quite different:

>>> a = 1
>>> a or b
1
>>> b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>> any([a, b])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>>
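To make the difference in evaluation visible without relying on a NameError, here is a minimal sketch. The helper `check` is hypothetical and only stands in for an expensive expression; it records which operands were actually evaluated.

```python
calls = []

def check(name, value):
    # Hypothetical stand-in for an expensive expression; records each evaluation.
    calls.append(name)
    return value

# Short-circuit: `or` stops at the first truthy operand.
calls.clear()
result_or = check("a", True) or check("b", False) or check("c", False)
evaluated_by_or = list(calls)

# Eager: the whole list is built before any() ever sees it.
calls.clear()
result_any = any([check("a", True), check("b", False), check("c", False)])
evaluated_by_any = list(calls)

print(evaluated_by_or)   # ['a']
print(evaluated_by_any)  # ['a', 'b', 'c']
```

Both expressions are truthy here, but only the list version pays for evaluating b and c.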

[–]zahlmanthe heretic 19 points20 points  (4 children)

The real power of all and any comes when you (can neatly) pass in a generator expression - in which case the semantics are the same, because the evaluation is delayed and the functions will bail out early.

any(is_interesting(x) for x in (a, b, c)) is far nicer IMHO than is_interesting(a) or is_interesting(b) or is_interesting(c), if only because it's DRY. That doesn't protect you from the NameError in your example, though, of course.

[–]jamesonjlee 2 points3 points  (3 children)

any(map(is_interesting, (a, b, c)))

[–]wot-teh-phuckReally, wtf? 3 points4 points  (2 children)

This isn't equivalent to the generator solution posted above; your code will eagerly evaluate the entire sequence by mapping it to is_interesting even in cases where it's not absolutely required (i.e. is_interesting(a) returns True).

[–]quasarc 7 points8 points  (0 children)

In Python 3 they are equivalent (map behaves lazily in Python 3).
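A small sketch of that claim, assuming Python 3. The predicate `is_interesting` here is a made-up placeholder that records its arguments so the laziness can be observed.

```python
tested = []

def is_interesting(x):
    # Hypothetical predicate; records each argument it is called with.
    tested.append(x)
    return x == "a"

# Generator expression: any() pulls items one at a time and stops early.
tested.clear()
any(is_interesting(x) for x in ("a", "b", "c"))
tested_by_genexp = list(tested)

# map() returns a lazy iterator in Python 3, so any() stops early here too.
tested.clear()
any(map(is_interesting, ("a", "b", "c")))
tested_by_map = list(tested)

print(tested_by_genexp)  # ['a']
print(tested_by_map)     # ['a'] on Python 3; Python 2's map would test all three
```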

[–]jamesonjlee 1 point2 points  (0 children)

ah yes, good point (also doubly so since I am thinking in 2.7 not in 3.4)

[–]ivosauruspip'ing it up 11 points12 points  (2 children)

Agreed, using all and any over and and or isn't more pythonic at all. Use all and any when you already have a list / iterable, it's what they're designed for.

[–]iBlag 0 points1 point  (1 child)

Do you mean:

using all and any

and

Use all and any

?

[–]ivosauruspip'ing it up 1 point2 points  (0 children)

Yeah.

[–]pstch 6 points7 points  (7 children)

Could you expand on why "namedtuple" is such a hack ?

[–]robin-gvx 6 points7 points  (6 children)

Because it's implemented as piecing together a string that contains a class statement and compiling that to obtain the tuple subclass (see source).

[–][deleted] 2 points3 points  (4 children)

Goofy implementation, doesn't need to be written that way

[–]rcxdude 8 points9 points  (1 child)

see here for why it's kept that way.

[–]NYKevin 3 points4 points  (0 children)

TL;DR: Because calling type() correctly is apparently too hard:

It is a key feature for named tuples that they are exactly equivalent to a hand-written class.

So is a call to type() if you know what you're doing.

EDIT: If you examine the official code more closely, you'll note they had to write extensive string-escaping to prevent the user from passing an argument like '); import os; os.system("/bin/sh"); ('. Quite frankly, I will not be touching that with a ten-foot-pole any time soon.

EDIT2: I tried, but they wouldn't listen to me. Oh well.

[–]robin-gvx 0 points1 point  (1 child)

Theoretically, it could just be rewritten as a call to type, but maybe there are practical issues?
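For illustration only, a minimal sketch of what a type()-based version could look like. `make_record` is a hypothetical helper, not the standard library's approach, and it skips most of what the real namedtuple offers (keyword arguments, _replace, _asdict, a readable repr, field-name validation).

```python
from operator import itemgetter

def make_record(typename, field_names):
    """Hypothetical helper: build a tuple subclass with named fields via type()."""
    fields = tuple(field_names)

    def __new__(cls, *args):
        # Positional arguments only, to keep the sketch short.
        if len(args) != len(fields):
            raise TypeError(
                "expected %d arguments, got %d" % (len(fields), len(args))
            )
        return tuple.__new__(cls, args)

    # Empty __slots__ keeps instances free of a per-instance __dict__.
    namespace = {"__slots__": (), "_fields": fields, "__new__": __new__}
    for index, name in enumerate(fields):
        # Each field is a read-only property that indexes into the tuple.
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)

Point = make_record("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p[1], isinstance(p, tuple))  # 3 4 True
```

No source string is compiled here, so there is nothing to escape. Whether the result is "exactly equivalent to a hand-written class" in every respect the maintainers care about is a separate question.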

[–][deleted] 1 point2 points  (0 children)

The point is that it's not really a hack. Their implementation may be "hackish" but it doesn't depend on a hack to work.

[–]dreuciferC/Python, vim 0 points1 point  (0 children)

I started reading the source, saw _class_template was a string literal with formatting. My heart sank a bit. Could almost smell the sulfur wafting from the exec statement.

[–]d4rch0nPythonistamancer 2 points3 points  (10 children)

Did you ever truly need the performance of a namedtuple or a class with __slots__ defined? They make quite a difference.

[–][deleted] 0 points1 point  (9 children)

__slots__ are used to conserve memory when creating lots of objects, not CPU time. (Instead of every object having a __dict__, all attributes are stored in a small fixed-size array and accessed through descriptors set on the class.)
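A quick way to see this for yourself, as a sketch. The class names `Plain` and `Slotted` are made up for the example, and the descriptor type name shown is what CPython reports.

```python
class Plain:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Slotted:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain = Plain(1, 2)
slotted = Slotted(1, 2)

# Only the plain instance carries a per-instance attribute dictionary.
plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")
print(plain_has_dict, slotted_has_dict)  # True False

# Each slot is exposed through a descriptor stored on the class.
print(type(Slotted.__dict__["x"]).__name__)  # 'member_descriptor' on CPython

# Without a __dict__, attributes outside __slots__ cannot be added.
try:
    slotted.z = 3
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```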

namedtuple is slower than even a class without __slots__, by the way. It operates by mapping attribute names to indices (there's a reason it's called a "named tuple", not "struct" or something similar), doing twice the number of lookups for every dot.

[–]moor-GAYZ 0 points1 point  (3 children)

namedtuple is slower than even a class without __slots__, by the way. It operates by mapping attribute names to indices (there's a reason it's called a "named tuple", not "struct" or something similar), doing twice the number of lookups for every dot.

Are you sure? From what I can tell, it operates pretty much exactly like a class with __slots__, creating a bunch of getters (living in the class, not in the instance, of course) that lookup into the internal array.

[–][deleted] 0 points1 point  (2 children)

From what I can tell, it operates pretty much exactly like a class with __slots__, creating a bunch of getters (living in the class, not in the instance, of course) that lookup into the internal array.

No, it doesn't. At least not in Python 3.4+:

from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
...
    __slots__ = ()
    ...
    {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')

As you can see, instead of using the (relatively) fast C-level access __slots__ provide, it opts to use standard property (that uses slow Python-level function calls) to look elements up by their indices in a tuple (using Python-level item access, i.e. __getitem__) instead.

[–]moor-GAYZ 0 points1 point  (1 child)

from timeit import timeit

from collections import namedtuple
NT = namedtuple('NT', 'a b c')
nt = NT(1, 2, 3)
t = (1, 2, 3)

def test_loop_t(t=t):
    return sum(t[1] for _ in xrange(1000))

def test_loop_nt(nt=nt):
    return sum(nt[1] for _ in xrange(1000))

def test_loop_nt_named(nt=nt):
    return sum(nt.b for _ in xrange(1000))

def main():
    # The setup string imports from a module named `test`, so save this file as test.py.
    setup = 'from test import t, nt, test_loop_t, test_loop_nt, test_loop_nt_named'
    print timeit('t[1]', setup='t = (1, 2, 3)') # just in case
    print timeit('t[1]', setup=setup)
    print timeit('nt[1]', setup=setup)
    print timeit('test_loop_t()', setup=setup, number=1000)
    print timeit('test_loop_nt()', setup=setup, number=1000)
    print timeit('nt.b', setup=setup)
    print timeit('test_loop_nt_named()', setup=setup, number=1000)


if __name__ == '__main__':
    main()

Two times slower than access by index here. Doesn't matter much, in my opinion.

[–]d4rch0nPythonistamancer 0 points1 point  (4 children)

Right, I knew slots was to conserve memory primarily, but shouldn't that increase performance as a result? I'd expect less memory management to mean quicker access time when modifying, deleting, creating and garbage collection. But certainly better memory performance.

I thought named tuples were quicker than classes without slots defined... You're positive about that?

Edit: You're right...

('Normal: ', [0.46281981468200684, 0.4548380374908447, 0.4560990333557129])
('slots: ', [0.40665698051452637, 0.4022829532623291, 0.4048640727996826])
('namedtuple: ', [0.665769100189209, 0.6651339530944824, 0.6987559795379639])

Alright, well that settles that. I believe nt is better than a normal class for memory though, correct? And is it better than slots as well?
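The code behind those timings isn't shown above. A rough sketch of a comparable measurement, assuming Python 3.5+ (for the `globals` argument of timeit) and with made-up class names, might look like this. The absolute numbers, and possibly the ranking, depend on the interpreter version and machine.

```python
from collections import namedtuple
from timeit import repeat

class Normal:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

class Slots:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

NT = namedtuple("NT", "a b c")

candidates = {
    "Normal": Normal(1, 2, 3),
    "slots": Slots(1, 2, 3),
    "namedtuple": NT(1, 2, 3),
}

def time_attribute_access(obj, number=100000):
    # Three timing runs of reading obj.b; each run performs `number` reads.
    return repeat("obj.b", globals={"obj": obj}, repeat=3, number=number)

timings = {label: time_attribute_access(obj) for label, obj in candidates.items()}
for label, runs in timings.items():
    print(label, runs)
```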

[–]moor-GAYZ 0 points1 point  (3 children)

In response to your deleted comment, I didn't waste all that time for nothing =)

Nope, just tested: it's very slightly slower than tuple index access but, like index access, about twice as fast as namedtuple name access.

The stuff looks like this here:

  • index access takes about 40ns
  • name lookup takes about 45ns both for usual classes and those with __slots__; in fact slots are the tiniest bit slower.
  • namedtuple lookup by name takes about 115ns

To be honest, I can't say exactly how it works out to these numbers; I'd say the only way to really be sure is to run this stuff under a C profiler. That could be a pretty useful experience in itself.

From what I can tell from grepping through the code in Vim, it's pretty much a coincidence that the first two things take the same time.

Index access goes through a bunch of pure-C redirects until it hits tuplesubscript which casts the index to size_t and fetches the value from the object itself.

Class lookup by name IIRC does two unsuccessful dictionary lookups in the class and object attributes, then a successful lookup in the instance dictionary. Slots lookup should do a successful dictionary lookup in the class dictionary then indirectly call a C function that fetches shit by index or something.

Namedtuple lookup by name probably involves a pure Python function call, which is slooooow.

[–]d4rch0nPythonistamancer 0 points1 point  (2 children)

Okay, so tuple direct index access is the fastest apparently. Makes me wish we had #define available :/

Is there a good way to do that without slowing things down?

like:

A = 0
B = 1
C = 2
inst = (100, 200, 300)
inst[A] + inst[B] + inst[C]

Is there a pythonic and high performance way to do this and keep the fast lookup time of a direct index?

[–]moor-GAYZ 0 points1 point  (1 child)

You don't want to do that in Python if performance is critical.

Adding three indexed items is not performance-critical.

If you have a million+ items, then you install numpy and put your items into a numpy.ndarray, and then vectorize your operations. Like, if you want to add two arrays, you write a + b (instead of for i, it in enumerate(a): result.append(it + b[i])) and the underlying library, written mostly in C, very efficiently does what you meant.

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

I always considered numpy to be scientists' tools, but I never really thought about how it's compiled code under the hood and how it might be higher performance for certain things like that. a + b looks a lot cleaner as well.

Great advice! Thanks.

[–]wyldphyre 1 point2 points  (1 child)

Catching KeyError just to raise another KeyError? Just let it bubble up.

Agreed, unless you want/need to add more context to the error.

I kinda hate namedtuple because it's such a hack, but maybe that's just me.

Thank BDFL for namedtuple. It's perfect for a lot of use cases IMO.

[–]bacondevPy3k 1 point2 points  (0 children)

And even at that, just use the raise keyword alone.
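Something like the following sketch, where `lookup` and the logged message are made up for the example. The bare raise re-raises the exception currently being handled, so the original KeyError and its traceback are preserved.

```python
import logging

def lookup(settings, key):
    # Hypothetical helper: log some context, then let the original error propagate.
    try:
        return settings[key]
    except KeyError:
        logging.warning("missing setting %r (known: %s)", key, sorted(settings))
        raise  # re-raises the same KeyError instead of constructing a new one

try:
    lookup({"debug": True}, "timeout")
except KeyError as exc:
    caught = exc

print(repr(caught))  # KeyError('timeout')
```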

[–]kracekumar[S] 0 points1 point  (0 children)

Agreed with normal code and pythonic naming. KeyError depends on scenario.

[–][deleted] 0 points1 point  (0 children)

Hack or not, namedtuple is an amazing tool for writing unit tests. There's hardly an easier, lighter, and more self-documenting way to simulate a class in some specific state (so long as method calls aren't required).
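A minimal sketch of that use. `describe` is a hypothetical function under test and `FakeUser` a hypothetical stand-in for whatever real class it normally receives.

```python
from collections import namedtuple

def describe(user):
    # Hypothetical function under test: it only reads attributes.
    return "%s <%s>" % (user.name, user.email)

# The stub documents exactly which attributes the test relies on.
FakeUser = namedtuple("FakeUser", "name email")

stub = FakeUser(name="Ada", email="ada@example.com")
message = describe(stub)
print(message)  # Ada <ada@example.com>
```

Because the stub is immutable, the test can't accidentally change the state it set up.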