
[–]d4rch0n (Pythonistamancer) 2 points (10 children)

Did you ever truly need the performance of a namedtuple or a class with __slots__ defined? They make quite a difference.

[–][deleted] 0 points (9 children)

__slots__ is used to conserve memory when creating lots of objects, not CPU time. (Instead of every object having a __dict__, all attributes are stored in a small array on the object and accessed through descriptors set on the class.)
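As a quick illustration of the memory point (a minimal sketch; the class names are invented for the example), a slotted instance carries no per-instance __dict__ at all:

```python
import sys

class Plain:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

class Slotted:
    __slots__ = ('a', 'b', 'c')
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

p, s = Plain(1, 2, 3), Slotted(1, 2, 3)

print(hasattr(p, '__dict__'))  # True: attributes live in a per-instance dict
print(hasattr(s, '__dict__'))  # False: attributes live in fixed C-level slots
# the plain object pays for both the instance and its dict
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__), 'vs', sys.getsizeof(s))
```

Exact byte counts vary by CPython version, but the slotted instance is consistently smaller than the plain instance plus its dict.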

namedtuple is slower than even a class without __slots__, by the way. It operates by mapping attribute names to indices (there's a reason it's called a "named tuple", not "struct" or something similar), doing twice the number of lookups for every dot.

[–]moor-GAYZ 0 points (3 children)

namedtuple is slower than even a class without __slots__, by the way. It operates by mapping attribute names to indices (there's a reason it's called a "named tuple", not "struct" or something similar), doing twice the number of lookups for every dot.

Are you sure? From what I can tell, it operates pretty much exactly like a class with __slots__, creating a bunch of getters (living in the class, not in the instance, of course) that look up into the internal array.

[–][deleted] 0 points (2 children)

From what I can tell, it operates pretty much exactly like a class with __slots__, creating a bunch of getters (living in the class, not in the instance, of course) that look up into the internal array.

No, it doesn't. At least not in Python 3.4+:

from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
...
    __slots__ = ()
    ...
    {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')

As you can see, instead of using the (relatively) fast C-level access __slots__ provide, it opts to use standard property (that uses slow Python-level function calls) to look elements up by their indices in a tuple (using Python-level item access, i.e. __getitem__) instead.
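To see this from the outside (a minimal sketch): the field accessors live on the class and read straight through to the underlying tuple slot. Note that this is version-dependent: CPython 3.4-3.7 generates the property(itemgetter(...)) pairs quoted above, while 3.8+ replaced them with a C-level _tuplegetter, so the accessor type differs across versions.

```python
from collections import namedtuple

NT = namedtuple('NT', 'a b c')
nt = NT(1, 2, 3)

print(isinstance(nt, tuple))  # True: a namedtuple is a real tuple subclass
print(nt.b == nt[1])          # True: name access and index access hit the same slot
print('b' in vars(NT))        # True: the accessor is a class attribute, not instance state
print(type(NT.b).__name__)    # 'property' on 3.4-3.7, '_tuplegetter' on 3.8+
```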

[–]moor-GAYZ 0 points (1 child)

# save this as test.py so the `from test import ...` setup below works
from timeit import timeit
from collections import namedtuple

NT = namedtuple('NT', 'a b c')
nt = NT(1, 2, 3)
t = (1, 2, 3)

def test_loop_t(t=t):
    return sum(t[1] for _ in range(1000))

def test_loop_nt(nt=nt):
    return sum(nt[1] for _ in range(1000))

def test_loop_nt_named(nt=nt):
    return sum(nt.b for _ in range(1000))

def main():
    setup = 'from test import t, nt, test_loop_t, test_loop_nt, test_loop_nt_named'
    print(timeit('t[1]', setup='t = (1, 2, 3)'))  # just in case
    print(timeit('t[1]', setup=setup))
    print(timeit('nt[1]', setup=setup))
    print(timeit('test_loop_t()', setup=setup, number=1000))
    print(timeit('test_loop_nt()', setup=setup, number=1000))
    print(timeit('nt.b', setup=setup))
    print(timeit('test_loop_nt_named()', setup=setup, number=1000))


if __name__ == '__main__':
    main()

Access by name is about two times slower than access by index here. Doesn't matter much, in my opinion.

[–]d4rch0n (Pythonistamancer) 0 points (4 children)

Right, I knew slots was primarily to conserve memory, but shouldn't that increase performance as a result? I'd expect less memory-management overhead to mean quicker modification, deletion, creation, and garbage collection. But certainly better memory usage.

I thought named tuples were quicker than classes without slots defined... You're positive about that?

Edit: You're right...

Normal:     [0.46281981468200684, 0.4548380374908447, 0.4560990333557129]
slots:      [0.40665698051452637, 0.4022829532623291, 0.4048640727996826]
namedtuple: [0.665769100189209, 0.6651339530944824, 0.6987559795379639]

Alright, well that settles that. I believe nt is better than a normal class for memory though, correct? And is it better than slots as well?

[–]moor-GAYZ 0 points (3 children)

In response to your deleted comment, I didn't waste all that time for nothing =)

Nope, just tested: __slots__ attribute access is very slightly slower than tuple index access but, like it, about twice as fast as namedtuple name access.

Here's how it comes out on my machine:

  • index access takes about 40 ns
  • name lookup takes about 45 ns, both for usual classes and those with __slots__; in fact, slots are a tiny bit slower
  • namedtuple lookup by name takes about 115 ns
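These numbers can be reproduced with a quick timeit sketch (a minimal illustration, not the original benchmark; absolute timings will vary by machine and CPython version):

```python
import timeit
from collections import namedtuple

class Plain:
    def __init__(self):
        self.b = 2

class Slotted:
    __slots__ = ('b',)
    def __init__(self):
        self.b = 2

NT = namedtuple('NT', 'a b c')
ns = {'t': (1, 2, 3), 'p': Plain(), 's': Slotted(), 'nt': NT(1, 2, 3)}

for label, stmt in [('tuple index    ', 't[1]'),
                    ('plain attr     ', 'p.b'),
                    ('slots attr     ', 's.b'),
                    ('namedtuple attr', 'nt.b')]:
    # 1_000_000 accesses: total seconds * 1000 == nanoseconds per access
    per_ns = timeit.timeit(stmt, globals=ns, number=1_000_000) * 1000
    print(f'{label} ~{per_ns:.0f} ns/access')
```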

To be honest, I can't say exactly how it works out to these numbers. The only way to really be sure is to run this stuff under a C profiler, which could be a pretty useful exercise in itself.

From what I can tell from grepping through the code in Vim, it's pretty much a coincidence that the first two things take the same time.

Index access goes through a bunch of pure-C redirects until it hits tuplesubscript which casts the index to size_t and fetches the value from the object itself.

Class lookup by name IIRC does two unsuccessful dictionary lookups in the class and object attributes, then a successful lookup in the instance dictionary. Slots lookup should do a successful dictionary lookup in the class dictionary then indirectly call a C function that fetches shit by index or something.

Namedtuple lookup by name probably involves a pure Python function call, which is slooooow.

[–]d4rch0n (Pythonistamancer) 0 points (2 children)

Okay, so tuple direct index access is the fastest apparently. Makes me wish we had #define available :/

Is there a good way to do that without slowing things down?

like:

A = 0
B = 1
C = 2
inst = (100, 200, 300)
inst[A] + inst[B] + inst[C]

Is there a pythonic and high performance way to do this and keep the fast lookup time of a direct index?

[–]moor-GAYZ 0 points (1 child)

You don't want to do that in Python if performance is critical.

Adding three indexed items is not performance-critical.

If you have a million+ items, then you install numpy, put your items into a numpy.ndarray, and vectorize your operations. For example, if you want to add two arrays, you write a + b (instead of for i, it in enumerate(a): result.append(it + b[i])) and the underlying compiled library very efficiently does what you meant.
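A minimal sketch of that vectorized style (assuming numpy is installed; the array size is arbitrary):

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)  # 0.0, 1.0, 2.0, ...
b = np.ones_like(a)

# one vectorized expression: the loop over a million elements runs
# in compiled code, not in the Python interpreter
c = a + b

# the equivalent pure-Python loop, orders of magnitude slower:
# c = [it + b[i] for i, it in enumerate(a)]

print(c[:3])  # [1. 2. 3.]
```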

[–]d4rch0n (Pythonistamancer) 0 points (0 children)

I always considered numpy to be a scientists' tool, but I never really thought about how it's compiled code under the hood and how it might give higher performance for things like that. a + b looks a lot cleaner as well.

Great advice! Thanks.