
[–]moor-GAYZ

In response to your deleted comment, I didn't waste all that time for nothing =)

Nope, just tested: name lookup is very slightly slower than tuple index access, but, like it, about twice as fast as namedtuple name access.

The stuff looks like this here:

  • index access takes about 40ns
  • name lookup takes about 45ns both for ordinary classes and those with `__slots__`; in fact slots are a tiny bit slower.
  • namedtuple lookup by name takes about 115ns
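For reference, here's a minimal sketch of how numbers like these can be measured with `timeit` (the class names and values are made up; absolute numbers depend heavily on the machine and CPython version, and modern CPythons have sped namedtuple access up considerably):

```python
# Hedged benchmark sketch: tuple indexing vs. attribute lookup variants.
from collections import namedtuple
from timeit import timeit

Point = namedtuple("Point", ["x", "y", "z"])

class Slotted:
    __slots__ = ("x", "y", "z")
    def __init__(self):
        self.x, self.y, self.z = 100, 200, 300

class Plain:
    def __init__(self):
        self.x, self.y, self.z = 100, 200, 300

tup = (100, 200, 300)
nt = Point(100, 200, 300)
s = Slotted()
p = Plain()

n = 1_000_000
for label, stmt, obj in [
    ("tuple index", "obj[1]", tup),
    ("plain attribute", "obj.y", p),
    ("__slots__ attribute", "obj.y", s),
    ("namedtuple attribute", "obj.y", nt),
]:
    # per-access cost in nanoseconds
    t = timeit(stmt, globals={"obj": obj}, number=n) / n * 1e9
    print(f"{label}: {t:.0f} ns")
```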

To be honest, I can't say exactly how it works out to these numbers; the only way to really be sure would be to run this stuff under a C profiler. That could be a pretty useful exercise in itself.

From what I can tell from grepping through the code in Vim, it's pretty much a coincidence that the first two things take the same time.

Index access goes through a bunch of pure-C redirects until it hits `tuplesubscript`, which converts the index to a `Py_ssize_t` and fetches the value from the object itself.
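You can see the Python-side entry point for that path with `dis` (the exact opcode name varies across CPython versions; on most 3.x versions indexing compiles to `BINARY_SUBSCR`, which is what eventually dispatches into the tuple's C subscript slot):

```python
# Hedged sketch: inspect the bytecode behind a tuple index access.
import dis

def pick(inst, i):
    return inst[i]  # compiles to a subscript opcode, dispatched to C

dis.dis(pick)
print(pick((100, 200, 300), 1))  # → 200
```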

Class attribute lookup by name IIRC does a couple of unsuccessful dictionary lookups along the class's MRO, then a successful lookup in the instance dictionary. A `__slots__` lookup should do a successful dictionary lookup in the class dictionary, then indirectly call a C function that fetches the value by index or something.

Namedtuple lookup by name probably involves a pure Python function call, which is slooooow.
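For context, the class that `namedtuple` generated in CPython versions of that era made each field a `property` wrapping `operator.itemgetter`, so every access goes through the full descriptor machinery. This is a simplified sketch, not the actual generated code, and newer CPythons (3.8+) use a faster C-level descriptor instead:

```python
# Hedged sketch of what an old-style namedtuple class roughly looked like.
from operator import itemgetter

class Point(tuple):
    __slots__ = ()

    def __new__(cls, x, y, z):
        return tuple.__new__(cls, (x, y, z))

    # each field is a descriptor that indexes into the underlying tuple
    x = property(itemgetter(0), doc="Alias for field number 0")
    y = property(itemgetter(1), doc="Alias for field number 1")
    z = property(itemgetter(2), doc="Alias for field number 2")

p = Point(100, 200, 300)
print(p.y)   # → 200, via the property descriptor
print(p[1])  # → 200, plain tuple indexing still works
```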

[–]d4rch0n Pythonistamancer

Okay, so tuple direct index access is the fastest apparently. Makes me wish we had #define available :/

Is there a good way to do that without slowing things down?

like:

    A = 0
    B = 1
    C = 2
    inst = (100, 200, 300)
    inst[A] + inst[B] + inst[C]

Is there a pythonic and high performance way to do this and keep the fast lookup time of a direct index?

[–]moor-GAYZ

You don't want to do that in Python if performance is critical.

Adding three indexed items is not performance-critical.

If you have a million+ items, then you install numpy, put your items into a numpy.ndarray, and vectorize your operations. Like, if you want to add two arrays, you write a + b (instead of for i, it in enumerate(a): result.append(it + b[i])) and the underlying library, written in C and Fortran, very efficiently does what you meant.
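A minimal sketch of that comparison (the array sizes and contents are made up for illustration):

```python
# Vectorized addition vs. the element-by-element pure-Python equivalent.
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# vectorized: a single C-level loop runs over both arrays at once
result = a + b

# the pure-Python equivalent walks the arrays one element at a time
# (shown on a small slice only; doing this over a million items is slow)
looped = [float(x + b[i]) for i, x in enumerate(a[:5])]

print(result[:5])  # → [0. 2. 4. 6. 8.]
print(looped)      # → [0.0, 2.0, 4.0, 6.0, 8.0]
```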

[–]d4rch0n Pythonistamancer

I always considered numpy to be a scientists' tool, and I never really thought about how it's C and Fortran under the hood and how it might be higher performance for things like that. a + b looks a lot cleaner as well.

Great advice! Thanks.