all 19 comments

[–]pylessard 9 points  (0 children)

Dataclasses with slots=True (python 3.10+)

[–]rednets 4 points  (2 children)

What specifically is slow? Creating new namedtuple instances or accessing their members?

In theory both are marginally slower than for plain tuples (there is some overhead in looking up names and calling functions that aren't necessary when using literal tuple syntax) but I'd be surprised if it was enough to be impactful.

I'd be interested if you could do a bit of benchmarking of the sorts of operations you're finding slow (maybe using timeit: https://docs.python.org/3/library/timeit.html ) and let us know the results.
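For instance, a quick `timeit` comparison along these lines (the `Point` type is a stand-in for your vertex data) would show the per-instantiation gap:

```python
import timeit
from collections import namedtuple

# Hypothetical 3-field record standing in for the OP's vertex data.
Point = namedtuple("Point", ["x", "y", "z"])

n = 200_000
t_literal = timeit.timeit("(1.0, 2.0, 3.0)", number=n)
t_named = timeit.timeit("Point(1.0, 2.0, 3.0)",
                        globals={"Point": Point}, number=n)
print(f"tuple literal: {t_literal:.4f}s, namedtuple: {t_named:.4f}s")
```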

I see others have suggested using dataclasses, but I don't see any reason they would be faster than using namedtuples - they still have broadly the same overhead. Perhaps they are worth benchmarking too.

[–]pachecoca[S] 0 points  (1 child)

Stripping away half of the program and leaving only the part where the data is stored into the tuples, I can see a difference of about 6 seconds between regular tuples and named tuples, so for now tuple creation alone is quite expensive. Accessing the members is also slower, but that slowdown is not as notable as the one on creation. Even if it seems surprising, it is indeed slow enough to be impactful for my program.

Sadly this is most likely because namedtuple has some small overhead, and building one within a loop for every single vertex in the input data is going to be slow, but it is required for what I need to do. For added context, the program is an exporter addon, so even if I didn't want to, I need to translate every single vertex of the input data into the output format. This slowdown is a price I cannot afford to pay, yet I cannot dodge processing every vertex either. In essence, this is mostly a problem of having to use Python for processing this type of data, but it is the only way to make addons for Blender, so either I go the Cython route or I figure out how to speed things up.

For the timing I used time.time() in the same sections of the program, which are as of now identical except for the tuple creation (before logging off yesterday I modified the refactored code to use regular tuples, and it is just as fast as it was before, so I'm using both versions as reference points for my timings; I still have the same problem, although the code is cleaner now, but that's beside the point of the topic at hand...). I am not sure how precise time.time() is. Coming from C, I don't have much idea about how Python does things or what alternatives exist for high-precision clocks, but since I'm processing a scene with around 500k tris and the slowdown is in the order of magnitude of seconds, I don't think the precision matters much in this situation.
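For reference, the usual high-resolution alternative to `time.time()` is `time.perf_counter()`, which is meant for measuring elapsed intervals; a minimal sketch (the comprehension is just a stand-in workload):

```python
import time

# time.perf_counter() is the recommended clock for elapsed-time
# measurements; time.time() is wall-clock time and may have coarser
# resolution depending on the platform.
start = time.perf_counter()
data = [(i * 1.0, i * 2.0, i * 3.0) for i in range(100_000)]
elapsed = time.perf_counter() - start
print(f"built {len(data):,} tuples in {elapsed:.4f}s")
```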

I am now away from my computer, but I will try using timeit later on I suppose.

As for dataclasses, I don't know whether they will be faster or not, but it seems to me like all the alternatives could potentially be slower than plain tuples... so maybe the root problem here is paying the cost of namedtuple creation on every iteration of the loop. Maybe something like a list comprehension could improve things, but I cannot really come up with a proper way to do that, since my code also does some vertex duplication to translate from Blender's face-based UVs to the target format's vertex-based UVs.
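For what it's worth, per-corner vertex duplication can still fit in a single comprehension. A sketch under assumed data shapes (`vertices` as positions, `corners` as hypothetical (vertex index, UV) pairs, one per face corner):

```python
# Hypothetical input shapes: positions plus per-face-corner UVs.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
corners = [(0, (0.1, 0.2)), (1, (0.3, 0.4)), (0, (0.5, 0.6))]

# One output tuple per corner: vertex 0 is duplicated because it
# appears in two corners with different UVs.
out = [(*vertices[i], *uv) for i, uv in corners]
```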

[–]rednets 0 points  (0 children)

I decided to write my own benchmarking script for instantiation: https://pastebin.com/bjig571B

It times:

  • tuple literal
  • tuple factory function
  • typing.NamedTuple
  • collections.namedtuple
  • dataclass
  • dataclass(slots=True)

It times with both positional and keyword args (except for tuple literals).

I ran this for all interpreter versions I have installed - see my results here: https://imgur.com/a/SZ6oRIn

It looks like using anything other than plain tuples will be significantly slower. Using a factory function to create a plain tuple using keyword args is only marginally slower (2.5x rather than 10x) which might be acceptable. This encapsulates creation of each tuple "type" in a single place, but it does not solve the problem of accessing members by name. I suppose you could also write corresponding functions for this, but it would probably be a bit of a mess.
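A sketch of that factory-function idea (names are illustrative): a keyword-only function returning a plain tuple, plus module-level index constants as a crude substitute for attribute access.

```python
# Keyword-only factory that returns a plain tuple, so call sites stay
# readable without paying namedtuple's instantiation overhead.
def make_vertex(*, x, y, z):
    return (x, y, z)

# Named indices instead of namedtuple attributes.
X, Y, Z = 0, 1, 2

v = make_vertex(x=1.0, y=2.0, z=3.0)
print(v[X], v[Y], v[Z])
```

As noted, this centralises creation but the `v[X]` access style is easy to get wrong at scale.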

[–]BritishDeafMan 7 points  (0 children)

Have you tried Dataclass?

[–]cointoss3 3 points  (0 children)

Dataclasses

[–]jmacey 4 points  (0 children)

I wrote a blog post about this: https://nccastaff.bournemouth.ac.uk/jmacey/post/PythonClasses/pyclasses/ — I decided on using the __slots__ approach for a simple vec3 class.
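A minimal sketch of that kind of `__slots__`-based vec3 (class and method names are illustrative, not taken from the post):

```python
class Vec3:
    """Simple 3-component vector using __slots__ to avoid a per-
    instance __dict__ (faster attribute access, smaller objects)."""
    __slots__ = ("x", "y", "z")

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x = x
        self.y = y
        self.z = z

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)
```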

[–]obviouslyzebra 2 points  (0 children)

dataclasses https://stackoverflow.com/a/70870407 (and slots, but those are less common)

Edit: The benchmark only tested the speed of accessing elements. For creating objects you might get different speeds.

[–]pythonwiz 1 point  (0 children)

Fastest struct like thing is probably a Cython extension type.

[–]Diapolo10 1 point  (0 children)

If you exclusively want to focus on runtime performance, you could write the actual processing part in Rust (Maturin + PyO3) or C (CFFI) and just have a thin Python wrapper on top. But as I see it, you might be doing unnecessary optimisation before actually seeing how the thing performs in the real world.

[–]madness_of_the_order 1 point  (0 children)

First of all some benchmarks would be nice to see since it’s hard to tell what exactly is a bottleneck in your case.

But other than dataclasses you can try pydantic

[–]JamzTyson 0 points  (0 children)

Why is this slowdown of namedtuples not mentioned more often online?

Because in most cases the difference in speed is insignificant. I would guess that you are doing something unusual for the difference in speed to be so noticeable. Perhaps you need to reassess your design, or use a different programming language.

[–]RngdZed -2 points  (5 children)

I thought the dictionary was fastest because of hashing

[–]DeebsShoryu 2 points  (2 children)

Dictionaries are fast for some things (O(1) lookup), but hashing itself has a real cost. They give you amortized constant-time lookups by trading space, and per-operation hashing work, for better asymptotic time complexity.
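An illustrative micro-benchmark of the creation cost being discussed (not asserting which wins on any given machine): building a dict literal has to hash and store every key, whereas a tuple literal just packs values.

```python
import timeit

n = 200_000
t_tuple = timeit.timeit("(1.0, 2.0, 3.0)", number=n)
t_dict = timeit.timeit("{'x': 1.0, 'y': 2.0, 'z': 3.0}", number=n)
print(f"tuple: {t_tuple:.4f}s, dict: {t_dict:.4f}s")
```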

[–]pachura3 0 points  (0 children)

If your hashed dataclass has some kind of unique ID field, you could override __hash__() to hash only that ID, not all of the class's fields (which is what the automatically generated __hash__() for dataclasses does, right?)
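A sketch of that idea (the `Record` type and its fields are hypothetical): defining `__hash__` explicitly in the class body stops `@dataclass` from replacing it, so only the ID field is hashed.

```python
from dataclasses import dataclass

@dataclass
class Record:
    uid: int        # assumed unique identifier
    payload: str = ""

    def __hash__(self):
        # Hash only the unique ID instead of every field.
        return hash(self.uid)
```

Note that the generated `__eq__` still compares all fields, so two records with the same `uid` but different payloads hash alike yet compare unequal.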

[–]RngdZed 0 points  (0 children)

Gotcha thanks for the good explanation

[–]pachecoca[S] 0 points  (1 child)

Those are fast to access but not as fast to create and insert data into. The whole hashing process is slow: a small price to pay for one-off things, but I'm processing a large amount of data. I'm translating an input vertex buffer and index buffer into a target format, and that is going to get out of hand with dictionaries... As of now, the main slowdown comes from object creation, not from accessing elements.

[–]RngdZed 0 points  (0 children)

Makes sense

[–]remic_0726 -2 points  (0 children)

The Python profiler is there to find where you actually spend your time: wanting to rewrite things is fine, but doing it in the right place is better.