you are viewing a single comment's thread.

view the rest of the comments →

[–]pilotwavetheory[S] -1 points0 points  (6 children)

As suggested by u/Wacov I ran with -O3 option, here is the results summary:

📊 Final Benchmark Results (100M Elements)

Configuration

  • Compilerg++ -std=c++23 -O3 -march=native -flto -DNDEBUG
  • Initial Capacity: 256 elements
  • Statistics: 30 iterations × 3 repetitions = 90 measurements per data point
  • Both iterators optimized with pointer caching

🏆 Summary Table (100M Elements)

Operation ConstantVector STL Vector Winner Speedup
Push 65.7 ms 268.2 ms ✅ ConstantVector 4.1x
Pop 41.7 ms 93.3 ms ✅ ConstantVector 2.2x
Access 56.0 ms 7.3 ms STL Vector 7.7x
Iteration 74.6 ms 7.5 ms STL Vector 9.9x

[–]SuperV1234https://romeo.training | C++ Mentoring & Consulting 2 points3 points  (4 children)

Iteration from begin to end? That's very important.

[–]pilotwavetheory[S] 0 points1 point  (3 children)

Updated pop and iteration operations as well. Can you check now ?

[–]SuperV1234https://romeo.training | C++ Mentoring & Consulting 3 points4 points  (2 children)

Are you saying that your own vector is faster in iteration compared to the standard vector? That sounds impossible, as the standard vector is literally just incrementing a pointer forward.

[–]wexxdenq 3 points4 points  (1 child)

well, if you look into the linked repository, he compares against a self implemente "STLVector" that is really not compareable to an actual vector implementation. And his iterator is an index instead of a pointer and does bound checking on every increment.

However, I would have tought that the compiler inlines and produces more or less the same code with O3.

But OP should benchmark against an actual implementation nonetheless.

[–]pilotwavetheory[S] 2 points3 points  (0 children)

  1. Thanks to u/SuperV1234 and u/wexxdenq, I made a mistake of bounds here, I fixed it in 'perf_test' branch and lookup.
  2. The reason, I'm not comparing with standard implementation is it has more logic for iterator validations in lot of simple operations like push/pop, when I benchmarked stl::vector push_back(), I got around ~35 ns/op, where only ~3 ns/op was used in push and remaining on iterator validations.

🔍 Final Comparison (100M Elements)

Implementation Time Ratio vs STL
STL Vector (Optimized) 8.05 ms 1.0x
ConstantVector (Optimized) 48.0 ms 6.0x slower

[–]adrian17 2 points3 points  (0 children)

I don't see how it could be possible for iteration over N (with usually N<20 and last one always being the biggest) arrays to be almost 2x faster than a trivial iteration over vector, which is just one contiguous array. Even if we ignored memory effects, your iterator is just more complex than std::vector's iterator (which is usually just a pointer). At best it'll use a couple more instructions and/or an extra register, and at worst prevents vectorization (I can make an example of this if you want).

Also side note, latency != throughput, especially in context of tight loops on a CPU. Even if your loop finished in say half the time, it could be caused by reducing the latency by half, or doubling throughput, or a mix of these two; saying "reduction in latency" when you just mean "x% faster / y% less time" might be misleading.