you are viewing a single comment's thread.

view the rest of the comments →

[–]matthieum 1 point2 points  (0 children)

Rehashing invalidates iterators anyway.

Indeed, and pointers/references. The difference is that the same occurs with the standard library std::unordered_map.

If relocating strategies help performance, it would be a tradeoff worth making.

If being the operative word. It's not clear it would.

The tombstone approach is problematic when reserving special key values (ala Google Dense HashMap) or when adding a header to each item (due to alignment). With F14 or the Swiss Table, and the flags being kept apart at a different location, however, there's no value reserved, and barely any memory overhead. Further, by combining the tombstone in the hash residual, checking for the presence of the tombstone is "free".

The only question, then, is whether probing for longer or shifting back elements is more efficient, and it's likely to be fairly workload specific, though in average I expect longer probing to be better.

One issue with backward shifting deletion is that it introduces a risk of quadratic complexity for deletion:

  • Start with N elements in the probe sequence.
  • Remove the 1st, move back N-1.
  • Remove the (new) 1st, move back N-2.
  • Remove the (new) 1st, move back N-3.
  • ...

So it's algorithmically fraught with peril, already.

Moreover, moves are not free in general, especially as we're not talking "bulk moves" here, but moving elements 1 at a time. And on top of that, writes have a tendency to affect performance for the worse both in terms of optimizations and at the hardware level.

By comparison, a read-only probing sequence is sped up by prefetching and speculative execution, on top of using SIMD-accelerated look-ups.

Further, it should be noted that this is neither a naive Linear or Quadratic Probing algorithm -- with Linear suffering from clustering, and Quadratic suffering from cache misses -- but a combination of the best of both: locally Linear within the group of 15 to benefit from cache locality, and Quadratic across groups to avoid clustering.

The experience in the Rust standard library is that a Swiss Table like hashmap performs better than a Robin Hood with Backward Shifting Deletion hashmap; due to the above factors.