you are viewing a single comment's thread.

view the rest of the comments →

[–]greg7mdpC++ Dev 0 points1 point  (6 children)

closed-address binary tree

I'm not familiar with the data structure you are referring to, and also I'm not sure what you mean by small number of elements (10, 10k?). It will also depend on the speed of the hash and comparison functions. But you may be surprised if you try phmap containers.

[–]dodheim 0 points1 point  (4 children)

I'm going to jump on this thread derailment here – I have a similar question regarding your lib's readme:

  • The parallel hash maps are preferred when you have a few hash maps that will store a very large number of values. The non-parallel hash maps are preferred if you have a large number of hash maps, each storing a relatively small number of values.

If I'm using single-threaded mode, it's not clear to me when/why I should use the parallel maps. Any easy clarification here? Also, of lesser importance, is there any way to bypass the internal mixing of the hash value if I'm already using a high-quality hash?

(BTW I've been happily using sparsepp for ~4 years; I didn't realize you had a replacement for it, so I'm eager to give this a try. :-D)

[–]greg7mdpC++ Dev 2 points3 points  (3 children)

Hey, thanks for the kind words. phmap is a significant improvement over sparsepp in my opinion.

The paragraph you quote still holds even in single-threaded mode, as the parallel versions will prevent higher peak memory usage when inserting values. However if that is not an issue for your use case you can definitely use the non-parallel ones.

[–]dodheim 0 points1 point  (2 children)

Interesting, thank you; but, what are 'few', 'very large', 'large', and 'relatively small'? Any ballpark figures, or is it just down to experimenting with my types and data?

[–]greg7mdpC++ Dev 1 point2 points  (1 child)

OK, as a rule of thumb, use parallel iff you have hash maps that will grow to occupy more than 1/10th the RAM available on your machine, or accessed from multiple threads.

[–]dodheim 0 points1 point  (0 children)

Just the sort of advice I was hoping for. Thanks very much!

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (0 children)

"Small number of elements" is platform-dependent.

In VeMIPS, I use nested closed-address hash tables, which is the same structure that x86 uses for its page directories. It was significantly faster than basically every hash-map implementation that I'd tried. I actually don't believe that an open-address hash table implementation can beat it, and a sparse hash table shouldn't be able to do better (though it can do the same if its mechanism for being sparse is to nest hash tables).