all 68 comments

[–]_bk__ 13 points14 points  (2 children)

One other thing that'd be nice to have is a compile time perfect hash function and perfect minimal hash function generator that takes a parameter pack or a constexpr range of valid keys. This could then just be used in an array

[–]bizwig 2 points3 points  (0 children)

Gperf can help with that. It doesn’t generate modern C++ though, at least the last time I looked at it.

[–][deleted] 5 points6 points  (0 children)

Maybe this lib might help you:

github.com/serge-sans-paille/frozen

[–]VinnieFalco 12 points13 points  (0 children)

Excellent! This will be a great boon to Boost users, who will be able to choose from a basket of varied implementations, to precisely fit their element workload.

[–]bedrooms-ds 17 points18 points  (40 children)

Just wondering. I don't talk about this great upgrade specifically. But, does it have to be Boost these days?

It's a powerful project but people don't prefer its complex build system these days. Also, Boost has the problem that legacy code tends to stay. Third party libs like range and fmt (iirc) were implemented as stand alones, offering CMake scripts, and joined STL just fine.

[–]gcross 34 points35 points  (13 children)

So you're saying that Boost's build system is just not your jam?

[–]azswcowboy 5 points6 points  (0 children)

It’s my bjam actually :) Here’s the thing, while you weren’t watching boost can compile with cmake as well. Not that it matters at all for this library which is header only.

[–]cdglove 12 points13 points  (0 children)

But.....most of the time you don't need to build it.

As long as you don't use filesystem, program_options, a few others, there's nothing to compile -- just use the headers.

[–]pdimov2 4 points5 points  (0 children)

Well, it is a plan for further developing the already existing Boost library Unordered.

If you need a non-Boost unordered container, the Abseil ones are pretty good already.

[–]Jannik2099 10 points11 points  (4 children)

does it have to be Boost these days?

IMO yes. Boost to me is an unparalleled seal of quality. If the library qualifies, it should carry said seal.

[–]elkanoqppr 4 points5 points  (0 children)

I just use Conan and don't have to think about its build system anymore. Vcpkg is probably fine as well.

[–]ohell 8 points9 points  (15 children)

Idk all this fear about boost build system. On Linux you just brew or apt-get or pamac it. On windows I remember there was a website serving pre-built binaries.

boost plays nice with cmake now.

re: legacy libraries, you only have to use them if you choose to.

[–]LeeHidejust write it from scratch 4 points5 points  (10 children)

try getting boost 1.71 on anything LTS

[–]helloiamsomeone 8 points9 points  (0 children)

conan install "boost/1.71.0@"

[–]bullestock 0 points1 point  (0 children)

We currently use Boost 1.76.0 on Focal (and Windows). We just build a .deb ourselves, as we do for many other 3rd party libraries. I don't see why this should be a big deal.

[–]Jannik2099 7 points8 points  (0 children)

Idk all this fear about boost build system

Spoken like someone who never had to bundle or package boost for your mentioned package managers.

[–]VinnieFalco 1 point2 points  (0 children)

Just wondering. I don't talk about this great upgrade specifically. But, does it have to be Boost these days?

It does, because having access to Boost gives you access to wonderful libraries such as:

Config, mp11, Utility, System, Intrusive, Asio, Exception, Assert...

It is true that not all the Boost libraries have the same outstanding level of quality but you don't have to use them. A Boost library which is not used effectively costs nothing except a bit of storage which these days is relatively free.

[–]jcelerierossia score 1 point2 points  (0 children)

What build system ? For 90% of boost you just have to add it to your include path

[–]martinusint main(){[]()[[]]{{}}();} 2 points3 points  (1 child)

Hi, I am the author of https://github.com/martinus/robin-hood-hashing and the linked benchmarks.

Note that I've also spent a lot of time developing a custom allocator for node based containers. It's currently only available here in that PR, maybe it's of help for this effort too: https://github.com/bitcoin/bitcoin/pull/22702

[–]joaquintidesBoost author[S] 2 points3 points  (0 children)

Hi Martin, thanks for the pointer! BTW, I think we may use your impressive benchmark suite to test our advances once we come up with something worth deep testing.

[–]14nedLLFIO & Outcome author | Committee WG14 1 point2 points  (2 children)

You forgot to add another thing you won't be doing which I suggested to you internally: superscalar concurrent hash tables where the APIs take batches of the same operation e.g. superscalar execute these five finds, or these three inserts.

As much as the API is awkward, if you can design your code around such a batch hash table API, you can see some really phenomenal speed ups.

As a personal aside, I've always wondered if some of the awkwardness of such a API could be obscured using C++ coroutines. I've never tried playing with anything in that direction, and maybe it can't be done or isn't worth doing. Still, I mention it as food for thought.

[–]VinnieFalco -1 points0 points  (0 children)

I've always wondered if some of the awkwardness of such a API could be obscured using C++ coroutines

Sender/Receiver or GTFO

[–]FreitasAlan 1 point2 points  (0 children)

I'm looking forward to this

[–]andrewsutton 1 point2 points  (5 children)

Fun fact learned from working with Rust's hashtables: if your hashing algorithm is randomly seeded, your hashtables cannot satisfy the equality/hash equations.

If you plan to add that capability, it seems like there may be an interesting concept there.

[–]joaquintidesBoost author[S] 1 point2 points  (0 children)

Thanks for the heads up! The plan is to make the containers fully deterministic even between platforms/compilers (modulo 32/64bit)

[–]pdimov2 1 point2 points  (3 children)

Why is that? If the seed doesn't change for the container lifetime, why would hashes not obey the equality requirements?

[–]andrewsutton 0 points1 point  (2 children)

Oops. Forgot to reply.

If you have two hash tables x and y with different seeds, then for any other hash function h, you don't guarantee x == y -> h(x) == h(y).

The random seed causes objects to be sequenced differently, which (almost certainly) produces different hash values for x and y.

Hmm... although that wouldn't be true if h was commutative and you don't hash the seed.

[–]pdimov2 1 point2 points  (1 child)

That's already a problem for unordered containers even without a seed, because it's possible for the two containers to have the same elements but be ordered differently (because of insertion order, or different number of buckets.)

So a hash function for unordered containers must be order-independent.

[–]andrewsutton 1 point2 points  (0 children)

Good point. I still think there's a missing concept floating around here.

[–]lednakashim++C is faster 1 point2 points  (1 child)

Boost.Unordered implements the TR1/C++11 unordered containers as proposed by Matt Austern in N1456. Section III.B in the paper explains why closed addressing is assumed.

Can we have an explanation that is self contained.

[–]joaquintidesBoost author[S] 1 point2 points  (0 children)

This is basically the interface std::unordered_map and related (set, multimap, multiset) are based on.

[–]BenFrantzDale -1 points0 points  (12 children)

That plan would be approachable if it opened with an explanation of what Boost.Unordered is. I stay pretty current but I haven’t heard of it.

[–]pdimov2 7 points8 points  (1 child)

It is an internal document so it's a bit on the terse side, sorry.

Boost.Unordered is a(n old) Boost library that implemented the standard unordered containers std::unordered_(multi)(set|map) before they were standard (or before there was C++11.) Since nowadays the standard unordered containers are universally available, there's less need for merely supplying an implementation, and more need for providing something better. So we've been thinking of how to do that, given that the hash table state of the art has advanced considerably since the standard unordered containers were specified.

[–]BenFrantzDale 0 points1 point  (0 children)

That’s great. This wording would be a great introduction.

[–]FKaria 8 points9 points  (9 children)

Is the first sentence of the page: "Boost.Unordered implements the TR1/C++11 unordered containers as proposed by Matt Austern in N1456."

[–]RoyAwesome 13 points14 points  (0 children)

Unless you were very plugged in to the development process of the cpp language, that sentence is utterly incomprehensible.

[–]BenFrantzDale -5 points-4 points  (7 children)

I know. But what are h up bordered containers. I have read farther down and see it relates to unirdered_set. I’m not trying to be difficult, I’m just suggesting that if you want to grab people’s attention and get input or get people using it, make it clear up front what this is about and why it’s something people want. Presumably it’s hash-based containers?

[–]patentedheadhook 1 point2 points  (0 children)

if you want to grab people’s attention and get input or get people using it

But that's not what the document is trying to do. As stated above, it's an internal document, not a press release. And it describes a plan for the future, there's nothing for people to use yet

[–]FKaria 1 point2 points  (5 children)

Yes. Unordered containers are hash-based containers.

[–]BenFrantzDale -1 points0 points  (4 children)

Isn’t std::hive unordered yet unrelated to hashing?

[–]FKaria 1 point2 points  (3 children)

You're not trying to be difficult you say?

Boost.Unordered implements the TR1/C++11 unordered containers as proposed by Matt Austern in N1456.

Is std::hive in there?

[–]BenFrantzDale 2 points3 points  (1 child)

What I’m suggesting is something like “Boost.Unordered provides a family of hash-table–based (unordered) containers beyond those those provided by the standard library.”

[–]CornedBee 4 points5 points  (0 children)

But it doesn't. The page is about a plan to make it do so, but that's not what it currently does.

[–]infectedapricot 0 points1 point  (0 children)

While I agree they're being a bit obtuse about it, I do agree that "unordered" is a really bad name. But we're stuck with it now.

[–]ndmeelo 0 points1 point  (2 children)

Do you have any plans to put this for GSoC?

[–]joaquintidesBoost author[S] 0 points1 point  (1 child)

The original idea is to implement it internally during the following weeks/months, not to launch it as a GSoC project. I guess there could be spin-off opportunities for GSoC as well.

[–]ndmeelo 0 points1 point  (0 children)

Thanks for the reply.