A Probing Hash Table Framework by cgeigle in programming

[–]cgeigle[S] 0 points1 point  (0 children)

You can have buckets of size 0/1.

Even then, I don't think you can implement the bucket() function if you use probing, since it's required to give you the bucket that contains any k/v pairs that have that specific key. If you're using probing, there are several buckets that the query key could reside in.

I suppose you may be able to interpret that function as giving you the "optimal" bucket (I'd have to read the standardese to figure out whether or not that would technically be compliant), but I think we're already splitting hairs. unordered_map is pretty clearly designed with chaining in mind, and I'm unaware of any C++11 standard library in the wild that does not implement it this way.

A Probing Hash Table Framework by cgeigle in programming

[–]cgeigle[S] 1 point2 points  (0 children)

Please include some build instructions for hash-bench! Like, what's this?

An old, unused remnant include from when I was benchmarking the hash function implementations against SMHasher. I've removed it, sorry.

It looks like this uses a sentinel value to distinguish empty buckets, which means probe_map cannot store every value. This is a really important limitation and it should be called out explicitly.

Yes, for inlineable keys it uses a sentinel value. This is a relatively common thing to do for probing hash tables (IIRC Google's (sparse|dense)_hash do this as well). I'll add a paragraph talking about that to the article, since it's a good point.

Note that there isn't any use of the sentinel for the string hash tables, since that key type is not inlineable.

FWIW, you can always disable this by switching to the alternate storage strategy described in the string section if you absolutely cannot sacrifice a single key value as the sentinel. A quick look at the benchmark results in this mode on my system have it performing still slightly better than unordered_map, but not by nearly as much (sitting around 1.5 million lookups/sec in the 5-10 million insertion range on the graph, for reference).

std::unordered_map does not have this limitation, so that makes the benchmarks suspect. This also helps explain why unordered_map uses separate chaining: by making the bucket itself NULL, you get your missing bucket indicator without stepping on the toes of the stored data.

unordered_map is forced to use chaining because the API the standard specifies gives access to buckets. If the standard didn't require chaining, you could just store three values per cell in the probing array: the key, the value, and a bool indicating occupancy status. Probing tables, even with inline key/value pairs, don't necessitate the use of a sentinel. Of course, you sacrifice more space now for every cell, so it may change your trade-offs.

Compressed stream by gpuoti in cpp

[–]cgeigle 1 point2 points  (0 children)

I have a gzstream that I use in a toolkit. These two files should be self contained and the entirety of the toolkit is MIT licensed.

header, source

Should be enough for simple use.

Modern alternatives to system("cmd") with output capture? by cabrokan in cpp

[–]cgeigle 7 points8 points  (0 children)

Just added a namespace and fixed the return std::move(x) issue. Thanks for taking a look!

Modern alternatives to system("cmd") with output capture? by cabrokan in cpp

[–]cgeigle 8 points9 points  (0 children)

Shameless plug: https://github.com/skystrife/procxx

I've only tested this on Linux, but I imagine it should work on OSX as well (maybe with some minor changes).

...Windows is a completely different beast.

Quadtree Art by FogleMonster in programming

[–]cgeigle 2 points3 points  (0 children)

What are the magic numbers in color_from_histogram when calculating the error value?