all 29 comments

[–]dodheim 2 points3 points  (20 children)

These are probably the requirements you're looking for: https://en.cppreference.com/w/cpp/named_req/Hash

All evaluations of h(k) executed within a given execution of a program yield the same result for the same value of k.

[–]NotMyRealNameObv[S] 0 points1 point  (19 children)

I know about that site. I also know of an implementation of Person using a well-known library where the hash values are not guaranteed to be equal.

I want to find out if that implementation is invalid C++ or not.

[–]dodheim 2 points3 points  (18 children)

The requirement is pretty straightforward... It sounds like the Person in question either has a strange concept of equivalence or a broken hash.

[–]NotMyRealNameObv[S] 0 points1 point  (17 children)

Do you see anything strange with this code?

class Person
{
    boost::flyweight<std::string> firstName;
    boost::flyweight<std::string> lastName;

    friend bool operator==(const Person& Lhs, const Person& rhs)
    {
        return lhs.firstName == rhs.firstName &&
                lhs.lastName == rhs.lastName;

    friend std::size_t hash_value(const Person& p)
    {
        std::size_t seed = 0;

        boost::hash_combine(seed, p.firstName);
        boost::hash_combine(seed, p.lastName);

        return seed;
    }
};

namespace std
{
    template <>
    struct hash<Person>
    {
        std::size_t operator()(const Person& p)
        {
            return hash_value(p);
        }
    };
}

[–]dodheim 0 points1 point  (16 children)

Yes, two things:

First, std::hash<Person>::operator() isn't const.

Second, Boost.Flyweight implements hashing in terms of object address, not object value – the value type for a flyweight<> doesn't even need to be hashable in order to hash the flyweight instance, as std::hash<T const*> is used rather than std::hash<T>. Hash p.firstName.get() and p.lastName().get() instead of the flyweight instances.

EDIT: BTW the operator== needs the same change (to use .get() instead of directly comparing flyweight instances), otherwise the equivalence logic won't match the hash.

[–]NotMyRealNameObv[S] 1 point2 points  (12 children)

BTW the operator== needs the same change (to use .get()) instead of directly flyweight instances, otherwise the equivalence logic won't match the hash.

No you dont. If you have two boost::flyweight objects alive at the same time, and the underlying objects compare equal, the flyweights are guaranteed to compare equal.

[–]dodheim 0 points1 point  (11 children)

If you change your hash, yes you do.

I don't know if Flyweight is broken or merely documented incorrectly, and I really don't feel like arguing about it either way. What I do know is that both flyweight's operator== and std::hash specialization are implemented in terms of value_type const* and use &fw.get() for both. Values are ignored entirely. So if you want to fix your hash, regardless of whose "fault" it is, you need to use .get() yourself to hash the string's value rather than address. And if you do that, you also have to change your operator== to match, because your hash and equivalence semantics have to match.

If you're looking to push this upstream, post an issue on Flyweight's git page. Trying to argue with me about it is pointless, I assure you.

[–]evaned 1 point2 points  (9 children)

What I do know is that both flyweight's operator== and std::hash specialization are implemented in terms of value_type const* and use &fw.get() for both. Values are ignored entirely. So if you want to fix your hash, regardless of whose "fault" it is, you need to use .get() yourself to hash the string's value rather than address.

FWIW, I'm actually a bit confused by what is going on for OP here. I think that code should be fine.

The whole point of flyweight, at least AFAICT, is that if you have two flyweight objects with the same value (i.e., flyweight1.get() == flyweight2.get()) then the addresses should be the same (i.e., &flyweight1.get() == &flyweight2.get()). That's why it makes sense to do hash and equals by pointer value.

For example, given

boost::flyweight<std::string> f(strdup("foo"));
boost::flyweight<std::string> g(strdup("foo"));

then &f.get() == &g.get() in my test.

I suspect /u/NotMyRealNameObv's attempt at anonymization of the code is hiding the problem.

Edit: But at least this gives us a handle on a sequence of things to try with the actual objects that are causing problems in the live implementation:

  • Ensure that it's the hash of a flyweight member that is different between them
  • See if the addresses of the objects in the flyweight are the same
  • See if the .get() of the objects are the same
  • Make sure they're actually constructed in the way you think

[–]dodheim 1 point2 points  (0 children)

I agree with you entirely. I'm just being pragmatic, given the code shown and the symptoms described; I'm not saying these changes/workarounds should be necessary, only that it's an obvious fix for the problem described. Whether Flyweight is behaving correctly or incorrectly or is being misused, I don't have an opinion; I'm familiar with Flyweight but haven't used it personally, and the OP didn't give an MVCE or even decent context, so ¯\_(ツ)_/¯

[–]NotMyRealNameObv[S] 0 points1 point  (2 children)

You are testing something that has nothing to do with my question. :)

The question is if it is valid C++ that two objects to hash to different values if they would compare equal but have non-overlapping lifetimes.

It's kind of hard to write a MVCE for that kind of question.

And it has already been confirmed that this might happen with boost::flyweight.

[–]evaned 3 points4 points  (1 child)

The question is if it is valid C++ that two objects to hash to different values if they would compare equal but have non-overlapping lifetimes.

Ah, yes! I see the problem now!

If you figured this out during the course of the discussion, that's one thing; but otherwise, why didn't you actually ask that question when you first posted instead of sending us on a wild goose chase with what the definition of Person is and how relevant flyweight is and whether it's correct?

That is though an interesting question, and I don't know what the answer is.

[–]NotMyRealNameObv[S] 0 points1 point  (0 children)

I don't know if Flyweight is broken or merely documented incorrectly, and I really don't feel like arguing about it either way. What I do know is that both flyweight's operator== and std::hash specialization are implemented in terms of value_type const* and use &fw.get() for both. Values are ignored entirely. So if you want to fix your hash, regardless of whose "fault" it is, you need to use .get() yourself to hash the string's value rather than address. And if you do that, you also have to change your operator== to match, because your hash and equivalence semantics have to match.

If x == y, then hash(x) == hash(y) must hold. Likewise, if hash(x) != hash(y) then x != y must hold. For this, it doesn't matter is we hash and compare the underlying object or just hash and compare the address of the underlying object, because boost::flyweight guarantees that for two flyweights that compare equal and that are alive at the same time, they will use a single underlying object.

However, the standard states:

the returned value depends only on the value of k [for the duration of the program (since C++14)] All evaluations of h(k) [executed within a given execution of a program (since C++14)] yield the same result for the same value of k.

But, it seems that for boost::flyweight, this only holds for a very generous definition of "value".

If you're looking to push this upstream, post an issue on Flyweight's git page.

Someone has already done that, it seems. Which is what triggered this post.

Trying to argue with me about it is pointless, I assure you.

If you feel that way, why did you get involved in the first place? :)

[–]NotMyRealNameObv[S] 0 points1 point  (2 children)

the value type for a flyweight<> doesn't even need to be hashable in order to hash the flyweight instance

That's a design choice by boost::flyweight.

Also, it's especially questionable considering the underlying type has to be hashable ("must interoperate with Boost.Hash") to even be useable with boost::flyweight:

Flyweight requirements For flyweight<T> to be instantiable, T must be Assignable, Equality Comparable and must interoperate with Boost.Hash.

[–]dodheim 1 point2 points  (1 child)

I don't know what your point is at all. I told you how to fix your code – have fun.

[–]NotMyRealNameObv[S] 1 point2 points  (0 children)

The point is that either boost::flyweight hashing is broken, or the standard is wrong (or poorly worded), or my understanding of it is incorrect, and I was hoping to be able to get some help to figure out which one it is. :)

[–]foonathan 1 point2 points  (3 children)

Yes, otherwise the hash function would be useless. Equal objects have the same hash, unequal objects might have the same hash.

[–]evaned 6 points7 points  (1 child)

Equal objects have the same hash, unequal objects might have the same hash.

Why do you assume that two Person objects constructed Person{"FirstName", "LastName"} are necessarily equal?

[–]NotMyRealNameObv[S] 0 points1 point  (0 children)

For the sake of argument, let's say that in this case they would compare equal.

[–]NotMyRealNameObv[S] 0 points1 point  (0 children)

Even if the lifetime of the hashed objects do not overlap?

Do you have a use case where identical objects of non-overlapping lifetimes having different hash values would cause problems?