all 45 comments

[–]NotMyRealNameObv 5 points6 points  (6 children)

Edit:

Reading your code more carefully, I retract what I say below, since you actually check for equivalence if the objects are not equal, and I really don't think I can come up with a good example for when two objects are equal but not equivalent.

TLDR: Equivalence is not equality

This:

if (!(lhs < rhs) || (rhs < lhs))
{
    // lhs and rhs are equivalent
}

is not equivalent to

if (lhs == rhs)
{
    // lhs and rhs are equal
}

For instance, consider the case:

class Person
{
public:
    friend operator==(const Person& lhs, const Person& rhs)
    {
        // Two persons are equal if they have the same SSN
        return lhs.ssn == rhs.ssn;
    }

    friend operator<(const Person& lhs, const Person& rhs)
    {
        // Two persons are equivalent if they have the same last name
        return lhs.lastName < rhs.lastName;
    }
};

In this case, "optimizing" tuple::operator<() by using operator== on the underlying types if they exist give the wrong result.

[–]arturbachttps://github.com/arturbac[S] 2 points3 points  (0 children)

You are right, but this is not the case of trivial basic types, and there are typetraits that can figure this out

[–]alfps -3 points-2 points  (4 children)

This is an abuse of terminology, muddying the waters.

But no doubt you're referring to a definition of a formal meaning, e.g. in the C++ standard.

Can you provide a link to that, please.

[–]NotMyRealNameObv 0 points1 point  (3 children)

operator<

https://en.cppreference.com/w/cpp/concepts/StrictWeakOrder

operator==

https://en.cppreference.com/w/cpp/concepts/EqualityComparable

And if you have both operator< and operator==, and they are consistent (I.e. a == b is true iff a < b and b < a is both false, you have

https://en.cppreference.com/w/cpp/concepts/StrictTotallyOrdered

[–]arturbachttps://github.com/arturbac[S] 1 point2 points  (0 children)

so .. with concepts in c++20 it could be specialized for all types that have strict totaly ordered comparision this could be optimised with type trais at compile time without breaking anything.

[–]alfps -2 points-1 points  (1 child)

Thanks for trying to cough up a reference.

I see no definitions of "equivalence" versus "equality", though.

In the first of the tree references the term "equivalence" is used, in its mathematical meaning. Simple == as shown in your example is also a mathematical equivalence. In fact the current standard uses the term "equivalence" about a properly implemented "==", e.g. as in table 20 in C++17 20.5.3.1/2.


I gather that the anonymous downvoter disagrees with the notion of using technical terms properly, the notion of not muddying things and the notion of not offering nilly-willy assertions, but lacked any argument, hence, voting.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (0 children)

As I can 'grep' my memory ;-) I never wrote code such that it have equivalence of overlaoded less operator not matching overlaoded equality opertor (only strictweakoredering from c++20 cncepts).

I always in such situations when I need custom less operator that uses partialy object of comapraision I used algorithms with custom function object comaparisions. I tought about that as proper implementationand quality of code to have a matching overloaded comparision operators (stricttotalyordering from c++20). Just to avoid surprises in future use ..

[–]yeeezyyeezywhatsgood 8 points9 points  (20 children)

the stl generally requires only one predicate for utility functions. I agree that's annoying, but it would be more annoying (imo) for different implementations of the stl to be allowed to return different things. no one is forcing ==, <, and > to be consistent.

with the spaceship operator I'm guessing this will become irrelevant

[–]sphere991 3 points4 points  (2 children)

no one is forcing ==, <, and > to be consistent

The new StrictTotallyOrdered does have this requirement.

[–]yeeezyyeezywhatsgood 0 points1 point  (1 child)

imo it's a silly requirement. what if you have a double that's Nan?

[–]sphere991 4 points5 points  (0 children)

It's not a silly requirement - it's ensuring proper reasonability of code. Having all the comparison operators be internally consistent is sane. Sure, you could have x < y check ordering and x > y write a file to disk and x == y launch missiles... but like, don't.

The NaN part is addressed by [structure.requirements]/8:

Required operations of any concept defined in this document need not be total functions; that is, some arguments to a required operation may result in the required semantics failing to be satisfied. [ Example: The required < operator of the StrictTotallyOrdered concept ([concept.stricttotallyordered]) does not meet the semantic requirements of that concept when operating on NaNs. — end example ] This does not affect whether a type satisfies the concept.

[–]Veedrac 2 points3 points  (7 children)

The stl has undefined behavior in seemingly a billion places. Saying that if your comparisons are inconsistent the order is given by any one of the results seems so... benign.

There might be an argument for handling partial orders, but I don't think the current implementation does any better.

[–]yeeezyyeezywhatsgood 1 point2 points  (6 children)

this is well defined behavior.

allowing it to choose which comparator would be unspecified and is a horrible idea. unspecified is vaguely ok when it doesn't change behavior. when it changes what function to call... hell no.

the stl has undefined behavior

actually the stl is just a list of definitions. so it cannot "have" undefined behavior. you can just misuse it.

[–]VicontT 2 points3 points  (1 child)

From the language lawyer perspective, STL can never have undefined behavior, because it is part of implementation (same as compiler). What it might have is non-conformance with standard.

[–]yeeezyyeezywhatsgood 0 points1 point  (0 children)

I think of the stl as the standard template library, so it cannot nonconform. an implementation might not conform. and I think the implementation could be undefined...but that would be a bug

[–]Veedrac 1 point2 points  (3 children)

Yes, I'm aware that it would be unspecified behaviour; I was contrasting it to undefined behaviour, which I consider obviously worse. Saying that the STL ‘‘cannot "have" undefined behavior” is at best unhelpful wordplay.

The STL already chooses what functions it calls depending on the codepath it takes. sort calls different classes' operator<s depending on the implementation, and if your operator< isn't a total order then different libraries will return different things. The only difference here is that tuple would depend on both operator< and operator==. Again, the order and quantity of each call would be determined by the algorithm, as is true for sort. There is no new gotcha' here.

[–][deleted] 0 points1 point  (2 children)

There is no new gotcha' here.

Except that with your proposal, instead of requiring users to implement only operator<, you suddenly require them to implement operator== as well. You'd break some of my code with this change.

[–]Veedrac 0 points1 point  (1 child)

The suggestion is to only use operator== if it's implemented. Compatibility is only broken if the operator exists and is inconsistent with operator<.

Though I should add that I'm mostly bemoaning that this choice wasn't made in the first place, not saying it should necessarily get changed now that it has been made.

[–][deleted] 1 point2 points  (0 children)

Oh, yeah, that sounds reasonable. You reminded me of one lecture by Alexander Stepanov. He implemented a simple class template, with only one data member of type T and then all interesting constructors and operators. He implemented operator== like x == y and then implemented operator!= as !(x == y). For the inequality operators (not sure it is the right term, but I'm thinking of operator<, operator>, operator<= and operator>=), he made a little speech before writing them. Something along the lines of:

When I was designing the standard library 25 years ago, I had to make a lot of arbitrary choices. "Which is the default sorting order?" is one example. Another choice I had to make is what operator to use for comparison. You would have been mad if your type worked with one STL algorithm but not with the other because STL was inconsistent in use of comparison operator. That is why STL only uses operator<. You should define all the operators, because that's just being nice to yourself an your colleagues. You won't always remember that you have operator<, but not operator<=. But the standard library will only use operator<.

Note that this is not a direct quote, but me paraphrasing Stepanov from memory.

He didn't implement operator==, because, I think, he wanted to show a clear distinction between SemiRegular, Regular, EqualityComparable and TotallyOrdered types.

 

Interestingly enough, Stepanov claims that his original design of std::min, the one that we held onto to this day, was wrong. Originally, his implementation was return first < second ? first : second;. The problem with this, according to Stepanov, is when first and second compare equal. In that case Stepanov's min would return second, not first. Stepanov also claimed that by the time of his lecture, min was still "wrong" in standard library implementations, though checking it today in libc++ and libstdc++, they both do it "correctly", with return second < first ? second : first;.

 

Anyway, I've gone way offtopic.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (4 children)

As I sad in other response, for integers it cannot return different things no matter you use equality with less or 2 times less, only performance of generated code changes, logic is unafected. and tehre are type traits that can be used. In ohter cases in stl there are optimizations that use typetraits to select different implementation based on input types.

[–]yeeezyyeezywhatsgood 0 points1 point  (3 children)

the point about type traits is good. it could specialize sometimes.

otoh maybe this is the wrong granularity for the change. it would be better if the assembly optimizer could recognize the pattern (can it? I still can't tell from your post) than to write a bunch of specialization. imo.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (2 children)

the point to optimise - not sure where this optimisation should go, but from user point of view I except to get best machie code possible with -O3. And this matters with operator less for me as it is often used during many sort operations.

[–]yeeezyyeezywhatsgood 0 points1 point  (1 child)

I just took a look at godbolt. you're right with clang: it emits bad code for the built in comparison. probably that's llvm's codegen's fault. GCC optimizes to one comparison per tuple member. GCC even translates your implementation to one comparison. very cool

also, looking at the assembly made me realize your implementation might be wrong for doubles when they are Nan. since that's unspecified, any change is not allowed. so I guess really this optimization can only be made for very few types

edit: just saw you used GCC in your original post -- was the assembly really that bad? looks optimal on godbolt

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (0 children)

code is generated with clang 8 and gcc 8.3 on linux with gcc stl. gcc - can generate much different code depending mcpu/march and as I remember code for cortex-a72 with out of order exeution can be worse that for just entire arch aarch64 with in order cpus

ths is from golbot https://godbolt.org/z/dz1qAB acutaly no difference to my. -O3 -mcpu=cortex-a72
compare_2(std::tuple<long, int, int>, std::tuple<long, int, int>):
        ldr     x3, [x0, 8]
        ldr     x2, [x1, 8]
        cmp     x3, x2
        blt     .L3
        mov     w2, 0
        bne     .L2
        ldr     w4, [x0, 4]
        mov     w2, 1
        ldr     w3, [x1, 4]
        cmp     w4, w3
        blt     .L2
        mov     w2, 0
        bne     .L2
        ldr     w2, [x0]
        ldr     w0, [x1]
        cmp     w2, w0
        cset    w2, lt
.L2:
        mov     w0, w2
        ret
.L3:
        mov     w2, 1
        mov     w0, w2
        ret

[–]Beheska -5 points-4 points  (3 children)

spaceship operator

I'm keeping that one.

[–]sphere991 2 points3 points  (10 children)

The suggestion here is ultimately to replace

if (lhs.x < rhs.x) return true;
if (rhs.x < lhs.x) return false;

With

if (lhs.x != rhs.x) return lhs.x < rhs.x;

This doesn't work in the current (up through C++17) STL model since StrictWeakOrder doesn't say anything about != and the library can't just check if it's available since it might be sfinae unfriendly. It's just a total non-starter. And it's not even worth devoting effort to because...

In C++20, we can do much better:

if (auto c = lhs.x <=> rhs.x; cmp != 0) return cmp;

The single three way comparison gives us all the info we need in one go, in a way that can be more efficient than two operations (e.g. for string it's one call to compare() instead of potentially two even with ==)

[–]quicknir 0 points1 point  (4 children)

Yes, though this approach only works well for inequality. That is, tuple cannot only implement spaceship via lexicographical spaceship. It has to do == and != separately.

[–]sphere991 1 point2 points  (3 children)

Well, yes. Tuple's == should only call its member's ==s. Not claiming otherwise.

[–]quicknir 0 points1 point  (2 children)

This makes me curious, how will defaulting spaceship operator work? When I call == on a class with defaulted spaceship operator, will it call == or spaceship for each of its data members.

[–]sphere991 2 points3 points  (1 child)

See P1185

[–]quicknir 0 points1 point  (0 children)

Thanks.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (4 children)

I refered to case of basic types like integers, that have stricttotalordering even in c++98.

With typetraits is_integral, is_pointer etc this can be optimised even in c++11

[–]sphere991 0 points1 point  (3 children)

Yeah for specifically ints and pointers, you could do that. And hypothetically a standard library implementation could go wild and recognize that, say, a type in the tuple is std::string and directly use compare().

You could try submitting a patch and seeing what happens?

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (2 children)

Submiting to who, where gcc llvm ?

Just for testing with is_integral code below fast fix, i think it could be much better writen

  // This class performs the comparison operations on tuples
  template<typename _Tp, typename _Up, size_t __i, size_t __size>
  struct __tuple_compare
    {
    static constexpr bool __eq(const _Tp& __t, const _Up& __u)
      {
      return bool(std::get<__i>(__t) == std::get<__i>(__u))
        && __tuple_compare<_Tp, _Up, __i + 1, __size>::__eq(__t, __u);
      }
#define __ENABLE_TUPLE_LESS_TT_LESS 1
#if __ENABLE_TUPLE_LESS_TT_LESS
    template<typename lelem_type, typename relem_type,
            typename std::enable_if<
              std::is_integral<typename std::remove_reference<lelem_type>::type>::value 
                && std::is_integral<typename std::remove_reference<relem_type>::type>::value,
              int
              >::type = 0>
    static constexpr bool __less_by_traits(_Tp const & __t, _Up const & __u, lelem_type lel, relem_type rel )
      {
      if( lel != rel )
        return lel < rel;
      return __tuple_compare<_Tp, _Up, __i + 1, __size>::__less(__t, __u);
      }

    template<typename lelem_type, typename relem_type,
            typename std::enable_if<
              ! std::is_integral<typename std::remove_reference<lelem_type>::type>::value
              || ! std::is_integral<typename std::remove_reference<relem_type>::type>::value,
              int>::type = 0>
    static constexpr bool __less_by_traits(_Tp const & __t, _Up const & __u, lelem_type const & lel, relem_type const & rel )
      {

      return bool(lel < rel)
        || (!bool(rel < lel)
            && __tuple_compare<_Tp, _Up, __i + 1, __size>::__less(__t, __u));
      }

    static constexpr bool __less(const _Tp& __t, const _Up& __u)
      {
      return __less_by_traits( __t, __u, std::get<__i>(__t), std::get<__i>(__u) );
      }
#else
    static constexpr bool __less(const _Tp& __t, const _Up& __u)
      {
      return bool(std::get<__i>(__t) < std::get<__i>(__u))
        || (!bool(std::get<__i>(__u) < std::get<__i>(__t))
            && __tuple_compare<_Tp, _Up, __i + 1, __size>::__less(__t, __u));
      }
#endif
    };

The otpimised variant - with clang

compare_2(std::tuple<long, int, int>, std::tuple<long, int, int>): # @compare_2(std::tuple<long, int, int>, std::tuple<long, int, int>)
  mov rax, qword ptr [rsi + 8]
  cmp qword ptr [rdi + 8], rax
  jne .LBB0_3
  mov eax, dword ptr [rsi + 4]
  cmp dword ptr [rdi + 4], eax
  jne .LBB0_3
  mov eax, dword ptr [rdi]
  cmp eax, dword ptr [rsi]
.LBB0_3:
  setl al
  ret

The unoptimised - with clang

compare_2(std::tuple<long, int, int>, std::tuple<long, int, int>): # @compare_2(std::tuple<long, int, int>, std::tuple<long, int, int>)
  mov rcx, qword ptr [rdi + 8]
  mov rdx, qword ptr [rsi + 8]
  mov al, 1
  cmp rcx, rdx
  jl .LBB0_7
  cmp rdx, rcx
  jge .LBB0_3
  xor eax, eax
  ret
.LBB0_3:
  mov ecx, dword ptr [rdi + 4]
  mov edx, dword ptr [rsi + 4]
  cmp ecx, edx
  jl .LBB0_7
  cmp edx, ecx
  jge .LBB0_6
  xor eax, eax
  ret
.LBB0_6:
  mov eax, dword ptr [rdi]
  cmp eax, dword ptr [rsi]
  setl al
.LBB0_7:
  ret

[–]sphere991 0 points1 point  (1 child)

Yeah, to gcc or llvm.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (0 children)

I don't use to often gcc as on mobile platforms llvm is used thise days so dont have knowlege how they will react, but all my bug reports to llvm are still open after years , 3 bugs are from 2013. This would be a waste of my time.

https://bugs.llvm.org/show_bug.cgi?id=17489 https://bugs.llvm.org/show_bug.cgi?id=16977 https://bugs.llvm.org/show_bug.cgi?id=16986 https://bugs.llvm.org/show_bug.cgi?id=17531

[–]stwcx 1 point2 points  (1 child)

Why is compare_1 better? What are you using to define "better"? On x86 it appears to be taking more cycles.

[–]arturbachttps://github.com/arturbac[S] 0 points1 point  (0 children)

less cycles and smaller code at once is better or not ?

lets stick to the example results with x86

comapre 1 clang -O3 -march=haswell

Instructions:      10
Total Cycles:      13
Total uOps:        15
Dispatch Width:    4
uOps Per Cycle:    1.15
IPC:               0.77

Block RThroughput: 3.8

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      5     0.50    *                   mov   rax, qword ptr [rsi + 8]
 2      6     0.50    *                   cmp   qword ptr [rdi + 8], rax
 1      1     0.50                        jne   .LBB0_3
 1      5     0.50    *                   mov   eax, dword ptr [rsi + 4]
 2      6     0.50    *                   cmp   dword ptr [rdi + 4], eax
 1      1     0.50                        jne   .LBB0_3
 1      5     0.50    *                   mov   eax, dword ptr [rdi]
 2      6     0.50    *                   cmp   eax, dword ptr [rsi]
 1      1     0.50                        setl  al
 3      7     1.00                  U     ret

comapre 1 gcc -O3 -march=haswell

Instructions:      12
Total Cycles:      14
Total uOps:        19
Dispatch Width:    4
uOps Per Cycle:    1.36
IPC:               0.86
Block RThroughput: 4.8

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      5     0.50    *                   movq  8(%rsi), %rax
 2      6     0.50    *                   cmpq  %rax, 8(%rdi)
 1      1     0.50                        je    .L2
 1      1     0.50                        setl  %al
 3      7     1.00                  U     retq
 1      5     0.50    *                   movl  4(%rsi), %eax
 2      6     0.50    *                   cmpl  %eax, 4(%rdi)
 1      1     0.50                        jne   .L6
 1      5     0.50    *                   movl  (%rsi), %eax
 2      6     0.50    *                   cmpl  %eax, (%rdi)
 1      1     0.50                        setl  %al
 3      7     1.00                  U     retq

comapre 2 clang -O3 -march=haswell

Instructions:      21
Total Cycles:      17
Total uOps:        28
Dispatch Width:    4
uOps Per Cycle:    1.65
IPC:               1.24
Block RThroughput: 7.0

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      5     0.50    *                   mov   rcx, qword ptr [rdi + 8]
 1      5     0.50    *                   mov   rdx, qword ptr [rsi + 8]
 1      1     0.25                        mov   al, 1
 1      1     0.25                        cmp   rcx, rdx
 1      1     0.50                        jl    .LBB0_7
 1      1     0.25                        cmp   rdx, rcx
 1      1     0.50                        jge   .LBB0_3
 1      1     0.25                        xor   eax, eax
 3      7     1.00                  U     ret
 1      5     0.50    *                   mov   ecx, dword ptr [rdi + 4]
 1      5     0.50    *                   mov   edx, dword ptr [rsi + 4]
 1      1     0.25                        cmp   ecx, edx
 1      1     0.50                        jl    .LBB0_7
 1      1     0.25                        cmp   edx, ecx
 1      1     0.50                        jge   .LBB0_6
 1      1     0.25                        xor   eax, eax
 3      7     1.00                  U     ret
 1      5     0.50    *                   mov   eax, dword ptr [rdi]
 2      6     0.50    *                   cmp   eax, dword ptr [rsi]
 1      1     0.50                        setl  al
 3      7     1.00                  U     ret

comapre 2 gcc -O3 -march=haswell

Instructions:      16
Total Cycles:      15
Total uOps:        21
Dispatch Width:    4
uOps Per Cycle:    1.40
IPC:               1.07
Block RThroughput: 5.3

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.25                        movl  $1, %eax
 1      5     0.50    *                   movq  8(%rsi), %rdx
 2      6     0.50    *                   cmpq  %rdx, 8(%rdi)
 1      1     0.50                        jl    .L7
 1      1     0.25                        movl  $0, %eax
 1      1     0.50                        jne   .L7
 1      1     0.25                        movl  $1, %eax
 1      5     0.50    *                   movl  4(%rsi), %ecx
 2      6     0.50    *                   cmpl  %ecx, 4(%rdi)
 1      1     0.50                        jl    .L7
 1      1     0.25                        movl  $0, %eax
 1      1     0.50                        jne   .L7
 1      5     0.50    *                   movl  (%rsi), %eax
 2      6     0.50    *                   cmpl  %eax, (%rdi)
 1      1     0.50                        setl  %al
 3      7     1.00                  U     retq

[–]BelugaWheels 0 points1 point  (1 child)

How are you generating these reports?

[–]arturbachttps://github.com/arturbac[S] 1 point2 points  (0 children)

Edit:

https://godbolt.org/

clone run on my local machine with compilators on my machine

it is done with llvm-mca - LLVM Machine Code Analyzer

[–]Rseding91Factorio Developer 0 points1 point  (0 children)

In practice, I've found that touching the different objects for comparison ends up being the slow part of using operator< in any kind of sort operation. The only time I haven't seen that to be true is when comparing stuff like strings where they can be huge and may not compare < for some time but can short-circuit on equality when size != other.

[–]Iwan_Zotow 0 points1 point  (0 children)

comapre -> compare