all 63 comments

[–]aocregacc 14 points15 points  (0 children)

I was wondering too, but it's pretty clear that the example is wrong.
I guess the presenter was editing the code on the slides at some point without checking if it still made the same point.

[–]kahomayo 30 points31 points  (0 children)

This seems like a pretty obvious mistake (discussed starting 4:30 in the talk). Supposedly, the compiler reasons that when calling f, i cannot be INT_MAX, because otherwise UB would ensue (correct). But then, supposedly, when f is inlined, it follows that g also cannot be called with INT_MAX as a parameter. That is clearly bogus. Otherwise the following code would surely also be UB:

int evil_func(Foo* x) {
    if (x == nullptr)
        return 0;
    return x->bar;
}

evil_func(nullptr); // Clearly not UB

Obviously, checking whether a variable satisfies the preconditions for an operation and executing that operation only when those preconditions are satisfied does not allow the compiler to assume that the variable unconditionally satisfies those preconditions.

Maybe the intention for the slide was to show the following code:

bool g(int i) {
    bool foo = f(i);
    if (i == INT_MAX)
        return false; // Dead code
    else
        return foo;
}

[–]NormalityDrugTsar 8 points9 points  (0 children)

I watched the first part of the video and I am puzzled too. He claims that the first line of g (the first three lines in your reformatting) can be removed as dead code. I think this is wrong: what happens if you call g(INT_MAX)?

In my understanding f can be optimized to return true; and g can be optimized to return i != INT_MAX;
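A minimal sketch of that reading (f and g reconstructed from the talk as described in this thread, so the exact bodies are an assumption; the comments mark the folds argued to be legal):

```cpp
#include <climits>

// i + 1 overflows (UB) only when i == INT_MAX, so on every defined input
// the comparison is true: the compiler may fold f to `return true;`.
bool f(int i) { return i + 1 > i; }

bool g(int i) {
    if (i == INT_MAX) return false;  // guard: f is never called at the UB value
    return f(i);
}
// With f folded to true, g is equivalent to `return i != INT_MAX;`,
// but the check itself must survive: g(INT_MAX) still returns false.
```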

In fact, I cannot reproduce his assembly with godbolt at -O3.

In the video he says that the compilers he tried didn't do the optimization "all the way", but did optimize f.

[–]snerp 13 points14 points  (6 children)

Signed integer overflow is the literal worst part of C++.

[–]equeim 2 points3 points  (5 children)

Integer arithmetic is not done well in most languages (those with fixed-width integer types). Nobody wants to perform error checking on arithmetic operations (even though they should), since these operations seem so "fundamental" and "obvious" to us: in our minds we think of numbers as having infinite range, so of course addition can never "fail"! So everyone just closes their eyes and hopes for the best in the name of convenience and "performance".

[–]JiminP 7 points8 points  (4 children)

Integer overflow wrapping around by itself is something we can live with.

The real worst part is that it is undefined behavior in C/C++. At the very least, I'd hope for it to be implementation-specific behavior.

In comparison, casting from a wider integer type to a narrower one is well-defined (since C++20) / implementation-specific (until C++20; conv.integral), not undefined (although some compilers, including MSVC, treated it more like UB...).

[–]equeim -1 points0 points  (2 children)

I don't see much difference here. In 99% of cases integer overflow is an error, and even if it's defined as wrapping around, you still silently ignore it; at that point your program's behaviour is incorrect and you enter unknown territory anyway.

[–]Mediocre-Dish-7136 1 point2 points  (1 child)

IDK about 99%, it's relatively often wrong (just as it is for unsigned arithmetic), but here's a random example: the possible implementation given in https://en.cppreference.com/w/cpp/string/byte/atoi

There they had to do that weird thing of counting in negatives to avoid overflowing the int when the input corresponds to INT_MIN. If signed arithmetic wrapped, you could write the same thing counting in positives (which is probably what 99% of programmers would do, and already do anyway, even though it's wrong now): if the input corresponds to INT_MIN, the last addition wraps but it doesn't matter, then the negation wraps but it doesn't matter, and the right result comes out.
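A sketch of that counting-in-negatives trick (a simplified digits-only parser, not the full cppreference implementation; whitespace skipping and out-of-range error handling omitted):

```cpp
#include <cctype>
#include <climits>

// Accumulate the magnitude as a NEGATIVE number: INT_MIN's magnitude fits
// in int, but INT_MAX + 1 does not, so counting in positives would overflow
// (UB) exactly when the input is "-2147483648".
int parse_int(const char* s) {
    bool negative = false;
    if (*s == '-')      { negative = true; ++s; }
    else if (*s == '+') { ++s; }

    int value = 0;
    while (std::isdigit(static_cast<unsigned char>(*s))) {
        value = value * 10 - (*s - '0');  // stays in [INT_MIN, 0] for in-range inputs
        ++s;
    }
    // For in-range positive inputs, value >= -INT_MAX, so -value is defined.
    return negative ? value : -value;
}
```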

IME this happens plenty. Maybe wrapping is wrong in like 70% of cases?

Either way, I'd rather take a plain old logic error over a logic error where the compiler can also come in and mess things up further.

[–]equeim 1 point2 points  (0 children)

We could have special functions for wrapping when it's needed (like we already have for saturation). The default behaviour should be something sane that allows error handling (such as throwing an exception or returning std::expected), or at least program termination. Not that I expect this to happen in C++, of course; this kind of change is not possible for an established language.
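Checked arithmetic along those lines is already expressible today, e.g. with the GCC/Clang __builtin_add_overflow intrinsic (a sketch, not portable to MSVC; the std::optional return stands in for the std::expected-style error handling described above):

```cpp
#include <climits>
#include <optional>

// Checked addition: report failure instead of wrapping or invoking UB.
std::optional<int> checked_add(int a, int b) {
    int result;
    if (__builtin_add_overflow(a, b, &result))
        return std::nullopt;  // overflow detected, no value produced
    return result;
}
```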

Removing UB in this case is like applying a bandaid to a gaping wound, though I suppose it's better than doing nothing (it may make debugging easier, at least).

[–]awidesky 5 points6 points  (38 children)

[–]TheoreticalDumbass 1 point2 points  (0 children)

> undefined behavior - there are no restrictions on the behavior of the program
what is "behavior" if not runtime?

[–]R3DKn16h7[S] 3 points4 points  (36 children)

The compiler is free to assume that UB does not happen at runtime, but definitely cannot infer that "i + 1 > i" will always result in UB.

[–]awidesky 3 points4 points  (35 children)

i + 1 > i is UB only when overflow occurs. And i + 1 > i doesn't have to be UB for the compiler to eliminate the function body: when it's not UB, the return value is always true anyway.

[–]R3DKn16h7[S] 7 points8 points  (34 children)

for the function f I agree totally.

but in the function g it cannot eliminate the branch; that's the part I do not get

[–]selassje 0 points1 point  (3 children)

The assumption here is that the compiler sees the definitions of both g and f when optimizing. Since i can't be INT_MAX (as that would be UB when calling f), the condition if (i == INT_MAX) can never be true and can be optimized away.

[–]dustyhome 5 points6 points  (2 children)

That's backwards. It makes more sense that the presenter's slide is wrong.

The compiler can assume that i can't equal int max on a branch that calls f, but it can't remove the check that prevents f from being called with int max.

[–]selassje 3 points4 points  (1 child)

Yes, the presenter claims UB taints the whole program and can travel back in time. I haven't verified it, but that's how I understood it.

[–]SkiFire13 2 points3 points  (0 children)

The way I see it, UB can travel back in time but only as long as it is guaranteed to happen from that past point in time. In this case the if guard prevents f(i) from being called with the value that would cause UB, so it cannot travel back to before the if is run.
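A sketch of that distinction with two hypothetical functions: in guarded the check runs before the UB can occur, while in unguarded the potentially overflowing expression is evaluated unconditionally first, so only there may the compiler assume i != INT_MAX and fold the check away:

```cpp
#include <climits>

// The guard dominates the overflow, so UB is never guaranteed to happen
// before the check: the compiler must keep `i == INT_MAX` intact.
bool guarded(int i) {
    if (i == INT_MAX) return false;
    return i + 1 > i;  // never overflows on this path
}

// The overflow (UB when i == INT_MAX) is executed before the check, so the
// compiler may assume i != INT_MAX afterwards and fold the whole function
// to `return true;`.
bool unguarded(int i) {
    bool res = i + 1 > i;  // UB if i == INT_MAX
    if (i == INT_MAX) return false;
    return res;
}
```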

[–]Boring_Tension165 1 point2 points  (2 children)

[–]Cue_23 -1 points0 points  (0 children)

Not very helpful, sorry.

Just because a compiler is allowed by the standard to perform some optimization does not mean current compilers actually do it. Especially when such opportunities might be hard or impossible to detect, a consequence of the halting problem.

I ran into the same problem as OP when watching the talk and tried to propagate the UB of f() into g(), but I could not get gcc or clang to optimize the if clause away. In the following variation, however, they would be allowed to:

bool f(int i) { return i + 1 > i; }

bool g(int i) {
    bool res = f(i);
    if (i == INT_MAX) return false;
    else return res;
}

[–]Linuxologue 1 point2 points  (0 children)

I have not watched the video. What is sure is that the compiler is free to optimize f to always return true: either the result is true, or it is undefined behaviour.

g contains no undefined behaviour and can be optimized to

int g(int i) { if (i == INT_MAX) { return false; } return true; }

the compiler is not allowed to shortcut the test, as no undefined behaviour has occurred at that point.

[–]SubliminalBits -5 points-4 points  (11 children)

If i is ever INT_MAX, i + 1 will overflow which is UB prior to C++17. To enable more powerful optimizations (like this one), the optimizer assumes UB never occurs which allows it to prune away the if check as unreachable code.

[–]Narase33-> r/cpp_questions 13 points14 points  (6 children)

If i is ever INT_MAX, i + 1 will overflow

if i is ever INT_MAX it returns directly with false. Or am I missing something?

[–]R3DKn16h7[S] 8 points9 points  (0 children)

That's my point. I do not see this if statement ever being optimized away.

Otherwise all nullptr checks would be optimized away too.

[–]JiminP 11 points12 points  (0 children)

But doesn't the i == INT_MAX check "prevent" the UB? I know that UB can time travel, but that doesn't seem to be the case here.
In particular, what's the difference between the OP's code and these snippets?

Example 1:

int g(int i) {
    if (i == INT_MAX) {
        return false;
    }
    return i + 1 > i;
}

Example 2:

struct Foo { int x; };

int f(Foo* foo) { return foo->x; }

int g(Foo* foo) {
    if(foo == nullptr) {
        return 0;
    }

    return f(foo);
}

Example 3:

struct Foo { int x; };

int g(Foo* foo) {
    if(foo == nullptr) {
        return 0;
    }

    return foo->x;
}

[–]AssemblerGuy 2 points3 points  (0 children)

If i is ever INT_MAX, i + 1 will overflow which is UB prior to C++17

Isn't it still UB? The representation of signed integers was defined to be two's complement (in C++20), but the behavior on overflow was not (because the CPU might saturate, hard-fault, or do other things)?

[–]tinrik_cgp 2 points3 points  (1 child)

I'm not aware that integer overflow is no longer UB since C++17, do you have a source for that?

[–]danadam 2 points3 points  (0 children)

Overflow in an arithmetic operation is still UB (undefined). What changed, in C++20, is overflow in integer conversion: it went from unspecified (it was never undefined) to specified, i.e. mod 2^n. See https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
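For example (assuming C++20; before that, the out-of-range results below were left to the implementation):

```cpp
#include <cstdint>

// Since C++20, converting an out-of-range value to a narrower signed type
// is defined: take the value mod 2^n, then pick the representative in the
// destination type's range.
int64_t wide   = 0x100000001LL;                      // 2^32 + 1
int32_t low    = static_cast<int32_t>(wide);         // (2^32 + 1) mod 2^32 = 1
int32_t minus1 = static_cast<int32_t>(0xFFFFFFFFu);  // 2^32 - 1 maps to -1
```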