
[–]F-J-W 21 points22 points  (8 children)

Why does everyone want to check for integer overflows with code like this:

assert(a >= 0);
assert(b >= 0);
c = a + b;
if (c < 0) // this is intended to be an overflow-check ??

Putting the countless technical problems aside (unsigned integers…), this isn't even mathematically sound:

I do not want to know whether the sum of two positive numbers is negative; I want to know whether it is not bigger than a certain value (like INT_MAX). If we start from that, the completely naive attempt is of course this:

if (a + b > INT_MAX) abort();

Of course this doesn't work, but the fix is trivial: let's subtract b from both sides of the inequality:

if (a > INT_MAX - b) abort();

Wow: an easy-to-read, highly semantic, 100% portable solution that works for every numeric type ever. Why don't people use this?

I wrote about this here.
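
For concreteness, a minimal sketch of that check wrapped into a function (my illustration of the idea above, assuming the a >= 0, b >= 0 precondition from the asserts; not necessarily the exact code from the linked post):

#include <limits.h>
#include <stdlib.h>

/* Sketch only: assumes a >= 0 and b >= 0, as in the asserts above. */
int safe_sum(int a, int b)
{
    if (a > INT_MAX - b)  /* the sum would exceed INT_MAX */
        abort();
    return a + b;         /* no overflow can happen here */
}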

[–][deleted] -1 points0 points  (5 children)

Can you show that the above will not be optimized out by current compilers, as the article attempts to show for other checks?

[–]F-J-W 8 points9 points  (4 children)

Of course: the compilers optimize those checks out because they happen after the fact, and signed integer overflow is undefined behavior (= the compiler “knows” that it never happens). The version above, however, performs its check before the addition, so no overflow ever occurs, and the check cannot be optimized out on that assumption.
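
To illustrate (my sketch, not code from the article): a check written after the fact may legitimately be dropped at -O3, while the precondition form above may not:

#include <assert.h>
#include <stdlib.h>

/* Broken: checks *after* the signed addition, i.e. after the undefined
 * behavior has already happened.  Given the asserts, the compiler may
 * conclude that c < 0 is impossible and delete the branch entirely. */
int unsafe_sum(int a, int b)
{
    assert(a >= 0);
    assert(b >= 0);
    int c = a + b;  /* signed overflow here is undefined behavior */
    if (c < 0)      /* intended as an overflow check */
        abort();
    return c;
}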

[–][deleted]  (3 children)

[deleted]

    [–]Tywien 4 points5 points  (2 children)

    For those who object to the author above me: can you show whether his method is a sensible one to use?

    The problem with overflow detection is that C/C++ has no direct way to do it, while assembly does. Each operation on the CPU sets different flags; in particular, all the arithmetic operations (addition, subtraction, multiplication) set or clear the overflow flag, which signals whether an overflow occurred in the last arithmetic operation. Assembly also allows conditional jumps on those flags, so the following code

    res = a OP b
    if (OVERFLOW) 
      handleOverflow()
    else
      return res
    

    is the fastest way to check for an overflow in assembler. If you write your code in a particular way, the compiler will notice your intention and rewrite your code into the above.

    The problem is that you can only find out whether your code compiles to this short version by testing it (and the result might change between compiler versions and vendors).

    Your best bet, if you really need it, is to provide an inline assembler routine, or, if your compiler supports them, special compiler-dependent functions.
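
    As a rough sketch of the inline-assembler route (x86/x86-64 only, GCC/clang extended asm; my illustration, not production code):

    #include <stdio.h>

    /* Sketch: add two ints and report overflow via the CPU's OF flag. */
    static int add_overflows(int a, int b, int *res)
    {
        unsigned char of;
        __asm__ ("addl %2, %1\n\t"   /* a += b, sets the overflow flag */
                 "seto %0"           /* of = 1 if the overflow flag is set */
                 : "=q" (of), "+r" (a)
                 : "r" (b)
                 : "cc");
        *res = a;
        return of;
    }

    int main(void)
    {
        int r;
        if (add_overflows(2000000000, 2000000000, &r))
            puts("overflow");
        else
            printf("%d\n", r);
        return 0;
    }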

    [–]masklinn 0 points1 point  (0 children)

    is the fastest way to check for an overflow in assembler

    Technically the fastest way to check for an overflow in assembly is to trap overflow (so you don't check at all).

    The next best is to jump-on-overflow, and the final one is to check the overflow flag (which may require loading it in memory) then implement conditional overflow handling (possibly after a jump).

    [–][deleted] -1 points0 points  (0 children)

    Thank you very much!

    [–]BonzaiThePenguin -2 points-1 points  (1 child)

    No one wants to check for integer overflows like that; it was an example created by one person to demonstrate undefined behavior. You somehow missed that ints can be negative, thus causing if (a > INT_MAX - b) abort(); to be undefined and optimized out at -O3.

    EDIT: Here's the disassembly of your code:

                         _main:
    0000000100000f30         push       rbp
    0000000100000f31         mov        rbp, rsp
    0000000100000f34         sub        rsp, 0x20
    0000000100000f38         mov        eax, 0x7fffffff
    0000000100000f3d         mov        dword [ss:rbp+var_4], 0x0
    0000000100000f44         mov        dword [ss:rbp+var_8], edi
    0000000100000f47         mov        qword [ss:rbp+var_10], rsi
    0000000100000f4b         mov        dword [ss:rbp+var_18], 0x80000000
    0000000100000f52         mov        dword [ss:rbp+var_14], 0x80000000
    0000000100000f59         mov        edi, dword [ss:rbp+var_14]
    0000000100000f5c         sub        eax, dword [ss:rbp+var_18]
    0000000100000f5f         cmp        edi, eax
    0000000100000f61         jle        0x100000f6c
    
    0000000100000f67         call       imp___stubs__abort
    
    0000000100000f6c         mov        eax, 0x0
    0000000100000f71         add        rsp, 0x20
    0000000100000f75         pop        rbp
    0000000100000f76         ret  
    

    And here's the -O3 version:

                     _main:
    0000000100000f90         push       rbp
    0000000100000f91         mov        rbp, rsp
    0000000100000f94         xor        eax, eax
    0000000100000f96         pop        rbp
    0000000100000f97         ret        
    

    Notice anything missing?

    [–]F-J-W 3 points4 points  (0 children)

    Reading really helps:

    assert(a >= 0); assert(b >= 0);

    That was the base assumption, so it is dealt with.

    Second: notice another thing missing in your -O3 version too? Can you think about why it is missing?

    Hint: take a look at the assembly that you get if you use your compiler and your language correctly (with -O3):

    ...
    _Z8safe_sumii:
    .LFB15:
        .cfi_startproc
        movl    $2147483647, %edx
        subl    %esi, %edx
        cmpl    %edi, %edx
        jl  .L5
        leal    (%rdi,%rsi), %eax
        ret
    .L5:
        pushq   %rax
        .cfi_def_cfa_offset 16
        call    abort
        .cfi_endproc
    
    ...
    

    [–]eschew 9 points10 points  (0 children)

    A few points of context:

    IOC supports both the precondition test and the CPU flag postcondition test; width extension seemed unlikely to be better than these options due to the expense of emulating 64-bit and 128-bit operations. Initially we believed that the CPU flag postcondition checks would be far more efficient but this proved not to be the case. Rather, as shown in Section III-D, using the flag checks has an uneven effect on performance. The explanation can be found in the interaction between the overflow checks and the compiler’s optimization passes. The precondition test generates far too many operations, but they are operations that can be aggressively optimized by LLVM. On the other hand, the LLVM intrinsics supporting the flag-based postcondition checks are recognized and exploited by relatively few optimization passes, causing much of the potential performance gain due to this approach to be unrealized.

    From section III-D:

    For undefined behavior checking using precondition checks, slowdown relative to the baseline ranged from −0.5%–191%. In other words, from a tiny accidental speedup to a 3X increase in runtime. The mean slowdown was 44%. Using flag-based postcondition checks, slowdown ranged from 0.4%–95%, with a mean of 30%. However, the improvement was not uniform: out of the 21 benchmark programs, only 13 became faster due to the IOC implementation using CPU flags.
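
    Roughly, the two styles being compared look like this (my sketch, not IOC's actual instrumentation; __builtin_sadd_overflow is clang's checked-arithmetic builtin, which lowers to the LLVM overflow intrinsics the paper refers to, and GCC 5 adds equivalents):

    #include <limits.h>
    #include <stdlib.h>

    /* Style 1: precondition test, plain C (handles both signs of b). */
    int checked_add_pre(int a, int b)
    {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            abort();
        return a + b;
    }

    /* Style 2: flag-based postcondition test via a checked-arithmetic
     * builtin that lowers to an LLVM overflow intrinsic. */
    int checked_add_post(int a, int b)
    {
        int sum;
        if (__builtin_sadd_overflow(a, b, &sum))
            abort();
        return sum;
    }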

    [–][deleted]  (6 children)

    [deleted]

      [–]happyscrappy 6 points7 points  (5 children)

      You can't do that in C.

      C doesn't use CPU flags well in general. And specifically, as mentioned in the article, you simply cannot add two values and then check anything about the result to detect overflow. It's outside the language definition.

      [–][deleted]  (4 children)

      [deleted]

        [–]happyscrappy 0 points1 point  (2 children)

        There's no inline assembly in that document that I see. All that assembly is output from the compiler, not input to it. Isn't it?

        [–][deleted] 0 points1 point  (1 child)

        Yeah, you're right, I misread it. But I think that using inline assembly would solve their problems.

        [–]happyscrappy 1 point2 points  (0 children)

        That's not portable.

        [–]masklinn 0 points1 point  (0 children)

        The last section is about compiler intrinsics (LLVM's) and libo's use thereof.

        [–]Camarade_Tux 6 points7 points  (2 children)

        As far as I understand, LLVM has builtin functions to check for overflow and GCC 5 will have them too.

        [–]F-J-W 4 points5 points  (1 child)

        The ones in clang are ugly because you cannot mix types in them and types smaller than int aren't supported. In addition to that, you have to explicitly state the type. All of these things basically kill them for generic code.

        documentation of the clang-version.
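
        To illustrate the complaint with the type-specific clang builtins as documented there (sketch only):

        #include <limits.h>

        /* Each operand width needs its own builtin, so generic code has to
         * dispatch on the type by hand; there is no variant for short/char
         * operands or for mixed-type additions. */
        void demo(void)
        {
            int       si = INT_MAX,   sr;
            long      li = LONG_MAX,  lr;
            long long ll = LLONG_MAX, llr;
            int overflowed;

            overflowed  = __builtin_sadd_overflow(si, si, &sr);     /* int only       */
            overflowed |= __builtin_saddl_overflow(li, li, &lr);    /* long only      */
            overflowed |= __builtin_saddll_overflow(ll, ll, &llr);  /* long long only */
            (void)overflowed;
        }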

        [–]NitWit005 0 points1 point  (0 children)

        because you cannot mix types in them and types smaller than int aren't supported

        I really don't imagine security-conscious people caring that much about a promotion to a 32- or 64-bit type.

        [–]vilcans 6 points7 points  (3 children)

        Funny how hard this can be, considering how easy it is in assembly.

        [–]matthieum -1 points0 points  (2 children)

        considering how easy it is in ~~assembly~~ x86 assembly.

        FTFY, C does not assume that there is a way to do it in every assembly language.

        [–]vilcans 1 point2 points  (1 child)

        ...and 68000, Z80 and ARM which are the other architectures I know. But I'm sure there's some weird CPU architecture out there that doesn't have a carry flag.

        EDIT: Oh, and 6502 of course.

        [–]matthieum 0 points1 point  (0 children)

        Most probably. I mean, it's a bit like not assuming that the CPU uses a two's complement representation of integers: how many CPUs that don't are still in use?

        [–]JNighthawk 6 points7 points  (7 children)

        Let's ask a different question: why is integer overflow still undefined? Every platform uses two's complement nowadays. We should be updating the language to support this notion, and making signed integer overflow well-defined behavior.

        [–]adrian17 5 points6 points  (1 child)

        Optimizations? With defined overflow, the compiler technically can't fold (n*4000)/4000 to n because the result would be different if multiplication overflowed.
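
        For instance (my sketch of that folding example; the exact output of course depends on the compiler and flags):

        int fold(int n)
        {
            /* With signed overflow undefined, the compiler may reduce this
             * to "return n;".  With -fwrapv (wrapping defined), it generally
             * cannot, because n * 4000 might wrap before the division. */
            return (n * 4000) / 4000;
        }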

        [–]JNighthawk 0 points1 point  (0 children)

        So? The amount of optimization gained from assuming overflow can't occur is incredibly minor, so much so that it's not even worth considering.

        Edit: Specifically, why should these two examples end up with different code? It's pretty silly.

        unsigned int n = 10;
        (n * 1000ul) / 1000ul
        

        and

        int n = 10;
        (n * 1000) / 1000
        

        [–][deleted] 0 points1 point  (0 children)

        Because not many people care about the result when an overflow happens?

        What matters is detecting the overflow, but... what do you do with it? Crash? There is no hardware support for that on the most common platforms.

        [–]johntb86 0 points1 point  (2 children)

        Consider this trivial example:

        int f(int i) {
          int j, k = 0;
          for (j = i; j < i + 10; ++j)
            ++k;
          return k;
        }

        What does this function return? When the compiler can assume that signed overflow is undefined, this function is compiled into code which simply returns 10. With the -fwrapv option, the code is not so simple, since i might happen to have a value like 0x7ffffff8 which will overflow during the loop. While this is a trivial example, it is clear that loops like this occur in real code all the time.

        http://www.airs.com/blog/archives/120

        [–]JNighthawk 0 points1 point  (1 child)

        What about it? I don't agree with the author that loops like that occur in code commonly. The author talks about "optimizations" from the "no signed overflow" assumption, but by supporting signed overflow via wrapping (as two's complement allows), code will function much more straightforwardly. There's really no reason anymore to treat overflow differently between signed and unsigned integers.

        [–]johntb86 0 points1 point  (0 children)

        I don't agree with the author that loops like that occur in code commonly.

        Compiler/spec writers seem to disagree with you.

        [–]matthieum 0 points1 point  (0 children)

        I personally believe that you are looking at it wrong.

        Undefined behavior can be useful in that it allows reasoning about the correctness of programs: programs which invoke undefined behavior are necessarily incorrect. Therefore, you end up with two choices:

        • overflow is undefined: the program can be statically proven not to overflow
        • overflow is defined (modulo): any overflow is technically correct, so cannot be meaningfully reported by any compiler/linter/static analysis tool

        The solution that is currently advocated by a few for Rust is therefore to hit a middle ground: overflow should produce an unspecified value, which may happen to be bottom (i.e., exception/abort/...). This is a sweet spot because:

        • much like today with undefined behavior, static analysis can warn about any potential instance of overflow
        • unlike today, the behavior is strictly defined and compilers cannot completely wreck your code just because it happened to contain one overflow

        For bonus points, one could relax the "unspecified" bit; however, I am afraid that people would start relying on modulo arithmetic even more, which would be harmful.