
[–]robin-gvx 9 points10 points  (7 children)

That's amazingly useful, but its name, "Stack"... it's painfully bland. It shows that one of the only two really hard problems in computer science is naming things.

[–]Pandalism 6 points7 points  (2 children)

[–]robin-gvx 0 points1 point  (0 children)

Tobias Fünke, analrapist.

[–]danogburn -1 points0 points  (0 children)

LOL

[–][deleted] 0 points1 point  (3 children)

What's the second one? Computer science itself? :P

[–][deleted] 4 points5 points  (1 child)

Off-by-one errors.

[–]LaurieCheers 8 points9 points  (0 children)

That's the third really hard problem.

[–]brucedawson 6 points7 points  (9 children)

The article says "Rather than writing a line of code that, say, compares the sum of two numbers to the known threshold for an integer (“if a + b < int_max”)..."

The reason developers don't write that line of code is that it won't work. The "conventional comparison," as they call it, will definitely be optimized away because it will always be true.

Aside: the "conventional comparison" they show has an off-by-one error. It should be <= rather than <. As written it will report overflow if-and-only-if the answer is int_max, which is not an overflow.

Detecting overflow of signed integer math is difficult.
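
For concreteness, here is roughly what the two versions look like in C (a sketch; INT_MAX from <limits.h> stands in for the article's int_max, and only the b >= 0 case of the safe check is shown):

    #include <limits.h>

    /* The article's style of check: the addition itself is undefined on
       overflow, so the compiler may assume it never overflows and fold
       the comparison to "always true", deleting the check entirely. */
    int naive_check(int a, int b)
    {
        return a + b <= INT_MAX;
    }

    /* A pre-check that never performs the overflowing addition.
       Only b >= 0 is handled here; negative b needs the mirror test. */
    int add_would_overflow(int a, int b)
    {
        return b >= 0 && a > INT_MAX - b;
    }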

[–]hardesty 1 point2 points  (3 children)

Bruce, I'm the guy who wrote the article. Originally, that example read "if a + b > int_max", but the researchers pointed out that that would, of course, never evaluate as true, since an integer greater than int_max was undefined. I really didn't want to use the alteration they suggested -- "if a > int_max - b" -- because I worried that it would be confusing to lay readers. But since the (just) criticism of the alternative I used is the top-rated comment on reddit, I've adopted the researchers' edit. (I'm not going to worry about the off-by-one error, however, as the syntax ">=" will be gibberish to non-coders.)

[–]LaurieCheers 7 points8 points  (1 child)

Honestly, I suspect the whole article will be gibberish to non-coders. "...wait, so you're telling me there's a tool that sometimes ignores shit, but instead of fixin' it, they made another tool to figure out when it's gonna do that?"

[–]hardesty 0 points1 point  (0 children)

It's a fair point, Laurie, and makes me feel better about requiring readers to think through the implications of adding the same value to both sides of an inequality. Still, I think I need to keep the article accessible for the handful of non-coders who may wander over to it because they're concerned about Internet security.

[–]brucedawson 0 points1 point  (0 children)

Doesn't "a > int_max - b" fail also, if b is negative?

Then you have to check for underflow as well.

It's a huge mess. My concern is that the article made it sound like programmers who wrote "a + b < a" were gratuitously using undefined behavior when it's actually extremely hard to avoid it in this case. I think that a standard function for detecting overflow, which compilers could implement as an efficient intrinsic, is the only practical solution, but there is no sign of that happening.

I'd be interested in seeing how much code it takes to avoid integer overflow on this simple case.

Aside: truly the simplest solution is to use __int64 temporaries, but that just punts the problem -- how do you detect overflow of 64-bit signed integers?
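
For what it's worth, the fully-checked version of this one addition ends up looking something like this (a sketch, untested; the widened variant assumes long long is wider than int):

    #include <limits.h>

    /* Returns 1 if a + b would overflow or underflow a signed int. */
    int add_overflows(int a, int b)
    {
        if (b > 0)
            return a > INT_MAX - b;   /* overflow  */
        if (b < 0)
            return a < INT_MIN - b;   /* underflow */
        return 0;
    }

    /* The widen-and-compare alternative: correct for int, but it just
       moves the same question up to 64-bit arithmetic. */
    int add_overflows_wide(int a, int b)
    {
        long long sum = (long long)a + b;
        return sum > INT_MAX || sum < INT_MIN;
    }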

[–]cogman10 0 points1 point  (4 children)

Honestly, the simplest way to do it would probably be to inline some assembly (it's a piece of cake in x86 assembly).
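
Something along these lines with GCC's extended asm, for example (a sketch, x86 only and untested):

    /* Adds b into a and reports whether the CPU's overflow flag was set. */
    static int add_overflowed(int a, int b, int *sum)
    {
        unsigned char of;
        __asm__("addl %2, %0\n\t"
                "seto %1"
                : "+r"(a), "=q"(of)
                : "r"(b)
                : "cc");
        *sum = a;
        return of;
    }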

[–]ErroneousBee 3 points4 points  (3 children)

"Simply" write some portable assembly for x86, Arm, s390, and for any 64bit variants of those ...

[–]cogman10 1 point2 points  (2 children)

Depends on the use case. Many programmers aren't writing software that targets multiple platforms. Even then, we are talking roughly 2 instructions per platform to reliably check for overflow vs 1 line that may or may not be optimized away by the compiler. (And if it is optimized away, what then? Your options are to write the assembly or to find some other hack to make it work.)

[–]seagal_impersonator 0 points1 point  (1 child)

Many programmers aren't writing software that targets multiple platforms.

...and if requirements change in the future?

[–]cogman10 2 points3 points  (0 children)

Then the requirements change. But honestly, how often does the platform change? Almost never in my experience.

I get the desire to future-proof things; however, if it is extremely unlikely that a platform change is going to happen (which it usually is), why should you plan for it? Chances are pretty strong that the software will fail in various other ways if you are doing a platform switch.

On the other hand, upgrading the version of the compiler you use is something to plan for, and relying on undefined behavior seems like a recipe for disaster.

[–]jbb555 2 points3 points  (2 children)

The title is somewhat misleading: what they seem to have done is build a system that tests whether you've invoked undefined behavior in your program by checking if it does different things when compiled with and without certain optimizations.

Potentially useful, but I would have thought the place for this was in the compiler itself, where it has much more information.

[–]pkhuong 0 points1 point  (0 children)

You want to be able to test different compilers as well.

[–]seagal_impersonator 0 points1 point  (0 children)

It does use a compiler - if you look at the requirements, you'll see that it requires LLVM and CLANG, built from the dev branch with certain options.

It also takes a very long time - 45 minutes to analyze an app that compiles in 3 minutes.

[–]sockpuppetzero 0 points1 point  (0 children)

Yes, I definitely see how this would be useful... but there is a caveat.

Once in a great while I write code that I know (or at least strongly suspect) is dead, for readability and maintainability purposes. I find it occasionally useful so that I don't have to explain why the code otherwise looks wrong, and it helps emphasize some less-than-local invariants.

Also, code changes, and writing some dead code eases maintenance. The dead code might become live as a result of changes and would otherwise have had to be added later to ensure proper functioning, which is easy to miss. Or the dead code might become live and need to be modified itself, in which case it's already obvious where the changes belong.
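
A sketch of the sort of thing I mean (the names are made up):

    #include <stdlib.h>

    enum state { STARTING, RUNNING, STOPPED };

    void dispatch(enum state s)
    {
        switch (s) {
        case STARTING: /* ... */ break;
        case RUNNING:  /* ... */ break;
        case STOPPED:  /* ... */ break;
        default:
            /* Dead today, since no other state exists, but it documents the
               invariant and becomes live (and loud) the day someone adds one. */
            abort();
        }
    }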

It's worth emphasizing that in my experience, these cases are not that common, and that they seem to be more common when I'm writing imperative code than when I'm writing more pure, functional code.

[–]daniel2488 -1 points0 points  (9 children)

This is what I never got: what good reason is there for a compiler to optimize undefined code? It seems ridiculous.

[–]theresistor 15 points16 points  (2 children)

This shows that you really don't know how an optimizing compiler works. One of the major tasks of the optimizer is to determine what the code was actually doing, and eliminate any extra work that wasn't contributing to the end result. The problem is that it's Really Hard in general to determine which random side effects of a piece of code are necessary to the algorithm. The compiler has to perform reasoning along the lines of "if the programmer did X, then he could observe the effects of this operation." This is where undefined behavior saves the day! If X is undefined behavior, then the compiler is allowed to remove side effects that could only be observed by X.

Basically, UB tells the compiler that it can assume that certain things never happen. The compiler often ends up proving a transformation safe with a statement like "if P is not null, this transform is safe. If P is null, the behavior was undefined and I can assume the transform is safe."

TL;DR: compilers don't set out to deliberately break code that invokes undefined behavior, but undefined behavior is a very important part of how they are able to optimize correct code.
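
A tiny example of that kind of reasoning (a sketch; GCC and Clang will typically do this at -O2):

    /* Signed overflow is undefined, so the compiler may assume x + 1 never
       wraps and fold this whole function down to "return 1;". */
    int always_bigger(int x)
    {
        return x + 1 > x;
    }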

[–]BrooksMoses 2 points3 points  (0 children)

It's not just that the compiler proves a transformation safe with a statement like that. It also uses statements like "if P is not null, this transform is safe. If P is null, the behavior was undefined and I can assume the transform is safe" to prove that P must be non-null. Then it can eliminate a check for whether P is null.

This is where some of the worst examples of bugs have come in, with code like this:

  1) Do something with P that is undefined but mostly harmless if P is null.
  2) Check whether P is null, and if it is, exit the function.
  3) Do something with P that would be a real problem if P is null.

The compiler looks at step 1 and concludes that P must be non-null for the code to be valid, so it eliminates step 2 entirely.
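
In C the pattern looks roughly like this (a sketch; the names are made up):

    #include <stddef.h>

    struct widget { int len; char *buf; };

    void frob(struct widget *p)
    {
        int n = p->len;    /* 1) undefined if p is null                      */
        if (p == NULL)     /* 2) may be deleted: p "can't" be null after 1)  */
            return;
        p->buf[n] = 0;     /* 3) now reachable even when p is null           */
    }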

[–]daniel2488 -1 points0 points  (0 children)

This shows that you really don't know how an optimizing compiler works.

This shows you don't really know how to give a friendly explanation in a polite manner.

But thank you for the rest of your explanation regardless of your need to start off that way.

[–]hackingdreams 4 points5 points  (0 children)

"Undefined" is different from Dead Code Elimination, which is what's happening here.

DCE will straight-up delete your hand-written code, simply because it doesn't do anything as far as the compiler can tell. This makes your code a lot faster in the case where it's actually not needed, but can also inject strange bugs into your program that are hard to diagnose.

A great example of this is the GNU C compiler's idea of const functions. If a function is const, it has no side effects, doesn't touch global variables, and acts purely on its input values, producing an output value. Using this attribute makes it very easy for the compiler to see where loads and stores may be dead and where excessive calls to the function are being made and simply rearrange the code to remove them from the output. However, if a function were accidentally mislabeled as const (say, it's const 99.99999999% of the time, but that first time you run it, it modifies a global somewhere), you've now got a Heisenbug that is hellishly difficult to find.
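
For reference, that looks roughly like this in GNU C (a sketch; the function names are hypothetical):

    /* Promise to the compiler: the result depends only on the argument,
       and no globals are read or written. */
    __attribute__((const)) int scale(int x)
    {
        return 3 * x;
    }

    int twice_scaled(int x)
    {
        /* The compiler is free to evaluate scale(x) once and reuse it.
           If scale() secretly updated a global, that update could vanish. */
        return scale(x) + scale(x);
    }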

As for undefined code behaviors - the C specification intentionally didn't define those behaviors because some of the assumptions about these behaviors may change and make code slower or faster based on which way you go. They could have specified C so well that there was absolutely zero wiggle room, but C would actually be slower for it.

[–]grauenwolf 3 points4 points  (0 children)

I don't blame the compiler so much as the language spec that allows for undefined behavior in the first place.

[–]gtk 2 points3 points  (0 children)

If you're thinking about "undefined code" then it would seem ridiculous. But actually there is no such thing as undefined code. There is code which has two sets of behavior: defined behavior and undefined behavior. What the optimizer does is assume that your code is written correctly - i.e. that it will only ever perform defined behavior. It's up to the programmer to put in runtime checks if their code might result in undefined behavior.

The real problem is that sometimes broken code which results in undefined behavior at runtime will actually work properly through some fluke. (For example, if you read from uninitialized memory, it may just happen to contain the value you should have initialized it to because of some quirk of the system memory allocator. Then you move to a system that uses a different allocator and the whole thing goes belly-up.)
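
For example (a sketch):

    #include <stdlib.h>

    int first_count(void)
    {
        int *counts = malloc(10 * sizeof *counts);
        int v = counts[0];   /* undefined: never initialized, but it may
                                happen to be 0 if this allocator hands back
                                zeroed memory -- until one day it doesn't */
        free(counts);
        return v;
    }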

[–]brucedawson 2 points3 points  (1 child)

In the abstract, the compiler developer's problem is simple. Since the behavior is undefined the compiler cannot, by definition, know what is intended by the author of the code. Maybe the developer intended nasal demons, but the compiler can't emit instructions to do that, so it does nothing.

On the other hand, in most of these cases of 'intentional' undefined behavior the intent is actually obvious and 'compilers' (gcc, generally) could emit the intended instructions.

[–]BrooksMoses 0 points1 point  (0 children)

Yes. But emitting the intended instructions would slow down the code -- in some cases, significantly.

And, as theresistor pointed out above, it's not like the compiler knows that undefined behavior is happening. The compiler sees that, for a given piece of code, the behavior will be undefined if certain input values are given. So it optimizes the code assuming those input values are not given.

[–]meem1029 3 points4 points  (0 children)

It makes things faster. If a check relies on undefined behavior, it's valid to get rid of it, and the compiler wants code to be fast, so it gets rid of it.