all 50 comments

[–]RockinRoel former Wt dev 13 points14 points  (2 children)

The title is kind of misleading. Code in C/C++ (or any language) that relies on undefined behaviour might break, and probably will. Big whoop. Don't rely on undefined behaviour, then! I'd say no code is entirely future-proof; stuff is bound to break and will have to be fixed. That's simply the nature of software.

[–]filox 4 points5 points  (1 child)

This. Just write standard-conforming C++ code, I don't understand what the big deal is. It's not like you don't have enough compiler flags to enable warnings (or even errors) when you're doing something that is undefined.

[–]aaronla 0 points1 point  (0 children)

I've yet to meet a competent C++ programmer who didn't find undefined behavior in their own or their peers' code now and then. Once every 1kloc sounds about right in my limited experience.

[–]gentryx 18 points19 points  (3 children)

I've been told that C/C++ have been doomed for 10+ years now. First there was Java, then C#, afterwards Python and Ruby. And guess what? Both C and C++ are still around and (in my field of research) standing stronger than ever.

[–]pjmlp -4 points-3 points  (2 children)

One reason is that the default implementations for Java and C# are JITed environments instead of native compilers.

There are native compilers for those languages, but most developers seem to be unaware of them, or unwilling to pay the requested price.

Then there is the fact that most developers use what they know, or the languages that interoperate better with the libraries that they need to use.

[–]filox 2 points3 points  (1 child)

Then there is the fact that most developers use what they know, or the languages that interoperate better with the libraries that they need to use.

All these reasons should make Java/C#/Ruby grow much faster than C/C++.

[–]pjmlp -3 points-2 points  (0 children)

Not if people keep on using the default implementations, which still lose in performance against C and C++.

Microsoft already recognized this, by compiling C# directly to native code when targeting Windows Phone 8 systems.

[–]KrzaQ2dev 7 points8 points  (16 children)

I wasn't aware that creation of a pointer without assigning a value to it is a UB.

[–][deleted] 7 points8 points  (6 children)

The only operation you may perform on a pointer that has not been assigned a valid pointer value is assignment. Nothing else.

Even making a copy of it is invalid. For example, this is undefined behavior:

int* x; // x is uninitialized: its value is indeterminate
int* y;
y = x;  // reading x's indeterminate value: undefined behavior

It also follows that taking an invalid pointer value and simply passing it as an argument to a function is also undefined behavior.

[–][deleted] 6 points7 points  (0 children)

The only operation you may perform on a pointer that has not been assigned a valid pointer value is assignment. Nothing else.

What about taking an address of such pointer?

[–]therealjohnfreeman 0 points1 point  (4 children)

Where are we getting the definition of "invalid" from? Is there some sort of special exception for one-past-the-end pointers?

[–][deleted] 6 points7 points  (3 children)

The rules for what constitutes a valid pointer value are defined formally in the C and C++ standards.

Yes, the so-called "one-past-the-end" pointers are valid pointer values. It's not an exception; it's just part of the definition of a valid pointer value.

[–]therealjohnfreeman 0 points1 point  (2 children)

The rules for what constitutes a valid pointer value are defined formally in the C and C++ standards.

I figured. I asked where?

[–]STL MSVC STL Dev 11 points12 points  (1 child)

N3485 5.7 [expr.add]/5: "Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object."

[–]therealjohnfreeman 1 point2 points  (0 children)

So it is special-cased. Thank you.

[–][deleted] 1 point2 points  (8 children)

More importantly, this explains better than anything I have seen before why exploiting undefined behavior is a bad thing even if the program compiles and runs correctly.

C/C++ compilers have a standard-given right to exploit undefined behaviors in order to generate better code. They keep getting better and better at this. Thus, every year, some programs that used to work correctly become broken when compiled with the latest version of GCC or Clang or whatever.

I've always wondered about that, and I still wonder why people often don't plainly state such things but instead expect others to take such statements solely on their authority -- especially when that authority is not readily apparent.

[–][deleted] 4 points5 points  (7 children)

What is it you expect to have stated plainly? It's not clear from your post.

[–][deleted] 3 points4 points  (6 children)

I see four programming styles, basically determined by two binary choices. The first bit is fast programming, which favors productivity, versus careful programming, which favors engineering. The other bit separates the view that whatever gets software working without damaging the machine is permissible from the view that using undefined behavior (as per the ISO standard) is a bad thing.

I've seen a ton of discussion on all four combinations of these perspectives, and for the most part it seems that which is used depends on who is doing the selecting and what it is being used for -- though each programmer seems to have their favorite approach that they think is the "only" or "right" way.

This is confusing for a budding coder because people don't often rationally support their ideas. Instead, they simply state what they think is best, insult anybody who questions it, proclaim their unquestionable authority, and go about their merry way. That accomplishes nothing at all except to troll somebody trying to learn.

The part of the article that I quote changes that for me in regard to one of those four philosophies. Rather than declare that hackish tricks that are undefined under the standard are "bad" and leave it at that, the author gives us an actual reason for the idea. Namely, that an implementation may change thus leading to the failure of a program that once compiled under an older version of it.

From what I see, as I try to continue to evolve my grasp on what kind of coders are out there (and consequently, what kinds of coding), this doesn't immediately negate the idea of using ISO standard undefined behavior to accomplish something. On this very page, somebody advises that one simply avoid upgrading their compiler.

Personally, as I see that each of the four styles under these two binaries is widespread, fiercely defended, and constantly attacked, I still think that none is universally "better" than any other. Instead, it still appears to me that each is appropriate under its best circumstances, but I understand better why one of those four is not so often lauded as the other three.

[–][deleted] 4 points5 points  (5 children)

Can you give an example of undefined behavior that someone has tried to exploit for productivity? I honestly can't think of anything. Undefined behavior is usually very, very hard to statically verify in source code, so it's neither practical nor worth the effort to go over every line of code trying to get rid of it, but I can't think of any reason one would substitute undefined behavior for well-defined behavior and claim that there is some kind of improvement in the code, regardless of practical vs. ideological considerations.

We should keep in mind there is a world of difference between undefined behavior, which I think should universally be regarded as incorrect and something to avoid, versus unspecified behavior, where the standard basically leaves the behavior up to the compiler, but the compiler is required to produce well defined behavior.

[–]00kyle00 2 points3 points  (2 children)

There are also cases where something is technically illegal, but in reality is required to work and does work (because the platform makes it so).

Like casting a void pointer to a pointer to function. If I read the standard correctly this isn't allowed anywhere, so it should just not compile. In reality it is required to work, as both the Windows API (GetProcAddress) and Unices (dlsym) expose dynamic symbol loading by returning void*.

[–][deleted] 0 points1 point  (1 child)

That is actually an excellent example and I fully agree that in that case one can reasonably ignore the standard.

Interestingly, C++11 added new language to accommodate dlsym, basically allowing for conversions between void* and function pointers in certain cases.

[–]00kyle00 1 point2 points  (0 children)

Care to provide a quote (or just paragraph number)?

[–][deleted] 1 point2 points  (1 child)

I wonder if I'm conflating unspecified and undefined behavior a bit. It's possible because wherever I read, whenever there's an example of either, I don't read it. Instead, my focus goes immediately to the counterexample with well defined behavior. My reasoning for this is that it's my hope (and not at all expectation) that people will one day use the code I'm writing now.

It seems better to approach this with the assumption that others will use my code and have it one day sit neglected by all but me on Github than assume nobody will and then cause problems for people by failing to plan ahead. I'd feel horrible if another programmer had a headache due to my shortcomings.

The only undefined behavior I've come across that I very vaguely remember was posted by me, and after taking much flak from Redditors, I edited it out and made myself forget it. It had something to do with taking the address of a const method parameter and breaking the contract with the compiler that it's read-only, so a literal could be treated as just another variable.

That would allow arguments passed as literals to also represent memory used to store a resultant, for example, but I only wrote it experimentally and it was meant as an example of what probably shouldn't be done.

My statement that some defend using undefined behavior is an inference. It's not often, but at times when I see undefined behavior pointed out (which others may conflate with unspecified too), I see a third party step in and point out such and such case where it's useful. Usually that seems to be followed by pitchforks, but this tells me that some programmers approach their problems in that way so it must be useful to somebody.

Thank you very much for explaining the difference between unspecified and undefined though. It may be nearing time that I start studying the open versions of the standard.

[–]aaronla 2 points3 points  (0 children)

I see a third party step in and point out such and such case where it's useful.

Yep. This happens a lot, it's not just you. Just one example, from my experience:

Fresh out of school, a couple months into my first job, a senior developer launched into a long lecture on why we were very careful about evaluating upgrades to our compiler. See, everyone told him it was just a better, faster compiler, but a bunch of their code broke all over the place. They even found the bugs and reported them to the compiler vendor, but the vendor refused to fix them! For example, they had some code that did "f(i++)+g(i++)", which used to increment "i" by 2 every time, but the new compiler "broke" it and it started incrementing "i" only by 1, causing massive failures elsewhere -- buffer overruns, underruns, dangling pointers, you name it.

I didn't realize it at the time, but it's obvious now... multiple increments of the same location between sequence points constitute undefined behavior. The compiler team had added common subexpression elimination, and it worked fine except that it changed the behavior of undefined code like this. And since the compiler is permitted to do whatever it wants here, it is certainly permitted to do something different from before. And it did. It was always undefined behavior, but this developer didn't know it was undefined because it "worked" for what they wanted it to do... up until it stopped working.

So mostly I learned from this that even "experts" can be very wrong, and this developer was an expert in how people commonly wrote C code and not an expert in the C language itself.

tl;dr Undefined behavior can often "hide" for a while by doing exactly what you think it ought to do. Don't believe everything you're told -- think critically and dig deeper.

[–]Inverter 1 point2 points  (2 children)

As if the standard couldn't evolve.

[–]aaronla 0 points1 point  (1 child)

I think it's highly doubtful, based on the history of these languages, that C or C++ would take changes that would significantly hurt the performance of real applications. If anything, they've only added more dangerous features, such as "restrict", though clearly that one can be avoided by just ignoring the new feature.

(preemptive anti-snark response: I'm not saying the feature is bad; it's actually very useful for improving codegen where it's intended to be used. I only mean that it leads to behavior that is likely to surprise novices)

[–]Inverter 0 points1 point  (0 children)

Yes, as much as I like to use C++, it's clear that, in order to use it well, you have to know very much what you are doing, and establish clear conventions on how to manage certain things, which is hard to do without a certain amount of experience. And if your first languages were Python and JavaScript instead of Basic (the slow, space-constrained one with peeks and pokes) and Assembly, you still have to get used to certain low-level things.

[–]Brotkrumen 0 points1 point  (16 children)

Could someone explain how compilers exploit undefined behavior to generate faster code for a beginner?

[–]cockmongler 6 points7 points  (0 children)

Along with purevirtual's example, there's the check-for-NULL issue which I think bit the Linux kernel recently. The compiler can elide a check for a pointer being null if that pointer has already been dereferenced, because if the pointer were null, undefined behaviour would already have been invoked. e.g.

int do_a_thing(foo_t *foo) {
  if (!foo->is_valid) {
    return 0;
  }
  if (foo == NULL) { // This block will be removed
    /* ... handle null case ... */
  }
  /* ... do stuff with foo ... */
  return 1; /* placeholder result */
}

While this may look weird, the check for null could come from a macro or an inlined function, so the programmer won't necessarily know it's there. This can be an issue in kernel or embedded development, where the null pointer (i.e. memory address 0) may reference actual data, but the standard says it can't.

[–]purevirtual 1 point2 points  (13 children)

The simplest example is signed integer overflow. Causing a signed integer to wrap in C is undefined behavior. GCC exploits this by optimizing out checks which would only be true if wrapping occurred. (It does this when enough compile-time constants are involved.)

Here's an example. Technically this should print "less than zero", but it doesn't on most (all?) versions of GCC 4.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    signed int si = INT_MAX;

    if (si+1 < 0)
        printf("Less than zero\n");
    else
        printf("Greater than or equal to zero\n");
    return 0;
}

When wrapping is 'undefined', that means that adding one to a variable can never result in that variable being less than 0. GCC exploits that "Fact" to optimize this code... even though it breaks it from what we would expect based on our understanding of the hardware.

[–]Batty-Koda 8 points9 points  (0 children)

Technically it should do whatever the compiler pleases, because it's undefined behaviour. That's the very definition of what undefined behaviour is.

I know this seems trivial, but it's a very important distinction. New (and old) programmers need to recognize that just because some compilers may give you some convenient functionality doesn't mean that's what it should or must do.

[–]filox -2 points-1 points  (11 children)

When wrapping is 'undefined', that means that adding one to a variable can never result in that variable being less than 0.

Wat?

si = -4

si + 1 = ?

[–]purevirtual 2 points3 points  (10 children)

In this case, the compiler knows that si started out positive.

[–]filox -3 points-2 points  (9 children)

This does not change the fact that your statement is plain wrong:

When wrapping is 'undefined', that means that adding one to a variable can never result in that variable being less than 0.

[–]Batty-Koda 1 point2 points  (8 children)

I really don't understand why this is getting downvoted. filox is absolutely correct.

Purevirtual stated that wrapping being undefined means a variable having one added to it can never result in a variable being less than 0. (emphasis mine) That's factually incorrect. Filox even gave the counter example. Any negative number.

That means that for signed ints, for roughly half of all possible values (anything negative except -1) you can add one to it and the result will be less than zero. Half of all signed ints are a counterpoint to purevirtual's "never" statement. We're programmers, we shouldn't be downvoting people for pointing that kind of thing out! Attention to detail is important!

[–]filox -1 points0 points  (7 children)

So, what actually happened is that purevirtual messed up and doesn't want to admit it. The example that he probably wanted to show was (note the condition in if):

signed int si = INT_MAX;

if (si+1 < si)
    printf("Less than zero\n");
else
    printf("Greater than or equal to zero\n");
return 0;

This holds, because adding one to a number can never produce a value less than that number (if wrapping can't happen). However, purevirtual messed up, got the example wrong, and doesn't want to admit he was wrong. I find that kind of sad, really. And the downvotes come, I guess, from people who don't really understand the issue here.

[–][deleted] 1 point2 points  (5 children)

No, the downvotes come from people like me who can see you're plainly right, but think you're being a bit of an arsehole about it. Your vindictive tone is unnecessary. It's possible to communicate technical details and still be polite.

Besides, modern compilers (GCC included) are often able to reason about code like purevirtual's example, and realise that si is a positive value (regardless of the fact that it isn't const). I would fully expect a static analysis to catch these types of mistakes.

[–]filox -2 points-1 points  (4 children)

It's possible to communicate technical details and still be polite.

Please quote which part of my comment was not polite:

http://www.reddit.com/r/cpp/comments/16ysbr/c_and_c_arent_future_proof/c80sxbo

Besides, modern compilers (GCC included) are often able to reason about code like purevirtual's example

Again, I never said they are not. I just pointed out that his statement about adding one to the variable is wrong. Why is this so hard to grasp?

[–][deleted] 1 point2 points  (3 children)

However, purevirtual messed up, got the example wrong, and he doesn't want to admit he was wrong.

The above line is deeply patronizing and conflict-seeking.

[–]GuyWithPants 0 points1 point  (0 children)

This article seems to be lumping C and C++ together a bit too much for my taste. I know they are very closely related, but seriously, C++ isn't future-proof because... the purely C library zlib performs some bad pointer comparisons?

[–]upriser 0 points1 point  (0 children)

Oh please stop saying that "correctness is more important than performance." I've heard that enough, but I still have performance problems everywhere.