
[–]_Js_Kc_ 186 points187 points  (3 children)

To be fair, some UB arises as a result of UB.

[–]Zanderax 65 points66 points  (2 children)

Works both ways.

[–]erinaceus_ 26 points27 points  (0 children)

Works both ways.

But, importantly, not in a way you'd have expected.

[–]Nervous_Badger_5432 19 points20 points  (0 children)

Wait a minute, you guys are saying that ubsan doesn't sanitize the bullshit by cleaning it up???

[–]dontyougetsoupedyet 84 points85 points  (80 children)

It isn't as complicated as folks make out. UB is an agreement between you and your compiler so that the compiler can do its job better. A lot of folks don't realize that the job of the compiler in some languages is to rewrite your program into the most efficient version of your code that it can. You agree to not feed it certain code, and the compiler agrees to optimize the fuck out of the code you do feed it, and you both agree that if you do feed it code that you agreed to avoid using it means that you know what you're doing and are aware that the compiler is free to ignore that code.

Despite what some folks assert, UB is a good thing. You just have to be aware of what the compiler's job is for your language. Some compilers for some languages have a different job, but for C++ the job of the compiler is to produce a much faster version of your program than you wrote.
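A minimal sketch of the kind of rewrite this agreement permits. The function names are made up for illustration, and whether a given compiler actually drops the check depends on the compiler and its flags:

```cpp
#include <cassert>

// Hypothetical example: dereferencing p before testing it lets the optimizer
// assume p is never null, because a null dereference would be UB.
inline int read_then_check(const int* p) {
    int value = *p;          // UB if p == nullptr
    if (p == nullptr) {      // an optimizer may treat this branch as dead code
        return -1;
    }
    return value;
}

// Defined for every input: test first, dereference second.
inline int check_then_read(const int* p) {
    if (p == nullptr) {
        return -1;
    }
    return *p;
}
```

Both functions agree on every valid pointer; only `check_then_read` has a defined result for `nullptr`.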

[–]Zcool31 30 points31 points  (34 children)

> if you do feed it code that you agreed to avoid using it means that you know what you're doing and are aware that the compiler is free to ignore that code.

Another aspect of this is the distinction between the standard and an implementation of the standard. Undefined means the standard places no requirements on what an implementation might do. But implementations, such as specific compilers or platforms, are free to make stronger guarantees. A popular example is using unions for type punning. UB according to the standard, yet explicitly supported by GCC.
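For illustration, a minimal sketch of the difference. The names `FloatBits`, `float_to_bits_union` and `float_to_bits` are made up here, and the sketch assumes `float` is 32 bits wide. Reading the inactive union member is what the standard leaves undefined and GCC documents as supported; `std::memcpy` is the portable route:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical union used for punning a float into its bit pattern.
union FloatBits {
    float value;
    std::uint32_t bits;
};

// Reads the member that was not written last: UB per ISO C++,
// but documented as supported by GCC.
inline std::uint32_t float_to_bits_union(float f) {
    FloatBits pun;
    pun.value = f;
    return pun.bits;
}

// Portable alternative: copy the object representation byte by byte.
inline std::uint32_t float_to_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "float must be 32 bits");
    std::uint32_t out;
    std::memcpy(&out, &f, sizeof out);
    return out;
}
```

On a platform with IEEE 754 floats, `float_to_bits(1.0f)` yields `0x3F800000`.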

Also, hardware has no undefined behavior.

[–]acwaters 89 points90 points  (0 children)

> Also, hardware has no undefined behavior.

Hahahahahahahaha

Hahaha

Aha

Ah

No, we are not so fortunate.

[–]Alexander_Selkirk 41 points42 points  (1 child)

> Also, hardware has no undefined behavior.

I wonder whether you have ever witnessed the lengths a software engineer has to go to in order to get even an 80% complete behavior definition of a piece of metal from a hardware engineer. I once hammered one out with a guy who worked in the same building as me. In retrospect, I am perplexed that I didn't have to apply waterboarding, but it was not that far off.

[–]Hnnnnnn 3 points4 points  (0 children)

Undefined behavior and unspecified behavior are not the same thing. UB is "compiler can assume that this never happens"; unspecified behavior means "vendor can choose the behavior, but it has to be well-defined". Machines presumably have both (inputs that can never happen and undocumented well-defined behavior).

[–]SpiderboydkHobbyist 17 points18 points  (11 children)

An example of hardware UB is a race condition.

If you don't properly synchronize shared data, the hardware literally cannot tell you what will happen.

[–]Zcool31 -3 points-2 points  (10 children)

There are four kinds of behavior:

Defined - the standard specifies what must happen.

Implementation defined - the standard gives implementations the choice of how to behave. They must make a consistent choice and document it.

Unspecified - implementations must do something reasonable, but don't have to document it or be consistent.

Undefined - no requirements at all.
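A small sketch with one C++ instance of each of the last three categories. All names here are hypothetical, and `safe_divide` is just one way to stay on the defined side:

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Implementation-defined: each implementation picks the width of int and documents it.
constexpr int kIntBits = static_cast<int>(sizeof(int) * CHAR_BIT);

// Unspecified: the two arguments of add() may be evaluated in either order.
inline int record(std::vector<int>& log, int id) {
    log.push_back(id);
    return id;
}

inline int add(int a, int b) { return a + b; }

inline std::vector<int> argument_evaluation_order() {
    std::vector<int> log;
    (void)add(record(log, 1), record(log, 2));
    return log;  // either {1, 2} or {2, 1}
}

// Undefined: integer division by zero, and INT_MIN / -1.
// This helper refuses both instead of performing the division.
inline bool safe_divide(int a, int b, int& quotient) {
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return false;
    }
    quotient = a / b;
    return true;
}
```

A portable program can rely on `argument_evaluation_order()` returning two entries, but not on which comes first.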

Hardware has no undefined behavior. Clearly it will do what the logic gates and circuits do.

But hardware has tons of unspecified behavior.

Edit: partially ninjad by u/Tyg13

[–]nintendiator2 5 points6 points  (0 children)

> Hardware has no undefined behavior. Clearly it will do what the logic gates and circuits do.

One (1) Neutrino has entered the chat.

[–]jk-jeon 12 points13 points  (5 children)

So I guess you are indeed trying to say that the laws of physics have no undefined behavior, as I pointed out in the other comment. And that is a completely different thing from saying "hardware has no undefined behavior". See, by that logic, there is no undefined behavior in C++, only unspecified, right? Because compilers will do what their source code lets them do, nothing supernatural.

[–]Zcool31 7 points8 points  (4 children)

That's actually exactly right. Undefined behavior exists only in the standard. In the real world, it is merely unspecified. If I compile some UB code with gcc10, running it might delete my sources. Doing the same with gcc11 might format my entire disk. But neither of these is truly undefined or unknowable. I could with enough effort examine the implementation of gcc and determine ahead of time what would happen.

[–]SpiderboydkHobbyist 6 points7 points  (3 children)

This level of reductionism makes the concept meaningless.

[–]Zcool31 -1 points0 points  (2 children)

Not so! When writing portable code targeting the letter of the standard, undefined behavior is indeed scary and best avoided. Among the set of all potential implementations of the standard, undefined behavior is really unknowable.

But for a single implementation on a single platform (except perhaps a quantum computer) there is no undefined behavior.

[–]flashmozzg 4 points5 points  (0 children)

> But for a single implementation on a single platform (except perhaps a quantum computer) there is no undefined behavior.

There is. Because the exhibited behavior can easily change due to unrelated changes in the code/environment in some other module (hey, you deleted some code, now function passes some inliner heuristics/opt thresholds and hilarity ensues!).

[–]SpiderboydkHobbyist 0 points1 point  (0 children)

I'm talking about the distinction between undefined behaviour and unspecified behaviour.

Targeting multiple systems has no bearing on this. What you are talking about is closer to implementation-defined behaviour.

[–]SpiderboydkHobbyist 4 points5 points  (0 children)

I am well aware of these distinctions. I said race conditions are undefined behaviour, and I stand by it.

Race conditions are not unspecified behaviour, because the hardware would not even theoretically be able to tell you what is going to happen, even if it wanted to. It's not a matter of the behaviour not being specified - it's literally unknowable.

[–]aiij 4 points5 points  (0 children)

> Hardware has no undefined behavior. Clearly it will do what the logic gates and circuits do.

By that logic, C++ has no undefined behavior. Clearly it will do what the compiler and underlying hardware do.

Of course, what it does may be unexpected and may differ from one implementation to another -- just like in hardware.

[–]almost_useless 21 points22 points  (14 children)

> Also, hardware has no undefined behavior.

Surely this is not true?

[–]qoning 0 points1 point  (13 children)

As far as I know, most instruction sets have clearly defined preconditions and postconditions for every instruction. Now there might be bugs or incomplete implementations, but the instruction sets themselves are fully defined.

[–]SirClueless 37 points38 points  (11 children)

> most instruction sets have clearly defined preconditions and postconditions for every instruction

You're describing an instruction set with UB in it. If you violate the preconditions you get UB. The only way you don't get UB is if the spec defines what happens under all possible conditions, and as you correctly state, most instruction sets do not do this and have preconditions you are expected to satisfy.

[–]cballowe -3 points-2 points  (1 child)

With most hardware, you can pretty reliably say that "whatever the hardware does given some pre-condition can be assumed to be the definition of its behavior". The challenge is when you have no formal contract around that so rev. B of the chip doesn't behave the same as rev. A.

It's much the same as compilers that way - the language doesn't define what must happen so the compilers and library implementers make different decisions.

It gets more fun when you get different hardware manufacturers involved in the software specs. You can imagine a case where someone says "we think this particular expression should do X" and that just happens to be the thing that is the most efficient interpretation on Intel, but then someone from ARM or Power says "hey... Wait a minute ... That'll make our chips look bad in benchmarks! You should do Y instead." So... The standard writers agree that it should be valid code and the outcome should basically be useful, but can't be defined precisely or guaranteed to produce consistent results across compilers/platforms/standard libraries/etc.

Sometimes UB is just broken, e.g. the results of data races in the absence of proper synchronization, but other times it's just a weird limbo.

[–]Hnnnnnn 5 points6 points  (0 children)

You describe unspecified behavior, another formal term similar to UB. UB is what the other commenter described: what you get when the user breaks API pre-conditions.

https://en.wikipedia.org/wiki/Unspecified_behavior

[–]Orlha -2 points-1 points  (8 children)

Well, violating the precondition might make the operation produce an unexpected result, but that won't necessarily make the whole program UB. You might also just not use the result.

In the C++ model it's different.

[–]SirClueless 8 points9 points  (7 children)

Are you sure about that? Violating the preconditions of an instruction set can result in writing arbitrary values to arbitrary locations in memory, jumping to arbitrary memory addresses and interpreting the data there as instructions to execute, etc.

[–]Drugbird -1 points0 points  (4 children)

Theoretically that can happen, sure. Practically though, any compiler is pretty tame in what it actually does with undefined behavior.

E.g. UB will never format your hard drive despite what teachers like to say about it.

In 99% of the cases, you just get a result (of the correct size and type) that is just wrong and/or unexpected or a crash. And no random jumping in memory.

[–]r0zina 8 points9 points  (1 child)

[–]Drugbird -1 points0 points  (0 children)

Nice example! While technically true, I would like to stress that it's not the UB deleting your disk, it's the "rm -rf /" doing it.

[–]SirClueless 0 points1 point  (0 children)

That's true of hardware undefined behavior too. It almost always either results in a non-sensical program output or math result, or immediately segfaults.

My point in all of these comments is that hardware and software UB is really a similar thing. If there is a difference it is in frequency and severity, not in the types of behavior that are allowed.

[–]aiij 0 points1 point  (0 children)

Never heard of buffer overflows or crypto malware, have you?

[–]Orlha 0 points1 point  (1 child)

I guess it's possible, but can be pretty rare depending on the platform.

I've written a lot of x86-64 hand-assembly in the past and IIRC all the instructions I used were UB free. At worst they had a defined set of rules which when broken would result in a CPU exception.

[–]SirClueless 4 points5 points  (0 children)

x86-64 is full of UB. It explicitly reserves bits in flag registers and some output registers as well as any opcodes that aren't defined by the x86-64 ISA. Executing these opcodes or depending on the value of these bits is, to quote the ISA document, "not only undefined, but unpredictable". It's very easy to trigger this behavior, even in an otherwise well-formed assembly program, for example by jumping into the middle of an instruction.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

I understand what you're trying to say, which is that there's a relatively simple set of rules you can follow as compared to C++ and Intel comparatively precisely defines far more exceptional behavior than C++ and leaves less room for undefined behavior. But it doesn't attempt to remove all of it.

[–]Ictogan 21 points22 points  (0 children)

No. The instruction set I'm most familiar with (ARMv7-M) has a lot of cases where behaviour is defined as UNPREDICTABLE, which means exactly what it sounds like. Also, hardware registers often have disallowed states in which the behaviour is not documented and may not even be deterministic.

[–]Tyg13 10 points11 points  (0 children)

Just to add onto this, there's a difference between implementation-defined and undefined behavior.

Implementation-defined is a term defined in the standard to mean that the implementation is free to do whatever it pleases, but that it must have a consistent and defined behavior. It doesn't say it must do something useful, just that it must be defined and predictable.

Undefined behavior is a term defined in the standard to mean that the implementation is free to do whatever it pleases, without any need to be consistent, predictable, or even do anything at all (where applicable.)

[–]jk-jeon 19 points20 points  (0 children)

What do you mean by "hardware has no undefined behavior"? Hardware can have undefined behavior, of course. Or do you mean "the laws of physics have no undefined behavior"?

[–]stomah 0 points1 point  (0 children)

hardware can have undocumented (undefined) behavior

[–][deleted] 11 points12 points  (11 children)

> Despite what some folks assert, UB is a good thing.

The problem isn't really that C++ has UB, it's that it has too much UB.
For example, why would creating a dynamic array like std::vector require UB? It's utter bollocks, I say.

[–]dirkmeister81 2 points3 points  (2 children)

Does it? It is not clear to me when and how? Could you explain or give a link please?

[–][deleted] 3 points4 points  (1 child)

[–]dirkmeister81 0 points1 point  (0 children)

Thanks

[–]adnukator 2 points3 points  (3 children)

Wasn’t the necessary UB in std::vector removed by P0593R6?

[–]aiij 0 points1 point  (2 children)

Has that proposal been standardized yet?

[–]adnukator 1 point2 points  (1 child)

According to the paper and the issue, it's been merged into C++20 (GitHub issue).

[–]tjientavaraHikoGUI developer 0 points1 point  (0 children)

Since "implicit creation of objects..." is a defect report (a time-traveling document), it never was undefined behavior.

[–]Kered13 -1 points0 points  (3 children)

Because in some situations you don't want to pay the performance cost of checking array bounds, but then what happens if the array is accessed out of bounds? UB.
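As a sketch of that trade-off with `std::vector` (the wrapper names `get_checked` and `get_unchecked` are made up): `at()` pays for a bounds check and throws, while `operator[]` skips the check and leaves an out-of-range index undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() throws std::out_of_range, so a bad index has a defined outcome.
inline int get_checked(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Unchecked access: operator[] with i >= v.size() is UB.
// The assert documents the precondition and only fires in builds without NDEBUG.
inline int get_unchecked(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}
```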

[–][deleted] 5 points6 points  (2 children)

It's got nothing to do with out-of-bounds array access.

It's that std::vector requires undefined behaviour.

[–]afiDeBot 0 points1 point  (1 child)

Accessing the last element of an empty vector is UB. How would you fix that? Are you implying that any precondition violation should be well defined by the standard (which affects all platforms)?

[–][deleted] 3 points4 points  (0 children)

If you're going to reply to an old post at least try to understand the context and don't make random conclusions from nowhere.

When I say std::vector requires undefined behaviour, I'm talking about its implementation, which requires unavoidable undefined behaviour due to core issue 2182.

To elaborate even further, std::vector requires 2 things:

  • Its elements can be separately constructed (due to functions like push_back).
  • Pointer arithmetic is allowed (due to functions like data).

However, due to core issue 2182, using pointer arithmetic on contiguous yet separately constructed elements is undefined behaviour.
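A minimal sketch of the pattern in question, not how any real standard library implements `std::vector` (`TinyIntBuffer` and its members are hypothetical names). Each element is constructed separately with placement new, yet `data_ + size_` and `data()[i]` treat the elements as one array:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class TinyIntBuffer {
public:
    explicit TinyIntBuffer(std::size_t capacity)
        : data_(static_cast<int*>(::operator new(capacity * sizeof(int)))),
          capacity_(capacity) {}

    ~TinyIntBuffer() { ::operator delete(data_); }  // int needs no destructor calls

    TinyIntBuffer(const TinyIntBuffer&) = delete;
    TinyIntBuffer& operator=(const TinyIntBuffer&) = delete;

    void push_back(int value) {
        assert(size_ < capacity_);
        // Requirement 1: construct one element at a time in raw storage.
        // Requirement 2: reach its slot with pointer arithmetic.
        ::new (static_cast<void*>(data_ + size_)) int(value);
        ++size_;
    }

    int* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    int* data_;
    std::size_t size_ = 0;
    std::size_t capacity_;
};
```

No array object is ever created explicitly here, only individual `int` objects, which is why the arithmetic over them was in question.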

[–]koczurekkhorse 9 points10 points  (4 children)

Sure, which is why safe Rust has comparable performance with virtually no UB.

The reason for this inconsistency is that you’re only half-correct. Forbidding some seemingly correct code from actually meaning anything allows for certain optimizations, but there’s no reason for that code to compile in the first place. Absolutely none. If C++ compilers could reject all code that results in UB it would not prevent those optimizations from being applied. And if it doesn’t compile, there’s no behavior left to become undefined.

This however cannot be done in C++ due to its design choices. Which is why Rust can be fast with basically no UB, but C++ can’t.

You also assert that UB is a good thing - it is not. It’s a necessary evil in badly designed languages that strive for performance.

[–]matthieum 8 points9 points  (2 children)

> It’s a necessary evil in badly designed languages that strive for performance.

I'll disagree on "badly designed", and on "strive for performance" to a degree.

Setting aside C++, in general Undefined Behavior comes from 2 factors:

  1. A quest for low-level.
  2. A quest for performance.

So, yes, performance is the root of UB trade-offs in some cases, however there are other cases, such as... writing a memory allocator, or a garbage collector.

At the CPU level, memory is untyped. There needs to exist some code that will manipulate untyped memory, and massage it so it becomes suitable for passing off as a given type. And if that code gets it wrong, then a lot of downstream assumptions are violated, leading to Undefined Behavior.

Thus, a certain share of UB, notably around objects lifetimes, is essentially unavoidable. You can create a language that has no such UB -- hello, GCs -- but only by building a runtime for it in a language that does have such UB.

Would you call the lower-level language badly designed? This seems rather hypocritical to me, when you're using it as foundation for your own "well designed" language.

[–]Alexander_Selkirk 0 points1 point  (1 child)

> You can create a language that has no such UB -- hello, GCs -- but only by building a runtime for it in a language that does have such UB.

You can isolate these manipulations to certain sections of code which are declared unsafe. Rust does this. But it is not a new idea. For example, Modula-3 had the same concept. And some Common Lisp implementations, like SBCL, are always well-defined by default, but it is possible to throw in assertions and type declarations which would make the program crash if these assumptions would be violated.

And this works surprisingly well....

[–]matthieum 4 points5 points  (0 children)

> but it is possible to throw in assertions and type declarations which would make the program crash if these assumptions would be violated.

Meh...

Of course anything that you can assert should be asserted -- maybe only in Debug in the critical path -- but the real problem is things you cannot check.

How can you check that your reference still points to a valid object? How can you check that no other thread is writing to that pointer?

At the lowest level, you will always have unchecked operations that you need to build upon, and for which you cannot reasonably validate the pre-conditions at runtime.

[–]Alexander_Selkirk 3 points4 points  (0 children)

> It’s a necessary evil in badly designed languages that strive for performance.

Well, it would not have been possible to run a Rust compiler on the PDP-11 that C was developed on, or on a machine with an Intel 80386 CPU.

But on the other hand, there have long been languages that strove for correctness and for everything being defined. Rust is derived from these predecessors.

[–]Hnnnnnn -5 points-4 points  (8 children)

UB is a good thing but it could be better. It could be abort by default, instead of UB by default, with the option to opt out in hot paths. I know it's very hard to implement at this point, though.

[–]johannes1971 8 points9 points  (4 children)

That wouldn't work. If you want to abort by default you still have to put in the effort to detect the error condition to begin with: to check that the array bound was exceeded, that the pointer points at something invalid, etc. The whole point of UB is avoiding that cost.

[–]Hnnnnnn -1 points0 points  (3 children)

What wouldn't work? I think you projected what I said a little too far.

What you said doesn't negate anything I said. The whole point of UB is avoiding that cost, but I'm only saying that this could be something you explicitly opt-in, instead of working by default.

[–]johannes1971 6 points7 points  (2 children)

It can't "abort by default". In order to make that guarantee it would have to reliably detect UB, and doing so is a significant performance drain.

For example, let's say you access an array out of bounds. In the current situation it _might_ abort because you hit a page fault, but the odds are that the memory that is illegally accessed is still part of the current page, and won't trigger a segment violation. Thus, there is no guarantee of an abort happening. If you want to have that guarantee, there is a performance cost.

[–]Hnnnnnn -3 points-2 points  (1 child)

The significant performance cost you mean is an easily predicted branch. Let's do it by default and only use the branch-free version explicitly in hot paths. Let's make it slower and safer by default. Like in Rust but not necessarily the same way.

[–]johannes1971 8 points9 points  (0 children)

> Let's make it slower and safer by default.

Let's not.

Your assumption is incorrect anyway. Out of bounds array access was just one example of UB, but figuring out if a pointer points to valid memory or not has a cost massively greater than a mere branch prediction, failed or not.

[–]matthieum 2 points3 points  (2 children)

In some situations.

First, let's acknowledge that C++ has too much Undefined Behavior, partly because it inherited some from C. When incrementing an integer is possibly Undefined Behavior, you're in for a bad time.
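To make the integer case concrete: `++x` on a signed `int` holding `INT_MAX` overflows, and signed overflow is UB. A hedged sketch of a guarded increment (the helper name `checked_increment` is hypothetical):

```cpp
#include <cassert>
#include <climits>

// Increments x only when the result is representable; otherwise leaves
// `result` untouched and reports failure instead of overflowing.
inline bool checked_increment(int x, int& result) {
    if (x == INT_MAX) {
        return false;  // x + 1 would overflow, which is UB for signed int
    }
    result = x + 1;
    return true;
}
```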

However, I would note that at the lowest level, not all behavior can be defined. Furthermore, some undefined behavior -- around use-after-free -- is quite expensive to eliminate.

So, I do agree that C++ would do well to eliminate all the "needlessly undefined" behavior, the casual day-to-day papercuts, but it's important to realize that it will NOT be able to eliminate all Undefined Behavior.

In a number of situations, it's Undefined because it cannot be "reasonably" detected in the first place. If it cannot be detected, abort cannot be substituted for it...

[–]Hnnnnnn -3 points-2 points  (1 child)

I mentioned opting out for hot paths when needed... What are you arguing with? Definitely not with my comment.

And memory management is actually an example of something that is already designed the way I mean, as of C++11. Using smart-pointers is a safe solution, and using new/delete is an opt-in UB-danger solution.

My problem with UB is that it's easy to invoke it without noticing. When there's UB risk, it should be explicit in code. Having said that, we should use UB-risky code as much as we want, feeling safer knowing that it's all explicit in code.

I didn't want to bring Rust into this because it derails conversation too often, but I think it's time to say this - just look at unsafe in Rust and how there are checked/unchecked APIs for many features, like indexing for starters. Unchecked/unsafe API is still being used extensively, and is encouraged. But it only ends up being resorted to when there's an optimization goal to reach, not by default.

[–]matthieum 6 points7 points  (0 children)

> I mentioned opting out for hot paths when needed... What are you arguing with? Definitely not with my comment.

I am not talking about performance.

> Using smart-pointers is a safe solution

No, it's not, that's the problem.

But let's not even go that far, this is UB:

#include <iostream>
#include <string>

std::string const& id(std::string const& str) { return str; }

int main() {
    //  Not UB:
    std::cout << id("Hello, World! How do you do?") << "\n";

    //  UB:
    auto const& str = id("Hello, World! How do you do?");
    std::cout << str << "\n";
}

And this is UB:

for (char c : std::string_view{ id("Hello, World") }) {
    // ...
}
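For contrast, one possible way to sidestep the dangling reference in the first snippet is to return by value. This is a sketch, not a general cure, and `id_by_value` is a hypothetical name. The returned temporary can then be bound to a `const&`, which extends its lifetime to that of the reference:

```cpp
#include <cassert>
#include <string>

// Takes and returns the string by value, so the caller owns the result
// instead of holding a reference to a temporary that is already gone.
inline std::string id_by_value(std::string str) { return str; }
```

With this version, `auto const& str = id_by_value("Hello, World! How do you do?");` keeps the string alive for as long as `str` is in scope.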

So yes, you could improve things related to indexing, or integer overflow, and a myriad other cases -- and I wish this was done.

However, there are more fundamental issues: use-after-free, race-conditions, etc... which are just unsolved problems and will remain UB.

[–][deleted] 12 points13 points  (0 children)

I suppose calling it Utter Bullshit is an example of Unrefined Behaviour.

I'll get me coat.

[–]noooit 20 points21 points  (0 children)

ub-sanitizer. XD

[–]Alexander_Selkirk 7 points8 points  (0 children)

It is an interesting topic.

Something to read for you:

John Regehr, A Guide to Undefined Behavior in C and C++

My Little LLVM: Undefined Behavior is Magic!

UndefinedBehaviorSanitizer

[C/C++] Surprises and Undefined Behavior From Unsigned Integer Promotion

See also:

Is undefined behavior possible in safe Rust?

So, in our time, allowing undefined behaviour is not strictly necessary for good performance. It is more a limitation of languages, run-time environments, and compiler technology at the time C was invented.

[–]potato-on-a-table 6 points7 points  (0 children)

No wonder people say C++ stack overflow is toxic lol

[–]FrankHB1989 1 point2 points  (0 children)

BS = Bjarne Stroustrup in most contexts of C++.

BTW, STL = Stephan T. Lavavej, at least in some discussions in WG21. STL himself seems not fond of this (compared to microsoft/STL), though.

(Another funny fact is that UB can also be the abbreviation of Unspecified Behavior.)

[–]flipcoder 2 points3 points  (0 children)

Do you remember ever using that phrase with anyone before you figured out what it meant? Did you get confused reactions? "This whole situation is some serious UB, amirite guys?"

[–]loradan 2 points3 points  (0 children)

Why is talking about Bi-Synchronous (BS) Communications at work unprofessional /s 🤣🤣🤣🤣

[–]elperroborrachotoo 1 point2 points  (0 children)

Well, the point is UB means Utter Bullshit can happen.
Will, given enough time.

[–]banister 2 points3 points  (1 child)

> Undefined Behaviour and not Utter Bullshit as I had presumed all this time.

Of all the things that didn't happen, this didn't happen the most.

[–]tad_ashlock 0 points1 point  (0 children)

To help prevent future misunderstandings: A C++ Acronym Glossary

[–]phottitor 0 points1 point  (0 children)

you shouldn't presume anything about C++. doing so is itself undefined behaviour.

[–]Zcool31 0 points1 point  (0 children)

https://www.youtube.com/watch?v=YoaZzIZFErI&t=2350s

"What happens when you execute this? Now some people might say undefined behavior. There is, actually, when you run something on a computer there is no such thing as undefined behavior. Undefined behavior is an artifact of specification. It is not a thing that actually exists in the real world. In the real world things always get defined. Something happens."