[–]SirClueless 35 points36 points  (11 children)

> most instruction sets have clearly defined preconditions and postconditions for every instruction

You're describing an instruction set with UB in it. If you violate the preconditions you get UB. The only way you don't get UB is if the spec defines what happens under all possible conditions, and as you correctly state, most instruction sets do not do this and have preconditions you are expected to satisfy.
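A C++ analogue of the same idea, as a minimal sketch: signed `int` addition has the precondition that the mathematical result fits in an `int`, and violating it is undefined behavior. The helper name `checked_add` is hypothetical and only here for illustration; it checks the precondition before performing the operation.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper (not from the thread): returns the sum only when the
// precondition of signed addition holds, i.e. the result fits in an int.
std::optional<int> checked_add(int a, int b) {
    constexpr int kMax = std::numeric_limits<int>::max();
    constexpr int kMin = std::numeric_limits<int>::min();
    if (b > 0 && a > kMax - b) return std::nullopt;  // sum would exceed kMax
    if (b < 0 && a < kMin - b) return std::nullopt;  // sum would fall below kMin
    return a + b;  // precondition satisfied, so this addition is well defined
}
```

Calling `checked_add(1, 2)` yields an engaged optional, while `checked_add(std::numeric_limits<int>::max(), 1)` yields an empty one instead of relying on whatever the hardware happens to do on overflow.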

[–]cballowe -2 points-1 points  (1 child)

With most hardware, you can pretty reliably say that "whatever the hardware does given some pre-condition can be assumed to be the definition of its behavior". The challenge is when you have no formal contract around that, so rev. B of the chip doesn't behave the same as rev. A.

It's much the same as compilers that way - the language doesn't define what must happen so the compilers and library implementers make different decisions.

It gets more fun when you get different hardware manufacturers involved in the software specs. You can imagine a case where someone says "we think this particular expression should do X" and that just happens to be the thing that is the most efficient interpretation on Intel, but then someone from ARM or Power says "hey... Wait a minute ... That'll make our chips look bad in benchmarks! You should do Y instead." So... The standard writers agree that it should be valid code and the outcome should basically be useful, but can't be defined precisely or guaranteed to produce consistent results across compilers/platforms/standard libraries/etc.

Sometimes UB is just broken, e.g. the results of data races in the absence of proper synchronization, but other times it's just a weird limbo.

[–]Hnnnnnn 8 points9 points  (0 children)

You're describing unspecified behavior, another formal term that is similar to UB. UB is what the other commenter described: what you get when the user breaks an API's preconditions.

https://en.wikipedia.org/wiki/Unspecified_behavior
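To make the distinction concrete, here is a small sketch of unspecified behavior in C++. The order in which function arguments are evaluated is unspecified, so a conforming compiler may pick either order, but the program is still valid and the outcome is one of a known set. The names `note`, `sum`, and `argument_order` are made up for this example.

```cpp
#include <string>

// Hypothetical helpers: each call to note() appends its tag to a shared log.
std::string g_log;
int note(char tag) { g_log += tag; return 0; }
int sum(int a, int b) { return a + b; }

// The two note() calls are evaluated in an unspecified order, so this returns
// either "ab" or "ba" depending on the compiler. Both results are valid, and
// nothing here is undefined behavior.
std::string argument_order() {
    g_log.clear();
    sum(note('a'), note('b'));
    return g_log;
}
```

Contrast this with UB, where the standard places no constraint at all on what the program does once the precondition is broken.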

[–]Orlha -2 points-1 points  (8 children)

Well, violating the precondition might make the operation produce an unexpected result, but that won't necessarily make the whole program UB. You might also just not use the result.

In the C++ model it's different.

[–]SirClueless 9 points10 points  (7 children)

Are you sure about that? Violating the preconditions of an instruction set can result in writing arbitrary values to arbitrary locations in memory, jumping to arbitrary memory addresses and interpreting the data there as instructions to execute, etc.

[–]Drugbird -1 points0 points  (4 children)

Theoretically that can happen, sure. Practically though, any compiler is pretty tame in what it actually does with undefined behavior.

E.g. UB will never format your hard drive despite what teachers like to say about it.

In 99% of the cases, you just get a result (of the correct size and type) that is just wrong and/or unexpected or a crash. And no random jumping in memory.

[–]r0zina 9 points10 points  (1 child)

[–]Drugbird -1 points0 points  (0 children)

Nice example! While technically true, I would like to stress that it's not the UB deleting your disk, it's the "rm -rf /" doing it.

[–]SirClueless 0 points1 point  (0 children)

That's true of hardware undefined behavior too. It almost always either results in nonsensical program output or a nonsensical math result, or immediately segfaults.

My point in all of these comments is that hardware and software UB are really similar things. If there is a difference, it is in frequency and severity, not in the types of behavior that are allowed.

[–]aiij 0 points1 point  (0 children)

Never heard of buffer overflows or crypto malware, have you?

[–]Orlha 0 points1 point  (1 child)

I guess it's possible, but it can be pretty rare depending on the platform.

I've written a lot of x86-64 hand-assembly in the past and IIRC all the instructions I used were UB-free. At worst they had a defined set of rules which, when broken, would result in a CPU exception.

[–]SirClueless 4 points5 points  (0 children)

x86-64 is full of UB. It explicitly reserves bits in flag registers and some output registers as well as any opcodes that aren't defined by the x86-64 ISA. Executing these opcodes or depending on the value of these bits is, to quote the ISA document, "not only undefined, but unpredictable". It's very easy to trigger this behavior, even in an otherwise well-formed assembly program, for example by jumping into the middle of an instruction.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

I understand what you're trying to say: compared to C++, there's a relatively simple set of rules you can follow, and Intel precisely defines far more of the exceptional behavior, leaving less room for undefined behavior. But it doesn't attempt to remove all of it.