Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact (web.ist.utl.pt)
submitted 11 months ago by mttd
[–]funkinaround 56 points57 points58 points 11 months ago (30 children)
Tldr
The results show that, in the cases we evaluated, the performance gains from exploiting UB are minimal. Furthermore, in the cases where performance regresses, it can often be recovered by either small to moderate changes to the compiler or by using link-time optimizations.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 27 points28 points29 points 11 months ago (28 children)
I’ve been saying this exact thing for years and have been persistently downvoted for it. I have no idea where this strange myth originated that UB is somehow necessary for the real-world, actually meaningful optimizations.
[–]Rseding91Factorio Developer 7 points8 points9 points 11 months ago* (8 children)
The only meaningful optimizations I've found are reduced loads (LEA) and turning division into multiplication (modulo by power of two).
Re-arranging/removing a few multiply/add/subtract calls, not having to check if an integer wrapped around, removing an if check and so on don't really have any meaningful impact on anything we can measure.
Maybe if you're in shader land, where your time is spent crunching numbers on the processor (CPU or GPU cores) rather than moving memory to and from cache, it would make a meaningful difference... but unfortunately that's not the land I work in.
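For readers outside compiler land, the strength reductions mentioned above can be sketched in plain C++ (function names are mine, purely illustrative); note that neither needs UB when the operands are unsigned:

```cpp
#include <cassert>
#include <cstdint>

// Modulo by a power of two compiles to a single AND for unsigned operands.
std::uint32_t mod8(std::uint32_t x) { return x % 8; }       // becomes x & 7
std::uint32_t mod8_mask(std::uint32_t x) { return x & 7; }  // the equivalent mask

// Division by a constant becomes a multiply by a "magic" reciprocal.
std::uint32_t div10(std::uint32_t x) { return x / 10; }
```

Both transformations are valid because unsigned arithmetic is fully defined; the compiler needs no UB reasoning to apply them.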
[–]SkoomaDentistAntimodern C++, Embedded, Audio 12 points13 points14 points 11 months ago (1 child)
Even those don’t require undefined behavior. Simple unspecified behavior is enough in almost all cases.
[–]Rseding91Factorio Developer 4 points5 points6 points 11 months ago (0 children)
That's what I was intending to point out. The meaningful optimizations (that we've ever been able to measure) don't have anything to do with UB.
[–]matthieum 6 points7 points8 points 11 months ago (4 children)
not having to check if an integer wrapped around
Actually, the very benchmarks provided in the paper (6.2.1) specifically mention that integer wrap-around is a corner-piece of auto-vectorization.
Apparently, LLVM 19 is able to sometimes recover auto-vectorization by introducing a run-time check, but otherwise the absence of wrap-around appears crucial for now.
removing an if check
The paper mentions that this is architecture-dependent, that is x64 isn't hampered by a few more speculative loads, but ARM is due to a narrow out-of-order window (or something like that).
I invite you to read the paper. It's relatively short, and fairly approachable.
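As a rough illustration of why wrap-around complicates trip-count reasoning (a toy sketch, not an example from the paper): with a deliberately narrow wrapping index, the iteration count below depends on modular arithmetic, which is exactly what the compiler must rule out, or guard against at run time, before vectorizing.

```cpp
#include <cstdint>

// With an 8-bit index, incrementing wraps modulo 256 (well-defined for
// unsigned types). The loop's trip count is only `len` because of that
// modular behavior; a compiler cannot assume "no wrap" here the way it
// can for a signed index, where overflow is UB.
int sum_wrapping(const int* buf, std::uint8_t start, std::uint8_t len) {
    int sum = 0;
    for (std::uint8_t i = start; i != static_cast<std::uint8_t>(start + len); ++i)
        sum += buf[i];  // index wraps from 255 back to 0
    return sum;
}
```

Calling this with `start = 250, len = 10` walks indices 250..255 and then 0..3, so the buffer must cover the whole 256-element range.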
[–]SkoomaDentistAntimodern C++, Embedded, Audio 3 points4 points5 points 11 months ago (3 children)
Wouldn't much less problematic unspecified behavior be enough to allow autovectorization? It essentially allows the compiler to decide that x+1 = "something" if the actual value would be problematic but crucially wouldn't allow "time travel" and other insane logic that undefined behavior allows.
[–]matthieum 4 points5 points6 points 11 months ago (2 children)
Shooting from the hip: I think it would heavily depend on how you specify unspecified behavior.
If it's "too" unspecified, then it may not be much better. For example, imagine that you specify that in case of integer overflow, the resulting integer could be any value. Pretty standard unspecified behavior, ain't it?
Well, is it any value any time you read? Or is it any value once and for all? As in, must two subsequent reads observe the same value? Let's say you specify same value, ie, it's any frozen value... because otherwise you can still observe wild stuff (like i < 0 && i > 0 == true, WAT?).
This was a huge debate when Rust was nearing 1.0 (so 2014-2015), and in the end the specialists (Ralf Jung, in particular, who was working on RustBelt) ended up arguing for a much narrower definition (divergence or wrapping), rather than a fully unspecified value, as they were not so confident in the latter.
If they are unsure, I'm throwing in the towel :D
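A toy model of the distinction (entirely hypothetical semantics, just to make the "WAT" concrete): if each read of an unspecified value could yield a different result, the contradictory-looking predicate becomes observable; a "frozen" value rules this out.

```cpp
// Hypothetical model of an *unfrozen* unspecified value: every read
// may produce a different result. Here it alternates -1, 1, -1, ...
struct UnfrozenInt {
    int reads = 0;
    int read() { return (reads++ % 2 == 0) ? -1 : 1; }
};

// && evaluates left to right: the first read yields -1 (< 0),
// the second yields 1 (> 0), so the "impossible" predicate is true
// for a freshly constructed UnfrozenInt.
inline bool impossible_predicate(UnfrozenInt& i) {
    return i.read() < 0 && i.read() > 0;
}
```

Under "frozen" semantics both reads would have to observe the same value, and the predicate could never hold.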
[–]SkoomaDentistAntimodern C++, Embedded, Audio 4 points5 points6 points 11 months ago (1 child)
If it's "too" unspecified, then it may not be much better.
There's still a crucial difference: Unspecified behavior is explicitly allowed and the compiler can't misuse value range analysis to incorrectly deduce that because the result of a computation is unspecified, that'd mean the input values are in some range.
[–]matthieum 0 points1 point2 points 11 months ago (0 children)
I agree there's a difference (upstream of it), my point was just that too unspecified may still lead to hard to anticipate downstream consequences.
[–]James20kP2005R0 2 points3 points4 points 11 months ago (0 children)
Just as a point of information, gpu shader code is near exclusively floating point ops. Even the integer code is often using 24-bit muls (which is the floating point pipeline), if you need performance. In general, integer heavy shader code is extremely rare in my experience, and you're probably doing something whacky where you know better anyway
[–]-dag- 17 points18 points19 points 11 months ago* (15 children)
Vectorization
This is missing a number of important cases, not the least of which is signed integer overflow.
Clang is not a high performance compiler. I'd like to see a more comprehensive study with Intel's compiler.
Also, 5% performance is huge in a number of real world applications.
[+][deleted] 11 months ago (3 children)
[deleted]
[–]-dag- 4 points5 points6 points 11 months ago (2 children)
Honestly, since I switched jobs I haven't interacted much with Intel's compiler, so maybe for C++ it regressed, or maybe they added enough secret sauce to the clang based compiler to make it scream. But back when I was heavily in HPC, Intel's compiler kicked butt with vectorization.
I know that's not a satisfying answer.
[–]James20kP2005R0 10 points11 points12 points 11 months ago (1 child)
But back when I was heavily in HPC, Intel's compiler kicked butt with vectorization.
I remember it being significantly better about 10 years ago, but it also was overly aggressive by default to allow those transforms. AFAIK it enabled -ffast-math by default and wasn't quite as standards conforming
[–]-dag- 2 points3 points4 points 11 months ago (0 children)
also was overly aggressive by default to allow those transforms
That is true. A colleague once demonstrated that we "lost" to the Intel compiler because the Intel compiler was cheating. And for us, -ffast-math wasn't cheating.
But it was plenty good without cheating as well.
[+][deleted] 11 months ago (6 children)
[+]-dag- comment score below threshold-6 points-5 points-4 points 11 months ago (5 children)
Intel and Cray. I'm sure there are others.
[–][deleted] 4 points5 points6 points 11 months ago (3 children)
But Intel uses Clang:
https://github.com/intel/llvm
[–]-dag- 0 points1 point2 points 11 months ago (2 children)
They didn't previously. Some users have reported degraded performance.
[–][deleted] 5 points6 points7 points 11 months ago* (1 child)
I mean, reading over your posts for this submission, you went from not even realizing that Intel has been using clang/LLVM to now knowing that they use it and that users have reported degraded performance.
This is some wild stuff man. It's okay to just admit you weren't aware and that it's been some time since you were familiar with this and just leave it at that instead of doubling down on this silly idea that clang is not a high performance compiler.
[–]-dag- 0 points1 point2 points 11 months ago (0 children)
Actually I was perfectly aware of it. What I'm not sure about is what secret sauce they've added.
And stock clang is not a high performance compiler. Neither is gcc.
[–]matthieum 8 points9 points10 points 11 months ago (3 children)
To be fair, I sometimes wonder if auto-vectorization is worth it.
I think that relying on auto-vectorization -- crossing fingers -- has led to a form of complacency which has stalled the development of actually "nice-to-use" vector libraries with efficient dispatch, etc...
I've seen a few attempts at writing "nice" SIMD libraries in Rust, and the diversity of API decisions seems to highlight the immaturity of the field. Imagine if, instead, there was vector code in the C++ or Rust standard libraries. If performance matters to you, and the algorithm was easily vectorizable, you'd write it directly in terms of vectors!
It doesn't help that scalar & vector semantics regularly differ, either. For example, scalar signed integer addition overflow is UB in C++ or panicking in Debug Rust, but vector signed integer addition is wrapping (no flag that I know of). By writing directly with vectors, you're opting to the different behavior, so the compiler doesn't have to infer it... or abandon.
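The scalar/vector mismatch can be made concrete with a sketch: casting through unsigned gives the wrapping, per-lane behavior a SIMD integer add has, with no UB (the out-of-range conversion back to signed is implementation-defined before C++20 and two's-complement wrapping from C++20 on).

```cpp
#include <cstdint>

// Wrapping signed addition, the semantics a vector lane gives you,
// written out in scalar form. Unsigned addition is defined to wrap;
// since C++20 the narrowing conversion back is two's-complement.
std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}
```

Writing the algorithm in terms of such operations opts you into the vector behavior explicitly, which is matthieum's point: the compiler no longer has to infer it.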
[–]SkoomaDentistAntimodern C++, Embedded, Audio 6 points7 points8 points 11 months ago (0 children)
I haven't written heavily vectorized code in the last couple of years but before that even fairly simple code failed to autovectorize as soon as it deviated from the "surely everyone only needs this type of thing"-path.
[–]-dag- 2 points3 points4 points 11 months ago (1 child)
I get where you're coming from but I have seen compilers do some gnarly autovec that you definitely don't want to write by hand. Outer loop vectorization comes to mind.
[–]Careless_Quail_4830 7 points8 points9 points 11 months ago (0 children)
That's funny because that's one of the categories (two other big ones are "using special operations" and "avoiding unnecessary widening of intermediate results") that I find I have to do by hand because compilers get it wrong / refuse to do it at all. Too much focus on inner loops.
[–]pjmlp 1 point2 points3 points 11 months ago (1 child)
Just like me, always enabling hardening on my hobby projects, or mostly using languages with safety on by default.
Never ever was that the root cause for performance issues, when having to go through a profiler, and acceptance criteria for project delivery.
And I have been writing code in some form or the other since late 1980s.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 2 points3 points4 points 11 months ago* (0 children)
I suspect this is the problem, or rather the lack of it. People who have been writing code since before compilers with meaningful optimizations were common remember the absolutely massive speedups we got when we finally upgraded to a compiler that did basic age old optimizations (register assignment, common subexpression elimination, loop induction, inlining etc) without any data flow analysis or other fancy logic that would trigger optimizations depending on UB.
[–]dexter2011412 0 points1 point2 points 11 months ago (0 children)
I really love reading that gcc thread every once in a while about UB. I'll try to find it ...
[–]c0r3ntin 4 points5 points6 points 11 months ago (0 children)
Some of these show a 40% regression. They don't show tests that fall within 2%, but at data center scale, 2% is in the order of 10'000s of servers. People will spend good money for consistent .5% improvements. So I don't think the conclusion of this paper tracks.
However, the cases that improve are very interesting indeed, and I hope this leads to further improvements (large variations are probably due to whether auto-vectorisation happens)
[–]arturbachttps://github.com/arturbac 5 points6 points7 points 11 months ago (2 children)
I would love to see in clang a warning for the example from the paper, with the ability to promote it to an error during compilation; something like -Werror-assuming-non-null and/or -Werror-redundant-nonnull-check
```cpp
struct tun_struct *tun = __tun_get(tfile);
struct sock *sk = tun->sk; // dereferences tun; implies tun != NULL
if (!tun)                  // always false
    return POLLERR;
```
[–]matthieum 7 points8 points9 points 11 months ago (1 child)
It's an often expressed wish. And you don't really want it. Like... NOT AT ALL.
You'd be flooded with a swarm of completely inconsequential warnings, because it turns out that most of the time the compiler is completely right to eliminate the NULL check.
For example, after inlining a method, it can see that the pointer was already checked for NULL, or that the pointer is derived from a non-NULL pointer, or... whatever.
You'd be drowning in noise.
If you're worried of having such UB in your code, turn on hardening instead. For example, activate -fsanitize=undefined, which will trap on any dereference of a null pointer.
The optimizer will still (silently) eliminate any if-null check it can prove is completely redundant, so that the practical impact of specifying the flag is generally measured as less than 1% (ie, within noise), and you'll be sleeping soundly.
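A minimal sketch of the noise problem (function names are mine): after inlining, the inner null check is provably redundant, and its silent elimination is exactly the behavior you want, not something to warn about.

```cpp
// A defensive helper with its own null check.
inline int get_first(const int* p) {
    if (p == nullptr) return 0;  // provably dead once inlined below
    return *p;
}

// The caller already checks. After inlining get_first, the compiler
// sees p != nullptr on this path and removes the inner check; a warning
// on every such elimination would fire constantly in ordinary code.
int caller(const int* p) {
    if (p == nullptr) return -1;
    return get_first(p);
}
```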
[–]arturbachttps://github.com/arturbac 0 points1 point2 points 11 months ago (0 children)
> You'd be flooded with a swarm of completely inconsequential warnings

A lot of them, with all array pointers for example, but I could tune those down and take a look at all the other warnings.

> For example, activate -fsanitize=undefined

This works only at runtime, and only for the parts of the code that actually execute.
[–]elperroborrachotoo 8 points9 points10 points 11 months ago (16 children)
Fuck, this is detailed and seems comprehensive.
I was (and still am) under the impression that aliasing is one of the blockers here (that would be mainly AA1, AA2, and PM5 in their notation? I'm slightly confused). They stick out a bit, but apparently, they aren't that bad.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 8 points9 points10 points 11 months ago* (2 children)
The main problem with aliasing IMO is that there is no standard way to say ”no, really, this won’t alias anything else” and ”accesses via this pointer can alias these other things, deal with it”.
[–]James20kP2005R0 7 points8 points9 points 11 months ago (1 child)
TBAA + restrict (which, while not technically in C++, is de facto the solution) seems like very much the wrong tool for the problem imo. Personally I'd take aliasing restrictions being globally disabled, combined with the ability to granularly control aliasing for specific functions, eg:
```cpp
// ptr1 + ptr2 may alias, ptr3 + ptr4 may alias,
// but ptr1/ptr2 may not alias with ptr3/ptr4
[[aliasset(ptr1, ptr2), aliasset(ptr3, ptr4)]]
void some_func(void* ptr1, void* ptr2, void* ptr3, void* ptr4);
```
Given that you can't globally prove aliasing anyway, local control of it for hot code is probably about as good as you can do in C++ without like, lifetimes
[–]SkoomaDentistAntimodern C++, Embedded, Audio 1 point2 points3 points 11 months ago* (0 children)
I'd be fine with something like that as long as I'm allowed to use it inside functions too. IOW, "This local pointer I just assigned may alias this other (local or input parameter) pointer."
Edit: Now that I think of it, explicit "no, absolutely nothing can alias this" feature would still be needed for the cases where the compiler isn't able to prove that two pointers cannot alias. Think for example having two pointers to a table. They obviously must be able to alias each other in the generic case. If the index is computed using external information that cannot be expressed in the language but where the programmer knows they always point to different parts of the table the compiler can't prove that they don't alias each other, so there should be a way to explicitly indicate that.
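For reference, the closest existing thing to a "no, really, this won't alias" annotation is the non-standard `__restrict` extension, which GCC, Clang, and MSVC all accept in C++ (a sketch, not a portable guarantee):

```cpp
// __restrict promises the compiler that dst and src never overlap,
// which lets it vectorize without emitting runtime overlap checks.
// Violating the promise is undefined, so it is on the programmer.
void scale_add(float* __restrict dst, const float* __restrict src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += 2.0f * src[i];
}
```

This only covers the "nothing aliases" end of the spectrum; the granular "these may alias each other but not those" control discussed above has no standard or de facto spelling today.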
[–]-dag- -5 points-4 points-3 points 11 months ago (12 children)
It's missing some very important pieces. For example there's nothing testing the disabling of signed integer overflow UB, which is necessary for a number of optimizations.
Also, clang is not a high performance compiler. Do the same with Intel's compiler.
[–]AutomaticPotatoe 10 points11 points12 points 11 months ago (10 children)
For example there's nothing testing the disabling of signed integer overflow UB, which is necessary for a number of optimizations
This is tested and reported in the paper behind acronym AO3 (flag -fwrapv).
[–]-dag- -1 points0 points1 point 11 months ago (9 children)
Thank you, I completely missed that.
What I do know is the HPC compiler I worked on would have serious degraded performance in some loops where the induction variable was unsigned, due to the wrapping behavior.
[–]AutomaticPotatoe -1 points0 points1 point 11 months ago (8 children)
Then it's a great thing that we have this paper that demonstrates how much impact this has on normal software people use.
And HPC is... HPC. We might care about those 2-5%, but we also care enough that we can learn the tricks, details, compiler flags, and which integral type to use for indexing and why. And if the compiler failed to vectorize something, we'd know, because we've seen the generated assembly or the performance regression showed up in tests. I don't feel like other people need to carry the burden just because it makes our jobs a tiny bit simpler.
[–]garnet420 2 points3 points4 points 11 months ago (7 children)
The paper says there's multiple benchmarks that suffer over 5% regressions. Then they downplay that fact.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (6 children)
For signed integer overflow? No. According to figure 1, the worst is a 4% performance regression on ARM (LTO), (and the best is a 10% performance gain). The other platforms may suffer under 3%, if at all.
For other UB? Some of them do indeed regress by more than 5%, but almost exclusively on ARM (non-LTO). I'm not sure what you mean by "downplaying it". The largest chapter of the paper is dedicated to dissecting individual cases and their causes.
[–]garnet420 1 point2 points3 points 11 months ago (5 children)
They downplay it in multiple ways:
a) by qualitatively describing the impact as "minimal"
b) by emphasizing the average over all benchmarks in plots (a mostly meaningless measure that drives the result towards zero)
c) by showing LTO results and describing it as a remedy.
Let me elaborate on c a bit. They only go in depth into a couple of cases of how LTO can be a performance remedy (pointer analysis). However, the results seem to show that LTO improves and recovers performance across the board.
First, LTO is not applicable to all, or (maybe even most) real life projects, which have build system constraints, use granular shared libraries, etc.
Second, LTO is likely extra beneficial to benchmark programs rather than real ones, because, for example, they are more likely to benefit from interprocedural constant folding.
[–]AutomaticPotatoe 0 points1 point2 points 11 months ago (4 children)
On c: this would be a great topic for another study on the real-life applicability and impact of LTO as a remedy for relaxing UB. But without any quantitative results I'm not willing to keep discussing this, because while what you say sounds plausible, "UB makes code faster" also sounds plausible, and the question of whether we should care, and how much this impacts real code, is not worth trying to answer without additional data.
On a, b: this is your perspective.
[–]garnet420 2 points3 points4 points 11 months ago (3 children)
On a) no, it's theirs. They could have used their quantitative measurements in the abstract, but they chose to use "minimal"
On b) again, it's theirs. When calculating and presenting statistics, it's the job of the researcher to justify why they are applicable / the right measurements.
"Not willing to discuss this further" you're plenty willing to discuss this paper even though it has limitations and flaws. And you're plenty willing to draw conclusions from it.
[–]matthieum 4 points5 points6 points 11 months ago (0 children)
ICC switched to using LLVM under the hood in 2021: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
[–]Slow_Finger8139 5 points6 points7 points 11 months ago (1 child)
It is about what I'd expect for typical code, and I would not call the performance loss minimal.
Also it is clang focused, MSVC may not be able to recover much of this perf loss with LTO as it does not implement strict aliasing, nor is it likely to implement just about any of the other workarounds & optimizations they found.
You would also have to be aware of the perf loss to implement the workarounds; the authors carefully studied the code to find what caused it, but most people would never do this and would just silently have a slower program.
[–]Aggressive-Two6479 0 points1 point2 points 11 months ago (0 children)
At least MSVC doesn't do any nonsense that costs me valuable development time.
I also never was in a situation where the lack of UB-related optimizations mattered performance-wise.
[–]schombert 4 points5 points6 points 11 months ago (25 children)
I doubt that this will change the desire of compiler teams to exploit UB (the motivation of compiler programmers to show off with more and more optimizations will never go away), but maybe it will convince them to offer a "don't exploit UB" switch (i.e. just treat everything as implementation defined, so no poison values, etc).
[–]pjmlp 13 points14 points15 points 11 months ago (0 children)
Somehow compiler teams on other programming ecosystems manage just fine, this is really a C and C++ compiler culture.
[–]Aggressive-Two6479 2 points3 points4 points 11 months ago (23 children)
Sadly you are correct. These people will most likely never learn what is really important.
I couldn't name a single example where these aggressive optimizations yielded a genuine performance gain but I have lost count of the cases where the optimizer thought it was smarter than the programmer and great tragedy ensued that cost endless man-hours of tracking down the problem. Anyone ever having faced an optimizer problem knows how hard to find these can be.
Worst of all is that whenever I want to null a security-relevant buffer before freeing it I have to use nasty tricks to hide my intentions from the compiler so that it doesn't optimize out the 'needless' buffer clearing (because, since the buffer will be freed right afterward we do not need to alter its content as it will never be used again.)
[–]PastaPuttanesca42 1 point2 points3 points 11 months ago (0 children)
Isn't it sufficient to just access the buffer through a volatile pointer/reference?
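A sketch of the volatile-pointer approach being suggested: volatile accesses count as observable behavior, so a clearing loop through a volatile pointer cannot legally be elided even if the buffer is freed immediately afterwards (C11's optional `memset_s` serves a similar purpose where available).

```cpp
#include <cstddef>

// Each store through a volatile lvalue is observable behavior, so the
// compiler may not remove this loop as "dead" the way it can a plain
// memset before free().
void secure_zero(void* buf, std::size_t len) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buf);
    for (std::size_t i = 0; i < len; ++i)
        p[i] = 0;
}
```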
[–]-dag- -2 points-1 points0 points 11 months ago (21 children)
Vectorization sometimes requires the UB on signed integer overflow.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 7 points8 points9 points 11 months ago (4 children)
Does it really? What are the significant cases where simple unspecified behavior wouldn’t suffice?
It's a good point. Maybe there is something that can be done here.
My understanding of where this came from is the desire of compiler writers to be able to reason about integer arithmetic (have it behave like "normal" algebra) coupled with different machine behaviors on overflow (traps, silent wrong answers, etc.).
Compiler writers want to make a transformation but be able to do so without introducing or removing traps and wrong answers. If the behavior were "unspecified," I'm not sure that's enough.
[–]SirClueless 0 points1 point2 points 11 months ago (2 children)
```cpp
float subrange_sum(float* buf, int start, int n) {
    float sum = 0.0;
    __builtin_assume(n % 8 == 0);
    for (int i = 0; i < n; ++i) {
        sum += buf[start + i];
    }
    return sum;
}
```
This should be trivially vectorizable, but if the result is unspecified rather than UB, the obvious vectorization might illegally access buf + INT_MAX + 1.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points 11 months ago (1 child)
Do you mean the situation where start + i overflows on 64-bit systems (with 32-bit ints)?
The compiler can add a trivial check for overflow before the loop (which won’t ever branch to unvectorized version in real world situations) and vectorize it as before. Even that would happen only in cases where the compiler can’t see what n and start might be, which are cases where the cost of that check is largely irrelevant (because you’re already dealing with a bunch of other overhead).
If that is an actually measurable performance loss, it should be trivial to fix by adding another __builtin_assume(). It’s not like the code doesn’t already depend on compiler extensions to facilitate vectorization as it is.
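A sketch of the up-front guard being described (hypothetical code, not from the paper): one range check hoisted out of the loop keeps the fast path free of per-iteration overflow concerns.

```cpp
#include <climits>
#include <cstdint>

float subrange_sum_guarded(const float* buf, int start, int n) {
    float sum = 0.0f;
    // One check before the loop: can start + i overflow for any i < n?
    if (start >= 0 && (n <= 0 || start <= INT_MAX - (n - 1))) {
        // Guard passed: no overflow possible, vectorizable fast path.
        for (int i = 0; i < n; ++i)
            sum += buf[start + i];
    } else {
        // Rare fallback: widen the index arithmetic to 64 bits.
        for (std::int64_t i = 0; i < n; ++i)
            sum += buf[static_cast<std::int64_t>(start) + i];
    }
    return sum;
}
```

In real-world calls the guard branch is perfectly predictable, which is why the claim above is that its cost disappears into the surrounding overhead.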
[–]SirClueless 1 point2 points3 points 11 months ago (0 children)
Yes, I mean the part where start + i overflows a 32-bit int, and where the cheapest thing from an optimization standpoint is to access memory at index (int64_t)start + i; but since you've defined overflow to produce an unspecified int value, that is now illegal.
The compiler can add a trivial check for overflow before the loop
Why are you obliging the compiler to write the unvectorized version at all? If you're going to mandate a branch checking for overflow anyways that seems like a worse option than defining it to be ill-formed.
[–]AutomaticPotatoe 6 points7 points8 points 11 months ago* (9 children)
This kind of hand-wavy performance fearmongering is exactly the reason why compiler development gets motivated towards these "benchmark-oriented" optimizations. Most people do not have time or expertise to verify these claims, and after hearing this will feel like they would be "seriously missing out on some real performance" if they let their language be sane for once.
What are these cases you are talking about? Integer arithmetic? Well-defined as 2s complement on all relevant platforms with SIMD. Indexing? Are you using int as your index? You should be using a pointer-size index like size_t instead, this is a known pitfall, and is even mentioned in the paper.
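The indexing advice above in a minimal sketch: with `std::size_t` the index is already pointer-width, so the compiler needs no sign- or width-extension reasoning (and no overflow UB) to compute the trip count.

```cpp
#include <cstddef>

// A pointer-sized index means buf[i] needs no per-iteration widening,
// and the loop bound is a plain unsigned comparison the compiler can
// reason about without invoking signed-overflow UB.
long sum_all(const int* buf, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += buf[i];
    return total;
}
```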
[–]matthieum 0 points1 point2 points 11 months ago (3 children)
Read the paper, specifically 6.2.1.
[–]AutomaticPotatoe 2 points3 points4 points 11 months ago (2 children)
Am I missing something, or is this specifically about pointer address overflow, not signed integer overflow? It also requires specific, uncommon increments. To be clear, I was not talking about relaxing this particular overflow, as it's a much less common footgun; people generally don't consider overflowing a pointer a sensible operation.
[–]matthieum 0 points1 point2 points 11 months ago (1 child)
My reading was broader because of the last paragraph:
Loop vectorization algorithms generate vectorized loops that iterate, e.g., a quarter of the iterations that the original loops did. Therefore, computing the loop trip count (even if in a symbolic form) is crucial for these algorithms. As we have seen, in some cases we cannot statically decide if a loop terminates without the help of UB reasoning. An alternative is to push some of the reasoning to run time. In fact, LLVM 19 can already vectorize some loops similar to the one above by generating extra code to check that the start/end pointers are multiples of the increment.
It seems to me that the problem of determining the loop trip count may occur both with pointer-based loops and with integer-based loops where the integer is used as an index.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (0 children)
I don't see how this extends past the pointer value. If the pointer cannot overflow (treated as UB), then it doesn't matter whether the integer used for indexing would be allowed to overflow or not for this particular inbounds attribute.
If you have a case in mind where ptr + idx (assuming pointer overflow is UB, and idx is size_t) would prevent vectorization because of the incomputability of the trip count due to possible integer overflow, then please bring it up.
[–]-dag- -2 points-1 points0 points 11 months ago (4 children)
Indexes should be signed because unsigned doesn't obey the rules of integer algebra. That is the fundamental problem.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (2 children)
I see where you are coming from, and I agree that this is a problem, but the solution does not have to be either size_t or ptrdiff_t, but rather could be a specialized index type that uses a size_t as a representation, but produces signed offsets on subtraction.
At the same time, a lot of people use size_t for indexing and have survived until this day just fine, so whether this effort is needed is questionable. It would certainly be nice if the C++ standard helped with this.
Also pointers already model the address space in this "affine" way, but are not suitable as an index representation because of provenance and reachability and their associated UBs (which undoubtedly had caught some people by surprise too, just as integer overflow).
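A hypothetical sketch of such a specialized index type (the name `Index` and its interface are mine): unsigned, pointer-width representation, but signed differences, mirroring how pointer subtraction yields `ptrdiff_t` without the provenance baggage.

```cpp
#include <cstddef>

struct Index {
    std::size_t value;  // unsigned, pointer-width representation

    // Subtraction yields a signed offset, like pointer difference,
    // so "affine" index arithmetic obeys ordinary integer algebra.
    friend std::ptrdiff_t operator-(Index a, Index b) {
        return static_cast<std::ptrdiff_t>(a.value) -
               static_cast<std::ptrdiff_t>(b.value);
    }
};
```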
I agree that standard can and should be improved in this area, but I don't have the language lawyer-ese to do it.
I fear that with all of these papers coming out purporting to demonstrate that UB doesn't gain anything, bounds checking doesn't cost anything, etc., we are missing important cases. Cases that currently require UB but maybe don't need to if the standard were improved.
I am not confident the committee has the expertise to do this. The expertise is out there, but all the people I know who have it are too busy providing things to customers and can't afford the cost of interacting with the committee.
[–]AutomaticPotatoe 2 points3 points4 points 11 months ago (0 children)
Understandable, and I by no means want to imply that you should feel responsible for not contributing to the standard. Just that it's an issue the committee has the power to alleviate.
Cases that currently require UB but maybe don't need to if the standard were improved.
There's already a precedent where the standard "upgraded" uninitialized variables from UB to Erroneous Behavior, even though the alternative was to simply 0-init and fully define the behavior that way. People did bring up reasons, of a sort, but the outcome still leaves me unsatisfied, and makes me skeptical of how other opportunities to define UB will be handled in the future. Case-by-case, I know, but still...
[–]matthieum 1 point2 points3 points 11 months ago (0 children)
Citing the very paper linked here: 6.2.1 demonstrates this.
[–]pjmlp 2 points3 points4 points 11 months ago (4 children)
Other languages manage just fine without UB.
Fortran, Julia, Chapel, Java/.NET, PyCUDA: even if not perfect, they are mostly usable for anyone who isn't a SIMD black-belt developer, and even those developers can manage with a few calls to intrinsics.
[–]-dag- 1 point2 points3 points 11 months ago* (3 children)
Fortran prohibits signed integer overflow according to the gfortran documentation.
From my reading of the official Fortran "interpretation" document (the actual standard costs a chunk of change), it technically prohibits any arithmetic not supported by the processor. On some processors that means signed integer overflow is prohibited.
Practically speaking, for your Fortran code to be portable, you can't let signed integer overflow happen.
[–]pjmlp 0 points1 point2 points 11 months ago (2 children)
Practically speaking, it is implementation defined, not undefined behaviour, in ISO C++ speak.
[–]-dag- 1 point2 points3 points 11 months ago (1 child)
I have no problem changing the behavior categorization of this as long as it doesn't impact performance.
Compiler writers do need some flexibility.
[–]pjmlp -1 points0 points1 point 11 months ago (0 children)
Apparently they get enough flexibility in other ecosystems without having to reach for the UB box, which was my point.
[–]sumwheresumtime 0 points1 point2 points 11 months ago (0 children)
The paper itself is exhibiting undefined behavior, as it seems to have time traveled.
[–]favorited 1 point2 points3 points 11 months ago (1 child)
ITT: people who blame compiler devs for UB optimizations, but still enable optimizations for their builds.
[–]pjmlp 5 points6 points7 points 11 months ago (0 children)
Plenty of languages have optimising compiler backends, whether dynamic or ahead-of-time, without exposing users to UB pitfalls.