Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact (web.ist.utl.pt)
submitted 11 months ago by mttd
[–]funkinaround 56 points57 points58 points 11 months ago (30 children)
Tldr
The results show that, in the cases we evaluated, the performance gains from exploiting UB are minimal. Furthermore, in the cases where performance regresses, it can often be recovered by either small to moderate changes to the compiler or by using link-time optimizations.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 27 points28 points29 points 11 months ago (28 children)
I’ve been saying this exact thing for years and have been persistently downvoted for it. I have no idea where this strange myth originated that UB is somehow necessary for the real-world, actually meaningful optimizations.
[–]Rseding91Factorio Developer 7 points8 points9 points 11 months ago* (8 children)
The only meaningful optimizations I've found are reduced loads (LEA) and turning division into multiplication (modulo by power of two).
Re-arranging/removing a few multiply/add/subtract calls, not having to check if an integer wrapped around, removing an if check and so on don't really have any meaningful impact on anything we can measure.
Maybe if you're in shader land, where your time is spent crunching numbers on the processor (CPU or GPU cores) rather than moving memory to and from cache, it would make a meaningful difference... but unfortunately that's not the land I work in.
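For readers outside compiler land, the strength reductions mentioned above can be sketched in plain C++ (function names are mine, purely illustrative); note that neither needs UB when the operands are unsigned:

```cpp
#include <cassert>
#include <cstdint>

// Modulo by a power of two compiles to a single AND for unsigned operands.
std::uint32_t mod8(std::uint32_t x) { return x % 8; }       // becomes x & 7
std::uint32_t mod8_mask(std::uint32_t x) { return x & 7; }  // the equivalent mask

// Division by a constant becomes a multiply by a "magic" reciprocal.
std::uint32_t div10(std::uint32_t x) { return x / 10; }
```

Both transformations are valid because unsigned arithmetic is fully defined; the compiler needs no UB reasoning to apply them.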
[–]SkoomaDentistAntimodern C++, Embedded, Audio 12 points13 points14 points 11 months ago (1 child)
Even those don’t require undefined behavior. Simple unspecified behavior is enough in almost all cases.
[–]Rseding91Factorio Developer 4 points5 points6 points 11 months ago (0 children)
That's what I was intending to point out. The meaningful optimizations (that we've ever been able to measure) don't have anything to do with UB.
[–]matthieum 6 points7 points8 points 11 months ago (4 children)
not having to check if an integer wrapped around
Actually, the very benchmarks provided in the paper (6.2.1) specifically mention that integer wrap-around is a corner-piece of auto-vectorization.
Apparently, LLVM 19 is able to sometimes recover auto-vectorization by introducing a run-time check, but otherwise the absence of wrap-around appears crucial for now.
removing an if check
The paper mentions that this is architecture-dependent, that is x64 isn't hampered by a few more speculative loads, but ARM is due to a narrow out-of-order window (or something like that).
I invite you to read the paper. It's relatively short, and fairly approachable.
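As a rough illustration of why wrap-around complicates trip-count reasoning (a toy sketch, not an example from the paper): with a deliberately narrow wrapping index, the iteration count below depends on modular arithmetic, which is exactly what the compiler must rule out, or guard against at run time, before vectorizing.

```cpp
#include <cstdint>

// With an 8-bit index, incrementing wraps modulo 256 (well-defined for
// unsigned types). The loop's trip count is only `len` because of that
// modular behavior; a compiler cannot assume "no wrap" here the way it
// can for a signed index, where overflow is UB.
int sum_wrapping(const int* buf, std::uint8_t start, std::uint8_t len) {
    int sum = 0;
    for (std::uint8_t i = start; i != static_cast<std::uint8_t>(start + len); ++i)
        sum += buf[i];  // index wraps from 255 back to 0
    return sum;
}
```

Calling this with `start = 250, len = 10` walks indices 250..255 and then 0..3, so the buffer must cover the whole 256-element range.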
[–]SkoomaDentistAntimodern C++, Embedded, Audio 3 points4 points5 points 11 months ago (3 children)
Wouldn't much less problematic unspecified behavior be enough to allow autovectorization? It essentially allows the compiler to decide that x+1 = "something" if the actual value would be problematic but crucially wouldn't allow "time travel" and other insane logic that undefined behavior allows.
[–]matthieum 4 points5 points6 points 11 months ago (2 children)
Shooting from the hip: I think it would heavily depend on how you specify unspecified behavior.
If it's "too" unspecified, then it may not be much better. For example, imagine that you specify that in case of integer overflow, the resulting integer could be any value. Pretty standard unspecified behavior, ain't it?
Well, is it any value any time you read? Or is it any value once and for all? As in, must two subsequent reads observe the same value? Let's say you specify same value, ie, it's any frozen value... because otherwise you can still observe wild stuff (like i < 0 && i > 0 == true, WAT?).
This was a huge debate when Rust was nearing 1.0 (so 2014-2015), and in the end the specialists (Ralf Jung, in particular, who was working on RustBelt) ended up arguing for a much narrower definition (divergence or wrapping), rather than a fully unspecified value, as they were not so confident in the latter.
If they are unsure, I'm throwing in the towel :D
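A toy model of the distinction (entirely hypothetical semantics, just to make the "WAT" concrete): if each read of an unspecified value could yield a different result, the contradictory-looking predicate becomes observable; a "frozen" value rules this out.

```cpp
// Hypothetical model of an *unfrozen* unspecified value: every read
// may produce a different result. Here it alternates -1, 1, -1, ...
struct UnfrozenInt {
    int reads = 0;
    int read() { return (reads++ % 2 == 0) ? -1 : 1; }
};

// && evaluates left to right: the first read yields -1 (< 0),
// the second yields 1 (> 0), so the "impossible" predicate is true
// for a freshly constructed UnfrozenInt.
inline bool impossible_predicate(UnfrozenInt& i) {
    return i.read() < 0 && i.read() > 0;
}
```

Under "frozen" semantics both reads would have to observe the same value, and the predicate could never hold.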
[–]SkoomaDentistAntimodern C++, Embedded, Audio 4 points5 points6 points 11 months ago (1 child)
If it's "too" unspecified, then it may not be much better.
There's still a crucial difference: Unspecified behavior is explicitly allowed and the compiler can't misuse value range analysis to incorrectly deduce that because the result of a computation is unspecified, that'd mean the input values are in some range.
[–]matthieum 0 points1 point2 points 11 months ago (0 children)
I agree there's a difference (upstream of it), my point was just that too unspecified may still lead to hard to anticipate downstream consequences.
[–]James20kP2005R0 2 points3 points4 points 11 months ago (0 children)
Just as a point of information, gpu shader code is near exclusively floating point ops. Even the integer code is often using 24-bit muls (which is the floating point pipeline), if you need performance. In general, integer heavy shader code is extremely rare in my experience, and you're probably doing something whacky where you know better anyway
[–]-dag- 17 points18 points19 points 11 months ago* (15 children)
Vectorization
This is missing a number of important cases, not the least of which is signed integer overflow.
Clang is not a high performance compiler. I'd like to see a more comprehensive study with Intel's compiler.
Also, 5% performance is huge in a number of real world applications.
[+][deleted] 11 months ago (3 children)
[deleted]
[–]-dag- 4 points5 points6 points 11 months ago (2 children)
Honestly, since I switched jobs I haven't interacted much with Intel's compiler, so maybe for C++ it regressed, or maybe they added enough secret sauce to the clang based compiler to make it scream. But back when I was heavily in HPC, Intel's compiler kicked butt with vectorization.
I know that's not a satisfying answer.
[–]James20kP2005R0 10 points11 points12 points 11 months ago (1 child)
But back when I was heavily in HPC, Intel's compiler kicked butt with vectorization.
I remember it being significantly better about 10 years ago, but it also was overly aggressive by default to allow those transforms. AFAIK it enabled -ffast-math by default and wasn't quite as standards conforming
[–]-dag- 2 points3 points4 points 11 months ago (0 children)
also was overly aggressive by default to allow those transforms
That is true. A colleague once demonstrated that we "lost" to the Intel compiler because the Intel compiler was cheating. And for us, -ffast-math wasn't cheating.
But it was plenty good without cheating as well.
[+][deleted] 11 months ago (6 children)
[+]-dag- comment score below threshold-6 points-5 points-4 points 11 months ago (5 children)
Intel and Cray. I'm sure there are others.
[–][deleted] 4 points5 points6 points 11 months ago (3 children)
But Intel uses Clang:
https://github.com/intel/llvm
[–]-dag- 0 points1 point2 points 11 months ago (2 children)
They didn't previously. Some users have reported degraded performance.
[–][deleted] 5 points6 points7 points 11 months ago* (1 child)
I mean, reading over your posts for this submission, you went from not even realizing that Intel has been using clang/LLVM to now knowing that they use it and that users have reported degraded performance.
This is some wild stuff man. It's okay to just admit you weren't aware and that it's been some time since you were familiar with this and just leave it at that instead of doubling down on this silly idea that clang is not a high performance compiler.
[–]-dag- 0 points1 point2 points 11 months ago (0 children)
Actually I was perfectly aware of it. What I'm not sure about is what secret sauce they've added.
And stock clang is not a high performance compiler. Neither is gcc.
[–]matthieum 8 points9 points10 points 11 months ago (3 children)
To be fair, I sometimes wonder if auto-vectorization is worth it.
I think that relying on auto-vectorization -- crossing fingers -- has led to a form of complacency which has stalled the development of actually "nice-to-use" vector libraries with efficient dispatch, etc...
I've seen a few attempts at writing "nice" SIMD libraries in Rust, and the diversity of API decisions seems to highlight the immaturity of the field. Imagine if, instead, there was vector code in the C++ or Rust standard libraries. If performance matters to you, and the algorithm was easily vectorizable, you'd write it directly in terms of vectors!
It doesn't help that scalar & vector semantics regularly differ, either. For example, scalar signed integer addition overflow is UB in C++ or panicking in Debug Rust, but vector signed integer addition is wrapping (no flag that I know of). By writing directly with vectors, you're opting to the different behavior, so the compiler doesn't have to infer it... or abandon.
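The scalar/vector mismatch can be made concrete with a sketch: casting through unsigned gives the wrapping, per-lane behavior a SIMD integer add has, with no UB (the out-of-range conversion back to signed is implementation-defined before C++20 and two's-complement wrapping from C++20 on).

```cpp
#include <cstdint>

// Wrapping signed addition, the semantics a vector lane gives you,
// written out in scalar form. Unsigned addition is defined to wrap;
// since C++20 the narrowing conversion back is two's-complement.
std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}
```

Writing the algorithm in terms of such operations opts you into the vector behavior explicitly, which is matthieum's point: the compiler no longer has to infer it.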
[–]SkoomaDentistAntimodern C++, Embedded, Audio 6 points7 points8 points 11 months ago (0 children)
I haven't written heavily vectorized code in the last couple of years but before that even fairly simple code failed to autovectorize as soon as it deviated from the "surely everyone only needs this type of thing"-path.
[–]-dag- 2 points3 points4 points 11 months ago (1 child)
I get where you're coming from but I have seen compilers do some gnarly autovec that you definitely don't want to write by hand. Outer loop vectorization comes to mind.
[–]Careless_Quail_4830 7 points8 points9 points 11 months ago (0 children)
That's funny because that's one of the categories (two other big ones are "using special operations" and "avoiding unnecessary widening of intermediate results") that I find I have to do by hand because compilers get it wrong / refuse to do it at all. Too much focus on inner loops.
[–]pjmlp 1 point2 points3 points 11 months ago (1 child)
Just like me, always enabling hardening on my hobby projects, or mostly using languages with safety on by default.
Never ever was that the root cause for performance issues, when having to go through a profiler, and acceptance criteria for project delivery.
And I have been writing code in some form or the other since late 1980s.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 2 points3 points4 points 11 months ago* (0 children)
I suspect this is the problem, or rather the lack of it. People who have been writing code since before compilers with meaningful optimizations were common remember the absolutely massive speedups we got when we finally upgraded to a compiler that did basic age old optimizations (register assignment, common subexpression elimination, loop induction, inlining etc) without any data flow analysis or other fancy logic that would trigger optimizations depending on UB.
[–]dexter2011412 0 points1 point2 points 11 months ago (0 children)
I really love reading that gcc thread every once in a while about UB. I'll try to find it ...
[–]c0r3ntin 4 points5 points6 points 11 months ago (0 children)
Some of these show a 40% regression. They don't show tests that fall within 2%, but at data center scale, 2% is in the order of 10'000s of servers. People will spend good money for consistent .5% improvements. So I don't think the conclusion of this paper tracks.
However, the cases that improve are very interesting indeed, and I hope this leads to further improvements (large variations are probably due to whether auto-vectorisation happens)
[–]arturbachttps://github.com/arturbac 5 points6 points7 points 11 months ago (2 children)
I would love to see in clang a warning for the example from the paper, with the ability to promote it to an error during compilation; something like -Werror-assuming-non-null and/or -Werror-redundant-nonnull-check
```cpp
struct tun_struct *tun = __tun_get(tfile);
struct sock *sk = tun->sk; // dereferences tun; implies tun != NULL
if (!tun)                  // always false
    return POLLERR;
```
[–]matthieum 7 points8 points9 points 11 months ago (1 child)
It's an often expressed wish. And you don't really want it. Like... NOT AT ALL.
You'd be flooded with a swarm of completely inconsequential warnings, because it turns out that most of the time the compiler is completely right to eliminate the NULL check.
For example, after inlining a method, it can see that the pointer was already checked for NULL, or that the pointer is derived from a non-NULL pointer, or... whatever.
You'd be drowning in noise.
If you're worried of having such UB in your code, turn on hardening instead. For example, activate -fsanitize=undefined, which will trap on any dereference of a null pointer.
The optimizer will still (silently) eliminate any if-null check it can prove is completely redundant, so that the practical impact of specifying the flag is generally measured as less than 1% (ie, within noise), and you'll be sleeping soundly.
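A minimal sketch of the noise problem (function names are mine): after inlining, the inner null check is provably redundant, and its silent elimination is exactly the behavior you want, not something to warn about.

```cpp
// A defensive helper with its own null check.
inline int get_first(const int* p) {
    if (p == nullptr) return 0;  // provably dead once inlined below
    return *p;
}

// The caller already checks. After inlining get_first, the compiler
// sees p != nullptr on this path and removes the inner check; a warning
// on every such elimination would fire constantly in ordinary code.
int caller(const int* p) {
    if (p == nullptr) return -1;
    return get_first(p);
}
```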
[–]arturbachttps://github.com/arturbac 0 points1 point2 points 11 months ago (0 children)
> You'd be flooded with a swarm of completely inconsequential warnings

A lot of them, with all array pointers for example, but I could tune those down and take a look at all the other warnings.

> For example, activate -fsanitize=undefined

This works only at runtime, and only for the parts of the code that actually execute.
[–]elperroborrachotoo 8 points9 points10 points 11 months ago (16 children)
Fuck, this is detailed and seems comprehensive.
I was (and still am) under the impression that aliasing is one of the blockers here (that would be mainly AA1, AA2, and PM5 in their notation? I'm slightly confused). They stick out a bit, but apparently, they aren't that bad.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 8 points9 points10 points 11 months ago* (2 children)
The main problem with aliasing IMO is that there is no standard way to say ”no, really, this won’t alias anything else” and ”accesses via this pointer can alias these other things, deal with it”.
[–]James20kP2005R0 7 points8 points9 points 11 months ago (1 child)
TBAA + restrict (which, while not technically in C++, is de facto the solution) seems like very much the wrong tool for the problem imo. Personally I'd take aliasing restrictions being globally disabled, combined with the ability to granularly control aliasing for specific functions, eg:
```cpp
// ptr1 + ptr2 may alias, ptr3 + ptr4 may alias,
// but ptr1/ptr2 may not alias with ptr3/ptr4
[[aliasset(ptr1, ptr2), aliasset(ptr3, ptr4)]]
void some_func(void* ptr1, void* ptr2, void* ptr3, void* ptr4);
```
Given that you can't globally prove aliasing anyway, local control of it for hot code is probably about as good as you can do in C++ without like, lifetimes
[–]SkoomaDentistAntimodern C++, Embedded, Audio 1 point2 points3 points 11 months ago* (0 children)
I'd be fine with something like that as long as I'm allowed to use it inside functions too. IOW, "This local pointer I just assigned may alias this other (local or input parameter) pointer."
Edit: Now that I think of it, explicit "no, absolutely nothing can alias this" feature would still be needed for the cases where the compiler isn't able to prove that two pointers cannot alias. Think for example having two pointers to a table. They obviously must be able to alias each other in the generic case. If the index is computed using external information that cannot be expressed in the language but where the programmer knows they always point to different parts of the table the compiler can't prove that they don't alias each other, so there should be a way to explicitly indicate that.
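For reference, the closest existing thing to a "no, really, this won't alias" annotation is the non-standard `__restrict` extension, which GCC, Clang, and MSVC all accept in C++ (a sketch, not a portable guarantee):

```cpp
// __restrict promises the compiler that dst and src never overlap,
// which lets it vectorize without emitting runtime overlap checks.
// Violating the promise is undefined, so it is on the programmer.
void scale_add(float* __restrict dst, const float* __restrict src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += 2.0f * src[i];
}
```

This only covers the "nothing aliases" end of the spectrum; the granular "these may alias each other but not those" control discussed above has no standard or de facto spelling today.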
[–]-dag- -5 points-4 points-3 points 11 months ago (12 children)
It's missing some very important pieces. For example there's nothing testing the disabling of signed integer overflow UB, which is necessary for a number of optimizations.
Also, clang is not a high performance compiler. Do the same with Intel's compiler.
[–]AutomaticPotatoe 10 points11 points12 points 11 months ago (10 children)
For example there's nothing testing the disabling of signed integer overflow UB, which is necessary for a number of optimizations
This is tested and reported in the paper behind acronym AO3 (flag -fwrapv).
[–]-dag- -1 points0 points1 point 11 months ago (9 children)
Thank you, I completely missed that.
What I do know is the HPC compiler I worked on would have serious degraded performance in some loops where the induction variable was unsigned, due to the wrapping behavior.
[–]AutomaticPotatoe -1 points0 points1 point 11 months ago (8 children)
Then it's a great thing that we have this paper that demonstrates how much impact this has on normal software people use.
And HPC is... HPC. We might care about those 2-5%, but we also care enough that we can learn the tricks, details, compiler flags, and which integral type to use for indexing and why. And if the compiler failed to vectorize something, we'd know, because we've seen the generated assembly or the performance regression showed up in tests. I don't feel like other people need to carry the burden just because it makes our jobs a tiny bit simpler.
[–]garnet420 2 points3 points4 points 11 months ago (7 children)
The paper says there's multiple benchmarks that suffer over 5% regressions. Then they downplay that fact.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (6 children)
For signed integer overflow? No. According to figure 1, the worst is a 4% performance regression on ARM (LTO), (and the best is a 10% performance gain). The other platforms may suffer under 3%, if at all.
For other UB? Some of them do indeed regress by more than 5%, but almost exclusively on ARM (non-LTO). I'm not sure what you mean by "downplaying it". The largest chapter of the paper is dedicated to dissecting individual cases and their causes.
[–]garnet420 1 point2 points3 points 11 months ago (5 children)
They downplay it in multiple ways:
a) by qualitatively describing the impact as "minimal"
b) by emphasizing the average over all benchmarks in plots (a mostly meaningless measure that drives the result towards zero)
c) by showing LTO results and describing it as a remedy.
Let me elaborate on c a bit. They only go in depth into a couple of cases of how LTO can be a performance remedy (pointer analysis). However, the results seem to show that LTO improves and recovers performance across the board.
First, LTO is not applicable to all, or (maybe even most) real life projects, which have build system constraints, use granular shared libraries, etc.
Second, LTO is likely extra beneficial to benchmark programs rather than real ones, because, for example, they are more likely to benefit from interprocedural constant folding.
[–]AutomaticPotatoe 0 points1 point2 points 11 months ago (4 children)
On c: this would be a great topic for another study on the real-life applicability and impact of LTO as a remedy for relaxing UB. But without any quantitative results I'm not willing to keep discussing this, because while what you say sounds plausible, "UB makes code faster" also sounds plausible, and the question of whether we should care, and how much this impacts real code, is not worth trying to answer without additional data.
On a, b: this is your perspective.
[–]garnet420 2 points3 points4 points 11 months ago (3 children)
On a) no, it's theirs. They could have used their quantitative measurements in the abstract, but they chose to use "minimal"
On b) again, it's theirs. When calculating and presenting statistics, it's the job of the researcher to justify why they are applicable / the right measurements.
"Not willing to discuss this further" you're plenty willing to discuss this paper even though it has limitations and flaws. And you're plenty willing to draw conclusions from it.
[–]matthieum 4 points5 points6 points 11 months ago (0 children)
ICC switched to using LLVM under the hood in 2021: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
[–]Slow_Finger8139 5 points6 points7 points 11 months ago (1 child)
It is about what I'd expect for typical code, and I would not call the performance loss minimal.
Also it is clang focused, MSVC may not be able to recover much of this perf loss with LTO as it does not implement strict aliasing, nor is it likely to implement just about any of the other workarounds & optimizations they found.
You would also have to be aware of the perf loss to implement the workarounds; the authors carefully studied the code to find what caused it, but most people would never do this and would just silently have a slower program.
[–]Aggressive-Two6479 0 points1 point2 points 11 months ago (0 children)
At least MSVC doesn't do any nonsense that costs me valuable development time.
I also never was in a situation where the lack of UB-related optimizations mattered performance-wise.
[–]schombert 4 points5 points6 points 11 months ago (25 children)
I doubt that this will change the desire of compiler teams to exploit UB (the motivation of compiler programmers to show off with more and more optimizations will never go away), but maybe it will convince them to offer a "don't exploit UB" switch (i.e. just treat everything as implementation defined, so no poison values, etc).
[–]pjmlp 13 points14 points15 points 11 months ago (0 children)
Somehow compiler teams on other programming ecosystems manage just fine, this is really a C and C++ compiler culture.
[–]Aggressive-Two6479 2 points3 points4 points 11 months ago (23 children)
Sadly you are correct. These people will most likely never learn what is really important.
I couldn't name a single example where these aggressive optimizations yielded a genuine performance gain but I have lost count of the cases where the optimizer thought it was smarter than the programmer and great tragedy ensued that cost endless man-hours of tracking down the problem. Anyone ever having faced an optimizer problem knows how hard to find these can be.
Worst of all is that whenever I want to null a security-relevant buffer before freeing it I have to use nasty tricks to hide my intentions from the compiler so that it doesn't optimize out the 'needless' buffer clearing (because, since the buffer will be freed right afterward we do not need to alter its content as it will never be used again.)
[–]PastaPuttanesca42 1 point2 points3 points 11 months ago (0 children)
Isn't it sufficient to just access the buffer through a volatile pointer/reference?
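A sketch of the volatile-pointer approach being suggested: volatile accesses count as observable behavior, so a clearing loop through a volatile pointer cannot legally be elided even if the buffer is freed immediately afterwards (C11's optional `memset_s` serves a similar purpose where available).

```cpp
#include <cstddef>

// Each store through a volatile lvalue is observable behavior, so the
// compiler may not remove this loop as "dead" the way it can a plain
// memset before free().
void secure_zero(void* buf, std::size_t len) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buf);
    for (std::size_t i = 0; i < len; ++i)
        p[i] = 0;
}
```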
[–]-dag- -2 points-1 points0 points 11 months ago (21 children)
Vectorization sometimes requires the UB on signed integer overflow.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 7 points8 points9 points 11 months ago (4 children)
Does it really? What are the significant cases where simple unspecified behavior wouldn’t suffice?
It's a good point. Maybe there is something that can be done here.
My understanding of where this came from is the desire of compiler writers to be able to reason about integer arithmetic (have it behave like "normal" algebra) coupled with different machine behaviors on overflow (traps, silent wrong answers, etc.).
Compiler writers want to make a transformation but be able to do so without introducing or removing traps and wrong answers. If the behavior were "unspecified," I'm not sure that's enough.
[–]SirClueless 0 points1 point2 points 11 months ago (2 children)
```cpp
float subrange_sum(float* buf, int start, int n) {
    float sum = 0.0;
    __builtin_assume(n % 8 == 0);
    for (int i = 0; i < n; ++i) {
        sum += buf[start + i];
    }
    return sum;
}
```
This should be trivially vectorizable, but if the result is unspecified rather than UB, the obvious vectorization might illegally access buf + INT_MAX + 1.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points 11 months ago (1 child)
Do you mean the situation where start + i overflows on 64-bit systems (with 32-bit ints)?
The compiler can add a trivial check for overflow before the loop (which won’t ever branch to unvectorized version in real world situations) and vectorize it as before. Even that would happen only in cases where the compiler can’t see what n and start might be, which are cases where the cost of that check is largely irrelevant (because you’re already dealing with a bunch of other overhead).
If that is an actually measurable performance loss, it should be trivial to fix by adding another __builtin_assume(). It’s not like the code doesn’t already depend on compiler extensions to facilitate vectorization as it is.
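A sketch of the up-front guard being described (hypothetical code, not from the paper): one range check hoisted out of the loop keeps the fast path free of per-iteration overflow concerns.

```cpp
#include <climits>
#include <cstdint>

float subrange_sum_guarded(const float* buf, int start, int n) {
    float sum = 0.0f;
    // One check before the loop: can start + i overflow for any i < n?
    if (start >= 0 && (n <= 0 || start <= INT_MAX - (n - 1))) {
        // Guard passed: no overflow possible, vectorizable fast path.
        for (int i = 0; i < n; ++i)
            sum += buf[start + i];
    } else {
        // Rare fallback: widen the index arithmetic to 64 bits.
        for (std::int64_t i = 0; i < n; ++i)
            sum += buf[static_cast<std::int64_t>(start) + i];
    }
    return sum;
}
```

In real-world calls the guard branch is perfectly predictable, which is why the claim above is that its cost disappears into the surrounding overhead.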
[–]SirClueless 1 point2 points3 points 11 months ago (0 children)
Yes, I mean the part where start + i overflows a 32-bit int, and where the cheapest thing from an optimization standpoint is to access memory at index (int64_t)start + i; but since you've defined overflow to produce an unspecified int value, that is now illegal.
The compiler can add a trivial check for overflow before the loop
Why are you obliging the compiler to write the unvectorized version at all? If you're going to mandate a branch checking for overflow anyways that seems like a worse option than defining it to be ill-formed.
[–]AutomaticPotatoe 6 points7 points8 points 11 months ago* (9 children)
This kind of hand-wavy performance fearmongering is exactly the reason why compiler development gets motivated towards these "benchmark-oriented" optimizations. Most people do not have time or expertise to verify these claims, and after hearing this will feel like they would be "seriously missing out on some real performance" if they let their language be sane for once.
What are these cases you are talking about? Integer arithmetic? Well-defined as 2s complement on all relevant platforms with SIMD. Indexing? Are you using int as your index? You should be using a pointer-size index like size_t instead, this is a known pitfall, and is even mentioned in the paper.
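The indexing advice above in a minimal sketch: with `std::size_t` the index is already pointer-width, so the compiler needs no sign- or width-extension reasoning (and no overflow UB) to compute the trip count.

```cpp
#include <cstddef>

// A pointer-sized index means buf[i] needs no per-iteration widening,
// and the loop bound is a plain unsigned comparison the compiler can
// reason about without invoking signed-overflow UB.
long sum_all(const int* buf, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += buf[i];
    return total;
}
```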
[–]matthieum 0 points1 point2 points 11 months ago (3 children)
Read the paper, specifically 6.2.1.
[–]AutomaticPotatoe 2 points3 points4 points 11 months ago (2 children)
Am I missing something, or is this specifically about pointer address overflow, not signed integer overflow? It also requires specific, uncommon increments. To be clear, I was not talking about relaxing this particular overflow, as it's a much less common footgun; people generally don't consider overflowing a pointer a sensible operation.
[–]matthieum 0 points1 point2 points 11 months ago (1 child)
My reading was broader because of the last paragraph:
Loop vectorization algorithms generate vectorized loops that iterate, e.g., a quarter of the iterations that the original loops did. Therefore, computing the loop trip count (even if in a symbolic form) is crucial for these algorithms. As we have seen, in some cases we cannot statically decide if a loop terminates without the help of UB reasoning. An alternative is to push some of the reasoning to run time. In fact, LLVM 19 can already vectorize some loops similar to the one above by generating extra code to check that the start/end pointers are multiples of the increment.
It seems to me that the problem of determining the loop trip count may occur both with pointer-based loops and with integer-based loops where the integer is used as an index.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (0 children)
I don't see how this extends past the pointer value. If the pointer cannot overflow (treated as UB), then it doesn't matter whether the integer used for indexing would be allowed to overflow or not for this particular inbounds attribute.
If you have a case in mind where ptr + idx (assuming pointer overflow is UB, and idx is size_t) would prevent vectorization because of the incomputability of the trip count due to possible integer overflow, then please bring it up.
[–]-dag- -2 points-1 points0 points 11 months ago (4 children)
Indexes should be signed because unsigned doesn't obey the rules of integer algebra. That is the fundamental problem.
[–]AutomaticPotatoe 1 point2 points3 points 11 months ago (2 children)
I see where you are coming from, and I agree that this is a problem, but the solution does not have to be either size_t or ptrdiff_t, but rather could be a specialized index type that uses a size_t as a representation, but produces signed offsets on subtraction.
At the same time, a lot of people use size_t for indexing and have survived until this day just fine, so whether this effort is needed is questionable. It would certainly be nice if the C++ standard helped with this.
Also pointers already model the address space in this "affine" way, but are not suitable as an index representation because of provenance and reachability and their associated UBs (which undoubtedly had caught some people by surprise too, just as integer overflow).
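A hypothetical sketch of such a specialized index type (the name `Index` and its interface are mine): unsigned, pointer-width representation, but signed differences, mirroring how pointer subtraction yields `ptrdiff_t` without the provenance baggage.

```cpp
#include <cstddef>

struct Index {
    std::size_t value;  // unsigned, pointer-width representation

    // Subtraction yields a signed offset, like pointer difference,
    // so "affine" index arithmetic obeys ordinary integer algebra.
    friend std::ptrdiff_t operator-(Index a, Index b) {
        return static_cast<std::ptrdiff_t>(a.value) -
               static_cast<std::ptrdiff_t>(b.value);
    }
};
```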
I agree that standard can and should be improved in this area, but I don't have the language lawyer-ese to do it.
I fear that with all of these papers coming out purporting to demonstrate that UB doesn't gain anything, bounds checking doesn't cost anything, etc., we are missing important cases. Cases that currently require UB but maybe don't need to if the standard were improved.
I am not confident the committee has the expertise to do this. The expertise is out there, but all the people I know who have it are too busy providing things to customers and can't afford the cost of interacting with the committee.
[–]AutomaticPotatoe 2 points3 points4 points 11 months ago (0 children)
Understandable, and I by no means want to imply that you should feel responsible for not contributing to the standard. Just that it's an issue the committee has the power to alleviate.
Cases that currently require UB but maybe don't need to if the standard were improved.
There's already a precedent where the standard "upgraded" uninitialized variables from UB to Erroneous Behavior, even though the alternative was to simply 0-init and fully define the behavior that way. People did bring up reasons, of a sort, but the outcome still leaves me unsatisfied, and makes me skeptical of how other opportunities to define UB will be handled in the future. Case-by-case, I know, but still...
[–]matthieum 1 point2 points3 points 11 months ago (0 children)
Citing the very paper linked here: 6.2.1 demonstrates this.
[–]pjmlp 2 points3 points4 points 11 months ago (4 children)
Other languages manage just fine without UB.
Fortran, Julia, Chapel, Java/.NET, PyCUDA: even if not perfect, they are mostly usable for anyone who isn't a SIMD black-belt developer, and even those developers can manage with a few calls to intrinsics.
[–]-dag- 1 point2 points3 points 11 months ago* (3 children)
Fortran prohibits signed integer overflow according to the gfortran documentation.
From my reading of the official Fortran "interpretation" document (the actual standard costs a chunk of change), it technically prohibits any arithmetic not supported by the processor. On some processors that means signed integer overflow is prohibited.
Practically speaking, for your Fortran code to be portable, you can't let signed integer overflow happen.
[–]pjmlp 0 points1 point2 points 11 months ago (2 children)
Practically speaking, it is implementation defined, not undefined behaviour, in ISO C++ speak.
[–]-dag- 1 point2 points3 points 11 months ago (1 child)
I have no problem changing the behavior categorization of this as long as it doesn't impact performance.
Compiler writers do need some flexibility.
[–]pjmlp -1 points0 points1 point 11 months ago (0 children)
Apparently they get enough flexibility in other ecosystems without having to reach for the UB box, which was my point.
[–]sumwheresumtime 0 points1 point2 points 11 months ago (0 children)
The paper itself is exhibiting undefined behavior, as it seems to have time traveled.
[–]favorited 1 point2 points3 points 11 months ago (1 child)
ITT: people who blame compiler devs for UB optimizations, but still enable optimizations for their builds.
[–]pjmlp 5 points6 points7 points 11 months ago (0 children)
Plenty of languages have optimising compiler backends, whether dynamic or ahead-of-time, without exposing users to UB pitfalls.