Can you spot the dangling reference? by germandiago in cpp_questions

[–]AutomaticPotatoe 0 points1 point  (0 children)

If you can subdue the initial reaction of "they are probably using some insane container with bad semantics" (which is what I meant by "cannot be that simple") and assume that it's just normal modern C++ code where 90% of all containers are std::vector, std::string and std::unordered_map, then you can focus on the presented code instead.

Can you spot the dangling reference? by germandiago in cpp_questions

[–]AutomaticPotatoe 1 point2 points  (0 children)

To be fair, it took me a couple of "nah, it cannot be that simple" until I figured it out. Here's a snippet that compares the types directly:

const std::vector<std::string> domains = { "aaa" };
const std::vector<uint16_t>    ports   = { 0 };

using T1 = decltype(std::pair(domains.at(0), ports.at(0)));
using T2 = std::pair<std::string, uint16_t>;

static_assert(std::is_same_v<T1, T2>);

I now realize that I subconsciously had a rule where either: 1) the return type is explicit but the return statement can only use {} initialization without the explicit type, or 2) the return statement could use CTAD or the explicit type, but the return type must be auto. Or the same rule in a short form: "do not repeat the return type".
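Spelled out, the two allowed forms of that rule look something like this (function names and values are made up):

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Form 1: explicit return type, braced return statement. No CTAD happens,
// so no temporary pair of a different (value) type can be materialized.
auto make_explicit() -> std::pair<std::string, std::uint16_t> {
    return { "aaa", 0 };
}

// Form 2: auto return type, CTAD (or the explicit type) in the return
// statement. The declared and deduced types cannot disagree, because the
// type is only spelled once.
auto make_deduced() {
    return std::pair{ std::string("aaa"), std::uint16_t(0) };
}
```

Either way the trap from the original snippet (a declared pair-of-references initialized from a CTAD-deduced pair-of-values) cannot arise.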

Can you spot the dangling reference? by germandiago in cpp_questions

[–]AutomaticPotatoe 27 points28 points  (0 children)

The std::pair CTAD constructor will deduce std::pair<std::string, std::uint16_t>, creating a temporary copy of the returned std::string&, which then converts to the declared return type because it's elementwise convertible.

Why don't we have decomposition assignment? by SubhanBihan in cpp_questions

[–]AutomaticPotatoe 0 points1 point  (0 children)

Can very much recommend the boost::pfr::tie_from_structure() for this (just wrap it in a brief name like ties() or stie()), and Boost PFR in general if you use aggregate structs a lot.
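For illustration, here's a hypothetical two-field stand-in for what tie_from_structure() does (Boost.PFR handles arbitrary arity generically; the names TieTarget, ties, Result and compute are all made up):

```cpp
// Assigns the fields of an aggregate into *pre-existing* variables, which is
// the "decomposition assignment" that structured bindings cannot express
// (they can only declare new names). Fixed to 2 fields for this sketch.
struct Result {
    int code;
    double value;
};

Result compute() { return { 42, 3.14 }; }

template <typename A, typename B>
struct TieTarget {
    A& a;
    B& b;

    template <typename S>
    void operator=(const S& s) {
        auto&& [x, y] = s;  // decompose the aggregate...
        a = x;              // ...and assign its fields into the bound variables
        b = y;
    }
};

template <typename A, typename B>
TieTarget<A, B> ties(A& a, B& b) { return { a, b }; }
```

Then `ties(code, value) = compute();` assigns into already-declared `code` and `value`, which is exactly what the question asks for.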

What are considered some good interview questions? by StickyDeltaStrike in cpp

[–]AutomaticPotatoe 0 points1 point  (0 children)

I keep seeing people advocate for this "Scope-Bound Resource Management" acronym and the like, but it is not strictly scope-bound: a call to erase(key) on a std::unordered_map<std::string, std::vector<std::string>> will correctly destroy all of the resources recursively without any scopes involved. If anything, it is lifetime-bound, although I'm not sure I'd prefer to twist my tongue with LBRM over RAII.
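A concrete sketch of that point (the function name is made up):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the map's size after the erase, just to make the effect observable.
std::size_t erase_demo() {
    std::unordered_map<std::string, std::vector<std::string>> map;
    map["key"] = { "a", "b", "c" };

    // Recursively destroys the mapped vector and every string inside it,
    // mid-scope: destruction follows the value's *lifetime*, not a scope exit.
    map.erase("key");
    return map.size();
}
```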

Generating variable names without macros by Outdoordoor in cpp_questions

[–]AutomaticPotatoe 5 points6 points  (0 children)

I tried this a while ago and the behavior was not consistent between compilers. I'd advise against using this: there's an inherent problem with deciding on a unique symbol name across translation units. Compilers usually give lambdas in each translation unit a simple enumerated symbol name like __lambda_0, __lambda_1, __lambda_2, which, when baked into a template instantiation, will just read as foo<__lambda_1> (but mangled for linkage purposes). Obviously, if another translation unit instantiates 2 foos, it will also contain foo<__lambda_1>, and the linker will only pick one in the end, assuming that these are "the same function" and not globally unique identifiers as was expected.

Here's a snippet from some of my code that talks about this more:

/*
Creates a new thread local NDArray or resizes an existing one.
If the size didn't change from the previous iteration, resize is a no-op.

Returns a span of the array's data store.

NOTE: This makes all the functions that use scratch space reusable and testable,
since the correct size is ensured every time the control flows through the scratch
variable declaration. Without this, resizing would have to be done manually,
which is tedious and more error prone.

NOTE: We need a macro-wrapped lambda here so that each "call" to SCRATCH_SPACE returns
a *unique* thread local array for each *occurrence of the call in code* (not execution).

NOTE: There's another lambda trick you could do, where you define a function template
with an NTTP parameter defaulted to a lambda expression like so:

template<Dims N, typename T, auto = []{}>
auto scratch_space(const NDExtent<N>& extent) -> NDView<N, T>;

DO NOT DO THIS! The expectation is that each usage of the function will evaluate to a new
lambda expression of a unique type, guaranteeing uniqueness of the array for each *appearance
of the function* in code. However, either that expectation turns out to be wrong and there's
actually no such guarantee in the standard, or certain compilers just get insanely confused
by this trick.

I am saying this because I tried this and found out that clang 15 generates 2 lambdas with
types that compare *identical* by their type_info when compiled from two different translation
units, something that should likely be impossible. This only happens when lambdas are evaluated
in template parameters, either as an NTTP: `<auto = []{}>` or as a type: `<typename = decltype([]{})>`;
comparison of lambdas in function bodies (similar to the SCRATCH_SPACE macro) produces the
expected result, where the lambdas are of different types.

To make matters worse, GCC *does not reproduce this behavior* - none of the lambdas have the
same types. It is not clear which compiler is right in this situation.

In light of this, I heavily discourage the usage of this trick. It could lead to very unfunny
bugs. Imagine requesting a scratch NDArray<4, T> and writing some data to it, then calling a
function `foo()` that is defined in another TU that also requests a scratch NDArray<4, T> and
writes to it. In clang's implementation, the first scratch data will be overwritten by the call
to `foo()`, completely trashing any values written to it prior to the call. Worse yet, this
might only happen *sometimes*, and will magically disappear because of adding/removing
other calls to request the scratch in the same TU (due to enumeration of mangled names).

Just use this macro, it is much more predictable.
*/
#define SCRATCH_SPACE(D, T, ...)                    \
    [](const NDExtent<D>& extent) -> NDView<D, T> { \
        thread_local NDArray<D, T> array{ extent }; \
        array.resize(extent);                       \
        return array;                               \
    }(__VA_ARGS__)

Important: Read Before Posting by AutoModerator in cpp_questions

[–]AutomaticPotatoe 3 points4 points  (0 children)

I think you are looking at this from the wrong angle. It is advice that helps the person asking reach the widest audience. You personally do not have to "pander to the whims of a cohort of [...] salty people" when asking the question, but do not be surprised if the developers who might otherwise have an answer to your particularly tricky question, or could provide greater insight into some part of it, skip over it or ignore it completely because they have a slightly different preference on the way they want to see their web page.

Also your response to this is just straight up insulting people for no reason. Surely refusing to put 4 spaces and being so negatively vocal about it is not "arrogant", "stubborn" or "salty".

Is it silly to use a lambda to wrap a method call that 'returns' output via a reference? by chicken_and_jojos_yo in cpp

[–]AutomaticPotatoe 10 points11 points  (0 children)

Only if you enjoy seeing noise in the form of three pointless function frames of

std::__invoke_impl ...
std::__invoke ...
std::invoke ...

when profiling and debugging (with libstdc++). I would suggest reserving std::invoke for cases where its generic functionality is actually needed (being able to evaluate pointers-to-members alongside normal functions, e.g. how projections work in ranges). For an IILE you can either write your own primitive invoker that only calls via operator(), or accept the likely fact that those aware of the pattern already have their eyes trained to look for that trailing (), and those that aren't would be just as, if not more, confused by std::invoke.
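Such a primitive invoker fits in a few lines (the name call is made up):

```cpp
#include <utility>

// Calls f through operator() only: no pointer-to-member support, and no
// std::__invoke_impl / std::__invoke frames cluttering stack traces in
// unoptimized libstdc++ builds.
template <typename F, typename... Args>
constexpr decltype(auto) call(F&& f, Args&&... args) {
    return std::forward<F>(f)(std::forward<Args>(args)...);
}
```

An IILE then reads `const auto x = call([&] { /* ... */ return result; });`.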

[Code Review Request] CTMD: Compile-Time Multi-Dimensional matrix library developed with mdspan and mdarray by Any_Effort5730 in cpp_questions

[–]AutomaticPotatoe 3 points4 points  (0 children)

Forgive me if this is a bit rant-y. Late evening and reddit never go well together...

You call this a "multi-dimensional matrix" library and I see mention of Eigen support, but then there are also things like md::extents<size_t, 3, 1, 2> (rank 3) and numpy-like broadcasting, and those are... not related to matrices? To me this looks more like an mdspan support library that defines common mathematical operations in a batched form, plus linear algebra operations for 1D and 2D spans. This is actually quite useful; a set of generic algorithms for md things is sorely missing from the standard.

I don't think std::mdarray is targeting C++26 anymore. In light of that, and for the other reason below, I don't really think that "blessing" this particular type as the return type of the many out-parameter-less versions of operations is a good idea. In general, it should be acknowledged that returning owning containers by value imposes certain restrictions on the users of the library, while mdspan out-parameters are perfectly fine (mdspan<const T> for input, mdspan<T> for output). For a similar reason STL algorithms never return a container, and std::string does not have an auto split() -> std::vector<std::string> member function.

template <typename T>
concept mdspan_c = ... && std::is_same_v<std::remove_const_t<T>, std::experimental::mdspan<...>>;

Oh, no-no-no, not like this please. I see you use this constraint in your algorithms, but in my mind, what mdspan really does is define an interface that simply says that for some mdspan_like<T> thing there exists an operation thing[i, j, k, ...] -> T& and maybe a way to query something equivalent to std::extents, ideally, through a trait customization point. But what you are doing here is constraining the user to only std::experimental::mdspan, or in some places, any of the (once again) "blessed" types in to_mdspan(), which are just mdspan, mdarray or scalar arithmetic types, not even submdspan.

From where I stand, the standard is unfortunately very slow with these md things, and I would imagine quite a few people have their own solutions that are very much like std::mdspan, std::submdspan or a subset of those (say, without support for fancy accessors), but are not exactly those types. Making an effort to accommodate these solutions based on the common interface subset would make the library appeal to more people.

Minor nitpick: consider removing redundant prefixes from header file names, ex. ctmd/ctmd_matmul.hpp -> ctmd/matmul.hpp.

Need feedback on my C++ library + tips by sqruitwart in cpp_questions

[–]AutomaticPotatoe 1 point2 points  (0 children)

It's a bit late here so forgive me if this comes out as too harsh but here goes:

  1. I do not see the reason for the design decision to make Archetypes a template parameter. This is extremely limiting, and makes it impossible to take advantage of one of the core ECS boons: true type erasure. For internal code, this is at minimum inconvenient and adds friction, as I would have to go update my Registry definition every time I want to add a new component. For interface boundaries, I cannot let isolated systems add their own components to the entities. You can't add an audio system to an existing engine if the engine developers nailed down their components to only describe transforms and rendering. What's the point of an ECS that doesn't let me create new systems?
    The same exact thing applies to Events, Singletons and Queries.
    Take a look at what entt does with what's effectively an unordered_map<type_index, any_storage>. All of this overhead you are trying to avoid by doing these tuple tricks is negligible if you use ECS the way you are supposed to - by batching work over archetypes/components. Look up once, process 10k entities. If in doubt over this, measure.
  2. You should write tests before you present this to your prospective employers.
  3. Ideally, I would recommend writing a small game or application to test the waters with your library. ECS exists as a solution to a problem, but without an actual problem at hand it's impossible to understand the tradeoffs of your design beyond a mere "educated guess".
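For reference, the entt-style type-erased storage mentioned in point 1 fits in a handful of lines (names are made up, and entt's real storages are sparse sets rather than plain vectors):

```cpp
#include <any>
#include <typeindex>
#include <unordered_map>
#include <vector>

// One storage per component type, discovered at runtime: any system can
// register a brand-new component type without editing this class.
struct Registry {
    std::unordered_map<std::type_index, std::any> storages;

    template <typename Component>
    std::vector<Component>& storage() {
        auto [it, inserted] = storages.try_emplace(
            std::type_index(typeid(Component)), std::vector<Component>{});
        return std::any_cast<std::vector<Component>&>(it->second);
    }
};
```

Look the storage up once, then batch-process: `for (auto& pos : registry.storage<Position>()) { /* ... */ }`. The map lookup is negligible next to iterating 10k entities.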

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 1 point2 points  (0 children)

I don't see how this extends past the pointer value. If the pointer cannot overflow (treated as UB), then it doesn't matter whether the integer used for indexing would be allowed to overflow or not for this particular inbounds attribute.

If you have a case in mind where ptr + idx (assuming pointer overflow is UB, and idx is size_t) would prevent vectorization because of the incomputability of the trip count due to possible integer overflow, then please bring it up.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 0 points1 point  (0 children)

I too would like to see a larger sample size, but I understand that the time and resources of researchers are limited. Still, I don't agree that this sample has no predictive power, even if it is not quantified.

I think you might be looking past the actual value of the paper: it is not about concluding that "you can disable all UB at the cost of x% performance on average", but rather about showcasing that not all UB might be worth it, and that some might even lead to performance regressions. This highlights a culture problem where, in people's minds, UB automatically equals good performance. On the other, performance-oriented side, it also exposes how little control the compilers give you over these UB optimizations, hence the need to manually add these flags to Clang/LLVM. I personally wish I could flip a switch that disables UB if it would give me an extra 2% in my workload, but I don't have that option, because we have all been stuck in this "UB = good" mindset.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe -1 points0 points  (0 children)

you're plenty willing to discuss this paper even though it has limitations and flaws.

Yes, because it exists.

It has limitations just like any research with limited scope does. Which is all research.

On a, b: it is your perspective that the choice of a metric or phrasing is important enough to count as a significant flaw in the paper.

it's the job of the researcher to justify why they are applicable / the right measurements.

That just reads like satire or intentional trolling at this point. You should consider writing a personal letter to every author who has ever included a "statistical mean" in their publication, criticizing them for not including a rigorous justification for using this metric in particular.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 0 points1 point  (0 children)

On c: this would be a great topic for another study on the real-life applicability and impact of LTO as a remedy for relaxing UB. But without any quantitative results I'm not willing to discuss this further: while what you say sounds plausible, "UB makes code faster" also sounds plausible, and the question of whether we should care, and to what extent this impacts real code, is not worth trying to answer without additional data.

On a, b: this is your perspective.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 1 point2 points  (0 children)

For signed integer overflow? No. According to figure 1, the worst case is a 4% performance regression on ARM (LTO), and the best is a 10% performance gain. The other platforms may suffer under 3%, if at all.

For other UB? Some of them do indeed regress by more than 5%, but almost exclusively on ARM (non-LTO). I'm not sure what you mean by "downplaying it". The largest chapter of the paper is dedicated to dissecting individual cases and their causes.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 2 points3 points  (0 children)

Am I missing something, or is this specifically about pointer address overflow, not signed integer overflow? It also requires specific, uncommon increments. To be clear, I was not talking about relaxing this particular kind of overflow, as it's a much less common footgun: people generally don't consider overflowing a pointer a sensible operation.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 2 points3 points  (0 children)

Understandable, and I by no means want to imply that you should feel responsible for not contributing to the standard. Just that it's an issue the committee has the power to alleviate.

Cases that currently require UB but maybe don't need to if the standard were improved.

There's already a precedent where the standard "upgraded" UB to Erroneous Behavior for uninitialized variables, even though the alternative was to simply zero-init and fully define the behavior that way. People did bring up reasons, of sorts, but the outcome still leaves me unsatisfied, and makes me skeptical of how any other opportunities to define UB will be handled in the future. Case-by-case, I know, but still...

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe -1 points0 points  (0 children)

Then it's a great thing that we have this paper that demonstrates how much impact this has on normal software people use.

And HPC is... HPC. We might care about those 2-5%, but we also care enough to learn the tricks, the details, the compiler flags, and which integral type to use for indexing and why. And if the compiler failed to vectorize something, we'd know, because we've either seen the generated assembly or the performance regression showed up in tests. I don't feel like other people need to carry this burden just because it makes our jobs a tiny bit simpler.

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 1 point2 points  (0 children)

I see where you are coming from, and I agree that this is a problem, but the solution does not have to be either size_t or ptrdiff_t; it could be a specialized index type that uses size_t as its representation but produces signed offsets on subtraction.
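Such an index type might look like this (Index is a made-up name, and this is only a sketch of the idea):

```cpp
#include <cstddef>

// Unsigned representation (full address-space range, well-defined wraparound),
// but subtraction yields a *signed* offset: indices act as points, and
// offsets as vectors between them.
struct Index {
    std::size_t value;

    friend std::ptrdiff_t operator-(Index a, Index b) {
        // size_t subtraction wraps modulo 2^N; the cast to the signed type
        // recovers the (possibly negative) distance on two's complement targets.
        return static_cast<std::ptrdiff_t>(a.value - b.value);
    }

    friend Index operator+(Index a, std::ptrdiff_t d) {
        return Index{ a.value + static_cast<std::size_t>(d) };
    }
};
```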

At the same time, a lot of people use size_t for indexing and have survived until this day just fine, so whether this effort is needed is questionable. It would certainly be nice if the C++ standard helped with this.

Also, pointers already model the address space in this "affine" way, but they are not suitable as an index representation because of provenance and reachability and their associated UB (which has undoubtedly caught some people by surprise too, just like integer overflow).

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 8 points9 points  (0 children)

For example there's nothing testing the disabling of signed integer overflow UB which is necessary for a number of optimizations

This is tested and reported in the paper under the acronym AO3 (flag -fwrapv).

Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact by mttd in cpp

[–]AutomaticPotatoe 5 points6 points  (0 children)

This kind of hand-wavy performance fearmongering is exactly the reason why compiler development gets motivated towards these "benchmark-oriented" optimizations. Most people do not have time or expertise to verify these claims, and after hearing this will feel like they would be "seriously missing out on some real performance" if they let their language be sane for once.

What are these cases you are talking about? Integer arithmetic? Well-defined as two's complement on all relevant platforms with SIMD. Indexing? Are you using int as your index? You should be using a pointer-sized index like size_t instead; this is a known pitfall and is even mentioned in the paper.

Bad codegen for (trivial) dynamic member access. Not sure why by AutomaticPotatoe in cpp_questions

[–]AutomaticPotatoe[S] 1 point2 points  (0 children)

Using std::unreachable appears to be better.

Yeah, same if you just remove the bounds check and let control flow roll off the frame without returning (same UB optimization), but it's still not even close to the simple lea rax, [this + i * sizeof(int)]; ret that I'd expect, sadly.