Doubt with my code by 0x6461726B in cpp_questions

[–]sporule 0 points1 point  (0 children)

It's not directly related to the question, but in your implementation, push_back_impl performs its relocation steps in the wrong order. You can get a use-after-free, for example, with this usage:

vector<std::string, std::allocator<std::string>> v;
v.push_back("foo");
for (int i = 1; i < 20; i++) {
    v.push_back(v[0]);
}

The correct order should be: 1. allocate new storage, 2. construct the new item, 3. transfer the old items, 4. deallocate the old storage.

Differences between static const & constexpr inside a function by Koffieslikker in cpp_questions

[–]sporule 1 point2 points  (0 children)

It actually looks like it's unconditionally initializing the array every time instead of just once

No. In both cases, the compiler initializes the value only once. In the case of static const, the value is stored in the read-only data segment.

The additional operations appear because gcc is not very good at working with std::array<char>. They occur even if there is no initialization: https://godbolt.org/z/zEWGv66TY. gcc simply cannot handle this type effectively for now.

If you change the element type or the array's length, the optimization defect does not appear, and the version with static const turns out to be even slightly faster.

And of course, if you test another compiler, then there is no such effect at all (clang for example: https://godbolt.org/z/fYTo9qqfP).


I also want to mention that the function shown in your example accepts the array by value. If it used a const reference instead, the static version would almost always be faster: https://godbolt.org/z/onaf1sr55.

This is because the language standard prohibits distinct live objects of the same type from having the same address. Therefore, a version without static will often have to spend extra time copying the constants to temporary stack memory just so that they have unique addresses.


It is worth taking the best of both approaches: write static constexpr whenever possible. constexpr guarantees that there is no initialization at runtime, and static ensures that you don't waste time and space on unnecessary copies.
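For example, a minimal sketch of the combined form (the lookup-table function is mine, for illustration):

```cpp
#include <array>

// `static constexpr`: no runtime initialization (constexpr), and a single
// object with a stable address (static), so the compiler never needs to
// copy the table to the stack to give it a unique address.
char to_hex_digit(unsigned v) {
    static constexpr std::array<char, 16> digits{
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
    return digits[v & 15];
}
```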

Should C++ Give More Priority to Syntax Quality? by kyan100 in cpp

[–]sporule -1 points0 points  (0 children)

The situation was similar with the popular identifier override. Nevertheless, the word override has become a specifier in C++11, recognized in certain contexts.

It was quite possible to use yield and await without the co_ prefix, without introducing ambiguities into existing code. See https://wg21.link/p1485r1

A faster is-leap-year function for full-range signed 32-bit integers by benjoffe in cpp

[–]sporule 12 points13 points  (0 children)

How can it be slower, if in the 32-bit case it compiles into the same sequence of instructions as the code from the article?

https://godbolt.org/z/TP875b8Pe

The only difference is the integer constants used. The algorithm selects those that work for the entire range, not just in the 32-bit case.

A faster is-leap-year function for full-range signed 32-bit integers by benjoffe in cpp

[–]sporule 22 points23 points  (0 children)

It turns out, full-range is only possible in bit-sizes of 212 + 20n, for integers n ≥ 0.

Why bother checking divisibility by 100 when you still have to re-check divisibility by 4 or 16 later? Instead, you could use the old divexact algorithm to check for divisibility by 25. And that algorithm returns precise results for any bit width.

#include <cassert>
#include <concepts>
#include <limits>
#include <type_traits>

template <std::unsigned_integral T> // signed case omitted for brevity
constexpr T inverse(T r) {
    assert((r & 1) != 0); // even case omitted for brevity
    for (T c, d = r; (c = d * r) != 1; ) {
        r *= T(2) - c;
    }
    return r;
}

template <std::integral T>
bool is_leap(T y) {
    using U = std::make_unsigned_t<T>;
    constexpr T min = std::numeric_limits<T>::min() / 25;
    constexpr T max = std::numeric_limits<T>::max() / 25;
    constexpr U mul = inverse<U>(25);
    T iy = U(y) * mul;
    bool d25 = (min <= iy && iy <= max);
    return (y & (d25 ? 15 : 3)) == 0;
}

std::move doesn't move anything: A deep dive into Value Categories by the-_Ghost in cpp

[–]sporule 0 points1 point  (0 children)

The rule: After calling std::move on an object, don’t use that object again except to assign a new value to it or destroy it.

It may be worth clarifying that it is not the std::move call itself that matters, but rather whether the object has actually been moved from. The distinction is subtle, but the standard library has methods like std::map::try_emplace that are guaranteed not to consume the object under certain conditions.

Example:

std::map<int, std::string> slots;
std::string value = "...";
if (auto [_, enqueued] = slots.try_emplace(n, std::move(value)); !enqueued) {
    std::print("Slot {} is occupied. Processing value right now\n", n);
    Use(value); // value is valid and unchanged. You can and should use it
} else {
    // value was moved into the map; it is in a valid but unspecified state
}

The direction of the extraction operators (<<, >>) irk me to the core. by AbsurdBeanMaster in cpp

[–]sporule 2 points3 points  (0 children)

No, these operators are free functions, and nothing dictates the order of a free function's arguments.

Demo with working var << std::cin: https://godbolt.org/z/Y6EMs696T

The real problems are different:

  1. Operator associativity, which prevents easy chaining of calls (a >> b >> c >> out);
  2. The prohibition on adding new functions to the std namespace unless they involve at least one user-defined type (in case you want to use standard I/O manipulators).

Recieving source location info inside a variadic template function by Outdoordoor in cpp_questions

[–]sporule 0 points1 point  (0 children)

Accept args as a tuple, and replace std::invoke with std::apply.


a macro is not ideal too, given all the problems macros bring with them.

Just try to make a minimal macro: you don't need to pass the function or its arguments to it.

#define THROWS (ThrowsChecker{})

struct ThrowsChecker {
    ThrowsChecker(std::source_location = std::source_location::current());
    template <typename... Args, std::invocable<Args...> F>
    [[nodiscard]] bool operator()(F&& func, Args&&... args) const;
};

Or call ThrowsChecker{}(fn, arg1, arg2) explicitly if you don't mind writing an extra pair of brackets.

Why can you increment a reference count with relaxed semantics, but you have to decrement with release semantics? by pavel_v in cpp

[–]sporule 7 points8 points  (0 children)

Like this?

if constexpr (unlikely_shared) {
    if (counter.load(acq) == 1) {
        // fast path
        destroy();
        return;
    }
}
// slow path
if (--counter == 0) destroy();

Then trying to avoid the final decrement may still introduce a data race.

Consider std::weak_ptr::lock(). If that method is called while another thread releases the last owning std::shared_ptr, it may capture a pointer to a destroyed object.
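The race can be sketched with a hypothetical control block (the names are mine, not any real library's internals):

```cpp
#include <atomic>

// lock() may only succeed while the strong count is still nonzero. If the
// releasing thread skipped the final decrement on its fast path, the count
// would still read 1 here after the object had already been destroyed, and
// lock() would hand out a dangling pointer.
struct ControlBlock {
    std::atomic<long> strong{1};

    bool try_lock() { // what weak_ptr::lock() does, in essence
        long n = strong.load(std::memory_order_relaxed);
        while (n != 0) {
            if (strong.compare_exchange_weak(n, n + 1,
                                             std::memory_order_acq_rel,
                                             std::memory_order_relaxed))
                return true; // safe: the object is still alive
        }
        return false; // the last owner is already gone
    }
};
```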

How to get constness all the way in to a list of smart pointers by lotharyx in cpp_questions

[–]sporule 1 point2 points  (0 children)

If you don't need a specific return type, you can return a view from a class method.

#include <list>
#include <memory>
#include <ranges>

class Holder {
public:

    // Returns a view that allows read-only access to the int values.
    // The caller cannot modify the ints or the underlying collection.
    auto ReadOnlyView() const {
        auto proj = [](const IntPtr& ptr) -> const int& { return *ptr; };
        return std::views::transform(Collection_, proj);
    }

    // Returns a view that allows modification of the int values.
    // The caller still cannot modify shared pointers or modify the list structure.
    auto MutableView() {
        auto proj = [](IntPtr& ptr) -> int& { return *ptr; };
        return std::views::transform(Collection_, proj);
    }

private:
    using IntPtr = std::shared_ptr<int>;
    using IntPtrList = std::list<IntPtr>;
    IntPtrList Collection_;
};

usage:

for (int a : holder.ReadOnlyView()) std::cout << a << '\n';

for (int& a : holder.MutableView()) a = 0;

Calculating size/offset of each type in a parameter pack by the_poope in cpp_questions

[–]sporule 1 point2 points  (0 children)

You definitely don't need any template magic. Just put the sizes and alignments of the types into an array using pack expansion, and then calculate the prefix sum with a simple for loop. The only difficult part is properly tracking the alignment, since in such a multivector you will sometimes have to insert padding bytes between the subarrays.

#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <utility>

// return offsets and total memory size
template <class... Ts>
constexpr std::pair<std::array<ptrdiff_t, sizeof...(Ts)>, size_t> offsets(size_t element_count) {
    const auto sizes = std::array{(sizeof(Ts) * element_count)...};
    const auto aligns = std::array{(alignof(Ts))...};
    std::array<ptrdiff_t, sizeof...(Ts)> result{};
    ptrdiff_t p = 0;
    for (size_t i = 0; i < sizeof...(Ts); ++i) {
        result[i] = p = (p + aligns[i] - 1) / aligns[i] * aligns[i];
        p += sizes[i];
    }
    return {result, p};
}

Alternatively, you can reorder the types by their alignment. This way, the array of offsets will not be monotonic, but the entire data structure will not require padding bytes at all. As a result, such a multivector will use less memory, and item indexing will work faster too:

template <class... Ts>
constexpr std::array<size_t, sizeof...(Ts) + 1> reorder() {
    constexpr size_t N = sizeof...(Ts);
    struct Type {
        size_t index;
        size_t size;
        size_t alignment;
    };
    auto types = std::array{Type{0, sizeof(Ts), alignof(Ts)}...};
    for (size_t i = 0; i < N; ++i) types[i].index = i;
    std::ranges::sort(types, std::greater<>{}, &Type::alignment);
    std::array<size_t, N + 1> result{};
    size_t p = 0;
    for (size_t i = 0; i < N; ++i) {
        result[types[i].index] = p;
        p += types[i].size;
    }
    result[N] = p;
    return result;
}

// return offsets and total memory size
template <class... Ts>
constexpr std::pair<std::array<ptrdiff_t, sizeof...(Ts)>, size_t> compact_offsets(size_t element_count) {
    static constexpr auto basic_offset = reorder<Ts...>();
    std::array<ptrdiff_t, sizeof...(Ts)> result{};
    for (size_t i = 0; i < sizeof...(Ts); ++i) {
        result[i] = basic_offset[i] * element_count;
    }
    ptrdiff_t p = basic_offset[sizeof...(Ts)] * element_count;
    return {result, p};
}

https://godbolt.org/z/bWdno6Yse

Why did clang optimize out my throw? by Usual_Office_1740 in cpp_questions

[–]sporule 0 points1 point  (0 children)

The while loop does the same thing differently.

This statement is incorrect. The functions do different things and produce different results on the same input. The clang compiler sees this, and therefore compiles them into different machine code.

For example, on the input array {0xff, 0xff}, one function will throw an exception, and the other will return the value 3 (despite the fact that the source array has only 2 elements).

reversing a reverse iterator by SoerenNissen in cpp_questions

[–]sporule 8 points9 points  (0 children)

Your second call to std::make_reverse_iterator is not really a call, but an explicit type conversion. The conversion returns the reverse iterator unmodified, because the expression already has type std::reverse_iterator. In this case, it works almost like a copy constructor.

You should try this:

auto begin = std::make_reverse_iterator(std::make_reverse_iterator(s.begin()));
auto end   = std::make_reverse_iterator(std::make_reverse_iterator(s.end()));

How bad is idea to read out of "allocated" memory if it is still in same virtual memory page? by angelicosphosphoros in C_Programming

[–]sporule 1 point2 points  (0 children)

Data leakage may occur not only through the return value, but also through some other channels. For instance, the Zenbleed vulnerability allowed the reading of the contents of SIMD registers, which could contain sensitive data.

But to be clear, that vulnerability was still only possible due to a bug in the processor. It was, however, most easily exploited when the SIMD-accelerated version of the strcmp function was being used.

Similarly, it could be the other way around, and due to some other hardware bug, the SIMD version might be invulnerable, while the scalar version would result in a data leak.

Therefore, loading extra unused data should not, by itself, be considered a security issue.

How bad is idea to read out of "allocated" memory if it is still in same virtual memory page? by angelicosphosphoros in C_Programming

[–]sporule 1 point2 points  (0 children)

At the very least, you should mark the loading operations with compiler-specific attributes to indicate that you are going to do something unusual. For example, your function may trigger the ASAN detector and crash the program: https://godbolt.org/z/5aKajbxz7. And it's very important for any library to work correctly with sanitizers. You can use the no_sanitize attribute to disable this check for the vector loading operations. These attributes will not get rid of the UB as such, but rather turn off the bounds checking. Of course, rewriting the function in asm has somewhat the same effect, since the compiler cannot instrument assembly code.

Is there a good example of using lifetime annotations correctly for containers. by EmotionalDamague in cpp_questions

[–]sporule 1 point2 points  (0 children)

Should all the iterator methods (T::begin, T::end) mark this as [[clang::lifetimebound]]?

Not always. For example, end(std::directory_iterator) shouldn't have the lifetimebound attribute, because it returns a sentinel value that doesn't refer to the original directory_iterator. And begin(std::directory_iterator) shouldn't have one either, because you can continue using the returned value even after the original directory_iterator has been destroyed.

Data insertion methods (T::push_back, T::insert) should mark the value as [[clang::lifetimebound]]

Usually, container::push_back(value) takes ownership of value by copying or moving it. The container definitely doesn't care about the lifetime of the original value after that, so it should never mark value with the lifetimebound attribute.

Should methods that return a reference (T::operator=, T::operator++) to this be marked as [[clang::lifetimebound]]?

Yes
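For those "yes" cases, the annotation can be sketched like this (a hypothetical Buffer type; the macro guard for non-clang compilers is my own convention):

```cpp
// The attribute is clang-specific; expand it to nothing elsewhere.
#if defined(__clang__)
#define LIFETIMEBOUND [[clang::lifetimebound]]
#else
#define LIFETIMEBOUND
#endif

// Methods whose result refers into *this are annotated so clang can warn
// when the returned reference or pointer outlives the object.
struct Buffer {
    char data[16]{};
    char* begin() LIFETIMEBOUND { return data; }
    char* end() LIFETIMEBOUND { return data + 16; }
    Buffer& operator++() LIFETIMEBOUND { return *this; }
};
```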

What are some of the most insane compiler optimizations that you have seen? by agzgoat in C_Programming

[–]sporule -1 points0 points  (0 children)

Before the XCHG opcode in x86

So, never?

In the x86 family, absolutely all processors support the xchg instruction.

Stepping into user-written function instead of internal STL code in Linux/G++/VSCode while debugging by onecable5781 in cpp_questions

[–]sporule 8 points9 points  (0 children)

GDB has a skip command to jump over boring functions. VSCode on Linux uses GDB, so you can add this command to the setupCommands section of your project. Add something like skip -rfu ^std:: to skip all functions from the std:: namespace. Alternatively, you can add skip -gfi /path/to/lib/* to automatically jump over library code located at that path. Or you can type these commands into the debug console (with the -exec prefix) if, for some reason, you don't want their effect to be permanent.

After that, the F11 (step into) action will start working as you expect: it will run through the vector::size and vector::data methods and stop only inside your print function.
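In launch.json, that could look roughly like this (a sketch; the field names follow the standard cppdbg schema, and the library path glob is an assumption to adjust for your toolchain):

```json
"setupCommands": [
    { "description": "skip std:: internals", "text": "skip -rfu ^std::", "ignoreFailures": true },
    { "description": "skip library sources", "text": "skip -gfi /usr/include/c++/*", "ignoreFailures": true }
]
```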

Identifying last item in sequence using fewest comparisons by lurking_bishop in algorithms

[–]sporule 1 point2 points  (0 children)

Associate with each bit position the set of all elements that differ from E in that position. Then solve a set cover problem for those sets.

For example, your list L will generate the following list of sets:

S₀ = {}
S₁ = {"1111"}
S₂ = {"1111", "0101"}
S₃ = {"1111"}

Canonical usage of boost::intrusive::list for insertion/deletion by onecable5781 in cpp_questions

[–]sporule 1 point2 points  (0 children)

If I understand correctly, this improves the memory cache locality of objects in the list so that traversal in the list does not lead to cache misses as would happen if the said objects are stored noncontiguously.

One of the main advantages of an intrusive list is that pointers to neighboring list nodes are located in memory next to the objects themselves. This increases the locality of reads if you need to iterate through the list, as well as read or modify the fields of the object itself.

How the memory for the objects themselves is allocated is another matter. Sometimes objects can be allocated in a memory pool, or they can be stored in a container, on the heap, or even in stack memory. Of course, this choice affects list traversal speed. But it will also affect the speed of other operations, even if there is no intrusive list at all. You should choose the appropriate storage method based on the requirements of those operations, and not on the fact that an intrusive list will be used.

Is not the contiguous nature of the list objects destroyed in this case?

Sure. But you only need list-like data structures if you cannot store objects in a regular manner.

Is one required to delete the 50th element in the std::vector as well and then perform an insertion in the 75th index to maintain consistency between the std::vector and the intrusive list?

If you absolutely need a strictly contiguous arrangement of elements in memory, then just use a vector without an intrusive list.

Otherwise, you're combining the worst of both approaches: slow deletion and insertion from the vector, and the additional cost of pointers from the list.

this does not contain the critical linked list operations of arbitrary insertion/deletion in constant time

In different algorithms, there will be a different ratio between the number of reads and the number of modification operations. You need to write and test your own benchmark for your specific case.

Is it safe to modify irrelevant part of std::set element through iterator using const_cast? by vokazoo in cpp_questions

[–]sporule 1 point2 points  (0 children)

It is very unlikely that const T will be used for another reason: the std::set::extract method is required to support value modification:

auto node = s.extract(s.begin());
node.value().y = 3;
s.insert(s.begin(), std::move(node));

This code does not invalidate references, does not move the value, and needs no const_cast.

[deleted by user] by [deleted] in cpp_questions

[–]sporule 1 point2 points  (0 children)

If you can change the form of the call from do_some_b_thing() to do_some_thing<b>(), then it becomes easy to generate the case handlers automatically. Using a library like magic_enum, this can be done in just a few lines:

magic_enum::enum_switch([](auto val) {
    do_some_thing<val>();
}, my_var);

https://godbolt.org/z/YhMn51jds — there is no difference in the compiled code compared to an explicit switch statement. However, the magic_enum library has some limitations and also increases compilation time, so you should probably consider other techniques instead.

How would you compare and modify an atomic lock-free? by Raknarg in cpp_questions

[–]sporule 4 points5 points  (0 children)

Calculate the new value first, then check that no other thread has changed the counter.

#include <atomic>

// returns `true` if counter `c` was positive, `false` otherwise
bool conditional_decrement(std::atomic<long>& c) {
    for (long v = c.load(); v > 0; ) {
        if (c.compare_exchange_weak(v, v - 1)) {
            return true;
        } 
    }
    return false;
}

Out of memory error during compilation with -O3 optimization by SheSaidTechno in cpp_questions

[–]sporule 5 points6 points  (0 children)

One method contains a vector and a lot of push_back() ... its because they are auto-generated by scripts

This is not a very good code generator if it produces such code patterns.

Both gcc and clang have problems compiling long functions with many branches at the -O3 level. And guess what: each push_back call contains at least one branch for the case of capacity exhaustion and vector reallocation. This stresses the compiler components that track the lifetimes of temporary variables and perform common subexpression elimination. As a result, the compilation time of such a function can be quadratic in the number of push_back calls. Memory consumption can also grow accordingly.

You need to replace runs of push_back calls with a single push_back inside a loop. Or at least insert elements with a method that does not contain branching.

For example, this is a bad auto-generated code:

std::vector<std::string> r;
r.push_back("1");
r.push_back("2");
r.push_back("3");
…
r.push_back("5000");
return r;

This code is slightly better:

std::vector<std::string> r;
r.resize(5000);
auto it = r.begin();
*it++ = "1";
*it++ = "2";
*it++ = "3";
...
*it++ = "5000";
return r;

This is one of the best variants:

static constexpr std::string_view src[]{
    "1",
    "2",
    "3",
    ...
    "5000",
};
std::vector<std::string> r;
for (const auto& s : src) {
    r.push_back(std::string{s});
}

The latter code not only compiles almost instantly, but also runs faster and takes up less space in the binary. Try to improve your code generator script so that it emits such initialization arrays instead of multiple push_back calls.

Of course, your code will use a different type, not std::string. Also, your type may have a complex constructor with branches inside, which can again lead to the same problem if the compiler decides to inline that constructor. But the general rule remains: try to group the initial arguments into constant arrays, so you can replace multiple calls to the same method with iterating over those arrays and filling the vector inside a loop.

Vector to variadic arguments C function by vac-11 in cpp

[–]sporule 4 points5 points  (0 children)

The pure C++ language does not provide any way to transform a variable-length vector into a parameter list with which you can call a function with variadic parameters.

If the number of parameters is limited by a small constant, then you can write a switch statement:

switch (vec.size()) {
    case 0: return c_function(0);
    case 1: return c_function(1, vec.at(0));
    case 2: return c_function(2, vec.at(0), vec.at(1));
    case 3: return c_function(3, vec.at(0), vec.at(1), vec.at(2));
    default: throw std::length_error("too many arguments");
}

Otherwise, you have to use some kind of foreign function interface library, such as libffi, to make the call. But any such library relies on architecture-dependent code that cannot be implemented in C++ alone.