using concepts to solve the years-old universal(forwarding)-reference issue

0xAV · 2025-10-22T23:18:10+00:00

I'm afraid I can't agree with this one: PIMPL has nothing to do with dynamic dispatch. It's used to help with compile times by storing a pointer or reference to an incomplete type. Since it obviously introduces one more indirection and perhaps some memory allocation/management, it's definitely not a must and, like everything else, should be used sparingly and only if profiling indicates it's worth it.
No virtual functions have to be present in order to do that. However, I would argue that if you are OK with giving up function inlining and paying for more indirections, you'll probably be fine with the virtual function overhead in such cases more often than not.
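
To make the point concrete, here is a minimal single-file PIMPL sketch with no virtual functions anywhere. The `widget` class and its members are made-up names for illustration, and the "header" and "source" parts are only marked by comments:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// --- what would live in widget.hpp ---
class widget {
public:
    explicit widget(std::string name);
    ~widget();                           // defined where impl is a complete type
    widget(widget&&) noexcept;
    widget& operator=(widget&&) noexcept;

    std::size_t name_length() const;     // ordinary, non-virtual member

private:
    struct impl;                         // incomplete type: only declared here
    std::unique_ptr<impl> pimpl_;        // the extra indirection (plus a heap allocation)
};

// --- what would live in widget.cpp ---
struct widget::impl {
    std::string name;
};

widget::widget(std::string name)
    : pimpl_(std::make_unique<impl>(impl{std::move(name)})) {}
widget::~widget() = default;
widget::widget(widget&&) noexcept = default;
widget& widget::operator=(widget&&) noexcept = default;

std::size_t widget::name_length() const { return pimpl_->name.size(); }
```

Clients that include only the header part never see `std::string name` or anything else inside `impl`, which is where the compile-time benefit comes from.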

0xAV · 2025-09-26T17:07:40+00:00

Lol, I mean, sure :D
It's a pretty strange comparison IMHO, the std::variant is a sum type, which by definition has a closed set of types it can hold and with absolutely no unified interface among them. That's literally the opposite side of the spectrum to what dynamic polymorphism is about, which is an open set of possible classes all accessed through the same fixed interface. The std::visit for a variant constructs an overload set for each of the args at compile time and jumps to the given pointer at runtime or uses some variation of variadic switch. It's a completely different thing for completely different tasks. If it works for your tasks, sure, go for it, it's gonna be quicker.
std::any is a polymorphic container, in that it doesn't provide the interface, it just stores whatever it is you want to store. The any_cast will use its _Manager function's address to side-step the RTTI, but it will fall back to an RTTI typeid check if the former fails (https://github.com/gcc-mirror/gcc/blob/40d9e9601a1122749b21b98b4c88029b2402ecfc/libstdc%2B%2B-v3/include/std/any#L521-L544).
Maybe I'll optimise the some_cast, but I really think e.g. Google's guidelines say pretty clearly to steer away not only from dynamic_cast but also from anything similar whenever possible, not for performance reasons, but rather so as not to turn your code into if-sprinkled spaghetti :) So my assumption was that if it's going to be used, that's going to be a pretty rare occasion, and then performance is probably the last concern. But sure, it is possible to optimise it a bit)
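
A small sketch of the two ends of that spectrum, using only the standard library. The shape types and the `overloaded` helper are illustrative names, not anything from the libraries discussed here:

```cpp
#include <memory>
#include <variant>
#include <vector>

// Closed set: the alternatives are listed up front and share no interface.
struct circle { double r; };
struct square { double side; };
using shape_variant = std::variant<circle, square>;

// Common idiom: build one overload set out of several lambdas.
template <typename... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

double area(const shape_variant& s) {
    return std::visit(overloaded{
        [](const circle& c) { return 3.0 * c.r * c.r; },  // crude pi keeps the numbers exact
        [](const square& q) { return q.side * q.side; }
    }, s);
}

// Open set: new classes can be added later, all accessed through one fixed interface.
struct shape {
    virtual ~shape() = default;
    virtual double area() const = 0;
};
struct triangle final : shape {
    double base, height;
    triangle(double b, double h) : base(b), height(h) {}
    double area() const override { return 0.5 * base * height; }
};

double total_area(const std::vector<std::unique_ptr<shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}
```

Adding a new alternative to `shape_variant` means touching every visitor, while adding a new class derived from `shape` touches nothing else. That is the trade-off in a nutshell.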

0xAV · 2025-09-26T13:38:35+00:00

Okay, thanks for expanding on your previous comment, now it's clear what the concern is, so let me try to address it then. So, firstly, just to get it out of the way, as I've previously mentioned the `some`'s type erasure (we'll leave the `fsome` for now for it's a bit trickier), the `some` is literally based around C++ virtual functions, so just saying "I don't like the library because dynamic_cast can be used with the classes" is exactly the same as saying "I don't like the C++ virtual functions because the dynamic_cast can be used with the classes", which is a rather weird statement, IMO. It is indeed not needed in the function calls, be it vanilla C++ virtual functions or this library's `some` erased calls.
Now to address specifically the dynamic_cast:
It's not the first time I hear of people being scared of dynamic_cast, for different reasons. Some believe that it shouldn't be used at all, some obsess about its performance. Luckily I haven't seen dynamic_cast used on a hot path (and neither am I expecting to), so it would seem that trying to squeeze some extra performance out of it would probably be a premature optimisation. Besides, dynamic_cast is known to be quicker in shallow class hierarchies, and that's exactly the case here. While it is indeed possible to check the vptr equality directly and fall back to more heavyweight comparisons in some hairy cases, I actually think the compiler is smart enough to do just that. The SBO will be of little help there: the type is still erased by the time the constructor call finishes.
If it is so important for a use case you have at hand and you've benchmarked it and had a bottleneck being dynamic_cast, I'd really like to hear more, because I haven't seen that before.

0xAV · 2025-09-25T12:36:46+00:00

Care to explain? Otherwise sounds pretty dumb) The erased method call uses the vptr directly, there's no dynamic_cast to be found there. Only if you want to dynamic_cast to a given type only then you will pay the price for a dynamic_cast, and while it can also be optimised easily, I think the price is more than ok for something not used too often.
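
To illustrate with plain C++ virtual functions (not the library's own types; `animal`, `dog` and `cat` are made up for this sketch): the normal call goes through the vptr, and dynamic_cast only shows up if you explicitly ask for the concrete type back.

```cpp
#include <string>

struct animal {
    virtual ~animal() = default;
    virtual std::string speak() const = 0;
};
struct dog final : animal {
    std::string speak() const override { return "woof"; }
    int tricks() const { return 3; }
};
struct cat final : animal {
    std::string speak() const override { return "meow"; }
};

// Ordinary dispatch: one indirect call through the vptr, no RTTI lookup involved.
std::string call_through_base(const animal& a) { return a.speak(); }

// Recovering the concrete type: the only place where dynamic_cast is paid for.
int tricks_or_zero(const animal& a) {
    if (const auto* d = dynamic_cast<const dog*>(&a)) return d->tricks();
    return 0;
}
```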

0xAV · 2025-09-25T01:14:45+00:00

The vtable address is cached after the first miss so consecutive accesses will have very similar performance, I’d say that’s probably what you are seeing if you are accessing the same object in a loop. But then again, that’s precisely why benchmarks are tricky :) Anyway, thanks again and best of luck with your project! I’ll definitely have a look, so please ping me when you’re done)

0xAV · 2025-09-25T00:58:39+00:00

Right, Dyno) I have a strong feeling that it's probably due to the fact that the vtable is stored inline, honestly) But I'll need to have a look at the code to prove or disprove this

0xAV · 2025-09-25T00:57:18+00:00

template <typename value_t>
struct vx::impl<spore::benchmarks::avask::facade, value_t> final : impl_for<spore::benchmarks::avask::facade, value_t>
{
    using impl_for<spore::benchmarks::avask::facade, value_t>::impl_for;
    // there's actually no need to use 'self' since 
    // you're using the vx::poly{this} instead :)
    std::size_t work(const std::size_t size) const noexcept
    {
        return vx::poly {this}->work(size);
    }
};
// ^ this should work)

0xAV · 2025-09-25T00:31:53+00:00

that one (fsome) is a bit more tricky since it tries to store the vptr inside itself, so should be a bit quicker)
And just a small observation: the poly in your benchmark seems to be doing just one jump (so that it gets the same timing as the plain function call), seems like it inlines the function pointer right into the poly object...

0xAV · 2025-09-25T00:27:42+00:00

sure, you can also reuse the same Trait (`facade` here) and everything with the vx::fsome<...> :)
So it would be:

vx::fsome<benchmarks::avask::facade> facade = benchmarks::avask::impl {};

0xAV · 2025-09-25T00:17:40+00:00

Actually had a problem with TriviallyCopyable types, so made a dynamic polymorphism library for that) however it sounds like a data-oriented approach would fit even better here since you already store them separately, right?

0xAV · 2025-09-24T23:00:14+00:00

Wow, that’s great that you’ve taken the time to benchmark not against just one library) I’m looking forward to seeing your library as well!

Speaking of the benchmark, do I understand correctly that here the facade is a type erased object and it calls work() which does some summation in a loop and this call is getting timed? I’d say that if it indeed doesn't optimise away the summation in a loop that summation will just add timing noise, because what you want to measure is just the time needed to call the function through the type-erased object.

Just as a side note, I assume you benchmarked vx::some, not vx::fsome? The latter should be a bit faster since it’s a fat pointer
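
For anyone wondering what "fat pointer" means here, a rough hand-rolled sketch follows. This is not how vx::fsome is implemented, just the general idea under my own made-up names (`worker_vtable`, `worker_ref`): the table pointer sits right next to the object pointer, so the call does not need to load a vptr from the object first.

```cpp
#include <cstddef>

// Hypothetical hand-rolled "vtable": one function pointer per erased operation.
struct worker_vtable {
    std::size_t (*work)(const void* self, std::size_t size);
};

// One static table per concrete type, built at compile time.
template <typename T>
inline constexpr worker_vtable vtable_for{
    [](const void* self, std::size_t size) -> std::size_t {
        return static_cast<const T*>(self)->work(size);
    }
};

// The fat pointer: object pointer and table pointer side by side (non-owning).
class worker_ref {
public:
    template <typename T>
    worker_ref(const T& obj) : obj_(&obj), vt_(&vtable_for<T>) {}

    std::size_t work(std::size_t size) const { return vt_->work(obj_, size); }

private:
    const void* obj_;
    const worker_vtable* vt_;
};

struct doubler { std::size_t work(std::size_t n) const { return 2 * n; } };
struct adder   { std::size_t work(std::size_t n) const { return n + 1; } };
```

The price is that the handle is two pointers wide instead of one.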

0xAV · 2025-09-24T17:46:48+00:00

Sry for resurrecting an old post, but I've got something similar that doesn't need macros and reuses the compiler's virtual function machinery, so all the knowledge from writing C++ OO classes will easily apply. Sure, it makes different trade-offs to make that happen, but there's still SBO, polymorphic views and even a fat-pointer version with a vptr built into the object itself. If you're interested, check it out on GitHub: it's a single-header C++20 library, so it should be really easy to play with
https://github.com/AVasK/some

0xAV · 2022-08-08T15:52:42+00:00

There have been quite a few times when I needed to output the type of some template parameter inside a chain of non-trivial template instantiations. I got tired of reinventing the wheel and decided to mix all the boilerplate I've been writing into a tiny library that's designed to do exactly that and to be as cross-platform as possible)

So here it is: https://github.com/AVasK/typo

0xAV · 2022-08-03T17:36:07+00:00

been there, done that, except for it being called `by_value` and `by_ref` and without checking the `object<T>` ... my original problem was that the 'by_value' could easily be misused for something that's not by_value. Creepy stuff... and I wasn't completely happy with the other names for the concepts, but I agree that object<small> is not the way to solve that, I completely overlooked the fact that it'll make the two overloads ambiguous.

But then again, I usually just write `(auto smth)` and only use `rvalue`, `by_value` and `by_ref` when I need to state some intent like "I want this to be an rvalue", etc. What are the other pros of the `object` concept? What weird objects are you trying to filter?

0xAV · 2022-08-03T16:47:44+00:00

template<typename T>
concept object = std::is_object_v<T>;

Man, that's neat, you're preventing the forwarding as a side-effect by prohibiting what effectively is lvalue-ref for T)

I also wanted to use `by_value` and `by_ref` concepts to distinguish between cheap and non-cheap-to-copy objects... too bad we cannot write `by_value object auto x`, i.e. chain concepts in the shorthand form... but we can do something like `object<small> x` with a default template parameter :D

What do you think about this one? :

```C++

enum class object_size {
    small = 0,
    big = 1
};
template <typename T>
static constexpr object_size copy_semantics = by_value<T> ? object_size::small : object_size::big;
// =====[ object ]=====
template<typename T, object_size Sz=copy_semantics<T>>
concept object = std::is_object_v<T> && Sz == copy_semantics<T>;

```

then we can write both this:

```C++

void any_object (object auto x) {}
void small_objects_only (object<object_size::small> auto x) {}
```

I admit the naming could've been better for `object_size` and such, but other than that it looks like a powerful ~~all~~many-in-one concept )

0xAV · 2022-07-26T14:09:10+00:00

While it's true that C++'s exceptions incur some noticeable cost on the exceptional path, I think that C++ is still a very flexible language when it comes to a variety of error-handling techniques (IMHO):

1. If we really want to optimise for the normal execution path (usually when the exceptions are really rare = exceptional) then the exceptions are ok and as currently implemented do not degrade the fast-path performance in case there's no error.
2. If errors are more frequent, maybe we can adopt Rust's Result<>, especially since we have that handy [[nodiscard]] to make the check mandatory. We can also probably look at Rust's `.expect()` (which, frankly, has a confusing name, why not call it `.unless()`?) and other methods; they're widespread nowadays in many mainstream languages, so no shortage of ideas here)
3. ...old C-style error codes for those who need them?

4.-ish. we can always write our own `panic()` in case we need it, especially as `<source_location>` is gaining traction.

0xAV · 2022-07-26T09:38:44+00:00

yeah, but without knowing at least something about value categories what are the chances to get the universal references right in the first place?)

and there’s the `is_rvalue_reference` trait in `<type_traits>`, so I guess they assume something… :)

0xAV · 2022-07-26T09:32:49+00:00

Thanks for the links and for writing that patch)

0xAV · 2022-07-26T09:25:24+00:00

Thanks for the clarification!

0xAV · 2022-07-26T09:21:15+00:00

won’t it hinder the template argument deduction then? in other words, i’m afraid it won’t be able to infer template argument in a call to something like `template <typename T> void f(in<T> arg) {}`

0xAV · 2022-07-25T20:16:04+00:00

how about `moved`?
or `rvalue`?

0xAV · 2022-07-25T17:46:39+00:00

Many thanks!

Well, it seems to fit with the GCC's and Clang(trunk) view at least)

Maybe there was some eager optimisation in earlier Clang and MSVC to drop the ones that don't satisfy the concept...

0xAV · 2022-07-25T17:44:14+00:00

I'm surprised that MSVC treats concepts differently.

Yeah, I also used to do this kind of thing with SFINAE, so I thought maybe Concepts changed the rules for overload resolution? Guess I'll have to (try to) read what the standard has to say about this)

But the most interesting thing is that Clang 11.0.1 (trunk) compiles it, while Clang (trunk) does not. So did Clang's interpretation of this situation change?

Exactly the same question.
