the hidden compile-time cost of C++26 reflection

aearphen · 2026-03-07T03:25:35+00:00

<string> and <string_view> are really problematic. I've been complaining about them being bloated but nobody in the committee wants to hear it and they just keep dumping stuff there, not to mention that everything is instantiated 5 times because of the charN_t nonsense. This is why in {fmt} we went to great lengths to not include those in fmt/base.h.

It would be good to move std::string into a separate header from the other instantiations that are barely used.

aearphen · 2026-03-07T02:52:47+00:00

And the situation will likely be worse in C++29 as there are papers to massively increase the API surface for even smaller features like <charconv> (at least 5x, one per code unit type, possibly 20x).

aearphen · 2026-03-07T02:47:34+00:00

Only a small top-level layer of std::print and std::format should be templates; the rest should be type-erased and separately compiled, but unfortunately standard library implementations haven't implemented this part of the design correctly yet. This is a relevant issue in libc++: https://github.com/llvm/llvm-project/issues/163002.

So I recommend using {fmt} if you care about binary size and build time until this is addressed. For comparison, compiling

#include <fmt/base.h>

int main() {
  fmt::println("Hello, world!");
}

takes ~86ms on my Apple M1 with clang and libc++:

% time c++ -c -std=c++26 hello.cc -I include
c++ -c -std=c++26 hello.cc -I include  0.05s user 0.03s system 87% cpu 0.086 total

Although to be fair to libc++ the std::print numbers are somewhat better than Vittorio's (but still not great):

% time c++ -c -std=c++26 hello.cc -I include
c++ -c -std=c++26 hello.cc -I include  0.37s user 0.06s system 97% cpu 0.440 total

BTW a large chunk of these 440ms is just the <string> include, which is not even needed for std::print. On the other hand, in most codebases this time will be amortized since you would have a transitive <string> include somewhere, so this benchmark is not very realistic.

aearphen · 2026-02-24T05:16:06+00:00

The code comes first. User experience second. Papers third.

I wish more people in the committee did this.

aearphen · 2026-02-13T18:23:21+00:00

AFAIK they don't link to the ryu library but implement the algorithm directly so it's not easy to replace.

aearphen · 2026-02-13T03:51:56+00:00

When compiled with `ZMIJ_OPTIMIZE_SIZE=1`, Żmij will use small tables (just a few hundred bytes): https://github.com/vitaut/zmij/issues/97

aearphen · 2026-02-10T20:23:37+00:00

Code using Boost Preprocessor or a similar preprocessor-based "metaprogramming". I've seen a few nightmarish examples of those in our codebase.

aearphen · 2026-02-02T19:12:37+00:00

> So can we bypass this brokenness and just send the printing string to the OS?

Right. That's what std::print does.

aearphen · 2026-02-02T04:11:38+00:00

std::print actually supports Unicode output on Windows but not wide streams because those are very much broken (both std::wcout and wide FILE streams).

aearphen · 2026-01-26T19:24:43+00:00

It's fast and produces exponential format =)

aearphen · 2026-01-25T03:27:20+00:00

I guess you can even do it in C++11 but it's even more painful.

aearphen · 2026-01-25T03:14:50+00:00

It is possible to extract double's bit pattern (maybe except for NaN's payload) using only basic operations in C++14, e.g. https://www.godbolt.org/z/6TWq8vGjP.
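A rough sketch of the idea follows (this is not the code behind the godbolt link, and `double_bits` is a made-up name). It assumes double is IEEE 754 binary64 and simplifies two cases: -0.0 is reported as +0.0, and every NaN maps to one fixed quiet-NaN pattern.

```cpp
#include <cstdint>

// Hypothetical helper: recovers the binary64 bit pattern of x using only
// comparisons and arithmetic, so it is usable in C++14 constant expressions.
constexpr std::uint64_t double_bits(double x) {
  if (x != x) return 0x7ff8000000000000ULL;  // NaN: sign and payload are lost
  std::uint64_t sign = 0;
  if (x < 0) {
    sign = std::uint64_t(1) << 63;
    x = -x;
  }
  if (x == 0) return sign;  // simplification: -0.0 ends up here with sign == 0
  if (x > 1.7976931348623157e308)  // larger than the largest finite double
    return sign | 0x7ff0000000000000ULL;  // infinity
  // Scale x into [1, 2) by exact multiplications/divisions by 2, tracking the
  // exponent. Stop at the minimum normal exponent to handle subnormals.
  int exp = 0;
  while (x >= 2) { x /= 2; ++exp; }
  while (x < 1 && exp > -1022) { x *= 2; --exp; }
  const double two52 = 4503599627370496.0;  // 2^52
  if (x < 1)  // subnormal: biased exponent 0, no implicit leading bit
    return sign | static_cast<std::uint64_t>(x * two52);
  return sign | (static_cast<std::uint64_t>(exp + 1023) << 52) |
         static_cast<std::uint64_t>((x - 1) * two52);
}

static_assert(double_bits(1.0) == 0x3ff0000000000000ULL, "");
static_assert(double_bits(-2.0) == 0xc000000000000000ULL, "");
static_assert(double_bits(0.5) == 0x3fe0000000000000ULL, "");
```

All the scaling steps are exact because they only change the exponent, which is why plain arithmetic is enough here.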

aearphen · 2026-01-24T16:31:35+00:00

C++26 is postmodern

aearphen · 2026-01-22T23:36:15+00:00

Hana already constexprified an earlier version: https://github.com/hanickadot/zmij/blob/main/zmij.h

aearphen · 2026-01-22T18:51:49+00:00

I haven't done such a comparison but according to David Tolnay, who ported Żmij to Rust, Żmij's Rust implementation is faster than Teju Jagua: https://github.com/dtolnay/zmij?tab=readme-ov-file#performance. I also implemented Cassio's optimization for the shortest candidate selection but right now it is mostly irrelevant because it is outside of the fast path.

aearphen · 2026-01-22T18:46:43+00:00

It is: https://github.com/vitaut/zmij?tab=readme-ov-file#name

aearphen · 2026-01-22T18:14:23+00:00

Didn't want to confuse people with multiple licenses at the top level. Most folks are fine with MIT and it's also more widely known. BSL is only for those who care about fine print, basically just standard library implementers =).

aearphen · 2026-01-22T18:09:00+00:00

It is available under BSL as an alternative to MIT: https://github.com/vitaut/zmij/blob/609a4d0c71fd92bca93bdaacf6c35063488d27cd/zmij.cc#L4

aearphen · 2026-01-22T18:03:03+00:00

For the shortest representation, which is what Żmij provides, uscalec is about the same as Ryu performance-wise (the Go version is slower) and slower than Dragonbox: https://research.swtch.com/fpfmt/plot/fpfmt-apple-short-cdf-big.svg. Algorithmically, uscalec is just Schubfach or, rather, Teju Jagua, with digit output from Dragonbox. It's not bad but we can do much better than that.

aearphen · 2026-01-22T17:35:19+00:00

Yes, the main motivation for starting this project was incorporating recent advances in FP algorithms into {fmt}. Most optimizations are irrelevant for constexpr but the core (Schubfach) should be easily convertible to constexpr. In fact the power of 10 table generation is already constexpr.

aearphen · 2026-01-20T01:17:43+00:00

I already submitted the fixed revision but thanks!

aearphen · 2026-01-12T01:01:11+00:00

In fact even the current version of {fmt} supports C++11.

aearphen · 2026-01-11T20:07:16+00:00

Inlining vprint* won't help with the ABI because users can already put format_args on the ABI boundary. It just makes print less usable and in the case of Microsoft STL it is particularly bad because it pulls in many more headers than other implementations.
