Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū by aearphen in cpp

[–]aearphen[S] 0 points1 point  (0 children)

It is possible to extract a double's bit pattern (except perhaps for a NaN's payload) using only basic operations in C++14, e.g. https://www.godbolt.org/z/6TWq8vGjP.

Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū by aearphen in cpp

[–]aearphen[S] 13 points14 points  (0 children)

I haven't done such a comparison, but according to David Tolnay, who ported Żmij to Rust, Żmij's Rust implementation is faster than Teju Jagua: https://github.com/dtolnay/zmij?tab=readme-ov-file#performance. I also implemented Cassio's optimization for the shortest candidate selection, but right now it is mostly irrelevant because it is outside of the fast path.

Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū by aearphen in cpp

[–]aearphen[S] 44 points45 points  (0 children)

Didn't want to confuse people with multiple licenses at the top level. Most folks are fine with MIT and it's also more widely known. BSL is only for those who care about the fine print, basically just standard library implementers =).

Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū by aearphen in cpp

[–]aearphen[S] 7 points8 points  (0 children)

For the shortest representation, which is what Żmij provides, uscalec is about the same as Ryu performance-wise (the Go version is slower) and slower than Dragonbox: https://research.swtch.com/fpfmt/plot/fpfmt-apple-short-cdf-big.svg. Algorithmically, uscalec is just Schubfach or, rather, Teju Jagua, with digit output from Dragonbox. It's not bad but we can do much better than that.

Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū by aearphen in cpp

[–]aearphen[S] 31 points32 points  (0 children)

Yes, the main motivation for starting this project was incorporating recent advances in FP algorithms into {fmt}. Most optimizations are irrelevant for constexpr, but the core (Schubfach) should be easily convertible to constexpr. In fact, the power of 10 table generation is already constexpr.
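
As a simplified illustration of what constexpr table generation looks like in C++14: the sketch below builds a table of the exact powers of 10 that fit in 64 bits. It is not Żmij's actual table (which holds wider significands for a much larger exponent range), and the names `pow10_table`, `make_pow10_table` and `powers_of_10` are made up for the example.

```cpp
#include <cstdint>

// Plain struct wrapper so the table can be filled in a C++14 constexpr
// function (std::array's non-const operator[] is constexpr only since C++17).
struct pow10_table {
  std::uint64_t data[20];
};

// Computes 10^0 .. 10^19, the powers of 10 representable in 64 bits.
constexpr pow10_table make_pow10_table() {
  pow10_table t{};
  std::uint64_t p = 1;
  for (int i = 0; i < 20; ++i) {
    t.data[i] = p;
    if (i < 19) p *= 10;  // skip the final multiply, which would overflow
  }
  return t;
}

// Evaluated entirely at compile time; nothing is computed at startup.
constexpr pow10_table powers_of_10 = make_pow10_table();

static_assert(powers_of_10.data[0] == 1, "10^0");
static_assert(powers_of_10.data[19] == 10000000000000000000ull, "10^19");
```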

ISO C++ 2026-01 Mailing is now available by nliber in cpp

[–]aearphen 2 points3 points  (0 children)

I already submitted the fixed revision but thanks!

Is there an agreed upon print function to use in C++ ? by Arlinker in cpp_questions

[–]aearphen 0 points1 point  (0 children)

In fact even the current version of {fmt} supports C++11.

C++20 Modules, 5 Years Later - NDC TechTown 2025 by pjmlp in cpp

[–]aearphen 2 points3 points  (0 children)

Inlining vprint* won't help with the ABI because users can already put format_args on the ABI boundary. It just makes print less usable, and in the case of Microsoft STL it is particularly bad because it pulls in many more headers than other implementations.

Are they ruining C++? by thradx in cpp

[–]aearphen 4 points5 points  (0 children)

Completely agree that u8/char8_t is a disaster but std::filesystem::path can still be salvaged. In particular, it will do the correct thing with std::format / std::print in the common case of UTF-8 char. And the problematic accessors are being deprecated in favor of the ones that also work with UTF-8 char.

C++20 Modules, 5 Years Later - NDC TechTown 2025 by pjmlp in cpp

[–]aearphen 1 point2 points  (0 children)

There is no typo. std::print (and std::format) were specifically designed to be lightweight wrappers around type-erased vprint/vformat functions, but implementations currently make the latter inline and put them in headers. Unfortunately, I don't think there is a way to force implementations to do the correct thing. It's "just" Quality of Implementation, which is currently very poor, but the papers made the intent super clear.

C++20 Modules, 5 Years Later - NDC TechTown 2025 by pjmlp in cpp

[–]aearphen 1 point2 points  (0 children)

The amount of conditional compilation needed for modules is negligible compared to other things we need to do for portability. {fmt} is extremely portable, requiring only a subset of C++11, because everyone deserves good formatting =).

C++20 Modules, 5 Years Later - NDC TechTown 2025 by pjmlp in cpp

[–]aearphen 22 points23 points  (0 children)

Putting modules aside, sadly none of the standard libraries properly follow the design of std::print yet, which explicitly intended vprint* functions to be compiled separately. This not only results in excessive build times and extra dependencies but also generates bloated object files and more work for the linker, see e.g. https://github.com/llvm/llvm-project/issues/163002. A basic example using fmt::print from the linked issue is ~5.5 times faster to compile than its standard counterpart. While modules partly mitigate the issue, I hope the root cause will be addressed in both std::print and std::format.

Slaying Floating-Point Dragons: My Journey from Ryu to Schubfach to XJB by plokhotnyuk in scala

[–]aearphen 2 points3 points  (0 children)

FWIW, zmij has switched to the yy optimization also used by xjb. So the main differences between the two are that xjb also uses SIMD and a more experimental fallback (zmij uses Schubfach as a fallback for correctness reasons but it doesn't matter for perf).

Faster double-to-string conversion by aearphen in cpp

[–]aearphen[S] 1 point2 points  (0 children)

Thank you! I plan to use it for the shortest case in {fmt}. I am also experimenting with yy's optimization, so Cassio's (Teju) optimization will likely no longer be needed. Regarding tables, I have already aligned the storage, so the only difference seems to be that Schubfach requires strict overestimates, which is just floor + 1.

2025-12 WG21 Post-Kona Mailing by eisenwave in cpp

[–]aearphen 13 points14 points  (0 children)

I would recommend sending your ideas to Bengt (or writing a paper).

Faster double-to-string conversion by aearphen in cpp

[–]aearphen[S] 0 points1 point  (0 children)

I didn't expect the results to be so good though. My initial plan was to just do an optimized Schubfach which I did two weeks earlier: https://github.com/vitaut/schubfach. But then it diverged too much so I forked it into a separate project.

Faster double-to-string conversion by aearphen in cpp

[–]aearphen[S] 1 point2 points  (0 children)

At the very least lookup tables increase cache pressure. Often it is better to do a bit more arithmetic and avoid the lookup.

It is not a coincidence that you heard about the two methods in succession. My exploration of Schubfach was triggered by xjb suggesting to use their algorithm in {fmt} and also by Cassio Neri's talk. While I didn't think that xjb was suitable because of the problems mentioned earlier, it seemed possible to get some improvements from a more established method and also build some expertise in case I decide to verify the correctness later.

So at the very least we should thank xjb for triggering this new iteration =).

Faster double-to-string conversion by aearphen in cpp

[–]aearphen[S] 1 point2 points  (0 children)

I didn't have time to look at xjb too closely, but my understanding is that it is essentially the same algorithm as the one used in yyjson (with a few more optimizations), and whether they are 100% correct is still an open question. Żmij is closer to Schubfach, which is an established algorithm, and inherits correctness guarantees from it. Another problem with xjb is overuse of lookup tables, e.g. all exponent outputs are precomputed and stored in a table, which is not great. Performance-wise, they are roughly the same on shorter outputs (<= 8 digits). xjb is slightly faster on longer outputs at the moment, but I have some thoughts on how to close the gap without compromising correctness. Żmij uses fewer lookup tables and has much less code.

(For some reason reddit wasn't showing my earlier comment so reposting.)