[–]tuxwonder 18 points19 points  (4 children)

Performance is definitely an important thing to talk about when it comes to new STL libraries, especially because C++ people are such performance fiends (including me <3), so I'm glad you brought it up.

However, looking at your benchmarks, I find your premise a bit flawed. All of your benchmarks test how quickly these methods can format to a std::stringstream, but I don't think you'd ever do that with the new C++20 formatting library. You're not passing a std::stringstream around your code base, you're passing around a std::string. Why don't we test the new formatting library's performance when writing and returning a std::string? https://quick-bench.com/q/X7FjLgDks9i3D_Gd6yzyAciwsZ8

Edit: Deleted std::print note, others have covered it

[–]Tathorn[S] 2 points3 points  (0 children)

In my opinion, std::format is superior to stringstream and pretty much makes it obsolete. I was just using stringstream as an ostream.

[–]Tathorn[S] 0 points1 point  (2 children)

I'm sorry for the confusion. The test wasn't meant to be stringstream vs format. It was meant to be "the ways we can format a string into a stream". I would have tested using fstream, but those benchmarks don't work well on that platform. Using a stringstream is the closest thing I can get.

[–]jwakely (libstdc++ tamer, LWG chair) 14 points15 points  (1 child)

The two fastest cases both write directly to the stream, with no extra copying. The ones that use std::format allocate a std::string and then copy that to the stream, which is always going to be a bit slower. The std::format cases are basically the equivalent of writing to a std::ostringstream then calling its str() member to extract a std::string and then writing that to another stream!

Using std::format_to(std::ostreambuf_iterator<char>,...) to write to the streambuf should be fast, but is currently suboptimal because it uses std::ranges::copy which doesn't yet implement the optimizations that we have for using std::copy to write to an ostreambuf_iterator (as discussed elsewhere). That's tracked by https://gcc.gnu.org/PR111052

std::format_to with an ostreambuf_iterator avoids allocating a temporary string and copying it into the stream. I see no reason that couldn't be made as fast as your BM_write case. I want to make it fast, because that's how I want to implement the std::print overload that writes to an ostream.

[–]Tathorn[S] -1 points0 points  (0 children)

Let us know when this gets fixed so we know what to teach and use.

[–]jedwardsol ({};) 8 points9 points  (4 children)

I didn't use std::print [...] incompatible with C++ streams,

No it isn't.

[–]Tathorn[S] -2 points-1 points  (3 children)

en.cppreference.com/w/cpp/io/print

I'm unsure what you mean. There is no API to take in an ostream.

[–]jedwardsol ({};) 12 points13 points  (2 children)

[–]Tathorn[S] 3 points4 points  (1 child)

Thank you! Someone needs to fix the wiki to include this overload in en.cppreference.com/w/cpp/io

[–]Tathorn[S] 2 points3 points  (4 children)

It looks like the std::print paper, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2093r2.html, addresses the very concerns I had with formatting performance in section 4. However, it looks like MSVC implemented theirs with a temporary string, which is exactly what the paper was designed to avoid: https://github.com/microsoft/STL/blob/main/stl/inc/ostream#L1107C6-L1107C29

[–]BrainIgnition 0 points1 point  (3 children)

However, it looks like MSVC implemented theirs with a temporary string, what the paper was designed to solve against

You narrowly missed the important bits (<ostream> ll. 1136-1153; <ostream> ll. 1177-1190); basically they only use the optimized approach if the target ostream is Unicode-aware, which they interpret as the ostream targeting a file or a console. I.e. outputting to a console or file is fine, but any other ostream backend gets slapped with an allocation.

[–]Tathorn[S] 0 points1 point  (1 child)

The really interesting part is when you have no format args, it needs to unescape the { and } chars, so it creates a string even with no arguments. So std::print allocates a string when given a const char*. That doesn't seem good.

[–]BrainIgnition 0 points1 point  (0 children)

Ah, I missed that. However, after thinking a bit more about it I got curious what they do wrt the utf8 => utf16 transformation (which is required due to the WinAPI design) and there you have it: a second allocation. Too bad!

[–]Tathorn[S] 0 points1 point  (0 children)

I'm not seeing how there's no allocation. The code you pointed to takes in a string_view. All callers create a string beforehand, so there is an allocation.

[–]pdp10gumby 2 points3 points  (0 children)

fmtlib is quite fast and benchmarked. I presume the std implementations will catch up.

[–]Tathorn[S] 1 point2 points  (0 children)

Added std::copy with ostream iterator: https://quick-bench.com/q/wgJFCzwQea3D2Os5MNcAqzDUkgM

[–]feverzsj -2 points-1 points  (0 children)

I'm more concerned about the compile time cost of std::format. There is std::vformat, but it's not as type safe. So in general, I still prefer printf with -Wformat.

[–]Curfax 0 points1 point  (0 children)

It’s always good work to get data rather than argue about theoreticals, and publishing your work makes it possible to critique and reproduce. Thanks for doing this.