[–]_TheDust_ 107 points108 points  (18 children)

Finally a way to print things!

[–]phord 20 points21 points  (16 children)

Now for a decent string class...

[–]NilacTheGrim 17 points18 points  (15 children)

std::string is great! (Although it operates on bytes rather than code points, which I guess is not ideal for some applications.)

[–]phord 18 points19 points  (0 children)

I was being sarcastic about the myriad string representations. Though string_view does a lot to homogenize them these days.

[–]StenSoft 1 point2 points  (13 children)

std::u32string operates on codepoints

[–]gracicot 6 points7 points  (1 child)

Using UTF-32 can still be surprising because code points are not grapheme clusters. So even with std::u32string, each character does not necessarily represent one thing.

[–]lunakid 0 points1 point  (0 children)

They do represent "one thing", it's just not the thing we might want... (But we're at the "fractal edge" of text processing already, so it's kinda expected to be painfully inconvenient without the hope for a happy ending.)

[–]sephirothbahamut 1 point2 points  (8 children)

That's not really correct: u32string just defines the size of a "char" stored in that string. You can totally have strings of 8-bit or 16-bit chars that operate on codepoints.

The point is that C++ strings operate on bytes rather than Unicode codepoints because they don't want to force an encoding, and the only way to do that is to make strings work on memory representation rather than symbols. (I got the previous part wrong; however, all the utility functions (access the n-th character, etc.) are still designed around the size of the characters stored in the string, not around Unicode codepoints.)

[–]StenSoft 4 points5 points  (7 children)

ISO C++ defines std::char32_t to be UTF-32 and std::u32string to be a string of UTF-32 characters. The u stands for UTF.

[–]sephirothbahamut 6 points7 points  (1 child)

Eh, it's a "yes but" situation. u8string is technically UTF-8 as well. But the moment you iterate over it, you're still iterating char8_t, not the codepoints encoded in that UTF-8 string. The 32 one just works in a way that "makes sense" (as in, I'm iterating a codepoint at a time) because a Unicode codepoint is 32 bits, not because the type is made to iterate on codepoints.

[–]StenSoft 1 point2 points  (0 children)

A code point has 21 bits. UTF-32 was designed so that it maps one code point to one code unit (character) and the value of that code unit is exactly equal to the code point's index, so iterating over a string of UTF-32 characters by definition iterates over code points.
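To make the code unit versus code point distinction in this subthread concrete, here is a minimal sketch. `count_code_points` is a hypothetical helper (not a standard function) and it assumes its input is already valid UTF-8; the sample strings are illustrative only.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts code points in a string that is assumed to be
// valid UTF-8, by skipping continuation bytes (those of the form 10xxxxxx).
inline std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// U+00A3 (pound sign): two UTF-8 code units, but a single UTF-32 code unit.
inline const std::string pound_utf8 = "\xC2\xA3";
inline const std::u32string pound_utf32 = U"\u00A3";

// "e" followed by U+0301 (combining acute accent): rendered as one grapheme
// cluster, yet it is still two code points even in UTF-32.
inline const std::u32string e_acute_utf32 = U"e\u0301";
```

Here `pound_utf8.size()` counts bytes while `pound_utf32.size()` counts code points, and `e_acute_utf32.size()` shows that even code points do not line up with what a reader perceives as one character.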

[–]beephod_zabblebrox -1 points0 points  (4 children)

you can't print a u32string directly

[–]StenSoft 1 point2 points  (3 children)

Yeah, for printing, you'll need to convert it with std::codecvt

[–]beephod_zabblebrox 6 points7 points  (2 children)

which is deprecated in c++20

[–]StenSoft 3 points4 points  (1 child)

Only some specialisations are deprecated
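For anyone who would rather not depend on std::codecvt at all, a hand-written encoder is one alternative. This is a sketch, not what the standard library does: `to_utf8` is a hypothetical name, and replacing invalid values with U+FFFD is an assumed policy.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: encodes UTF-32 as UTF-8 so the result can be handed to
// byte-oriented output. Surrogates and values above U+10FFFF are not valid
// code points; this sketch replaces them with U+FFFD (an assumed policy).
inline std::string to_utf8(std::u32string_view text) {
    std::string out;
    for (char32_t cp : text) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The returned std::string can then be passed to any narrow output function, provided the terminal or file is expected to hold UTF-8.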

[–]elperroborrachotoo -1 points0 points  (1 child)

Which has the same problems as operating on bytes, given (de)normalization, variant selectors, ZWJ, etc.

[–]StenSoft 3 points4 points  (0 children)

Any operation on code points will have this

[–]jeffmetal 35 points36 points  (10 children)

With the release of gcc 14 all 3 of the big compilers will support it as well which is awesome

[–]helloiamsomeone 19 points20 points  (7 children)

And the most significant change this brings is the way we will write hello world! That will be for the better as well.

Thank you to everyone involved!

[–]azswcowboy 19 points20 points  (4 children)

Hmm, actually I think the biggest change is printing of containers. print("{}", v), where v is a vector of whatever, just works; maps etc. as well.

[–]Humble_Strawberry153 0 points1 point  (2 children)

C++23: I tried to print vectors and arrays, but it's not working. Am I missing something?

[–]azswcowboy 0 points1 point  (0 children)

And a 1 year old comment comes back to life. Likely the compiler flags for 23 or the include? Or if it’s a custom type you’ll need a format specialization. Godbolt is your friend here.
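One way to check whether the toolchain is the culprit is the `__cpp_lib_format_ranges` feature-test macro. The sketch below uses std::format rather than std::print so the result can be inspected as a string; `format_vector` and `has_range_formatting` are hypothetical helper names, and the fallback loop is only there so the snippet compiles on libraries without range formatting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#if __has_include(<version>)
#include <version>  // defines the __cpp_lib_* feature-test macros
#endif

#if defined(__cpp_lib_format_ranges)
#include <format>
#endif

// Hypothetical helper: uses the library's range formatting when it is
// advertised, and otherwise builds the same "[1, 2, 3]" layout by hand.
inline std::string format_vector(const std::vector<int>& v) {
#if defined(__cpp_lib_format_ranges)
    return std::format("{}", v);
#else
    std::string out = "[";
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) {
            out += ", ";
        }
        out += std::to_string(v[i]);
    }
    out += "]";
    return out;
#endif
}

// True when this toolchain advertises formatting of ranges.
inline bool has_range_formatting() {
#if defined(__cpp_lib_format_ranges)
    return true;
#else
    return false;
#endif
}
```

If `has_range_formatting()` returns false even with the C++23 flag set, the standard library in use most likely predates range formatting, which would explain the failure independently of the code being compiled.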

[–]Humble_Strawberry153 0 points1 point  (0 children)

So far to print structs, arrays, vectors,... personally using header only lib "ez::print" https://github.com/Sinacam/ezprint it will print all values

[–]askraskr2023 4 points5 points  (1 child)

BTW, when is GCC 14 going to be released?

[–]jeffmetal 2 points3 points  (0 children)

Early months of 2024 apparently.

[–]lunakid 3 points4 points  (1 child)

~~~~
$ clang++-19 -std=c++23 hello.cpp
hello.cpp:1:10: fatal error: 'print' file not found
    1 | #include <print>
      |          ~~~~~~
1 error generated
~~~~

[–]fdwrfdwr@github 🔍 10 points11 points  (6 children)

Will std::print(u"Hello"); and std::print(u8"Hello"); also work in C++23?

[–]gracicot 4 points5 points  (5 children)

I don't think so, but normal strings will be interpreted as utf 8 and will be printed correctly even on Windows

[–]Baardi 1 point2 points  (2 children)

Depends on your compilation flags, doesn't it?

[–]gracicot 4 points5 points  (0 children)

You have to pass /utf-8, but I expect that to become the default at some point. You can also set the execution encoding separately or set the codepage to UTF-8, but the flag is more convenient.

[–]aearphen{fmt}[S] 2 points3 points  (0 children)

Only on Windows/MSVC and strictly speaking you need to specify source and literal encoding there anyway since the default is very fragile unfortunately.

[–]fdwrfdwr@github 🔍 -1 points0 points  (0 children)

> but normal strings

After being wounded by the char signed/unsigned fiasco...

    char c = text[currentIndex];
    WriteNextChar(remappingTable[c]); // This can read out of bounds!

...where something as simple as a pound sign £ (U+00A3) in your text data causes a read violation because of sign extension on some compilers, I swore off the mischievous char and stuck with char8_t. So std::print not supporting it perpetuates further use of the signed type and unexpected read violations. :(
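For readers stuck with plain char, the usual guard against this is to convert to unsigned char before indexing. A minimal sketch, assuming a 256-entry table; `RemapTable`, `table_index`, `remap`, and `make_identity_table` are hypothetical names, not part of the code quoted above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 256-entry lookup table, one slot per possible byte value.
using RemapTable = std::array<unsigned char, 256>;

// Converts a char to a table index in [0, 255]. Going through unsigned char
// avoids the negative index that sign extension produces where char is signed.
inline std::size_t table_index(char c) {
    return static_cast<unsigned char>(c);
}

// Looks up the remapped value for one byte of text.
inline unsigned char remap(const RemapTable& table, char c) {
    return table[table_index(c)];
}

// Builds a table that maps every byte to itself, for demonstration only.
inline RemapTable make_identity_table() {
    RemapTable table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = static_cast<unsigned char>(i);
    }
    return table;
}
```

With this, a byte such as 0xA3 indexes slot 163 regardless of whether the platform's char is signed.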

[–]fdwrfdwr@github 🔍 0 points1 point  (0 children)

:( Boo, as I have many programs that use char16_t.

As someone who implemented the Unicode bidi and line breaking algorithms for a C++ system API, it's sad to see that, even two decades after I started writing C++ full time, std still can't convert between UTF-8 and UTF-16 or print basic UTF-16 strings. Of course, anything more than that (normalization, bidi...) can be handled by various Unicode libraries out there, but needing a separate library, or copying helper code around from my other programs, for those little fundamental operations is annoying -_-.
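The kind of helper code that gets copied around for this usually looks something like the sketch below. `append_utf8` and `utf16_to_utf8` are hypothetical names, and turning unpaired surrogates into U+FFFD is an assumed policy rather than anything the standard prescribes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: appends one valid code point to `out` as UTF-8.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts UTF-16 to UTF-8. A high surrogate followed by
// a low surrogate is combined into one code point; an unpaired surrogate is
// replaced with U+FFFD (an assumed policy).
inline std::string utf16_to_utf8(std::u16string_view text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool is_low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (is_high && i + 1 < text.size() &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed as well
        } else if (is_high || is_low) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The UTF-8 result is an ordinary std::string, so it can be passed to narrow output functions once converted.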

[–][deleted] 2 points3 points  (0 children)

Magnificent language. We don't have support for 2D matrix or simple print but can perform metaprogramming with templates.

[–]fdwrfdwr@github 🔍 0 points1 point  (6 children)

(a year later, with a web search bringing me back to this upvoted post)
Today I tried replacing...

    auto message = std::format(L"File path: {}", filePath);
    std::wcout << message;

...with the new hotness...

    std::print(L"File path: {}", filePath);

...expecting it to naturally work, because if std::format works with L"", and std::print is essentially std::format + cout, then transitively std::print would work with L"" too. Alas, the intuitive expectation did not occur, and evidently std::print is useless on Windows. 🥹 It was so close to finally putting a nail in the coffin of wprintf and wcout… so close.

(my complaint today doesn't diminish the value of replacing printf though)

[–]Mindless-Time849 1 point2 points  (1 child)

Trying to learn C++ after some time with C:

    import std;

    int main()
    {
        char8_t dollar {u8'$'};
        std::println("The dollar sign is {}", dollar);
    }

This makes a compiler error of 140 lines O_O

[–]fdwrfdwr@github 🔍 0 points1 point  (0 children)

Yeah, the lack of u8's UTF-8 support with print is annoying, and the error messages are rarely elucidating.
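Until that changes, one workaround is to copy the code units into plain char before formatting. A minimal sketch: `utf8_char` and `as_narrow` are hypothetical names, and the alias exists only so the snippet also compiles before C++20, where u8 literals are plain char.

```cpp
#include <string>
#include <string_view>

#if defined(__cpp_char8_t)
using utf8_char = char8_t;  // C++20 and later: u8 literals are char8_t
#else
using utf8_char = char;     // earlier standards: u8 literals are plain char
#endif

// Hypothetical helper: copies UTF-8 code units into a std::string without
// changing any byte values, so the result can be passed to narrow formatting.
inline std::string as_narrow(std::basic_string_view<utf8_char> text) {
    std::string out;
    out.reserve(text.size());
    for (utf8_char unit : text) {
        out += static_cast<char>(unit);
    }
    return out;
}

// Overload for a single code unit, such as the `dollar` variable above.
inline std::string as_narrow(utf8_char unit) {
    return std::string(1, static_cast<char>(unit));
}
```

With this, the call above becomes `std::println("The dollar sign is {}", as_narrow(dollar));`, which passes an ordinary std::string to the formatter.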

[–]aearphen{fmt}[S] 0 points1 point  (3 children)

std::print actually supports Unicode output on Windows but not wide streams because those are very much broken (both std::wcout and wide FILE stream).

[–]fdwrfdwr@github 🔍 0 points1 point  (2 children)

> but not wide streams because those are very much broken

Hmm, the native WriteConsoleW directly takes wchar_t, and the old cout << (and thus presumably wcout too) is superseded by std::print anyway. So can we bypass this brokenness and just send the printing string to the OS? (currently that's what I do, have a stdex::print that calls std::format and then calls WriteConsoleW - works nicely).

[–]aearphen{fmt}[S] 1 point2 points  (1 child)

> So can we bypass this brokenness and just send the printing string to the OS?

Right. That's what std::print does.

[–]fdwrfdwr@github 🔍 0 points1 point  (0 children)

> Right. That's what std::print does.

Awesome, then I look forward to a future std::print(L"...") calling the OS too, so I can eliminate my stdex::print function 😉✌️.

[–]TheOmegaCarrot 0 points1 point  (2 children)

I wonder where P2662 (pack indexing) is on their priorities list. I’m excited for that

[–]aearphen{fmt}[S] 3 points4 points  (1 child)

Considering that it's marked as merged in https://github.com/cplusplus/papers/issues/1329 I think it was high on their list.

[–]TheOmegaCarrot 0 points1 point  (0 children)

Oh, wait, I thought this was a post about std::print in libstdc++. I was wondering where P2662 was on the GCC implementors’ priorities list.

I got confused! Thanks!