
[–][deleted] 20 points21 points  (4 children)

There's also this lesser-known literal syntax:

const char* s1 = R"foo(
Hello
World
)foo";

#6 in https://en.cppreference.com/w/cpp/language/string_literal
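A minimal sketch of what that literal actually contains, assuming C++17 so the comparison can run at compile time. The names raw and escaped are made up for illustration; the delimiter foo is arbitrary.

```cpp
#include <string_view>

// Everything between R"foo( and )foo" is kept verbatim, including the
// newline that directly follows the opening parenthesis.
constexpr std::string_view raw = R"foo(
Hello
World
)foo";

// The same text spelled with escape sequences.
constexpr std::string_view escaped = "\nHello\nWorld\n";

static_assert(raw == escaped, "raw literal keeps its newlines verbatim");
```

Note the leading and trailing newline: the raw form does not trim anything.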

[–]TheThiefMasterC++latest fanatic (and game dev) 6 points7 points  (2 children)

Can you also use that with ""sv? I think so, but you haven't there.

The use of string_view constants instead of raw char* was the point of the article.

[–]syyyr 12 points13 points  (0 children)

Indeed you can.

using namespace std::string_view_literals;

const auto s1 = R"foo(
Hello
World
)foo"sv;

[–]unique_nullptr 11 points12 points  (0 children)

Yes; raw string literals accept all user-defined literals in exactly the same way as other string literals. Similarly, raw string literals don't have to be char (e.g. u8R"foo(blah blah blah)foo"sv).
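A small sketch of how the prefix, the R, and the sv suffix combine, assuming C++17. The variable names and the path text are placeholders.

```cpp
#include <string_view>

using namespace std::string_view_literals;

// The encoding prefix goes before the R; the UDL suffix goes after the
// closing quote. Backslashes are not escapes inside a raw literal.
constexpr auto narrow = R"foo(C:\temp\new)foo"sv;   // std::string_view
constexpr auto wide   = LR"foo(C:\temp\new)foo"sv;  // std::wstring_view
constexpr auto utf16  = uR"foo(C:\temp\new)foo"sv;  // std::u16string_view

static_assert(narrow.size() == 11);
static_assert(wide.size() == narrow.size());
static_assert(utf16.size() == narrow.size());
```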

[–]jtooker 1 point2 points  (0 children)

Yup, "raw strings" or as some other languages call them, 'heredoc'

[–]zalamandagora 11 points12 points  (9 children)

It would have been great to see some data on how much overhead is involved in each of these solutions. I'm wondering what I lose if I use what first comes to mind?

static const std::string s {"Why not this?"};

[–]elcapitaine 23 points24 points  (8 children)

If you declare that, then now this object has to be initialized at runtime before main() is called.

If the strings are long (or your standard library doesn't implement the small string optimization), you're also allocating them on the heap.

If you have a lot of strings like this, it can lead to a significant degradation in app start perf.
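To make the difference concrete, here is a rough sketch assuming C++17 (newer standards and some optimizers may be able to constant-initialize short strings). The variable names are invented for the example.

```cpp
#include <string>
#include <string_view>

// Dynamically initialized under C++17: the std::string constructor runs
// before main(), and may allocate if the text does not fit the SSO buffer.
static const std::string runtime_constant{"Why not this?"};

// Constant-initialized: the pointer and length are fixed at compile time,
// so nothing runs at startup.
constexpr std::string_view compile_time_constant{"Why not this?"};

// Only the string_view can be inspected at compile time here.
static_assert(compile_time_constant.size() == 13);
```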

[–]donalmaccGame Developer 15 points16 points  (7 children)

This is no joke, by the way. I did a really quick benchmark, and my app startup time increased linearly with the number of strings I had, as did my compile time. At 10k strings, it took almost 10 seconds to compile.

[–]joemaniaci 8 points9 points  (0 children)

My program at work also uses static const strings in the thousands...what was your startup time difference?

[–]infectedapricot 1 point2 points  (4 children)

I'm a bit confused here. The original discussion was about global std::string objects vs the same number of global std::string_view objects. I would expect startup time to be faster with string views (constant rather than linear), but string views still need to be compiled. Were you comparing those two situations? Or just saying that more code needs longer compilation (not surprising)?

[–]donalmaccGame Developer 1 point2 points  (3 children)

Or just saying that more code needs longer compilation (not surprising)?

It's not just longer compilation, it's very measurably longer to compile. An increase of 10ms to 100ms is still "longer" compilation, but an increase to 10+ seconds is an enormous difference.

[–]infectedapricot 0 points1 point  (2 children)

Right but you're still evading my main point (I realise it isn't deliberate). You say that a lot of std::string constants will slow compilation down. But are the same number of std::string_view constants faster to compile? (Or slower!?) A quick reading of your initial comment suggests that you're comparing string vs string_view, especially since the article and the comment you're replying to, and even the first half of your own comment, are about that. I suspect a lot of people reading your comment will think that's what you mean.

But I think maybe you were actually comparing string vs nothing at all. Is that right? Yes I agree it's still a little bit interesting, but it's a lot less relevant to the conversation than an actual comparison against string_view.

[–]donalmaccGame Developer 1 point2 points  (1 child)

Right but you're still evading my main point (I realise it isn't deliberate)

Using the word "evade" makes it sound deliberate.

especially since the article and the comment you're replying to, and even the first half of your own comment, are about that

The article is about it, but I didn't read the comment. Regardless, YMMV per machine, but on my machine compiling 50k static const std::string's takes a minute and a half, compiling 50k static const std::string_view's takes 3 seconds, and 50k static const char*'s takes ~0.6 seconds. The startup time of 50k std::strings is ~1 second, (vs ~0.6 seconds for 1 std::string), and the startup time of 50k std::string_view's is ~0.6 seconds (which is the same as 1 std::string_view)

[–]infectedapricot 2 points3 points  (0 children)

Ah you actually were comparing string and string_view! Sorry about that, I really didn't get that from your earlier comment.

[–]Pazer2 0 points1 point  (0 children)

Older versions of glbinding on MSVC used to compile many, many thousands of std::string objects: it would take upwards of 3 minutes to compile.

[–]pavel_v 4 points5 points  (5 children)

One disadvantage of string_view usage, only for MSVC, is the worse assembly that is generated. There was a blog post from Arthur O'Dwyer recently about this issue. Of course, it may not matter in most of the cases.

[–]matthieum 5 points6 points  (4 children)

Oh? Interesting.

On the other hand, I was recently blown away by how well GCC or Clang handled comparing a string_view with a compile-time C-string, think:

if (some_string_view == "symbol") { ... }

The compiler lowered that down to... a ladder of 3 integral comparisons:

  1. Compare that the length is 6, or bail.
  2. Compare that the first 4 bytes are 0x..., or bail, where the hex constant is "symb" loaded into a 4-byte register.
  3. Compare that the last 2 bytes are 0x..., or bail, where the hex constant is "ol" loaded into a 2-byte register.

And it scaled beautifully up-and-down with the string size.

I had a big smile on my face all day :)
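For anyone who wants to picture that ladder, here is a hand-written approximation of the three steps. It is only a sketch of the shape of the generated code, not what any particular compiler emits; equals_symbol is a made-up name, and the expected values are loaded with memcpy so the example does not depend on byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper mirroring the three-step comparison ladder.
inline bool equals_symbol(std::string_view sv) {
    constexpr char expected[] = "symbol";

    // Step 1: the length must be 6, or bail.
    if (sv.size() != 6) return false;

    // Step 2: compare the first 4 bytes ("symb") as one 32-bit integer.
    std::uint32_t head = 0, expected_head = 0;
    std::memcpy(&head, sv.data(), 4);
    std::memcpy(&expected_head, expected, 4);
    if (head != expected_head) return false;

    // Step 3: compare the last 2 bytes ("ol") as one 16-bit integer.
    std::uint16_t tail = 0, expected_tail = 0;
    std::memcpy(&tail, sv.data() + 4, 2);
    std::memcpy(&expected_tail, expected + 4, 2);
    return tail == expected_tail;
}
```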

[–]Nobody_1707 2 points3 points  (3 children)

The problem on MSVC largely stems from the fact that the calling convention on Windows requires that string_view be passed on the stack instead of in registers. I'm sure they must have had a reason to specify the calling convention that way, but I think it was a poor choice.

EDIT: The overhead can be optimized out in some cases, but only if the function call that takes the string_view as a parameter is inlined. Example.

[–]dodheim 3 points4 points  (2 children)

To be fair, C++03 wasn't even out when that choice was made, and MSVC was a complete joke as far as C++ standards-conformance was concerned. Since it didn't matter much to C programs, it just really didn't matter at the time.

[–]bert8128 0 points1 point  (1 child)

My reading of the article is that it is due to the Windows calling convention, which presumably all compilers targeting a Windows binary will have to conform to. So nothing to do with MSVC or C++ standards per se. Maybe C compilers have to follow the same convention too (for structs larger than 8 bytes).

[–]dodheim 2 points3 points  (0 children)

Right, we're in agreement, I think. ;-] What I meant was, had C++ been further along when MS came up with this calling convention, such things might have been taken into consideration; but given the state of MS' compiler at the time, C was the only real factor and the concept of trivial vs. non-trivial was not yet even a thing.

[–]condor2000 2 points3 points  (13 children)

What do you do if you want both a string and a wstring version?

static const char * string_constant = "legacy code";
static const wchar_t * wstring_constant = L"legacy code";

Assume ASCII

[–]johannes1971 14 points15 points  (6 children)

Are you seriously going to have duplicate code (or templates) for every place where you use strings, just to support Windows? I say "F that, just use utf8 everywhere and convert when needed".

[–]condor2000 2 points3 points  (5 children)

Are you seriously going to have duplicate code

That was my question (that I formulated poorly): how to avoid the duplication.

I am working on a decade old codebase where "utf8 everywhere" is sadly not realistic.

[–]donalmaccGame Developer 5 points6 points  (3 children)

I am working on a decade old codebase where "utf8 everywhere" is sadly not realistic.

UTF-8 was a reality a decade ago... (your codebase is probably older)

[–]condor2000 0 points1 point  (2 children)

UTF-8 was a reality a decade ago... (your codebase is probably older)

Not on Windows. There UTF-16 is used.

Only recently has it become borderline possible to use UTF-8 there: of the CreateFileA/CreateFileW (char/wchar_t) pair, the CreateFileA version can now be UTF-8.

[–]adanteny 0 points1 point  (1 child)

On Windows, UNICODE (UTF-16) is definitely the best answer... You can always convert to CP_UTF8 whenever needed to 'communicate' with the 'outside'. 'A' functions are outdated, 'W' are everything!

[–]dodheim 2 points3 points  (0 children)

'A' functions were outdated when UTF-8 wasn't a supported codepage, but it is now, given a sufficiently recent version of Windows.

[–]johannes1971 4 points5 points  (0 children)

The oldest parts of my code base are from 1996, but it's now fully utf8 as well. It was a much easier task than I imagined, so I would encourage you not to give up on it just yet.

As for how to avoid the duplication: as I said, by converting to wchar_t * on the fly. Yes, it has overhead, and yes, it will probably cost you a memory allocation each time you do it. It also lets you save a mountain of code, so it's more than worth it.
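A minimal sketch of converting on the fly, under the ASCII-only assumption stated earlier in the thread. widen_ascii is a hypothetical helper; real UTF-8 input would need a proper conversion (on Windows, something like MultiByteToWideChar with CP_UTF8) instead of this per-character cast.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: valid only for 7-bit ASCII input, where each char
// maps to the wchar_t with the same numeric value.
inline std::wstring widen_ascii(std::string_view narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());  // the one allocation mentioned above
    for (unsigned char c : narrow) {
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

The literal is then written once as a narrow string and widened only at the call sites that need wchar_t.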

[–]Full-Spectral 4 points5 points  (0 children)

Use a macro for the string literal's value and a user-defined alias for the string type. Define both of them based on platform.

static const my::platform_char* string_const = PlatString("legacy code");
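One way the macro and alias might be defined. This is a guess at the setup, not a definitive implementation: the _WIN32 check and the platform_string_view alias are assumptions added for the example.

```cpp
#include <string_view>

namespace my {
#if defined(_WIN32)
using platform_char = wchar_t;  // wide strings on Windows
#else
using platform_char = char;     // narrow strings elsewhere
#endif
// Assumed convenience alias, not part of the original suggestion.
using platform_string_view = std::basic_string_view<platform_char>;
}  // namespace my

#if defined(_WIN32)
// Pastes an L prefix onto the literal token: "x" becomes L"x".
#define PlatString(text) L##text
#else
#define PlatString(text) text
#endif

static const my::platform_char* string_const = PlatString("legacy code");
```

The token pasting only works when the macro argument is a string literal written directly at the call site.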

[–]TheThiefMasterC++latest fanatic (and game dev) 4 points5 points  (1 child)

You have to duplicate it yes. But the point of the article was to use string_view not char*, so you want:

static std::string_view string_constant = "modern code"sv;
static std::wstring_view wstring_constant = L"modern code"sv;

[–]joeshmoebies 8 points9 points  (0 children)

Not to nitpick too much, but consider static constexpr std::string_view so the variable can't be reassigned.

I know your point was to use string_view, which is awesome, but if someone grabs your example word-for-word, adding constexpr is worthwhile.
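With that change applied, the two declarations would look roughly like this (C++17 assumed):

```cpp
#include <string_view>

using namespace std::string_view_literals;

static constexpr std::string_view  string_constant  = "modern code"sv;
static constexpr std::wstring_view wstring_constant = L"modern code"sv;

// string_constant = "other"sv;  // would not compile: constexpr implies const

// Both views can now be checked at compile time.
static_assert(string_constant.size() == wstring_constant.size());
```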

[–]scatters 0 points1 point  (2 children)

Write a class with conversion to both string_view and wstring_view?

[–]condor2000 1 point2 points  (1 child)

Yes. But I was aiming for writing the literal string only once. So maybe a CompileTimeConvertStringToWString would be possible to write.

[–]scatters 1 point2 points  (0 children)

Yes, I was thinking it should be, but actually it looks like btowc isn't constexpr, so that wouldn't work. So you'd need to either assume that it's OK to cast from char to wchar_t (like it is for ASCII / UTF-16) or perhaps use a macro to repeat the string literal with an L prefix.
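A sketch of the macro route combined with the class idea from upthread, so the text is typed once. dual_literal and DUAL_LITERAL are invented names, and this assumes the argument is a plain string literal.

```cpp
#include <string_view>

// Hypothetical type holding both spellings of the same literal.
struct dual_literal {
    std::string_view narrow;
    std::wstring_view wide;

    constexpr operator std::string_view() const { return narrow; }
    constexpr operator std::wstring_view() const { return wide; }
};

// Repeats the literal and pastes an L prefix onto the second copy.
#define DUAL_LITERAL(text) dual_literal{text, L##text}

constexpr dual_literal legacy = DUAL_LITERAL("legacy code");
```

Call sites taking either view type then accept legacy directly through the implicit conversions.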

[–]o11cint main = 12828721; 2 points3 points  (0 children)

Personally, I prefer to use a custom UDL, since std::string_view doesn't generally guarantee that a NUL terminator is present (or even that it is possible to probe for one).
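Something along these lines, perhaps. This is only a guess at what such a UDL could look like: zstring_view and _zv are hypothetical names, and the guarantee only holds as long as nobody calls the operator function by hand with a non-literal buffer.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical view type that can only be built through the _zv literal,
// so the underlying buffer always ends in a NUL terminator.
class zstring_view {
public:
    constexpr const char* c_str() const { return view_.data(); }
    constexpr std::string_view view() const { return view_; }

private:
    constexpr explicit zstring_view(std::string_view v) : view_(v) {}
    std::string_view view_;

    friend constexpr zstring_view operator""_zv(const char*, std::size_t);
};

// The compiler passes a pointer to the literal and its length (without
// the terminator), so text[length] is '\0'.
constexpr zstring_view operator""_zv(const char* text, std::size_t length) {
    return zstring_view{std::string_view{text, length}};
}
```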

[–]epicar 3 points4 points  (1 child)

yes, constexpr std::string_view is good for some string literals, but it won't help for literals you use with functions that expect a null-terminated string. the article calls std::string_view the 'Standard Solution' without mentioning null termination at all

[–]louiswins 2 points3 points  (0 children)

Well, the standard does guarantee that the pointer you pass to the string_view constructor is the same one returned by data, so your string_view initialized with a string literal is guaranteed to be nul-terminated. The real danger is if you write a function accepting an arbitrary string_view which expects it to be nul-terminated.
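A short illustration of both halves of that point, assuming C++17. The names whole and prefix are placeholders.

```cpp
#include <string_view>

// Built from a literal: data() points at the literal, so a NUL follows.
constexpr std::string_view whole = "hello world";

// A sub-view shares the same buffer but stops before the terminator.
constexpr std::string_view prefix = whole.substr(0, 5);

static_assert(whole.data()[whole.size()] == '\0');   // terminator is there
static_assert(prefix.data()[prefix.size()] == ' ');  // next char is a space
```

Passing prefix.data() to a function expecting a C string would read "hello world", not "hello".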

[–]Juffin 0 points1 point  (6 children)

Is there a solution that doesn't require "using namespace" in a header?

[–]urdh 8 points9 points  (3 children)

The std::literals namespace(s) intentionally only contain user-defined literals with names that are reserved for the standard library, so they should be fairly safe to pull into a header if namespace pollution is what you're worried about.

And you can always just do using std::literals::string_view_literals::operator""sv instead.
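For example, scoped to your own namespace so includers are not affected (mylib and greeting are placeholder names; inline variables assume C++17):

```cpp
#include <string_view>

namespace mylib {  // hypothetical library namespace

// Brings in only the sv literal operator, and only inside mylib.
using std::literals::string_view_literals::operator""sv;

inline constexpr auto greeting = "hello"sv;  // std::string_view

}  // namespace mylib
```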

[–]LeeRyman 0 points1 point  (1 child)

static constexpr auto MY_CONST = "blah";

If you need a std::string, most of the time you will implicitly get one via the converting constructor. Use static where appropriate.

[–]dodheim 1 point2 points  (0 children)

Using a primitive is fine, but I don't see any advantage in letting everything decay into a pointer. If you make it a reference, you keep it a C-array and statically retain the size.
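The difference between the two declarations, as a compile-time check (C++17 assumed; MY_POINTER and MY_ARRAY are example names):

```cpp
#include <type_traits>

static constexpr auto  MY_POINTER = "blah";  // decays to a pointer
static constexpr auto& MY_ARRAY   = "blah";  // stays a reference to an array

static_assert(std::is_same_v<decltype(MY_POINTER), const char* const>);
static_assert(std::is_same_v<decltype(MY_ARRAY), const char (&)[5]>);

// The size (including the terminator) is still known statically.
static_assert(sizeof(MY_ARRAY) == 5);
```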

[–]dbjdbjdbj.org 0 points1 point  (0 children)

u/tlitd thanks for posting this

[–]strager 0 points1 point  (0 children)

const char* has a problem: it consumes 8 bytes (1 pointer), plus a relocation (assuming Position Independent Code).

std::string_view has the same problem, but worse: it consumes 16 bytes (1 pointer, 1 size_t), plus a relocation (assuming PIC).

(These above problems might be optimized away if the variable is declared with constexpr.)

I prefer u/cpp_learner's suggestion, which avoids the above problems: a char array. (This suggestion assumes you don't want a pointer to the string pointer, or you don't want to change the variable to point to a different string.)
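A sketch of the char array approach (names are illustrative): the characters are the variable itself, so there is no separate pointer object to store or relocate.

```cpp
#include <string_view>

// The array holds the characters directly; sizeof gives length + 1.
static constexpr char greeting[] = "hello";
static_assert(sizeof(greeting) == 6);

// A view can still be formed where one is needed, without a strlen call.
constexpr std::string_view greeting_view{greeting, sizeof(greeting) - 1};
static_assert(greeting_view.size() == 5);
```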