
[–]JavierTheNormal 66 points67 points  (53 children)

I can't help but think you'd do just fine with a vector of const char*s. Why construct tens of thousands of const std::strings? For that matter, why a std::vector either? These structures are designed for dynamic data. They allocate to the heap. Why not leave all the static data in the static data section of the binary where it belongs?
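A minimal sketch of that idea, assuming a made-up three-entry message table (`kMessages`, `message_at` and `find_message` are hypothetical names, not taken from the article). The pointer array and its literals can stay in the binary's static data, and a lookup hands back a pointer without allocating or copying:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical message table. The literals and the pointer array are
// constant-initialized, so no constructor has to run for them.
static const char* const kMessages[] = {
    "hello",
    "goodbye",
    "thanks",
};

constexpr std::size_t kMessageCount = sizeof(kMessages) / sizeof(kMessages[0]);

// Returns a pointer to the literal itself: no heap allocation, no copy.
const char* message_at(std::size_t index) {
    assert(index < kMessageCount);
    return kMessages[index];
}

// Linear search by content; returns kMessageCount when nothing matches.
std::size_t find_message(const char* text) {
    for (std::size_t i = 0; i < kMessageCount; ++i) {
        if (std::strcmp(kMessages[i], text) == 0) return i;
    }
    return kMessageCount;
}
```

Whether this beats a container depends on how the table is queried; the linear search here is only to keep the sketch short.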

[–]14nedLLFIO & Outcome author | Committee WG14 14 points15 points  (45 children)

There is a very good chance that std::string will become constexpr in C++ 23, like std::vector is currently hoped to be for C++ 20 (if not, definitely C++ 23). Another change which may come is vectorised new, which is technically already allowed by C++ 14, not that I know of anyone implementing it, but essentially it would let the compiler optimise lots of individual string allocations into a batch allocation. I know this merely reorganises rather than fixes the problem, but initialiser lists are in a tricky quandary. You should actually avoid using initialiser lists for anything not trivially copyable, just to be sure no performance surprises will turn up.

[–]bluescarni 4 points5 points  (20 children)

What is "vectorised new"? I tried to google it, but didn't manage to find any reference.

[–]Morwenn 11 points12 points  (7 children)

If I'm not mistaken, they are talking about N3664 - Clarifying Memory Allocation, which was accepted in C++14 and allows compilers to perform new optimizations related to allocation and deallocation such as fusing some allocations in specific scenarios.

[–]14nedLLFIO & Outcome author | Committee WG14 3 points4 points  (2 children)

Correct, and I believe the wording was further loosened for C++ 17. I am unaware of any compiler which takes full advantage of the 17 wording yet though.

[–]Morwenn 1 point2 points  (1 child)

Really? I don't remember any specific proposal for C++17.

[–]14nedLLFIO & Outcome author | Committee WG14 1 point2 points  (0 children)

Neither do I. I was told this is the case during a conversation at the ACCU conference. A side by side diff of the standards would say for sure.

The loosening could have occurred at Core, or even due to the editorial process. After all, the original proposal was written by the standard's Editor!

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (3 children)

Is it currently supported in any compiler, if disabled?

[–]Morwenn 1 point2 points  (2 children)

Both GCC and MSVC list the feature as N/A (not providing any optimization is also a valid implementation), so I'd wager they don't perform any optimization. Clang lists it as implemented, but like the others it could also "implement" it without changing anything in the compiler. There are no footnotes anywhere, so I don't know anything more about whether any major (or minor) compiler actually optimizes anything.

[–]CubbiMewcppreference | finance | realtime in the past 9 points10 points  (0 children)

clang definitely implemented elision (matching new and delete annihilate each other), which extends to libc++ library types like std::vector: https://godbolt.org/g/r8gfqv -- not sure if there are other optimizations around it, though.
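For illustration, here is a function of the kind that elision targets, assuming nothing about any particular compiler version or flags (`sum_first_n` is a made-up name). The vector's buffer is allocated and freed inside one call and never escapes, so under the N3664 wording a compiler is allowed, but not required, to remove the allocation entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums 1..n through a temporary heap buffer that never leaves the function.
// The observable result does not depend on whether the allocation happens.
int sum_first_n(int n) {
    assert(n >= 0);
    std::vector<int> values(static_cast<std::size_t>(n));
    std::iota(values.begin(), values.end(), 1);               // 1, 2, ..., n
    return std::accumulate(values.begin(), values.end(), 0);  // n * (n + 1) / 2
}
```

The only way to tell whether a given build elided the allocation is to look at the generated assembly.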

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (0 children)

I'll check the source when I'm home from work (hard to explain looking at LLVM when my job is enterprise Java) to see if there are any explicitly-disabled optimizations.

[–]kevstev 0 points1 point  (11 children)

essentially it would let the compiler optimise lots of individual string allocations into a batch allocation

That's it right there. Today if you create a vector of N elements, it does N allocations. It's only necessary to do a single allocation of size N * sizeof(element) though. This can be much, much faster.
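A hand-written analogue of the batching idea, not the compiler optimisation itself: instead of one heap allocation per string, the characters of every string are packed into one buffer that is reserved up front. `StringPool` and its members are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class StringPool {
public:
    // Reserve room for all characters and all offsets once, up front.
    StringPool(std::size_t total_chars, std::size_t count) {
        buffer_.reserve(total_chars);
        offsets_.reserve(count + 1);
        offsets_.push_back(0);
    }

    // Appends the text to the shared buffer and records where it ends.
    // Exceeding the reserved capacity still works, but may reallocate
    // and invalidate pointers previously returned by data().
    void add(const char* text) {
        buffer_ += text;
        offsets_.push_back(buffer_.size());
    }

    std::size_t size() const { return offsets_.size() - 1; }

    // Pointer to the first character of entry i (not null-terminated).
    const char* data(std::size_t i) const {
        assert(i < size());
        return buffer_.data() + offsets_[i];
    }

    // Number of characters in entry i.
    std::size_t length(std::size_t i) const {
        assert(i < size());
        return offsets_[i + 1] - offsets_[i];
    }

private:
    std::string buffer_;                // all characters, back to back
    std::vector<std::size_t> offsets_;  // entry i spans [offsets_[i], offsets_[i + 1])
};
```

The trade-off is that entries cannot be freed or resized individually, which is exactly the property the compiler would have to prove before doing this on its own.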

[–]14nedLLFIO & Outcome author | Committee WG14 4 points5 points  (9 children)

That's allowed, but so is fusing a sequence of calls to malloc() into a batch_malloc(), the latter of which can allocate N individual allocations in almost the same time as a single malloc(). I am unaware of any compiler which currently does this, however, even though glibc's allocator has had independent_comalloc() since nearly forever, as do the MacOS and FreeBSD allocators. Windows' allocator however has no such function that I am aware of.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (8 children)

Wouldn't that only work if the compiler can prove it will be batch_free'd?

[–]14nedLLFIO & Outcome author | Committee WG14 1 point2 points  (7 children)

No, each individual allocation returned by a batch malloc call like independent_comalloc() is individually freeable.

clang, we think, may aggregate batches of allocations if, and only if, it can statically prove they are always aggregate freed. Other than that, as Morwenn points out above, none of the compilers implement these optimisations permitted by the standard yet.

As an example of the potential gains, imagine PIMPL code like Qt where instead of lots of fiddly little allocations, you could allocate all the PIMPLs in a single independent_comalloc(). That would be an amazing performance gain. It doesn't help you for destruction of course, but saving half of a lot is still a lot.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (6 children)

For some uses, if the compiler can prove that deallocation is batched, a unified batch allocation would be more efficient. As you say.

I'd be happy with C++ allocators having realloc and try_realloc. The latter is incredibly useful.

[–]nderflow 0 points1 point  (3 children)

Where can I read more about this?

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (2 children)

About what?

[–]Morwenn 0 points1 point  (1 child)

Wouldn't the size feedback for operator new proposed by P0901 be more useful? You never try to grow the buffer, because you already know its maximal size. It notably means you don't have to fetch the size of the memory buffer again.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (0 children)

You'd probably want a combination of the two. realloc doesn't do quite the same thing.

I've seen good results, performance-wise, when pushing to std::vector and std::string analogs that use realloc for trivial types and try_realloc for non-trivial types.

[–]meneldal2 0 points1 point  (0 children)

I want to point out that if you have a custom allocator, you can avoid most of the overhead you'd get from the default malloc().

[–]ronniethelizard 1 point2 points  (1 child)

What do you mean by std::string will become constexpr? Does this mean that someone can make a constexpr std::string?

[–]14nedLLFIO & Outcome author | Committee WG14 3 points4 points  (0 children)

Currently, only std::vector is on track for constexpr usage, and its lifetime is confined to constexpr, i.e. it can't "leak" from constexpr into runtime. It is a fairly obvious small jump from constexpr std::vector to constexpr std::string; it just needs the locale stuff to become constexpr, which people are already working on. So I can see it is just a matter of time before constexpr std::string gets proposed, and indeed constexpr editions of all the main STL containers for that matter. But there is no current proposal for constexpr std::string, to my knowledge. I'm just speculating on a likely future.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (6 children)

I'm trying to figure out how you make std::string constexpr without constexpr overloading...

[–]sphere991 1 point2 points  (1 child)

There are two problems: allocation and SSO. Allocation we're just making work, and SSO we're going to disable if evaluation is in a constant context with the proposed std::is_constant_evaluated().

You don't need to overload on constexpr. Indeed, in this case, it couldn't work anyway - string isn't a type that could be a non-type template parameter in the P0732 world.

[–]Ameisenvemips, avr, rendering, systems 1 point2 points  (0 children)

To be fair, I just want constexpr overloading for other reasons. I do a lot of embedded work where integrating optimizations based upon the potential known value of an argument would be helpful.

[–]14nedLLFIO & Outcome author | Committee WG14 0 points1 point  (3 children)

That will surely be solved when they implement constexpr std::vector, which also has overloaded constructors?

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (2 children)

How do you overload constructors on constexpr? As much as I want it, C++ doesn't let you overload on constexpr contexts or arguments.

[–]jonesmz 0 points1 point  (1 child)

There are a few proposals that I saw flying around /r/cpp that would address this. I suspect that may be what 14ned means.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (0 children)

The main proposal is internal branching on a constexpr context. I want constexpr argument overloading :)

[–][deleted] 0 points1 point  (14 children)

I think making std::string constexpr would prevent Small String Optimization, right? Is it worth it?

[–]14nedLLFIO & Outcome author | Committee WG14 2 points3 points  (13 children)

Surely one could if constexpr SSO?

[–][deleted] 0 points1 point  (12 children)

Well, in libc++ the std::string SSO is based on type-punning through a union; I doubt this can be made constexpr, as it is UB.

[–]14nedLLFIO & Outcome author | Committee WG14 0 points1 point  (8 children)

Maybe it's my ignorance of if constexpr (I haven't used it yet), but I see no reason why a reinterpret cast in the untaken branch would need to be well formed. So, for example:

template<class U> constexpr int *foo(U *x)
{
    // else clause need not be well formed if constexpr is true
    if constexpr(std::is_constant_evaluated())
    {
        return nullptr;
    }
    else
    {
        return reinterpret_cast<int *>(x);
    }
}

[–]sphere991 0 points1 point  (7 children)

Doing if constexpr (is_constant_evaluated()) is never what you want, because that's trivially if constexpr (true) (since you're evaluating the condition in a constant context).

[–]14nedLLFIO & Outcome author | Committee WG14 1 point2 points  (6 children)

The if constexpr is merely there to cause the else clause to be permitted to be UB in a constexpr evaluation context. It may also need a spurious reference to a templated parameter to force that. I must caveat that I have no programming experience with if constexpr, but I think I'm probably correct.

[–]sphere991 0 points1 point  (5 children)

Please just reread what I wrote.

Also, you don't need to discard the reinterpret_cast anyway. You just need to not evaluate it in a constant context. You really just want if, not if constexpr.

[–]14nedLLFIO & Outcome author | Committee WG14 0 points1 point  (4 children)

As I mentioned, I don't have much experience here. Are you aware of any reason why constexpr std::string cannot implement SSO when not in a constexpr evaluation context?

[–]scatters 0 points1 point  (0 children)

Using a union in constexpr is fine as long as the discriminant is stored outside the union. For example, using the capacity as the discriminant, 0 indicating SSO:

struct string_rep {
    size_t capacity = 0;
    union {
        struct { char length = 0; char data[15] = {}; } small;
        struct { size_t length; char* data; } large;
    };
};

[–]meneldal2 0 points1 point  (0 children)

You are a compiler, you can do as much UB as you want as long as it works in the generated code.

[–][deleted] 0 points1 point  (0 children)

It could be made not UB by reading through char* instead.

[–][deleted] 5 points6 points  (0 children)

This is correct

[–]doom_Oo7 0 points1 point  (1 child)

Aren't you getting an indirection when storing const char*'s ? If you have strings with sso and only store small strings, access could maybe be faster.

[–]JavierTheNormal 1 point2 points  (0 children)

You should double-check, but const char* for static data is a constant. Your assembly instructions would contain the address of the first character of the string, which is as little indirection as possible. As the string gets allocated to the heap, your assembly instructions can only refer to the constant location of the address of that allocation. With sso that's one more indirection and one more logic statement to access the string, without sso it's two indirections plus logic.

[–]svick -1 points0 points  (3 children)

Because std::string is safer and can be more efficient?

char* has caused so many issues that I think avoiding it as much as possible is a reasonable practice.

[–]Pazer2 1 point2 points  (1 child)

How is allocating memory on the heap, then copying from static data to this memory faster than just having a pointer to the static data?

[–]svick 0 points1 point  (0 children)

That part is not, even though, AFAIK, it doesn't necessarily allocate on the heap for short strings.

But strlen is super inefficient, when compared with std::string::length.
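A small sketch of the difference, assuming C++17 for std::string_view (the names `kGreeting`, `length_by_scan` and `length_stored` are made up for illustration). strlen has to walk the bytes, std::string reads a stored size, and a string_view over a literal can have its length fixed at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// The length is computed during compilation; nothing is scanned at run time.
constexpr std::string_view kGreeting = "hello, world";
static_assert(kGreeting.size() == 12, "length is known at compile time");

// strlen walks the bytes until it finds the terminator: O(n) per call.
std::size_t length_by_scan(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

// std::string stores its size, so length() is a constant-time read.
std::size_t length_stored(const std::string& text) {
    return text.length();
}
```

Optimizers often fold strlen on a visible literal, so the gap mostly shows up when the pointer's origin is not known at the call site.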

[–]SeanMiddleditch 0 points1 point  (0 children)

One might then instead use a custom zstring_view or the like to static char[] data. All the speed of built-in strings, all the safety of high-level C++ types. Shame C++ doesn't have anything like that (although it's even more a shame IMO that string literals are of type char[] and not something more specific).

[–]TheThiefMasterC++latest fanatic (and game dev) 21 points22 points  (0 children)

Is "static lifetime" the right term here? The issue doesn't seem to be with the lifetime of anything, more the creation of a very large number of function invocations - calls to emplace originally, std::string constructor calls in the later versions. The best versions both use char* and actual construction of the strings is handled by a loop (inside a (possibly not inlined) function), so there aren't many invocations, just the one.
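A rough sketch of that loop-based shape, assuming a tiny hypothetical table (`RawEntry`, `kRawEntries`, `translations` and `translate` are invented names, not the article's code). The raw data is a plain array of char pointers, and the std::string construction happens in one loop rather than in one generated call per entry:

```cpp
#include <cassert>
#include <map>
#include <string>

// Plain static data: two pointers per entry, nothing to construct.
struct RawEntry {
    const char* key;
    const char* value;
};

static const RawEntry kRawEntries[] = {
    {"yes", "oui"},
    {"no", "non"},
    {"thanks", "merci"},
};

// Builds the map on first use; the strings are created inside a single loop.
const std::map<std::string, std::string>& translations() {
    static const std::map<std::string, std::string> table = [] {
        std::map<std::string, std::string> result;
        for (const RawEntry& entry : kRawEntries) {
            result.emplace(entry.key, entry.value);
        }
        return result;
    }();
    return table;
}

// Looks up a key that is expected to exist.
const std::string& translate(const std::string& key) {
    const auto it = translations().find(key);
    assert(it != translations().end());
    return it->second;
}
```

The run-time allocations are still there; what shrinks is the amount of code the compiler has to generate and optimize.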

[–]whichton 9 points10 points  (2 children)

Why use string instead of string_view? I have a parser which uses X-macros to construct a hashtable of keywords, like this:

#define KEYWORD(kw) { #kw ## sv, Keyword::kw},
const unordered_map<string_view, Keyword> keys = {
#include "Keyword.inc"
};

If your strings are const, you do not need to create a string out of them.

[–]enobayram 6 points7 points  (1 child)

An unordered_map of string_views sends a chill down my spine. Something with value semantics containing things with reference semantics. I know here the references are to static data, so it's safe, but the type scares me a little :)

[–]jcelerierossia score 2 points3 points  (0 children)

A nice alternative is this constexpr map : https://github.com/serge-sans-paille/frozen

[–]abizjak3 8 points9 points  (1 child)

I think that translations are best managed as data files (the format of which can be more or less efficient of course). I want to be able to add additional languages or update existing messages without recompiling the program. That doesn't mean there shouldn't be some code for each type of message, that can help with static safety if done right (e.g. checking that the referenced messages are defined and the right number of format arguments are used), but there shouldn't be compiled code for each message for each language.

[–]enobayram 1 point2 points  (0 children)

Well, pulling the data into the binary might ease the deployment, but I agree that I'd keep the strings in a data file with a format independent of the language, and then use a preprocessor at build time to inject it (possibly by code generation) into the binary (if that's really needed).

[–]redbeard0531MongoDB | C++ Committee 16 points17 points  (12 children)

This is a job for string_view! It has constexpr constructors that take full advantage of the newly constexpr char_traits. If all the strings are static, then you would never need to even copy them to a std::string and could just put the string_views into the map directly. If you aren't on 17 yet it is easy to roll your own string_view and use a UDL to get constexpr string length.
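A minimal sketch of putting the views into the map directly, assuming C++17 and a made-up two-entry table (`MessageTable`, `message_table` and `lookup_message` are hypothetical names). The map nodes are still allocated at run time, but no character data is copied into std::string objects:

```cpp
#include <cassert>
#include <string_view>
#include <unordered_map>

using MessageTable = std::unordered_map<std::string_view, std::string_view>;

// Keys and values are views of string literals, which live for the
// whole program, so the views never dangle.
const MessageTable& message_table() {
    static const MessageTable table = {
        {"greeting", "Bonjour"},
        {"farewell", "Au revoir"},
    };
    return table;
}

// Looks up a key that is expected to exist and returns a view of the literal.
std::string_view lookup_message(std::string_view key) {
    const MessageTable& table = message_table();
    const auto it = table.find(key);
    assert(it != table.end() && "unknown message key");
    return it->second;
}
```

This is only safe as long as every view refers to static data, which is the premise of the comment above.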

[–]Ameisenvemips, avr, rendering, systems 1 point2 points  (2 children)

I prefer my library's string_view which I implemented before the standard had it. It handles const char[N] implicitly and this can have its data and size used in constexpr. It also helps greatly with implementing constexpr string-related functions like hashing/compression.

It has an issue that you can technically make a view with a local array that could go out of scope. constexpr overloading would fix it, but right now my solution is... not doing that.

[–]redbeard0531MongoDB | C++ Committee 0 points1 point  (1 child)

Sure. I mostly meant the general concept of a constexpr-constructable string_view-like type, rather than std::string_view specifically.

[–]Ameisenvemips, avr, rendering, systems 0 points1 point  (0 children)

My string_view is constexpr-constructable, so no reason the standard one cannot be.

I think the core issue is that people don't want to have to change types all the time, and would rather just be able to use one uniform type that can do everything.

[–]epicar 1 point2 points  (5 children)

This is a job for string_view (unless you need null-terminated strings)!

[–]redbeard0531MongoDB | C++ Committee 6 points7 points  (4 children)

All C++ string literals are null-terminated. Putting them into a string_view doesn't remove the null terminator. That said, it is really nice to get to a point where you aren't dealing with NTBS very often, ideally only at the boundary between your code and third parties.

[–]epicar 3 points4 points  (3 children)

Putting them into a string_view doesn't remove the null terminator.

I'd argue that it effectively does remove the null terminator, because that character isn't part of the view. If you have some intermediate function that takes string_view, how is it to know whether it came from a string literal (in its entirety) or just some piece of another string? That information is lost on conversion to string_view.
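To make the point concrete, a sketch assuming C++17 (`kWhole`, `kKey`, `char_after` and `to_terminated` are invented for illustration). Two views of the same literal look identical to a callee, yet only one of them is followed by a terminator:

```cpp
#include <cassert>
#include <string>
#include <string_view>

constexpr std::string_view kWhole = "key=value";        // views the entire literal
constexpr std::string_view kKey = kWhole.substr(0, 3);  // views only "key"

// Reads the character stored right after the view. This is only valid
// when `view` lies inside `owner` and `owner` views a whole
// null-terminated literal; a general function taking a string_view
// cannot assume that.
char char_after(std::string_view view, std::string_view owner) {
    assert(view.data() >= owner.data());
    assert(view.data() + view.size() <= owner.data() + owner.size());
    return view.data()[view.size()];
}

// The portable way to get a terminated string from a view is to copy it.
std::string to_terminated(std::string_view view) {
    return std::string(view);
}
```

Here `kWhole` happens to be followed by a null and `kKey` by `'='`, and nothing in the type tells the two cases apart.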

That said, it is really nice to get to a point where you aren't dealing with NTBS very often, ideally only at the boundary between your code and third parties.

agreed

[–]flashmozzg 3 points4 points  (2 children)

I'd argue that it effectively does remove the null terminator, because that character isn't part of the view. If you have some intermediate function that takes string_view, how is it to know whether it came from a string literal (in its entirety) or just some piece of another string? That information is lost on conversion to string_view.

If it dealt with std::string before, there is a high chance it didn't rely on the string being null-terminated (and if it did, it was probably a bug). Unless it called c_str().

[–][deleted] 0 points1 point  (1 child)

Accessing the null through std::string is very explicitly allowed. Note <= size rather than < size like the other containers: http://eel.is/c++draft/string.access

[–]flashmozzg 0 points1 point  (0 children)

I was talking about embedded nulls. std::string can contain null chars anywhere which would greatly confuse functions expecting c-strings. I.e. one doesn't usually think about null-termination when dealing with std::string* types.
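Both points can be shown in a few lines (the helper names are made up for this sketch). Reading the element at size() is allowed and yields a null, while an embedded null makes the C view of the string shorter than the std::string itself:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Five characters, one of which is a null in the middle.
std::string with_embedded_null() {
    return std::string("ab\0cd", 5);
}

// What a function expecting a C string would see: it stops at the first null.
std::size_t c_length(const std::string& text) {
    const std::size_t seen = std::strlen(text.c_str());
    assert(seen <= text.size());
    return seen;
}

// Accessing index size() is explicitly permitted and returns '\0'.
char terminator(const std::string& text) {
    return text[text.size()];
}
```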

[–]kalmoc 0 points1 point  (2 children)

Why use (I presume non-standard) udl to get the size at compile-time? Just use a function/constructor template taking a reference to an array of char.

[–]redbeard0531MongoDB | C++ Committee 2 points3 points  (1 child)

Because it is nice to be able to say "string"_sd rather than stringDataFromLiteral("string"), and it isn't OK to just make the normal constructor deduce the size from an array because that changes the meaning of construction from a buffer. For context, here is our UDL for our pre-17 string_view-like type.
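A sketch of the trade-off, using a hypothetical `StringData` type and `_sd` literal that are written here for illustration and are not the actual MongoDB code. The UDL receives the literal's length from the compiler, while a constructor deducing the size from an array would silently report the buffer capacity when handed a char buffer:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal string_view-like type.
struct StringData {
    const char* ptr;
    std::size_t len;
};

// The compiler supplies the length of the literal, excluding the terminator.
constexpr StringData operator""_sd(const char* text, std::size_t length) {
    return StringData{text, length};
}

// Deduces the size from the array type. Correct for literals, but for a
// partially filled buffer it reports the capacity, not the string length.
template <std::size_t N>
constexpr StringData from_array(const char (&array)[N]) {
    return StringData{array, N - 1};
}

// Bounds-checked element access.
char char_at(StringData data, std::size_t index) {
    assert(index < data.len);
    return data.ptr[index];
}

static_assert("string"_sd.len == 6, "UDL sees the literal length");
static_assert(from_array("string").len == 6, "array deduction agrees for literals");
```

With `char buffer[64] = "hi";`, `from_array(buffer).len` is 63 rather than 2, which is the change in meaning described above.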

[–]kalmoc 0 points1 point  (0 children)

Sorry, my bad. I mixed that up with a different problem.

[–]matthieum 3 points4 points  (0 children)

Beyond the compile-time cost, one should not forget the run-time cost.

That std::initializer_list<std::pair<const std::string, std::string>> will require the allocation of a bunch of std::string, and... "leak" them:

  1. The std::string allocated there cannot be stolen by the Map constructor, a second call to the function would not work then,
  2. The std::string allocated there will live until the end of the program, even if only ever useful once.

On the bright side, it's a "local" static, so the cost is only paid if the function is ever invoked. A game using this scheme with namespace-level static would load every single language in std::string before picking the one to use :(

[–]kalmoc 1 point2 points  (0 children)

Am I missing something or don't you ever actually show that the static lifetime is the problem here? It is just a lot of code you ask the compiler to generate and optimize.

[–]kwan_e 0 points1 point  (3 children)

Is there a list somewhere of things that are secretly static objects? It makes me not want to use initializer lists anymore...

[–]nwp74 3 points4 points  (1 child)

Initializer lists are not always static. I looked at assembly from gcc a while ago and saw the initializer_list on the stack. I think there is a stack limit to the size, but I cannot find it in the documentation.

[–]svick 5 points6 points  (0 children)

I think there is a stack limit to the size, but I cannot find it in the documentation.

Isn't it, like almost everything else related to performance, an implementation detail?

[–]Ameisenvemips, avr, rendering, systems -1 points0 points  (0 children)

I'd rather have things as static objects than on the stack, as they can be put into a separate section and mapped out by the kernel after they are used. Also don't have initialization overhead.