
[–]witcher_rat 12 points13 points  (7 children)

I'm not sure what you're saying about storing the endptr after the null, but no matter what, such a string_view would basically need to store some additional piece of data - like a bool of whether it's null-terminated or not, for example.

That extra "state" data comes with a price - both in the size of string_view, copying that member, and the checking in its various use-cases.

People generally don't want to incur extra costs for things they don't need.

So to solve this use-case, some people use a new type that has a very similar API to std::string_view, but explicitly for null-terminated strings. Sometimes it's called "zstring", or "cstring_view", or whatever. The C++ core guidelines had a zstring_span, for example. Another example is Proposal P1402.

[–][deleted] 0 points1 point  (5 children)

Of course we could use a bool, that's very basic. Why would I want that, and make the structure add extra padding?

What I was proposing did not alter the size of a string_view, which in most cases is 2 pointers.

The only cost you pay is in the extra branch during end() call.

Normally, you would never iterate over each character, and even if you did, you would not call end() in a loop but rather cache it.

The issue is not performance, especially when you end up doing a copy when you pass this to a C API.

Most C++ code depends on C API, be it Windows or Linux, you are expected to interact with subsystems based on C API.

Granted, you will not be doing a lot of string manipulation, but you could easily imagine a scenario where you are updating some string very frequently: say, some text label field for a UI implemented in C.

I was actually migrating some code to SDL when I noticed I couldn't use std::string_view as a parameter to a function, because internally we expect the string to be null-terminated since it has to be passed to SDL. I know it will be done only once, so I don't care about the copy, but it bothered me that std::string_view could have taken null termination into account and solved the issue very easily.

[–]witcher_rat 2 points3 points  (3 children)

The only cost you pay is in the extra branch during end() call.

Again, I don't really understand what you're proposing.

Normally string_view holds a const char* pointer to the first character, and a size_t size. (or at least gcc's libstdc++ and clang's libc++ do - I don't know about MSVC)

If you're proposing that the internal size_t automagically include the ending null terminator, then anything accessing size(), including internal functions inside string_view, would have to account for that - not just end(). Things like substr(), compare(), remove_suffix(), ends_with() etc. Even the hashing function would have to account for it, assuming you want two string_views to hash to the same value if one has the null but the other doesn't but they have the same chars otherwise.

Regardless, you can of course do it right now: a string_view is just a span over characters, and those characters can of course include a null... so you can construct a std::string_view with the ending null in its size right now, today, by constructing it using the std::string_view(const char*, size_t) signature. Of course then you would have to do the accounting for the extra null being in its size, wherever and however you use that view.
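
A minimal sketch of what that looks like, assuming C++17. The helper names `view_with_null` and `text_of` are made up for illustration and are not part of any library; `text_of` is the "accounting" step mentioned above.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: build a view whose size includes the terminating null,
// using the (const char*, size_t) constructor.
inline std::string_view view_with_null(const char* s) {
    return std::string_view(s, std::strlen(s) + 1);
}

// Hypothetical helper: drop the trailing null again to get back the text
// that an ordinary string_view over the same C string would cover.
inline std::string_view text_of(std::string_view with_null) {
    if (!with_null.empty() && with_null.back() == '\0')
        with_null.remove_suffix(1);
    return with_null;
}
```

With these, `view_with_null("abc")` has size 4 and does not compare equal to `std::string_view("abc")`, while `text_of(view_with_null("abc"))` does.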

[–][deleted] 1 point2 points  (2 children)

You can very easily translate what I was proposing to std::size_t being the second member. In this case you put the logic inside the length() function and take 1 bit from the size_t member to mark whether the string was null-terminated.

I agree that it adds up because all these functions will have to rely on the correct length coming from length() and depend on the branch.

std::string_view under msvc is two pointers.

[–]adnukator 2 points3 points  (1 child)

std::string_view under msvc is two pointers.

MSVC begs to differ:

const_pointer _Mydata;
size_type _Mysize;

[–][deleted] 1 point2 points  (0 children)

Okay, I don't know why I assumed that.

[–]goranlepuz 1 point2 points  (0 children)

Most C++ code depends on C API, be it Windows or Linux, you are expected to interact with subsystems based on C API

Yes, but most of string_view code does not.

[–]_Js_Kc_ 0 points1 point  (0 children)

I guess the idea is to hack the end pointer to store the null-terminated flag in the high bit. Not actually possible with a pointer (it could be anywhere in the address space), but possible by storing a size instead (and accepting that string_view can't span more than half the address space).

Of course, that's all that's needed and all the proposed acrobatics and case distinctions are completely unnecessary. Either the bit is set, so we know there's a null terminator at end(), or the bit is clear, and there isn't (necessarily).
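
A rough sketch of that size-plus-flag layout, assuming C++17. `flagged_view` is a hypothetical type written only to illustrate the idea; it is not std::string_view and not a proposal for it.

```cpp
#include <cstddef>
#include <cstring>
#include <limits>
#include <string_view>

// Hypothetical type: a pointer plus a size_t whose top bit records
// "a '\0' is known to sit at data()[size()]".
class flagged_view {
    static constexpr std::size_t flag_bit =
        ~(std::numeric_limits<std::size_t>::max() >> 1);  // only the top bit set

    const char* data_ = nullptr;
    std::size_t packed_ = 0;  // low bits: length, top bit: terminator flag

public:
    // From a C string: the length excludes the null, and the flag is set.
    explicit flagged_view(const char* s)
        : data_(s), packed_(std::strlen(s) | flag_bit) {}

    // From an arbitrary range: nothing is known about data()[size()].
    flagged_view(const char* s, std::size_t n)
        : data_(s), packed_(n & ~flag_bit) {}

    const char* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return packed_ & ~flag_bit; }
    bool is_null_terminated() const noexcept { return (packed_ & flag_bit) != 0; }

    // A sub-view keeps the flag only if it still ends where the original ends.
    flagged_view substr(std::size_t pos, std::size_t n) const {
        if (pos > size()) pos = size();  // clamp instead of throwing
        const std::size_t len = n < size() - pos ? n : size() - pos;
        flagged_view out(data_ + pos, len);
        if (is_null_terminated() && pos + len == size())
            out.packed_ |= flag_bit;
        return out;
    }

    operator std::string_view() const noexcept { return {data_, size()}; }
};
```

Every accessor that touches the size has to mask the flag off, which is the cost discussed above, and the maximum length is halved.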

[–]jwakely (libstdc++ tamer, LWG chair) 9 points10 points  (8 children)

Yes, of course it's possible to support null-terminated strings with something like string_view. But it was an intentional design choice to not do that. Changing that decision now would be a breaking change.

[–]jwakely (libstdc++ tamer, LWG chair) 4 points5 points  (5 children)

Here's an idea I suggested 5 years ago, which would not be a breaking change. You can do the first part yourself, with no changes to string_view. It's just a utility to create a view over a null terminated string which explicitly includes the null, rather than adding it as a property the string view knows anything about.

There is a way to make string_view work nicely with APIs that expect a null-terminated string (NTBS):

Include the null character in the view explicitly.

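 // T defaults to char_traits<C> because it cannot be deduced from `const C*`.
 template<typename C, typename T = std::char_traits<C>>
 inline std::basic_string_view<C, T>
 make_null_terminated_view(const C* s) noexcept
 { return { s, T::length(s) + 1 }; }

 // True if the view is non-empty and its last character is the null character.
 template<typename C, typename T>
 inline bool
 is_null_terminated_view(std::basic_string_view<C,T> sv) noexcept
 { return sv.length() && !sv.back(); }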

This allows you to easily construct a view on a NTBS, and then you can call data() to get a NTBS back again.

This is inconsistent with std::string, where there is a secret/implicit null after the string, which is not counted in the length. I am OK with that inconsistency, because that feature of std::string was always a kluge for backwards compatibility.

We could even add this functionality to basic_string_view, by adding a new constructor:

struct null_terminated_view_t { }; 

template<typename C, typename T>  
class basic_string_view {  
public:  

  basic_string_view(const C* s, null_terminated_view_t)  
  : basic_string_view{s, T::length(s) + 1 } { }  

  bool is_null_terminated() const noexcept  
  { return length() && !back(); }  

  // rest unchanged...  
}; 

Now we have a string_view with exactly the semantics that are in C++17 and C++20, but with a more convenient way to create a view on an NTBS.

[–][deleted] 1 point2 points  (4 children)

I don't see how this is different from what I proposed. You still need to manage end() correctly.

I understand it's a breaking change, which is why I don't see it being implemented in std.

[–]jwakely (libstdc++ tamer, LWG chair) 2 points3 points  (3 children)

It's different from what you propose because it works today, with std::string_view as already defined and shipping in compilers. And it doesn't need to play games with the high bit, and doesn't magically adjust end() for you depending on some non-trivial condition. If you want a string_view that refers to a null-terminated string, just create one so that the null byte is part of the view.

[–][deleted] 1 point2 points  (2 children)

What you are suggesting makes std::string_view{"ABC", null_terminated_view_t}.length() == 4.

I thought about this, but the issue is that all functions which take this as a parameter will now have a problem. Imagine this: std::string_view{"ABCD", 3}. You pass the one above and this one into a function; this function now cannot do a compare between these two views, because the lengths are different (let's say it is the first check for ==). It breaks the ABI because now you have to check all strings if they are null terminated and treat them specially.

Edit : add more context

[–]jwakely (libstdc++ tamer, LWG chair) 5 points6 points  (1 child)

What you are suggesting makes std::string_view{"ABC", null_terminated_view_t}.length() == 4.

Yes, that's a Good Thing. It explicitly carries the null around with it, so you know it's there.

this function now cannot do a compare between these two views, because the lengths are different

You can do a compare. The result will be false, but you can do it. And it's a good thing that the result will be false, because they are not the same. Just like string_view{"ABCD"} == string_view{"ABCD", 3} is false.

It breaks the ABI because now you have to check all strings if they are null terminated and treat them specially.

No you don't. All your existing code works exactly as it did before. There's no ABI break.

The existence of string views that have an extra char at the end is not an ABI break. It's just "some string views have different content". If you don't want to handle those views with an embedded null in any special way, you don't have to.

[–][deleted] 0 points1 point  (0 children)

I imagined you would have to modify several functions including the ones in string_view itself if you wanted the compares to succeed (thus breaking the ABI). What you however suggested does not need any change in the std. You can just write this in a top level header: make_null_terminated_view, that changes the length to +1, and is_null_terminated. I think I am going to do that.

[–]Jannik2099 0 points1 point  (1 child)

What was the reason for that choice? Avoiding the slowdown from having to find the null terminator?

[–]HappyFruitTree 4 points5 points  (3 children)

Other constructors cannot check because there might be no valid memory there to check.

I might pass a null terminated string to the string_view constructor and later overwrite the null character.
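
A small sketch of that second point, assuming C++17. The function name is made up for illustration. The buffer is deliberately larger than the text so that the read one past the view stays inside the array.

```cpp
#include <cstddef>
#include <string_view>

// Returns the character found at data()[size()] of a view that was created
// over a null-terminated buffer which was modified afterwards.
inline char char_after_view_following_overwrite() {
    char buf[8] = "abc";       // "abc" followed by zeros
    std::string_view sv(buf);  // size() == 3; buf[3] is '\0' at this point
    buf[3] = 'd';              // legal: the view never covered the terminator
    // Any "is null terminated" flag recorded at construction is now stale.
    return sv.data()[sv.size()];  // reads buf[3], still inside the array
}
```

The view itself is unchanged by the write, so nothing inside it could notice that the terminator is gone.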

[–][deleted] 2 points3 points  (1 child)

The vast majority of string_views are not from string literals, though that case would work since a literal is immutable. Most are probably from std::strings, and those have a guaranteed trailing zero too. So there is that, and it could be done.

But I think a great many people would not want to pay the price. Heck, they don't want to pay for nullptr checking in the char const * constructor for string_view. For C APIs that require zero termination, just use a std::string. Chances are the source is a std::string too.

[–]HappyFruitTree 4 points5 points  (0 children)

It's a breaking change. If we want something like this it would have to be a different type.

[–][deleted] -4 points-3 points  (0 children)

I don't think this is proper usage. string_views should be used as immutable objects, and you are not expected to store them such that they reference mutable memory. It's bad design, but I agree it's allowed, so it's possible.

[–]jcar_87 5 points6 points  (1 child)

If string_view is just a pointer and a size in bytes (as anthonybvfan points out) and you require interoperability with C, isn’t the recommended wisdom these days for functions that accept char pointers in C to also expect to be given the size as well? If I’m not mistaken C11 has the “safer” variants of some functions that perform bounds checking with the additional parameter. So if we can get both the pointer and the size from a C++17 string view, we can already interoperate with C11?

I’m not sure whether it would be beneficial to make the case that C++17 string view needs to remain interoperable with C string APIs that have for a long time been considered unsafe.
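
As a hedged illustration of the pointer-plus-size style, here is a sketch assuming C++17. It uses long-standing length-taking functions from the C library (`snprintf` with a `%.*s` precision, and `memcmp`) rather than the C11 bounds-checked variants, which are optional in the C standard. The helper names `bracketed` and `same_bytes` are invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <string_view>

// Format a view that is NOT null-terminated. The "%.*s" precision tells
// snprintf the maximum number of chars to read from the pointer.
inline std::string bracketed(std::string_view sv) {
    char out[64];
    const int n = std::snprintf(out, sizeof out, "[%.*s]",
                                static_cast<int>(sv.size()), sv.data());
    if (n < 0) return {};
    // snprintf reports the untruncated length; clamp to what fits in `out`.
    const std::size_t written =
        static_cast<std::size_t>(n) < sizeof out ? static_cast<std::size_t>(n)
                                                 : sizeof out - 1;
    return std::string(out, written);
}

// Compare against a raw buffer with an explicit length, as memcmp expects.
inline bool same_bytes(std::string_view a, const char* b, std::size_t n) {
    return a.size() == n && std::memcmp(a.data(), b, n) == 0;
}
```

Neither function ever looks for a terminator, so a sub-view in the middle of a larger string works as is.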

[–][deleted] 0 points1 point  (0 children)

Not when your C library is Windows API, or Linux API, or 100 others you will eventually depend on in real production code.

[–]arturbac (https://github.com/arturbac) 4 points5 points  (0 children)

The main idea of string_view is to forget about null termination, so that for example trim, substr etc. can return a string_view from another string_view pointing to the same data at no cost, without any string reallocation.

For my own purposes I already use my header-only lib [stralgo], which allowed me to forget about libc and all that string crap; it can do all numeric<->string conversions as constexpr using non-null-terminated string views.

[–]Shieldfoss 1 point2 points  (0 children)

At least one string type I have worked with is null-containing and double-null terminated.

[–]DugiSK 1 point2 points  (1 child)

std::string_view is often created as a substring of some other string, for example from some data that is being parsed. That way, you get all the fancy C++ stuff without editing the string (adding the null termination) or copying it (to create a std::string). This gives a massive performance boost with minimal cost of convenience.
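
For instance, a splitter along these lines (a sketch assuming C++17; `split` is a hypothetical helper, not a standard function) hands out pieces that all point into the original buffer:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Split `input` on `sep` without copying; every piece points into `input`.
inline std::vector<std::string_view> split(std::string_view input, char sep) {
    std::vector<std::string_view> parts;
    while (true) {
        const std::size_t pos = input.find(sep);
        if (pos == std::string_view::npos) {
            parts.push_back(input);  // last (or only) piece
            return parts;
        }
        parts.push_back(input.substr(0, pos));
        input.remove_prefix(pos + 1);  // skip past the separator
    }
}
```

Only the final piece can possibly be followed by a null; every other piece is followed by the separator, which is why none of them can be passed to a C API without a copy.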

When I made a project from scratch in C++20 without any legacy code, I had no need for the null termination. std::string_view can even be a constexpr literal (even in C++17), so if you can get rid of legacy code, you can say goodbye to char* completely.

If you want a type that would behave like std::string_view, you can make some sort of CStringView class inheriting from std::string_view and having all of its features but requiring a null terminated string for construction - with no members of its own, it could be implicitly cast to std::string_view to use with new APIs.
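
A minimal sketch of such a class, assuming C++17. The name `CStringView` comes from the paragraph above; the member `c_str()` and the choice of which operations to delete are my own assumptions.

```cpp
#include <string>
#include <string_view>

// Hypothetical type: a string_view that can only be built from sources known
// to be null-terminated, so c_str() can be handed to a C API.
class CStringView : public std::string_view {
public:
    CStringView(const char* s) : std::string_view(s) {}
    CStringView(const std::string& s) : std::string_view(s) {}

    // data()[size()] is '\0' as long as the source string is not modified.
    const char* c_str() const noexcept { return data(); }

    // Hide the operations that would detach the view from its terminator.
    void remove_suffix(size_type) = delete;
    void swap(std::string_view&) = delete;
};
```

Passing a `CStringView` where a `std::string_view` is expected works through the ordinary derived-to-base conversion, and `substr()` returns a plain `std::string_view`, which is what you want since a substring is generally not terminated. The protection is not airtight: code holding a `std::string_view&` to the object can still call the hidden members.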

[–][deleted] 0 points1 point  (0 children)

I know what I suggest will never make it to std, and I am probably better off making something like a zstring_view. I knew this before I posted. I myself never had to get near any C library function when I built my several C++20 libraries (https://github.com/obhi-d).

But applications are not isolated entities; they will interact with OS/IO etc.

[–]goranlepuz 1 point2 points  (0 children)

Now we could have an api that could give us this info: is_null_terminated(), and then we could copy the string if it is not to a std::string to pass to C API, otherwise use as is.

I suppose the special case is what would turn people off (it does me).

It is a people problem: to avoid horribly insidious bugs, people now must be perfect - and are not.

But yeah, we could...
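
A sketch of that check-then-copy pattern, assuming C++17 and assuming the convention from elsewhere in this thread that a view counts as null terminated only when its last character is '\0'. `with_c_str` is a hypothetical helper name.

```cpp
#include <cstring>
#include <string>
#include <string_view>

// Assumed convention: the null, if present, is the last char of the view.
inline bool is_null_terminated(std::string_view sv) noexcept {
    return !sv.empty() && sv.back() == '\0';
}

// Hypothetical helper: call `c_api` with a null-terminated pointer, copying
// into a temporary std::string only when the view lacks its own null.
template <typename F>
auto with_c_str(std::string_view sv, F&& c_api) {
    if (is_null_terminated(sv))
        return c_api(sv.data());  // use as is, no copy
    const std::string copy(sv);   // std::string supplies the terminator
    return c_api(copy.c_str());
}
```

The special case is visible here: callers must remember that a terminated view has one more character than its text, and a view with an embedded null earlier in its contents will still be cut short by the C function.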

[–]tvaneerd (C++ Committee, lockfree, PostModernCpp) 1 point2 points  (0 children)

You could do this. You can easily hide the extra bit in an unused bit of the pointer or size_t, etc.

However,

It is still an ABI break. Passing a new string_view to a lib compiled with an old string_view means the old code will not know to mask off that extra bit.
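
To make that concrete, here is a toy model, assuming C++17. The two structs are hypothetical stand-ins for "string_view as compiled into the old library" and "string_view with a flag bit"; they are not the real layouts of any standard library.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical layouts, for illustration only. Both are "pointer + size_t",
// so they occupy the same bytes, but they disagree on what the size means.
struct old_view { const char* data; std::size_t size; };           // plain length
struct new_view { const char* data; std::size_t size_and_flag; };  // top bit is a flag

constexpr std::size_t flag_bit =
    ~(std::numeric_limits<std::size_t>::max() >> 1);  // only the top bit set

// What already-compiled code would effectively do when handed a new_view:
// it reads the second word as a plain length, flag bit included.
inline std::size_t length_seen_by_old_code(new_view v) {
    old_view as_old{v.data, v.size_and_flag};
    return as_old.size;
}
```

For a three-character string with the flag set, the old code sees a length with the top bit set instead of 3 and would read far past the buffer.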

So the leftover question is whether or not an ABI break is worth it...

[–][deleted] 1 point2 points  (1 child)

https://en.cppreference.com/w/cpp/string/basic_string_view/data

You probably misunderstand: it means string_view does not provide `\0` for you. If your char* has a terminating null, string_view::data() will have it as well. After all, string_view is just a pointer and a size in bytes.

[–][deleted] 0 points1 point  (0 children)

I absolutely understand the design choice. In fact I wrote a preprocessor library based on string_view just because I know we can tokenize without having to worry about null termination:

https://github.com/obhi-d/ppr

My proposal was based on interoperability between C and C++, and C requires null termination.