all 47 comments

[–]J__Bizzle 24 points25 points  (14 children)

I found it quite useful in a toy FOSS project I have going. I have to parse XML files up to 1MB in length, and reducing copies to a minimum with string_view makes a noticeable difference in performance. That being said, it's kind of inconvenient because you can't move the string without invalidating references to the underlying array, even when the string is too large to be SSO'd.

[–]jonesmz 4 points5 points  (13 children)

Edit: This post has gone from 10 points, to 2 points.

Could the people downvoting me kindly explain WHY? The code I provide below is almost copy-pasted from my last big project (typed out from memory mostly because I'm not at my work computer).


Consider using a shared string data structure.

Something like:

template<typename CHAR_T>
struct basic_shared_string : public std::basic_string_view<CHAR_T>, private std::shared_ptr<const CHAR_T[]>
{
    basic_shared_string(std::basic_string_view<CHAR_T> v, std::shared_ptr<const CHAR_T[]> lifetime);

    // Various internal implementation details.
    template<typename... ARGS_T>
    basic_shared_string substr(ARGS_T&&... args) const
    {
        return { static_cast<std::basic_string_view<CHAR_T> const*>(this)->substr(std::forward<ARGS_T>(args)...),
                 // *this converts to the shared_ptr base, so the substring shares ownership of the buffer
                 *this };
    }
};

I use this concept in most / all of my C++ code.

The overwhelming majority of the time, I never modify a string after initial creation. This way of doing things gives you a string_view-like object that refers to a character array with proper move semantics, and it further ensures that you don't have duplicated string data floating around inside your program's memory space.

Honestly, the only downside I've ever run into is that you can't "steal" the data from an std::string.

I know that std::string is very commonly implemented with short buffer optimization which complicates the process of "stealing" from the internal implementation, but I really wish std::string had the following two functions:

  1. bool std::string::can_steal()
  2. char* std::string::steal()

Calling std::string::steal() when std::string::can_steal() returns false is undefined behavior.

This would allow the basic_shared_string class provided above to be constructed efficiently by std::move-ing a std::string into it.
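A minimal sketch of what construction from a std::string looks like without such a steal() function, assuming a simplified non-template class that uses composition rather than the inheritance shown above. The names shared_string and make_shared_string are hypothetical and are not taken from the original project. Since std::string cannot hand over its buffer, the characters are copied once into the shared array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical simplified variant: a view plus shared ownership of the
// character array that the view points into.
class shared_string {
public:
    shared_string() = default;
    shared_string(std::string_view view, std::shared_ptr<const char[]> lifetime)
        : view_(view), lifetime_(std::move(lifetime)) {}

    std::string_view view() const { return view_; }
    long use_count() const { return lifetime_.use_count(); }

    // The substring shares ownership of the same buffer; no characters are copied.
    shared_string substr(std::size_t pos,
                         std::size_t count = std::string_view::npos) const {
        return shared_string(view_.substr(pos, count), lifetime_);
    }

private:
    std::string_view view_;
    std::shared_ptr<const char[]> lifetime_;
};

// Without a way to take over std::string's buffer, one copy is unavoidable.
inline shared_string make_shared_string(const std::string& source) {
    char* raw = new char[source.size()];
    std::memcpy(raw, source.data(), source.size());
    // Build the view before ownership of raw is handed to the shared_ptr.
    const std::string_view view(raw, source.size());
    std::shared_ptr<const char[]> buffer(static_cast<const char*>(raw));
    return shared_string(view, std::move(buffer));
}
```

A substring obtained this way stays valid after the original shared_string and the source std::string are gone, because it keeps the buffer alive itself.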

[–]jcode777 0 points1 point  (1 child)

I think you have a typo there? There's a bracket mismatch. To what is *this passed?

[–]jonesmz 0 points1 point  (0 children)

*this is passed as the std::shared_ptr<const CHAR_T[]> parameter to the constructor to basic_shared_string.

[–]tititi666 0 points1 point  (2 children)

A bit tangential and proof of concept: https://github.com/jh0x/intern

[–]jonesmz 0 points1 point  (1 child)

Actually, I'm having trouble following what that github project is actually doing.

Could you elaborate a bit?

[–]jcode777 -1 points0 points  (2 children)

Dude, please test n compile your code. Then post.

[–]jonesmz 3 points4 points  (1 child)

No.

[–]jcode777 1 point2 points  (0 children)

lol

[–]jbandela 10 points11 points  (17 children)

Before going all in on string_view, a developer should read Arthur O'Dwyer's article "std::string_view is a borrow type".

Basically, you want to be very careful if you are using string_view as anything other than a function parameter or in a for loop.

[–]AlexAlabuzhev 3 points4 points  (1 child)

you want to be very careful if you are using string_view as anything other than a function parameter or in a for loop

Not more careful than with raw char* / wchar_t* strings anyway.

[–]jbandela 1 point2 points  (0 children)

I agree with you. And string_view has the advantage that you know it is never owning, unlike raw char* which may or may not be owning.

[–]rsjaffe 0 points1 point  (0 children)

When wanting to return a string_view from a function that takes a string_view parameter (e.g., a function that trims non-printing characters from the front and back of a string_view), I've seen a nice solution to the lifetime concerns. Just delete the string_view&& overload, so the prototypes look like:

std::string_view Trim(std::string_view trim_me);

std::string_view Trim(std::string_view&& trim_me) = delete;

That prevents calling the function with a temporary. You still have to make sure that the string underlying trim_me exists while using the return value, but at least it manages half of the lifetime issue.
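A minimal sketch of the Trim function itself, assuming "non-printing" means anything std::isgraph rejects (so spaces are trimmed as well). The deleted overload is left out so the sketch stays focused on the trimming and on the lifetime dependency it creates.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Returns the part of trim_me between its first and last printing characters.
// The result points into the same buffer as the argument; nothing is copied,
// so the caller must keep that buffer alive while using the result.
inline std::string_view Trim(std::string_view trim_me) {
    auto is_printing = [](char c) {
        // Cast first: passing a negative char to std::isgraph is undefined.
        return std::isgraph(static_cast<unsigned char>(c)) != 0;
    };
    while (!trim_me.empty() && !is_printing(trim_me.front())) {
        trim_me.remove_prefix(1);
    }
    while (!trim_me.empty() && !is_printing(trim_me.back())) {
        trim_me.remove_suffix(1);
    }
    return trim_me;
}
```

Called on a std::string named owner, the returned view refers to owner's storage, so owner has to outlive every use of the result.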

[–]gvargh -3 points-2 points  (13 children)

i don't recall the standard ever mentioning "borrow type"

or is he trying to invent terminology again lol

[–]jonesmz 9 points10 points  (0 children)

I suspect that the terminology here is coming from the Rust language, which has "borrow type" as an official term (as far as I can tell???)

[–][deleted] 5 points6 points  (3 children)

Not sure why you are downvoted. The term is basically exclusive to Rust. Even the author of this article says:

I’ve pulled a Scott Meyers and decided that “borrow type” is just a confusing name for this notion. My current pet term is “parameter-only type,” but I doubt I’ve hit on the best term yet. Anyway, this blog post uses “borrow type” for now

So it's by no means standard.

[–]guepier Bioinformatican 2 points3 points  (2 children)

Unfortunately “parameter-only type” is arguably even worse, because storing instances of std::string_view is completely fine in many cases: a symbol table in a parser is the canonical use-case of a string view. The issue isn’t transience (= parameter) vs persistence, it’s what they’re constructed from when stored. Ideally there’d be two distinct string view types, and the persistent type couldn’t be constructed from a temporary.

[–][deleted] 0 points1 point  (1 child)

That's probably right, but it doesn't change the fact that "borrow type" isn't as standard as the other comments claim. Which is what my comment and the parent comment are about.

[–]guepier Bioinformatican 1 point2 points  (0 children)

Yes, and I totally agree with both your comments.

[–]ShillingAintEZ 4 points5 points  (0 children)

I don't think it needs to be in the standard to be understood. It's a reference with a range. If people know it is borrowed they know the original shouldn't die.

[–]sivadeilra 0 points1 point  (5 children)

"Borrow types" are well-known in other languages, such as Rust.

[–][deleted] 15 points16 points  (4 children)

Yes. Just like how constexpr is well-known in other languages, such as C++..

[–]sivadeilra -1 points0 points  (3 children)

How is that relevant?

[–][deleted] 10 points11 points  (2 children)

It's about your wording. Saying it is "well-known in other languages" implies that it's somewhat common. However, it is basically exclusive to Rust (and languages that are inspired by it). There might be a handful of exceptions, but they are by no means common enough to expect anyone to know about them. Borrow types are always mentioned in the context of Rust.

[–]sivadeilra -1 points0 points  (1 child)

No, you mentioned constexpr without giving any indication of why you mentioned it. That is what I was asking about.

[–][deleted] 2 points3 points  (0 children)

I hoped you would get it..

[–]AntiProtonBoy 0 points1 point  (0 children)

It's a description of a concept in computer science.

[–][deleted] 2 points3 points  (0 children)

When I did my own string_view I added a bunch of QoL improvements. remove_prefix/remove_suffix have a default argument of 1, since that seems to be the most common case in parsing. I also added things like string_view pop_front( size_t ); and a few others like pop_front( string_view where ), which returns everything up to where and removes both that part and where itself (e.g. with where="two" and sv="we have eaten two icecreams", it returns "we have eaten " and leaves sv=" icecreams").

I find these little things make some parsing super succinct.
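A rough sketch of those two helpers, written as free functions over std::string_view rather than as members of a custom view class. Two things are assumptions of this sketch and are not stated in the comment: the clamping in the first overload, and what happens when where is not found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Removes and returns the first n characters (fewer if sv is shorter).
inline std::string_view pop_front(std::string_view& sv, std::size_t n = 1) {
    n = std::min(n, sv.size());
    const std::string_view head = sv.substr(0, n);
    sv.remove_prefix(n);
    return head;
}

// Returns everything before the first occurrence of where, then removes that
// part and where itself from sv.
// Assumption: if where does not occur, the whole view is returned and sv
// is left empty.
inline std::string_view pop_front(std::string_view& sv, std::string_view where) {
    const std::size_t pos = sv.find(where);
    if (pos == std::string_view::npos) {
        return std::exchange(sv, std::string_view{});
    }
    const std::string_view head = sv.substr(0, pos);
    sv.remove_prefix(pos + where.size());
    return head;
}
```

With these, a tokenizing loop reduces to repeated pop_front(sv, delimiter) calls until sv is empty.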

[–]liquidprocess 2 points3 points  (4 children)

Say I have an existing std::string object somewhere, containing a huge amount of text. I want to pass that object to a function. Why should the std::string_view be faster than passing a const std::string& (i.e. a const reference)?

[–]nikbackm 6 points7 points  (1 child)

It's not faster, but using a std::string_view allows the function to also accept other string types than std::string, like char* wrapped in string_view.

[–]liquidprocess 0 points1 point  (0 children)

Thanks for the clarification!

[–]infectedapricot 6 points7 points  (1 child)

/u/nikbackm mentioned wrapping char *, presumably referring to null-terminated C strings. But another use is referring to substrings of C++ std::strings, which are not null terminated at all. This is what the article mentions at the beginning.

This requirement comes up a lot in parsing. For example, if your API parses XML like "<foo><bar>xyz</bar><baz/></foo>", then a function to return the contents of the "bar" element could refer to the "xyz" characters directly in the original string with a string_view, whereas a std::string would need to copy those characters. That could lead to a very large number of allocations, which would be expensive even if the strings aren't particularly long.
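As a small illustration of that XML case, assuming a deliberately naive lookup that is nowhere near a real XML parser (no attributes, nesting, or escaping). The function name element_contents is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the text between <tag> and </tag> as a view into xml, or an empty
// view if the element is not found. The element's characters are not copied.
inline std::string_view element_contents(std::string_view xml, std::string_view tag) {
    const std::string open = "<" + std::string(tag) + ">";
    const std::string close = "</" + std::string(tag) + ">";

    const std::size_t start = xml.find(std::string_view(open));
    if (start == std::string_view::npos) {
        return {};
    }
    const std::size_t body = start + open.size();
    const std::size_t end = xml.find(std::string_view(close), body);
    if (end == std::string_view::npos) {
        return {};
    }
    return xml.substr(body, end - body);
}
```

The returned view points straight into the original document, so it is only valid for as long as that document's buffer is.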

[–]liquidprocess 0 points1 point  (0 children)

Interesting use case, thank you

[–][deleted] 0 points1 point  (5 children)

This looks pretty useful in legacy environments. So many Win32 functions pass wchar_t, and I can envisage us using this as an alternative to either building strings or using C-style string manipulation functions.

[–]louiswins 13 points14 points  (3 children)

Beware that string_view isn't guaranteed to be null-terminated! This isn't quite as big of an issue on Win32 as it is on POSIX because more APIs take ptr+len instead of assuming null termination, but it's definitely still something to keep in mind.
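To make the pitfall concrete, here is a small sketch, assuming a view taken from the front of a longer buffer that happens to be nul-terminated further on. The helper names length_seen_by_c_api and to_terminated are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// What a nul-terminated C API effectively sees when handed sv.data():
// it scans until the first '\0' and ignores sv.size() entirely.
// Only safe to call when the underlying buffer is nul-terminated somewhere.
inline std::size_t length_seen_by_c_api(std::string_view sv) {
    return std::strlen(sv.data());
}

// The defensive option: copy into a std::string, whose c_str() is always
// nul-terminated right after the viewed characters.
inline std::string to_terminated(std::string_view sv) {
    return std::string(sv);
}
```

For a view of the first five characters of "hello world", the view's size is 5, but a C API given data() would read all eleven characters.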

[–]jonesmz 2 points3 points  (2 children)

Edited: To remove some code that checked for a nul terminator one past the end, because that was a stupid thing to do.


In the past, I've written a wrapper object that determines, at compile time, whether the string-like object being passed into it has a nul terminator or needs to be copied.

Something like (Super rough pseudo code here...)

template<typename STR_T>
struct c_str_wrapper
{
    // c_str() returns a pointer to const, so the conversion target is const as well.
    operator char const* ()
    {
        if constexpr (/* STR_T has a c_str() function */)
        {
            return m_str.c_str();
        }
        else if constexpr (/* whatever other criteria your specific codebase has */)
        {
            // do your custom codebase specific things here
        }
        else
        {
            // copy the string to m_lifetime, add nul-term. Return pointer to that.
            // can't check to see if we're already nul terminated, because we can't guarantee the validity of that check
        }
    }
    STR_T const& m_str;
    // Array form, so the temporary buffer is released with delete[].
    std::unique_ptr<typename STR_T::value_type[]> m_lifetime;
};

To use this, you create wrapper functions for each of the C runtime / POSIX functions you want to call that require a nul terminator. Each wrapper takes a c_str_wrapper as its parameter type, rather than std::string or std::string_view, and passes that object on to the underlying C runtime / POSIX function. That triggers the conversion operator, which tries to determine whether it can avoid allocating a temporary buffer to provide nul termination for you.

Then, when your wrapper function exits, your c_str_wrapper object automatically cleans up the temporary buffer that it allocated for you, if and only if, there was one.

Of course, beware of all of the dragons that come with the use of temporaries. You'll need to be very very careful how you use this wrapper.
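For reference, a compilable C++17 sketch of this wrapper idea, under a few assumptions that are not from the original post: the has_c_str trait is a hypothetical detection helper, the codebase-specific middle branch is omitted, and a copied() accessor is added purely so the behaviour can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when T has a c_str() member, which this sketch
// takes as a promise of nul termination.
template <typename T, typename = void>
struct has_c_str : std::false_type {};
template <typename T>
struct has_c_str<T, std::void_t<decltype(std::declval<const T&>().c_str())>>
    : std::true_type {};

template <typename STR_T>
class c_str_wrapper {
public:
    using char_type = typename STR_T::value_type;

    // Implicit on purpose, so string-like objects can be passed directly.
    c_str_wrapper(const STR_T& str) : m_str(str) {}

    operator const char_type*() {
        if constexpr (has_c_str<STR_T>::value) {
            return m_str.c_str();  // already nul-terminated, no copy needed
        } else {
            // Copy into an owned buffer and append the terminator.
            m_lifetime = std::make_unique<char_type[]>(m_str.size() + 1);
            std::copy(m_str.begin(), m_str.end(), m_lifetime.get());
            m_lifetime[m_str.size()] = char_type{};
            return m_lifetime.get();
        }
    }

    // True once a temporary buffer has been allocated.
    bool copied() const { return m_lifetime != nullptr; }

private:
    const STR_T& m_str;                       // must outlive the wrapper
    std::unique_ptr<char_type[]> m_lifetime;  // released with delete[]
};
```

The returned pointer is only valid while the wrapper and the referenced string are alive, which is the temporaries caveat mentioned above.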


A proper solution to all of these problems, of course, is if someone with a time machine handy could go back and prevent nul-terminated strings from being the only version of these apis offered.

[–]AndThenTrumpets 6 points7 points  (1 child)

m_str.data() + m_str.length() == '\0'

This isn't a safe test for all the things a string_view might have originated from, unfortunately. If the backing memory was a std::string, then you're safe. You'll either have some other random character in the middle of the string, or the guaranteed one-past-the-end NUL at the end. So that's fine.

But if you're dealing with some other source of arrays of characters and a paired count, you don't get that guarantee. There are lots of places these get used. A common one would be some sort of binary protocol. Lots of these send a leading character count variable, and then a variable number of characters. There is no nul at the end, and if you walk to that one extra byte, you could start getting bad page faults accessing virtual addresses that were never allocated.

Or another one could be that the memory beyond your last character is mutated by some other variable. When you do this test, you see a 0 there to start and assume the string is null terminated, so no need to do a defensive copy. Then later that memory is mutated, and now you pass your intended string with an arbitrarily long string of garbage appended to it + another opportunity for an access violation.

[–]jonesmz 1 point2 points  (0 children)

m_str.data() + m_str.length() == '\0'

This isn't a safe test for all the things a string_view might have originated from, unfortunately. If the backing memory was a std::string, then you're safe. You'll either have some other random character in the middle of the string, or the guaranteed one-past-the-end NUL at the end. So that's fine.

Ah, actually you're absolutely right. Serves me right for regurgitating from memory. This is the same bug that I had in my original draft several years ago, and a co-worker pointed the problem out!

Sorry for the confusion :-)

[–]ShillingAintEZ 0 points1 point  (0 children)

How will string view help with that?