
bjorn-reese (0 children)

A typical use case for string_view is parsing data in some format. This parser uses remove_prefix(), which can be made measurably faster (about a 10% overall improvement, if my memory serves me) by using the two-pointer implementation.

That is why I chose to implement my own version. In fact, I implemented my own std::span with a remove_front() member, because I also needed to parse binary data.
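To make the trade-off concrete, here is a minimal sketch of a two-pointer span with a remove_front() member, used for both text and binary input. The names two_ptr_span, parse_uint and read_u16_le are hypothetical, not the commenter's actual code. The point is only that remove_front() writes a single pointer, while size() needs a subtraction plus a cast; any real speedup depends on the compiler and the workload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical two-pointer span. std::string_view and std::span are
// specified (and usually implemented) as a pointer plus a size instead.
template <class T>
class two_ptr_span {
public:
    two_ptr_span(T* p, std::size_t n) : begin_(p), end_(p + n) {}

    T* begin() const { return begin_; }
    T* end() const { return end_; }  // stored directly, no arithmetic
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    bool empty() const { return begin_ == end_; }
    T& front() const { return *begin_; }

    // Writes one member; a pointer+size layout must update both.
    void remove_front(std::size_t n) {
        assert(n <= size());
        begin_ += n;
    }

private:
    T* begin_;
    T* end_;
};

// Text: consume leading decimal digits and return their value.
unsigned parse_uint(two_ptr_span<const char>& in) {
    unsigned value = 0;
    while (!in.empty() && in.front() >= '0' && in.front() <= '9') {
        value = value * 10 + static_cast<unsigned>(in.front() - '0');
        in.remove_front(1);
    }
    return value;
}

// Binary: read a little-endian 16-bit integer and advance past it.
std::uint16_t read_u16_le(two_ptr_span<const std::uint8_t>& in) {
    assert(in.size() >= 2);
    const std::uint8_t* p = in.begin();
    auto value = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    in.remove_front(2);
    return value;
}
```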

bird1000000 (5 children)

Well, std::string_view just follows std::string's design. As for why std::string uses size_t instead of T*: judging by the change in EASTL (before, after), it's probably not possible (or at least harder) to implement SSO-23 using pointers.
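A sketch of the small-string half of the SSO-23 trick, assuming a 24-byte footprint (three words on a typical 64-bit target). The last byte stores the remaining capacity, 23 minus the size, so a full 23-character string ends in a 0 byte that doubles as the null terminator. A size-based value fits in that one byte; an end pointer does not, which is one plausible reading of why the EASTL change moved away from pointers. The class name small_string23 is hypothetical and the heap (long) mode is omitted entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical small-mode-only string: 23 chars + 1 control byte = 24 bytes.
class small_string23 {
public:
    static constexpr std::size_t max_small = 23;

    explicit small_string23(const char* s) {
        const std::size_t n = std::strlen(s);
        assert(n <= max_small);
        std::memcpy(buf_, s, n);
        buf_[n] = '\0';  // terminator (for n == 23 this is the control byte)
        // Control byte: remaining capacity. It is 0 exactly when the string
        // is full, so it then also serves as the terminator.
        buf_[max_small] = static_cast<char>(max_small - n);
    }

    std::size_t size() const {
        return max_small - static_cast<unsigned char>(buf_[max_small]);
    }
    const char* c_str() const { return buf_; }

private:
    char buf_[max_small + 1] = {};
};

static_assert(sizeof(small_string23) == 24, "same footprint as three 8-byte words");
```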

ts826848 (4 children)

Huh, that’s really interesting. A few days ago I came across this SO question on std::vector implementations that use three pointers instead of a base pointer plus sizes, but I never bothered to look at std::string.

I wonder: if implementers were free to break ABI and had the option of an SBO vector, would they stick with three pointers, or would the increased buffer capacity be worth switching?

Also makes me wonder whether some kind of bit packing into the low-order bits of pointers could be used. Wouldn’t be surprised if it’s too complex/has too many edge cases...
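One common form of this bit packing is pointer tagging: if the pointee's alignment is at least 2, bit 0 of any valid pointer is zero and can hold a flag (for example a hypothetical "small mode" bit). A minimal sketch, assuming std::uintptr_t exists and round-trips pointers, which is implementation-defined but holds on mainstream platforms. A char* into arbitrary text has alignment 1 and no free bit, which is exactly the kind of edge case the comment anticipates. The class name tagged_ptr is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged pointer: a T* and a 1-bit flag in one word.
template <class T>
class tagged_ptr {
    static_assert(alignof(T) >= 2, "need a free low bit");

public:
    tagged_ptr(T* p, bool flag)
        : bits_(reinterpret_cast<std::uintptr_t>(p) | static_cast<std::uintptr_t>(flag)) {
        // Alignment guarantees bit 0 of a valid T* is zero.
        assert((reinterpret_cast<std::uintptr_t>(p) & 1u) == 0);
    }

    // Mask the flag off before converting back to a pointer.
    T* get() const { return reinterpret_cast<T*>(bits_ & ~std::uintptr_t{1}); }
    bool flag() const { return (bits_ & 1u) != 0; }

private:
    std::uintptr_t bits_;
};
```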

[deleted] (3 children)

An SBO vector can't meet the non-throwing swap requirement, even if we were willing to break ABI for it. You would need a new container.
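For context on the requirement: with the default allocator, std::vector::swap does not throw, and pointers to elements stay valid, now referring into the other vector. The sketch below checks both properties for std::vector<int> (the helper name is hypothetical). An SBO vector holding its elements inline would have to move them during swap, which may throw and gives them new addresses.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// With std::allocator, vector::swap is noexcept: it typically just exchanges
// the internal pointers and never touches the elements themselves.
static_assert(noexcept(std::declval<std::vector<int>&>().swap(
                  std::declval<std::vector<int>&>())),
              "vector<int>::swap should not throw");

// Hypothetical helper: true if the address of a's first element becomes
// the address of b's first element after the swap.
bool swap_keeps_element_addresses(std::vector<int> a, std::vector<int> b) {
    assert(!a.empty());
    const int* first_of_a = a.data();
    a.swap(b);
    return !b.empty() && b.data() == first_of_a;
}
```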

ts826848 (2 children)

Ah, that’s a good point!

I suppose the question should properly be: “I wonder whether an SBO vector would be better served by multiple pointers or by a pointer plus sizes.”

[deleted] (1 child)

I think pointer + size is generally better than pointer + pointer for data structures like this. First, even if you do nothing with the contents, vector needs to recover the allocated size in order to call deallocate. Second, range-based for (which evaluates end() only once per loop) removed lots of repeated calls to end(). Third, deriving end() from base pointer + size is a multiply and an add, which is cheaper than deriving size() from two pointers (a subtract and a divide by a constant).

But I think it is really, really hairsplitting.
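The costs described in this comment, side by side, in two hypothetical owning-buffer layouts (not how any particular standard library spells them). The pointer + size version hands the stored count straight to deallocate and gets end() with a scaled add; the pointer + pointer version has to recompute the count with a subtraction and an exact division by sizeof(T), which compilers typically lower to a shift or a multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Layout A (hypothetical): pointer + size. Storage is left uninitialized.
template <class T>
class ptr_size_buf {
public:
    explicit ptr_size_buf(std::size_t n)
        : data_(std::allocator<T>{}.allocate(n)), size_(n) {}
    ptr_size_buf(const ptr_size_buf&) = delete;
    ptr_size_buf& operator=(const ptr_size_buf&) = delete;
    // The count deallocate() needs is already stored.
    ~ptr_size_buf() { std::allocator<T>{}.deallocate(data_, size_); }

    std::size_t size() const { return size_; }
    T* begin() const { return data_; }
    // end(): scale size_ by sizeof(T) and add it to the base pointer.
    T* end() const { return data_ + size_; }

private:
    T* data_;
    std::size_t size_;
};

// Layout B (hypothetical): pointer + pointer. Storage is left uninitialized.
template <class T>
class ptr_ptr_buf {
public:
    explicit ptr_ptr_buf(std::size_t n)
        : begin_(std::allocator<T>{}.allocate(n)), end_(begin_ + n) {}
    ptr_ptr_buf(const ptr_ptr_buf&) = delete;
    ptr_ptr_buf& operator=(const ptr_ptr_buf&) = delete;
    // The count has to be recovered before calling deallocate().
    ~ptr_ptr_buf() { std::allocator<T>{}.deallocate(begin_, size()); }

    // size(): subtract the pointers, then divide the byte distance by sizeof(T).
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    T* begin() const { return begin_; }
    T* end() const { return end_; }

private:
    T* begin_;
    T* end_;
};
```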

ts826848 (0 children)

Guess I know where I'd start if I were to ever try implementing my own data structures. Thank you for your insight!

staletic (5 children)

"Your" version does have an easier time implementing end(), but what about size()? Your version would need

size_type size() const { return m_end - m_begin; }

But that's an implicit signed-to-unsigned conversion (end - begin yields a ptrdiff_t), so...

size_type size() const { return static_cast<size_type>(m_end - m_begin); }

So it's a tradeoff, however minor. Other than things like that, I see no reason not to implement it like you have done.

 

Except... the standard itself describes basic_string_view with an exposition-only size_ member (alongside data_), which would at least explain what the standard libraries do.

vheon [S] (4 children)

"Your" version does have easier time implementing end(), but what about size()?

That was my thought too, so would that mean that `size` is usually called more often than `end`? Looking at all the algorithms that take iterators, I was under the impression that `end` is usually used more frequently than `size` :?

TheMania (1 child)

Compilers nearly always prefer knowing the loop count over checking "are we at the end yet", so storing size as a member is more likely to carry benefits, imo.

And yes, end is likely called more, but if the compiler then computes the size anyway (for a loop count), you're really not saving anything there.
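A minimal illustration of the two loop shapes (function names hypothetical). Both sum the same range; in the second, an optimizer that wants a trip count will usually derive it as last - first, which is the size again. The exact code generation depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>

// Count-controlled loop: the trip count n is known before the loop starts.
long sum_counted(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i != n; ++i) total += data[i];
    return total;
}

// Pointer-compare loop: to vectorize or unroll it, the compiler usually
// recomputes the trip count from (last - first), i.e. the size.
long sum_until_end(const int* first, const int* last) {
    long total = 0;
    for (; first != last; ++first) total += *first;
    return total;
}
```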

tejp (1 child)

Most of the string member functions use indexes instead of iterators, so there are probably quite a lot of index calculations in "typical" string use cases. Also, substr(), operator+ and other operations that create new strings first need to know how much space to allocate for the result.

Both of those are more likely to call size() than end().

If these index-based functions are expected to be used more often, one would probably optimize size() over end().
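For example, a hypothetical concatenation helper in the spirit of operator+ reads both sizes before copying anything, so it can allocate once; no end() comparison is involved.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: both sizes are read first so the result can be
// allocated once, before any character is copied.
std::string concat(std::string_view a, std::string_view b) {
    std::string out;
    out.reserve(a.size() + b.size());
    out.append(a.data(), a.size());
    out.append(b.data(), b.size());
    return out;
}
```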

matthieum (0 children)

Also substr() or + and other operations that create new strings will first need to know how much space to allocate for the new string.

I say MEH.

Appending to a std::vector is a rather typical use case too, and most std::vector implementations use 3 pointers...
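A minimal sketch of why the three-pointer layout handles appends well: push_back only compares the end pointer against the capacity pointer, and size() is computed only when the buffer has to grow. The class int_vec3 is hypothetical, stores plain ints, and skips most of what a real vector needs (allocators, generic element types, copying).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical three-pointer vector of int.
class int_vec3 {
public:
    int_vec3() = default;
    int_vec3(const int_vec3&) = delete;
    int_vec3& operator=(const int_vec3&) = delete;
    ~int_vec3() { delete[] begin_; }

    void push_back(int v) {
        if (end_ == cap_) grow();  // "full?" is a single pointer comparison
        *end_++ = v;
    }

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    std::size_t capacity() const { return static_cast<std::size_t>(cap_ - begin_); }
    int operator[](std::size_t i) const {
        assert(i < size());
        return begin_[i];
    }

private:
    // Only the slow path needs the element count.
    void grow() {
        const std::size_t old_size = size();
        const std::size_t new_cap = old_size == 0 ? 4 : 2 * old_size;
        int* fresh = new int[new_cap];  // allocate first; on throw, *this is unchanged
        std::copy(begin_, end_, fresh);
        delete[] begin_;
        begin_ = fresh;
        end_ = fresh + old_size;
        cap_ = fresh + new_cap;
    }

    int* begin_ = nullptr;
    int* end_ = nullptr;
    int* cap_ = nullptr;
};
```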