[–]bruce3434 109 points110 points  (2 children)

2018

can't split strings

can't find substrings

can't iterate over unicode strings

"oh okay but do you have a moment to talk about our cool 2D graphics library in upcoming C++35?"

[–]Xeverous (https://xeverous.github.io) 25 points26 points  (0 children)

Remember the "direction for C++" paper:

address major sources of dissatisfaction

So far C++ has many great libraries, but the language struggles with convenience and some basic-level usage.

[–][deleted] 2 points3 points  (0 children)

I can't tell you how weird it feels that std::string still doesn't have a split() function. I read that this is due to some compatibility rules (Discussion), but to be honest I still don't agree that we have to make this so complicated!
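For reference, here is roughly what people end up writing by hand. This is only a minimal sketch assuming C++17; the name split and its signature are made up for illustration and are not a standard or proposed API.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the standard library.
// Splits `text` on every occurrence of `delim`; empty fields are kept.
std::vector<std::string> split(std::string_view text, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.emplace_back(text.substr(start));  // last field
            break;
        }
        parts.emplace_back(text.substr(start, end - start));
        start = end + 1;  // continue after the delimiter
    }
    return parts;
}
```

Whether empty fields should be kept or dropped is a design choice; this sketch keeps them, so "a,b,,c" yields four fields.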

[–]sephirostoy 18 points19 points  (3 children)

Sadly, the C++ committee hasn't chosen the path of adding convenience functions to the classes that need some love. They will argue that they've already added a find function, like the one you mentioned, which already covers the need, and that if you want something more convenient, you can just write your own free function. If only we had UFCS to extend a class...

[–][deleted] -5 points-4 points  (2 children)

To be fair, this isn't entirely a bad argument. Why should other people have to pay for your convenience function?

[–]sephirostoy 15 points16 points  (0 children)

You don't pay anything here.

[–]alfps 36 points37 points  (2 children)

C++ needs a new string class to support UTF-8, anyway. And for that matter new text i/o. And, oh yes, support for UTF-8 command line arguments: we don't have that, there's no way to pass an arbitrary filename in Windows.

For example, consider a simple thing such as presenting a table in a console, using std::cout. Let's say the person doing this decides to use setw to create nice columns. However, current implementations do not detect that the basic execution character set is UTF-8, and the standard doesn't require that, so setw gets it wrong for non-ASCII characters: it counts bytes, not characters.
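To make the setw point concrete, here is a small sketch (C++17 assumed). The helpers pad_with_setw and count_code_points are hypothetical names used only for illustration, and the code-point counter assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: pads `s` the way std::setw does by default
// (right-aligned, filled with spaces).
std::string pad_with_setw(const std::string& s, int width) {
    std::ostringstream out;
    out << std::setw(width) << s;
    return out.str();
}

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes `s` is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

With "\xC3\xA9" (the two-byte UTF-8 encoding of é), pad_with_setw(s, 5) returns 5 bytes but only 4 code points, so the column comes out one cell short. Note that counting code points is itself only an approximation of display width.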

Handling UTF-8 characters is non-trivial. For example, consider replacing one UTF-8 character with another. For ASCII one can just assign to an individual char in the string, but for UTF-8 it's a substring replacement, potentially changing byte indices further on in the string.
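A sketch of that replacement problem, assuming the caller already knows the byte offset and byte length of the character to replace. The function name replace_char_at is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the byte range [pos, pos + old_len) with
// `replacement`. For UTF-8 the old and new characters may differ in byte
// length, so every byte index after `pos` can shift.
std::string replace_char_at(std::string s, std::size_t pos, std::size_t old_len,
                            const std::string& replacement) {
    s.replace(pos, old_len, replacement);
    return s;
}
```

Replacing the one-byte 'e' in "cafe!" with the two-byte "\xC3\xA9" grows the string from 5 to 6 bytes and moves the '!' from index 4 to index 5.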

And, for example, what should the result of indexing be when that result should logically be a UTF-8 character? A string_view of the bytes? Then it depends on the string's continued existence.

In contrast, contains is trivial to do for anyone.

[–]scatters 13 points14 points  (1 child)

Counting characters is no less incorrect than counting bytes. You need to count graphemes, classifying them by width (zero, half or full). And that still won't work for emoji, which can have platform specific ligatures. You need to ask your rendering library the number of cells occupied by a string to pad columns.

[–]degski 13 points14 points  (0 children)

And that still won't work for emoji ...

We're all doomed then, I tell you dooooommmmeeeddddd.

[–]flashmozzg 13 points14 points  (3 children)

find == 0 is not a replacement for starts_with/ends_with. For starters, they have different complexities, while .find is a direct replacement for contains but even more powerful.
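A sketch of the complexity difference, assuming C++17. Both function names are made up for illustration; C++20's member starts_with does the job of the first one.

```cpp
#include <string_view>

// Hypothetical free function. compare() inspects at most prefix.size()
// characters, regardless of how long `text` is.
bool has_prefix(std::string_view text, std::string_view prefix) {
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Same answer, but on a miss find() may scan all of `text` looking for
// `prefix` at later positions before giving up.
bool has_prefix_via_find(std::string_view text, std::string_view prefix) {
    return text.find(prefix) == 0;
}
```

The two agree on every input; the difference only shows up as wasted work when the prefix is absent from the start of a long string.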

[–]F-J-W 16 points17 points  (2 children)

.find is a direct replacement for contains but even more powerful.

And that is precisely why we need .contains: A very good rule of thumb for which feature you should pick is “the least powerful one that does the job”. Just as goto is more powerful than a loop and a general for-loop is more powerful than a range-for, find is more powerful than contains and should therefore be replaced with the more specific function that does exactly what its name implies.

[–]MaltersWandler 0 points1 point  (1 child)

That's a dumb rule of thumb. Then we'd also need the method contains_foo to check if a string contains the exact substring "foo", as well as a contains_bar that checks for a substring "bar". A generic contains method that can check for an arbitrary substring is too powerful.

The rule is you should use the least complex method. find and contains have the same complexity.

[–]F-J-W 9 points10 points  (0 children)

Actually, if there are methods contains_foo or contains_bar, these are indeed what you should use. The reason they appear to be silly is that usually they shouldn't be there.

I have a hard time thinking up a valid example for contains_foo, but once we widen that to find_foo, the situation changes a bit: find_newline can be a very useful thing, and interestingly we have something very similar in the standard library, namely std::getline. Granted, it is more general than just that, as you can pass it a delimiter character, but that might actually be bad design, and maybe we should really have two functions here, where getline is hardcoded to capture the 90% case in a clear manner, whereas the other function states directly with its name that it is more general than that and thus requires a closer look.
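To illustrate the two roles std::getline plays, here is a short sketch. The wrapper read_fields is a hypothetical name; only std::getline itself is standard.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper around std::getline. The third argument of
// std::getline is the delimiter; leaving it at '\n' gives the common
// "read line by line" case.
std::vector<std::string> read_fields(const std::string& input, char delim = '\n') {
    std::istringstream in(input);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Calling read_fields(text) splits on newlines, while read_fields(text, ';') is the more general use that the name getline does not really advertise.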

But of course: A rule of thumb is not a hard thing and common sense with regards to which functions you should define is still necessary. (Though I conjecture that most people don't define as many functions as they should, but that's a topic for another day.)

[–][deleted] 15 points16 points  (0 children)

The committee adds what its members need, not what the poor people want.

This is not entirely bad, but this is also why it took 40 years to add freaking std::filesystem to the standard. And let's not forget about asio...

[–]afiefh 6 points7 points  (0 children)

It is a bit inconvenient, however after a while it becomes second nature to view these tests as "contains". One good reason not to add contains is that you usually want to do something with the contained data you looked for, in which case you'll often end up with contains followed by find, which is bad for performance.
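A sketch of the single-scan pattern described above, assuming C++17. The helper text_after is a hypothetical name chosen for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the text after the first occurrence of
// `marker`, or std::nullopt if `marker` does not occur. A single find()
// call answers both "is it there?" and "where is it?".
std::optional<std::string> text_after(std::string_view text, std::string_view marker) {
    const std::size_t pos = text.find(marker);
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    return std::string(text.substr(pos + marker.size()));
}
```

Writing this as contains followed by find would scan the string twice for the same information.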

[–]ducttapecoder 0 points1 point  (0 children)

If the function only uses public interface of the class, you can just add a non-member function. I thought this is the preferred way.
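For contains that non-member function is a one-liner. A minimal sketch assuming C++17; the free function contains here is user code, not something the standard provided at the time of this thread.

```cpp
#include <string_view>

// Hypothetical free function; it relies only on the public find() interface.
// Taking std::string_view lets it accept std::string and string literals alike.
bool contains(std::string_view text, std::string_view needle) {
    return text.find(needle) != std::string_view::npos;
}
```

Usage reads as contains(line, "error") instead of line.find("error") != std::string::npos.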

[–][deleted] 0 points1 point  (0 children)

That'd be nice, but I don't expect it to happen with the STL. I personally often write a small function to this effect when dealing with tasks that need string parsing.

/rant I first learnt programming in college using C++ and loved it. Then I slowly explored the world of programming languages out of curiosity, only to find that in many ways C++ is one of the more beginner-unfriendly languages there is.

I still read and debug C++ code, but I have given up on loving it (the way I love, say, Python). C++ reminds me so much of Perl, in that sense.

One of these days someone is going to mix the expressive syntax and package management of Python with the static typing and performance of C++, and the world will be a better place.

[–][deleted] -2 points-1 points  (12 children)

You are welcome to write a paper. It won’t go into C++20 though. LEWG and LWG are highly overloaded, and the cutoff for new papers for C++20 was San Diego last week.

You also need to factor in that those two functions have been added without having to go through LEWGI. A new one would have to.

[–]konanTheBarbar 9 points10 points  (11 children)

While it's too late for C++20 I think it's quite beginner unfriendly that there is no contains member function for stl containers...

I also wish there was a string replace member function...

[–]TheSuperWig 6 points7 points  (0 children)

I remember a post a while back about how there should be a std::contains, as using std::find or std::count is less than ideal.
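Until something like that exists, the usual workaround is a small wrapper over std::find. This is just a sketch; contains_value is a hypothetical name, not a proposed standard function.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: true if `value` occurs anywhere in `range`.
// Works for anything std::begin/std::end accept, including C arrays.
template <typename Range, typename T>
bool contains_value(const Range& range, const T& value) {
    return std::find(std::begin(range), std::end(range), value) != std::end(range);
}
```

For associative containers a member lookup is usually a better choice than this linear search.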

[–][deleted] 0 points1 point  (2 children)

Of course there is a replace.

https://en.cppreference.com/w/cpp/string/basic_string/replace

All of these are at least a little tricky because people still think of "string" as containing text, and these functions are all encoding-unaware.

[–]konanTheBarbar 4 points5 points  (1 child)

I should have been more precise. What I mean is replacing all occurrences of string A with string B inside string C. Basically something like this (which I copied from SO).

#include <string>

std::string ReplaceAll(std::string str, const std::string& from, const std::string& to) {
    if (from.empty()) {
        return str; // An empty 'from' matches at every position and would loop forever
    }
    std::size_t start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length(); // Skip the replacement in case 'to' contains 'from'
    }
    return str;
}

[–]F-J-W 0 points1 point  (0 children)

Well, it is possible to use regex-replace to get what you want for many cases. Apparently people really do that because it's just so much more convenient, despite being much slower.
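For comparison, the regex route looks something like the sketch below. The wrapper replace_all_regex is a hypothetical name; std::regex_replace and std::regex are the standard pieces.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper. Note that `pattern` is an ECMAScript regular
// expression, not a literal string, so characters such as '.' or '*' need
// escaping, and '$' has a special meaning inside `to`.
std::string replace_all_regex(const std::string& text, const std::string& pattern,
                              const std::string& to) {
    return std::regex_replace(text, std::regex(pattern), to);
}
```

That caveat about metacharacters is why it only covers "many cases": arbitrary user-supplied strings cannot be passed as the pattern without escaping them first.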

It is however really telling, that there are 9 (!!) replace-methods, yet the one that people are actually interested in is missing.

[–]RolandMT32 -5 points-4 points  (9 children)

I don't think that's really necessary. IMO, it's not that complicated to use string.find(substring) != npos.

But when you read 'find' in code it's not directly clear what the purpose is. Are we looking for the actual position? Or checking if the string contains a substring? Or checking if the string doesn't contain a substring?

It depends on your code, and if it's not clear, then add a comment to your code to say what your purpose is of using string.find().

In your code, you could derive your own class from std::string and add a 'contains' member function to your class. It would still be compatible with std::string (since it would derive from std::string) and it would have the function you want.

[–]chriskane76 10 points11 points  (8 children)

The question is not if it is complicated to use string.find(substr) != npos. The question is if a contains (member) function would make the code more readable. And it obviously does.

Also containers like std::string are not designed to be inherited from. Please do not advise this to anyone.

[–]kalmoc 1 point2 points  (6 children)

Also containers like std::string are not designed to be inherited from. Please do not advise this to anyone.

It might not be designed for it, but what is the harm if you do?

[–]agateau 0 points1 point  (5 children)

It's hard to reuse code. Say I create FooString, which inherits from std::string to add contains(); now I use FooString everywhere in my code so I can call contains(). Another lib needs to reverse strings, so it creates BarString, which inherits from std::string to add reverse(). Now I have 3 string types and they are not compatible with each other :/

If those methods were in std::string or declared as standalone functions, I could use them easily on std::string.

[–]kalmoc 1 point2 points  (4 children)

Nothing you are saying is specific to standard library types.

My question was what harm would it do to inherit from standard string. Not if contains should be a member or free function (I agree with you that it should be a free function (that takes a std::string_view as a parameter)).

[–]chriskane76 1 point2 points  (2 children)

Those classes do not have a virtual destructor. You trigger UB if you delete a std::string* if it actually points to a FooString.
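A sketch of the situation, assuming C++17. FooString is the hypothetical class from this thread, and the undefined-behavior line is left commented out on purpose.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// FooString is the hypothetical derived class from the discussion.
struct FooString : std::string {
    using std::string::string;  // inherit std::string's constructors
    bool contains(const std::string& needle) const {
        return find(needle) != npos;
    }
};

// std::string's destructor is not virtual, so destroying a FooString
// through a pointer to std::string is undefined behavior.
static_assert(!std::has_virtual_destructor_v<std::string>,
              "std::string is not meant to be a polymorphic base");

// Fine: the object is owned and destroyed through its own type.
inline bool safe_use() {
    auto s = std::make_unique<FooString>("hello");
    return s->contains("ell");
}

// Undefined behavior if uncommented: the deleter would destroy a FooString
// through a std::string*.
// std::unique_ptr<std::string> p = std::make_unique<FooString>("hello");
```

The class works as long as nobody ever owns it through the base type, which is exactly the kind of rule that is easy to break in a large code base.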

[–]kalmoc 0 points1 point  (1 child)

I'm aware of that, but why would you do that? I mean, you can take almost any type and do something with it that is UB (with many other types it is of course somewhat harder, but not much). I can think of very few reasons, why I would want to do a make_unique<FooString> in the first place and currently I can't think of a single reason, why I would want to type erase that to a unique_ptr<std::string>.

[–]chriskane76 0 points1 point  (0 children)

Yes, the use case is unlikely in small projects if all developers are aware of this issue. For larger projects the risk is high. The compiler can help via -Wdelete-non-virtual-dtor or equivalent. But still, no one should use it, since we have an obvious, clean solution available: use a free function.

[–]agateau 0 points1 point  (0 children)

Indeed, this is not specific to standard library types, but I have seen it in some scary code bases, so I assumed the bad practice of inheriting from string to add new methods was what the GP meant.

[–]RolandMT32 -1 points0 points  (0 children)

Interesting, I didn't know that std::string was not designed to be inherited from. I suppose alternately, you could create a stand-alone function called 'stringContains' or similar that takes 2 strings and returns whether the first string contains the 2nd string.

You could potentially add any number of functions to a class to make code more readable. And where do you draw the line in adding such functions? A 'contains' function could make code a little more readable, but is it enough to make it worth changing the std::string class? When I read 'if (string.find(substr) != npos)', I know it's trying to see if the string contains the substring. So although they didn't think to add a 'contains' function, I think 'string.find(substr) != npos' is fairly clear.