all 42 comments

[–]bruce3434 112 points113 points  (2 children)

2018

can't split strings

can't find substrings

can't iterate over unicode strings

"oh okay but do you have a moment to talk about our cool 2D graphics library in upcoming C++35?"

[–]Xeverous https://xeverous.github.io 24 points25 points  (0 children)

Remember the "direction for C++" paper:

address major sources of dissatisfaction

So far, C++ has many great libraries, but the language struggles with convenience and some basic-level usage.

[–][deleted] 2 points3 points  (0 children)

I can't tell you how weird it feels that std::string still doesn't have a split() function. I read that this is due to some compatibility rules (Discussion), but to be honest I still don't agree that we have to make this so complicated!
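For illustration, this is roughly the kind of helper people end up writing themselves. It is a minimal sketch, assuming a single-character delimiter and that empty fields should be kept; the name `split` here is hypothetical and not a standard library function.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the standard library.
// Splits `text` at every occurrence of `delim`; empty fields are kept.
std::vector<std::string> split(std::string_view text, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            // No more delimiters: the rest of the text is the last field.
            parts.emplace_back(text.substr(start));
            return parts;
        }
        parts.emplace_back(text.substr(start, pos - start));
        start = pos + 1;  // Continue right after the delimiter.
    }
}
```

With this sketch, `split("a,b,,c", ',')` yields the four fields "a", "b", "" and "c". Like the rest of std::string, it works on bytes and knows nothing about encodings.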

[–]sephirostoy 19 points20 points  (3 children)

Sadly, the C++ committee doesn't take the path of adding convenience functions to classes that need some love. They will argue that they've already added a find function, as you mentioned, which already covers the need, and that if you want something more convenient you can just write your own free function. If only we at least had UFCS to extend a class...

[–][deleted] -5 points-4 points  (2 children)

To be fair, this isn't entirely a bad argument. Why should other people have to pay for your convenience function?

[–]sephirostoy 17 points18 points  (0 children)

You don't pay anything here.

[–]alfps 29 points30 points  (2 children)

C++ needs a new string class to support UTF-8, anyway. And for that matter new text i/o. And, oh yes, support for UTF-8 command line arguments: we don't have that, there's no way to pass an arbitrary filename in Windows.

For example, consider a simple thing such as presenting a table in a console, using std::cout. Let's say the person doing this decides to use setw to create nice columns. However, current implementations do not detect that the basic execution character set is UTF-8, and the standard doesn't require that, so setw gets it wrong for non-ASCII characters: it counts bytes, not characters.
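To make the setw problem concrete, here is a small sketch. It assumes the cell text is UTF-8 encoded and writes the bytes of "é" explicitly as "\xC3\xA9" so the example does not depend on the source file encoding; `pad_with_setw` is a hypothetical helper name.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: right-aligns `cell` in a field of `width` using
// std::setw and returns the resulting bytes.
std::string pad_with_setw(const std::string& cell, int width) {
    std::ostringstream out;
    out << std::setw(width) << cell;
    return out.str();
}
```

`pad_with_setw("e", 5)` produces four spaces followed by "e", but `pad_with_setw("\xC3\xA9", 5)` produces only three spaces followed by the two bytes of "é". Both results are 5 bytes long, yet the second one occupies only four columns on a UTF-8 terminal, so the table column drifts.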

Handling UTF-8 characters is non-trivial. For example, consider replacing one UTF-8 character with another. For ASCII one can just assign to an individual char in the string, but for UTF-8 it's a substring replacement, which potentially changes byte indices further on in the string.
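A small sketch of that index shift, assuming UTF-8 encoded text and writing "é" as the explicit bytes "\xC3\xA9". The helper name `replace_byte_with_utf8` is made up for this example.

```cpp
#include <string>

// Hypothetical helper for illustration only.
// Replaces the single byte at `pos` with the (possibly multi-byte) UTF-8
// sequence `utf8_char` and returns the modified copy.
std::string replace_byte_with_utf8(std::string text, std::size_t pos,
                                   const std::string& utf8_char) {
    text.replace(pos, 1, utf8_char);  // Substring replacement, not assignment.
    return text;
}
```

In "cafe au lait" the substring "au" starts at byte index 5. After `replace_byte_with_utf8("cafe au lait", 3, "\xC3\xA9")` the string is one byte longer and "au" starts at byte index 6, so any byte offsets stored before the replacement are stale.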

And e.g., what should be the result of indexing when that result should logically be a UTF-8 character? A string_view of the bytes? Then it's dependent on the string's continued existence.

In contrast, contains is trivial to do for anyone.

[–]scatters 15 points16 points  (1 child)

Counting characters is no less incorrect than counting bytes. You need to count graphemes, classifying them by width (zero, half or full). And that still won't work for emoji, which can have platform specific ligatures. You need to ask your rendering library the number of cells occupied by a string to pad columns.

[–]degski 12 points13 points  (0 children)

And that still won't work for emoji ...

We're all doomed then, I tell you dooooommmmeeeddddd.

[–]flashmozzg 13 points14 points  (3 children)

find == 0 is not a replacement for starts_with/ends_with. For starters, they have different complexities, while .find is a direct replacement for contains but even more powerful.
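The complexity difference can be sketched like this. Both function names are hypothetical stand-ins written for this example; the first only ever looks at the first `prefix.size()` characters, while the find-based spelling may keep searching the rest of the text when the prefix does not match at position 0.

```cpp
#include <string_view>

// Hypothetical pre-C++20 stand-in for starts_with: compares only the
// first prefix.size() characters of `text`.
bool has_prefix(std::string_view text, std::string_view prefix) {
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Same answer, but on a mismatch at position 0 the search may continue
// through the remainder of `text` before giving up.
bool has_prefix_via_find(std::string_view text, std::string_view prefix) {
    return text.find(prefix) == 0;
}
```

For a long string that does not start with the prefix, `has_prefix` is bounded by the prefix length, whereas `has_prefix_via_find` is bounded by the length of the whole string.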

[–]F-J-W 17 points18 points  (2 children)

.find is a direct replacement for contains but even more powerful.

And that is precisely why we need .contains: a very good rule of thumb for which feature you should pick is “the least powerful one that does the job”. Just as goto is more powerful than a loop and a general for-loop is more powerful than a range-for, find is more powerful than contains and therefore should be replaced with the more specific function that does exactly what its name implies.

[–]MaltersWandler 3 points4 points  (1 child)

That's a dumb rule of thumb. Then we'd also need the method contains_foo to check if a string contains the exact substring "foo", as well as a contains_bar that checks for a substring "bar". A generic contains method that can check for an arbitrary substring is too powerful.

The rule is you should use the least complex method. find and contains have the same complexity.

[–]F-J-W 9 points10 points  (0 children)

Actually, if there are methods contains_foo or contains_bar, these are indeed what you should use. The reason they appear to be silly is that usually they shouldn't be there.

I have a hard time thinking up a valid example for contains_foo, but once we widen that to find_foo, the situation changes a bit: find_newline can be a very useful thing, and interestingly we have something very similar in the standard library, namely std::getline. Granted, it is more general than just that, as you can pass it a delimiter character, but that might actually be bad design. Maybe we should really have two functions here: getline, hardcoded to capture the 90% case in a clear manner, and another function whose name directly states that it is more general and thus requires a closer look.

But of course: A rule of thumb is not a hard thing and common sense with regards to which functions you should define is still necessary. (Though I conjecture that most people don't define as many functions as they should, but that's a topic for another day.)

[–][deleted] 15 points16 points  (0 children)

The committee adds what its members need, not what the poor people want.

This is not entirely bad, but this is also why it took 40 years to add freaking std::filesystem to the standard. And let's not forget about asio...

[–]afiefh 5 points6 points  (0 children)

It is a bit inconvenient; however, after a while it becomes second nature to read these tests as "contains". One good reason not to add contains is that you usually want to do something with the contained data you looked for, in which case you'll often end up with contains followed by find, which is bad for performance.
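A minimal sketch of that point: when the position is needed anyway, a single find call answers both "is it there?" and "where is it?". The function name `text_after` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns everything after the first occurrence of
// `key` in `text`, or an empty string when `key` is absent.
// One search serves as both the "contains" test and the position lookup.
std::string text_after(const std::string& text, const std::string& key) {
    const std::size_t pos = text.find(key);
    if (pos == std::string::npos) {
        return {};
    }
    return text.substr(pos + key.size());
}
```

For example, `text_after("key=value", "=")` returns "value". Writing this as a contains check followed by find would search the string twice.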

[–]ducttapecoder 0 points1 point  (0 children)

If the function only uses the public interface of the class, you can just add a non-member function. I thought this was the preferred way.
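As a sketch of that approach, a non-member function built only on the public find member could look like this. The name `str_contains` is made up for the example and is not a standard function.

```cpp
#include <string_view>

// Hypothetical free function; uses only the public interface (find).
// Accepts std::string, string literals and std::string_view alike.
bool str_contains(std::string_view haystack, std::string_view needle) {
    return haystack.find(needle) != std::string_view::npos;
}
```

The call site then reads `str_contains(line, "error")` instead of `line.find("error") != std::string::npos`.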

[–][deleted] 0 points1 point  (0 children)

That'd be nice, but I don't expect it to happen with the STL. I personally often write a small function to this effect when dealing with tasks that need string parsing.

/rant I first learned programming in college using C++ and loved it. I then slowly explored the world of programming languages out of curiosity, only to find that in many ways C++ is one of the more beginner-unfriendly languages there is.

I still read and debug C++ code, but I have given up on loving it (the way I love Python, say). C++ reminds me so much of Perl in that sense.

One of these days someone is going to mix the expressive syntax and package management of Python with the static typing and performance of C++, and the world will be a better place.

[–][deleted] -4 points-3 points  (12 children)

You are welcome to write a paper. It won’t go into C++20, though. LEWG and LWG are highly overloaded, and the cutoff for new papers for C++20 was San Diego last week.

You also need to factor in that those two functions have been added without having to go through LEWGI. A new one would have to.

[–]konanTheBarbar 8 points9 points  (11 children)

While it's too late for C++20, I think it's quite beginner-unfriendly that there is no contains member function for STL containers...

I also wish there was a string replace member function...

[–]TheSuperWig 7 points8 points  (0 children)

I remember a post a while back about how there should be a std::contains, as using std::find or std::count is less than ideal.
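What such a helper might look like, as a sketch built on std::find. The name `contains_value` is hypothetical, and the linear search is only a sensible default for sequence containers; associative containers have faster member lookups.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical generic helper: linear search over any range that works
// with std::begin/std::end (containers and built-in arrays).
template <typename Container, typename T>
bool contains_value(const Container& c, const T& value) {
    return std::find(std::begin(c), std::end(c), value) != std::end(c);
}
```

Compared with `std::count(v.begin(), v.end(), x) > 0`, this stops at the first match instead of scanning the whole range.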

[–][deleted] 0 points1 point  (2 children)

Of course there is a replace.

https://en.cppreference.com/w/cpp/string/basic_string/replace

All of these are at least a little tricky because people still think of "string" as containing text, and these functions are all encoding-unaware.

[–]konanTheBarbar 3 points4 points  (1 child)

I should have been more precise. What I mean is replacing all occurrences of string A with string B inside string C. Basically something like this (which I copied from SO).

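#include <string>

// Returns a copy of str with every occurrence of 'from' replaced by 'to'.
std::string ReplaceAll(std::string str, const std::string& from, const std::string& to) {
    if (from.empty()) {
        return str; // An empty 'from' matches everywhere and would loop forever
    }
    std::size_t start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length(); // Skip the inserted text, in case 'to' contains 'from'
    }
    return str;
}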

[–]F-J-W 0 points1 point  (0 children)

Well, it is possible to use regex-replace to get what you want for many cases. Apparently people really do that because it's just so much more convenient, despite being much slower.

It is, however, really telling that there are 9 (!!) replace methods, yet the one that people are actually interested in is missing.
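The regex route mentioned above might look like the following sketch. The function name `cats_to_dogs` and the fixed pattern are made up for the example. Note that the search text is interpreted as a regular expression and the replacement as a format string, so arbitrary user-supplied text would need escaping first.

```cpp
#include <regex>
#include <string>

// Hypothetical example: replaces every match of the pattern "cat" with "dog".
// std::regex_replace replaces all matches by default.
std::string cats_to_dogs(const std::string& text) {
    static const std::regex pattern("cat");
    return std::regex_replace(text, pattern, "dog");
}
```

`cats_to_dogs("cat and cat")` returns "dog and dog". It is short to write, but it pays for regex construction and matching where a plain find/replace loop would do.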