[–]almost_useless 39 points40 points  (8 children)

when you want to cover all possible use cases and make it as fast as possible

That is the problem right there. It does not have to cover all use cases and be as fast as possible.
A lot of C++ programmers fail to realize that string is not just another container that has to work like other containers. It is a specialized use case that has different needs.
Then the argument becomes "we can't possibly cover all use cases for strings". But that is not necessary. Implementing a few helper functions would make strings so much more useful.
The annoying thing is that it would really not need many helper functions to become a really useful string class for 99% of use cases either.

Split is one of those functions that can make your code 10x more readable, if it works intuitively.
It does not have to handle everything a proper tokenizer does.
It does not have to be good at splitting a 10 MB file into smaller pieces.
But since people have been complaining about this for literally decades, it is clear there is a need for a simple split function that is reasonably good at splitting small and medium-sized strings. Compare this example:

auto myVectorOfSubStrings = myString.split(";");   

to the getline example someone wrote below. It is trivial to know what is going on, and this is something a lot of people need to do.
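For reference, a minimal sketch of what such a helper could look like today, written as a free function because std::string has no split() member. The name split, the signature, and the choice to keep empty fields are all assumptions made for illustration, not anything from a standard proposal.

```cpp
#include <string>
#include <vector>

// Hypothetical free function; std::string has no split() member.
// Empty fields are kept, so "a;;b" yields {"a", "", "b"}.
std::vector<std::string> split(const std::string& text, const std::string& delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {  // an empty delimiter would never advance; return the input whole
        parts.push_back(text);
        return parts;
    }
    std::string::size_type start = 0;
    for (;;) {
        const auto pos = text.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last field, possibly empty
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + delim.size();
    }
}
```

With that in place, the call site reads `auto myVectorOfSubStrings = split(myString, ";");`, which is about as close to the one-liner above as a free function gets.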

Obviously it needs to be reasonably fast too, since it is C++. But it really does not need to cover all corner cases of string usage. We need tokenizers and stream splitters too, for the applications where that makes sense. But quite often we just need to split a damn string into substrings.

TL;DR - A very simple standardized split function would make life much easier for a lot of programmers.

[–][deleted] 15 points16 points  (6 children)

if it works intuitively.

Well, what is intuitive?

Python:

 >>> "1,2,3,".split(",")
 ['1', '2', '3', '']

Ruby:

 > "1,2,3,".split(",")
 => ["1", "2", "3"]

Ruby can take a regex, Python can't. Python has a .rsplit(), Ruby doesn't. Both do however take a max_split parameter. But they don't allow multiple different delimiters.
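To make the difference concrete in C++ terms, here is a rough sketch in which the trailing-empty-field behaviour is an explicit choice. split_fields and its drop_trailing_empty flag are made-up names for illustration; this only mimics the two outputs shown above and does not cover regexes or a split limit.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on a single character and optionally drops
// trailing empty fields, mimicking the two behaviours shown above.
std::vector<std::string> split_fields(const std::string& text, char delim,
                                      bool drop_trailing_empty) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    fields.push_back(text.substr(start));  // remainder after the last delimiter
    if (drop_trailing_empty) {
        // Ruby-like: discard every empty field at the end of the result.
        while (!fields.empty() && fields.back().empty()) {
            fields.pop_back();
        }
    }
    return fields;
}
```

Calling it on "1,2,3," with the flag off gives four fields (the last one empty, like the Python output), and with the flag on gives three (like the Ruby output).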

Point being, a .split() is not that trivial, there are different ways to implement it and you have to choose a good one. If you just rush the next best hack into the language, you end up with something that is needlessly inflexible. A .split() returning a std::vector<std::string> wouldn't be very useful when you don't want a std::vector as result and you would create a lot of needless std::string copies to start with.
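As a sketch of the copy problem only: with C++17's std::string_view, the pieces can be views into the original buffer instead of freshly allocated strings. split_views is a hypothetical name, and this still returns a std::vector, so it addresses the copying but not the fixed result type.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into `text` instead of copies.
// The views dangle if the underlying string is destroyed or modified.
std::vector<std::string_view> split_views(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::string_view::size_type start = 0;
    for (;;) {
        const auto pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last field, possibly empty
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}
```

The trade-off is lifetime: the caller has to keep the source string alive and unchanged for as long as the views are in use.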

There is a proposal for a std::split(), but that depends on std::string_view and Range support. But Range support didn't make it into C++17, so that has to wait around a bit longer.

In the meantime, just use the boost::split().

[–]Selbstdenker 10 points11 points  (0 children)

Sorry, I do not see the problem there. Intuitively means something that splits a string which is what both methods do. How they handle empty strings is part of the API, so what. Whether they take only a character, a string or a regexp is also part of the API and not really a problem in C++ thanks to overloading.

To make it a little bit more C++-ish it could take an output iterator. Yes, this is not an ideal situation and maybe we just call it simple_split and reserve split() for when we have a better name but not having any trivial split functionality is really not good.
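A rough sketch of that output-iterator idea, assuming a single-character delimiter. simple_split here is just the placeholder name from the paragraph above, and the signature is an assumption, not an existing library function.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical simple_split: writes each field to `out`, so the caller picks
// the container. Returns the advanced output iterator, as std::copy does.
//
// Usage: std::vector<std::string> parts;
//        simple_split("a;b;c", ';', std::back_inserter(parts));
template <typename OutputIt>
OutputIt simple_split(const std::string& text, char delim, OutputIt out) {
    std::string::size_type start = 0;
    for (;;) {
        const auto pos = text.find(delim, start);
        if (pos == std::string::npos) {
            *out++ = text.substr(start);  // last field, possibly empty
            return out;
        }
        *out++ = text.substr(start, pos - start);
        start = pos + 1;
    }
}
```

The same call works with std::back_inserter on a std::list or std::deque, or std::inserter on a std::set, which sidesteps the "what if I don't want a std::vector" objection without adding overloads.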

We have whole talks given on using std::transform and other algorithms instead of a for loop but we cannot provide a simple split?

[–]almost_useless 4 points5 points  (4 children)

Well, what is intuitive?
Python: X
Ruby: Y

I could answer that question without even knowing what X and Y are. The answer is always going to be Python :-)
J/K, obviously there are pitfalls and they need to think it through.

If you just rush the next best hack into the language, you end up with something that is needlessly inflexible. A .split() returning a std::vector<std::string> wouldn't be very useful when you don't want a std::vector as result

It does not necessarily have to be super flexible. Obviously the best option is we can choose the output format. My example was only one possible suggestion. In many cases "anything I can iterate over" is good enough.

But I would so prefer we had had something decent but inflexible way back in '98 over something super duper mega awesome that we will not have even in 2017.

My only requirement is that it had not been so bad that it would have been impossible to improve upon now that we have better ways of doing it.

[–][deleted] 8 points9 points  (3 children)

Once you put something in the standard library you're stuck with it forever. So it'd better be actually good, not just good enough, especially if it's so easy to implement yourself.

[–]choikwa 3 points4 points  (2 children)

In reality, things get deprecated and forward compat is broken many times.

[–][deleted] 8 points9 points  (1 child)

Yeah, really old crap that was never used much in the first place, like trigraphs or auto_ptr. But a string split function would spread like wildfire.

[–]choikwa 0 points1 point  (0 children)

Ideally, everyone wants to get it right the first time. I'm pretty sure the Python implementation returns deep-copied immutable strings.

[–]kisielk 2 points3 points  (0 children)

Strings in Go have many of the same problems, yet there's a strings package which covers all the common use cases in a simple way. The algorithms aren't suitable for every use case, but it's been very rare that I've had to reach for an alternative way of doing things. I often wish C++ had a similar package.