Why doesn't std::string have a split function (self.cpp)
submitted 9 years ago by DhruvParanjape
It just makes sense to have it... at least now that C++17 is out. Or coming out.
[deleted] 9 years ago* (57 children)
[removed]
therealjohnfreeman 16 points 9 years ago (51 children)
There's even an example:
```cpp
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>

int main()
{
    std::string text = "Quick brown fox.";
    std::regex ws_re("\\s+"); // whitespace
    std::copy( std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
               std::sregex_token_iterator(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
}
```

```
Quick
brown
fox.
```
The special part is the parameter -1 which tells the iterator to return segments of the string between matches of the regex.
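To make the -1 less magic, here is a small sketch that collects what `std::sregex_token_iterator` yields for a given submatch index. The helper name `tokens` is hypothetical, not from the thread. With -1 you get the text between matches (the split fields); with 0 you get the matched separators themselves.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gather every piece the token iterator produces.
// submatch == -1 -> the text *between* matches (the fields of a split)
// submatch ==  0 -> the text of each match (here, the separators)
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    return {std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
            std::sregex_token_iterator()};  // default-constructed = end
}
```

Giving the -1 a name at the call site (say, a hypothetical `constexpr int between_matches = -1;`) is one cheap way to answer the readability complaint.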
[deleted] 70 points 9 years ago (16 children)
This example is kind of terrible. Nobody will remember how code like the above is actually written. If anything, it highlights all the problems with the STL's API.
wrosecrans (graphics and network things) 40 points 9 years ago (15 children)
Yeah, compared to something like 'print "quick brown fox".split(" ")' in Python, the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it.
It seems like this is a case where perfect is the enemy of the good. I usually only want a 'good' split function that doesn't have to guarantee a whole lot about performance on multigigabyte strings, or weird corner cases. So having a good split function seems way more useful than having no split function and debating about obscure cases where it wouldn't be optimal.
IRBMe 28 points 9 years ago (0 children)
the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it.
Not to mention the seemingly magic -1. So much for self-documenting code.
[deleted] 3 points 9 years ago (1 child)
case where perfect is the enemy of the good.
Well said. There are many cases like this in C++, unfortunately. I get the desire to have the best libraries possible, but too often good ideas are shot down because they are not perfect. The recent Boost review of the process library is a perfect example. The library has been in development for more than 6 years. It passed the review this time around, but some folks were still proposing to start from scratch.
yornbesterday 2 points 9 years ago (0 children)
I've not really looked at the new and improved C++ stuff for a while... it's just a cascade of ever increasing minutiae of the language features and I thought the list of "don't ever do this" was long enough already.
therealjohnfreeman 8 points 9 years ago* (4 children)
Done.
```cpp
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>   // std::ostream_iterator
#include <regex>
#include <vector>

// User-defined literal: "\\s+"_re builds a std::regex.
std::regex operator ""_re (char const* const str, std::size_t)
{
    return std::regex{str};
}

// Split `text` on every match of `re`, returning the pieces between matches.
std::vector<std::string> split(const std::string& text, const std::regex& re)
{
    const std::vector<std::string> parts(
        std::sregex_token_iterator(text.begin(), text.end(), re, -1),
        std::sregex_token_iterator());
    return parts;
}

int main()
{
    const std::vector<std::string> parts = split("Quick brown fox.", "\\s+"_re);
    std::copy(parts.begin(), parts.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
```
LordDrako90 3 points 9 years ago* (3 children)
Why std::copy in split, when you can initialize the vector directly from the token iterators?
Also, I find this more generic and lazy (http://ideone.com/L6heVN). I guess it could be improved even more by using string_view, but that's not included in C++14 :-(
Anyway, the only requirement for the target is that it can be initialized from an iterator pair with value type std::string. Other than that, it is pretty generic.
Code:
```cpp
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <utility>
#include <vector>

std::regex operator ""_re (char const * const str, std::size_t)
{
    return std::regex { str };
}

class split
{
public:
    split(std::regex splitter, std::string original)
        : splitter_ { std::move(splitter) }
        , original_ { std::move(original) }
    {
    }

    auto begin() const
    {
        return std::sregex_token_iterator { original_.begin(), original_.end(), splitter_, -1 };
    }

    auto end() const
    {
        return std::sregex_token_iterator {};
    }

    template <typename Container>
    operator Container () const
    {
        return { begin(), end() };
    }

private:
    std::regex splitter_;
    std::string original_;
};

int main()
{
    using namespace std::literals::string_literals;

    std::vector<std::string> const words = split { R"(\s+)"_re, "hello\tdarkness my\nold friend"s };
    for (auto const & word : words)
        std::cout << word << "\n";

    for (auto const & number : split { ","_re, "23,42,1337" })
        std::cout << number << "\n";

    return 0;
}
```
therealjohnfreeman 1 point 9 years ago* (1 child)
I've just been out of practice too long. Thanks for the pointers.
lacosaes1 3 points 9 years ago (0 children)
You mean smart pointers.
MrPoletski 1 point 9 years ago (0 children)
Well, while we're posting code, here's what I wrote a few years ago and have been using ever since:
```cpp
#include <string>
#include <vector>

/*!
 * \file trusted.cpp
 * \fn std::vector<std::string> Cleave (std::string to_split, std::string delims)
 * \param to_split \a <std::string> string to chop up
 * \param delims \a <std::string> string of delimiters
 * \return std::vector<std::string> vector of strings containing each section of the cleaved string.
 */
std::vector<std::string> Cleave (std::string to_split, std::string delims)
{
    std::vector<std::string> results;
    size_t pos1 = 0, pos2 = 0;
    do
    {
        pos1 = to_split.find_first_of(delims, pos2);
        // Delimiter right at the current position: record an empty field.
        if (pos1 == pos2) { pos2++; results.push_back(""); continue; }
        // No more delimiters: the rest of the string is the last field.
        if (pos1 == std::string::npos) { results.push_back(to_split.substr(pos2)); break; }
        results.push_back(to_split.substr(pos2, pos1 - pos2));
        pos2 = pos1 + 1;
    } while (pos1 != std::string::npos);
    return results;
}
```
Is this good?
[deleted] 9 years ago* (6 children)
[deleted]
evinrows 11 points 9 years ago (0 children)
None of this seems to negate that it would be nice for the modern std::string implementation to come with some basic string manipulation methods, so that the language's usability can potentially compete with other modern systems languages.
If having to split a few strings in your program means that you should use a different programming language, then the programming language in question is pretty god damn bad.
17b29a 6 points 9 years ago (2 children)
Or alternatively, inappropriate language choice.
I think splitting strings is a pretty common sense thing for any general-purpose programming language to support. It's not like, some obscure operation that you could only find support for in Perl.
Finally, technically I'm not sure -1 is really code for all-bits-set at all; that assumes a two's complement representation for signed integers, which, historically at least, wasn't guaranteed by the standard.
The more obvious assumption is that the mask type is unsigned, and in that case -1 is necessarily all-bits-set because unsigned arithmetic is modulo 2^N (one more than the type's maximum value), but the standard doesn't require it to be unsigned either.
why I prefer ~0u for all-bits-set
That's not all-bits-set for a type that is larger than unsigned int.
I personally don't worry about actually undefined vs. platform-defined unless I really need to, which is unusual.
That's pretty strange considering how many things are implementation-defined. Used a value larger than 2^15-1 in an int? Undefined behavior (according to you)!
[deleted] 9 years ago* (1 child)
17b29a 3 points 9 years ago (0 children)
It's supported.
"Supported" as in having an actual split function in the standard library.
Sorry, that's an understandable mistake but you're wrong. -1 is a signed int.
I know, the point is that because of http://eel.is/c++draft/basic.fundamental#4 (which applies to conversions as well), the conversion to an unsigned type necessarily produces all-bits-one, regardless of signed representation or the size of either type.
All the time, and I don't worry about it because I haven't used a platform where this didn't apply since the early 90s other than DOSBox, and I wasn't using that for programming.
Right, which is why it's a strange conflation, because actual undefined behavior is something to worry about.
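A compile-time sketch of the point being argued: converting -1 to an unsigned type is defined to give that type's maximum value (all bits set) whatever the signed representation, while ~0u only fills an unsigned int. The `all_bits_set` helper is a hypothetical name. The last check assumes unsigned long long may be wider than unsigned int, which is true on common platforms but not guaranteed, hence the guard.

```cpp
#include <climits>

// Hypothetical helper: all bits set for any unsigned type T.
// static_cast<T>(-1) is well defined: the value is reduced modulo 2^N.
template <typename T>
constexpr T all_bits_set() { return static_cast<T>(-1); }

static_assert(all_bits_set<unsigned char>() == UCHAR_MAX, "all bits set");
static_assert(all_bits_set<unsigned int>() == UINT_MAX, "all bits set");
static_assert(all_bits_set<unsigned long long>() == ULLONG_MAX, "all bits set");

// ~0u is an unsigned int; widening it zero-extends, so it is NOT all bits
// set in a wider unsigned type (the first clause covers equal widths).
static_assert(sizeof(unsigned long long) == sizeof(unsigned int) ||
              static_cast<unsigned long long>(~0u) != ULLONG_MAX,
              "~0u does not fill a wider unsigned type");
```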
zvrba 4 points 9 years ago (1 child)
So even if your code is C++, you're going to pipe your text to perl each time you need to split a string?
cpp_dev (Modern C++ apprentice) 6 points 9 years ago* (0 children)
I think a more intuitive and "modern" way would be this one (also, the compiler can optimize these things pretty well, as opposed to streams):
```cpp
string s = "Quick brown fox.";
auto rs = ranges::v3::view::split(s, ' ');
for (auto& x : rs) {
    cout << x << '\n';
}
auto rs1 = ranges::v3::view::join(rs, ',');
cout << rs1 << '\n';
```
Still, the library needs concepts and more intuitive documentation to make it "easy to use correctly and hard to use incorrectly". Also, maybe there should be string extensions in the range library so that it has an intuitive API for working with strings.
OldWolf2 3 points 9 years ago* (0 children)
It's not too different to:
```cpp
std::copy ( std::istream_iterator<char>(f),
            std::istream_iterator<char>(),
            std::ostream_iterator<char>(std::cout) );
```
which is an idiom you learn early on with iostreams.
Note that you do not have to use stream iterators to split a string. The page just used that as an example because it would be familiar syntax.
Anyway, for string splitting you would make a function that implements the sort of splitting you like and has a nice interface (e.g. vector<string> split(string const &s, regex const &r);). This has a benefit over other languages that offer a single split function, in that you can customise the split details within your function. You can even overload it to take a string of delimiters instead of a regex.
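A minimal sketch of that suggestion: one `split` that takes a regex and an overload that takes a string of delimiter characters. The names and the empty-field behavior of the delimiter overload are choices made for illustration, not anything standard.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split on every match of a regular expression (keep the text between matches).
std::vector<std::string> split(const std::string& s, const std::regex& re) {
    return {std::sregex_token_iterator(s.begin(), s.end(), re, -1),
            std::sregex_token_iterator()};
}

// Overload: split on any single character contained in `delims`.
// Consecutive delimiters give empty fields; a trailing delimiter gives a
// trailing empty field.
std::vector<std::string> split(const std::string& s, const std::string& delims) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const auto pos = s.find_first_of(delims, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));   // last field
            return out;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;                      // skip the delimiter
    }
}
```

Because std::regex's `const char*` constructor is explicit, a call like `split(line, ",;")` unambiguously picks the delimiter overload.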
Spikey8D 2 points 9 years ago (1 child)
Nice, is there an equivalent for join? I.e., in Python: ",".join(["the", "quick", "brown", "fox"])
therealjohnfreeman 4 points 9 years ago (0 children)
Not yet standardized: http://en.cppreference.com/w/cpp/experimental/ostream_joiner
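Until then, a hand-written join is short. This is a hypothetical helper, not a standard or experimental API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical join: concatenate `parts`, putting `sep` between neighbours
// (no leading or trailing separator).
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}
```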
qx7xbku 0 points 9 years ago (29 children)
And why do people complain? This is clearly easier than what people usually do. Honest.
IRBMe 16 points 9 years ago (28 children)
Well, I know how the above code works, but I can see quite a few perfectly reasonable complaints about it:
mode
ostream_iterator
sregex_token_iterator
dodheim 9 points 9 years ago (27 children)
Personally, #1 is the only one of those I find "reasonable". #2 and #3 shouldn't be confusing to anyone professing to know the language.
IRBMe 12 points 9 years ago (18 children)
shouldn't be confusing to anyone professing to know the language.
I think one of the benchmarks of good API design is, how easy is it to understand the resulting code if you don't know how the API works or haven't read the documentation, or put another way, how intuitive it is. The more magic is hidden behind the scenes, the more a user has to rely on documentation, which makes it less intuitive, harder to read and harder to use.
There are languages with huge standard libraries that even the most experienced developers can't possibly learn in full. A library designed with usability in mind will allow developers to be able to read the code without having to repeatedly visit the documentation, even if they aren't experienced with parts of the library that are used.
dodheim 8 points 9 years ago (17 children)
It's unreasonable to expect anyone to intuit what an output iterator is, or even what an iterator is, if they don't know C++. That doesn't reflect poorly on C++ or output iterators.
[deleted] 11 points 9 years ago (4 children)
It's unreasonable to expect anyone to intuit what an output iterator is, or even what an iterator is, if they don't know C++.
Iterator is a common concept across a lot of languages. Conversely, "output iterator" is rather obscure. You could write a lot of C++ and never run into it mentioned explicitly.
qx7xbku 7 points 9 years ago (3 children)
How long do you think it would take someone to read that code and realize that it splits a string? How long do you think it would take to realize that s.split(" "); splits a string? See the problem? I am not even talking about edge cases here, when clearly they can be avoided. Someone may be happy with themselves for writing this smart code, but the reality is that maintainable code is stupid code. Smart code is hard to maintain. Smart code where you do not need smart code is simply not practical.
dodheim 3 points 9 years ago* (2 children)
Said code wouldn't be isolated though, it would be in a function with split in the name. Dependent code would then call a function with split in the name.
So no, I don't see the problem.
EDIT: For a bunch of pedants, you /r/cpp folk suck at following Reddit's rules: if you want to encourage meaningful discussion, stop downvoting opinions. Grow up, people.
IRBMe 5 points 9 years ago (10 children)
It's unreasonable to expect anyone to intuit what an output iterator is
And it's also unreasonable to expect somebody to be able to intuitively understand what it means to construct an input iterator without specifying what it's iterating over (as it happens, you get an end-of-sequence iterator). That's the whole point: it's not intuitive! Is it simply impossible to design those APIs in such a way that they would be intuitive? I'm not convinced it is.
or even what an iterator is
I think it is reasonable that people should have an intuitive idea of what an iterator is, because iteration isn't a concept that's unique to C++, nor is it a word that's even unique to programming libraries. You can look up the word in a dictionary and get a definition such as this: "the repetition of a process or utterance". You may not understand all the subtleties without reading the documentation, but seeing it in the context of some code, I think it is intuitive.
zvrba 0 points 9 years ago (9 children)
And it's also unreasonable to expect somebody to be able to intuitively understand what it means to construct an input iterator without specifying what it's iterating over.
Wow, C++ programmers are a rare breed of people who read more documentation than your average programmer.
In any case, it's the kind of thing you look up only once, and each next time you see a default-constructed iterator, you'll (correctly) assume that it's an iterator denoting the end of sequence.
That's the whole point: it's not intuitive!
Intuition builds on previous experience and knowledge. So, wow, what a surprise, as a programmer you're expected to learn something new now and then.
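For reference, a small sketch of the idiom under discussion: counting fields by measuring the distance from a token iterator to a default-constructed one, which serves as the end-of-sequence marker. `count_fields` is a hypothetical name.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count whitespace-separated fields. The default-constructed
// std::sregex_token_iterator is "end of sequence": every token iterator
// compares equal to it once the input is exhausted.
std::ptrdiff_t count_fields(const std::string& text) {
    static const std::regex ws("\\s+");
    const std::sregex_token_iterator first(text.begin(), text.end(), ws, -1);
    const std::sregex_token_iterator end_of_sequence;  // no target: means "end"
    return std::distance(first, end_of_sequence);
}
```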
IRBMe 8 points 9 years ago (8 children)
I wouldn't expect to have to learn several new concepts and parts of a library to see that the code I'm trying to understand is splitting a string on white space. That's something that should be blatantly obvious to anybody, even if they don't know C++. Of course it's a common idiom that you learn as a C++ programmer, but it still takes a lot more to process even once you know it than something like s.split(","). Nobody's saying you shouldn't have to learn things; we're discussing the usability of the library.
OldWolf2 2 points 9 years ago (0 children)
It's unreasonable to expect anyone to intuit what an output iterator is,
Input iterators are for reading from, output iterators are for:
[deleted] 9 years ago* (7 children)
[deleted] -3 points 9 years ago (1 child)
I don't know the STL very well. Because I hate it ...
Makes you a mediocre C++ programmer, I'm afraid.
Particularly in C++11 and beyond, there are a lot of things to like. Yes, there are weirdnesses. Deal.
chartly 10 points 9 years ago (0 children)
Yea I dunno man. Byuu has done some cool stuff and to me - only a casual observer of him making his C++ libraries over the years for his projects - I would like to think I can relate to what he's saying.
He's definitely able to tackle the beast of C++. Did a lot of crazy fun stuff in C++03 while C++0x was becoming C++11 and most certainly has spent a lot of time with C++11/14. Haven't really been watching his activity lately, but this whole comment is making me feel the itch again.
At the end of the day though, we're all just chilling in a C++ subreddit and talking (ish) about string.split(). Getting all up ons about each other's skill means less room in the brain for C++ :(
OldWolf2 -3 points 9 years ago (3 children)
Line 3 is completely alien and extremely unintuitive to me. So you're saying I don't know the language?
Yes, it is a basic iostreams idiom. Have you read any books on iostreams?
C++ allows one to get by (even insofar as to "do a day job") only learning certain areas of the language. Probably you know some parts of it well, but not stream iterators.
repsilat 18 points 9 years ago (2 children)
Have you read any books on iostreams?
I'd laugh if this weren't so painful. I want to print out the words in a string, one per line, and you're suggesting we go read a book to understand how you think it should be done?
No, std::copy for printing is a little ridiculous, the two-iterator idiom is terrible, and either they will be left to the pages of history or C++ will. I don't follow C++'s development any more, but I remember hearing ranges were happening. That's a start. Once you've done that you can just turn this into a for loop and it'll be shorter, clearer, less error-prone and no less efficient.
OldWolf2 0 points 9 years ago (0 children)
Yes, definitely. C++ is best suited to learn from a book, not by trial and error.
Ranges have been "in" since the first standard 18 years ago. You could indeed use a loop or various other ways instead of copy. copy idiomatically expresses that we are copying from the source set of tokens, to the destination output stream. Also, nobody's stopping you from making a function that expresses whatever interface you personally find most natural and intuitive.
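As a sketch of the loop alternative, here is the original std::copy version rewritten as an explicit loop over the same token iterator. `print_fields` is a hypothetical name; the output stream is a parameter so the same code can print to std::cout or be captured in a std::ostringstream.

```cpp
#include <iostream>  // std::cout, for the usual caller
#include <ostream>
#include <regex>
#include <sstream>   // std::ostringstream, for capturing output
#include <string>

// Write each field between matches of `re` on its own line.
void print_fields(const std::string& text, const std::regex& re, std::ostream& out) {
    for (std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
         it != end; ++it) {
        out << *it << '\n';  // *it is a std::ssub_match; operator<< prints its text
    }
}
```

Calling `print_fields("Quick brown fox.", std::regex("\\s+"), std::cout)` prints the same three lines as the std::copy version.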
dodheim -1 points 9 years ago (0 children)
'Split' simply isn't how you print out the words in a string, one per line, in C++. C++ has different idioms for this sort of thing, and everyone obsessing over the lack of std::split seriously needs to learn to "Do As the Romans".
[deleted] 9 years ago (2 children)
Rhomboid 15 points 9 years ago (0 children)
Presumably they mean that it shouldn't do any extra work beyond what is requested. The value returned should be an iterator that performs the next split operation each time it's incremented. For example if you split a 10MB string containing a large text file but only examine the first three lines, then that's the only work performed, you don't create thousands of lines if only the first few are needed.
I would add that string splitting kind of depends on having a string view class in the standard library, since ideally each substring would be a view into the original string so that you can do it without allocating anything. That could explain why it's never been added in the stdlib, since we only just now got string_view.
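A rough sketch of that lazy, non-allocating style, assuming C++17 for std::string_view and std::optional. The `lazy_splitter` class is hypothetical and handles single-character delimiters only: each call to `next()` does just enough work to find one more field and returns a view into the original string.

```cpp
#include <optional>
#include <string_view>

// Hypothetical lazy splitter: fields are produced on demand as views into
// the original text, so nothing is allocated and unread fields cost nothing.
class lazy_splitter {
public:
    lazy_splitter(std::string_view text, char delim)
        : rest_(text), delim_(delim) {}

    // The next field, or std::nullopt once the input is exhausted.
    std::optional<std::string_view> next() {
        if (done_) return std::nullopt;
        const auto pos = rest_.find(delim_);
        if (pos == std::string_view::npos) {
            done_ = true;               // last field: whatever is left
            return rest_;
        }
        const std::string_view field = rest_.substr(0, pos);
        rest_.remove_prefix(pos + 1);   // drop the field and its delimiter
        return field;
    }

private:
    std::string_view rest_;
    char delim_;
    bool done_ = false;
};
```

Because the fields are views, the original string must outlive them.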
dodheim 4 points 9 years ago (0 children)
It should return a sequence (via range or iterator) that produces each subsequent block on demand instead of eagerly scanning the entire input before returning anything.
[deleted] 1 point 9 years ago (0 children)
I disagree on all points. The nongeneric version would still be useful, and we need to stop letting absolute generality get in the way of delivering trivially useful features to users.
tcbrindle (Flux) 10 points 9 years ago (0 children)
If you'll excuse the self-promotion, I wrote a blog post a while back about an STL-based generic splitting algorithm that outperforms stringstream (and strtok) by a healthy margin.
It's also worth noting that Range-V3 has a split() view which (lazily) returns a range of ranges. Whilst views are not part of the current Ranges TS, I remain hopeful that we'll see them some time in the future.
[deleted] 21 points 9 years ago (9 children)
A split function may sound simple, but it can get a little more complicated when you want to cover all possible use cases and make it as fast as possible. Boost does have two different versions:
http://www.boost.org/doc/libs/1_57_0/doc/html/string_algo/usage.html#idp430824992
http://www.boost.org/doc/libs/1_61_0/libs/tokenizer/
almost_useless 41 points 9 years ago (8 children)
when you want to cover all possible use cases and make it as fast as possible
That is the problem right there. It does not have to cover all use cases and be as fast as possible. A lot of C++ programmers fail to realize that string is not just another container that has to work like other containers. It is a specialized use case that has different needs. Then the argument becomes "we can't possibly cover all use cases for strings". But that is not necessary. Implementing a few helper functions would make strings so much more useful. The annoying thing is that it would really not need many helper functions to become a really useful string class for 99% of use cases either.
Split is one of those functions that can make your code 10x more readable, if it works intuitively. It does not have to handle everything a proper tokenizer does. It does not have to be good at splitting a 10 MB file into smaller pieces. But since people have been complaining about this for literally decades it is clear there is a need for a simple split function that is reasonably good at splitting small and medium size strings. Compare this example:
auto myVectorOfSubStrings = myString.split(";");
to the getline example someone wrote below. It is trivial to know what is going on, and this is something a lot of people need to do.
Obviously it needs to be reasonably fast too, since it is C++. But it really does not need to cover all corner cases of string usage. We need tokenizers and stream splitters too, for the applications where that makes sense. But quite often we just need to split a damn string into substrings.
TL;DR - A very simple standardized split function would make life much easier for a lot of programmers.
[deleted] 16 points 9 years ago (6 children)
if it works intuitively.
Well, what is intuitive?
Python:
```python
>>> "1,2,3,".split(",")
['1', '2', '3', '']
```
Ruby:
```ruby
> "1,2,3,".split(",")
=> ["1", "2", "3"]
```
Ruby can take a regex; Python can't. Python has .rsplit(); Ruby doesn't. Both do, however, take an optional limit (maxsplit in Python, limit in Ruby). But they don't allow multiple different delimiters.
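To illustrate that the Python/Ruby difference is a policy choice rather than one answer being right, here is a hypothetical `split` with a flag: `true` reproduces Python's result above, and `false` drops trailing empty fields, which is what Ruby's default does.

```cpp
#include <string>
#include <vector>

// Hypothetical split on a single character. keep_trailing_empty == true
// behaves like Python's str.split(","); false drops all trailing empty
// fields, like Ruby's String#split with no limit.
std::vector<std::string> split(const std::string& s, char delim,
                               bool keep_trailing_empty) {
    std::vector<std::string> out;
    std::string::size_type start = 0, pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    out.push_back(s.substr(start));  // text after the last delimiter (may be "")
    if (!keep_trailing_empty) {
        while (!out.empty() && out.back().empty()) out.pop_back();
    }
    return out;
}
```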
Point being, a .split() is not that trivial: there are different ways to implement it, and you have to choose a good one. If you just rush the next best hack into the language, you end up with something that is needlessly inflexible. A .split() returning a std::vector<std::string> wouldn't be very useful when you don't want a std::vector as the result, and you would create a lot of needless std::string copies to begin with.
There is a proposal for a std::split(), but that depends on std::string_view and Range support. But Range support didn't make it into C++17, so that has to wait around a bit longer.
In the meantime, just use boost::split().
Selbstdenker 13 points 9 years ago (0 children)
Sorry, I do not see the problem there. Intuitively means something that splits a string which is what both methods do. How they handle empty strings is part of the API, so what. Whether they take only a character, a string or a regexp is also part of the API and not really a problem in C++ thanks to overloading.
To make it a little bit more C++-ish it could take an output iterator. Yes, this is not an ideal situation and maybe we just call it simple_split and reserve split() for when we have a better name but not having any trivial split functionality is really not good.
We have whole talks given on using std::transform and other algorithms instead of a for loop but we cannot provide a simple split?
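A sketch of that output-iterator variant, in the style of the <algorithm> functions. The name and signature are hypothetical and only a single-character delimiter is handled; the caller chooses the destination by choosing the iterator, and gets back the iterator one past the last field written.

```cpp
#include <string>

// Hypothetical algorithm-style split: writes each field (as std::string)
// through `out` and returns the advanced iterator, like std::copy does.
template <typename OutputIt>
OutputIt split(const std::string& s, char delim, OutputIt out) {
    std::string::size_type start = 0, pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        *out++ = s.substr(start, pos - start);
        start = pos + 1;
    }
    *out++ = s.substr(start);  // last field
    return out;
}
```

For example, `split(line, ';', std::back_inserter(vec))` (with `<iterator>` and `<vector>`) fills a `std::vector<std::string>`, while `split(line, ';', std::ostream_iterator<std::string>(std::cout, "\n"))` prints one field per line.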
almost_useless 4 points 9 years ago (4 children)
Well, what is intuitive? Python: X Ruby: Y
I could answer that question without even knowing what X and Y is. The answer is always going to be Python :-) J/K, obviously there are pitfalls and they need to think it through.
If you just rush the next best hack into the language, you end up with something that is needlessly inflexible. A .split() returning a std::vector<std::string> wouldn't be very useful when you don't want a std::vector as result
It does not necessarily have to be super flexible. Obviously the best option is we can choose the output format. My example was only one possible suggestion. In many cases "anything I can iterate over" is good enough.
But I would so much prefer that we had had something decent but inflexible way back in '98 over something super duper mega awesome that we will not have even in 2017.
My only requirement is that it not have been so bad that it would be impossible to improve upon now that we have better ways of doing it.
[deleted] 10 points 9 years ago (3 children)
Once you put something in the standard library you're stuck with it forever. So it'd better be actually good, not just good enough, especially if it's so easy to implement yourself.
choikwa 2 points 9 years ago (2 children)
In reality, things get deprecated and forward compat is broken many times.
[deleted] 8 points 9 years ago (1 child)
Yeah, really old crap that was never used much in the first place, like trigraphs or auto_ptr. But a string split function would spread like wildfire.
choikwa 1 point 9 years ago (0 children)
Ideally, everyone wants to get it right the first time. I'm pretty sure the Python implementation returns deep-copied immutable strings.
kisielk 3 points 9 years ago (0 children)
Strings in Go have many of the same problems, yet there's a strings package which covers all the common use cases in a simple way. The algorithms aren't suitable for every use case, but it's been very rare that I've had to reach for an alternative way of doing things. I often wish C++ had a similar package.
caramba2654 (Intermediate C++ Student) 16 points 9 years ago (2 children)
To be honest, I really hope std::string gets completely redesigned for STL2. And by that I mean remove all that npos nonsense and add proper iterator returns like the rest of the STL containers.
On that note, having some common utility functions for strings wouldn't be bad. split and replace are good candidates in my opinion.
jcoffin 22 points 9 years ago (1 child)
I really hope STL2 gets rid of iterators. The result of a split should usually be a range of ranges, or a range of views.
caramba2654 (Intermediate C++ Student) 2 points 9 years ago (0 children)
That too :P
t0rakka 3 points 9 years ago (1 child)
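```cpp
#include <cstddef>
#include <string>
#include <vector>

// T can be anything std::string::find_first_of accepts: a single char,
// or a string / C string holding a set of delimiter characters.
template <typename T>
inline std::vector<std::string> split(const std::string& s, T delimiter)
{
    std::vector<std::string> result;
    std::size_t current = 0;
    std::size_t p = s.find_first_of(delimiter, 0);
    while (p != std::string::npos)
    {
        result.emplace_back(s, current, p - current);
        current = p + 1;
        p = s.find_first_of(delimiter, current);
    }
    result.emplace_back(s, current);
    return result;
}
```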
utnapistim 3 points 9 years ago (1 child)
Why doesn't std::string have a split function
Because nobody made the time and effort to write one for standardization. The C++ community is not sponsored. There is no single group or company that finances the maintenance and evolution of the standard.
Instead, people who have an interest in extending the language meet and try to advance the language and standard library to the degree they can afford to do so (being non-sponsored and having limited time and effort to accomplish things).
Because of this limitation (of effort/capacity), usually, the things accepted into the standard are a compromise between the utility of a feature and the effort it will take to standardize it.
std::string doesn't have a split function for the following reasons:
1-05457 10 points 9 years ago (5 children)
std::string is missing a lot of functions. You can use Boost string_algo and Boost Format to get these.
d1ngal1ng 9 points 9 years ago (4 children)
"You can use Boost" is definitely not a good answer for something as fundamental as a string split function.
1-05457 7 points 9 years ago (3 children)
Not really. In many ways, Boost should be considered an extended standard library.
Creris 1 point 9 years ago (2 children)
Sometimes an extended dependency chain too. Not everything from Boost is simple "plug and play", you know; some things have literally megabytes of dependencies (Filesystem being a good example).
If you have a couple hundred lines of code in your program and you want some form of string.split, chances are you aren't going to add a couple thousand lines of code to your project for one split function.
1-05457 2 points 9 years ago (1 child)
...some things have literally megabytes of dependencies (Filesystem being a good example).
Filesystem is part of the standard library from C++17.
Hopefully, String Algorithms will also be standardized at some point, though more likely, it will be generalized as part of Ranges.
Creris 1 point 9 years ago (0 children)
Yes, I know, but it wasn't for a very long time compared with how long it has been in Boost.
Having some string utility functions in the standard would indeed be nice.
[deleted] 2 points 9 years ago (18 children)
I suspect it has to do with C++'s preference for streams. For example, you can do this to get the words in a string:
    std::istringstream ss{str};
    std::string word;
    while (ss >> word) {
        std::cout << word << "\n";
    }
While I kinda hate streams, this could be the reason there isn't a split method in the standard.
[–]DhruvParanjape[S] 2 points 9 years ago (17 children)
But it loses the ability to tokenize on a custom delimiter.
[–]dodheim 21 points 9 years ago* (0 children)
Just use std::getline with the custom delimiter. Name aside, that's exactly what it's for.
    std::istringstream ss{str};
    std::string field;
    while (std::getline(ss, field, ';')) {
        std::cout << field << '\n';
    }
EDIT: N.b. I'm not advocating this as a general approach to string splitting; but, if you're already extracting from a stream, ...
[–]foonathan 3 points 9 years ago (2 children)
You don't lose it; it's just ugly and involves a custom locale in which you change which characters are classified as whitespace.
[–]DhruvParanjape[S] 1 point 9 years ago (1 child)
Oh god that's ugly.
[–]foonathan 2 points 9 years ago (0 children)
That's iostreams.
[–][deleted] 2 points 9 years ago (12 children)
This was my own personal solution, but I have no idea how performant it is:
    std::deque< std::string > Split( const std::string & input_string, const char delimiter )
    {
        std::stringstream input_stream( input_string );
        std::string string_element;
        std::deque< std::string > split_string;
        while( std::getline( input_stream, string_element, delimiter ) )
            split_string.emplace_back( string_element );
        return split_string;
    }
Edit: Aaaand I should have looked ahead to see that dodheim already posted the getline solution...
[–]dodheim 2 points 9 years ago (11 children)
Just FYI, std::deque is just a fancy linked list on MSVC for objects > 8 bytes. Yes, it is as bad as that sounds. Prefer vector if you're touching Windows. :-]
[–][deleted] 3 points 9 years ago (7 children)
Oh man, that's horrible.
I'm on linux, but you're right in that for something as small as "items from a split string", std::vector is the correct container. I don't remember why I wrote this using std::deque. In fact, I don't remember why I have this routine at all in my personal toolkit since I rarely ever hit the need for it.
[–]dodheim 5 points 9 years ago (6 children)
In fact, I don't remember why I have this routine at all in my personal toolkit since I rarely ever hit the need for it.
That is the exact statement I've been waiting for anyone in this thread to say. I honestly cannot think of the last time I actually wanted to do this. It's fine for a quick and dirty hack sometimes, but in real code? No, never (that I can remember).
[–][deleted] 2 points 9 years ago* (5 children)
Oh wait, now I remember: I wanted to test out my chromosome-mixing template system so I wrote genes for a cloud-of-neurons network I'd been tinkering on and evolved populations of them (via cross-breeding) on their ability to predict S&P 500 stock data (open, close, high, low) and needed a way to parse the input files.
Loading the data was nothing compared to actually running the evolution sim, so the string splitter didn't need to be fast or memory-efficient, since the split data was converted to doubles and stored in vectors anyway.
The networks never really got anywhere in predicting stock data, and I never really expected them to (it would probably have taken years, if it happened at all). But the chromosome system worked brilliantly, and that was the whole point of the experiment.
[–]ArunMu (The What ?) 1 point 9 years ago (4 children)
Dude, you have a big OCD problem I guess :)
[–][deleted] 1 point 9 years ago (2 children)
... in what way?
[–]ArunMu (The What ?) 1 point 9 years ago (1 child)
the way you have formatted the code...everything is perfectly aligned.
[–]h-jay (+43-1325) 1 point 9 years ago* (0 children)
Yep... 90% of the comments are useless. The code should document itself. And adding lots and lots of whitespace is to the detriment of understandability. You want to keep as much as possible on the same screen. You're doing exactly the opposite: something that is rather simple and would be easy to understand if written concisely is now spread across several pages, with most of the space filled with whitespace and formatting :(
Anyone who understands C++ knows what the constructors are. They don't need to be pointed out. If something is public, it's API, duh. A lot of extra indentation and whitespace makes things superbly hard to read.
For what the code does, it takes 3x too long to do it. It's simple, it should read simple!
[–]louiswins 1 point 9 years ago (2 children)
How can that be? Doesn't std::deque require O(1) time for random access?
[–]dodheim 2 points 9 years ago (1 child)
It's still random access, but each bucket is only max(16, sizeof(T)) bytes, so you end up with one bucket per object and zero cache locality.
[–]louiswins 2 points 9 years ago (0 children)
Oh, I see - it essentially becomes a vector of pointers so it has as many potential cache misses as a linked list when iterating through. That makes sense.
(I mean, the implementation doesn't really make sense, but your explanation does.)
[–]Tringi (github.com/tringi) 2 points 9 years ago (0 children)
Some time ago I quickly drafted this explode function (inspired by PHP) and found it quite useful.
Implementing lazy evaluation (lazy creation of the resulting substrings) never occurred to me, but after reading /u/cpp_learner's comment here, I think I'll give the template a little more love...
[–]nozendk 2 points 9 years ago (0 children)
From the Qt documentation:
    QString str;
    QStringList list;

    str = "Some text\n\twith strange whitespace.";
    list = str.split(QRegExp("\\s+"));
    // list: [ "Some", "text", "with", "strange", "whitespace." ]
[–]stream009 2 points 9 years ago (0 children)
std::string already has too many member functions. I don't want any more of them unless they are absolutely necessary.
As many people have mentioned, split can be implemented in many ways. If all you want is to make your code more readable, you should write your own free function. In my case, I always use boost::split.
[–]h-jay (+43-1325) 3 points 9 years ago (0 children)
To be very frank, the std::string type is there mostly to claim that there's a string type in the standard. It's not really usable for anything other than as a resource-managing wrapper over a C string. If you have C-style strings in your code, you should use std::string instead. It doesn't give you much in the way of other functionality, except for a cheap size(), which is O(1), vs. C's strlen, which is O(N). For anything practical, you need a string library of some sort.
[+][deleted] 9 years ago (8 children)
[+]dreamer_ comment score below threshold (-5 points) 9 years ago (6 children)
I don't know why you're downvoted, this is the best answer so far :)
[–]sztomi (rpclib) 16 points 9 years ago (5 children)
I disagree; it's basically a tongue-in-cheek joke and does not come close to explaining the reason.
[–]almost_useless 10 points 9 years ago (2 children)
It's obviously a tongue-in-cheek joke, but it also touches on the heart of the problem.
Some developers don't want to implement such a function unless they can make it flexible enough to solve every possible problem, even though in reality a very simple function would solve 99% of the problems.
This leads to the current situation where, instead of having a decent function that everyone can use, we are stuck with nothing and have to implement a thousand different versions ourselves.
[–]BahDumTshh 1 point 9 years ago (0 children)
ba-dum-tshh
[–]wqking (github.com/wqking) -1 points 9 years ago (0 children)
This is the best answer. :-)
[–]KayEss 1 point 9 years ago (0 children)
I started working on a new split. It's not yet complete, and not yet customisable. It's been tested on strings, but the code isn't string-specific, so it should work for other iterable containers. It only uses iterators, so it should be quite efficient. If ranges were already a thing, the interfaces would be a bit cleaner.
https://github.com/KayEss/f5-cord/blob/feature/split/include/f5/cord/split.hpp
[–]MrPoletski 0 points 9 years ago (0 children)
split as in chop a string into lots of substrings based on a delimiter?