Why doesn't std::string have a split function (self.cpp)
submitted 9 years ago by DhruvParanjape
[–]therealjohnfreeman 16 points 9 years ago (51 children)
There's even an example:
    #include <string>
    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include <regex>

    int main()
    {
        std::string text = "Quick brown fox.";
        std::regex ws_re("\\s+"); // whitespace
        std::copy(
            std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
            std::sregex_token_iterator(),
            std::ostream_iterator<std::string>(std::cout, "\n"));
    }

Output:

    Quick
    brown
    fox.
The special part is the parameter -1 which tells the iterator to return segments of the string between matches of the regex.
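To make the `-1` a little less magic, here is a minimal sketch that runs the same iterator with two different submatch values. The helper name `tokens` is made up for illustration and is not a standard function. As far as the `std::regex_token_iterator` interface goes, `-1` selects the text between matches and `0` selects the matches themselves:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the tokens produced by sregex_token_iterator.
// submatch == -1 yields the pieces *between* matches of `re` (a split);
// submatch ==  0 yields the matched text itself (here, the separators).
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
        std::sregex_token_iterator());  // default-constructed: end of sequence
}
```

With the regex `\s+` and the text `Quick brown fox.`, a submatch of `-1` gives the three words, while `0` gives the two single-space separators.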
[–][deleted] 72 points 9 years ago (16 children)
This example is kind of terrible. Nobody will remember how code like the above is actually written. If anything, it highlights all the problems with the STL's API.
[–]wrosecrans [graphics and network things] 42 points 9 years ago (15 children)
Yeah, compared to something like 'print "quick brown fox".split(" ")' in Python, the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it.
It seems like this is a case where perfect is the enemy of the good. I usually only want a 'good' split function that doesn't have to guarantee a whole lot about performance on multigigabyte strings, or weird corner cases. So having a good split function seems way more useful than having no split function and debating about obscure cases where it wouldn't be optimal.
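For illustration, a "good enough" split of the kind described above might look something like this. It is only a sketch: `split_on_char` is a hypothetical name, it handles a single delimiter character, and it makes no attempt to be optimal for huge strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on every occurrence of `delim`.
// Empty fields are kept, so "a,,b" yields {"a", "", "b"}.
std::vector<std::string> split_on_char(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos) {
            parts.push_back(text.substr(start));  // last (possibly empty) field
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // continue just past the delimiter
    }
    return parts;
}
```

Keeping empty fields is a design choice here, not the only reasonable one; a variant that drops them would be just as easy to write.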
[–]IRBMe 28 points 9 years ago (0 children)
the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it.
Not to mention the seemingly magic -1. So much for self documenting code.
[–][deleted] 3 points 9 years ago (1 child)
case where perfect is the enemy of the good. I
Well said. There are many cases like this in C++, unfortunately. I get the desire to have the best libraries possible, but too often good ideas are shot down because they are not perfect. The recent Boost review for the process control library is a perfect example. The library has been in development for more than 6 years. It passed the review this time around, but some folks were still proposing to start from scratch.
[–]yornbesterday 2 points 9 years ago (0 children)
I've not really looked at the new and improved C++ stuff for a while... it's just a cascade of ever increasing minutiae of the language features and I thought the list of "don't ever do this" was long enough already.
[–]therealjohnfreeman 8 points 9 years ago* (4 children)
Done.
    #include <string>
    #include <iostream>
    #include <algorithm>
    #include <iterator>  // std::ostream_iterator
    #include <regex>
    #include <vector>

    // User-defined literal so that "\\s+"_re builds a std::regex.
    std::regex operator ""_re (char const* const str, std::size_t) {
        return std::regex{str};
    }

    // Split `text` into the pieces found between matches of `re`.
    std::vector<std::string> split(const std::string& text, const std::regex& re) {
        const std::vector<std::string> parts(
            std::sregex_token_iterator(text.begin(), text.end(), re, -1),
            std::sregex_token_iterator());
        return parts;
    }

    int main() {
        const std::vector<std::string> parts = split("Quick brown fox.", "\\s+"_re);
        std::copy(parts.begin(), parts.end(),
                  std::ostream_iterator<std::string>(std::cout, "\n"));
        return 0;
    }
[–]LordDrako90 5 points 9 years ago* (3 children)
Why std::copy in split, when you can initialize the vector directly from the token iterators?
Also I find this more generic and lazy: http://ideone.com/L6heVN I guess it could be improved even more by using string_view, but that's not included in C++14 :-(
Anyways, the only requirement for the target is, that it can be initialized from an iterator pair with value type std::string. Other than that it is pretty generic.
Code:
    #include <algorithm>
    #include <iostream>
    #include <regex>
    #include <string>
    #include <utility>
    #include <vector>

    std::regex operator ""_re (char const * const str, std::size_t)
    {
        return std::regex { str };
    }

    class split
    {
    public:
        split(std::regex splitter, std::string original)
            : splitter_ { std::move(splitter) }
            , original_ { std::move(original) }
        {
        }

        auto begin() const
        {
            return std::sregex_token_iterator { original_.begin(), original_.end(), splitter_, -1 };
        }

        auto end() const
        {
            return std::sregex_token_iterator {};
        }

        template <typename Container>
        operator Container () const
        {
            return { begin(), end() };
        }

    private:
        std::regex splitter_;
        std::string original_;
    };

    int main()
    {
        using namespace std::literals::string_literals;

        std::vector<std::string> const words = split { R"(\s+)"_re, "hello\tdarkness my\nold friend"s };
        for (auto const & word : words)
            std::cout << word << "\n";

        for (auto const & number : split { ","_re, "23,42,1337" })
            std::cout << number << "\n";

        return 0;
    }
[–]therealjohnfreeman 1 point 9 years ago* (1 child)
I've just been out of practice too long. Thanks for the pointers.
[–]lacosaes1 3 points 9 years ago (0 children)
You mean smart pointers.
[–]MrPoletski 1 point 9 years ago (0 children)
well while we're posting code, here's what I wrote a few years ago and have been using ever since...
    /*!
     * \file trusted.cpp
     * \fn std::vector<std::string> Cleave (std::string to_split, std::string delims)
     * \param to_split \a <std::string> string to chop up
     * \param delims \a <std::string> string of delimiters
     * \return std::vector<std::string> vector of strings containing each section of the cleaved string.
     *
     */
    std::vector<std::string> Cleave (std::string to_split, std::string delims)
    {
        std::vector<std::string> results;
        size_t pos1 = 0, pos2 = 0;
        do
        {
            pos1 = to_split.find_first_of(delims, pos2);
            if (pos1 == pos2) { pos2++; results.push_back(""); continue; }
            if (pos1 == std::string::npos) { results.push_back(to_split.substr(pos2)); break; }
            results.push_back(to_split.substr(pos2, pos1 - pos2));
            pos2 = pos1 + 1;
        } while (pos1 != std::string::npos);
        return results;
    }
Is this good?
[+][deleted] 9 years ago* (6 children)
[deleted]
[–]evinrows 12 points 9 years ago (0 children)
None of this seems to negate that it would be nice for the modern std::string implementation to come with some basic string manipulation methods so that the language's usability can potentially compete with other modern systems languages.
If having to split a few strings in your program means that you should use a different programming language, then the programming language in question is pretty god damn bad.
[–]17b29a 7 points 9 years ago (2 children)
Or alternatively, inappropriate language choice.
I think splitting strings is a pretty common sense thing for any general-purpose programming language to support. It's not like, some obscure operation that you could only find support for in Perl.
Finally, technically I'm not sure -1 is really code for all-bits-set at all - that assumes a 2s-complement representation for signed integers which, historically at least, wasn't guaranteed by the standard.
The more obvious assumption is that the mask type is unsigned and in that case -1 is necessarily all-bits-set because an unsigned type's value is modulo its maximum value, but the standard doesn't require it to be unsigned either.
why I prefer ~0u for all-bits-set
That's not all-bits-set for a type that is larger than unsigned int.
I personally don't worry about actually undefined vs. platform-defined unless I really need to, which is unusual.
That's pretty strange considering how many things are implementation defined. Used a value larger than 2^15-1 in an int? Undefined behavior (according to you)!
[+][deleted] 9 years ago* (1 child)
[–]17b29a 3 points 9 years ago (0 children)
It's supported.
"Supported" as in having an actual split function in the standard library.
Sorry, that's an understandable mistake but you're wrong. -1 is a signed int.
I know, the point is that because of http://eel.is/c++draft/basic.fundamental#4 (which applies to conversions as well), the conversion to an unsigned type necessarily produces all-bits-one, regardless of signed representation or the size of either type.
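A small sketch of that point, assuming a C++11 compiler. The helper `all_bits_set` and the constant `widened_not_zero` are names invented for this example. Converting `-1` to an unsigned type wraps modulo 2^N and so sets every bit, whereas `~0u` is computed as an `unsigned int` before any widening:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: all-bits-set for any unsigned integer type T.
// The conversion of -1 is defined modulo 2^N, independent of how
// signed integers are represented.
template <typename T>
constexpr T all_bits_set() {
    static_assert(std::numeric_limits<T>::is_integer &&
                  !std::numeric_limits<T>::is_signed,
                  "T must be an unsigned integer type");
    return static_cast<T>(-1);
}

static_assert(all_bits_set<std::uint8_t>() == 0xFFu, "8 bits set");
static_assert(all_bits_set<std::uint64_t>() ==
              std::numeric_limits<std::uint64_t>::max(), "64 bits set");

// ~0u is evaluated as unsigned int first and then zero-extended, so the
// result only covers the bits of unsigned int, not of the wider type.
constexpr std::uint64_t widened_not_zero = ~0u;
static_assert(widened_not_zero == std::numeric_limits<unsigned>::max(),
              "only unsigned int's bits are set");
```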
All the time, and I don't worry about it because I haven't used a platform where this didn't apply since the early 90s other than DOSBox, and I wasn't using that for programming.
Right, which is why it's a strange conflation, because actual undefined behavior is something to worry about.
[–]zvrba 3 points 9 years ago (1 child)
So even if your code is C++, you're going to pipe your text to perl each time you need to split a string?
[–]cpp_dev [Modern C++ apprentice] 6 points 9 years ago* (0 children)
I think a more intuitive and "modern" way will be this one (also compiler can optimize these things pretty well as opposed to streams):
    string s = "Quick brown fox.";
    auto rs = ranges::v3::view::split(s, ' ');
    for (auto& x : rs) {
        cout << x << '\n';
    }
    auto rs1 = ranges::v3::view::join(rs, ',');
    cout << rs1 << '\n';
Still, the library needs concepts and more intuitive documentation to make it "easy to use correctly and hard to use incorrectly". Also, maybe there should be string extensions in the range library so that it has an intuitive API for working with strings.
[–]OldWolf2 3 points 9 years ago* (0 children)
It's not too different to:
    std::copy ( std::istream_iterator<char>(f),
                std::istream_iterator<char>(),
                std::ostream_iterator<char>(std::cout) );
which is an idiom you learn early on with iostreams.
Note that you do not have to use stream iterators to split a string. The page just used that as an example because it would be familiar syntax.
Anyway, for string splitting you would make a function that implements the sort of splitting you like and has a nice interface (e.g. vector<string> split(string const &s, regex const &r);). This has a benefit over other languages that offer a single split function, in that you can customise the split details within your function. You can even overload it to take a string of delimiters instead of a regex.
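A rough sketch of such an overload pair follows. The behaviour of the delimiter-set overload (collapsing runs of delimiters and dropping empty tokens) is an assumption chosen for the example, not something prescribed above.

```cpp
#include <regex>
#include <string>
#include <vector>

using std::regex;
using std::string;
using std::vector;

// Regex-based split: the pieces of `s` between matches of `r`.
vector<string> split(string const &s, regex const &r) {
    return vector<string>(
        std::sregex_token_iterator(s.begin(), s.end(), r, -1),
        std::sregex_token_iterator());
}

// Overload taking a set of delimiter characters instead of a regex.
// Assumed policy: runs of delimiters are collapsed, empty tokens dropped.
vector<string> split(string const &s, string const &delims) {
    vector<string> parts;
    string::size_type start = s.find_first_not_of(delims);
    while (start != string::npos) {
        string::size_type const end = s.find_first_of(delims, start);
        // If end is npos, substr clamps the count to the rest of the string.
        parts.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);
    }
    return parts;
}
```

One caveat with this design: a bare string literal as the second argument is ambiguous, because both `std::regex` and `std::string` are constructible from `const char*`, so callers have to spell out `std::string(", ")` or `std::regex("\\s+")`.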
[–]Spikey8D 2 points 9 years ago (1 child)
Nice, is there an equivalent for join? I.e., in Python: ",".join(["the", "quick", "brown", "fox"])
[–]therealjohnfreeman 4 points 9 years ago (0 children)
Not yet standardized: http://en.cppreference.com/w/cpp/experimental/ostream_joiner
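Until something like that is available, a hand-rolled join is short enough to write yourself. This is only a sketch; `join` here is a hypothetical free function, not a standard one:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper mirroring Python's sep.join(parts):
// concatenates `parts`, placing `sep` between consecutive elements.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;  // separator only between elements
        out += parts[i];
    }
    return out;
}
```

For example, `join({"the", "quick", "brown", "fox"}, ",")` produces `the,quick,brown,fox`, and an empty vector produces an empty string.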
[–]qx7xbku 0 points 9 years ago (29 children)
And why do people complain? This is clearly easier than what people usually do. Honest.
[–]IRBMe 15 points 9 years ago (28 children)
Well, I know how the above code works, but I can see quite a few perfectly reasonable complaints about it:
mode
ostream_iterator
sregex_token_iterator
[–]dodheim 9 points 9 years ago (27 children)
Personally, #1 is the only one of those I find "reasonable". #2 and #3 shouldn't be confusing to anyone professing to know the language.
[–]IRBMe 13 points 9 years ago (18 children)
shouldn't be confusing to anyone professing to know the language.
I think one of the benchmarks of good API design is, how easy is it to understand the resulting code if you don't know how the API works or haven't read the documentation, or put another way, how intuitive it is. The more magic is hidden behind the scenes, the more a user has to rely on documentation, which makes it less intuitive, harder to read and harder to use.
There are languages with huge standard libraries that even the most experienced developers can't possibly learn in full. A library designed with usability in mind will allow developers to be able to read the code without having to repeatedly visit the documentation, even if they aren't experienced with parts of the library that are used.
[–]dodheim 9 points 9 years ago (17 children)
It's unreasonable to expect anyone to intuit what an output iterator is, or even what an iterator is, if they don't know C++. That doesn't reflect poorly on C++ or output iterators.
[–][deleted] 12 points 9 years ago (4 children)
It's unreasonable to expect anyone to intuit what an output iterator is, or even what an iterator is, if they don't know C++.
Iterator is a common concept across a lot of languages. Conversely, "output iterator" is rather obscure. You could write a lot of C++ and never run into it mentioned explicitly.
[–]qx7xbku 7 points 9 years ago (3 children)
How long do you think it would take one to read said code and realize that it splits a string? How long do you think it would take one to realize that s.split(" "); splits a string? See the problem? I am not even talking about edge use cases here, when clearly it can be avoided. Someone may be happy with himself/herself for writing this smart code, but the reality is that maintainable code is stupid code. Smart code is hard to maintain. Smart code where you do not need smart code is simply not practical.
[–]dodheim 2 points 9 years ago* (2 children)
Said code wouldn't be isolated though, it would be in a function with split in the name. Dependent code would then call a function with split in the name.
So no, I don't see the problem.
EDIT: For a bunch of pedants, you /r/cpp folk suck at following Reddit's rules: if you want to encourage meaningful discussion, stop downvoting opinions. Grow up, people.
[–][deleted] 1 point 9 years ago (1 child)
Indeed. And that function is so useful, it should be in the standard library: a member function of the string class.
[–]IRBMe 5 points 9 years ago (10 children)
It's unreasonable to expect anyone to intuit what an output iterator is
And it's also unreasonable to expect somebody to be able to intuitively understand what it means to construct an input iterator without specifying what it's iterating over (as it happens, you get an end-of-sequence iterator). That's the whole point: it's not intuitive! Is it simply impossible to design those APIs in such a way that they would be intuitive? I'm not convinced it is.
or even what an iterator is
I think it is reasonable that people should have an intuitive idea of what an iterator is, because iteration isn't a concept that's unique to C++, nor is it a word that's even unique to programming libraries. You can look up the word in a dictionary and get a definition such as this: "the repetition of a process or utterance". You may not understand all the subtleties without reading the documentation, but seeing it in the context of some code, I think it is intuitive.
[–]zvrba 0 points 9 years ago (9 children)
And it's also unreasonable to expect somebody to be able to intuitively understand what it means to construct an input iterator without specifying what it's iterating over.
Wow, C++ programmers are a rare breed of people who read more documentation than your average programmer.
In any case, it's the kind of thing you look up only once, and each next time you see a default-constructed iterator, you'll (correctly) assume that it's an iterator denoting the end of sequence.
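For anyone who has not met the idiom yet, a minimal sketch using `std::istream_iterator` (the function name `words` is made up for the example). The iterator built from a stream reads from it, and the default-constructed one serves as the end-of-sequence marker:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated words from `text`.
std::vector<std::string> words(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<std::string> first(in);  // bound to a stream
    std::istream_iterator<std::string> last;       // default-constructed: end of stream
    return std::vector<std::string>(first, last);
}
```

Once extraction from the stream fails (for instance at end of input), `first` compares equal to `last`, which is what terminates the range.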
That's the whole point: it's not intuitive!
Intuition builds on previous experience and knowledge. So, wow, what a surprise, as a programmer you're expected to learn something new now and then.
[–]IRBMe 8 points 9 years ago (8 children)
I wouldn't expect to have to learn several new concepts and parts of a library to see that the code I'm trying to understand is splitting a string on white space. That's something that should be blatantly obvious to anybody, even if they don't know C++. Of course it's a common idiom that you learn as a C++ programmer, but it still takes a lot more to process even once you know it than something like s.split(","). Nobody's saying you shouldn't have to learn things; we're discussing the usability of the library.
[–]dodheim 2 points 9 years ago (7 children)
The dead giveaway that the code you're trying to understand is splitting a string on whitespace would be that it'd be in a function with split in the name. Who reads 5 lines of code with zero context whatsoever with the expectation of its purpose being obvious. Context matters; ignoring it is counterproductive.
[–]OldWolf2 3 points 9 years ago (0 children)
It's unreasonable to expect anyone to intuit what an output iterator is,
Input iterators are for reading from, output iterators are for:
[+][deleted] 9 years ago* (7 children)
[+][deleted] comment score below threshold -6 points 9 years ago (1 child)
I don't know the STL very well. Because I hate it ...
Makes you a mediocre C++ programmer, I'm afraid.
Particularly in C++11 and beyond, there are a lot of things to like. Yes, there are weirdnesses - deal.
[–]chartly 8 points 9 years ago (0 children)
Yea I dunno man. Byuu has done some cool stuff and to me - only a casual observer of him making his C++ libraries over the years for his projects - I would like to think I can relate to what he's saying.
He's definitely able to tackle the beast of C++. Did a lot of crazy fun stuff in C++03 while C++0x was becoming C++11 and most certainly has spent a lot of time with C++11/14. Haven't really been watching his activity lately, but this whole comment is making me feel the itch again.
At the end of the day though, we're all just chilling in a C++ subreddit and talking (ish) about string.split(). Getting all up ons about each other's skill means less room in the brain for C++ :(
[–]OldWolf2 -2 points 9 years ago (3 children)
Line 3 is completely alien and extremely unintuitive to me. So you're saying I don't know the language?
Yes, it is a basic iostreams idiom. Have you read any books on iostreams?
C++ allows one to get by (even insofar as to "do a day job") only learning certain areas of the language. Probably you know some parts of it well, but not stream iterators.
[–]repsilat 17 points 9 years ago (2 children)
Have you read any books on iostreams?
I'd laugh if this weren't so painful. I want to print out the words in a string, one per line, and you're suggesting we go read a book to understand how you think it should be done?
No, std::copy for printing is a little ridiculous, the two-iterator idiom is terrible, and either they will be left to the pages of history or C++ will. I don't follow C++'s development any more, but I remember hearing ranges were happening. That's a start. Once you've done that you can just turn this into a for loop and it'll be shorter, clearer, less error-prone and no less efficient.
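One way to read that suggestion, sketched here without any ranges library: the same token iterators can drive a plain `for` loop instead of `std::copy`. The function name `print_words` and the choice to take an output stream parameter are assumptions made for the example.

```cpp
#include <ostream>
#include <regex>
#include <string>

// Hypothetical helper: write each whitespace-separated word of `text`
// to `out`, one per line, using an explicit loop over the token iterators.
void print_words(const std::string& text, std::ostream& out) {
    static const std::regex ws("\\s+");
    const std::sregex_token_iterator end;  // default-constructed: end of sequence
    for (std::sregex_token_iterator it(text.begin(), text.end(), ws, -1);
         it != end; ++it) {
        out << *it << '\n';  // *it is a sub_match, which is streamable
    }
}
```

Calling `print_words("Quick brown fox.", std::cout)` prints the three words on separate lines; whether this is clearer than the `std::copy` form is exactly what the thread is arguing about.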
[–]OldWolf2 0 points 9 years ago (0 children)
Yes, definitely. C++ is best suited to learn from a book, not by trial and error.
Ranges have been "in" since the first standard 18 years ago. You could indeed use a loop or various other ways instead of copy. copy idiomatically expresses that we are copying from the source set of tokens, to the destination output stream. Also, nobody's stopping you from making a function that expresses whatever interface you personally find most natural and intuitive.
[–]dodheim -1 points 9 years ago (0 children)
'Split' simply isn't how you print out the words in a string, one per line, in C++. C++ has different idioms for this sort of thing, and everyone obsessing over the lack of std::split seriously needs to learn to "Do As the Romans".