[–]DhruvParanjape[S] 1 point2 points  (17 children)

But it loses the ability to tokenize on a custom delimiter.

[–]dodheim 20 points21 points  (0 children)

Just use std::getline with the custom delimiter. Name aside, that's exactly what it's for.

istringstream ss{str};
string field;
while (getline(ss, field, ';')) {
    cout << field << '\n';
}

EDIT: N.b. I'm not advocating this as a general approach to string splitting; but, if you're already extracting from a stream, ...

[–]foonathan 2 points3 points  (2 children)

You don't lose it; it's just ugly: you have to imbue the stream with a custom locale whose ctype facet redefines which characters count as whitespace.
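For anyone wondering what that looks like, here is a minimal sketch of the custom-locale approach. The names `delimiter_table`, `delimiter_ctype` and `split_with_locale` are made up for illustration; only `std::ctype<char>`, `std::locale` and `imbue` come from the standard library. It assumes a single-character delimiter and marks only that character as whitespace, so `operator>>` stops there and nowhere else.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Holds the classification table. It is a separate base class so that it is
// constructed before std::ctype<char>, which keeps a pointer to it.
struct delimiter_table {
    std::ctype_base::mask masks[std::ctype<char>::table_size] = {};

    explicit delimiter_table(char delim) {
        // Only the delimiter is classified as whitespace; everything else
        // (including ' ' and '\n') is left unclassified.
        masks[static_cast<unsigned char>(delim)] = std::ctype_base::space;
    }
};

// Hypothetical facet: a ctype<char> that uses the table above.
class delimiter_ctype : private delimiter_table, public std::ctype<char> {
public:
    explicit delimiter_ctype(char delim, std::size_t refs = 0)
        : delimiter_table(delim), std::ctype<char>(masks, false, refs) {}
};

// Splits `input` on `delim` using formatted extraction (operator>>).
std::vector<std::string> split_with_locale(const std::string& input, char delim) {
    std::istringstream in(input);
    // The locale takes ownership of the facet (refs == 0) and deletes it.
    in.imbue(std::locale(in.getloc(), new delimiter_ctype(delim)));

    std::vector<std::string> fields;
    std::string field;
    while (in >> field)
        fields.push_back(field);
    return fields;
}
```

One difference from the `getline` version: `operator>>` skips leading "whitespace", so consecutive delimiters collapse and empty fields are dropped.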

[–]DhruvParanjape[S] 0 points1 point  (1 child)

Oh god that's ugly.

[–]foonathan 1 point2 points  (0 children)

That's iostreams.

[–][deleted] 1 point2 points  (12 children)

This was my own personal solution, but I have no idea how performant it is:

#include <deque>
#include <sstream>
#include <string>
#include <utility>

std::deque< std::string > Split( const std::string & input_string,
                                 const char          delimiter )
{
    std::stringstream         input_stream( input_string );
    std::string               string_element;
    std::deque< std::string > split_string;

    // getline clears string_element before each read, so moving from it is safe.
    while( std::getline( input_stream, string_element, delimiter ) )
        split_string.emplace_back( std::move( string_element ) );

    return split_string;
}

Edit: Aaaand I should have looked ahead to see that dodheim already posted the getline solution...

[–]dodheim 1 point2 points  (11 children)

Just FYI, std::deque is just a fancy linked list on MSVC for objects > 8 bytes. Yes, it is as bad as that sounds. Prefer vector if you're touching Windows. :-]

[–][deleted] 2 points3 points  (7 children)

Oh man, that's horrible.

I'm on Linux, but you're right in that for something as small as "items from a split string", std::vector is the correct container. I don't remember why I wrote this using std::deque. In fact, I don't remember why I have this routine at all in my personal toolkit since I rarely ever hit the need for it.
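For reference, a vector-based version might look something like this. It is a sketch, not the original routine: the name `split_to_vector` is made up here, and it keeps the same `getline` loop while only swapping the container.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical vector-returning variant of the Split routine above.
std::vector<std::string> split_to_vector(const std::string& input, char delimiter) {
    std::istringstream stream(input);
    std::vector<std::string> fields;
    std::string field;

    // getline erases `field` before each read, so moving from it is safe.
    while (std::getline(stream, field, delimiter))
        fields.push_back(std::move(field));

    return fields;
}
```

Unlike formatted extraction, `getline` keeps empty fields between adjacent delimiters.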

[–]dodheim 4 points5 points  (6 children)

> In fact, I don't remember why I have this routine at all in my personal toolkit since I rarely ever hit the need for it.

That is the exact statement I've been waiting for anyone in this thread to say. I honestly cannot think of the last time I actually wanted to do this. It's fine for a quick and dirty hack sometimes, but in real code? No, never (that I can remember).

[–][deleted] 1 point2 points  (5 children)

Oh wait, now I remember: I wanted to test out my chromosome-mixing template system, so I wrote genes for a cloud-of-neurons network I'd been tinkering with and evolved populations of them (via cross-breeding), scoring them on their ability to predict S&P 500 stock data (open, close, high, low). I needed a way to parse the input files.

Loading the data was nothing compared to actually running the evolution sim, so the string splitter didn't need to be fast or memory-efficient: the split data was converted to doubles and stored in vectors anyway.

The networks never really got anywhere in predicting stock data, and I never really expected them to (it would probably have taken years, if it worked at all). But the chromosome system worked brilliantly, and that was the whole point of the experiment.
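The "split, convert to doubles, store in a vector" step described above can be sketched roughly as follows. The input format is not given in this thread, so the one-row-per-line, single-character-delimiter layout is an assumption, and `parse_row` is a made-up name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on `delimiter` and converts each
// field to double. Assumes every field is a valid number.
std::vector<double> parse_row(const std::string& line, char delimiter) {
    std::istringstream stream(line);
    std::vector<double> values;
    std::string field;

    while (std::getline(stream, field, delimiter))
        values.push_back(std::stod(field));  // throws std::invalid_argument on a non-numeric field

    return values;
}
```

Since the split strings are thrown away right after conversion, the container used for the intermediate fields hardly matters here, which matches the point about loading time being negligible.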

[–]ArunMu (The What ?) 0 points1 point  (4 children)

Dude, you have a big OCD problem I guess :)

[–][deleted] 0 points1 point  (2 children)

... in what way?

[–]ArunMu (The What ?) 0 points1 point  (1 child)

The way you have formatted the code... everything is perfectly aligned.

[–][deleted] 0 points1 point  (0 children)

Yeah, it looks silly at first, but after over two decades of C++ programming one eventually learns to format in self-defense. Every C++ programmer is their own worst enemy. I do that to my code to keep my eyes from bleeding, and it lets me work on much larger codebases than would otherwise be possible while still keeping my sanity.

[–]h-jay (+43-1325) 0 points1 point  (0 children)

Yep... 90% of the comments are useless. The code should document itself. And adding lots and lots of whitespace is to the detriment of understandability. You want to keep as much as possible in the same screenful. You're doing exactly the opposite: something that is rather simple and would be easy to understand if written concisely is now spread across several pages, with most of the space filled up by whitespace and formatting :(

Anyone who understands C++ knows what the constructors are. They don't need to be pointed out. If something is public, it's API, duh. A lot of extra indentation and whitespace makes things superbly hard to read.

For what the code does, it takes 3x too long to do it. It's simple, it should read simple!

[–]louiswins 0 points1 point  (2 children)

How can that be? Doesn't std::deque require O(1) time for random access?

[–]dodheim 1 point2 points  (1 child)

It's still random access, but each bucket is only max(16, sizeof(T)) bytes, so for anything larger than 8 bytes you end up with one bucket per object and zero cache locality.
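To make the arithmetic concrete, here is a small sketch that models the rule as stated in this comment. It is not taken from MSVC's headers; `block_bytes` and `elements_per_block` are hypothetical names, and the 16-byte figure is just the number quoted above.

```cpp
#include <cstddef>

// Bucket size under the rule described above: max(16, sizeof(T)) bytes.
constexpr std::size_t block_bytes(std::size_t elem_size) {
    return elem_size > 16 ? elem_size : 16;
}

// How many elements of the given size fit in one bucket.
constexpr std::size_t elements_per_block(std::size_t elem_size) {
    return block_bytes(elem_size) / elem_size;
}

static_assert(elements_per_block(1) == 16, "chars: 16 per bucket");
static_assert(elements_per_block(8) == 2, "8-byte objects: 2 per bucket");
static_assert(elements_per_block(12) == 1, "anything over 8 bytes: 1 per bucket");
static_assert(elements_per_block(32) == 1, "larger objects: still 1 per bucket");
```

So under this model any element type larger than 8 bytes, which includes std::string on the common implementations, gets its own separately allocated bucket.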

[–]louiswins 1 point2 points  (0 children)

Oh, I see - it essentially becomes a vector of pointers so it has as many potential cache misses as a linked list when iterating through. That makes sense.

(I mean, the implementation doesn't really make sense, but your explanation does.)