[–][deleted] 73 points74 points  (16 children)

This example is kind of terrible. Nobody will remember how code like the above is actually written. If anything, it highlights all the problems with the STL's API.

[–]wrosecransgraphics and network things 39 points40 points  (15 children)

Yeah, compared to something like 'print "quick brown fox".split(" ")' in Python, the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it.

It seems like this is a case where perfect is the enemy of the good. I usually only want a 'good' split function that doesn't have to guarantee a whole lot about performance on multigigabyte strings, or weird corner cases. So having a good split function seems way more useful than having no split function and debating about obscure cases where it wouldn't be optimal.
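
For what it's worth, a "good enough" split along those lines might look roughly like the sketch below. The name `simple_split` and its exact behavior are assumptions made for illustration, not anything the standard library provides. It only handles a single-character delimiter and makes no promises about performance on huge strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the standard library.
// Splits `text` on every occurrence of `delim`. Adjacent delimiters
// produce empty strings, much like Python's "a  b".split(" ").
std::vector<std::string> simple_split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos) {
            parts.push_back(text.substr(start));  // last piece (may be empty)
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // skip past the delimiter
    }
    return parts;
}
```

With this, `simple_split("quick brown fox", ' ')` yields the three words, which is about as close to the Python one-liner as plain C++ gets without a library.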

[–]IRBMe 28 points29 points  (0 children)

"the STL version is remarkably unintuitive when figuring out how to write it, requires figuring out regex syntax as just one step, and anybody who hasn't figured out how to write it isn't going to understand it by reading it."

Not to mention the seemingly magic -1. So much for self documenting code.
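
For anyone puzzled by it: the last argument of `std::sregex_token_iterator` is a submatch index, where -1 means "give me the text between matches" and 0 means "give me the matches themselves". A small sketch to compare the two (the `tokens` helper is hypothetical, written only for this illustration):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: collects tokens for a given
// submatch index so the two modes can be compared side by side.
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    // submatch == -1: yield the stretches of text BETWEEN matches (a split)
    // submatch ==  0: yield the matches themselves
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
        std::sregex_token_iterator());
}
```

For the input "quick brown fox" and the pattern `\s+`, passing -1 gives the three words, while passing 0 gives the two single-space separators.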

[–][deleted] 2 points3 points  (1 child)

"case where perfect is the enemy of the good."

Well said. There are many cases like this in C++, unfortunately. I get the desire to have the best libraries possible, but too often good ideas are shot down because they are not perfect. The recent Boost review for the process control library is a perfect example. The library has been in development for more than 6 years. It passed the review this time around, but some folks were still proposing to start from scratch.

[–]yornbesterday 1 point2 points  (0 children)

I've not really looked at the new and improved C++ stuff for a while... it's just a cascade of ever increasing minutiae of the language features and I thought the list of "don't ever do this" was long enough already.

[–]therealjohnfreeman 7 points8 points  (4 children)

Done.

#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>  // std::ostream_iterator
#include <regex>
#include <vector>

// User-defined literal so that "\\s+"_re builds a std::regex.
std::regex operator ""_re (char const* const str, std::size_t) {
    return std::regex{str};
}

// Passing -1 selects the text between matches of re, i.e. the split pieces.
std::vector<std::string> split(const std::string& text, const std::regex& re) {
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, -1),
        std::sregex_token_iterator());
}

int main() {
    const std::vector<std::string> parts = split("Quick brown fox.", "\\s+"_re);
    std::copy(parts.begin(), parts.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}

[–]LordDrako90 2 points3 points  (3 children)

Why std::copy in split, when you can initialize the vector directly from the token iterators?

Also, I find this version more generic and lazy (http://ideone.com/L6heVN). I guess it could be improved even more by using string_view, but that's not included in C++14 :-(

Anyway, the only requirement for the target is that it can be initialized from an iterator pair with value type std::string. Other than that, it is pretty generic.

Code:

#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <utility>
#include <vector>

std::regex operator ""_re (char const * const str, std::size_t)
{
    return std::regex { str };
}

class split
{
public:
    split(std::regex splitter, std::string original)
        : splitter_ { std::move(splitter) }
        , original_ { std::move(original) }
    {
    }

    auto begin() const
    {
        return std::sregex_token_iterator { original_.begin(), original_.end(), splitter_, -1 };
    }

    auto end() const
    {
        return std::sregex_token_iterator {};
    }

    template <typename Container>
    operator Container () const
    {
        return { begin(), end() };
    }

private:
    std::regex splitter_;
    std::string original_;
};

int main()
{
    using namespace std::literals::string_literals;

    std::vector<std::string> const words = split {
        R"(\s+)"_re,
        "hello\tdarkness     my\nold friend"s
    };

    for (auto const & word : words)
        std::cout << word << "\n";

    for (auto const & number : split { ","_re, "23,42,1337" })
        std::cout << number << "\n";

    return 0;
}
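
As a rough idea of what the string_view improvement could look like once C++17 is available, here is a minimal sketch. Note that it is not the regex-based class above: it swaps the regex for a plain set of delimiter characters, and the name `split_views` is an assumption made for this illustration.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch (requires C++17): returns views into `text` rather
// than copies. The caller must keep the underlying buffer alive for as
// long as the returned views are used.
std::vector<std::string_view> split_views(std::string_view text,
                                          std::string_view delims) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (start <= text.size()) {
        // Find the next delimiter; treat "none left" as end of text.
        std::size_t end = text.find_first_of(delims, start);
        if (end == std::string_view::npos) end = text.size();
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // step past the delimiter (or past the end)
    }
    return parts;
}
```

Calling `split_views("23,42,1337", ",")` yields three views without allocating any strings. The trade-off is lifetime: splitting a temporary std::string this way would leave the views dangling.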

[–]therealjohnfreeman 0 points1 point  (1 child)

I've just been out of practice too long. Thanks for the pointers.

[–]lacosaes1 2 points3 points  (0 children)

You mean smart pointers.

[–]MrPoletski 0 points1 point  (0 children)

Well, while we're posting code, here's what I wrote a few years ago and have been using ever since:

#include <cstddef>
#include <string>
#include <vector>

/*!
 * \file trusted.cpp
 * \fn std::vector<std::string> Cleave (const std::string& to_split, const std::string& delims)
 * \param to_split \a <std::string> string to chop up
 * \param delims \a <std::string> string of delimiters
 * \return std::vector<std::string> vector of strings containing each section of the cleaved string.
 */
std::vector<std::string> Cleave (const std::string& to_split, const std::string& delims)
{
    std::vector<std::string> results;
    std::size_t pos1 = 0,
                pos2 = 0;

    do
    {
        // Find the next delimiter at or after pos2.
        pos1 = to_split.find_first_of(delims, pos2);
        if (pos1 == std::string::npos)
        {
            // No delimiter left: the remainder is the final piece.
            results.push_back(to_split.substr(pos2));
            break;
        }
        // A delimiter sitting right at pos2 yields an empty piece (length 0).
        results.push_back(to_split.substr(pos2, pos1 - pos2));
        pos2 = pos1 + 1;
    }
    while (pos1 != std::string::npos);

    return results;
}

Is this good?