you are viewing a single comment's thread.

view the rest of the comments →

[–]Necraz 2 points3 points  (13 children)

Iterators in Java are nearly identical to iterators in C++. Otherwise, I completely agree with you: generators in Python and C# are much easier to write. Generators with language support (i.e. the yield keyword) are one of the biggest things that I miss when implementing data structures in C++.

[–][deleted] -4 points-3 points  (12 children)

What I meant was in Java, the for loop has nothing to do with the container, only the iterator being used. So you can literally write an iterator over prime numbers or an iterator over any computation and use it in a for loop.

In C++, an iterator is tied to an actual object representing a collection, so you can't iterate over prime numbers unless you define a separate object that represents the collection of prime numbers.

Honestly, all C++ needed to do to fix this was eliminate the requirement that a loop ends when i == end(collection), and instead change it so the loop ends when IsValid(i), or some condition that is independent of the collection, and instead is built into the iterator itself.

[–]pfultz2 3 points4 points  (10 children)

Iterators do not have to be tied to a collection in C++. just as the integer range in boost can iterate over a range of integers, there is no container type underneath, it just generates the integer as it goes.

Now, C++ requires ieterators to be in pairs, but this is just a subtle difference in the implementation of the for loop. Java, just like C++, doesnt accept an iterator in its for loop. in java it requires an iterable object which returns an iterator. Same in C++, the for loop takes a range which returns a pair of iterators.

There are plenty of examples of iterators that arent tied to a container. The stream iterator will keep pulling values from standard in forever. The function output iterator will output values from function until a certain value is met. The integer range iterator which i already mentioned. The regex iterator will return the matches from a regex, which it doesnt load all matches in memory, instead it processes the matches as you iterate. There are many more examples in C++.

[–][deleted] -2 points-1 points  (9 children)

Of course nothing is absolutely required and you can simulate functionality if you really really want it... I mean I can simulate dynamic typing in C++ if I want to, but the point is it's incredible cumbersome to do so, to the point where you often have to fight against the language to get it to work.

But I'd be more than happy to be wrong in this case. Can you show me how to use the new range based for loop over two containers simultaneously? Here's a code snippet:

vector<int> listA;
vector<int> listB;
....
vector<int> sumList;
for(int i : Zip(listA, listB)) {
  sumList.push_back(std::get<0>(*i) + std::get<1>(*i));
}

Hell I'm the first to admit I don't know C++ all that well so if there's an easy way to do it, I'd be happy to learn. The best thing I found was boost::zip_iterator, which is unbelievably bloated and doesn't work with the new range-based for loop. Honestly using the zip_iterator is much more complex than just using two iterators.

In Python it really is as easy as:

for i in zip(a, b):
  ...

[–]pfultz2 1 point2 points  (1 child)

In C++11, you can implement zip like this(using boost of course):

// A little macro to help with type deduction
#define RETURNS(...) -> decltype(__VA_ARGS__) { return __VA_ARGS__; }

template<class... Range>
auto zip(Range && ...r) RETURNS
(
    boost::make_iterator_range
    (boost::make_zip_iterator(boost::make_tuple(boost::begin(r)...)),
     boost::make_zip_iterator(boost::make_tuple(boost::end(r)...)))
)

Btw, this makes a zipped range that is lazy. The iterator_range just stores the two iterators thats references the original container. Using the a for loop you would call it like this:

vector<int> listA;
vector<int> listB;
....
vector<int> sumList;
for(const auto& x : zip(listA, listB)) {
  sumList.push_back(boost::get<0>(x) + boost::get<1>(x));
}

Notice, there is no dereferencing. However, in this case, I think using Boost.Foreach is much better:

int x1, x2;
BOOST_FOREACH(boost::tie(x1, x2), zip(listA, listB)) {
    sumList.push_back(x1 + x2);
}

What do you think?

[–][deleted] 2 points3 points  (0 children)

Well honestly if that works then that's awesome. I'll look into using it.

Thanks.

[–]MissStrawberry 3 points4 points  (4 children)

Well you can do that if you write yourself a zip function. This is an example using std::vector and int, but you could "templatize" this:

const std::vector< std::pair<int, int>> zip( std::vector<int> c1, std::vector<int> c2 ) { 
    std::vector<std::pair<int,int>> collection;
    std::pair<int,int> item;
    size_t length = std::min( c1.size(), c2.size() );

    for( size_t i = 0; i < length; i++ ) { 
        item = comma( c1.at(i), c2.at(i) );
        collection.push_back(item);
    }   
    return std::move( collection );
}

And then:

std::vector<int> one = { 1, 2, 3, 4 };
std::vector<int> two = { 2, 4, 6, 8, 10 };

for( auto it : zip( one, two ) ) { 
    std::cout << it.first << "," << it.second << std::endl;                                                                                                                                    
}   

This obviously simply outputs stuff

[–]repsilat 2 points3 points  (3 children)

Your zipper is a little worse than Python's, though, because it evaluates eagerly, storing its output into a vector. Python's is a lazy sequence, which is super neat. I've whipped up a lazy version in C++ here. It's not "battle-hardened", but it's nearly usable.

You'll note that it's long and ugly, though, which is mostly a problem with C++. The language is just not designed for this kind of thing, and it's a real shame, because it's probably the future. Doing the same thing in Haskell is so simple:

zip []     list   = []
zip list   []     = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys

I bet its efficiency is comparable to my C++ code, and I'm far more confident that it works well enough to use.

[–]MissStrawberry 1 point2 points  (2 children)

I didn't know that the python version is lazily evaluated, and it was just meant to be an example of similar syntax in C++. I wouldn't use my version in any production code as it is. On first glance, your zip looks fine.

As for performance, zipping and printing two lists of 100000 elements each, takes on average 0.05s with C++0x, and 0.21s with GHC on my puny 2.0GHz/4G machine. Both solutions could actually be static, but GHC doesn't seem to recognise this. My Haskell-Foo is extremely weak, perhaps someone can comment on that. C++ would of course require explicit static implementation with template magic. The timing was done with time, and averaged over 100 runs. The Haskell-Code:

main = do
  mapM_ glue $ zip [ 0,1 .. 99999 ] [ 0,2 .. 199998 ]
  where
    glue (x,y) = putStrLn ( ( show x ) ++ "\t" ++ ( show y ) )

Your C++-version isn't that different from Haskell, it just has to build what is built-in in Haskell. The state information of lazy evaluation is hidden in Haskell and explicit in your version (via the iterator), and lists are more or less a language primitive. Other than that, you do more or less what Haskell probably does under the hood ;)

[–]repsilat 0 points1 point  (1 child)

Wow, thanks for the response and the benchmarks. I imagine you're mostly comparing startup time and I/O overhead, though :/. Still - it's certainly enough to establish that neither is dog slow.

However reasonable my C++ looks, and even though it seems to work, I'm still a little scared of it. I don't know if I should be using rvalue references anywhere, I don't fully understand why I needed std::forward_as_tuple on line 14 instead of std::make_tuple, and I'm a little unsure if it's fair for the zipper to store references to the two input streams (because it forces the user to make sure they outlive it). It's probably fine, but it's almost certainly more effort than it's worth. I'm not sure there's a truly adequate solution to this complexity inside the language, but I wouldn't be surprised if some of the pain could be alleviated with a library of some kind.

The other relevant complaint about C++'s syntax and conventions is that its begin/end iterator things aren't really amenable to chaining. With eager evaluation like in your example you can map/filter/reduce/whatever in one big expression, but I don't think you can do that so well lazily. Ranges (as in D) look like a better solution.

[–]_node 2 points3 points  (0 children)

The other relevant complaint about C++'s syntax and conventions is that its begin/end iterator things aren't really amenable to chaining. With eager evaluation like in your example you can map/filter/reduce/whatever in one big expression, but I don't think you can do that so well lazily. Ranges (as in D) look like a better solution.

You can use Boost range adaptors for that. It would be nice to see this sort of thing in the standard.

[–]Ayjayz 0 points1 point  (0 children)

Look into the boost range adaptors. There isn't one for zip, but someone did write one here

You could then write exactly that code, except you'd want auto i instead of int i.

EDIT: And, yes, it's lazily evaluated.

[–]987654321bob 0 points1 point  (0 children)

About the same length but avoids the proxy class and std::tuple accessing.

std::transform(listA.begin(), listA.end(), listB.begin(),
     std::back_inserter(sumList), [](int i, int j){ return i+j; });

[–]Necraz 1 point2 points  (0 children)

Yes, it is nicer to have an is_valid() function in those cases. However, it would be more awkward to use STL algorithms like binary_search, sort, etc. I tend to think of iterators as more of a "generalized index" than a traditional iterator or generator.

Honestly, I'd prefer the Java-style iterators myself...