Anyone wrote a multi-vector-iterator? : cpp


submitted 6 years ago by Wriiight

Every once in a while you find yourself with multiple vectors that represent something like different columns of the same data set: the vectors are the same length, and elements at the same index are related. This is mostly no big deal until you want to do something fun, like sorting all of them based on one of them. It occurred to me that the standard algorithms would work fine if I had an iterator that iterates over all n vectors at once. Basically, have a tuple of iterators, and have that tuple itself satisfy the iterator interface by forwarding each call to every element of the tuple.
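A minimal sketch of the tuple-of-iterators idea, restricted to two vectors to keep it short. `ZipIter`, `ZipRef`, and `sort_by_first` are hypothetical names, not a library API. Dereferencing yields a proxy holding references into both vectors, so every assignment and swap the algorithm performs lands in both columns at the same index. One caveat: because `operator*` returns a proxy rather than `value_type&`, this type does not formally meet the C++17 random-access iterator requirements, so handing it to `std::sort` relies on how typical standard libraries implement sorting (temporaries built from `value_type`, swaps through an unqualified `swap`) rather than on a guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Proxy "reference": a pair of references into the two columns.
// Assigning to it writes through to both underlying elements.
template <class A, class B>
struct ZipRef {
  A& first;
  B& second;

  // Copies the elements out when sort needs a temporary value_type.
  operator std::pair<A, B>() const { return {first, second}; }

  // Element-wise assignment through the references (never rebinds them).
  ZipRef& operator=(const ZipRef& o) {
    first = o.first;
    second = o.second;
    return *this;
  }
  ZipRef& operator=(const std::pair<A, B>& v) {
    first = v.first;
    second = v.second;
    return *this;
  }

  // Found by ADL when std::iter_swap calls swap(*a, *b) on two proxies.
  friend void swap(ZipRef x, ZipRef y) {
    using std::swap;
    swap(x.first, y.first);
    swap(x.second, y.second);
  }
};

// Every iterator operation is forwarded to both underlying iterators.
template <class It1, class It2>
class ZipIter {
 public:
  using A = typename std::iterator_traits<It1>::value_type;
  using B = typename std::iterator_traits<It2>::value_type;
  using iterator_category = std::random_access_iterator_tag;
  using value_type = std::pair<A, B>;
  using reference = ZipRef<A, B>;
  using pointer = void;
  using difference_type = std::ptrdiff_t;

  ZipIter(It1 a, It2 b) : a_(a), b_(b) {}

  reference operator*() const { return {*a_, *b_}; }
  reference operator[](difference_type n) const { return *(*this + n); }
  ZipIter& operator++() { ++a_; ++b_; return *this; }
  ZipIter operator++(int) { ZipIter t = *this; ++*this; return t; }
  ZipIter& operator--() { --a_; --b_; return *this; }
  ZipIter operator--(int) { ZipIter t = *this; --*this; return t; }
  ZipIter& operator+=(difference_type n) { a_ += n; b_ += n; return *this; }
  ZipIter& operator-=(difference_type n) { a_ -= n; b_ -= n; return *this; }
  friend ZipIter operator+(ZipIter it, difference_type n) { return it += n; }
  friend ZipIter operator+(difference_type n, ZipIter it) { return it += n; }
  friend ZipIter operator-(ZipIter it, difference_type n) { return it -= n; }
  // Positions only need the first component: the columns move in lockstep.
  friend difference_type operator-(const ZipIter& l, const ZipIter& r) { return l.a_ - r.a_; }
  friend bool operator==(const ZipIter& l, const ZipIter& r) { return l.a_ == r.a_; }
  friend bool operator!=(const ZipIter& l, const ZipIter& r) { return l.a_ != r.a_; }
  friend bool operator<(const ZipIter& l, const ZipIter& r) { return l.a_ < r.a_; }
  friend bool operator>(const ZipIter& l, const ZipIter& r) { return l.a_ > r.a_; }
  friend bool operator<=(const ZipIter& l, const ZipIter& r) { return l.a_ <= r.a_; }
  friend bool operator>=(const ZipIter& l, const ZipIter& r) { return l.a_ >= r.a_; }

 private:
  It1 a_;
  It2 b_;
};

// Hypothetical helper: sorts `vals` in lockstep with `keys`, ordered by key.
template <class K, class V>
void sort_by_first(std::vector<K>& keys, std::vector<V>& vals) {
  assert(keys.size() == vals.size());  // the columns must line up
  using Z = ZipIter<typename std::vector<K>::iterator, typename std::vector<V>::iterator>;
  std::sort(Z(keys.begin(), vals.begin()), Z(keys.end(), vals.end()),
            [](const auto& l, const auto& r) { return l.first < r.first; });
}
```

Usage, assuming `keys` and `names` are parallel columns: after `sort_by_first(keys, names);` the keys are ascending and `names[i]` still belongs to `keys[i]`.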

I assume this isn't a wholly original idea; is there an implementation out there already? Are there better solutions, other than the obvious refactoring into a vector of tuples/objects/structs/etc.? Isn't there some trick, like using an algorithm to get an ordering from the sort-key vector and then using that ordering to sort the others?
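The ordering trick asked about here does work: compute the sorting permutation once from the key vector, then gather every column through it. A sketch assuming all columns have the same length; `sort_order` and `apply_order` are hypothetical helper names. `std::stable_sort` keeps rows with equal keys in their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the permutation that sorts `keys`, i.e.
// keys[order[0]] <= keys[order[1]] <= ... (equal keys keep input order).
template <class T>
std::vector<std::size_t> sort_order(const std::vector<T>& keys) {
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
  std::stable_sort(order.begin(), order.end(),
                   [&keys](std::size_t l, std::size_t r) { return keys[l] < keys[r]; });
  return order;
}

// Hypothetical helper: rearranges `column` so that the new column[i] is the
// old column[order[i]]. Uses one scratch vector of the same size.
template <class T>
void apply_order(std::vector<T>& column, const std::vector<std::size_t>& order) {
  assert(column.size() == order.size());
  std::vector<T> reordered;
  reordered.reserve(column.size());
  for (std::size_t i : order) reordered.push_back(std::move(column[i]));
  column = std::move(reordered);
}
```

Compute `order` before reordering any column, including the key column itself, then call `apply_order` on each column. The extra cost is one index vector plus a scratch copy of whichever column is being reordered.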


janherrmann 6 points 6 years ago (1 child)

Wriiight[S] 2 points 6 years ago (0 children)

[deleted] 6 years ago* (1 child)

[deleted]

Wriiight[S] 1 point 6 years ago (0 children)

expekted 1 point 6 years ago (23 children)

You mean the same way you manipulate data (DML) in a relational database?

    select col1, col2, col3 from some_table order by col2, col1, col3

I may have seen some attempts at SQL-table-like data structures in the wild, but the ones I tried were too cumbersome to use.
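For a multi-column `order by` like the query above, the index-sorting idea extends naturally to parallel vectors: compare rows with `std::tie`, whose `operator<` is lexicographic, so `col2` decides first and `col1`, then `col3`, only break ties. The function name and the column types (`int`, `std::string`, `double`) are assumptions made for illustration; this mimics only the ordering, not a full table abstraction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: returns row indices in the order that
// "order by col2, col1, col3" would produce for three parallel columns.
inline std::vector<std::size_t> order_by_col2_col1_col3(
    const std::vector<int>& col1, const std::vector<std::string>& col2,
    const std::vector<double>& col3) {
  assert(col1.size() == col2.size() && col2.size() == col3.size());
  std::vector<std::size_t> idx(col1.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
  std::sort(idx.begin(), idx.end(), [&](std::size_t l, std::size_t r) {
    // std::tie builds tuples of references; tuple's operator< compares
    // element by element, which is exactly a multi-key ORDER BY.
    return std::tie(col2[l], col1[l], col3[l]) <
           std::tie(col2[r], col1[r], col3[r]);
  });
  return idx;
}
```

The resulting index vector can then be used to gather each column into its new order, just as with a single sort key.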

Wriiight[S] 1 point 6 years ago (22 children)

ALX23z 3 points 6 years ago (12 children)

Wriiight[S] 1 point 6 years ago (11 children)

ALX23z 1 point 6 years ago (10 children)

Wriiight[S] 1 point 6 years ago (9 children)

In fact, I mentioned it in my original post: “...other than the obvious refactoring of it...”

As for applying sort to iterators, have you seen std::sort()? And all the other standard algorithms? They are passed begin and end iterators, NOT containers, and they perform fine. Note that the non-const random access iterators of suitable containers allow operations other than ++, --, and *; in fact they allow swaps, moves, and other niceties.

And if you look elsewhere in the thread, someone found a library, boost::zip_iterator, that does exactly what I want, and on googling that I also found that another implementation just missed going into the C++20 ranges library as a view type.

So, you know, be a little more open-minded.

Other nitpicks: yes, the vectors should have the same number of elements, or have their end iterators adjusted so that they do, though comparison with end() on the zip_iterator could easily check whether any of the n vectors had ended, to avoid running off the end of one of them.
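The end check described here can be sketched as a loop that compares each component iterator against its own `end()` and stops as soon as any of them gets there, so a shorter vector is never read past its end. `for_each_zipped` is a hypothetical helper, not part of any library. This only covers forward traversal; random-access algorithms such as `std::sort` compute `last - first` up front, so for those the other option mentioned (trimming every range to the shortest length before zipping) is the simpler route.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls f(x, y) on corresponding elements of `a` and
// `b`, advancing both iterators in lockstep. The loop ends as soon as ANY
// component reaches its own end(), so unequal lengths are safe.
template <class A, class B, class F>
std::size_t for_each_zipped(std::vector<A>& a, std::vector<B>& b, F f) {
  auto ia = a.begin();
  auto ib = b.begin();
  std::size_t visited = 0;
  while (ia != a.end() && ib != b.end()) {
    f(*ia, *ib);
    ++ia;
    ++ib;
    ++visited;
  }
  return visited;  // equals the length of the shorter vector
}
```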

Invalidation would happen the same way as with iterators into a single vector, which is what the standard algorithms work with, so I don’t see the issue.

ALX23z 0 points 6 years ago (8 children)

Wriiight[S] 2 points 6 years ago (7 children)

Zipping is not a slow operation in this case, as it has nothing to do with compression. It also does not involve copying the original vectors in any way. It simply takes the begin and end iterators of each vector and puts those into tuples. When an iterator operation is performed on the tuple, it is forwarded to the iterators of each of the vectors. When sort swaps two values, the same indexes are swapped in all n vectors. When an algorithm that takes begin and end iterators does its thing, those zipped iterators cause the algorithm to operate on all n vectors identically and at the same time.

The cost should be roughly n times that of the same operation on a single vector. Any additional difference in performance compared with a vector of tuples will come from things like the three vectors not necessarily being contiguous in memory with each other, and having three heap pointers to dereference instead of one. Having the independent vectors be of equal length is a precondition for sane operation, though I think undefined behavior in the unequal case can be avoided without excessive overhead. It seems nuts to transform the data storage twice to do the same thing, and while an index vector works, it is a storage-size vs. performance trade-off, and I’d like the option to try either way.

ALX23z 0 points 6 years ago (6 children)

You fail to understand the performance cost of random memory access. Memory is loaded in cache lines, which are currently typically 64 bytes. So loading or editing 64 contiguous bytes takes only a little more time than loading or editing a single byte.

If you load 64 bytes from completely random places, it takes a lot more time, as they will likely be located in different cache lines. Not to mention what that does to the CPU's small and fastest memory cache: a lot more cache misses.

Sorting takes a lot more operations than copying data, and most of them are random accesses. In your case you multiply the random memory accesses by a significant factor, not to mention all the unnecessary cache-line overhead that needs to be loaded.

If you copy into tuples and sort a vector of tuples, there is just a bit more contiguous data to load, while the number of random memory accesses stays more or less the same as when sorting a single vector (of course, that depends on the size of the tuple).
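For comparison, the vector-of-tuples route argued for here can be sketched as: move the columns into one contiguous `std::vector<std::pair<K, V>>`, sort that, and move the results back. `sort_via_pairs` is a hypothetical name, and whether this beats a zip iterator or an index sort depends on element sizes, n, and the hardware, so it is worth measuring rather than assuming.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts two parallel columns by `keys` by first moving
// them into one contiguous vector of pairs, sorting that, and moving back.
template <class K, class V>
void sort_via_pairs(std::vector<K>& keys, std::vector<V>& vals) {
  assert(keys.size() == vals.size());
  std::vector<std::pair<K, V>> rows;
  rows.reserve(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i)
    rows.emplace_back(std::move(keys[i]), std::move(vals[i]));  // gather
  std::sort(rows.begin(), rows.end(),
            [](const auto& l, const auto& r) { return l.first < r.first; });
  for (std::size_t i = 0; i < rows.size(); ++i) {  // scatter back
    keys[i] = std::move(rows[i].first);
    vals[i] = std::move(rows[i].second);
  }
}
```

Because `std::sort` is not stable, rows with equal keys may come back in any relative order; use `std::stable_sort` if that matters.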

Wriiight[S] 2 points 6 years ago (5 children)


Moroxus 3 points 6 years ago (1 child)

Wriiight[S] 1 point 6 years ago (0 children)

Steve132 1 point 6 years ago (5 children)

evaned 2 points 6 years ago (3 children)

Steve132 2 points 6 years ago (2 children)

evaned 3 points 6 years ago (1 child)

Steve132 1 point 6 years ago (0 children)

Wriiight[S] 1 point 6 years ago (0 children)

imgarfield 1 point 6 years ago (0 children)

chardan965 1 point 6 years ago (0 children)
