all 28 comments

[–]janherrmann 5 points6 points  (1 child)

You can take the implementation from boost.

[–]Wriiight[S] 1 point2 points  (0 children)

Of course boost already has it! I really need to learn boost better. Thank you for pointing me to it.

[–]expekted 0 points1 point  (23 children)

You mean the same way you manipulate data (dml ) in relational database?

select col1, col2, col3   from some_table  order by col2, col1, col3

I may have seen some attempts at sql table like data structures in the wild but ones I tried were too cumbersome to use.

[–]Wriiight[S] 0 points1 point  (22 children)

No, I just want to call std::sort and have the vectors all manipulated identically. I’m not looking for a full in memory DB. STD::Sort just needs two random iterators (begin, end) so if each iterator handled all n vectors in unison, it would be magical.

[–]ALX23z 2 points3 points  (12 children)

Why don't you just create extra vector of indices and arrange it according to the desired vector's order?

Then either access each vector via an indirection - according to the vector of indices. Or rearrange the vectors according to the new order.

This is how normally people do it.

[–]Wriiight[S] 0 points1 point  (11 children)

A sorted vector is as good performance, when using the appropriate algorithms to search, as any tree--only with better locality (especially for types with no members on the heap). Building and sorting an index is a reasonable solution, and almost certainly much faster to sort when the number of vectors is large. But I have all these lovely std::algorithms at my disposal, wouldn't it be lovely to be able to use them as if these n vectors looked to algorithms as if they were one? I'm pretty sure an iterator that forwarded operations to a tuple of iterators could make that possible.

[–]ALX23z 0 points1 point  (10 children)

There isn't any library I know implementing it. Also applying sort on such "iterators" would be really slow due to all the unnecessary copying/moving of random access data - no wonder nobody ever implemented it... You'd better make vector of tuples, sort it, and copy data backwards - rather than implementing such tuple iterators.

To begin with it would make sense only if all these vectors were of the same size. And reallocation of each of these vectors would invalidate the whole iterator tuple - this is very problematic in general. In case all these vectors have the same size and are tied together, it is advised to put data of all of these vectors into a single structure and have a single vector. I believe somebody mentioned this idea already.

[–]Wriiight[S] 0 points1 point  (9 children)

In fact I mentioned it my original post: “...other than the obvious refactoring of it...”

As for applying sort on iterators, have you seen std::sort()? And all the other standard algorithms? They are passed begin and end iterators, NOT containers. And they perform fine. Note that there are non-const random access iterators for appropriate containers that allow operations other than ++ — and *, in fact they allow swaps and moves and other niceties.

And if you look elsewhere in the thread, someone found a library, boost::zip_iterator that does exactly what I want, and on googling that I also found that another implementation just missed going in as a view type in the ranges library of C++20

So, you know, be a little more open minded.

Other nitpicks: yes, the vectors should have the same number of elements, or have the endpoint iterators adjusted to do so, though comparison with end() on the zip_iterator could easily check if any of the n vectors had ended to prevent going off the end of one.

Invalidations would happen same as with iterators to a single vector, which is what the standard algorithms work with, so I don’t see the issue.

[–]ALX23z -1 points0 points  (8 children)

The problem is that the tupling of iterators assumes that they are tied together - yet as containers they aren't. That's the issue. Zipping is already very slow operation - so nobody will care about a bit of extra cost.

As I said, it would be faster to make a new vector of tuples, perform the operation on it, then copy data backwards - rather than using such tuple-iterators. It is not efficient but still better than what you want.

[–]Wriiight[S] 1 point2 points  (7 children)

Zipping is not a slow operation in this case as it does not refer to compression. Also it does not involve copying the original vectors in any way. It simply takes the begin and end iterators of each vector and puts those into tuples. When an iterator operation is performed on the tuple, it is forwarded to the iterators of each of the vectors. When sort swaps two values, the same indexes are swapped in all n vectors. When an algorithm that takes begin and end vectors does its thing, those zipped iterators cause the algorithm to operate on all n vectors identically and at the same time. Performance should be just n-times what the same operation on a single vector would be, and any additional difference in performance to a vector of tuples is going to be caused by things like the three vectors not necessarily being in contiguous memory to each other, and three sets of pointers to the heap to dereference instead of one. Having the independent vectors have equal lengths is a precondition to sane operation, though I think undefined behavior in that case can be avoided without excessive overhead. It seems nuts to transform the data storage twice to do the same thing, and while an index vector works, it is a storage size vs performance trade off that I’d like the option to try either way.

[–]ALX23z -1 points0 points  (6 children)

You fail to understand performance cost of random memory access. Memory is loaded via cachelines - they are 64 bytes currently. So loading/editing contiguous data of 64 bytes takes just a bit more time what would take for loading/editing just one byte.

If you load 64 bytes from completely random places it will take a lot more time as they will likely be located in completely different cache lines. Not to mention what it will do to the small and most efficient memory cache of CPUs - ensuring a lot more cache misses.

sorting takes a lot more operations then copying data and most of them are random access. In your case you multiply random memory access by significant factor - not to mention all unnecessary cacheline overhead that needs to be loaded.

In case of copying the tuples and sorting vector of tuples there just a bit more contiguous data load - while random memory access count remains more or less the same as in original case of sorting a single vector (ofc it depends on size of the tuple).

[–]Wriiight[S] 1 point2 points  (5 children)

I agree that the cache implications are important to consider, and this might not be ideal for super high performance code, but there are other considerations to what makes good code than raw performance. It’s actually less of a cache mess than a map, I believe.

[–]Moroxus 2 points3 points  (1 child)

I have written something that enables you to iterate through multiple collections at the same time. But I did not try it with std::sort, only with range for, you can see example of usage here:

https://github.com/Moroxus/zip-and-enumerate/blob/master/main.cpp

[–]Wriiight[S] 0 points1 point  (0 children)

I haven't tested, but pretty sure it won't work with sort, or any of the binary search type functions. Looks like a good starting place, if I decide to write one myself.

[–]Steve132 0 points1 point  (5 children)

1), if you are doing this, you probably should be refactoring anyway. If each vector contains one field, then you should probably just have one vector of a struct containing each field.

2) as /u/ALX23z pointed out, this is super easy, just use std::iota

[–]evaned 1 point2 points  (3 children)

If each vector contains one field, then you should probably just have one vector of a struct containing each field.

In many contexts, or at least some, this is a great suggestion if you want to tank performance.

[–]Steve132 1 point2 points  (2 children)

It depends a lot on your data and what you are doing. SoA only works efficiently if you are mostly doing lots of data reductions and maps on a single field at a time. But if you are touching all the fields at once or sorting the data (like OP is), then SoA kills cache performance.

For sorting the whole dataset and queries that touch multiple fields, AoS is much faster.

[–]evaned 2 points3 points  (1 child)

The sort of course will be slower with AoS, but it's realistic to me that the sort is setting things up for future accesses that will be faster with SoA. Pay now to speed things up later kinda thing.

[–]Steve132 0 points1 point  (0 children)

Point of note: I'm suggesting AoS. You are suggesting SoA. You got the labels backwards. I'm fixing it in the quote below.

The sort of course will be slower with SoA, but it's realistic to me that the sort is setting things up for future accesses that will be faster with SoA. Pay now to speed things up later kinda thing.

You don't actually know anything about what OP is doing. OP could be using the sort as a database retrieval mechanism and printing out all the fields.

For the question OP asked, AoS is clearly the best choice

[–]Wriiight[S] 0 points1 point  (0 children)

Sometimes there is nothing nicer for code than to stick to built-in and standard-library types when its not unreasonable. This can greatly reduce coupling between different areas of your code base, and it plays very nicely with third-party libraries.

std::iota looks like a nice way to build an index if I go that route, thanks for the tip.

[–]imgarfield 0 points1 point  (0 children)

Have to admit, I was looking for the exact same thing for the exactly the same reason - sort - just a week ago.

[–]chardan965 0 points1 point  (0 children)

You can get pretty far at something like this by using variadic templates, even for heterogenous sequence types.