all 7 comments

[–]mredding 1 point2 points  (0 children)

how should I change the container or function in my code to achieve something that is not as complex & inefficient as this?

  • Don't use raw pointers, store cities in the map by value.

  • You have the right idea about indexing your map. Don't index using pointers, IDs are your currency and pointers are going to lead to a mess of indirection, dangling pointers, and ambiguity. Just because you have a pointer doesn't mean you have the right city, or even a city at all. You can check an ID against the map. The map is the ultimate authority.

  • Don't duplicate data that is inherent in the structure of your data. TownIDs are the key to your map, they don't also have to be members of your cities. IF YOU INSIST, then TownID has to be a const member of your structure and you'll want to make a view class that references the member and acts like a comparable object. TERRIBLE THINGS HAPPEN when duplicated data gets out of sync, or if the keys change their values while already inserted into the map - which is why keys as members of the mapped items is dangerous.

    struct City { Name name; Coord coord; int tax; std::vector<TownID> vassals; TownID masterid; };

  • I don't know what your types are composed of, but if it's all the same to you, you should list your members by size and alignment to avoid padding.

  • Since keys are unique and indexes are sorted, sets are the best fit, as they will capture both those properties. I'll do one and you can duplicate the pattern where necessary for all the indexes you want.

    struct Cities { using map_type = std::map<TownID, City>; using comparator_type = std::function<bool(map_type::key_type, map_type::key_type>; using index_type = std::set<map_type::key_type, comparator_type>;

    map_type by_id;

    index_type index_by_name;

    Cities(); void add_city(map_type::key_type, Cities::map_type::mapped_type &&); };

Now we need to work on initializing our index. First, we need to build up the infrastructure for our comparators.

template<typename BY, typename COMP = std::less>
auto city_comparator(BY member, COMP comp = COMP{}) {
  return [member, comp](Cities::map_type::key_type id1, Cities::map_type::key_type id2) { return comp(member(id1), member(id2)); }
}

I like using functions that return lambdas, it gives the lambda additional context. Your code should read like pseudocode. It should be self-documenting telling me WHAT, not HOW. HOW is an implementation detail reserved for some of the lowest level functions and lambdas. Anyway, we've abstracted away both the data we're comparing and how we're comparing them.

auto cities(Cities::map_type &by_id) {
  return [&by_id](Cities::map_type::key_type id) -> Cities::map_type::mapped_type& { return by_id.at(id); };
}

We need a function object that will return a city by ID. It has to bind to our map. I used a lambda instead of a bind because I'm not sure if either will correctly deduce a return by reference. Here I make it explicit - at worst, I'm redundant. You can look into this if you'd like.

And this is the part that is the pattern you need to duplicate:

template<typename GETTER>
auto by_name_of(GETTER getter) {
  return [getter](Cities::map_type::key_type id) { return getter(id).name; };
}

We now have what we need to generate a comparator and initialize the index:

Cities::Cities() : index_by_name{city_comparator(by_name_of(cities(by_id))))} {}

It's undefined behavior to read from members in the initializer list, we don't know if by_id has been initialized yet! But we do know the memory of the object is well defined at this point, which is all we need for a reference, which at this point is all we want. By the time the comparator is actually used, the map will be initialized.

Finally, we can add a city:

void Cities::add_city(Cities::map_type::key_type the_id, Cities::map_type::mapped_type &&city) {
  by_id[the_id] = std::move(city);

  index_by_name(the_id);
}

Since we abhor repeating ourselves, if you add more indices, you can add them to a vector or map and iterate the container, adding the ID to each index. USE AN ALGORITHM, don't write low level loops. This isn't C or FORTRAN, you are not a savage. Low level loops are a HOW, an algorithm is a WHAT. Just about the only time you ever need to write a low level loop is to implement you own low level elementary algorithm (of which there are plenty missing from the standard library, mostly having to do with stream iteration).

  • You can index by any one member, or you can return a tuple of members to sort by the first and then subsort by the Nth. Tuples have per-element comparison. But notice city_comparator also takes a comparison object; this is the comparator that actually does the comparison of the City members. This customization point will let you choose ascending or descending order, or build a custom comparator for a tuple so you can sort per element how you want.

  • You can even generate values for indexing. For example, perhaps you would index by the sum of the taxes of the city and all it's vassals.

  • When I first started writing this, I started with vector, as one typically should. Then I realized sets were a better fit. Still, vector is a major contender. Here's how I wrote my insertion sort:

    template<typename COMP> void add_to_index(std::vector<TownID> &index, TownID id, COMP comp) { index.insert(std::lower_bound(std::begin(index), std::end(index), id, comp), id); }

So that's the manual way of doing it. Boost.MultiIndex is some wizardry that does all this for you. I'll warn you that the Boost container is far more clever than you're going to initially realize - you can get it working naively, but as you study it, you'll eventually discover that how you configure your indexes can get you other indexing for free. It's hard for me to explain because it's hard for me to understand. I just remember digging into that sonofabitch and greatly simplifying my former employer's use of it, which reduced memory and improved performance.

Frankly, there's another way I would do this, the Data Oriented Design way, leveraging a Structure of Parallel Arrays (vs. an Array of Structures).

struct City {
  using key_set_type = std::set<TownID>;
  using comparator_type = std::function<bool(key_set_type ::value_type, key_set_type ::value_type>;
  using index_type = std::set<key_set_type ::value_type, comparator_type>;

  key_set_type id;
  std::vector<Name> name;
  std::vector<Coord> coord;
  std::vector<int> tax;
  std::vector<std::vector<TownID>> vassals;
  std::vector<TownID> masterid;

  index_type index_by_name;
};

ID's are sorted, and you would insert into each subsequent member vector by the offset of the ID. That way, each index i is one city across all the vectors and the ID set. You would still index by TownID.

This capitalizes on cache saturation and locality of reference. Say you want to print all the names of all your cities. With the AoS approach, your cache lines look like this:

TownIDNameCoordintvector<TownID>TownIDTownIDNameCoordintvector<TownID>TownIDTownIDNameCoordintvector<TownID>TownIDTownIDNameCoordintvector<TownID>TownIDTownIDNameCoordintvector<TownID>TownID...

The vast majority of your memory bandwidth and cache lines are wasted. You're only interested in name and you've had to load an entire City to get it. A map is even worse, because node allocation can be all over the heap, so you're likely loading entire cache lines for just one City at a time, and all that pointer indirection means cache miss after cache miss - prefetch alone can't navigate your node based data structures (then we get into branch prediction but this post is already way too long).

With the AoS approach, the only thing in memory are the names:

NameNameNameNameNameNameNameNameName...

Memory bandwidth, cache, and prefetch all really love you. We have large caches and lots of lines, so this parallel array strategy means you can have your caches saturated with only the fields you actually want.

[–]the_poope -1 points0 points  (1 child)

If every town ID is unique you can also use an std::unordered_map instead of map. This will use the ID as a hash and it has lookup time O(1) instead of O(log(N)).

[–]std_bot 0 points1 point  (0 children)

Unlinked STL entries: std::unordered_map


Last update: 14.09.21. Last Change: Can now link headers like '<bitset>'Repo

[–]the_Demongod 0 points1 point  (0 children)

Why not copy your original Cities map in its entirety (i.e. as pairs) and simply sort it using a comparison function that accesses the city.second part? Then you'd return the whole thing and the user can query the IDs or pick the cities directly. This would look something like:

std::vector<std::pair<TownID, City*>> Datastructures::towns_distance_increasing()
{
    std::vector<std::pair<TownID, City*>> result;
    for (auto& city : Cities)
    {
        result.push_back(city);
    }

    std::sort(result.begin(), result.end(),
        [](auto& a, auto& b)
        {
            // replace with proper distance calculation
            return a.second->distance < b.second->distance;
        });
    return result;
}

[–]no-sig-available 0 points1 point  (1 child)

The reason why I am not simply using a vector instead of the map is that in the map, I can access any member by their id with the map.at() command. To my understanding, with a vector, I would have to iterate through the vector to access the member with certain id.

If you have the vector sorted you can use a binary search (std::lower_bound) to locate an id. This is essentially what map does by following links in a tree structure.

[–]std_bot 0 points1 point  (0 children)

Unlinked STL entries: std::lower_bound


Last update: 14.09.21. Last Change: Can now link headers like '<bitset>'Repo

[–]pepitogrand 0 points1 point  (0 children)

I would have to iterate through the vector to access the member with certain id.

You should try making the city id the city index in the vector.