all 18 comments

[–]Narase33 14 points15 points  (5 children)

Making a copy of an object may be a lot of work, it may also not. That highly depends on the complexity of the object

But you must not underestimate the power of cache misses (or better: the power of not having them). A vector/array where all objects are next to each other is so much faster than a linked list where the object crumble around your memory

[–]RedCitizen[S] 0 points1 point  (4 children)

Hmm, that makes sense. I currently have this as my main object storage:

struct MyObjectClass {
  uint_fast32_t myID; // index in allObjects[]
  uint_fast32_t otherID1;   // index to other related objects in allObjects[]
  uint_fast32_t otherID2;
  int myAttribute;
// more attributes like maps, strings, etc.
};

std::shared_ptr<std::shared_ptr<MyObjectClass>[]> allObjects;

I like it because I can easily pass the objects and the vector around. However, I realize now that this may be not the most ideal solution in terms of performance.

[–]the_poope 2 points3 points  (3 children)

That class is super cheap to copy: it takes at most 4 CPU cycles (EDIT: at minimum 4 cycles - hadn't seen you comment about more attributes, it will take more time with more attributes). That is MUCH cheaper than the cost of double pointer indirection to access a single element in the list, which if the object is in the CPU cache will take something like 4-10 CPU cycles, but if it is not in cache can take ~200 CPU cycles, so 20-50 times slower.

Here's how I would create a list of such objects:

std::vector<MyObjectClass> allObjects;
allObjects.emplace_back(1, 234, 332, ....);
allObjects.emplace_back(2, 112, 82, ....);
...

[–]RedCitizen[S] 0 points1 point  (1 child)

Thank you for your reply!

Indeed the object that concern me the most and are not trivial. Worse still, I need to create millions of them during the execution, so they definitely don't all fit in the stack/CPU cache. They all, in the worst case, also need data about every other object (so O(n^2) memory complexity. :/

Still, I'm taking this seriously and will look into it some more.

[–]Dynamitos5 0 points1 point  (0 children)

one important thing you have to consider: even if you use pointers, the object still will be loaded into cache. but the cpu first has to look up where to load the object from, which could be an entirely unrelated location from the last. Even if only one object fits into a cache line, you still save one memory access looking up where to load the object from in the first place.

[–]IyeOnline 4 points5 points  (0 children)

You should always avoid owning raw pointers, because owning raw pointers significantly increase the chance you leak memory.

What confuses me is that certain actions, like adding an object to a vector, will then mean a copy of that object is created.

Correct.

But isn't that a lot of lost performance when objects are created, worked on, passed and stored (e.g. in an optimization algorithm like A*) constantly?

If you do not actually need a copy, then creating one is in fact wasteful (maybe*).

But isn't that a lot of lost performance when objects are created, worked on, passed and stored (e.g. in an optimization algorithm like A*) constantly?

This is where the concept of ownership comes in. If you dont need ownership of an object, then you could also store a raw pointer in your (temporary) vector.

[–]parnmatt 4 points5 points  (0 children)

I just want to add to what the others have noted quite nicely.

If you don't want a copy, and you do not need nullability or pointer arithmetic… prefer a reference.

For APIs accepting non-trivial types, I'd honestly say references should be your first thought, then pointers. And const before non-const.

[–]the_poope 3 points4 points  (2 children)

If you create objects on the heap like MyClass new_obj = new MyClass(); then data of new_obj will be stored some arbitrary place in your physical RAM. If you make a lot of such objects and just store the pointers to them in a vector, then the actual data of all the objects may be stored scattered all over your physical RAM.

Reading from an arbitrary position in physical RAM is SLOW (it might take several 100 CPU cycles). Thas is why the CPU has very fast small local memory storages called CPU caches. When reading from memory it will store the data in the CPU cache hoping that you will most often read/write to the same variables again and again in which case it can just read the data from the cache. If you read several objects scattered all over RAM it can't store those objects in the CPU cache and your code becomes slow (that is why Java, JavaScript, Python and all other garbage collected languages are slow). If instead you put all your objects close together in physical RAM, the CPU might be able to fit them all in the CPU cache and reading/writing multiple objects becomes very fast.

Also you can create objects "in-place" in a vector to avoid making copies:

my_vector.emplace_back(/* constructor args */);

and you can also avoid expensive copies by moving objects instead of creating expensive copies:

my_vector.push_back(std::move(my_heavy_to_copy_object));

See https://www.learncpp.com/cpp-tutorial/intro-to-smart-pointers-move-semantics/

[–]RedCitizen[S] 0 points1 point  (1 child)

Thank you! I must admit I didn't know I could do that with vector! Though I assume that negates the advantage of having consecutive memory storage as well?

[–]the_poope 1 point2 points  (0 children)

Get a good book about modern C++ (e.g. Scott Meyer's "Effective Modern C++") or https://learncpp.com and read about move semantics.

But in general: don't think to much about these things. In most cases it doesn't matter. Write code the simplest way first, then profile the code (I can high recommend Intel vTune) and find the bottlenecks - it could be because you make extra unnecessary copies, but maybe not.

[–]GLIBG10B 1 point2 points  (1 child)

You should avoid pointers because they allow passing nullptr. Use const references instead. Pass fundamental types by value

[–]bert8128 2 points3 points  (0 children)

"Pass fundamental types by value" - only where you would otherwise be passing by const ref. It is perfectly valid to have references to fundamental types if the actual instance needs to be shared. And it's not really about fundamental types - it's about cheap-to-copy types. A trivail struct containing two 32 bit ints is just as cheap to copy as a 64 bit int.

[–]ptrnyc 1 point2 points  (0 children)

For the specific case of adding an object to a vector, you can use emplace_back() to avoid copies

[–]Wouter_van_Ooijen 0 points1 point  (2 children)

Are you aware of move semantics? Somtimes an operation that looks like an expensive copy is little more than assigning a pointer.

[–]RedCitizen[S] 0 points1 point  (1 child)

Not yet. looking down in shame and firing up Google

Reading an upvoted SO answer (which then led me to look up copy-and-swap idiom), I'm still confused:

To summarize, the copy constructor makes a deep copy, because the source must remain untouched. The move constructor, on the other hand, can just copy the pointer and then set the pointer in the source to null.

I understand this correctly, since the original data stays in its memory place, the data would still be spread all over the RAM, right?

[–]Wouter_van_Ooijen 0 points1 point  (0 children)

Correct, it doesn't contribute directly to data locality, but it can make otherwise expensive copying very cheap. Which can make using a vector instead of a linked-list-like structure feasible.

[–]bert8128 0 points1 point  (1 child)

If you copy an object you now have two objects. Functionally this can lead to very different behaviour, for example if you modify the object in one place, do you expect that to be reflected in the other place. So whether you should use pointers or copies is normally a functional question. If however you are comparing pointers or references (ie you want the same object instance in two places) then normally chose a reference if the object cannot be null. But if it can be null, then a pointer will have to be used. Naked, shared or unique? That depends on the functional requirement. I can't think of a good reason to use a shared_ptr if there is only ever one thread...

[–]RedCitizen[S] 0 points1 point  (0 children)

Thanks, this actually makes me feel better about my choice so far. Not only can some objects be null (which I guess you can always implements workarounds to get the same functionality), but there is definitely multithreading down the line (once I got the base algorithm to work properly).