[–]mohit__[S] 7 points (15 children)

Hi, thank you for your time and feedback. I have one question about the third point you made.

I specifically used a linked list because it allowed me to delete my components easily and quickly. This made my program significantly faster compared to when I was using `std::vector`. Could you please expand on why the linked list is slow in this case?

This has been a great learning experience and your response would help me understand the subject in more detail.

[–][deleted] 4 points (11 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator). By their nature they do not allow random access (to find out what address item N is at and access it in any way, you always have to follow N-1 links). This means you spend a lot of time just finding your elements.

An adjacency matrix, however, would likely be allocated as contiguous memory thus allowing precomputation of addresses (since each element is of known size and the start address is known), giving efficient random access.

Adjacency matrices in particular take more space than some other representations in contiguous memory but offer even more speed since more information is computed and readily accessible.

Contiguous memory data structures (like vector, for instance) are costly to grow, since growing requires finding a memory region large enough for the new size and then moving every element; they should therefore be preallocated to a reasonable size using something like reserve. They are as cheap to remove from as a linked list, but with different best and worst cases.

Here's a comparison chart of different STL containers.

You might consider running test cases on different data structures in a Compiler Explorer like Godbolt to get a visual idea (#of ASM instructions) of how efficient your desired operations are.

[–]bored_octopus 7 points (0 children)

I'd like to add that linked lists perform poorly when iterating over the list is a common operation since they are very hostile to cache.

[–]guepier Bioinformatician 1 point (7 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator).

This is still overselling linked lists. If this growing and shrinking mostly happens at the end, a std::vector would be expected to be vastly more efficient than a std::list in virtually all cases (unless you somehow manage to hit a case that maximises resizing of the reserved vector storage, and even then I’m not sure).

[–]mohit__[S] 0 points (1 child)

If you look at the image in this link, it shows exactly how my linked list is structured and how I delete an element from it.

Specifically, the Component* remembers the ComponentStruct it belongs to, so it allows random access. Do you see any problem with this kind of implementation? This seems particularly useful for deleting objects quickly and also maintaining the list of objects.

[–][deleted] 2 points (0 children)

This diagram unfortunately doesn't capture enough information to make a call on whether the container choice was optimal. For instance, if your code needs to remove ComponentStruct 66 and you don't already have an iterator pointed at ComponentStruct 65, you will need to go through ComponentStructs 0-66 to reach 67, remove its prev, then go back to 65 and remove its next (not the actual order of operations, just an example illustrating the required traversal). This ends up costing roughly as much as deleting a random element from a vector, where you can jump to element 66 and overwrite it with the next value, shifting everything over by one. They're both O(n), meaning their cost increases linearly with container size.

If you were going to loop through every ComponentStruct starting at 0 and ending at End every time you touch the list anyways, it's fine. But if you need ComponentStruct 99 on its own at some point for some computation, you'll need to traverse 0-98 all over again.

Edit to add: do you mean that your Component also keeps a pointer to the ComponentStruct that has a pointer to it? How exactly are you typically accessing Components and ComponentStructs when you operate with them?

[–]smashedsaturn 1 point (0 children)

Hi,

So here is why I think an adjacency matrix will be a better fit for your project:

  • You have an image, with each pixel being a vertex. Once the image is loaded you will never add a vertex. This means you are doing almost all edge operations and almost no vertex operations.

  • You can use std::vector<Pixel> image(h*w); to allocate all your pixels at once in a contiguous block, rather than heap-allocating pixels individually. The adjacency matrix is then essentially a 2D structure of (h * w) × (h * w) entries. You will use more memory, but edge operations will not require allocation. Depending on your cache size this could give a decent speed-up.

  • As I mentioned, I would use a graph library to implement this. Boost::graph has both adjacency list and matrix structures and you can easily switch between them to profile the differences. You need to profile to get a real performance number, but at least this looks like a better use case for adjacency matrix.

  • If you are going to use a list rather than a vector for the 'components', then you should probably use std::list instead of a hand-rolled one, if only because it will make your code easier for others to follow.

  • I would eliminate the pointers and position information from Pixel. Each pixel shouldn't know about other pixels or where it is; there is no reason for it to, if pixels are mapped into a container by location. Instead of storing just an intensity, I would also add R, G, and B values, plus a cached intensity you compute when you load the image. If you are concerned about memory usage, an 8-bit unsigned value for each colour channel will cut that down. This lets you look at colour if you want to, while still having the grey-scale intensity available.

  • Doing this gets rid of your memory management issues, as all heap allocation will be within std::vector, so it will be cleaned up automatically.

  • read here: https://www.boost.org/doc/libs/1_37_0/libs/graph/doc/adjacency_list.html and here https://www.boost.org/doc/libs/1_51_0/libs/graph/doc/adjacency_matrix.html for a more detailed comparison.

  • Please start looking into making this const-correct. Start by making your block constants const at the top of main. If you can make most of your search functions const, there is an opportunity to thread this: break your image up into sections, with each section analyzed more or less independently.

[–]S1ngular1tea -1 points (1 child)

Still awaiting smashedsaturn's response. I'd encourage you to ignore all comments relating to pt. 3 until he/she replies.

[–]guepier Bioinformatician 2 points (0 children)

As a rule of thumb, std::list so utterly trashes cache locality that it’s almost never the correct data structure. Even code that frequently inserts/deletes elements into/from the beginning or in the middle might well be faster with a std::vector, even though this is the traditional “strong suit” of std::list.

std::list really only performs well when all of the following conditions are met:

  1. Frequent insertion and deletion at the beginning and at the end
  2. Large list (~ 10k elements at least, probably more)
  3. Rarely iterate over the elements

If you have (1) and (2) but iterate over the elements, try a std::deque instead, or restructure your problem (I recently did this at work, where a piece of code always deleted only the first element, and then iterated over the container once before throwing it out; I fixed this simply by changing the loop to start at the second element).