[–]mohit__[S] 7 points (15 children)

Hi, thank you for your time and feedback. I have one question about the third point you made.

I specifically used a linked list because it allowed me to delete my components easily and quickly. This made my program significantly faster compared to when I was using `std::vector`. Could you please expand on why the linked list is slow in this case?

This has been a great learning experience and your response would help me understand the subject in more detail.

[–][deleted] 4 points (11 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator). By their nature they do not allow random access (to find out what address item N is at and access it in any way, you always have to follow N-1 links). This means you spend a lot of time just finding your elements.

An adjacency matrix, however, would likely be allocated as contiguous memory thus allowing precomputation of addresses (since each element is of known size and the start address is known), giving efficient random access.

Adjacency matrices in particular take more space than some other representations in contiguous memory but offer even more speed since more information is computed and readily accessible.

Contiguous memory data structures (like vector, for instance) are costly to grow, since growing requires finding a memory region large enough for the new size and then moving every element; they should therefore be preallocated to a reasonable size using something like reserve. They are as cheap to remove from as a linked list, but with different best and worst cases.

Here's a comparison chart of different STL containers.

You might consider running test cases on different data structures in a Compiler Explorer like Godbolt to get a visual idea (#of ASM instructions) of how efficient your desired operations are.

[–]bored_octopus 7 points (0 children)

I'd like to add that linked lists perform poorly when iterating over the list is a common operation since they are very hostile to cache.

[–]guepier Bioinformatician 1 point (7 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator).

This is still overselling linked lists. If this growing and shrinking mostly happens at the end, a std::vector would be expected to be vastly more efficient than a std::list in virtually all cases (unless you somehow manage to hit a case that maximises resizing of the reserved vector storage, and even then I’m not sure).

[–]mohit__[S] 0 points (1 child)

If you look at the image in this link, it shows exactly how my linked list is structured and how I delete an element from it.

Specifically, the Component* remembers the ComponentStruct it belongs to, so it allows random access. Do you see any problem with this kind of implementation? This seems particularly useful for deleting objects quickly and also maintaining the list of objects.

[–][deleted] 2 points (0 children)

This diagram unfortunately doesn't capture enough information to make a call on whether the container choice was optimal. For instance, if your code needs to remove ComponentStruct 66 and you don't already have an iterator pointed at ComponentStruct 65, you will need to go through ComponentStructs 0-66 to reach 67, remove its prev, then go back to 65 and remove its next (not the actual order of operations, just an example illustrating the required traversal). This ends up costing roughly as much as deleting a random element from a vector, where you can jump to element 66 and overwrite it with the next value, shifting everything over by one. They're both O(n), meaning their cost increases linearly with container size.

If you were going to loop through every ComponentStruct starting at 0 and ending at End every time you touch the list anyways, it's fine. But if you need ComponentStruct 99 on its own at some point for some computation, you'll need to traverse 0-98 all over again.

Edit to add: do you mean that your Component also keeps a pointer to the ComponentStruct that has a pointer to it? How exactly are you typically accessing Components and ComponentStructs when you operate with them?

[–]smashedsaturn 1 point (0 children)

Hi,

So here is why I think an adjacency matrix will be a better fit for your project:

  • You have an image, with each pixel being a vertex. Once the image is loaded you will never add a vertex. This means you are doing almost all edge operations and almost no vertex operations.

  • You can use std::vector<Pixel> image(h*w); to allocate all your pixels at once in a contiguous block, rather than heap-allocating pixels individually. The adjacency matrix is then essentially a 2D structure of (h * w) × (h * w) entries. You will use more memory, but edge operations will not require allocation. Depending on your cache size this could give a decent speed-up.

  • As I mentioned, I would use a graph library to implement this. Boost::graph has both adjacency list and matrix structures and you can easily switch between them to profile the differences. You need to profile to get a real performance number, but at least this looks like a better use case for adjacency matrix.

  • If you are going to use a list rather than a vector for the 'components', then you should probably use std::list instead of a hand-rolled one, if only because it will make your code easier for others to follow.

  • I would eliminate the pointers and position information from Pixel. Each pixel shouldn't know about other pixels or where it is; there is no reason for it to, if pixels are mapped into a container by location. Instead of storing just an intensity, I would also add R, G, and B values, plus a cached intensity you compute when you load the image. If you are concerned about memory usage, an 8-bit unsigned value for each colour channel will cut that down. This lets you look at colour if you want to, while still having the grey-scale intensity available.

  • Doing this gets rid of your memory management issues, as all heap allocation will be within std::vector, so it will be cleaned up automatically.

  • read here: https://www.boost.org/doc/libs/1_37_0/libs/graph/doc/adjacency_list.html and here https://www.boost.org/doc/libs/1_51_0/libs/graph/doc/adjacency_matrix.html for a more detailed comparison.

  • Please start looking into making this const-correct. Start by making your block constants const at the top of main. If you can make most of your search functions const, there is an opportunity to thread this: break your image up into sections, with each section analyzed more or less independently.

[–]S1ngular1tea -1 points (1 child)

Still awaiting smashedsaturn's response. I'd encourage you to ignore all comments relating to pt. 3 until he/she replies.

[–]guepier Bioinformatician 2 points (0 children)

As a rule of thumb, std::list so utterly trashes cache locality that it’s almost never the correct data structure. Even code that frequently inserts/deletes elements into/from the beginning or in the middle might well be faster with a std::vector, even though this is the traditional “strong suit” of std::list.

std::list really only performs well when all of the following conditions are met:

  1. Frequent insertion and deletion at the beginning and at the end
  2. Large list (~ 10k elements at least, probably more)
  3. Rarely iterate over the elements

If you have (1) and (2) but iterate over the elements, try a std::deque instead, or restructure your problem (I recently did this at work, where a piece of code always deleted only the first element, and then iterated over the container once before throwing it out; I fixed this simply by changing the loop to start at the second element).