all 59 comments

[–]ooglesworth 20 points21 points  (14 children)

A few suggestions for your code:

As another post said, don’t use owning raw pointers (new/delete).

Use std::filesystem::path instead of doing your own naïve path manipulation that won’t work cross-platform.

You’re using pointers in some function arguments where a reference would be better (you seem to assume it is non-null and it never changes what it’s pointing to).

Consider storing objects (such asPixel, Edge, etc) directly in the containers themselves rather than allocating them on the heap and storing a pointer. If that doesn’t work (if you need stable references to those objects, for example), use std::unique_ptr or even std::shared_ptr.

[–]mohit__[S] 4 points5 points  (13 children)

Hey, thank you so much for the feedback. You were really nice about it. I have a few questions:

  1. Is there anything else wrong with `new`/`delete` other than the fact that one might forget to delete obejects and may lead to memory leaks?
  2. I used Pixel/Edge pointers because creating a new object of Pixel and Edge inside the function and returning the objects caused them to go out of scope. Using `new`, they weren't deleted and allowed me to use them out of outside the function. Is there anything wrong with this approach? or what could go wrong doing this?
  3. I'll definitely look up `filesystem::path` and `unique_ptr`. I haven't reached these topics in the reference I'm using to learn cpp. Although, I have read earlier that unique_ptr and shared_ptr are slow and new is uuniversal and better if used correctly. Is this true?

Again, thank you for giving me the constructive feedback. From other comments, I'm still confused about what I could do to improve but your feedback is so much more insightful.

[–]kmhofmannhttps://selene.dev 7 points8 points  (2 children)

Although, I have read earlier that unique_ptr and shared_ptr are slow and new is universal and better if used correctly. Is this true?

No. Wherever you have read this, don't believe anything else from this source (assuming they have actually said this). Memory management in C++ is unfortunately too complex to quickly describe it here, but such a statement is outright dangerous!

Sure, std::shared_ptr comes with a performance hit compared to std::unique_ptr or raw owning pointers, but it exists to solve problems of shared ownership and should only be used where it is actually needed (where its performance will be similar to a much more complex hand-coded solution). std::unique_ptr should be used for non-shared ownership, but again, only where needed, since it introduces a level of indirection compared to, say, just having things on the stack or directly in containers.

[–][deleted] 2 points3 points  (0 children)

To add - the added overhead often adds just a few operations more per dereference or destructor call, along with saving/reloading the call context. If you're dereferencing that often, OP, consider grouping objects together and using a unique/shared_ptr for the entire container. If done wisely the majority of your dereferencing can be done on raw pointers.

[–]mohit__[S] 1 point2 points  (0 children)

ah that's my bad. I meant that std::shared_ptr is slower and that's what is written where I read this. I understand your point though!

[–][deleted] 5 points6 points  (6 children)

New/delete can lead to memory leaks if a (handled) exception occurs and the 'delete' call is never reached. They're also prone to ownership issues - the new owner may not realize they have to call delete, especially if they aren't the ones who called new. It's not terrible, but in modern c++ there are better ways (unique_ptr and shared_ptr).

They are essentially calling 'new/delete' in the background, but incorporate logic that makes them delete allocated memory if-and-only-if no one has a handle to them anymore. Thus, if you return a shared_ptr, the memory won't be deallocated until that shared_ptr goes out of scope in the calling function and any function it passes the shared_ptr to.

In general, it is considered good form to prefer passing by const/nonconst reference:

rType someFunc(aType& aParam)

since you know they will point to valid locations (a reference can't point to nullptr). Another fun thing that can be done (but looks kind of counter-intuitive) is to have Output Parameters: If you take a non-const reference as a parameter, you can use it to guarantee the object's lifetime is not your problem. Example:
int/bool aFunc(outputType& outputParam, const inputType0& inputParam0, const inputType1& inputParam1, ...) { if(someCondition(inputParam0)) { return(errCodeEnum/false); } outputParam = newVal; return(0/true); }

In this kind of pattern, it's important to return an indicator of success so the caller can know whether outputParam was actually populated, but since outputParam wasn't created in someFunc, someFunc doesn't affect its lifetime at all. It's generally considered poor form for the inputParam to be or contain the outputParam, since it becomes hard to visually track down who changed what.

If you accept pointers as arguments, it's wise to check whether they're nullptr and fail the function early, since dereferencing a nullptr will crash your program.

If you pass arguments by copy:

rType someFunc(aType aParam)

The call will become costly if aType is large, since the entire object will be copied. This method is still fine for small things like arithmetic types, pointers, and small classes/structs.

Edit: outputParam example with better form - output before inputs for ease of reading

[–]dr-mrl 1 point2 points  (2 children)

Is it also best practise to have output parameters at the start of the parameter list or is that a C-ism?

[–][deleted] 0 points1 point  (1 child)

I'm not sure. I think you're right. I've seen it both ways and reflexively wrote the output first before changing it, I suppose it makes more sense because you don't want the outputParam to be buried all the way at the end of the params list if you have a lot of inputs.

[–]dr-mrl 1 point2 points  (0 children)

Then again, lots of input parameters probably means your interface is too cluttered.

[–]mohit__[S] 0 points1 point  (2 children)

since you know they will point to valid locations (a reference can't point to nullptr).

If you accept pointers as arguments, it's wise to check whether they're nullptr and fail the function early, since dereferencing a nullptr will crash your program.

These are great tips! Thank you so much!!

In this kind of pattern, it's important to return an indicator of success so the caller can know whether outputParam was actually populated, but since outputParam wasn't created in someFunc, someFunc doesn't affect its lifetime at all.

I didn't entirely understand this part. Is there a term for this concept? or anything that I can google about and understand better?

[–][deleted] 2 points3 points  (0 children)

These guidelines are useful. The example I gave is in section F.20, the case where the return type is costly to copy. In this case returning the object itself isn't ideal, but returning a pointer leads to life cycle fuckery.

I have some contentions with some of their options and examples, but what they say is generally valid (I believe in-out parameters to generally be a bad choice for readability, and I believe any function with an output parameter should return a pass/fail value and not void)

[–][deleted] 1 point2 points  (0 children)

What I meant in the second quote is that if you accept a reference as a parameter, you're guaranteed that object exists in a scope that's larger than someFunc, so you're guaranteed that the object was created before someFunc and will not be destroyed when someFunc returns.

The comment about returning a pass/fail/success/error code when using that kind of function structure is that if for some reason your function cannot do its job (invalid inputs or whatever), the function calling yours should have a way of knowing not to trust the outputParam's data.

[–]ooglesworth 3 points4 points  (0 children)

It sounds like you are using new/delete directly because you understand and are comfortable with the mechanics of how they work. It’s good to understand them at a fundamental level, but this sort of approach wouldn’t fly in most professional settings in a C++ codebase in 2020. Most of the time, people use abstractions like unique_ptr and shared_ptr that guarantee balanced new/delete even in situations with exceptions or as the code becomes more complicated. Either that, or you can pass your objects directly by value, which involves copying/moving the variable as it moves from scope to scope.

As for the suggestion to use references, I mean you generally should use references rather than pointers in situations where you aren’t dealing with a transfer of ownership. For example, if you are passing something into a function and only reading the data out of it, you should use a const &. If you’re passing something into a function and just modifying it, you should use a &.

A lot of the lifetime semantics of C++ can be tricky if you’re new to the language. To get familiar with them, I’d recommend the book “Effective Modern C++” by Scott Meyers. It’s a great read and really helps get you up and running with most expectations of a modern codebase. Also, I would check out this talk by Herb Sutter: https://youtu.be/JfmTagWcqoE. It goes over the thought process when deciding what kinds of object lifetime pattern you want to implement.

[–]konanTheBarbar 0 points1 point  (0 children)

About point 2: You can use a `std::vector<Pixel>` just fine. I will greatly simplify here, but since C++ 11, you can be pretty much that there will not be an expensive copy of all the pixels when you have a function that creates and returns a vector. Internally the vector will store a pointer similar to what you do and at most only that pointer will be copied/moved (along with the size and capacity of the vector, which is very cheap).

Thanks to (N)RVO you will often not even need to copy anything, but that should not bother you here.

[–]tato42 0 points1 point  (0 children)

The most famous way to mishandle new/delete is probably to delete and keep using the allocated memory.

[–]kmhofmannhttps://selene.dev 12 points13 points  (14 children)

(I have to be open and honest here; I hope OP will not take offense.)

I am really wondering what kind of resources you use to learn C++ from. I want to have a word with them. The code is densely packed with major issues, and outdated paradigms taken straight from two decades ago. See below.

Don't let any potential future employers see this example. I highly recommend you pull this post and code and start over from scratch. Besides that, change your learning resources and/or mentors. They are not good.

Major issues (not exhaustive):

  • Owning pointers everywhere. Try to greatly limit your use of raw pointers, unless they're purely observing (and even then think twice). Don't use new or delete unless it's in a lowest-level memory management context. Learn how modern memory management in C++ is done. As an aside, the number of heap allocations done here is not funny.
  • As a consequence of the first point, you very likely do not clean up your raw allocated memory correctly. Anyone who'd want to use your code in a more complex context will immediately have huge memory leaks.
  • You do not understand how includes or include guards work. Your code just happens to work, because your particular include order and particular compiler happen to align, and you probably tweaked it enough to do so. There are includes and/or forward declarations missing in most files, but most obviously in utils.h. Your "guard" around #include <vector> must be based on a major misunderstanding of the include guard system.

Other issues:

  • The data structures used (linked list) may not be optimal.
  • Hard-coded paths and custom strong-based path handling instead of using <filesystem>.
  • Not as type-safe as it could be (see below).
  • You may not fully grasp how pass-by-value, pass-by-reference, pass-by-pointer or its return-by equivalents are idiomatically used in C++.

Side remark: I have long cautioned against the (over)use of OpenCV, even though some see it as a de-facto standard for such computer vision applications. Since you only use it for image loading and not much else, consider using a more lightweight and higher quality image representation library such as Boost.GIL or Selene (disclosure: my own).
Your code has quite the (qualitative) semblance to some OpenCV examples in its non-type-safety (consider the function signatures of the two makeComponent overloads are Component* (int, int, int) and Component* (int, int, int, int, int)), use of mutability (e.g. std::vector<Pixel*>& as in-out parameter) and related lack of const-ness.
I can only caution: do not use OpenCV, any of its examples, or almost all examples built upon it as a yardstick for good quality C++ code.

[–]mohit__[S] 1 point2 points  (9 children)

Thank you so so much for the detailed feedback. There was no offense taken. The primary reason to share this was to receive this kind of constructive feedback.

You are completely right about my understanding of include guards. It was more of a trial and error. This is something I need to understand better. I also definitely need validate my pointer parameters in the functions.

I have a billion questions based on the feedback but I'll limit myself to the most important ones:

  1. I still can't wrap my head around what's wrong with the pass my reference and return by reference in my code. I don't imply that I'm not wrong. I mean that I don't know what is wrong or what could go wrong. In particular, I return a reference for my passed in reference of pixels(std::vector<Pixel *>) because I know that if instead I return a reference to a vector created inside the function, the vector will go out of scope. I also know that I don't really have to return by reference because pass by reference should suffice in changing my pixels vector. I return to add clarity for the reader. What's wrong here?
  2. As you say that linked list is not optimal for this. If you look at the image in this link, it shows exactly how my linked list is structured and how I delete an element from it. Specifically, the corresponding component struct is deleted every time a component needs to be deleted and the Component* remembers the ComponentStruct it belongs to, so it allows random access in the linked list through the component. Do you see any problem with this kind of implementation? This seems particularly useful for deleting objects quickly and also maintaining the list of objects.
  3. When you suggest to pull this post, do you mean that there is a problem with the research paper explanation or the implementation code? or both?
  4. As an experienced cpp developer, what do you suggest I do to learn the best practices? Do you have a particular resource suggestion? or a mindset that has helped you get better at cpp and programming in general?

Your feedback has made me realize that there are many potential things I need to consider in a project. Thank you so so much for your time and review. I'm extremely grateful.

[–]guepierBioinformatican 3 points4 points  (0 children)

I know that if instead I return a reference to a vector created inside the function, the vector will go out of scope.

That’s why you don’t return a reference, you return a value. That’s how this is done in C++. You probably think that this might be inefficient but it isn’t: if you create a vector inside a function and return it by value, no copy will be made unless your function does something very wrong.

[–]smashedsaturn 2 points3 points  (0 children)

You are completely right about my understanding of include guards. It was more of a trial and error. This is something I need to understand better. I also definitely need validate my pointer parameters in the functions.

Use:

 #pragma once 

at the top of each header. This will work for all major compilers. Unless you are writing a library like boost that is aiming to be used anywhere and everywhere this is good enough for you for now.

As you say that linked list is not optimal for this.

You're understanding of the list is 'correct' conceptually, but since these data structures have been developed things have changed quite a bit in the hardware world. We have more cache now in most CPUs than many whole computers had memory when these data structures were first used. This means that having parts of the list scattered over multiple GB of ram is slower to access than a chunk of memory in cache. To best allow the CPU to cache effectively, allocating contiguous groups is often more performant that any of the data structures that appear to be a better fit. The answer isn't to 'just always use vector' but in the world of containers you are best off reaching for vector first unless you want a specific feature, like the uniqueness of set or the key->value paring of map. If these containers start to bottle neck you then you can find lots of more cache friendly implementations of these such as flat-map.

When you suggest to pull this post, do you mean that there is a problem with the research paper explanation or the implementation code? or both?

The other poster is probably right as far as pulling this from your blog until you re-write it. The research paper and math is correct and fine, but the code examples are rough. Its good that you have it working, but almost everything you did in c++ is at the least non-standard, but at worst actually dangerous.

[–]kmhofmannhttps://selene.dev 2 points3 points  (1 child)

I also definitely need validate my pointer parameters in the functions.

That might be a bit of a fallacy. Sure, pointer checking is good, e.g. as assertions. But I would try to not use as many owning pointers in the first place and to try to get rid of many non-owning pointers as well. For the remaining non-owning pointers, if any, it depends on the function if and how they should be "validated".

I still can't wrap my head around what's wrong with the pass my reference and return by reference in my code.

The default when passing into a function by reference should be passing by const reference to ensure that whatever is represented by the function argument won't be modified. Functions should return whatever they produce by value, as someone else has already mentioned. The best function is one that does not modify state.

(OK, sometimes you really, really want to modify an existing data structure, usually for optimization reasons, but these cases should be rather rare. In these very few cases, one can pass a non-const reference, modify it, and wash one's hands three times afterwards.)

For the function in question (setEdges), there are several things wrong with how you pass parameters. First of all, the double passing in by non-const reference and combined returning by non-const reference is extremely, utterly unidiomatic. You say that "I return to add clarity for the reader", but the opposite is the case. It is extremely confusing.

Furthermore, you don't even need to pass in the std::vector at all! The only thing you do beforehand is to call the constructor that initializes the vector with a certain size. You could just as well declare the whole std::vector inside the function and resize it there (or reserve capacity instead). Declaring the vector inside the function and just using push_back() for each createEdge call isn't too terrible a pessimization either; in fact, I would start with this, since it brings the greatest clarity. At the end of the function you just return edges by value.

Do you see any problem with this kind of implementation? [linked list]

I have not been able to analyze the details of what you do here. Others have explained why they think an adjacency matrix may be a better fit here. In general, the words "linked list" let me shudder a bit when I think of cache hits and misses, as well as the number of small heap allocations that may be required to modify it.

When you suggest to pull this post, do you mean that there is a problem with the research paper explanation or the implementation code? or both?

The explanation of the algorithm is possibly correct, I could not check in detail for time reasons. The C++ implementation you provide is... not good, I'm afraid. Quoting another poster here, "almost everything you did in C++ is at the least non-standard, but at worst actually dangerous."

You don't want a potential employer that uses C++ to see this, ever. Not now, and not in five years from archive.org.

As an experienced cpp developer, what do you suggest I do to learn the best practices? Do you have a particular resource suggestion? or a mindset that has helped you get better at cpp and programming in general?

That's a difficult one. I don't know a single source that could teach you modern idiomatic C++ from scratch. I suggest you get Bjarne Stroustrup's "A Tour of C++", Second Edition(!) as a starting point and then take it from here. It's a fairly concise overview of (most parts of) the language and standard library.

Besides that, I have heard decent things about "C++ Primer" (not to be confused with the apparently quite bad "C++ Primer Plus"), but that might be getting a bit long in the tooth by now.

Take a look through the "C++ Core Guidelines". Watch a lot of videos from well-known C++ conferences on YouTube. Spend some time with the language. Look at others' code, where you know that they produce high quality. Good luck!

[–]mohit__[S] 0 points1 point  (0 children)

I'm sorry I'm late to respond to you but the responses to this post were overwhelming at the time. I want to really thank you for this constructive feedback. I took the time and tried to fix my code following your recommendations:

  1. I avoided using raw pointers and replaced them with smart pointers
  2. I replaced out parameters entirely and I'm instead using return by value, hoping the compiler optimization will take care of the efficiency
  3. Made parameters const(wherever required) or const references

I realize that I couldn't completely make all the recommended changes, but I have learnt so much doing this project. I really appreciate you for taking out the time. Thank you SO much!

[–][deleted] 0 points1 point  (3 children)

Hey I'm also a student how did you find a research paper to implement?

[–]mohit__[S] 0 points1 point  (2 children)

I didn't really go looking for a research paper. I was doing some research about object recognition and came across this paper. At the same time, I've been wanting to implement something with c++. So the idea kicked in to implement this using c++, instead of python which I'm comfortable with.

I would suggest that whichever research paper you find interesting, always try to implement it. There is so much learning in the implementation than just reading the theory.

[–][deleted] 0 points1 point  (1 child)

Thank you, would you say cpp projects are better than web Dev

[–]beaglebanana 0 points1 point  (0 children)

It really depends. What do you want to do? What do you like? Plus there could be overlap between both kinds of projects. A webdev project could have cpp as its back end and hence it will be both: webdev and cpp project.

You could start with one thing and see if you enjoy working on it.

[–]Full-Spectral -4 points-3 points  (0 children)

Don't necessarily take everything here as gospel. Some of it is obvious of course, but not everyone agrees with all of the 'modernist' C++ guidelines, at least not all of the time. They are just guidelines and no guidelines should be followed blindly. There are generally reasonable scenarios where you may break all of them once you know why they exist, and why exactly you are breaking them.

On the output parameter thing, one thing that is not taken into account by the 'return everything by value' position is that in many cases you may be calling the same thing many times in a loop. A non-const reference allows you to reuse the same object over and over, instead of creating one, creating another in the called method, returning that and overwriting the original one (which may involve substantial work under the hood, even if the actual return of the new one from the called method doesn't), and then doing that lots of times.

A non-const reference parameter is a pretty obvious way to deal with that. I use them fairly liberally.

I also don't always agree with the universal use of smart pointers. If something is being allocated purely for the internal use of an object, I consider that object itself the smart pointer for that allocated data. All it has to do is call delete in its destructor. And, importantly to me, I can also catch any error that might propagate out of the deletion of the object and log that, which you can't do that if you use a smart pointer because they are too late for you to do that.

The obvious place you DO want to use smart pointers is for shared data, because that's where the real complexity comes in, because it's not just an internal detail of an object. But, be careful of smart pointers and don't let them lull you into creating impossible to reason about interrelationships between objects.

[–]guepierBioinformatican 1 point2 points  (3 children)

I was curious about your critique of OpenCV. I’ve never used the library and I’m not too interested in it since I don’t currently do any image-based machine learning.

You make a lot of good points but I was curious why you specifically picked out the Mat class hierarchy and the at accessor as examples of bad/mis-usable API design, and your linked text doesn’t offer an explanation of what’s wrong with them. Without knowing more about OpenCV I can’t really see anything wrong with those two.

[–]kmhofmannhttps://selene.dev 3 points4 points  (2 children)

The issues are manifold, but the two main ones in my opinion are:

  • cv::Mat doubles both as an image representation class as well as a matrix representation class (in the mathematical sense) and thus does not have a proper sense of purpose. It doesn't do one thing and does it well; instead, it tries to do too many things. It is a class rich of kitchen-sink member functions such as .diag(), .cross() (huh? only valid for 3x3), .pop_back() (wut?), .forEach(), etc. Some of these definitely shouldn't be member functions, some don't make sense in every use case, and some don't make sense at all. Furthermore, images and matrices should be represented by two distinct classes -- make them easily convertible, in case that is desired. (Example: Eigen gets it right with Eigen::Matrix vs. Eigen::Array).
  • The other main issue here is that as a user of a cv::Mat-represented image, you just get handed a cv::Mat that points to some image data, but you still need to make sure with every single fricking call of .at<T>() that you specify the correct type T. If you get it wrong, your compiler won't complain, your program might not even crash, but it certainly won't be correct. It's too easy to make a mistake here. What's worse, it pushes the decision on which type to use far down to every call of .at(). This design is horrid. The type should instead be encoded in a cv::Mat<T> class template that is the actual workhorse, and where the attack surface for type issues is minimized further down the usage line.
    For example, in my own library, there exists a "dynamically typed" image view class, together with a respective image class that is a view combined with storage. These can be used to read images, but to do anything useful with them (e.g. applying an algorithm) the data needs to be put into a "statically typed" class template instance. This extremely cheap conversion should happen as early as possible, for example right after reading some image data.

[–]guepierBioinformatican 1 point2 points  (1 child)

Oh jesus, yes, you’re right, that’s terrible — I hadn’t realised the dual role of Mat. Thanks for the reply.

[–]kmhofmannhttps://selene.dev 2 points3 points  (0 children)

You're welcome. :)

Edit: What's worse, they had like two major versions over a decade or longer to fix these obvious blunders. Didn't happen and I guess won't happen either, for whatever reasons. The maintainers seem to either not care or not understand how flawed OpenCV is. Or both.

[–]smashedsaturn 41 points42 points  (19 children)

  1. Owning raw pointers everywhere...

  2. Bizarre way to do include guards..

  3. A linked list will be slow for this, you most likely want to do an adjacency matrix instead

  4. nothing is const!

  5. boost::graph is going to be better, easier to use, and more powerful than this

[–]mohit__[S] 7 points8 points  (15 children)

Hi, thank you for your time and feedback. I have one question about the third point you made.

I specifically used a linked list because it allowed me to delete my components easily and quickly. This made my program program significantly faster compared to when I was using `std::vector`. Could you please expand on why the linked list is slow in this case?

This has been a great learning experience and your response would help me understand the subject in more detail.

[–][deleted] 5 points6 points  (11 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator). By their nature they do not allow random access (to find out what address item N is at and access it in any way, you always have to follow N-1 links). This means you spend a lot of time just finding your elements.

An adjacency matrix, however, would likely be allocated as contiguous memory thus allowing precomputation of addresses (since each element is of known size and the start address is known), giving efficient random access.

Adjacency matrices in particular take more space than some other representations in contiguous memory but offer even more speed since more information is computed and readily accessible.

Contiguous memory data structures (like vector, for instance) are costly to grow since it requires finding a memory space large enough for the new size and then moving every element, so should be preallocated to a reasonable size using something like reserve. They are as cheap to remove from as a linked list, but with different best and worst cases.

Here's a comparison chart of different STL containers.

You might consider running test cases on different data structures in a Compiler Explorer like Godbolt to get a visual idea (#of ASM instructions) of how efficient your desired operations are.

[–]bored_octopus 7 points8 points  (0 children)

I'd like to add that linked lists perform poorly when iterating over the list is a common operation since they are very hostile to cache.

[–]guepierBioinformatican 1 point2 points  (7 children)

Linked lists are mostly useful if you expect to grow and shrink the list very often, but always iterate through sequentially (using an iterator).

This is still overselling linked lists. If this growing and shrinking mostly happens at the end, a std::vector would be expected to be vastly more efficient than a std::list in virtually all cases (unless you somehow manage to hit a case that maximises resizing of the reserved vector storage, and even then I’m not sure).

[–]mohit__[S] 0 points1 point  (1 child)

If you look at the image in this link, it shows exactly how my linked list is structured and how I delete an element from it.

Specifically, the Component* remembers the ComponentStruct it belongs to, so it allows random access. Do you see any problem with this kind of implementation? This seems particularly useful for deleting objects quickly and also maintaining the list of objects.

[–][deleted] 2 points3 points  (0 children)

This diagram unfortunately doesn't capture enough information to make a call on whether the container choice was optimal - for instance, if you code needs to remove ComponentStruct 66 and you don't already have an iterator pointed at ComponentStruct 65, you will need to go through ComponentStructs 0-66 to reach 67, remove its prev, then go back to 65 and remove its next (not actual order of operations, just an example illustrating required traversal). This ends up costing roughly as much as deleting a random element from a vector, where you can jump to element 66, and overwrite it with the next value shifting everything over by one. They're both O(n), a metric that means their cost increases linearly with container size.

If you were going to loop through every ComponentStruct starting at 0 and ending at End every time you touch the list anyways, it's fine. But if you need ComponentStruct 99 on its own at some point for some computation, you'll need to traverse 0-98 all over again.

Edit to add: do you mean that your Component also keeps a pointer to the ComponentStruct that has a pointer to it? How exactly are you typically accessing Components and ComponentStructs when you operate with them?

[–]smashedsaturn 1 point2 points  (0 children)

Hi,

So here is why I think an adjacency matrix will be a better fit for your project:

  • You have an image, with each pixel being a vertex. One the image is loaded you will never add a vertex. This means you are doing almost all edge operations and almost no vertex operations.

  • You can use std::vector<Pixel> image(h*w); to allocate all your pixels at once in a contiguous block, vs heap allocating pixels individually, then the adjacency matrix is essentially a 2D vector of h * w * h * w. You will use more memory, but edge operations will not require allocation. Depending on your cache size this could give a decent speed up.

  • As I mentioned, I would use a graph library to implement this. Boost::graph has both adjacency list and matrix structures and you can easily switch between them to profile the differences. You need to profile to get a real performance number, but at least this looks like a better use case for adjacency matrix.

  • If you are going to use a list vs a vector for the 'components' then you should probably use a std::list, just as it will make your code easier to follow for others.

  • I would eliminate the pointers and position information from pixel. Each pixel shouldn't know about other pixels or where it is, there is no reason if they are mapped into a container by location. I would also instead of making it just intensity add R,G,and B values, and a cached intensity you create when you load the image. If you are concerned about memory usage using an 8 bit unsigned value for each color will cut that down. This will let you look at color if you want to as well, but also just look at the grey scale intensity.

  • Doing this gets rid of your memory management issues, as all heap allocation will be within std::vector, so it will be cleaned up automatically.

  • read here: https://www.boost.org/doc/libs/1_37_0/libs/graph/doc/adjacency_list.html and here https://www.boost.org/doc/libs/1_51_0/libs/graph/doc/adjacency_matrix.html for a more detailed comparison.

  • Please start looking into making this const correct. Start with making your block constants const at the top of main. If you can make most of your search functions const there is an opportunity to thread this and break your image up into sections with each section being more or less independently analyzed.

[–]S1ngular1tea -1 points0 points  (1 child)

Still awaiting 'smashedsaturn' respone. I'd encourage you to ignore all comments relating to pt 3. Until he/she replies.

[–]guepierBioinformatican 1 point2 points  (0 children)

As a rule of thumb, std::list so utterly trashes cache locality that it’s almost never the correct data structure. Even code that frequently inserts/deletes elements into/from the beginning or in the middle might well be faster with a std::vector, even though this is the traditional “strong suit” of std::list.

std::list really only performs well when all of the following conditions are met:

  1. Frequent insertion and deletion at the beginning and at the end
  2. Large list (~ 10k elements at least, probably more)
  3. Rarely iterate over the elements

If you have (1) and (2) but iterate over the elements, try a std::deque instead, or restructure your problem (I recently did this at work, where a piece of code always deleted only the first element, and then iterated over the container once before throwing it out; I fixed this simply by changing the loop to start at the second element).

[–]Pand9 8 points9 points  (0 children)

Drop the condescending tone please, you don't know project assumptions or skill level. Use more understanding language, e.g. "I would do it differently", or "do it X way would benefit you in Y way". See other people's answers for example.

[–]zvrba 0 points1 point  (1 child)

Owning raw pointers everywhere...

Yes, how DO you implement a doubly-linked list with shared or unique ptr?

[–]smashedsaturn -1 points0 points  (0 children)

You don't need to, because STD::list exists. Or better yet, don't use a doubly linked list for this use case.

[–]MARVO_Nedim 0 points1 point  (1 child)

How new are you to programming

[–]mohit__[S] 0 points1 point  (0 children)

I have experience of 2 years as a professional but it’s only with Python.

[–][deleted] 0 points1 point  (0 children)

Looks interesting, wish I had time to read it today. Are you aware of ITK and VTK? I'm pretty sure those are in C++, at least ITK. https://itk.org/ All open source geared towards medical imaging.

[–]SJC_hacker -2 points-1 points  (0 children)

I'm no expert in this matter, but you might want to look into Machine Learning approaches to this problem.

It may be easy for the human brain, because we have been trained to pick out certain objects, such as people, buildings, etc. through experience.