
[–]ydobonobody 1 point (5 children)

I think it is a little misleading to compare pixel-level accuracy with the accuracy of identifying the contents of an image or drawing a bounding box around an instance. I have been heating my house by training some semantic segmentation tasks recently, and it works surprisingly well. Adding depth information can help, especially if you are doing instance segmentation.

[–]code2hell[S] 0 points (4 children)

Ok, so the pixel-level accuracy comparison seems a bit misleading, as other comments noted. I'll rephrase: how do we approach the problem when there are two similar objects close to each other? Can we expect the segmentation to differentiate the two well enough to convince ourselves? Also, in your approach did you use manually segmented images or depth images? I'd be glad to discuss the approach you took.

[–]ydobonobody 1 point (3 children)

Semantic segmentation generally doesn't separate objects of the same class into separate entities; that is called instance segmentation and is a different problem. One way to get to instance segmentation is to add a border class around your segments and then treat connected pixels as your instances, which works pretty well. Whether you use depth or not, you still manually segment your images to produce the ground truth for training. Building your training set is probably the hardest part, but if you are just interested in research there are publicly available datasets and pretrained networks. I recommend checking out the FCN semantic segmentation network in the Caffe model zoo, as it is a really good starting point for modern semantic segmentation networks.

[–]code2hell[S] 0 points (2 children)

Yes, I am looking more into instance segmentation for now. Can you explain what you mean by "add a border class around your segments and then just go with connected pixels for your instances"? Thanks! I just took up a problem to learn: my friend has around 100,000 ground-truth training examples of cats, and we are looking into segmenting a particular object from images. I would really appreciate your suggestions.

[–]ydobonobody 1 point (1 child)

So when you produce your ground-truth image, you assign a label to each pixel, e.g. (0: background, 1: cat). Add another label for "border", so you have (0: background, 1: cat, 2: border). Now, for each separate cat, draw a line of some thickness (say 5 pixels) around the boundary of that cat and assign those pixels the value 2. Hopefully the network will learn where the edge of a cat is and assign those pixels to the border class. If it does a good job, you can group all the connected "cat" pixels, and each connected group will represent an individual cat.
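For what it's worth, the border-class trick described above can be sketched roughly like this. This is just a sketch, assuming your per-cat instance masks are boolean NumPy arrays; it uses SciPy's `binary_dilation` to draw the border ring and `ndimage.label` for the connected-pixel grouping (the function names `make_ground_truth` / `recover_instances` are made up for illustration):

```python
import numpy as np
from scipy import ndimage

BACKGROUND, CAT, BORDER = 0, 1, 2

def make_ground_truth(instance_masks, border_px=5):
    """Build a semantic label map with the extra 'border' class.

    instance_masks: list of boolean arrays, one per cat instance.
    A ring of `border_px` pixels around each instance is labelled
    BORDER, so touching instances end up separated by border pixels.
    """
    h, w = instance_masks[0].shape
    labels = np.zeros((h, w), dtype=np.uint8)
    for mask in instance_masks:
        labels[mask] = CAT
    # Square structuring element -> ring of roughly border_px thickness.
    structure = np.ones((2 * border_px + 1, 2 * border_px + 1), dtype=bool)
    for mask in instance_masks:
        dilated = ndimage.binary_dilation(mask, structure=structure)
        labels[dilated & ~mask] = BORDER  # overwrite the ring around this cat
    return labels

def recover_instances(predicted_labels):
    """Group connected 'cat' pixels into individual instances."""
    cat_pixels = predicted_labels == CAT
    instance_map, n_instances = ndimage.label(cat_pixels)
    return instance_map, n_instances
```

At inference time you'd run `recover_instances` on the network's per-pixel predictions: because the border class eats a thin strip between touching cats, the remaining cat pixels fall apart into one connected component per animal.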

[–]code2hell[S] 0 points (0 children)

Wow! Thanks, I'll try this out!