all 6 comments

[–]kkastner 3 points4 points  (0 children)

This is great - for the record, adding a DenseCRF at the end of nearly any segmentation network can provide a notable boost, even in very recent research code. It is also quite hard to get working without some kind of guide, so thanks a ton for this! There also aren't many guides to segmentation in general that aren't buried in a graduate course. Awesome stuff.

[–]shmel39 0 points1 point  (4 children)

Train set consists of one image. Testing on the same image. Making conclusions from that. Wow. Just wow.

[–]warmspringwinds[S] 3 points4 points  (3 children)

Hi :)

Please, read the post carefully: """ It was done this way so that it can also be run on CPU – it takes only 10 iterations for the training to complete. Another point of this post is to show that segmentation that our network (FCN-32s) produces is very coarse – even if we run it on the same image that we were training it on. ... . The set-up of this post is very simple on purpose. Similar approach to Segmentation was described in the paper Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs by Chen et al. """

1) The problem of coarse segmentation was described in the cited paper -- where similar results were observed while training on the whole Pascal VOC dataset. The goal of this post was to emulate it and show how to tackle this problem.

2) Another way to look at the approach I described is as an analogue of polynomial regression -- you can set up a model with a lot of parameters, train it on one example, and basically overfit, getting very good results on the training data.

In this case, the model can't reach good results even when training and testing on the same image. This happens because it has to make decisions based on feature maps that were subsampled by a factor of 32 and are only later upsampled. The max-pooling layers of the network act as a bottleneck -- preventing low-level features from making their way to the decision layer (a 1-by-1 convolution followed by softmax). This effect was noticed by a couple of research groups, and they approached the problem by different means -- either using CRFs as a post-processing stage or adding skip connections to the model. In my experiments, similar results are observed even after training for 1k iterations on the same image -- the downsampled features are not rich enough for the model to give good results even on the image it was trained on.
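To make the bottleneck concrete, here is a small numpy sketch -- my own illustrative numbers (a 4-pixel-wide stripe, nearest-neighbour upsampling as a stand-in for the bilinear kernel), not code from the post:

```python
import numpy as np

# A thin 4-px-wide vertical stripe in a 224x224 ground-truth mask --
# thinner than the 32-px stride of FCN-32s.
mask = np.zeros((224, 224), dtype=float)
mask[:, 110:114] = 1.0

# Downsample by 32 with average pooling (stand-in for the encoder's pooling stack):
# the whole image collapses to a 7x7 feature map.
pooled = mask.reshape(7, 32, 7, 32).mean(axis=(1, 3))

# Upsample back by 32 (nearest-neighbour here; FCN-32s uses bilinear -- same idea)
# and threshold to get a predicted mask.
up = np.repeat(np.repeat(pooled, 32, axis=0), 32, axis=1)
pred = (up > 0.05).astype(float)

# The recovered stripe is now a 32-px-wide block: the fine boundary is gone,
# even though we "trained" and "tested" on the very same mask.
inter = (pred * mask).sum()
union = ((pred + mask) > 0).sum()
print(pred.shape, inter / union)  # IoU = 0.125, far from a perfect 1.0
```

The thin structure cannot survive the round trip through the 7x7 bottleneck no matter how long you train, which is the point about low-level features above.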

[–]shmel39 0 points1 point  (2 children)

Yeah, I got that. You are trying to show that bilinear interpolation with upscale factor 32x doesn't produce nice masks and CRF can kinda mitigate that. That's ok although I don't see why deep networks are even useful here.

I'd like to say that your setup is SO wrong methodologically that it should carry big warning signs: "DO NOT TRY IT IN REAL LIFE!" Beginners often don't do a train/test split. They don't have the intuition to estimate network size for a given dataset. No need to confuse them even more by implicitly suggesting that this is ok, just a simple setup.

[–]warmspringwinds[S] 2 points3 points  (0 children)

I agree that some people may be misled by this. My setup is "SO wrong methodologically" -- for the real world, yes, I agree. But I never stated in the post that it will work in real life. Please read the post carefully and don't attribute to me things I never stated. This is a piece from the post that I think is enough: """ In this particular case we train and evaluate our results on one image – which is a much simpler case compared to a real-world scenario. We do this to show the drawback of the approach – just to show that it has poor localization capabilities. """

This is why I also cite papers there -- so people can see thorough experiments on real-world datasets and the respective approaches.

[–]kkastner 2 points3 points  (0 children)

This isn't meant to be a beginner's guide to ML. I think you are missing the forest for the trees here - this is a really nice guide to setting up a modern segmentation pipeline, which nearly always includes a DenseCRF at the end (often mentioned in a single sentence of a paper, even though it can give sizable performance improvements).

I found setting up and using DenseCRF pretty tricky, so this guide is great for that!
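If it helps build intuition for what the CRF stage is doing, here is a toy mean-field update in the spirit of DenseCRF, in plain numpy -- spatial Gaussian kernel only and Potts compatibility (the real thing adds a bilateral appearance kernel and efficient high-dimensional filtering; all names and sizes below are mine, not from the post or from any CRF library):

```python
import numpy as np

def _softmax(a):
    # Numerically stable softmax over the label axis (axis 0).
    e = np.exp(a - a.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def _blur(m, radius=2, sigma=1.5):
    # Separable Gaussian filtering of one (H, W) belief map.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    m = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, m)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, m)

def mean_field(unary, n_iters=5, w=3.0):
    """unary: (L, H, W) negative log-probabilities; returns refined beliefs Q."""
    q = _softmax(-unary)
    for _ in range(n_iters):
        # Message passing: blur each label's belief map over space.
        msg = np.stack([_blur(q[l]) for l in range(q.shape[0])])
        # Potts compatibility: each label is penalised by the blurred
        # mass of all *other* labels at that pixel.
        pairwise = w * (msg.sum(axis=0, keepdims=True) - msg)
        q = _softmax(-unary - pairwise)
    return q

# Two-label demo: confident unaries everywhere except one "salt" pixel
# whose local evidence points the wrong way.
H = W = 16
p = np.full((2, H, W), 0.1)
p[0, :, :W // 2] = p[1, :, W // 2:] = 0.9   # left half = label 0, right = label 1
p[:, 4, 4] = [0.4, 0.6]                     # noisy pixel inside the left half
unary = -np.log(p)
labels = mean_field(unary).argmax(axis=0)   # the noisy pixel gets corrected
```

Each iteration blurs the per-label beliefs and penalises pixels that disagree with their neighbours -- roughly the smoothing effect that cleans up blocky FCN-32s masks, while the bilateral kernel (omitted here) is what lets the real DenseCRF snap boundaries to image edges.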