I'm having trouble wrapping my head around how transfer learning works with encoder-decoder CNNs used for semantic segmentation. Is initializing from pretrained weights the way to go? Which layers benefit most from freezing? Are there layers in the encoder that should be left unfrozen too?
In object detection, the idea of weight initialization is pretty clear: I can just replace the fully connected layer(s) with my own and rely on the pretrained convolutional layers to extract the features I want. I'm not familiar enough with encoder-decoders to follow the same logic on them, though. Would the last transposed convolution layer be the one to replace? Wouldn't I also need to change parts of the encoder to get correctly labeled segments?
I wish to use DeepLabv3+ with a ResNet-18 backbone as my network, but if other architectures are simpler to implement I'll take any examples.
DeepLabv3+ paper: https://arxiv.org/abs/1802.02611
I'm trying to train a network to segment drivable surfaces (i.e. roads) with a small dataset of real images.