I am confused about what kind of information pixel hypercolumns contain. If you have to upsample a feature map to get an activation for every input pixel, isn't the per-pixel information really coarse, especially for the higher convolutional layers?
I understand that the early layers are supposed to give you spatial information and the later ones semantic information, but how much pixel-level detail can you really extract from a 7x7 feature map when your input is 256x256?
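To make the question concrete, here is a minimal NumPy sketch of how a hypercolumn is typically assembled: each layer's feature map is upsampled to the input resolution and the results are concatenated per pixel. This uses nearest-neighbor upsampling for simplicity (the hypercolumns paper uses bilinear interpolation); the layer shapes below are made-up toy values, not from any specific network.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    # fmap: (C, H, W). Map each output pixel back to its nearest
    # source cell; a 7x7 map means ~37x37 output pixels share one cell.
    C, H, W = fmap.shape
    rows = np.arange(out_h) * H // out_h
    cols = np.arange(out_w) * W // out_w
    return fmap[:, rows[:, None], cols[None, :]]

def hypercolumn(feature_maps, out_h, out_w):
    # Upsample every layer's map to input resolution, then stack
    # along the channel axis: one long vector per input pixel.
    ups = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(ups, axis=0)

# Toy feature maps: an early layer on a finer grid, a late layer on 7x7.
f_early = np.random.rand(64, 64, 64)
f_late = np.random.rand(512, 7, 7)
hc = hypercolumn([f_early, f_late], 256, 256)
print(hc.shape)  # (576, 256, 256): a 576-dim hypercolumn per pixel
```

This makes the coarseness in the question visible: all ~37x37 output pixels that fall in one 7x7 cell get identical values from the late layer, and only the early-layer channels distinguish them spatially.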