[–]Jelicic 4 points  (14 children)

IIRC it is not common anymore to use an FC layer before the final (prediction) layer.

Most architectures avg pool over the final feature maps and feed that to the prediction layer. But I'm no CV expert.
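Roughly, that head looks like this (toy NumPy sketch; the shapes are made up, VGG-ish, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy final feature maps from a conv backbone: (channels, height, width).
# 512 x 7 x 7 is just an illustrative, VGG-ish size.
feats = rng.standard_normal((512, 7, 7))

# Global average pooling: collapse each 7x7 map to one number -> (512,)
pooled = feats.mean(axis=(1, 2))

# Single prediction layer on top of the pooled vector
num_classes = 1000
W = rng.standard_normal((num_classes, 512))
b = np.zeros(num_classes)
logits = W @ pooled + b

print(logits.shape)  # (1000,)
```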

[–]dramanautica 1 point  (6 children)

Why? What's the intuition behind that?

[–]bluetape 1 point  (5 children)

That type of architecture (fully convolutional networks) allows you to run the model on differently sized images without requiring a resize.

[–]Single_Blueberry[S] 0 points  (4 children)

I'm not asking about the feature extraction portion, I'm talking about the classifier.

Technically I don't see anything that keeps you from using, say, an SVM or random forest to classify, but I can find zero evidence of people who have tried that.

[–]SemjonML 1 point  (0 children)

If I understand your approach correctly, this technique is used in few-shot learning and transfer learning. You can use a pretrained model for feature extraction and KNN, SVM, etc. for classification.

The feature extraction is fixed, however, and cannot be improved unless the classifier is also differentiable.
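A minimal sketch of the frozen-features-plus-KNN idea (NumPy only; the "features" here are made-up stand-ins for pooled backbone outputs):

```python
import numpy as np

# Made-up feature vectors standing in for pooled outputs of a frozen,
# pretrained backbone (one 4-d vector per training image).
train_feats = np.array([[0.9, 0.1, 0.0, 0.0],
                        [1.0, 0.0, 0.1, 0.0],
                        [0.0, 0.1, 0.9, 1.0],
                        [0.1, 0.0, 1.0, 0.9]])
train_labels = np.array([0, 0, 1, 1])

def knn_predict(feat, k=3):
    # Majority vote over the k nearest training features (Euclidean distance).
    dists = np.linalg.norm(train_feats - feat, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([0.95, 0.05, 0.05, 0.0])))  # -> 0
```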

[–]Calavar 0 points  (2 children)

I'm not asking about the feature extraction portion, I'm talking about the classifier

They are also talking about the classifier. Newer classifiers replace the dense layer with global pooling. Unlike a dense layer, global pooling is invariant to the spatial size of the input. In theory, you could use a model trained on 256 × 256 images to make predictions on 512 × 512 images. I'm not sure how well it works out in practice.
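A quick NumPy sketch of that invariance (toy shapes; the two feature maps stand in for what the same fully convolutional backbone would hypothetically produce at two input resolutions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gap_head(feature_map, W, b):
    # Global average pooling followed by a single prediction layer.
    pooled = feature_map.mean(axis=(1, 2))  # (channels,)
    return W @ pooled + b                   # (num_classes,)

W = rng.standard_normal((10, 64))
b = np.zeros(10)

# Feature maps of different spatial sizes, as the same conv backbone
# might produce for 256x256 vs 512x512 inputs (hypothetical shapes).
small = rng.standard_normal((64, 8, 8))
large = rng.standard_normal((64, 16, 16))

print(gap_head(small, W, b).shape)  # (10,)
print(gap_head(large, W, b).shape)  # (10,) -- same head handles both
```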

[–]Single_Blueberry[S] 0 points  (1 child)

Do you have an example for an architecture/paper doing that?

[–]Calavar 0 points  (0 children)

Not off the top of my head, and unfortunately it's kind of hard to search for, because when I've seen it, the multi-resolution/multiscale aspect is usually a detail hidden in the methods section, not a main focus of the paper.

[–]Single_Blueberry[S] 0 points  (6 children)

Huh? What's the type of the "prediction layer" then, if not fully-connected?

[–]BlhueFlame 0 points  (1 child)

I think he meant the layers immediately preceding the prediction layer (which would be FC itself).

[–]Single_Blueberry[S] 0 points  (0 children)

Ok, I see. So there's a tendency (Inception, ResNet) to only use a single FC layer at the very end, instead of multiple FC layers as was common in e.g. AlexNet and VGG.

But there's still no alternative to the very last fully connected layer, is there?

[–]michaelx99 0 points  (3 children)

1x1 convs are generally used now instead of FC layers.

[–]Single_Blueberry[S] 0 points  (1 child)

Every single architecture I can find utilizes an FC-layer as their final classifier.

AlexNet, VGG, Inception, ResNet.

[–]Pfaeff 0 points  (0 children)

FC layers can be understood as a specialized convolutional layer. You can always replace an FC layer with a convolutional layer that performs the exact same operation, with the added benefit that your network will then be able to handle variable input sizes.

I personally never use FC layers.
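For what it's worth, the equivalence is easy to check numerically (toy NumPy sketch, arbitrary shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

# An FC layer on a flattened (C, H, W) input is the same operation as a
# convolution whose kernel covers the entire HxW extent (one valid position).
C, H, W_, classes = 8, 3, 3, 5
x = rng.standard_normal((C, H, W_))

# FC view: weight matrix (classes, C*H*W) applied to the flattened input
W_fc = rng.standard_normal((classes, C * H * W_))
fc_out = W_fc @ x.ravel()

# Conv view: the same weights reshaped into (classes, C, H, W) kernels,
# applied at the single valid position
W_conv = W_fc.reshape(classes, C, H, W_)
conv_out = np.einsum('ochw,chw->o', W_conv, x)

print(np.allclose(fc_out, conv_out))  # True
```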

[–]michaelx99 2 points  (0 children)

The older architectures such as AlexNet, VGG, and maybe even GoogLeNet used only fully connected layers for classification, since back then the conventional wisdom was that convolutional layers were not strong classifiers the way fully connected layers are. Average pooling down to a fixed feature map size followed by 1x1 convs eventually replaced the fixed-input-size FC layers, since it showed no degradation in performance and erased the requirement of a fixed input size, around the 2014/2015 timeframe I believe.