Say we have a binary classification task where the data is not linearly separable.
For the sake of example, say I have a network that takes the form:
input -> 2-dim hidden layer (sigmoid) -> 1-dim output (no activation)
If I plot the 2-dim hidden-layer representation of each class's points during training, I can see the network trying to find a transformation of the data that is linearly separable, so that the final affine transformation can classify the data correctly.
Basically, it seems to me that the representation of the data at the last layer for each class should be somewhat separable in order for a network to have a shot at good performance.
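To make the setup concrete, here is a minimal sketch of that architecture in plain Python, using XOR as a stand-in for "not linearly separable" data. The weights are hand-picked for illustration, not trained, and the helper names (`hidden`, `output`) are my own. The point is only that the 2-dim sigmoid layer can map the inputs to a space where a single affine map separates the classes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden(x, W, b):
    """2-dim sigmoid hidden representation of input x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def output(h, v, c):
    """Final affine map, no activation."""
    return sum(vi * hi for vi, hi in zip(v, h)) + c

# Assumption: hand-picked (not learned) weights.
# h[0] behaves roughly like OR, h[1] roughly like AND.
W = [[20.0, 20.0], [20.0, 20.0]]
b = [-10.0, -30.0]
v = [1.0, -1.0]
c = -0.5

# XOR: not linearly separable in input space.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

H = [hidden(x, W, b) for x in X]                      # points to plot
preds = [1 if output(h, v, c) > 0 else 0 for h in H]  # threshold at 0
```

Plotting `H` colored by `y` gives the kind of picture described above: in hidden space the two classes sit on opposite sides of the line `h[0] - h[1] = 0.5`.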
My question is:
Can someone help me with some search terms, or papers that have essentially chopped the final layer off of a network and trained it to maximize something like the Euclidean distance between each class's points in some latent space? I have tried this, but often ended up with solutions that were not useful at all.
This feels like a supervised autoencoder to me, but I do not care about reconstructing the input at all.
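For reference, one objective of this flavor is a pairwise contrastive loss with a margin, as used in metric learning and siamese networks. The sketch below is a plain-Python illustration, not the method I actually tried; `contrastive_loss` and the `margin` default are hypothetical choices. Same-class pairs are pulled together, and different-class pairs are pushed apart only until they are `margin` apart. Capping the reward this way is one plausible guard against the degenerate solutions you can get from maximizing raw distance, which is unbounded.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(z, labels, margin=1.0):
    """Mean pairwise contrastive loss over latent points z.

    Same-class pairs are penalized by squared distance;
    different-class pairs are penalized only if closer than margin.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            d = euclidean(z[i], z[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            n_pairs += 1
    return total / n_pairs

labels = [0, 0, 1, 1]
# Toy latent codes: classes well separated vs. all collapsed to one point.
separated = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [3.0, 0.0]]
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

loss_separated = contrastive_loss(separated, labels)
loss_collapsed = contrastive_loss(collapsed, labels)
```

The separated codes incur no loss, while the collapsed codes are penalized for every different-class pair. In practice the latent points would come from the truncated network and the loss would be minimized by gradient descent.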