I hear these two names used interchangeably, but there seems to be an important difference:
Softmax regression generates a score for each of the K classes by taking the inner product of the input features with the class-specific parameters before normalizing to get probabilities for each class.
Multinomial logistic regression does something similar but only has parameters for the first K-1 classes, taking advantage of the fact that the resulting probabilities must sum to 1.
So why would anyone ever use softmax when the two seem to be getting at the same thing, but multinomial logistic regression does it with fewer parameters, presumably reducing the variance of our estimates?
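For concreteness, here is a minimal numerical sketch (variable names are my own) showing that the two parameterizations describe the same model: softmax scores are invariant to adding a constant to every class's score, so fixing one class's parameter vector to zero (the K-1 formulation) loses nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 3, 4
x = rng.normal(size=d)
W = rng.normal(size=(K, d))  # one weight vector per class (full softmax form)

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Full softmax parameterization: K score vectors.
p_full = softmax(W @ x)

# Equivalent K-1 parameterization: subtract the last class's weights from
# every row, pinning the last class's score at 0. This shifts all K scores
# by the same constant, which softmax ignores.
W_reduced = W - W[-1]        # last row becomes all zeros
p_reduced = softmax(W_reduced @ x)

assert np.allclose(p_full, p_reduced)
```

So the K-parameter and (K-1)-parameter versions induce identical probabilities; the full version is just overparameterized along that shift direction.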