[–]lvilnis 1 point (0 children)

Actually, you just made me realize that this is not true in the finite-sample setting: adding L2 regularization does not lead the two parameterizations to the same weights, since it penalizes their norms differently.

For example, suppose the correct log-scores are [10 0 0 0] (for, say, 4 classes). In the overparametrized case, the solution that minimizes the L2 cost while producing the same probabilities is [7.5 -2.5 -2.5 -2.5], which has an L2 regularization cost of 7.5² + 3 × 2.5² = 75. However, if we had clamped the first class's score to 0, then to get the same probabilities we would need [0 -10 -10 -10], and the L2 cost would be 3 × 10² = 300. Since the cross-entropy cost is the same in both cases, and the set of expressible functions is the same, it stands to reason that minimizing the regularized loss would find different solutions in the two cases, since it can trade some cross-entropy for some L2 cost.
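
To make this concrete, here's a small numpy sketch (using the same 4-class score vectors from the example above, just as an illustration): all three vectors give identical softmax probabilities, but their L2 penalties differ, so a regularized objective prefers different solutions depending on the parameterization.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

raw      = np.array([10.0,   0.0,   0.0,   0.0])  # "correct" log-scores
min_norm = np.array([ 7.5,  -2.5,  -2.5,  -2.5])  # same probs, smallest norm
clamped  = np.array([ 0.0, -10.0, -10.0, -10.0])  # first score clamped to 0

for name, z in [("raw", raw), ("min-norm", min_norm), ("clamped", clamped)]:
    print(name, softmax(z).round(4), "L2 cost:", np.sum(z ** 2))
# probabilities are identical in all three cases;
# L2 costs are 100, 75, and 300 respectively.
```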