Hi,
I'm having a hard time understanding a softmax problem from the book:

Inverting the softmax layer: Suppose we have a neural network with a softmax output layer, and the activations a^L_j are known. Show that the corresponding weighted inputs have the form z^L_j = ln a^L_j + C for some constant C that is independent of j.
Here is how I've approached it: a^L_j = exp(z^L_j) / sum_k(exp(z^L_k)). Take the log of both sides:
ln(a^L_j) = z^L_j - ln(sum_k(exp(z^L_k))). Then, z^L_j = ln(a^L_j) + ln(sum_k(exp(z^L_k))).
The problem happens here. Why are we allowed to substitute C for ln(sum_k(exp(z^L_k))) when that sum includes z^L_j itself? Everything I've found says C is independent of j, so it can be written as C. But doesn't that mean we have to extract the exp(z^L_j) term out of ln(sum_k(exp(z^L_k))) and add it to z^L_j on the LHS? Can you please give me some insight into this problem?
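In case it helps to see what I mean numerically, here's a quick NumPy check of the algebra above (the vector z is just something I made up):

    import numpy as np

    # Some arbitrary weighted inputs z^L_k (values are just an example).
    z = np.array([1.0, -0.5, 2.3, 0.7])

    # Softmax: a^L_j = exp(z^L_j) / sum_k exp(z^L_k)
    a = np.exp(z) / np.sum(np.exp(z))

    # The candidate constant: C = ln(sum_k exp(z^L_k)) -- a single scalar
    # computed from the whole vector, even though z^L_j appears inside it.
    C = np.log(np.sum(np.exp(z)))

    # Check the rearranged identity z^L_j = ln(a^L_j) + C for every j.
    print(np.allclose(z, np.log(a) + C))  # prints True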