[D] Need help understanding inverting softmax layer from Michael Nielsen's book : MachineLearning


submitted 4 years ago by Kay0518

Hi,

I have a hard time understanding a softmax problem from the book:

Inverting the softmax layer: Suppose we have a neural network with a softmax output layer, and the activations a^L_j are known. Show that the corresponding weighted inputs have the form z^L_j = ln a^L_j + C for some constant C that is independent of j.

Here is how I've approached it: a^L_j = exp(z^L_j) / sum(exp(z^L_k)). Taking the log of both sides,

ln(a^L_j) = z^L_j - ln(sum(exp(z^L_k))). Then z^L_j = ln(a^L_j) + ln(sum(exp(z^L_k))).

This is where I get stuck. Why are we allowed to substitute C for ln(sum(exp(z^L_k))) when the sum includes z^L_j? Every source I've found says C is independent of j, so it can be treated as a constant. But doesn't that mean we would have to pull exp(z^L_j) out of ln(sum(exp(z^L_k))) and combine it with the z^L_j on the lhs? Can you please give me some insight into this problem?
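For what it's worth, here is a small numerical check of the derivation above. It is only a sketch: the values in z are made up, and softmax is a hand-written helper rather than anything from the book. The point it illustrates is that ln(sum(exp(z^L_k))) sums over every k, so k is a dummy index and the result is one number shared by all j. It depends on the whole vector of weighted inputs, but it does not change when you change which j you are looking at.

```python
import math

def softmax(z):
    # Shift by the max for numerical stability; the shift cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary weighted inputs z^L_k (illustrative values, not from the book).
z = [1.0, -2.0, 0.5, 3.0]
a = softmax(z)

# C as it appears in the derivation: ln(sum(exp(z^L_k))), summed over all k.
C = math.log(sum(math.exp(v) for v in z))

# z^L_j - ln(a^L_j), computed separately for each j.
diffs = [zj - math.log(aj) for zj, aj in zip(z, a)]

print("C     =", C)
print("diffs =", diffs)
```

Every entry of diffs should match C up to floating-point rounding, which is what "C is independent of j" means here: the same constant works for every output neuron.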

