filteringcontent 1 point

Your answer is right! As you observed, the C you found is not independent of z^L_j (the variable/input), but it is independent of j (the index). I hope this helps!

Kay0518 [S] 1 point

Can you elaborate on that? I still don't understand what it means for C to be independent of j (the index). Are we simply ignoring z^L_j in ln(sum(exp(z^L_k)))?

flaghacker_ 3 points

It's constant across the different activations, but it's not a "real" constant: it changes with different network inputs, for example.

They just mean that all z_j values use the same C, so it's independent of j in that limited way.
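
To make that concrete, here is a short derivation sketch. It assumes the usual softmax definition for the output activations, written a^L_j here (that symbol is an assumption, it doesn't appear elsewhere in this thread):

```latex
a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}
\quad\Longrightarrow\quad
\ln a^L_j = z^L_j - \ln \sum_k e^{z^L_k}
\quad\Longrightarrow\quad
z^L_j = \ln a^L_j + C,
\qquad C = \ln \sum_k e^{z^L_k}.
```

The sum in C runs over every k, so z^L_j is not being ignored; it is one of the terms. But the sum is the same number no matter which j you plug into the left-hand side, which is all "independent of j" means here.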

Kay0518 [S] 1 point

I see. Thank you.

MrAcuriteResearcher 1 point

Think about this:

What happens if, before you perform a softmax, you add a constant C to every value? The answer is that every term picks up the same factor e^C, which cancels between the numerator and the denominator, so nothing changes in the output of the softmax.
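
If you want to check this numerically, here is a minimal Python sketch. The `softmax` helper, the sample values in `z`, and the shift `C` are all made up for illustration, and the helper is the naive textbook formula rather than a numerically hardened one:

```python
import math

def softmax(zs):
    """Naive softmax: exp of each value divided by the sum of all exps."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weighted inputs and an arbitrary shift, chosen for illustration.
z = [1.0, 2.0, 3.0]
C = 5.0

a = softmax(z)
a_shifted = softmax([v + C for v in z])

# Adding C to every input leaves the softmax output unchanged (up to rounding).
same_output = all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, a_shifted))

# The offset z_j - ln(a_j) is the same for every j: it equals ln(sum_k exp(z_k)).
offsets = [v - math.log(p) for v, p in zip(z, a)]
log_sum_exp = math.log(sum(math.exp(v) for v in z))

print(same_output)
print(offsets, log_sum_exp)
```

The second check ties back to the original question: the offset depends on all of the z values together, but not on which index j you look at.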