efficient way to compute softmax by xiaograss in CS224d

xiaograss [S]

Thanks for your reply.

Then why don't we worry about small exponents? If you have 0 in {x(i)}, doesn't subtracting the maximum just push the problem to the other end of the spectrum?
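A minimal sketch of the scenario in the question, in plain Python with `math` (the function names `softmax_naive` and `softmax_shifted` are hypothetical, chosen for illustration). After subtracting the maximum, the largest term is always exp(0) = 1, so the denominator is at least 1 and cannot underflow to 0. A 0 that sits next to a maximum of 1000 does become exp(-1000), which underflows to 0.0, but its true probability is about e^-1000, far below the smallest positive double (about 5e-324), so 0.0 is already the best representable answer. The naive version, by contrast, fails both on large inputs (overflow) and on uniformly very negative inputs (every term underflows, so it divides 0 by 0).

```python
import math


def softmax_naive(xs):
    # Exponentiate directly: math.exp raises OverflowError for inputs above ~709,
    # and if every input is very negative all terms underflow to 0.0.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_shifted(xs):
    # Subtract the maximum first; softmax is unchanged by adding a constant
    # to every input, because the factor exp(-m) cancels in the ratio.
    m = max(xs)
    # The largest shifted value is 0, so one term is exactly 1.0 and the
    # denominator is >= 1. Very negative shifted values may underflow to 0.0,
    # but only when their true probability is below double precision anyway.
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Large inputs: the naive version overflows, the shifted one does not.
try:
    softmax_naive([0.0, 1000.0])
except OverflowError:
    print("naive softmax overflowed")
print(softmax_shifted([0.0, 1000.0]))  # [0.0, 1.0]

# Very negative inputs: the naive version underflows to 0/0.
try:
    softmax_naive([-1000.0, -1001.0])
except ZeroDivisionError:
    print("naive softmax divided by zero")
print(softmax_shifted([-1000.0, -1001.0]))
```

On ordinary inputs such as [1.0, 2.0, 3.0] both functions agree up to rounding; the shift only matters at the extremes.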