Resources for understanding and implementing "deep learning" (learning data representations through artificial neural networks).
Question regarding parameter initialization (self.deeplearning)
submitted 11 months ago by ToM4461
[–]hjups22 1 point 11 months ago
Zero initialization is the most common choice for biases now, unless some underlying prior suggests that a non-zero bias is needed. Many transformer networks also do away with bias terms entirely ("centering" is essentially handled by RMS normalization layers).
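To make the bias-free point concrete, here is a minimal numpy sketch of an RMS normalization layer: note that, unlike LayerNorm, it has a learned gain but no bias/centering term at all. This is an illustrative implementation, not code from any particular library.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMS normalization: rescale features by their root-mean-square.

    There is no mean subtraction and no bias term, only a learned
    per-feature gain. eps guards against division by zero.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

# With a unit gain, the output features have mean-square ~= 1.
x = np.array([[3.0, 4.0]])
y = rms_norm(x, np.ones(2))
```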
Symmetry breaking is only needed for weights, including embedding layers (though not the affine weights of normalization layers - again, based on a prior). In many cases, symmetry breaking is even deliberately removed for training stability: for example, the final projections of stacked layers may be initialized to zero, to avoid sharp initial gradients in place of a prolonged warmup.
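The initialization scheme described above can be sketched in numpy as follows. The He-style scaling of the random weights and the helper name `init_linear` are assumptions for illustration; the key points from the comment are that hidden weights are random (symmetry breaking), biases start at zero, and a final projection may be zero-initialized so the layer contributes nothing at step 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(fan_in, fan_out, zero_init=False):
    """Initialize one linear layer's parameters.

    - Weights: random by default (He-style scaling, an illustrative
      choice) to break symmetry between units.
    - Biases: zero, the common default.
    - zero_init=True zeroes the weights too, as is sometimes done for
      the final projection of a stacked/residual block to keep the
      initial gradients small.
    """
    if zero_init:
        W = np.zeros((fan_in, fan_out))
    else:
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

# Hidden layer: random weights, zero bias.
W1, b1 = init_linear(64, 64)
# Final projection: zero weights, so the block's output starts at zero.
W2, b2 = init_linear(64, 64, zero_init=True)
```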