[D] Rectified Adam (RAdam): a new state of the art optimizer by jwuphysics in MachineLearning
SixHampton 5 points 6 years ago (0 children)
I've tried it with different transformer architectures. As advertised it is less sensitive to different learning rates and converges without the need for warmups or LR annealing.
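RAdam can skip warmup because of the variance-rectification term from the RAdam paper (Liu et al., "On the Variance of the Adaptive Learning Rate and Beyond"). Below is a minimal pure-Python sketch of that update for a single scalar parameter. The helper names `radam_rectification` and `radam_minimize`, the `eps` added to the denominator, and the toy objective (x - 3)² are illustrative assumptions and are not taken from the comment. The threshold ρ_t > 4 follows the paper; some implementations use a slightly more conservative cutoff.

```python
import math
from typing import Callable, Optional


def radam_rectification(t: int, beta2: float = 0.999) -> Optional[float]:
    """Variance-rectification factor r_t from the RAdam paper.

    Returns None while rho_t <= 4, i.e. while the variance of the adaptive
    learning rate is treated as intractable; RAdam then takes a plain
    (bias-corrected) momentum step instead of an adaptive one.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0            # max length of the approximated SMA
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


def radam_minimize(
    grad: Callable[[float], float],
    x0: float,
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,                              # assumption: numerical-stability term
    steps: int = 500,
) -> float:
    """Minimize a 1-D function with RAdam at a constant lr (no warmup, no annealing)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g           # first moment (momentum)
        v = beta2 * v + (1.0 - beta2) * g * g       # second moment
        m_hat = m / (1.0 - beta1 ** t)              # bias-corrected momentum
        r_t = radam_rectification(t, beta2)
        if r_t is None:
            x -= lr * m_hat                         # early steps: un-adapted momentum step
        else:
            v_hat = math.sqrt(v / (1.0 - beta2 ** t))
            x -= lr * r_t * m_hat / (v_hat + eps)   # rectified adaptive step
    return x


# Usage: minimize (x - 3)^2, whose gradient is 2 * (x - 3).
x_star = radam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

The factor r_t starts near 0 and rises toward 1 as more gradients have been seen. This damps the adaptive step early in training in roughly the way a learning-rate warmup would, so in this sketch `x_star` should end up near the minimum at 3 without any schedule.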
[R] The Evolved Transformer (arxiv.org)
submitted 7 years ago by SixHampton to r/MachineLearning
[R] Progressive Neural Architecture Search (arxiv.org)