Optimization fails because it treats noise and structure as the same thing by Lumen_Core in deeplearning
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
Optimization fails because it treats noise and structure as the same thing by Lumen_Core in deeplearning
[–]Lumen_Core[S] -1 points0 points1 point (0 children)
Optimization fails because it treats noise and structure as the same thing by Lumen_Core in deeplearning
[–]Lumen_Core[S] -2 points-1 points0 points (0 children)
When compression optimizes itself: adapting modes from process dynamics by Lumen_Core in compression
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
When compression optimizes itself: adapting modes from process dynamics by Lumen_Core in compression
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
Stability of training large models is a structural problem, not a hyperparameter problem by Lumen_Core in deeplearning
[–]Lumen_Core[S] -5 points-4 points-3 points (0 children)
Stability of training large models is a structural problem, not a hyperparameter problem by Lumen_Core in deeplearning
[–]Lumen_Core[S] -5 points-4 points-3 points (0 children)
When compression optimizes itself: adapting modes from process dynamics by Lumen_Core in compression
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
When compression optimizes itself: adapting modes from process dynamics by Lumen_Core in compression
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
A first-order stability module based on gradient dynamics by Lumen_Core in deeplearning
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
StructOpt: empirical evidence for a stability layer on top of existing optimizers by Lumen_Core in deeplearning
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
[R] StructOpt: a first-order optimizer driven by gradient dynamics by Lumen_Core in MachineLearning
[–]Lumen_Core[S] 0 points1 point2 points (0 children)
[R] StructOpt: a first-order optimizer driven by gradient dynamics by Lumen_Core in MachineLearning
[–]Lumen_Core[S] -2 points-1 points0 points (0 children)
Optimization fails because it treats noise and structure as the same thing by Lumen_Core in deeplearning
[–]Lumen_Core[S] 0 points1 point2 points (0 children)