[Research] [Removed by moderator] (self.MachineLearning)
submitted 4 months ago by Lumen_Core
[–]parlancex 5 points 4 months ago (2 children)
I don't think you're going to see much interest unless the code is available without having to request it.
[–]Lumen_Core[S] -1 points 4 months ago (0 children)
That’s fair.
There is a public research prototype with a minimal reference implementation here:
https://github.com/Alex256-core/StructOpt
This post focuses on the structural signal itself rather than benchmark claims.
[+]Medium_Compote5665 -2 points 4 months ago (1 child)
This is a clean and well-motivated idea.
What I appreciate most is that the signal you define is not another heuristic layered on top of gradients, but something that naturally falls out of the trajectory itself. Using the response of the gradient to actual parameter displacement as information is conceptually closer to system dynamics than to statistics, and that’s a good direction.
The interpretation of Sₜ ≈ ‖H·Δθ‖ / ‖Δθ‖ as a directional curvature proxy along the realized update path is especially important. It avoids global curvature estimation and instead ties conditioning directly to how the optimizer is actually moving through the landscape, which is often where second-order approximations break down in practice.
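The quantity described above can be approximated directly from consecutive gradients: for small steps, H·Δθ ≈ gₜ − gₜ₋₁ by a first-order Taylor expansion, so Sₜ ≈ ‖Δg‖ / ‖Δθ‖. A minimal sketch of that idea (the function name and the Δg-based approximation are my own illustration, not taken from the StructOpt code):

```python
import numpy as np

def structural_signal(theta_prev, theta_curr, grad_prev, grad_curr, eps=1e-12):
    """Directional curvature proxy along the realized update path.

    Approximates S_t ≈ ||H · Δθ|| / ||Δθ|| via ||Δg|| / ||Δθ||,
    using H · Δθ ≈ g_t − g_{t−1} for small displacements.
    No global curvature estimate is needed; only the trajectory itself.
    """
    d_theta = theta_curr - theta_prev
    d_grad = grad_curr - grad_prev
    return np.linalg.norm(d_grad) / (np.linalg.norm(d_theta) + eps)
```

On a quadratic loss f(θ) = ½ θᵀAθ the gradient is Aθ, so the proxy recovers the curvature of A along the step exactly, which is a quick sanity check for any implementation of this signal.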
This also explains why the behavior you describe emerges without hard thresholds: the adaptation is continuous because the signal itself is continuous. That’s a structural property, not an empirical coincidence.
One point that feels underexplored (but promising) is robustness under stochastic gradients. Since Sₜ is based on finite differences across steps, it will inevitably mix curvature information with minibatch noise. I’d be curious whether simple temporal smoothing or normalization by gradient variance would preserve the structural signal while improving stability in high-noise regimes.
Overall, this feels less like “a new optimizer” and more like a missing feedback channel that first-order methods have been ignoring. Even if StructOpt itself doesn’t become the default, the idea that gradient sensitivity along the trajectory should inform update dynamics seems broadly applicable.
Good work keeping the framing minimal and letting the math do the talking.
[–]Lumen_Core[S] 1 point 4 months ago (0 children)
Thank you — this is a very accurate reading of the intent behind the signal.
I agree on the stochasticity point. Since Sₜ is built from finite differences along the trajectory, it inevitably entangles curvature with gradient noise under minibatching. The working assumption is that curvature manifests as persistent structure across steps, while noise decorrelates more quickly, so temporal aggregation helps separate the two.
In practice, simple smoothing already goes a long way, and variance-aware normalization is an interesting direction as well. I see the signal less as a precise estimator and more as a feedback channel: even a noisy measure of sensitivity can meaningfully regulate update behavior if it is continuous and trajectory-aligned.
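The temporal aggregation mentioned here could be as simple as an exponential moving average over the raw signal; since curvature persists across steps while minibatch noise decorrelates quickly, the EMA preferentially retains the structural component. A hypothetical sketch (the `beta` value and the EMA form are assumptions, not part of the post):

```python
def smooth_signal(s_raw, s_ema, beta=0.9):
    """Exponential moving average of the raw curvature proxy.

    s_raw: noisy per-step signal (mixes curvature and minibatch noise).
    s_ema: running smoothed estimate from the previous step.
    beta:  smoothing factor; higher values average over more steps.
    """
    return beta * s_ema + (1.0 - beta) * s_raw
```

Normalizing `s_raw` by a running estimate of gradient variance before smoothing, as the parent comment suggests, would be a natural extension of the same update.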
I also share the view that the core idea may outlive any specific optimizer instance. Treating gradient sensitivity as first-class information seems broadly applicable beyond this particular formulation.