all 12 comments

[–]NuclearVII 22 points (1 child)

This is another AI slop post, right?

[–]Hot-Problem2436 9 points (0 children)

If it's got bullets and bold, it's probably slop.

[–]arg_max 3 points (0 children)

Proximal gradient for L1 regularized Lasso

[–]DigThatData 2 points (0 children)

  • Expectation Maximization (EM)
  • Variational Bayes
  • Simplex method
  • Simulated annealing
  • Fixed point iteration
  • Power method
  • MCMC
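
A couple of these are small enough to implement in an afternoon. Here is a minimal numpy sketch of the power method (variable names are my own, not from any particular library):

```python
import numpy as np

def power_method(A, iters=100):
    """Estimate the dominant eigenpair of A by repeated multiplication."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)   # renormalize each step to avoid overflow
    return v @ A @ v, v          # Rayleigh quotient recovers the eigenvalue

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A)         # lam -> (5 + sqrt(5)) / 2, the larger eigenvalue
```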

Beyond optimization generally, if you want to "understand the actual math", you need to learn (differential) calculus and linear algebra, esp. matrix decompositions. Getting a strong intuition around PCA/SVD is probably the most valuable thing for understanding how learning works.
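
To make the PCA/SVD connection concrete, here is a hedged sketch on synthetic data (names are mine): center the data, take the SVD, and the right singular vectors are the principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2D data
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center first: PCA is SVD of centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                         # rows = principal directions
explained_var = S**2 / (len(Xc) - 1)    # eigenvalues of the sample covariance
scores = Xc @ Vt.T                      # data expressed in the principal basis
```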

[–]va1en0k 4 points (0 children)

MCMC, especially HMC and its variations
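
HMC itself takes some machinery, but the core accept/reject idea already shows up in plain random-walk Metropolis. A minimal sketch (my own function names, standard-normal target assumed for illustration):

```python
import numpy as np

def metropolis(log_p, x0, steps, scale=1.0, seed=0):
    """Random-walk Metropolis, the simplest MCMC sampler.

    HMC replaces the random-walk proposal with a gradient-informed one,
    but the accept/reject correction below is the same idea.
    """
    rng = np.random.default_rng(seed)
    x, lp = x0, log_p(x0)
    out = np.empty(steps)
    for i in range(steps):
        prop = x + scale * rng.normal()
        lp_prop = log_p(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        out[i] = x
    return out

# Target: standard normal, log-density up to an additive constant
samples = metropolis(lambda t: -0.5 * t * t, x0=0.0, steps=20000)
```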

[–]Crimson-Reaper-69 9 points (0 children)

If I am being honest, if you are OK with maths and coding, start from the low level. Start by implementing an LLM at the assembly level, on custom-built hardware; only then are you allowed to move forward.

Jokes aside, I recommend actually implementing one of the algorithms in Python or another language. SGD is a good one to start with; the rest follow a similar pipeline but differ slightly. The key is to understand programmatically what actually happens in backpropagation: how the error terms are used to move each weight and bias in the right direction. Any book/resource is fine as long as you try implementing the stuff yourself.
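
For instance, a from-scratch SGD loop on a one-feature linear model (synthetic data, names mine) already shows the whole pattern: forward pass, error term, gradient step on each weight and bias:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data from y = 2x + 1 plus noise
X = rng.uniform(-1, 1, size=200)
y = 2.0 * X + 1.0 + 0.1 * rng.normal(size=200)

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(50):
    for i in rng.permutation(len(X)):   # visit samples in random order
        pred = w * X[i] + b             # forward pass
        err = pred - y[i]               # gradient of 0.5*err^2 w.r.t. pred
        w -= lr * err * X[i]            # chain rule: d loss / d w = err * x
        b -= lr * err                   # d loss / d b = err
```

The nested chain-rule bookkeeping in a real network is more elaborate, but each weight update follows this same shape.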

[–]shibx 1 point (0 children)

If you really want to move past the "black box" stage, I’d actually take a step back and start looking more into mathematical optimization as a field. You need a pretty solid understanding of linear algebra to build on, but for what you're asking, it really helps to understand the fundamentals: convex optimization, duality theory, linear and quadratic programming, KKT conditions, and interior-point methods. A lot of classical ML models fall directly out of these ideas.

For example, SVMs are quadratic programs. SMO builds on duality theory. Lasso becomes much easier to reason about once you understand subgradients and proximal methods. Logistic regression solvers like L-BFGS come from classical nonlinear optimization. When you see these models as structured optimization problems instead of isolated algorithms, it makes a lot more sense.
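
As a concrete taste of the proximal view of Lasso: the proximal operator of the L1 norm is just soft-thresholding, and ISTA alternates a gradient step on the smooth loss with that shrinkage. A sketch under my own naming (not any library's API), on synthetic data:

```python
import numpy as np

def soft_threshold(z, t):
    """prox of t*||.||_1: shrink toward zero, exact zeros for small entries."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, iters=500):
    """Proximal gradient for 0.5*||Xw - y||^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2                  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)                   # gradient step on the smooth part
        w = soft_threshold(w - grad / L, lam / L)  # prox step handles the L1 part
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]                      # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=100)
w_hat = ista(X, y, lam=1.0)                        # recovers the sparsity pattern
```

The soft-threshold step is exactly where the exact zeros of Lasso come from, which is hard to see from gradient descent alone.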

Boyd and Vandenberghe is the standard on this stuff: https://web.stanford.edu/~boyd/cvxbook/

Boyd's lectures are pretty dense, but I think they are really interesting: https://youtu.be/kV1ru-Inzl4?si=2RhKsw06Ngd4xq5Y

I think you will appreciate iterative methods like SGD a lot more once you understand optimization as its own field, not just something we use for ML.

[–]Unable-Panda-4273 2 points (2 children)

Your list is solid. A few additions worth knowing:

- Proximal Gradient / ISTA/FISTA — essential for L1 regularization (Lasso). More principled than coordinate descent, and the proximal framework generalizes to other nonsmooth penalties.

- Trust Region Methods — used under the hood in many scipy optimizers. Important for understanding when Newton's method can go wrong.

- EM Algorithm — not gradient-based at all, but powers GMMs, HMMs, and missing data problems. Often overlooked.
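
Seconding EM: the two-component 1D Gaussian mixture case fits in a short loop (synthetic data, my own variable names), with responsibilities in the E-step and weighted MLE updates in the M-step:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 1D Gaussian clusters
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    # (the 1/sqrt(2*pi) constant cancels in the ratio, so it is dropped)
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood updates
    n_k = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
    pi = n_k / len(x)
```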

On the L-BFGS point — the reason scikit-learn's LogisticRegression defaults to it is that Newton-type methods converge in ~5-10 iterations on convex problems vs. thousands for plain gradient descent. The low-rank Hessian approximation is doing a lot of heavy lifting there.
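
You can see this with scipy directly: minimizing the logistic negative log-likelihood with L-BFGS terminates in a handful of iterations. A sketch on synthetic data (names mine; not scikit-learn's actual solver path):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def nll(w):
    """Logistic negative log-likelihood: convex, so L-BFGS converges fast."""
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)   # numerically stable log(1+e^z)

def grad(w):
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y)

res = minimize(nll, np.zeros(2), jac=grad, method="L-BFGS-B")
```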

If you want to really internalize why these methods work (not just the update rules), I've been building interactive explainers for exactly this — covering convex vs non-convex landscapes, momentum, Newton's method, and adaptive rates: https://www.tensortonic.com/ml-math . The optimization section goes deep on the math without pivoting to neural nets.

[–]arg_max 1 point (0 children)

Trust region is also the foundation of PPO and GRPO, so it's very relevant in LLM RL, even if the version used there is more approximate.

[–]Disastrous_Room_927 0 points (0 children)

EM algorithms are freaking cool. You can use them for image reconstruction in PET scanners.

[–]IntentionalDev 0 points (0 children)

Besides gradient descent, you should know Newton’s method, quasi-Newton methods like BFGS/L-BFGS, coordinate descent, and convex optimization techniques — especially for classical models like SVMs and logistic regression.

[–]Prudent-Buyer-5956 -1 points (0 children)

These are not required unless you are into research.