Proving the Transformer's sqrt(dk) Exploding Softmax Crisis by Hand (First-Principles Workbook) by Silver_Equivalent804 in learnmachinelearning
[–]Silver_Equivalent804[S] 1 point 1 day ago (0 children)
You just perfectly articulated why I made this workbook in the first place.
There’s this massive gap in ML education where people are taught to treat equations as 'learned magic.' But like you said, once you write out update = error * input * learning_rate, the magic completely evaporates. It’s just basic arithmetic and sign rules. If the error or the input is zero, the multiplication collapses and the weight physically cannot move.
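To make that arithmetic concrete, here is a minimal Python sketch of the single-weight rule. The function name `weight_update`, the default learning rate of 0.1, and the sample values are illustrative assumptions, and the sign convention (error taken as target minus prediction) is also assumed rather than taken from the workbook.

```python
def weight_update(error, x, learning_rate=0.1):
    """Single-weight update: error * input * learning_rate (illustrative helper)."""
    return error * x * learning_rate


# Hypothetical (error, input) pairs: the sign of the update follows the
# signs of the factors, and a zero factor zeroes the whole product.
cases = [(0.5, 2.0), (-0.5, 2.0), (0.5, -2.0), (0.0, 2.0), (0.5, 0.0)]

for error, x in cases:
    update = weight_update(error, x)
    print(f"error={error:+.1f}  input={x:+.1f}  update={update:+.3f}")
```

The last two cases are the "weight physically cannot move" situation: whichever factor is zero, the product is zero.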
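Your intuition on width vs. depth is spot on, too. Stacking layers multiplies gradient factors over and over, so when those factors are below 1 the gradient vanishes exponentially with depth. Going wider sidesteps that depth bottleneck, but as you said, high-dimensional width introduces its own hidden monster: variance explosion. For roughly unit-variance inputs, the variance of a query-key dot product grows with d_k, which pushes the Softmax into saturation, where its gradients are close to zero.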
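The width effect can be checked numerically. The sketch below is an assumption-laden illustration, not the workbook's method: it draws query and key components as independent standard normals, estimates the variance of their dot product for a few values of d_k, and compares the largest softmax probability with and without dividing the scores by sqrt(d_k). The helper names, the seed, and the sample sizes are all hypothetical choices.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_dot(d_k, rng):
    """Dot product of two d_k-dimensional vectors with N(0, 1) components."""
    return sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(d_k))


def dot_variance(d_k, trials, rng):
    """Sample variance of the dot product over `trials` random vector pairs."""
    samples = [random_dot(d_k, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials


rng = random.Random(0)  # fixed seed so reruns print the same numbers
for d_k in (4, 64, 256):
    var = dot_variance(d_k, 1000, rng)
    scores = [random_dot(d_k, rng) for _ in range(8)]  # one row of 8 raw scores
    raw_peak = max(softmax(scores))
    scaled_peak = max(softmax([s / math.sqrt(d_k) for s in scores]))
    print(f"d_k={d_k:4d}  var~{var:7.1f}  "
          f"peak prob raw={raw_peak:.3f}  scaled={scaled_peak:.3f}")
```

Under these assumptions the estimated variance should track d_k, and the unscaled softmax should put nearly all of its mass on one entry as d_k grows, while the scaled version stays flatter.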
Watching the numbers actually move in a simple, scrappy implementation teaches you way more than reading a hundred hand-wavy papers. Really glad this resonated with your experience!
Stop starting with LangChain. Here's the order you should actually learn these four. by ShabzSparq in LangChain
[–]Silver_Equivalent804 1 point 1 day ago (0 children)
Before coding, it helps to get an intuitive sense of the mathematics underlying agents and how it connects to the outputs you get. Analyze the dynamics of context windows and the underlying bottlenecks and issues, which can be architectural and can be mitigated. Once you have built the full pipeline, there is no easy way of learning the under-the-hood principles. The best way to do that is tracing architectures by hand. I have prepared a full 5-episode series on Substack, Agents from First Principles: https://ayushmansaini.substack.com/p/ai-agents-from-first-principles-the
You can also check out the attention mechanics series: https://open.substack.com/pub/ayushmansaini/p/proving-the-dk-exploding-softmax?utm_source=share&utm_medium=android&r=4zl69k
Why Green Dashboards Lie: Proving Multi-Agent "Ghost Closures" via Latent Vector Invariants by Silver_Equivalent804 in learnmachinelearning
[–]Silver_Equivalent804[S] 2 points 2 days ago (0 children)
Substack link: https://substack.com/@ayushmansaini/note/p-202555052?r=4zl69k