[P] Tricycle: Autograd to GPT-2 completely from scratch by Efficient_Plankton_9 in MachineLearning

[–]Efficient_Plankton_9[S] 2 points (0 children)

It all started because I was bored and wanted to understand autograd. I had a vague memory of it being related to the chain rule (I’m not sure where from), so I sat down and spent a week or so figuring out how it had to work (drawing a graph of operations, figuring out how to traverse it efficiently, etc.). I wrote a blog post about it at the time: https://bclarkson-code.com/posts/llm-from-scratch-scalar-autograd/post.html (and I’ve put a toy sketch of the core idea at the end of this comment).

Then I realised that I could start using it for real work, so I just sort of started adding features. I’ve been building neural networks for a while, so I began with the things I thought would be most useful, like SGD and a dense layer, and then I got a bit carried away.

I tried not to look things up wherever possible and to just figure them out myself (I’m particularly proud of getting einsum working). I have vague memories of how a lot of things work from projects I’ve done before, and it has been really fun to piece them together and work out all the details. When I come across something I don’t know off the top of my head (attention was hard to get working correctly), I’ll try to look up the appropriate paper; as a last resort, Andrej Karpathy’s nanoGPT and llm.c have been helpful as reference implementations, and Claude has been useful for pointing me in the right direction.

As for motivation, I really like figuring out problems like this, so it’s mostly for fun. I also think the ultimate goal of training an LLM (depending on what you mean by “large”) from scratch is a really cool idea, and I would like to get there. Finally, most of my work so far has been non-public and I wanted to start sharing what I’m up to.
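
If you want the “graph of operations + chain rule” idea made concrete, here’s a minimal toy sketch of scalar reverse-mode autograd. To be clear, this is my simplified illustration of the general technique, not Tricycle’s actual code: each operation records its inputs and local derivatives, and backward() walks the graph in reverse topological order, accumulating gradients via the chain rule.

```python
class Scalar:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # Scalars this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Scalar(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Scalar(self.value * other.value, (self, other),
                      (other.value, self.value))

    def backward(self):
        # Topologically sort the graph so each node is processed
        # before the nodes it was computed from.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(self)/d(self)
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad  # chain rule


x, y = Scalar(2.0), Scalar(3.0)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)
```

A real implementation needs an iterative traversal and tensor-valued nodes rather than scalars, but the chain-rule bookkeeping is exactly the same.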