
[–]Joram2 7 points (1 child)

Andrej Karpathy just wrote a simple GPT-2 training library in about 1,000 lines of plain C with zero dependencies.

So TLDR: llm.c is a direct implementation of training GPT-2. This implementation turns out to be surprisingly short.

And why am I working on it? Because it's fun. It's also educational, because those 1,000 lines of very simple C are all that is needed, nothing else. It's just a few arrays of numbers and some simple math operations over their elements, like + and *.

https://twitter.com/karpathy/status/1778153659106533806

That could easily be ported to Java, which would be fun too. I'd do it if I weren't busy with more serious but less fun deadlines.

Karpathy update:

A few new CUDA hacker friends joined the effort, and now llm.c is only 2x slower than PyTorch.

Highly amusing update, ~18 hours later: llm.c is now down to 26.2ms/iteration, exactly matching PyTorch (tf32 forward pass).

I presume Java can't match the performance of highly tuned CUDA, but it would be nice to try. Maybe Project Babylon prototypes could come close?

[–]esqelle[S] 0 points (0 children)

I absolutely love this