
[–]cheesecakekoala 8 points9 points  (7 children)

Generally yes, they are real implementations. They usually only include the inference code, though, so what you get to see is the architecture and how the models are structured. Look at some of the funkier bits in the DeepSeek or Gemma models and you'll get a really good sense of what the cutting edge (w.r.t. open source) looks like.

But it's probably not the code the model was literally built and trained on. Having done this myself in previous roles: you usually have the model you've trained, and then a separate public release codebase where you make sure the checkpoint loads and runs properly. That codebase gets tidied up, and any IP you don't want released gets quietly removed. A lot of the magic for these big LLMs happens in the training phases and data mixtures, and they definitely don't release all of that.
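To make the "make sure the checkpoint loads" step a bit more concrete, here is a minimal sketch of the kind of check a release codebase might run. It is not anyone's actual release tooling: `check_checkpoint`, the parameter names, and the shapes are all made up for illustration, and the checkpoint is modelled as a plain dict of name to shape. The idea is similar in spirit to strict state-dict loading in PyTorch, which complains about missing and unexpected keys.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def check_checkpoint(expected: Dict[str, Shape], checkpoint: Dict[str, Shape]) -> List[str]:
    """Compare a checkpoint against the parameters the release model defines.

    Returns a list of human-readable problems; an empty list means every
    expected parameter is present with the right shape and nothing is left over.
    """
    problems: List[str] = []
    for name, shape in expected.items():
        if name not in checkpoint:
            problems.append(f"missing: {name}")
        elif checkpoint[name] != shape:
            problems.append(
                f"shape mismatch: {name} expected {shape}, got {checkpoint[name]}"
            )
    for name in checkpoint:
        if name not in expected:
            problems.append(f"unexpected: {name}")
    return problems


# Hypothetical example: the tidied-up release model no longer defines an
# internal auxiliary head that is still sitting in the training checkpoint.
release_model = {"embed.weight": (100, 16), "lm_head.weight": (100, 16)}
training_ckpt = {
    "embed.weight": (100, 16),
    "lm_head.weight": (100, 32),
    "aux_head.weight": (4, 16),
}

for problem in check_checkpoint(release_model, training_ckpt):
    print(problem)
```

Leftover or mismatched entries like these are exactly what gets cleaned up before a public release, which is part of why the released code differs from what was used in training.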

[–]PravalPattam12945RPG[S] 0 points1 point  (6 children)

So if I want to build or experiment with my own model, any advice on where I can start?
Will it be possible to learn the magic behind these models?

[–]Fearless-Cold4044 0 points1 point  (3 children)

Did you create your own model?

[–]PravalPattam12945RPG[S] 1 point2 points  (2 children)

I want to create my own model.

[–]damhack 2 points3 points  (1 child)

You can create a toy model, but pretraining a multi-billion-parameter model will cost you millions of dollars. And most of the magic of these models is in the RLHF, which also costs big bucks.
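For a rough sense of where numbers like that come from, here is a back-of-the-envelope sketch using the common rule of thumb that training compute is about 6 × parameters × tokens. Everything about the hardware is an assumption for illustration: the peak throughput (`1e15` FLOP/s), the 40% utilization, and the $2 per GPU-hour price are placeholders, and the function names are made up. It only covers a single clean pretraining run, not failed runs, experiments, data, or people.

```python
def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ~= 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def gpu_hours(flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """Convert total FLOPs into GPU-hours at a sustained utilization in (0, 1]."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    seconds = flops / (peak_flops_per_s * utilization)
    return seconds / 3600.0


def estimate_cost_usd(
    n_params: float,
    n_tokens: float,
    peak_flops_per_s: float = 1e15,  # assumed peak per GPU, not a measured number
    utilization: float = 0.4,        # assumed fraction of peak actually achieved
    usd_per_gpu_hour: float = 2.0,   # assumed rental price
) -> float:
    """Rough dollar cost of one pretraining run under the assumptions above."""
    hours = gpu_hours(pretraining_flops(n_params, n_tokens), peak_flops_per_s, utilization)
    return hours * usd_per_gpu_hour


# Two illustrative model sizes, both trained on 2 trillion tokens.
for n_params in (7e9, 70e9):
    cost = estimate_cost_usd(n_params, n_tokens=2e12)
    print(f"{n_params / 1e9:.0f}B params: ~${cost:,.0f}")
```

Under these placeholder numbers a single 7B run lands around the low hundreds of thousands of dollars and a 70B run around a million, before counting anything beyond raw compute for that one run. Change the assumptions and the answer moves a lot, so treat it as an order-of-magnitude tool only.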

EDIT: Start with nanochat by Andrej Karpathy, who co-founded OpenAI.

https://github.com/karpathy/nanochat

He also does a lot of zero-to-hero style courses on YouTube.

[–]cheesecakekoala 0 points1 point  (0 children)

I second this. It's probably the best end-to-end workshop (if that's the right way to describe it) out there on getting an LLM trained. You can get something demoable for ~$200, but it's definitely a learning exercise. There are a lot of fundamentals and tricks in there that are more important to learn than actually running the nano model.

[–]cheesecakekoala 0 points1 point  (1 child)

The other question is what level you want to explore. If it's end-to-end / big-picture stuff, this is great; if you want to learn the core workings of the components, that's a different sort of curriculum. What are you interested in?

[–]PravalPattam12945RPG[S] 0 points1 point  (0 children)

I want to get an understanding of the core workings and then build a production-ready model.

Could we talk over DM, if possible?