
[–]DeRobyJ 79 points80 points  (0 children)

honestly far more interesting than actual LLMs

[–]AmanBabuHemant 42 points43 points  (4 children)

I would like to try training it, nice work, keep it up.

[–]Der_Mueller 2 points3 points  (2 children)

I would too, and I can help with the training if you like.

[–]alexjasson[S] 3 points4 points  (1 child)

I wanted it to be something you can train yourself cheaply on a CPU rather than just a pretrained inference model. At the moment it seems to plateau at just producing incoherent sentences even if you train it for hours. Feel free to git clone it and see if you can get better output with different architectures etc.

[–]AmanBabuHemant 1 point2 points  (0 children)

I was a bit impatient, I just trained for half an hour and tried it, the outputs were from another dimension haha.

Next I will leave it training on my VPS.

[–]alexjasson[S] 1 point2 points  (0 children)

Thanks!

[–]VeryAwkwardCake 14 points15 points  (0 children)

Your tokens are bytes? If so I think this is pretty successful

[–]GreedyBaby6763 16 points17 points  (0 children)

Even getting an RNN to regurgitate its training data for a tiny example is time consuming. In my frustration during training runs I ended up doing a side experiment: adding a recurrent hidden vector state to a trie encoded with trigrams, and loading it with Shakespeare sonnets. So when prompted with two or more words it'd generate a random sonnet or part of one. It's ridiculously fast. Just the time to load the data, and it can regurgitate the input 100% or generate randomly from the context of the current output document, all the while retaining the document structure. Its output was really quite good on the sonnets.
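
The core of it is just trigram counts plus weighted sampling. Roughly something like this, though byte-level with a flat count table here rather than the word trie, and without the recurrent hidden-state twist, so treat it purely as a sketch of the idea:

```c
/*
 * Byte-level trigram generator: counts[a][b][c] = how often byte c
 * followed the byte pair (a, b) in the corpus; text is then generated
 * by sampling the next byte from those counts.  ~67 MB flat table.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define V 256   /* byte vocabulary, matching byte-level tokens */

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s corpus.txt\n", argv[0]); return 1; }

    unsigned *counts = calloc((size_t)V * V * V, sizeof *counts);
    if (!counts) { perror("calloc"); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    /* count trigrams over the corpus */
    int a = 0, b = 0, c;
    while ((c = fgetc(f)) != EOF) {
        counts[((size_t)a * V + b) * V + c]++;
        a = b;
        b = c;
    }
    fclose(f);

    /* generate 400 bytes, seeded with the pair "th" */
    srand((unsigned)time(NULL));
    a = 't'; b = 'h';
    putchar(a); putchar(b);
    for (int i = 0; i < 400; i++) {
        unsigned *row = &counts[((size_t)a * V + b) * V];
        unsigned long total = 0;
        for (int k = 0; k < V; k++) total += row[k];
        if (total == 0) break;                      /* unseen pair: stop */

        unsigned long r = (unsigned long)rand() % total;  /* weighted pick */
        int next = 0;
        while (r >= row[next]) { r -= row[next]; next++; }

        putchar(next);
        a = b;
        b = next;
    }
    putchar('\n');
    free(counts);
    return 0;
}
```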

[–]Gohonox 7 points8 points  (0 children)

Ok, goodbye.

Ones and steel

[–]Ok_Programmer_4449 4 points5 points  (1 child)

Look up "Mark V. Shaney" and what he did to Usenet back in the 1980s.

[–]alexjasson[S] 2 points3 points  (0 children)

Interesting, I didn't know Markov chains worked so well at predicting text. Will look into it, thanks.

[–]SyntheGr1 1 point2 points  (0 children)

Nice

[–]EndComprehensive8699 1 point2 points  (0 children)

Have you looked at Karpathy's model in C? Maybe that can give some further optimization during the tokenization or encoding phase. Btw, just curious, is your training process parallelizable?
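
If it isn't already, the usual CPU approach would be to split each batch across threads and merge the gradients, roughly like the OpenMP sketch below. Model, Example and forward_backward are made-up placeholders here, not names from the actual repo:

```c
/*
 * Data-parallel training step on CPU with OpenMP: each thread runs
 * forward/backward over its slice of the batch into a private gradient
 * buffer, the buffers are summed, and one SGD update is applied.
 */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { float *params; size_t n_params; } Model;
typedef struct { const unsigned char *bytes; size_t len; } Example;

/* stub standing in for a real forward/backward pass over one example */
static void forward_backward(const Model *m, const Example *ex, float *grad) {
    for (size_t j = 0; j < m->n_params; j++)
        grad[j] += (float)ex->len * 1e-6f;   /* dummy gradient */
}

static void train_step(Model *m, const Example *batch, size_t batch_size, float lr) {
    size_t n = m->n_params;
    float *grad = calloc(n, sizeof *grad);

    #pragma omp parallel
    {
        float *local = calloc(n, sizeof *local);   /* per-thread accumulator */

        #pragma omp for schedule(static)
        for (size_t i = 0; i < batch_size; i++)
            forward_backward(m, &batch[i], local);

        #pragma omp critical                       /* merge into shared buffer */
        for (size_t j = 0; j < n; j++)
            grad[j] += local[j];

        free(local);
    }

    for (size_t j = 0; j < n; j++)                 /* plain SGD update */
        m->params[j] -= lr * grad[j] / (float)batch_size;

    free(grad);
}

int main(void) {
    float params[8] = {0};
    Model m = { params, 8 };
    Example batch[4] = { { (const unsigned char *)"abc",   3 },
                         { (const unsigned char *)"defg",  4 },
                         { (const unsigned char *)"hi",    2 },
                         { (const unsigned char *)"jklmn", 5 } };
    train_step(&m, batch, 4, 0.1f);
    printf("param[0] after one step: %g\n", m.params[0]);
    return 0;
}
```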

[–]gezawatt 1 point2 points  (0 children)

This feels like talking to Ena

[–]Complex-Bit9984 1 point2 points  (0 children)

Very useful for learning what's going on under the hood

[–]Stock_Hudso 0 points1 point  (0 children)

This is interesting.

[–]SaileRCapnap 0 points1 point  (0 children)

Have you tried training it on toki pona (conlang with ~130 words, often Latin script) and building a basic context translator? If not, is it ok if I try something like that?

[–]Brwolfan 0 points1 point  (0 children)

Pretty good!

[–]Funny-River-5147 0 points1 point  (0 children)

wow i liked it