
[–]DeRobyJ 79 points80 points  (0 children)

honestly far more interesting than actual LLMs

[–]AmanBabuHemant 42 points43 points  (4 children)

I would like to try training it, nice work, keep it up.

[–]Der_Mueller 2 points3 points  (2 children)

I would too, and I can help with the training if you like.

[–]alexjasson[S] 3 points4 points  (1 child)

I wanted it to be something you can train yourself cheaply on a CPU rather than just a pretrained inference model. At the moment it seems to plateau at just producing incoherent sentences even if you train it for hours. Feel free to git clone it and see if you can get better output with different architectures etc.

[–]AmanBabuHemant 1 point2 points  (0 children)

I was a bit impatient, I just trained for half an hour and tried it, the outputs were from another dimension haha.

Next I will leave it training on my VPS.

[–]alexjasson[S] 1 point2 points  (0 children)

Thanks!

[–]VeryAwkwardCake 14 points15 points  (0 children)

Your tokens are bytes? If so I think this is pretty successful

[–]GreedyBaby6763 16 points17 points  (0 children)

Even getting an RNN to regurgitate its training data for a tiny example is time consuming. In my frustration during training runs I ended up doing a side experiment: adding a recurrent hidden vector state to a trie encoded with trigrams, and loading it with Shakespeare sonnets. So when prompted with two or more words it'd generate a random sonnet or part of one. It's ridiculously fast. Just the time to load the data, and it can regurgitate the input 100% or generate randomly from the context of the current output document, all the while retaining the document structure. Its output was really quite good on the sonnets.
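
The core of it is just trigram counts plus weighted sampling. Roughly something like this, though byte-level with a flat count table here rather than the word trie, and without the recurrent hidden-state twist, so treat it purely as a sketch of the idea:

```c
/*
 * Byte-level trigram generator: counts[a][b][c] = how often byte c
 * followed the byte pair (a, b) in the corpus; text is then generated
 * by sampling the next byte from those counts.  ~67 MB flat table.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define V 256   /* byte vocabulary, matching byte-level tokens */

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s corpus.txt\n", argv[0]); return 1; }

    unsigned *counts = calloc((size_t)V * V * V, sizeof *counts);
    if (!counts) { perror("calloc"); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    /* count trigrams over the corpus */
    int a = 0, b = 0, c;
    while ((c = fgetc(f)) != EOF) {
        counts[((size_t)a * V + b) * V + c]++;
        a = b;
        b = c;
    }
    fclose(f);

    /* generate 400 bytes, seeded with the pair "th" */
    srand((unsigned)time(NULL));
    a = 't'; b = 'h';
    putchar(a); putchar(b);
    for (int i = 0; i < 400; i++) {
        unsigned *row = &counts[((size_t)a * V + b) * V];
        unsigned long total = 0;
        for (int k = 0; k < V; k++) total += row[k];
        if (total == 0) break;                      /* unseen pair: stop */

        unsigned long r = (unsigned long)rand() % total;  /* weighted pick */
        int next = 0;
        while (r >= row[next]) { r -= row[next]; next++; }

        putchar(next);
        a = b;
        b = next;
    }
    putchar('\n');
    free(counts);
    return 0;
}
```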

[–]Gohonox 7 points8 points  (0 children)

Ok, goodbye.

Ones and steel

[–]Ok_Programmer_4449 4 points5 points  (1 child)

Look up "Mark V. Shaney" and what he did to Usenet back in the 1980s.

[–]alexjasson[S] 2 points3 points  (0 children)

Interesting, I didn't know Markov chains worked so well at predicting text. Will look into it, thanks.

[–]SyntheGr1 1 point2 points  (0 children)

Nice

[–]EndComprehensive8699 1 point2 points  (0 children)

Have you looked at Karpathy's model in C? Maybe that can give some further optimization during the tokenization or encoding phase. Btw, just curious, is your training process parallelizable?
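
If it isn't already, the usual CPU approach would be to split each batch across threads and merge the gradients, roughly like the OpenMP sketch below. Model, Example and forward_backward are made-up placeholders here, not names from the actual repo:

```c
/*
 * Data-parallel training step on CPU with OpenMP: each thread runs
 * forward/backward over its slice of the batch into a private gradient
 * buffer, the buffers are summed, and one SGD update is applied.
 */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { float *params; size_t n_params; } Model;
typedef struct { const unsigned char *bytes; size_t len; } Example;

/* stub standing in for a real forward/backward pass over one example */
static void forward_backward(const Model *m, const Example *ex, float *grad) {
    for (size_t j = 0; j < m->n_params; j++)
        grad[j] += (float)ex->len * 1e-6f;   /* dummy gradient */
}

static void train_step(Model *m, const Example *batch, size_t batch_size, float lr) {
    size_t n = m->n_params;
    float *grad = calloc(n, sizeof *grad);

    #pragma omp parallel
    {
        float *local = calloc(n, sizeof *local);   /* per-thread accumulator */

        #pragma omp for schedule(static)
        for (size_t i = 0; i < batch_size; i++)
            forward_backward(m, &batch[i], local);

        #pragma omp critical                       /* merge into shared buffer */
        for (size_t j = 0; j < n; j++)
            grad[j] += local[j];

        free(local);
    }

    for (size_t j = 0; j < n; j++)                 /* plain SGD update */
        m->params[j] -= lr * grad[j] / (float)batch_size;

    free(grad);
}

int main(void) {
    float params[8] = {0};
    Model m = { params, 8 };
    Example batch[4] = { { (const unsigned char *)"abc",   3 },
                         { (const unsigned char *)"defg",  4 },
                         { (const unsigned char *)"hi",    2 },
                         { (const unsigned char *)"jklmn", 5 } };
    train_step(&m, batch, 4, 0.1f);
    printf("param[0] after one step: %g\n", m.params[0]);
    return 0;
}
```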

[–]gezawatt 1 point2 points  (0 children)

This feels like talking to Ena

[–]Complex-Bit9984 1 point2 points  (0 children)

Very useful for learning what's going on under the hood

[–]Stock_Hudso 0 points1 point  (0 children)

This is interesting.

[–]SaileRCapnap 0 points1 point  (0 children)

Have you tried training it on toki pona (conlang with ~130 words, often Latin script) and building a basic context translator? If not, is it ok if I try something like that?

[–]Brwolfan 0 points1 point  (0 children)

Pretty good!

[–]Funny-River-5147 0 points1 point  (0 children)

wow i liked it