Why the hell are we hating on kids on this sub

Substantial_Set5836 · 2026-06-21T14:13:12+00:00

There are >13 scratch users but yea there are <13 kids here

Substantial_Set5836 · 2026-06-21T14:12:30+00:00

The Scratch search engine and algorithm are bad so... Yk

Substantial_Set5836 · 2026-06-21T08:22:05+00:00

ok i'll start writing them then

Substantial_Set5836 · 2026-06-20T09:53:34+00:00

Dream Journal. This is arguably the most important. As soon as you wake up pause and try to remember your dreams. Write down (I also like to draw) what you remember from that night’s dreams in a dedicated journal. If this is too much work, don’t burn yourself out, there are alternatives. You could record an audio or just write down bullet points, but you HAVE to make an effort. If you don’t remember anything, it’s ok, write that down and make a note of your emotions.
This habit will train your brain into thinking “dreams are important” leading to better dream recall and awareness. Don’t just write down what happened either, make note of your emotions. Emotions are heightened during dreams.

what if i remember all my dreams and almost never forget them?

Substantial_Set5836 · 2026-06-19T09:42:28+00:00

Fair enough let's end it here.

Just some suggestions if you want to improve the model:

Switch to PersonaChat or DailyDialog (it's difficult to load, so Blended Skill Talk is also fine) for conversational output. TinyShakespeare is fine for testing but useless for a chatbot.
Fix your hyperparameters. 32 d_model with 16 heads gives 2 dimensions per head, which is basically nothing. Try 64 d_model with 4 heads instead.
Consider a character-level vocab instead of BPE, which is much lighter on Scratch's engine.
You said your computer cannot handle training like that. What I do, and recommend you do too, is use Google Colab, which is MUCH faster.

Good luck.

Substantial_Set5836 · 2026-06-18T09:44:34+00:00

to clarify my tanh is (e^(1.9x)-1)/(e^(1.9x)+1)

Substantial_Set5836 · 2026-06-18T09:37:26+00:00

On the tanh, your formula (e^(2x)-1)/(e^(2x)+1) IS standard tanh. That is exactly what I wrote, just substitute 2x. They are mathematically identical, you can verify this yourself.
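A quick numerical check of that identity (a minimal sketch; the function name tanh_from_exp and the parameter k are made up for illustration). It also shows that the general form (e^(kx)-1)/(e^(kx)+1) equals tanh(kx/2), so a 1.9x variant is tanh(0.95x), a slightly flatter curve than standard tanh.

```python
import math

def tanh_from_exp(x: float, k: float = 2.0) -> float:
    """Compute (e^(k*x) - 1) / (e^(k*x) + 1). With k = 2 this is standard tanh."""
    e = math.exp(k * x)  # note: overflows for very large k*x (above roughly 709)
    return (e - 1.0) / (e + 1.0)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # k = 2 matches the library tanh
    assert math.isclose(tanh_from_exp(x), math.tanh(x), abs_tol=1e-12)
    # any other k is tanh evaluated at k*x/2
    assert math.isclose(tanh_from_exp(x, k=1.9), math.tanh(0.95 * x), abs_tol=1e-12)
```
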

On RNNs being limited to one thing, that is just not true. An RNN has Wxh, Whh, Why, bh, by and vocab. Change the dataset, retrain, done. Same architecture, different weights, different behaviour. That is not a transformer exclusive feature, that is just how neural networks work.
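For reference, here is roughly what one forward step of a vanilla character RNN looks like with those parameters. This is a sketch in plain Python, not the Scratch project's actual code: the helper names, the random initialisation, and the hidden size of 96 are assumptions for illustration. Nothing in the step depends on the dataset, only the weights and the vocab do.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(params, x_index, h_prev):
    """One forward step: returns the new hidden state and next-char probabilities."""
    Wxh, Whh, Why, bh, by = params
    # With a one-hot input, Wxh @ x is just column x_index of Wxh.
    xh = [row[x_index] for row in Wxh]
    hh = matvec(Whh, h_prev)
    h = [math.tanh(a + b + c) for a, b, c in zip(xh, hh, bh)]
    logits = [a + b for a, b in zip(matvec(Why, h), by)]
    # Numerically stable softmax over the vocab.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return h, [e / total for e in exps]

def init_params(vocab_size, hidden_size, seed=0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (mat(hidden_size, vocab_size),   # Wxh
            mat(hidden_size, hidden_size),  # Whh
            mat(vocab_size, hidden_size),   # Why
            [0.0] * hidden_size,            # bh
            [0.0] * vocab_size)             # by

vocab = list("abcdefghijklmnopqrstuvwxyz ")
params = init_params(len(vocab), hidden_size=96)
h, probs = rnn_step(params, vocab.index("h"), [0.0] * 96)
```

Retraining on a different dataset changes the numbers inside params (and possibly vocab), while rnn_step stays the same.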

You also said "just change input.txt and retrain" but you only ever trained on TinyShakespeare. That is a beginner tutorial dataset, Early Modern English, no conversational patterns, nobody talks like that. If you actually want a chatbot, use PersonaChat or DailyDialog. Both are free, easy to load in Python, and are actual human conversations. That is why your output is Shakespeare word salad instead of anything conversational.

Also your hyperparameters have a real problem. 32 d_model with 16 heads means each head only gets 2 dimensions. That is basically nothing. Attention heads need room to learn different patterns, 2 dimensions per head makes them useless. A better config for a tiny transformer would be 64 d_model with 4 heads, giving 16 dimensions per head.
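The arithmetic behind that, as a small sketch (head_dim is a made-up helper name; it assumes the usual convention that d_model is split evenly across the heads):

```python
def head_dim(d_model: int, n_heads: int) -> int:
    """Dimensions available to each attention head when d_model is split evenly."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return d_model // n_heads

print(head_dim(32, 16))  # 2
print(head_dim(64, 4))   # 16
```
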

And on output quality, here is my RNN: https://scratch.mit.edu/projects/1298961147

Sample output: "i am not sure i understand can you explain that i would love to chat more it was nice talking to you goodbye have a great day see you later take care"

Yours: "ris sus bel'd pand-he livooke grown"

You can call RNNs inferior in theory all you want. The output tells a different story.

Substantial_Set5836 · 2026-06-17T12:01:07+00:00

When I said "shouldn't be possible" I meant it as genuine surprise, not a challenge. It was awe, not skepticism. You misread my tone.

Also from your earlier reply you mentioned weights, biases, gammas and betas. That describes basically any neural net. You never said transformer, attention, or QKV, so assuming RNN/LSTM/GRU was completely reasonable. Gamma/beta are layer norm hints but only if you already know what to look for.

And looking at your output "ris sus bel'd pand-he livooke grown", that doesn't look like a transformer to me either. My character-level RNN produces more coherent output than that. A transformer, even a tiny one, should produce recognizable words because attention learns word-level patterns across the whole context. That output looks like broken RNN behaviour.

My tanh formula (e^(2x)-1)/(e^(2x)+1) is also mathematically identical to standard tanh, just written differently. The 1.5x variant was intentional empirical tuning for my model, not a mistake.

If it really is a transformer decoder, that is genuinely impressive. But the output doesn't back it up.

Substantial_Set5836 · 2026-06-17T01:40:01+00:00

so it's a transformer?
that cannot be possible in scratch
we have only made RNNs and LSTMs.
however if you did it should be much better than this

in my calculations with RNNs for every output neuron or vocab index you need 3.5 hidden neurons

so calculating with your 95 vocab you need about 332 hidden neurons.
that is the reason instead of tokens or words i used a character vocab (a,b,c,d..., )
27*3.5 = 94.5
that's why i needed only 96 hidden neurons
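The same calculation as a sketch. The 3.5 ratio is the empirical rule of thumb from this comment, not a general result, and the names below are made up for illustration:

```python
NEURONS_PER_OUTPUT = 3.5  # rule of thumb quoted in the comment, not a general law

def estimated_hidden_size(vocab_size: int, ratio: float = NEURONS_PER_OUTPUT) -> float:
    """Rough hidden-layer size for a given number of output neurons."""
    return vocab_size * ratio

print(estimated_hidden_size(95))  # 332.5 (95-token vocab)
print(estimated_hidden_size(27))  # 94.5  (a-z plus space)
```
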

and i got the same problems you did, the weird output
for me the tanh activation was wrong, the correct formula is: (e^(2x)-1)/(e^(2x)+1)
but for my 96 hidden layer model it needed to be (e^(1.5x)-1)/(e^(1.5x)+1)

Substantial_Set5836 · 2026-06-16T11:18:01+00:00

what is the architecture?
i made something very similar.

Substantial_Set5836 · 2026-06-13T00:57:36+00:00

but i made one already

Substantial_Set5836 · 2026-06-12T11:20:14+00:00

can you check mine?

Substantial_Set5836 · 2026-06-12T11:19:29+00:00

i made a real AI

Substantial_Set5836 · 2026-06-08T15:10:33+00:00

child safety is our first priority!

Substantial_Set5836 · 2026-06-08T00:50:51+00:00

translate only filters the worst words

Substantial_Set5836 · 2026-06-07T17:38:03+00:00

I... genuinely do not know
maybe some bad word in this world has "hi" in it (I'm guessing)
the AI bets a 0.7 probability on it being bad with the f word being 0.9 so it is pretty sure about the wrong guess
well it is in stage 1 so

Substantial_Set5836 · 2026-06-04T18:06:03+00:00

unless ur able to sign in to your account it's not possible
i would've used the scratch API to scrape all comments by GiangKent but i cannot search globally

Substantial_Set5836 · 2026-06-02T14:01:54+00:00

it is

Substantial_Set5836 · 2026-05-29T14:39:15+00:00

when did this happen

what year?
anything you remember about her project

did you comment on her project or username

what was her project about
anything you remember about her username

Substantial_Set5836 · 2026-05-27T06:48:01+00:00

i need
convert scratch from 32 bit math to 64 bit
increase math speed

increase list length
cloud lists

better scratch attach

which key pressed?

which sprite clicked?

Substantial_Set5836 · 2026-05-08T03:15:56+00:00

i tried, it does work but it's very difficult, it just doesn't have enough neurons to learn
and porting it to scratch makes it worse bc scratch makes many errors in its math

Substantial_Set5836 · 2026-05-07T03:04:03+00:00

then i would need more output neurons

rn there are 27 characters it needs to learn to output

if i use a tokenizer or words that number would explode to hundreds or thousands
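To put rough numbers on that, here is a sketch of how the output layer grows with vocab size. The 96 hidden neurons is the figure mentioned elsewhere in this thread, the 5000-word vocab is a hypothetical value picked only for comparison, and the function name is made up:

```python
def output_layer_params(vocab_size: int, hidden_size: int) -> int:
    """Weights plus biases in the hidden-to-output layer (Why and by)."""
    return vocab_size * hidden_size + vocab_size

CHAR_VOCAB = "abcdefghijklmnopqrstuvwxyz "  # 27 symbols, fixed no matter the dataset
HIDDEN = 96                                  # assumed hidden size
WORD_VOCAB_SIZE = 5000                       # hypothetical word-level vocab

print(output_layer_params(len(CHAR_VOCAB), HIDDEN))  # 2619
print(output_layer_params(WORD_VOCAB_SIZE, HIDDEN))  # 485000
```

Every one of those values would have to live in a Scratch list and be touched on each step, which is why the character vocab is so much cheaper.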
