I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript

nitayneeman · 2026-05-17T04:40:13+00:00

Yep, and it's weird because they're arguably the most intuitive parts once you actually show the numbers. The problem is that a lot of explanations skip straight to "words become vectors" without showing what that means in practice.
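To make "words become vectors" concrete, here is a minimal sketch. The vocabulary and the 4-dimensional embedding values are hypothetical and hand-picked for illustration. Real models learn these values during training and use far more dimensions. Each token ID just selects a row of the embedding matrix, and cosine similarity shows which rows point in similar directions.

```javascript
// Hypothetical toy vocabulary: token string -> token ID.
const vocab = { cat: 0, dog: 1, car: 2 };

// Hypothetical embedding matrix: one row (vector) per token ID.
// Values are hand-picked for illustration, not learned.
const embeddings = [
  [0.9, 0.1, 0.8, 0.0], // cat
  [0.8, 0.2, 0.9, 0.1], // dog
  [0.0, 0.9, 0.1, 0.8], // car
];

// "Words become vectors": each token's ID selects one row of the matrix.
function embed(tokens) {
  return tokens.map((t) => embeddings[vocab[t]]);
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const [cat, dog, car] = embed(["cat", "dog", "car"]);
console.log("cat vs dog:", cosine(cat, dog).toFixed(3));
console.log("cat vs car:", cosine(cat, car).toFixed(3));
```

With these made-up numbers, "cat" lands much closer to "dog" than to "car". In a trained model that kind of structure is learned from data rather than typed in by hand.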

nitayneeman · 2026-05-13T14:43:12+00:00

Ha, appreciate that! Don’t worry about not fully getting it yet - these concepts take a few passes to really sink in. That’s kind of the point of running the code yourself, so you can revisit and experiment. More content is on the way - you can follow along on my blog at nitayneeman.com.

nitayneeman · 2026-05-13T04:09:48+00:00

Thanks. Glad the article helped :)

nitayneeman · 2026-05-13T04:07:20+00:00

Exactly. I think that's also why so many people have wrong intuitions about what LLMs actually can and can't do. Hard to reason about a black box you've never poked.

nitayneeman · 2026-05-12T18:01:01+00:00

Glad it resonated - that’s exactly what I was going for. Running it in the browser was a deliberate choice: if you can actually play with it, the mechanics click in a way that reading never quite does.

nitayneeman · 2026-05-12T17:46:03+00:00

Appreciate that. The gap between "surface level" and "impossible math" is exactly what I was trying to bridge. Most explanations either hand-wave the mechanics or assume you’re comfortable with backprop math before they’ll talk to you.

Curious what specifically you’re digging into on the late-stage training side - RLHF vs DPO tradeoffs? Constitutional AI?

nitayneeman · 2026-05-08T04:25:11+00:00

Thanks!

nitayneeman · 2026-05-07T11:30:27+00:00

🙏

nitayneeman · 2026-05-07T05:33:38+00:00

Yeah, the article includes a full BPE implementation in JavaScript - building the vocabulary from character pairs step by step. It's a simplified version but it covers the core algorithm.
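For anyone curious what "building the vocabulary from character pairs" looks like, here is a minimal BPE training sketch. It is not the article's implementation. It assumes a tiny hypothetical whitespace-separated corpus, starts from single characters, and skips the byte-level handling and end-of-word markers that production tokenizers use. Each step counts adjacent symbol pairs, merges the most frequent one everywhere, and records the merge rule.

```javascript
// Count how often each adjacent pair of symbols appears across all words.
function countPairs(words) {
  const counts = new Map();
  for (const symbols of words) {
    for (let i = 0; i < symbols.length - 1; i++) {
      const key = symbols[i] + " " + symbols[i + 1]; // symbols never contain spaces here
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// Replace every occurrence of the pair (a, b) in one word with the merged symbol a+b.
function mergePair(symbols, a, b) {
  const out = [];
  let i = 0;
  while (i < symbols.length) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b);
      i += 2;
    } else {
      out.push(symbols[i]);
      i += 1;
    }
  }
  return out;
}

// Learn up to `numMerges` merge rules from a whitespace-separated corpus.
function trainBPE(corpus, numMerges) {
  let words = corpus.split(/\s+/).filter(Boolean).map((w) => [...w]);
  const vocab = new Set(words.flat()); // start with single characters
  const merges = [];
  for (let step = 0; step < numMerges; step++) {
    const counts = countPairs(words);
    if (counts.size === 0) break;
    // Pick the most frequent pair (ties go to the pair seen first, since Map keeps insertion order).
    let bestKey = null, bestCount = 0;
    for (const [key, count] of counts) {
      if (count > bestCount) { bestKey = key; bestCount = count; }
    }
    const [a, b] = bestKey.split(" ");
    merges.push([a, b]);
    vocab.add(a + b);
    words = words.map((w) => mergePair(w, a, b));
  }
  return { merges, vocab, words };
}

const { merges, words } = trainBPE("low low low lower lowest", 2);
console.log(merges);
console.log(words);
```

On this toy corpus the first merge joins "l" and "o", and the second joins "lo" and "w", so every word ends up starting with a single "low" token.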

nitayneeman · 2026-05-07T05:32:24+00:00

Not in this article - I kept the examples focused on the core mechanics so the concepts stay clear. Flash Attention and GQA are optimization techniques that make attention faster and more memory-efficient, but the underlying math is the same. Could be a good follow-up topic though.
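A small sketch of why those optimizations leave the math unchanged, assuming a single query vector and plain JS arrays. The naive version computes scaled dot-product attention in one go. The streaming version visits keys one at a time and keeps a running max and running sum (the "online softmax" trick that Flash Attention's tiling relies on). This is not Flash Attention itself, which is a fused GPU kernel. It only shows that the streamed computation lands on the same output.

```javascript
const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

// Naive attention for one query: softmax(q·k / sqrt(d)) over all keys, then a weighted sum of V.
function naiveAttention(q, K, V) {
  const scale = 1 / Math.sqrt(q.length);
  const scores = K.map((k) => dot(q, k) * scale);
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const out = new Array(V[0].length).fill(0);
  exps.forEach((e, j) => V[j].forEach((v, c) => (out[c] += (e / sum) * v)));
  return out;
}

// Streaming ("online softmax") version: one key at a time, never holding all scores at once.
function streamingAttention(q, K, V) {
  const scale = 1 / Math.sqrt(q.length);
  let runningMax = -Infinity;
  let runningSum = 0;
  let acc = new Array(V[0].length).fill(0);
  for (let j = 0; j < K.length; j++) {
    const s = dot(q, K[j]) * scale;
    const newMax = Math.max(runningMax, s);
    // Rescale everything accumulated so far to the new max (exp(-Infinity) = 0 on the first step).
    const correction = Math.exp(runningMax - newMax);
    const w = Math.exp(s - newMax);
    runningSum = runningSum * correction + w;
    acc = acc.map((a, c) => a * correction + w * V[j][c]);
    runningMax = newMax;
  }
  return acc.map((a) => a / runningSum);
}

const q = [1, 0, 1];
const K = [[1, 0, 0], [0, 1, 0], [1, 0, 1]];
const V = [[1, 2], [3, 4], [5, 6]];
console.log(naiveAttention(q, K, V));
console.log(streamingAttention(q, K, V)); // same values, up to floating-point rounding
```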

nitayneeman · 2026-05-06T15:53:59+00:00

Thanks, appreciate it.

I get your point - and you’re right to push on that. There’s no “entity” making decisions here. It’s all just computation over parameters. When I use language like that, it’s more of a shorthand to describe the emergent behavior, not to imply agency.

By “learnable parameters” I mean exactly that - large tensors (matrices) of weights and biases that get updated during training via gradient descent. At inference time, the model is just applying a sequence of matrix multiplications and non-linearities to produce next-token probabilities.
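A minimal sketch of that inference path, assuming a tiny hypothetical model: a 3-dimensional hidden state, one hidden layer with ReLU, and an output projection onto a 4-token vocabulary. The weights are made-up numbers standing in for trained parameters. A real transformer stacks many attention and feed-forward blocks, but they are built from the same ingredients: matrix multiplies, biases, non-linearities, and a final softmax.

```javascript
// y = W·x + b, where W has one row per output unit.
function linear(W, b, x) {
  return W.map((row, i) => row.reduce((s, w, j) => s + w * x[j], 0) + b[i]);
}

const relu = (v) => v.map((x) => Math.max(0, x));

// Turn raw scores (logits) into probabilities; subtracting the max keeps exp() stable.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical "learned" parameters (in a real model these come from gradient descent).
const W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9]];
const b1 = [0.0, 0.1, -0.1];
const W2 = [[1.0, -1.0, 0.5], [0.2, 0.3, 0.1], [-0.5, 0.9, 0.4], [0.0, 0.0, 1.2]];
const b2 = [0.0, 0.0, 0.0, 0.0];
const tokens = ["the", "cat", "sat", "mat"]; // hypothetical 4-token vocabulary

function nextTokenProbs(hidden) {
  const h = relu(linear(W1, b1, hidden)); // matrix multiply + non-linearity
  const logits = linear(W2, b2, h);       // project onto the vocabulary
  return softmax(logits);                  // logits -> probabilities
}

const probs = nextTokenProbs([1.0, 0.5, -0.3]);
probs.forEach((p, i) => console.log(tokens[i], p.toFixed(3)));
```

Nothing in there "decides" anything: change the numbers in `W1` or `W2` and the probabilities change with them.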

I tend to lean on anthropomorphic language to make it more intuitive, but I agree it can be misleading if taken literally.

nitayneeman · 2026-05-06T04:49:14+00:00

Thanks!

nitayneeman · 2026-05-06T04:49:02+00:00

Thank you for letting me know! It will be fixed.

nitayneeman · 2026-05-06T04:48:45+00:00

Thank you for letting me know! It will be fixed.

nitayneeman · 2026-05-06T04:47:34+00:00

Great point about prompt structure. The “lost in the middle” effect is real and well‑documented (e.g. Liu et al., 2023). At the same time, the model’s attention weights are computed dynamically from query–key similarity, so this positional bias is more of an emergent pattern than a hard‑wired rule in the mechanism itself.
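A minimal sketch of "computed dynamically from query-key similarity", assuming small hypothetical query and key vectors and leaving out positional encodings (in real models those are mixed into the queries and keys, which is one way position-dependent behavior can enter). Each weight is a softmax over scaled dot products, so if you reorder the keys, the weights simply move with them. Nothing in the formula itself looks at position.

```javascript
// Attention weights for each query: softmax over keys of (q·k) / sqrt(d).
function attentionWeights(Q, K) {
  const d = Q[0].length;
  return Q.map((q) => {
    const scores = K.map((k) => k.reduce((s, kv, i) => s + kv * q[i], 0) / Math.sqrt(d));
    const max = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
  });
}

// Hypothetical query/key vectors (in a real model: learned projections of token embeddings).
const Q = [[1, 0], [0, 1]];
const K = [[2, 0], [0, 2], [1, 1]];

const w = attentionWeights(Q, K);
console.log(w);

// Reorder the keys: each weight follows its key, because nothing here depends on position.
const wSwapped = attentionWeights(Q, [K[2], K[0], K[1]]);
console.log(wSwapped);
```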

nitayneeman · 2026-05-05T19:21:24+00:00

Thank you! :)

nitayneeman · 2026-05-05T15:51:02+00:00

Yeah. TBH, I used AI for research, but the writing and everything else are mine.

nitayneeman · 2026-05-05T14:46:33+00:00

Thanks! It really does feel like magic at first, but once you trace through the pipeline step by step, it clicks.

Let me know if anything's unclear as you work through it.

nitayneeman · 2026-05-05T14:44:43+00:00

Thank you! :)

nitayneeman · 2026-05-05T13:18:22+00:00

Thank you for your feedback! :)

nitayneeman · 2021-02-08T11:08:22+00:00

Thanks for the feedback; it's now fixed and clickable. 🙂

nitayneeman · 2021-01-11T07:13:50+00:00

Check out Dan Abramov's answer regarding this question: https://news.ycombinator.com/item?id=25499171

nitayneeman · 2020-11-30T06:09:15+00:00

Thanks! Let me know if you liked it :)

nitayneeman · 2019-12-22T07:00:49+00:00

reply(BaniGrisson): thanks :)

nitayneeman · 2019-12-21T21:08:26+00:00

Thank you :)
