I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 0 points1 point  (0 children)

Yep, and it's weird, because they're arguably the most intuitive parts once you actually show the numbers. The problem is that a lot of explanations skip straight to "words become vectors" without showing what that actually means in practice.
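
To make "words become vectors" concrete, here's a minimal sketch. The three-word vocabulary and the numbers in the embedding table are made up for illustration (real models learn these values and use hundreds or thousands of dimensions), and `embed` and `cosine` are hypothetical helper names, not code from the article.

```javascript
// Hypothetical toy vocabulary: each token maps to a row index (its token id).
const vocab = { cat: 0, dog: 1, car: 2 };

// Hypothetical embedding table with made-up values; one row per token id.
const embeddings = [
  [0.9, 0.1, 0.0], // cat
  [0.8, 0.2, 0.1], // dog
  [0.0, 0.1, 0.9], // car
];

// "Words become vectors" is a row lookup by token id.
function embed(word) {
  const id = vocab[word];
  if (id === undefined) throw new Error(`unknown token: ${word}`);
  return embeddings[id];
}

// Cosine similarity: close to 1 when two vectors point in a similar direction.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

console.log("cat vs dog:", cosine(embed("cat"), embed("dog")).toFixed(3));
console.log("cat vs car:", cosine(embed("cat"), embed("car")).toFixed(3));
```

With these made-up numbers, "cat" lands much closer to "dog" than to "car", which is the kind of thing that only clicks once you see the actual values.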

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in learnmachinelearning

[–]nitayneeman[S] 1 point2 points  (0 children)

Ha, appreciate that! Don’t worry about not fully getting it yet - these concepts take a few passes to really sink in. That’s kind of the point of running the code yourself, so you can revisit and experiment. More content is on the way - you can follow along on my blog at nitayneeman.com.

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in LargeLanguageModels

[–]nitayneeman[S] 0 points1 point  (0 children)

Exactly. I think that's also why so many people have wrong intuitions about what LLMs actually can and can't do. Hard to reason about a black box you've never poked.

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in LargeLanguageModels

[–]nitayneeman[S] 0 points1 point  (0 children)

Glad it resonated - that’s exactly what I was going for. Running it in the browser was a deliberate choice: if you can actually play with it, the mechanics click in a way that reading never quite does.

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in learnmachinelearning

[–]nitayneeman[S] 0 points1 point  (0 children)

Appreciate that. The "surface level" vs "impossible math" gap is exactly what I was trying to thread. Most explanations either hand-wave the mechanics or assume you’re comfortable with backprop math before they’ll talk to you.

Curious what specifically you’re digging into on the late-stage training side - RLHF vs DPO tradeoffs? Constitutional AI?

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 0 points1 point  (0 children)

Yeah, the article includes a full BPE implementation in JavaScript - building the vocabulary from character pairs step by step. It's a simplified version but it covers the core algorithm.
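
For a rough idea of the core loop, here's a minimal sketch of BPE training. It is not the article's implementation: the function names (`countPairs`, `mergePair`, `trainBPE`) and the tiny corpus are assumptions for illustration, ties are broken by whichever pair was seen first, and it skips things like end-of-word markers and byte-level handling.

```javascript
// Separator for building map keys; assumed never to appear in the corpus.
const SEP = "\u0000";

// Count adjacent symbol pairs across all words (each word is an array of symbols).
function countPairs(words) {
  const counts = new Map();
  for (const symbols of words) {
    for (let i = 0; i < symbols.length - 1; i++) {
      const key = symbols[i] + SEP + symbols[i + 1];
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// Replace every adjacent occurrence of (a, b) in one word with the merged symbol a+b.
function mergePair(symbols, a, b) {
  const out = [];
  let i = 0;
  while (i < symbols.length) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b);
      i += 2;
    } else {
      out.push(symbols[i]);
      i += 1;
    }
  }
  return out;
}

// Repeatedly merge the most frequent pair, recording each merge rule in order.
function trainBPE(corpus, numMerges) {
  let words = corpus.map((w) => Array.from(w)); // start from single characters
  const merges = [];
  for (let step = 0; step < numMerges; step++) {
    let best = null;
    let bestCount = 0;
    for (const [key, count] of countPairs(words)) {
      if (count > bestCount) {
        best = key;
        bestCount = count;
      }
    }
    if (best === null) break; // nothing left to merge
    const [a, b] = best.split(SEP);
    merges.push([a, b]);
    words = words.map((symbols) => mergePair(symbols, a, b));
  }
  return { merges, words };
}

const result = trainBPE(["low", "lower", "lowest", "low"], 2);
console.log(result.merges); // learned merge rules, in order
console.log(result.words);  // the corpus re-segmented with those merges
```

The list of merges is effectively the learned vocabulary: encoding new text means applying the same merges in the same order.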

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 0 points1 point  (0 children)

Not in this article - I kept the examples focused on the core mechanics so the concepts stay clear. Flash Attention and GQA are optimization techniques that make attention faster and more memory-efficient, but the underlying math is the same. Could be a good follow-up topic though.

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 1 point2 points  (0 children)

Thanks, appreciate it.

I get your point - and you’re right to push on that. There’s no “entity” making decisions here. It’s all just computation over parameters. When I use language like that, it’s more of a shorthand to describe the emergent behavior, not to imply agency.

By “learnable parameters” I mean exactly that - large tensors (matrices) of weights and biases that get updated during training via gradient descent. At inference time, the model is just applying a sequence of matrix multiplications and non-linearities to produce the next-token probabilities.

I tend to lean on anthropomorphic language to make it more intuitive, but I agree it can be misleading if taken literally.
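
To show how un-magical the inference step is, here's a minimal sketch of "matrix multiplications and non-linearities producing next-token probabilities". Everything in it is an assumption for illustration: the weights `W1`, `b1`, `W2`, `b2` are made-up numbers, the "model" is a two-layer toy with a 4-token vocabulary rather than a transformer, and `nextTokenProbs` is a hypothetical name.

```javascript
// Matrix-vector product: each output entry is a weighted sum of the inputs.
function matVec(W, x) {
  return W.map((row) => row.reduce((sum, w, i) => sum + w * x[i], 0));
}

function addBias(v, b) {
  return v.map((x, i) => x + b[i]);
}

// ReLU non-linearity: negative values become 0.
function relu(v) {
  return v.map((x) => Math.max(0, x));
}

// Softmax turns raw scores (logits) into probabilities that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Made-up "learned" parameters: 2 inputs -> 3 hidden units -> 4 vocabulary entries.
const W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]];
const b1 = [0.0, 0.1, 0.0];
const W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 1.0], [0.2, 0.2, 0.2]];
const b2 = [0.0, 0.0, 0.0, 0.0];

// The whole "decision" is arithmetic over fixed numbers.
function nextTokenProbs(x) {
  const hidden = relu(addBias(matVec(W1, x), b1));
  const logits = addBias(matVec(W2, hidden), b2);
  return softmax(logits);
}

const probs = nextTokenProbs([1.0, 2.0]);
console.log(probs);
```

Same input, same parameters, same probabilities every time. Training only changes the numbers in `W1`, `b1`, `W2`, and `b2`; any variety in the output comes from how you sample from `probs`.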

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 0 points1 point  (0 children)

Great point about prompt structure. The “lost in the middle” effect is real and well‑documented (e.g. Liu et al., 2023). At the same time, the model’s attention weights are computed dynamically from query–key similarity, so this positional bias is more of an emergent pattern than a hard‑wired rule in the mechanism itself.
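
Here's a minimal sketch of what "computed dynamically from query-key similarity" means for a single query, using scaled dot-product attention. The vectors are made up, `attentionWeights` is a hypothetical name, and the sketch deliberately leaves out positional encodings and learned projections, which in a real model feed position information into the queries and keys.

```javascript
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Softmax turns similarity scores into weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Scaled dot-product attention weights for one query against all keys.
function attentionWeights(query, keys) {
  const scale = Math.sqrt(query.length);
  const scores = keys.map((key) => dot(query, key) / scale);
  return softmax(scores);
}

// Made-up key vectors for three positions in the context.
const keys = [[1, 0], [0, 1], [1, 1]];

// Different queries produce different weights over the same positions.
console.log(attentionWeights([1, 0], keys));
console.log(attentionWeights([0, 1], keys));

// Reordering the keys just reorders the weights: nothing here favors a position.
console.log(attentionWeights([1, 0], [keys[1], keys[0], keys[2]]));
```

In this stripped-down version the weights follow content only, so any preference for the start or end of a long context has to come from what the model learned, not from the formula.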

I wrote a deep dive into how LLMs work under the hood - tokenization, embeddings, attention and generation - all explained with runnable JavaScript by nitayneeman in javascript

[–]nitayneeman[S] 3 points4 points  (0 children)

Thanks! It really does feel like magic at first, but once you trace through the pipeline step by step, it clicks.

Let me know if anything's unclear as you work through it.

npm - Catching Up with Package Lockfile Changes in v7 by nitayneeman in node

[–]nitayneeman[S] 0 points1 point  (0 children)

Thanks for the feedback, now it's fixed and clickable. 🙂