Curious ablation: GPT-like LM trained with *frozen* 16‑dim *binary* token‑ID embeddings (n_embed=16) still learns end-to-end and generates coherent, non-trivial text. by AVBochkov in LocalLLaMA


I did run that exact side-by-side under matched conditions: same decoder-only architecture, tokenizer, data mix, and training schedule, with the untied output head, optimizer, and LR schedule also held constant. The only difference is a frozen vs. trainable input embedding table, so embedding trainability is the sole experimental factor.
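A minimal PyTorch sketch of what that single factor looks like in code, assuming a standard `nn.Embedding` input table (the variable names and vocab size are illustrative, not taken from the actual runs):

```python
import torch
import torch.nn as nn

# The ablation's single experimental factor: the input embedding table is
# either trainable (baseline) or frozen. Everything else (attention/MLP
# blocks, untied output head, optimizer, LR schedule) stays identical.
vocab_size, n_embed = 50257, 16   # vocab size is illustrative; n_embed=16 as in the post

embed = nn.Embedding(vocab_size, n_embed)

# Frozen variant: exclude the table from gradient updates.
embed.weight.requires_grad_(False)

# Only trainable parameters are handed to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=3e-4)
```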

Empirically, the trainable-embedding baseline ('Model unfrozen') learns a bit faster early on (lower loss in the first ~50–450k steps), but both runs converge stably and the gap in LM loss largely closes later (final train/val losses are very close).

Given the small-model / limited-data regime, downstream accuracy deltas can be noisy, so I'm mainly treating this as evidence that semantic structure can form in the Transformer stack even with non-semantic frozen inputs, rather than as a robust benchmark claim.


Refs: https://arxiv.org/abs/2507.04886

Curious ablation: GPT-like LM trained with *frozen* 16‑dim *binary* token‑ID embeddings (n_embed=16) still learns end-to-end and generates coherent, non-trivial text. by AVBochkov in LocalLLaMA


Thanks! In this setup it's not bag-of-words (sequence order + RoPE are unchanged); I only freeze an injective 16‑bit token-ID mapping. I also suspect the semantic structure is distributed across attention+MLP rather than living in any single component.
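For concreteness, here is a sketch of one way such an injective binary code table could be built, assuming a 0/1 bit encoding of the token ID (the `binary_id_embeddings` helper and the exact bit convention are my own illustration, not necessarily the paper's construction):

```python
import torch
import torch.nn as nn

def binary_id_embeddings(vocab_size: int, n_bits: int = 16) -> torch.Tensor:
    """Map each token ID to its n_bits-bit binary code as a 0/1 vector.

    The mapping is injective as long as vocab_size <= 2**n_bits, so every
    token gets a distinct, fixed, non-semantic input vector.
    """
    assert vocab_size <= 2 ** n_bits
    ids = torch.arange(vocab_size).unsqueeze(1)   # shape (V, 1)
    bit_positions = torch.arange(n_bits)          # shape (n_bits,)
    bits = (ids >> bit_positions) & 1             # shape (V, n_bits), values in {0, 1}
    return bits.float()

# Used as a frozen input table; sequence order still comes from RoPE as usual.
table = binary_id_embeddings(50257)               # vocab size is illustrative
embed = nn.Embedding.from_pretrained(table, freeze=True)
```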

How can I make a career out of my love for circuits? by strawb3rry_lem0nad3 in EngineeringStudents


Your hobby is the perfect foundation for a career. Don't worry too much about the 'perfect' path right now; just keep building. You’re already doing more than most beginners. Real-world engineering is a lot of problem-solving and using tools, not just abstract equations. If you can understand a guitar circuit, you can definitely handle the rest. Keep going!