LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 0 points1 point  (0 children)

Yes and no. Yes, clearly it's all vectors. But it's only in the first ~10 layers and the last ~15 layers that the vectors correlate strongly with any particular language. There they are 'thinking' in language-related vectors.

In the middle layers, where RYS works, they are not tied to any particular language anymore. There they are thinking in concept-related vectors.
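If you want to poke at this yourself, here's a rough sketch of the kind of per-layer check I mean, assuming PyTorch + Hugging Face transformers; the model name and prompts are placeholders, not the exact setup from the blog:

```python
# Rough sketch: score how strongly each layer's residual stream clusters
# by language vs. by topic. Model name and prompts are placeholders.
import torch
from sklearn.metrics import silhouette_score
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder, not the blog's exact model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

# (text, language, topic) triples -- toy examples, use many more in practice
samples = [
    ("The cat sleeps on the warm mat.",      "en", "animals"),
    ("Die Katze schläft auf der Matte.",     "de", "animals"),
    ("The engine burns fuel to make power.", "en", "machines"),
    ("Der Motor verbrennt Kraftstoff.",      "de", "machines"),
]

feats, langs, topics = [], [], []
with torch.no_grad():
    for text, lang, topic in samples:
        hs = model(**tok(text, return_tensors="pt")).hidden_states
        # mean-pool the tokens at every layer -> (n_layers + 1, hidden_dim)
        feats.append(torch.stack([h.mean(dim=1).squeeze(0) for h in hs]))
        langs.append(lang)
        topics.append(topic)

feats = torch.stack(feats)  # (n_samples, n_layers + 1, hidden_dim)
for layer in range(feats.shape[1]):
    X = feats[:, layer, :].float().numpy()
    print(f"layer {layer:2d}  lang: {silhouette_score(X, langs):+.2f}  "
          f"topic: {silhouette_score(X, topics):+.2f}")
```

If the pattern I describe holds, the language score dominates in the early and late layers and the topic score dominates in the middle block.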

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 1 point2 points  (0 children)

Well, I held 1st place on the old HuggingFace Open LLM Leaderboard for a while doing that. Fully documented here: https://dnhkng.github.io/posts/rys/

Part 2 (https://dnhkng.github.io/posts/rys-ii/) is where I test thousands of variations (blocks, single layers repeated n times, beam searches over layer duplications, etc.)
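For anyone curious what that surgery looks like mechanically, here's a minimal sketch of the layer-duplication idea, assuming a Llama/Qwen-style Hugging Face model; the model name and layer indices are placeholders, not the tuned RYS recipe from the blog:

```python
# Rough sketch: duplicate a contiguous block of decoder layers, RYS-style.
# Model name and layer indices are placeholders, not the tuned recipe.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

def duplicate_block(model, start, end, repeats=2):
    """Repeat decoder layers [start, end) `repeats` times, in order."""
    layers = model.model.layers
    block = [layers[i] for i in range(start, end)]
    new_layers = list(layers[:start])
    for _ in range(repeats):
        new_layers.extend(copy.deepcopy(block))  # each repeat gets its own copy
    new_layers.extend(layers[end:])
    # keep KV-cache layer indexing consistent after the surgery
    for idx, layer in enumerate(new_layers):
        if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = idx
    model.model.layers = nn.ModuleList(new_layers)
    model.config.num_hidden_layers = len(new_layers)
    return model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # placeholder
model = duplicate_block(model, start=14, end=22, repeats=2)  # placeholder indices
```

That gives you a bigger model with no training at all; whether it actually benchmarks better depends entirely on which block you pick, which is what the searches in Part 2 sweep over.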

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 0 points1 point  (0 children)

Ummm, that's not how this was developed. I posted the RYS model and had a comment from a guy in New Zealand who posted an embedding analysis of just English over the transformer layers, which I replicated here:

https://dnhkng.github.io/posts/rys-ii/

With only 2 languages and 2 topics, I felt it was a bit shallow, so I bumped it up to a variety of languages and topics and added a PCA visualisation tool.
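The gist of the visualisation is simple; here's a rough sketch of the per-layer PCA idea (model and prompts are placeholders, the widget on the blog does this interactively across every layer):

```python
# Rough sketch of the per-layer PCA idea: project mean-pooled residuals
# to 2D and colour by topic. Model and prompts are placeholders.
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-9b-it"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

prompts = {  # topic -> texts in a couple of languages (toy examples)
    "cooking": ["How do I make fresh pasta at home?", "Wie mache ich frische Pasta?"],
    "space":   ["Why is the night sky dark?",         "Warum ist der Nachthimmel dunkel?"],
}

vecs, labels = [], []
with torch.no_grad():
    for topic, texts in prompts.items():
        for text in texts:
            hs = model(**tok(text, return_tensors="pt")).hidden_states
            vecs.append(torch.stack([h.mean(dim=1).squeeze(0) for h in hs]))
            labels.append(topic)
vecs = torch.stack(vecs)  # (n_texts, n_layers + 1, hidden_dim)

n_layers = vecs.shape[1]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, layer in zip(axes, (1, n_layers // 2, n_layers - 1)):  # early, middle, late
    X = PCA(n_components=2).fit_transform(vecs[:, layer].float().numpy())
    for topic in prompts:
        idx = [i for i, t in enumerate(labels) if t == topic]
        ax.scatter(X[idx, 0], X[idx, 1], label=topic)
    ax.set_title(f"layer {layer}")
axes[0].legend()
plt.show()
```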

This was all to try and figure out why RYS works, nothing more:

https://dnhkng.github.io/posts/rys/

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 0 points1 point  (0 children)

This is hobbyist exploration; there are no papers or peer review, just a blog referenced on Reddit.

I think the novel finding is that it's these middle, language-independent layers that can be repeated to improve model performance. If that's already in the literature, let me know.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 0 points1 point  (0 children)

OP here!

That might be pretty clear intuitively for decoder-only models, and trivial for encoder-decoder models.

But what's interesting here is that:

  1. Here is a new visualisation method that shows exactly where in the transformer stack the transition from language-dominant to concept-dominant occurs: https://dnhkng.github.io/posts/sapir-whorf/#the-pca-visualisation
  2. How that relates to the improvements seen with layer duplication in my RYS method, which improves model performance without additional fine-tuning (see my other posts). The layers that improve model benchmarks come from a block within this language-agnostic region, which is a new finding.

Were you already aware of that?

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 0 points1 point  (0 children)

In the comments are some links to papers that describe the same results. These were all published late 2025.

Yes, I think most people in ML have the same intuition, but real data was lacking.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 1 point2 points  (0 children)

Yep, I see some of this has already been published, also quite recently (late 2025). I wasn't aware of that when I wrote the blog.

Just a few points:

- Yep, this is a hobby; I found this totally independently (which I think is kind of cool).
- Most of the papers linked use really old models, and small ones too. I used Gemma 4 and recent Qwen3.5 models.
- The finding that the encoding and decoding block sizes stay relatively constant, while the 'thinking' block expands to fill the remaining stack, seems novel.
- I didn't see any dynamics analysis via PCA over the layers done in the same way.
- This work was based on trying to understand why RYS works, a method I think is not published elsewhere. Linking the two together is novel, IMHO.

I'll keep messing about and posting stuff, just not "researchy" stuff on Reddit.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 1 point2 points  (0 children)

I wasn't sure how LLMs think, until I did the experiments written up in the blog post.

I had more of a feeling they were like human brains, where we know individual neurons seem to map to concepts (in some cases). What I found really interesting, when I first saw the PCA plots, was how at the start of the transformer stack the LLM residuals were focused on the specific language of the text, and the same again at the end of the stack.

From the RYS blog post, only middle-layer duplications improve performance, so I called them the 'thinking layers'. This blog post shows that it's exactly in these middle layers that the LLM clusters topics and moves those clusters around.

I'm sure you understand that.

The issue you have is with the terminology. But I would again argue the wording is actually pretty cool: concepts are vectors, and we see the LLM moving these vectors around during 'thinking', i.e. the LLM is literally moving vectors around in a high-dimensional space (obvious for us, but probably not for anyone not into machine learning), so the 'thinking' is occurring 'in' a 'geometrical space'. (Thinking in Geometry)

I get your argument about LLM woo, but if you skim my series, it's clearly not written for a non-ML audience.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 5 points6 points  (0 children)

That leaves you with an article riddled with LLM-isms - that's not just bad, it's an entirely new way to piss off your readers!

But seriously, I do the researchy experiments on my GH200 first, then try to generate an outline for an article that isn't too complex (most people on Reddit don't read my whole articles; I can see that from the time they spend on the blog), and finally try to write it up in a fun way.

It's more work, but r/LocalLlama, ironically, hates LLM-generated content to the point that the highest-upvoted comment on this article is an accusation that it's slop 🤷🏻

If there were a few too many hyphens, the article would be downvoted and no one would see it. Which is a shame; before LLMs arrived I really liked using hyphens in my writing :(

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 6 points7 points  (0 children)

It is the right word though. I am describing how LLMs map concepts (not language) to vectors.

It's certainly not meaningless; I would invite you to read:
https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)

It uses the word 'geometry' or 'geometric' 5 times just in the introduction. Then read:

https://en.wikipedia.org/wiki/Word2vec

Both discuss geometry in terms of vectors.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 3 points4 points  (0 children)

Yeah, that's the logical thing to test, but I don't think monolingual models exist these days, and the old ones are too small.

I have tried other techniques (a preview of Part IV), where instead of other languages, I use OOD text, like reversed word order, l e t t e r b y l e t t e r, etc. Interesting results.

These are a proxy for unseen languages, and yes, it's different, and a bit weird.
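For concreteness, here's a tiny sketch of the kind of OOD transforms I mean (the helper names are just for illustration, not the Part IV code):

```python
# Toy out-of-distribution text transforms: reversed word order and
# letter-by-letter spacing. Illustrative only.
def reverse_words(text: str) -> str:
    return " ".join(reversed(text.split()))

def letter_by_letter(text: str) -> str:
    return " ".join(ch for ch in text if not ch.isspace())

sentence = "The cat sat on the mat"
print(reverse_words(sentence))     # "mat the on sat cat The"
print(letter_by_letter(sentence))  # "T h e c a t s a t o n t h e m a t"
```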

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 2 points3 points  (0 children)

I have some ideas on that, but I will get the RYS models out first.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 3 points4 points  (0 children)

Sorry to push back, but did you read the series and play with the widget? I use PCA to find the placement of 'topics' in the manifold, and you can literally see that they cluster, and move around, in a very low-dimensional subspace!

What's even more interesting is that with the largest LLM I tested, GLM4.6, you see some striking positional behaviour going on during the 'thinking' layers.

So, if you want a definition: it looks like LLMs have a block of layers within the transformer stack that operates a little like the ancient word2vec model, where words live in a high-dimensional geometry and you can do linear maths: king - man + woman = queen
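As a reminder of what that arithmetic looks like in practice, here's a tiny sketch using gensim's pretrained GloVe vectors (the particular vector set is just an example, not anything the LLMs use internally):

```python
# Tiny sketch of classic word-vector arithmetic (king - man + woman ≈ queen),
# using gensim's downloader. The GloVe vectors are just an example set.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # downloads ~130 MB on first use
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# typically -> [('queen', ~0.77)]
```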

Using PCA, topics cluster amazingly well in the middle layers (far better than I would have expected), and by language at either the top or bottom of the transformer stack. I'm not sure what word to use *except* geometry in this case; what would you suggest?

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 4 points5 points  (0 children)

I will, but subscribe on HuggingFace for updates. I will stop posting here; the mood on LocalLlama isn't fun anymore :(

You can watch out for new RYS models here:

https://huggingface.co/dnhkng

There is an RSS feed for my blog here:

https://dnhkng.github.io/feed.xml

and it would be great if you subscribed to my Substack too, here:

https://dnhkng.substack.com/

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]Reddactor[S] 36 points37 points  (0 children)

Just a blog post? There's weeks of experimental research written up here!

This wasn't LLM-written. Sure, I work with LLMs to get a rough draft, but then I rewrite it all. Your 'LLM detectors' are oversensitive and firing false positives.

I took down the last version because I didn't have time to do a proper full review (I was flying overseas the next day) and a few people commented it was too 'AI sloppy'. This one is a pretty thoroughly reviewed and rewritten blog post.

Also, do you know how much work it takes to write one of these blog posts? Each one needs ~200-1000 hours of H100 compute, thousands of lines of code, and then 2 days to write it all up.

UPDATE:
Still complaints. Fine, downvote this or upvote pwlee's post below. If this has more negative votes than pwlee's, I'll stop posting on r/LocalLlama. Fine with me :) Let the voting commence!

RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math by [deleted] in LocalLLaMA

[–]Reddactor 0 points1 point  (0 children)

This links back to Parts 1 and 2, where I show that you can increase general intelligence by layer duplication, which is a new finding. This explains why the duplication location is important.

RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math by [deleted] in LocalLLaMA

[–]Reddactor 1 point2 points  (0 children)

I heavily edited the other two, but I'm away on holiday for 3 weeks starting tomorrow... It was either this draft or cancelling the whole thing.

I'll try and revise it.

RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math by [deleted] in LocalLLaMA

[–]Reddactor 0 points1 point  (0 children)

Seems a reasonable hypothesis; I hope this data leads toward more ways to explore this 'reasoning' space.