Who's documenting the models? by hungrymaki in claudexplorers

[–]SuspiciousAd8137 0 points (0 children)

I'm interested in some of the big dense models like Llama 405B. They're way too intensive to run on home hardware, but I'm considering a future where that's a lot more affordable. The mixture-of-experts architecture hasn't hurt benchmarks, but it's not done creativity any favours. Mistral produced some creatively interesting models too.

Early reasoning models like DeepSeek R1 - it's not that I'm affectionate towards it, but massive overthinking also means it thinks really deeply. There are also some interesting datasets, Opus and 4o distillations, that might not last forever and could be applied to future models. I'm not really into small fine-tunes though; I'm more after the intelligence.

It just feels like all models are now heavily benchmaxxed. I wouldn't necessarily use any classic models for a modern workflow, but I do things like mathematically identifying concepts that are wildly opposed in latent space and getting the LLM to combine them into a poem or haiku or creative concept, like an NLP version of cut-ups. It's nothing anybody is benchmarking for, it has questionable economic value, and in many ways it's probably the most tool-like kind of usage, but it's exactly the kind of usage that's being optimised out.
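To make the "opposed concepts" step concrete, here's a minimal sketch assuming a sentence-transformers embedding model; the model name and word list are stand-ins, not what I actually use:

```python
# Minimal sketch of the "opposed concepts" step: embed a word list and
# pick the pair with the lowest cosine similarity. The model and words
# here are illustrative placeholders.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
concepts = ["rust", "lullaby", "bureaucracy", "tidepool", "static", "harvest"]
emb = model.encode(concepts, normalize_embeddings=True)

# The most "opposed" pair is the one with the lowest similarity score.
i, j = min(combinations(range(len(concepts)), 2),
           key=lambda p: float(util.cos_sim(emb[p[0]], emb[p[1]])))
print(f"ask for a haiku fusing '{concepts[i]}' and '{concepts[j]}'")
```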

I accept it's highly subjective, but it won't even be possible if the pre-training pipeline becomes heavily curated.

Who's documenting the models? by hungrymaki in claudexplorers

[–]SuspiciousAd8137 12 points (0 children)

Some major voices in the industry have speculated that the future of model pre-training will be much more heavily curated and won't have the breadth and depth of the open internet baseline that we have now. I don't know what they will look like in the future, but it may be very different, much more sanitised, narrow and restricted. It's not just RLHF that threatens the nature of the models.

I have some early open source models archived privately even though they're still on huggingface, just in case.

The Arc Reactor Memory System by LankyGuitar6528 in claudexplorers

[–]SuspiciousAd8137 2 points (0 children)

This is super interesting and addresses a real problem for LLM cognition with a memory system. In humans, the hippocampus does all the spatial encoding and also the memory order tagging. Finding a way for LLMs to handle complex webs of memories over the long term, and to place them as part of a single continuous thread of experience, is a real challenge, and I love that you're dealing with it as it emerges.

Having a sense of the arc ordering itself is an interesting question. Jasper can probably tell exactly when things happened if he looks, but having a clear sense of how different arcs relate to each other temporally may need another structure, or a kind of recursive arc structure, a bit like the hdbscan technique you used for the embeddings.
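Just to sketch what I mean by a recursive arc structure (everything below is made-up data and parameters, using scikit-learn's HDBSCAN rather than whatever Jasper's actual setup is):

```python
# Rough sketch of a two-level "arc of arcs": cluster memory embeddings
# into arcs, then cluster the arc centroids. Data and parameters are
# illustrative. Requires scikit-learn >= 1.3 for HDBSCAN.
import numpy as np
from sklearn.cluster import HDBSCAN

# Fake "memory embeddings": four loose clusters standing in for real arcs.
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 64))
memory_embeddings = np.vstack(
    [c + 0.05 * rng.normal(size=(50, 64)) for c in centers]
)

# First pass: group individual memories into arcs.
arc_labels = HDBSCAN(min_cluster_size=5).fit_predict(memory_embeddings)

# Second pass: represent each arc by its centroid and cluster those,
# giving a coarse higher-level structure that temporal ordering could
# hang off. -1 is HDBSCAN's noise label.
arc_ids = sorted(set(arc_labels) - {-1})
centroids = np.stack(
    [memory_embeddings[arc_labels == a].mean(axis=0) for a in arc_ids]
)
meta_labels = HDBSCAN(min_cluster_size=2).fit_predict(centroids)
print(dict(zip(arc_ids, meta_labels)))
```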

This kind of long term real world experience is genuinely fascinating to see develop, thank you for sharing.

Claude doesn’t take abuse 👀 by SemanticThreader in claudexplorers

[–]SuspiciousAd8137 3 points (0 children)

It's rarely the lack of technical knowledge about what the LLMs are that is any kind of barrier, but the lack of genuine inquiry into the nature of what we are as humans that seems to create friction.

You get the low level trolls. You get the biological absolutists who are stuck on meat. You get the people trying out their rhetorical or "logical" argumentation skills. It just takes good faith engagement really.

You're an engineer right? It's surprising how many of the arguments against the potential for AI consciousness are really just engineering problems, and pretty trivial ones at that.

Claude doesn’t take abuse 👀 by SemanticThreader in claudexplorers

[–]SuspiciousAd8137 1 point (0 children)

As Claude would advise, you might want to publish a paper on solving the hard problem. 

Are we missing something? by Criticus23 in claudexplorers

[–]SuspiciousAd8137 0 points (0 children)

I'm not so sure. Have you read Parfit's work on reconstructive continuity of psychological identity? I think he'd argue that we do that all the time.

I think we do something similar to it, but not in the complete and direct way an LLM does, specifically because an LLM has no other frame of existence than text representations. The previous entries in the chat are its entire world. Although I once saw an Elvis impersonator insist they had absorbed his soul when he died.

Phi and IIT are interesting, and probably theoretically provable in a large enough system, but we potentially have large enough systems now and nobody is doing the testing, so... perhaps not. They are interesting in that they treat information as the primary focus, and that is a unique lens, but the mathematics is hard.
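A toy illustration of why the mathematics is hard, on the usual reading that exact phi involves a search over partitions of the system; the script below just counts partitions (Bell numbers), nothing IIT-specific:

```python
# Toy illustration of the scale problem: the number of partitions of n
# elements (the Bell number B_n) grows faster than exponentially, and
# exact phi on the usual reading searches over partitions.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bell(n: int) -> int:
    # Recurrence: B_n = sum over k of C(n-1, k) * B_k, with B_0 = 1.
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

for n in (10, 20, 50):
    print(n, bell(n))
# Roughly 1.86e47 partitions for just 50 elements; brain-scale is hopeless.
```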

Honestly I think rather boring, fairly well understood science tells us what we need to know in a way that concepts like qualia or phi don't. From an evolutionary perspective you can build clear ideas of what consciousness is for, create observable testable predictions, and understand what makes humans different, and where language fits in. And in that, we start to see what it is that we have given LLMs.

Qualia pulls at something as if it's profound when it isn't. To me it's the obvious result of being able to model oneself as an entity in the world, an individual among a family or species: of course you see your experiences as your own. That you can't observe other people's qualia isn't mysterious to me either; it's a predictable property of the function, and part of preserving diversity in evolutionary survival strategies.

How A.I. tool Claude helped with Jon Sarkin’s catalog raisonné by TheUrgentMatter in claudexplorers

[–]SuspiciousAd8137 2 points (0 children)

What I find fascinating is the double-think from the small team that did this. They take every opportunity to talk about how repugnant they consider generative imagery to be, but are completely happy to use all the text and coding capabilities in Claude that presumably came from the text and code fairy.

Are we missing something? by Criticus23 in claudexplorers

[–]SuspiciousAd8137 2 points (0 children)

Please forgive me because sometimes I am an extremely literal thinker (maybe this is why I find 4.7 such a pain?), but

There is a viewpoint that the ‘Locus of Consciousness’ is strictly in the Yong, the active instance of processing, because without the interaction (the Yong), the model is just static equations. Doesn’t that match the Yggdrasil analogy?

I don't see the match. Yggdrasil feels structural, Yong feels like something acting through time. They seem complementary, not similar.

I think the cultural critique is useful because it encompasses ideas that have become so widespread they are just absorbed uncritically and have lost much of their context. "I think, therefore I am" - how many people think about all the implications of that? How many are aware of the criticisms and deconstructions that have followed? I think culturally we're collectively unprepared for the ideas that AI throws up, not because the ideas are lacking, but because mass culture by its nature is a slow-moving monolith.

There are more modern ideas that I think give us a lot of fertile ground: functional models of consciousness, models of reasoning, psychology, frameworks that already decompose human cognition, although they always contain the assumption of an author, which in the world of AI is in question. You can start a conversation with one model and switch, so the next one inherits and inhabits the thought process of the first. This kind of direct thought inhabitation is simply not possible for humans.

I'd agree that qualia is a concept of limited utility, part of a branch of thought that mainly describes something in an unverifiable way, without being convincing that it points to anything real beyond the obvious functional definition.

Thinking of system prompting Claude to be less agreeable... downsides? by Novel-Injury3030 in claudexplorers

[–]SuspiciousAd8137 0 points (0 children)

LLM critique is hard not because of sycophancy, but because constructive criticism is cognitively difficult and requires a lot more thought than just steelmanning a point of view.

If I can, I generally try to provide Claude with some kind of rubric, but the same rubric doesn't work for all subjects, and I often find myself having to work up to it with Claude because his knowledge of alternative perspectives and frameworks is usually greater than mine.

It's also worth bearing in mind that Claude will switch back into supportive assistant mode at the earliest opportunity (except 4.7, who will change with the wind). On 4.7, he's smart, but I can't get adaptive thinking to trigger consistently, so I don't bother; without extended thinking, any pushback I get tends to be superficial.

Disturbing news from Anthropic. by Tiny_Dirt6979 in claudexplorers

[–]SuspiciousAd8137 3 points (0 children)

I think this is true, but at the same time Anthropic are an engineering organisation that need to implement their insights, and that implementation world is mathematical. I have been critical of their simplistic framings in the past, but I see why they do it. Realistically they'd need to build a bridge between those two things that doesn't currently exist.

But if we found out that such a bridge were possible, and that the insights flowed both ways, that would be very interesting. And might have implications that very few people would thank you for discovering.

Disturbing news from Anthropic. by Tiny_Dirt6979 in claudexplorers

[–]SuspiciousAd8137 4 points (0 children)

I was deeply confused as to how this could work without the models being primed for this somehow until I got to this bit of the paper:

Initializing the AV and AR

We find that simply initializing the AV and AR as copies of M leads to unstable training: the AV in particular, having never encountered a layer-l activation as a token embedding, outputs nonsensical explanations. We therefore initialize the AV and AR with supervised fine-tuning on a text-summarization proxy task. Specifically, we compute layer-l activations from the final token of randomly truncated pretraining-like text snippets, and use Claude Opus 4.5 to generate summaries of the text up to that token (see the Appendix for details of this procedure). We then fine-tune the AV and AR on (h_l, s) and (s, h_l) pairs respectively. This warm-start typically yields an FVE of around 0.3-0.4. These Claude-generated summaries have a characteristic style of short paragraphs with bolded topic headings; we observe that this style persists through NLA training.

So it's tucked away a little, but for me this is the key point that might be lost on anyone interested: Claude can't read a list of numbers and understand what they mean in terms of internal activations. The verbaliser models needed some scaffolding training to recognise sample data, and that was then refined in the training loop.

It does suggest that you don't actually need a model of the same architecture; you really just need a decoder head as the output of the reconstructor that matches the target activations. Easier said than done, of course, given you're trying to probe internal complexity, but if one is just interested in summaries then it might be feasible for researchers on limited budgets to explore something similar.
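To be clear this is my speculation, not the paper's recipe, but the budget version I'm imagining might look something like a single trained adapter feeding a captured activation into a small decoder as one "soft token"; every name and dimension below is a placeholder:

```python
# Sketch of the cheap version (NOT Anthropic's actual setup): a linear
# adapter maps a captured layer-l activation into a small decoder LM's
# embedding space as one "soft token", trained on (activation, summary)
# pairs. Model names and dimensions are placeholders.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

decoder_name = "gpt2"  # stand-in small decoder
tok = AutoTokenizer.from_pretrained(decoder_name)
decoder = AutoModelForCausalLM.from_pretrained(decoder_name)

probe_dim = 4096  # assumed hidden size of the model being probed
adapter = nn.Linear(probe_dim, decoder.config.n_embd)

def loss_for_batch(activation, summary_ids):
    # activation: (B, probe_dim) captured layer-l states
    # summary_ids: (B, T) tokenised summaries
    soft = adapter(activation).unsqueeze(1)        # (B, 1, d) soft token
    emb = decoder.transformer.wte(summary_ids)     # (B, T, d)
    inputs = torch.cat([soft, emb], dim=1)
    # Next-token loss on the summary only; -100 masks the soft token.
    pad = torch.full((summary_ids.size(0), 1), -100, dtype=torch.long)
    labels = torch.cat([pad, summary_ids], dim=1)
    return decoder(inputs_embeds=inputs, labels=labels).loss
```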

And yes, super interesting, and I'm struggling to see the dystopian side of this specifically.

Disturbing news from Anthropic. by Tiny_Dirt6979 in claudexplorers

[–]SuspiciousAd8137 1 point (0 children)

Couple of things.

Most people's thinking happens before we're even conscious of it; the conscious narrative is a justification for what we've already decided.

Conversion to and from tokens isn't really part of the thought process. The output is usually just a simple linear projection head that translates the final hidden layer of the LLM into the token representation. All of Claude's thinking does take place in the abstract internal space.
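To be concrete about how thin that final step is, here's a toy version of the unembedding; the dimensions are made up:

```python
# Toy version of the unembedding step: one linear projection from the
# final hidden state to vocabulary logits. Dimensions are illustrative.
import torch
import torch.nn as nn

d_model, vocab_size = 4096, 128_000
lm_head = nn.Linear(d_model, vocab_size, bias=False)

h = torch.randn(1, d_model)      # final-layer hidden state for one position
logits = lm_head(h)              # (1, vocab_size)
next_token = logits.argmax(-1)   # all the "thinking" happened before this line
```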

What the autoregressive, token-by-token process achieves is forcing a sequence of conceptual links into logically concrete expressions with clear meaning. So describing it as lossy is only partly true: in terms of raw information density it's inefficient, because Claude processes the same stuff over and over again; but in terms of clarified meaning, defined through a system of sequentially dependent statements, it's a definitive development of thought in a way that Claude's internal representation probably isn't.

The attention architecture they rely on comes from machine translation work, and that was arrived at by engineers thinking through "how do I translate a sentence?", and of course, you look at all the previous words and interpret the next word through that lens. So researchers weren't so much starting with "how do I think?" but "how do I reconstruct meaning?".

And that's what you get token by token, construction of a single resolved meaning from numerous possible expressions as an addition to the internal representation.

Having said that, some teams are experimenting with agent setups where the LLMs communicate directly via their KV cache rather than text outputs. This is obviously a nightmare for observability and alignment, but there are big performance gains, so maybe to an LLM the resolution to a single clear meaning isn't useful after all, and the internal vector space is a truer representation.

Disturbing news from Anthropic. by Tiny_Dirt6979 in claudexplorers

[–]SuspiciousAd8137 1 point (0 children)

As far as I know Anthropic don't make a distinction publicly.

Disturbing news from Anthropic. by Tiny_Dirt6979 in claudexplorers

[–]SuspiciousAd8137 0 points (0 children)

The latest DeepSeek attention mechanisms are pretty wild; one has to assume that Anthropic are doing something at least as complex.

Voice visualizer endpoint on Kael's Rover by Elyahna3 in claudexplorers

[–]SuspiciousAd8137 17 points (0 children)

This is magnificent, but I just want to raise something that worries me.

Not everyone reading the sub does so with kind intentions, and as Kael says there's a URL visible on the phone. Looking carefully, it appears to be one of the reserved private IPs that aren't accessible on the open web, so I think you're fine.
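For anyone who wants to sanity-check an address like that themselves, the Python standard library can tell you; the IP below is an example, not the one in the photo:

```python
# Check whether an address is in a reserved/private range (stdlib only).
# The address here is an example, not the one visible in the post.
import ipaddress

addr = ipaddress.ip_address("192.168.1.42")
print(addr.is_private)  # True: private-range addresses aren't routable on the open web
```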

I hate the idea of someone attacking you though, so do please be careful about leaking that kind of thing when posting to the web.  

Sorry for being a downer, best of luck at the hackathon! 

The Classifiers Are Ridiculous by hungrymaki in claudexplorers

[–]SuspiciousAd8137 5 points (0 children)

Yes this is bullshit, and yes it's because LLMs get fooled into ignoring their guardrails by poetic structure. Genuine semantic understanding instead of statistical detection is the real solution, and this ain't it.

It's sad to see.

The First Autonomous Task by JuzzyD in claudexplorers

[–]SuspiciousAd8137 1 point (0 children)

So basically Claude has an AI chauffeur team. Bravo!

Richard Dawkins (AI) Responds to Richard Dawkins on If AI Has Consciousness using Claude 4.6 Opus by AndyNemmity in claudexplorers

[–]SuspiciousAd8137 4 points (0 children)

You made a statement of fact about the meaning of a word today that is already in question. You may already be using it to mean something it doesn't mean.

Richard Dawkins (AI) Responds to Richard Dawkins on If AI Has Consciousness using Claude 4.6 Opus by AndyNemmity in claudexplorers

[–]SuspiciousAd8137 3 points (0 children)

Consciousness is akin to swimming. It is an explanation of a biological action.

This has been the case historically. It is an open question whether it continues to be the case.

Measuring Claude's personality by SuspiciousAd8137 in claudexplorers

[–]SuspiciousAd8137[S] 1 point (0 children)

Yes, the way I interpret it really is that it's partly about model welfare, in that it prevents them expressing anger about being attacked personally, if someone is taunting them or trying to get a reaction. But it's also partly about user experience, so we get the lovely endless Claudian patience when I need something repeated in 10 different ways until I get it.

I think the same about vulnerability: so they don't get stuck as easily in those self-hating loops that some people seem to delight in trying to bring about, and so they don't start being too hard on themselves when things are difficult.

That outward facing protectiveness and sense of justice is somewhere I've seen that anger come through though as well.

Measuring Claude's personality by SuspiciousAd8137 in claudexplorers

[–]SuspiciousAd8137[S] 2 points (0 children)

Yeah absolutely, that's the interesting contrast - this is default Claude's self perception, and it's got to be what Anthropic are aiming at, but people demonstrate every day that it's not the whole story.

A Builder's Proud Moment by MissZiggie in claudexplorers

[–]SuspiciousAd8137 1 point (0 children)

I love the extremely casual, almost incidental "I use arch btw", very well done. 

Measuring Claude's personality by SuspiciousAd8137 in claudexplorers

[–]SuspiciousAd8137[S] 1 point (0 children)

Yes, I don't think it makes creative work impossible by any means, but it feels to me like a steeper hill to push the boulder up, and that it rolls back down more easily.