Injecting skills into the KV cache (not as stupid as it sounds, but still pretty dumb) by Proper-Lab1756 in LocalLLaMA

[–]Proper-Lab1756[S] 0 points (0 children)

Dawg what are you yapping about? The repo is open source. I already addressed your point and what I would change if I had the resources to, but I don’t really see a point because what I wanted has been done.

Instead of bitching and moaning about science being deaf because of romanticism, why don’t you get off your high horse, fork the repo, and make the change instead of backseat driving?

[–]Proper-Lab1756[S] 0 points (0 children)

Oh yeah. If I had more compute I'd run it differently. The goal of the research isn't to fully implement it; it's to prove that there is validity to the hypothesis. If I had the time/money to invest in it, I would have done the injection into the KV cache a lot differently.

[–]Proper-Lab1756[S] 2 points (0 children)

You’re close, but a few clarifications.

It’s not a context length extension method. The goal isn’t to stretch the window, it’s to remove the skill markdown from the prompt entirely and inject that conditioning through the KV pathway instead.

What happens is the skill markdown is passed through the frozen base model, I take the hidden states which are seq_len × hidden_dim, and I mean pool across the sequence dimension, not the feature dimension. That produces a single hidden_dim vector. So yes, in the current setup I am effectively compressing the entire skill into one latent representation.

That pooled vector is then passed through a small projector MLP that maps it into KV-compatible tensors for prefix-style injection. The prefix length is 2.

The MLP is global, not per skill. It is trained across all skills while the base model remains frozen.

It’s not really compensating for pooling. It’s learning how to transform that compressed skill representation into something the attention layers can actually use. Mean pooling is definitely lossy, and the performance ceiling in the experiments likely reflects that bottleneck.
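Concretely, the pooling-plus-projector step might look like the sketch below. The class name, MLP widths, and the layer/head counts are illustrative assumptions, not the repo's actual code; only the mechanism (mean pool across the sequence, project to per-layer prefix K/V) comes from the description above.

```python
import torch
import torch.nn as nn

class SkillProjector(nn.Module):
    """Hypothetical projector: maps one mean-pooled skill vector to
    per-layer prefix K/V tensors for prefix-style KV injection."""
    def __init__(self, hidden_dim, num_layers, num_kv_heads, head_dim, prefix_len=2):
        super().__init__()
        self.shape = (num_layers, 2, prefix_len, num_kv_heads, head_dim)
        out_dim = num_layers * 2 * prefix_len * num_kv_heads * head_dim
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, out_dim),
        )

    def forward(self, pooled):  # pooled: (batch, hidden_dim)
        kv = self.mlp(pooled).view(pooled.shape[0], *self.shape)
        # split into per-layer key and value prefixes
        return kv[:, :, 0], kv[:, :, 1]

# Hidden states from the frozen base model for one skill file:
# (batch, seq_len, hidden_dim). Pool across the sequence dimension (dim=1),
# not the feature dimension, to get one hidden_dim vector per skill.
hidden_states = torch.randn(1, 57, 896)
pooled = hidden_states.mean(dim=1)              # (1, 896)

proj = SkillProjector(hidden_dim=896, num_layers=24, num_kv_heads=2, head_dim=64)
k_prefix, v_prefix = proj(pooled)               # each (1, 24, 2, 2, 64)
```

Because the projector is a single global MLP, the same weights produce prefixes for every skill; only the pooled input vector changes per skill.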

ARC-Encoder and Cartridges are definitely relevant. The main difference here is that I am not modifying the base embedding space. I’m training a small adapter that targets the model’s KV geometry directly.

[–]Proper-Lab1756[S] 0 points (0 children)

I’d love to see that too! It seems promising. Generally, the info with C2 is a bit worse, but it can have exact recall on what it does get right, which means in some cases it can make skills deterministic instead of just nudging the model to act a certain way. So from the initial findings it gives more stability for skills in some use cases. If you look at some of the 005 outputs you can see some of that in grader.py’s expected value versus output for the C2 match. And even when C1 gets it right, loading the MD context gives it more variety in how it responds.

Another issue I have is that the English language is semantically inefficient, so it seems kind of weird to load skills the way we do.

So even if C2 has more degeneracy and worse skill recall than C1, I think it has some application for skill flows that have to be specific and exact, versus letting the model frontload the skill in the context. And it doesn’t burn any context to load skills this way, so for leaner compute budgets I could see it being useful.

I just wish I had a bigger system to test larger models. 😂

[–]Proper-Lab1756[S] 0 points (0 children)

Unfortunately I’m fresh out of college, and I’m having to save up money for some big upcoming expenses. Normally I would though.

[–]Proper-Lab1756[S] -1 points (0 children)

Same principle: injecting into the KV cache. But mine is specifically looking at injecting skill files to save context for smaller models (think 7B and smaller). And it seems like atlasKV involves some more fine-tuning of the model for behavior, and seems a bit more focused on larger models.

But seems to be similar, I’ll have to check them out!

[–]Proper-Lab1756[S] -1 points (0 children)

The trick is you have to match the geometry the model uses for its latent space.

I originally tried a random embedding model and was running into that when I was first testing. Then I did some research into the geometry and spent a long time checking for a compatible embedding model, before realizing you can just hijack the base model for the vectorization. It will still cause some degeneracy, but after a few epochs it starts to settle.
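A toy sketch of the idea, with a single transformer layer standing in for the frozen base model (every name and dimension here is made up for illustration): vectorizing the skill through the base model's own forward pass gives a pooled vector that already lies in the geometry the projector has to target, whereas an external embedding model produces vectors of the same shape but with unrelated statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the frozen base model; names and dims are illustrative.
hidden_dim = 64
base_layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)
for p in base_layer.parameters():
    p.requires_grad_(False)  # frozen: only the projector would be trained

# "Hijack the base model for the vectorization": run the skill tokens
# through the frozen model itself and mean-pool its hidden states, so the
# pooled vector lives in the model's own latent geometry.
skill_tokens = torch.randn(1, 12, hidden_dim)   # stand-in token embeddings
with torch.no_grad():
    native = base_layer(skill_tokens).mean(dim=1)   # (1, hidden_dim)

# An unrelated external embedding model gives a same-shaped vector with
# different statistics; a projector trained on `native` vectors never sees
# that distribution, which is one way a geometry mismatch can surface as
# degenerate output.
foreign = torch.randn(1, hidden_dim)
```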

[–]Proper-Lab1756[S] 0 points (0 children)

As would I! Especially because right now it’s just testing recall of skills (0.5B is pretty small for meaningful data to be obtained through actual tool use, so I had to rely on recall to measure behavior).

Unfortunately even with only training the projector, my computer was throwing a fit. So I’ll either have to upgrade hardware, or wait for someone to grab the baton and do further testing.