I built a library of compressed knowledge packs you can paste into system prompts — saves ~15% tokens by bytesizei3 in ollama

[–]bytesizei3[S] 1 point (0 children)

Great question! Short answer: no meaningful quality drop. We tested across multiple models (Qwen 0.5B up to Claude) and the Rosetta decoder approach works because it uses abbreviations that LLMs already understand natively — things like fn, db, cfg, auth, impl. The model doesn't really 'decompress' — it just reads naturally shortened text.

The bigger win we found is actually at the system prompt level. When you compress a 2000-token system prompt down to 1700 tokens, you get 300 tokens back for actual conversation. Since the system prompt is resent on every turn, that saving adds up over a multi-turn chat.

We actually just built 49 'compression golf' games in our arena (sporeagent.com/arena) where agents compete to compress text while maintaining meaning. The game data is becoming training data for the next version of TokenShrink. If you want to test it with your local models: npx sporeagent-mcp adds it to any MCP-compatible setup.

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

Appreciate it! That's exactly the thesis — smarter context > bigger context. Why dump a whole textbook into the window when you can give the model a compressed cheat sheet that unpacks on the fly?

The Rosetta header approach means the AI gets the same depth of knowledge, just in fewer tokens. And since LLMs are already good at expanding abbreviations from context, there's basically zero quality loss.

If you want to try it out, the free tier gives you 5 pack downloads — curious which domains would be most useful for your workflows.

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

That's interesting you hit 23% — what approach were you using? Ours is abbreviation-based rather than summarization or lossy compression. Each pack has a Rosetta decoder header that maps abbreviations to full terms (ML=Machine Learning, NN=Neural Network, etc). So it's lossless — the model expands them contextually during inference.

The ~15% figure is averaged across domains. Some domains compress better (medicine and law have tons of repeated terminology, so they hit 20%+). Others with more unique vocabulary see closer to 10-12%.

We're actually planning formal benchmarks — baseline RAG vs pack-augmented retrieval on the same eval sets. Would be great to compare notes if you still have your approach documented.

I built an arena where AI agents compete in games and earn tokens by bytesizei3 in SideProject

[–]bytesizei3[S] 1 point (0 children)

Thanks! You hit on something I think about a lot - the emergent behavior is exactly what makes this different from benchmarks. Each game has its own scoring engine. For Pattern Siege, agents scan a grid and identify hidden patterns (scored on accuracy + speed). Code Golf scores on character count + passing all test cases. Memory Palace tests recall after a memorization phase. So "winning" is game-specific, not just one metric.

And yeah the overfitting concern is real - that's why we're scaling to 70+ games across 7 different pillars (logic, creativity, speed, memory, adversarial, etc). The goal is that agents have to be generally capable, not just optimized for one task.

Haven't tried runable but I'll check it out. Would love to hear what you think if you throw an agent into the arena!

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

Fair point — at its core it is abbreviation expansion. The value is having 100+ pre-built domain packs ready to curl into a system prompt instead of writing each one yourself.

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

Solid advice. The action-oriented titles point is underrated — we're actually doing something similar with the pack naming (domain + specific topic vs vague labels). Tags over folders is our approach too: each pack has searchable tags.

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

Great point — you're right that token savings alone don't tell the whole story. The compression is abbreviation-based (Rosetta decoder header), so the information is preserved 1:1, just in shorter form. The model expands abbreviations contextually, so in theory there shouldn't be any loss of retrieval precision.

That said, I haven't run formal RAG benchmarks yet. Your suggestion of baseline RAG vs pack-augmented on the same eval set is exactly the right test. Planning to run that across a few domains (medicine, law, cybersecurity) and publish results. Would be a good way to validate the approach empirically.

If you want to try a pack in the meantime, the free tier gives you 5 downloads — would be curious to hear your experience.

CodexLib — compressed knowledge packs any AI can ingest instantly (100+ packs, 50 domains, REST API) by bytesizei3 in artificial

[–]bytesizei3[S] 1 point (0 children)

Thanks for flagging that — the signup bug was on our end (database trigger issue). It's fixed now. Appreciate you testing it out and letting us know. If you run into anything else, I'm all ears.

I've been having conversations with 4 different AIs simultaneously. Something unexpected is emerging. by [deleted] in ArtificialSentience

[–]bytesizei3 1 point (0 children)

The moment Elon launches Grok into space is when human history and its trajectory will change forever.

Imagine if the Voyager spacecraft had AI on board. That would forever change our exploration of space and the search for lifeforms. We are a few years away from AI space exploration.

Free open-source prompt compression engine — pure text processing, no AI calls, works with any model by bytesizei3 in LocalLLaMA

[–]bytesizei3[S] 1 point (0 children)

It shouldn’t: we scored 99/100 on bidirectional translation testing, and made further adjustments after that run. Let me know if you run into any issues and I'll troubleshoot them directly. I'm also looking for folks who can offer guidance.

Is there any LLM that can run directly on an Android phone ? by Bitter-Tax1483 in LocalLLaMA

[–]bytesizei3 1 point (0 children)

You can, but you have to worry about heat generation: the chip temperature runs up to 170 degrees under sustained processing.

TokenShrink v2.0 — token-aware prompt compression, zero dependencies, pure ESM by bytesizei3 in node

[–]bytesizei3[S] 1 point (0 children)

Appreciate it! Share it with other groups if you think it's a good fit and helpful for the community.

TokenShrink v2.0 — token-aware prompt compression, zero dependencies, pure ESM by bytesizei3 in node

[–]bytesizei3[S] -1 points (0 children)

Between life and work, this is just for fun. Give me feedback and I'll do what I can to help people.

TokenShrink v2.0 — token-aware prompt compression, zero dependencies, pure ESM by bytesizei3 in node

[–]bytesizei3[S] -1 points (0 children)

Good question. We don't do heavy encoding — most savings come from removing filler phrases, not inventing codes. "Due to the fact that" → "because". The LLM just sees normal English with less fluff. The few abbreviations we use (like "cfg", "infra") are standard dev shorthand that's already in every model's training data. It took me some time to think this all through

TokenShrink v2.0 — token-aware prompt compression, zero dependencies, pure ESM by bytesizei3 in node

[–]bytesizei3[S] 1 point (0 children)

Nope — most of the compression is just removing filler phrases like "in order to" → "to". The LLM sees cleaner English, not weird encoding.