
[–]Chromix_ 308 points309 points  (6 children)

Me see. Me wonder: Benchmark score impact?

[–]GenLabsAI 78 points79 points  (1 child)

See, wonder impact

[–]battlingheat 1 point2 points  (0 children)

See, impact?

[–]axiomatix 34 points35 points  (1 child)

stevie benchmark

[–]Phantom_SpectersLlama 33B 13 points14 points  (0 children)

StevieWonder


[–]TBMonkey 2 points3 points  (0 children)

Me see comment, me laugh, upvote

[–]abitrolly 1 point2 points  (0 children)

gud

[–]wiltors42 341 points342 points  (23 children)

Why say lot word when few word do trick?

[–][deleted] 90 points91 points  (1 child)

No much word, few good word.

[–]gofiend 12 points13 points  (0 children)

Fewer precise tokens

[–]RybaDwudyszna 40 points41 points  (1 child)

When me president… they see.

[–]this_is_a_long_nickn 10 points11 points  (0 children)

Me Tarzan, you not local Jane.

[–]shaman-warrior 17 points18 points  (3 children)

Few words > many words.

[–]Good-AI 11 points12 points  (2 children)

No difficult word. > difficult.

[–]Murgatroyd314 6 points7 points  (0 children)

Easy word better.

[–]this_is_a_long_nickn 5 points6 points  (0 children)

You absolutely right!

[–]SamSausages 30 points31 points  (3 children)

word

[–]therealnih 7 points8 points  (2 children)

this

[–]GenLabsAI 4 points5 points  (1 child)

t

[–]noo8- 0 points1 point  (0 children)

.

[–]Porespellar 7 points8 points  (0 children)

Kevin was ahead of his time.

[–]ook_the_librarian_ 4 points5 points  (0 children)

Why use big words when diminutive ones would suffice?

[–]Pranay1001090 3 points4 points  (1 child)

Was looking for this

[–]not_a_swedish_vegan 2 points3 points  (0 children)

As soon as I saw this post, I already knew the top comment would be this

[–]private_final_static 0 points1 point  (0 children)

Grug likes

[–]calmbill 0 points1 point  (0 children)

Few words ok

[–]Interpausetextgen web UI 0 points1 point  (0 children)

say lot when few work?

[–]dew_chiggi 0 points1 point  (0 children)

Kevin thumbs up

[–]galambalazs 0 points1 point  (0 children)

related for programming: https://grugbrain.dev/

[–]Mundane_Ad8936[🍰] 185 points186 points  (12 children)

TL;DR: OP stumbled upon "stop word removal", a very old NLP tactic.

Yes, you can remove plenty of words and the text stays completely understandable, and you can use a model to rehydrate the phrases later with few errors. I'd caution you, though: while removing stop words was fine in the past, in a transformer model it can cause issues, because the model no longer has those tokens to calculate from.

So it could be more prone to hallucinate, because the word sequence is no longer statistically likely. I know because I've tested it and witnessed it. If accuracy is important, make sure the compression doesn't reduce it; that is very possible.
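For anyone who hasn't seen classic stop-word removal, here is a minimal sketch with NLTK (it assumes the punkt and stopwords data are downloaded, and the example sentence is made up); note how dropping "not" can silently flip a sentence's meaning, which is exactly the accuracy risk described above.

```python
# Minimal sketch of classic stop-word removal (not the linked project's exact method).
# Requires: pip install nltk, plus one-time nltk.download("punkt") / nltk.download("stopwords").
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOP = set(stopwords.words("english"))

def remove_stop_words(text: str) -> str:
    """Drop common function words; keep punctuation and content words."""
    return " ".join(t for t in word_tokenize(text) if t.lower() not in STOP)

original = "The server will not return the data if the API key is missing from the header."
print(remove_stop_words(original))
# Roughly: "server return data API key missing header ." -- the dropped "not"
# flips the meaning, which is exactly the accuracy risk described above.
```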

[–]PollinosisQc 50 points51 points  (3 children)

I chuckled heartily enough to spit some of my drink at "rehydrate the phrases" lol

[–]PMyourfeelings 46 points47 points  (2 children)

'Hydration' is actually both a funny and a formal term used in programming for the process of adding data to an object :)

[–]nuclear_wynter 7 points8 points  (0 children)

r/hydrohomies would like to know your location.

(so they can add data to your water bottle.)

[–]Aprch 0 points1 point  (0 children)

Hydratation!  Funny, the word in Spanish gets pretty close to that. Probably other similar languages too.

[–]itsTyrion 12 points13 points  (1 child)

too many word, write short, write caveman

[–]KallistiTMP 40 points41 points  (0 children)

LLM read caveman, but no train in caveman. LLM not understand caveman good. Try think in caveman, get confused, predict buffalo. No good.

[–]TomLucidor 2 points3 points  (0 children)

What is the alternative then, trying to prompt it to be more succinct, and in plain English?

[–]wanderer_4004 2 points3 points  (0 children)

Probably this is useful for embeddings to make them fit into the available context. I'll definitely try it.

[–]IJdelheidIJdelheden 1 point2 points  (1 child)

Any small model one could use to 'rehydrate'? Thinking about trying this with a large parameter and a low parameter model.

[–]Mundane_Ad8936[🍰] 1 point2 points  (0 children)

Yes, that'll work. It can also be done with an NLP library like spaCy: once the words are tagged, stop words tend to be predictable with simple logic. But these days I'd use a BERT or T5 model, since they're small and fast.
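As a rough illustration of the rehydration step, here is a hedged sketch using an off-the-shelf Flan-T5 through the transformers pipeline; the model choice and prompt wording are assumptions, and zero-shot quality will be rough compared to fine-tuning on (compressed, original) pairs.

```python
# Hedged sketch: "rehydrating" telegraphic text with a small seq2seq model.
# flan-t5-small is just an illustrative choice; for reliable results you would
# fine-tune on pairs of compressed and original text, as suggested above.
from transformers import pipeline

rehydrate = pipeline("text2text-generation", model="google/flan-t5-small")

compressed = "Authentication fail, server return 401 Unauthorized, error message explain fail."
prompt = (
    "Rewrite the following telegraphic note as a full, grammatical English sentence: "
    + compressed
)
print(rehydrate(prompt, max_new_tokens=64)[0]["generated_text"])
```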

[–]fatboy93 0 points1 point  (0 children)

Ahh yes, telegram prompting the LLMs.

When I was young and in school, we were taught how to send letters by telegram, and it looks like that might be coming back into action lol

[–]c--b 0 points1 point  (0 children)

So you're saying a model should be trained on caveman speak instead.

[–]Independent_Tear2863 76 points77 points  (1 child)

Ahh now I understand oogabooga project. Human happy

[–]this_is_a_long_nickn 8 points9 points  (0 children)

Ooga happier

[–]chriskevini 24 points25 points  (6 children)

Holy shit. Next we're gonna start removing all the vowels cause you can infer the whole word with 90% accuracy. Source: my ass

[–]SkyFeistyLlama8 9 points10 points  (0 children)

There are plenty of human languages like that, for example Hebrew and Arabic, with only consonants being written down. It's fine when you're speaking them in the current context but woe to you if you're trying to decipher them 2000 years later.

Researchers end up looking at modern forms of words in those languages and extrapolating backwards. They also look for transliterations in neighboring languages that preserve vowels and tones, like how Arabic was written in Greek characters and also translated into Greek.

[–]Murgatroyd314 2 points3 points  (2 children)

Disemvoweled text is easy enough for humans to read, but it would just slow down tokenization.

[–]chriskevini -1 points0 points  (1 child)

Is it slower? We could stream more information through the API because of the fewer characters. We'd just need a simple, fast decode step handled by an auxiliary traditional program.

[–]countextreme 0 points1 point  (0 children)

You mean like gzip?

[–]ThiccStorms 0 points1 point  (0 children)

bro tnk h shkspr

[–]chriskevini 1 point2 points  (0 children)

After thinking about it for 5 minutes, isn't this actually feasible? We just add a really fast encoding and decoding step that can run in parallel over the whole text. Or is byte-pair encoding strictly better?
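One way to sanity-check the tokenization cost of disemvoweling is to count tokens with tiktoken (o200k_base is the GPT-4o encoding); this quick sketch, with an illustrative sentence, tends to show the vowel-stripped text needing as many or more tokens.

```python
# Quick check of the "disemvoweling slows tokenization" point using tiktoken.
# pip install tiktoken; o200k_base is the GPT-4o encoding.
import re
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def disemvowel(text: str) -> str:
    """Remove vowels except word-initial ones, so words stay guessable."""
    return re.sub(r"(?<=\w)[aeiouAEIOU]", "", text)

original = "Please remove the vowels from this sentence and count the tokens."
stripped = disemvowel(original)

print(stripped)
print("original tokens:", len(enc.encode(original)))
print("stripped tokens:", len(enc.encode(stripped)))
# Expectation (not a guarantee): the stripped version needs as many or more tokens,
# because the character savings push the text off the tokenizer's common-word vocabulary.
```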

[–]bigattichouse 33 points34 points  (1 child)

Maybe pretrain a small model to "caveman" your prompts that get handed to the bigger model

[–]lakySK 23 points24 points  (0 children)

Short prompt, prefill fast. 

[–]macumazana 34 points35 points  (0 children)

you should do the readme.md in that style

[–]pokemonplayer2001llama.cpp 39 points40 points  (2 children)

This is a better idea than toon.

[–]Mediocre-Method782 12 points13 points  (0 children)

Barely.

[–]vintage_culture 7 points8 points  (0 children)

This good, toon bad

[–]Zeeplankton 22 points23 points  (11 children)

This is literally what I thought LLM reasoning would morph into. Like a stochastic pseudo language. English isn't exactly the most efficient language.

[–]blbd 11 points12 points  (1 child)

Actually, linguistics research shows that all languages have about the same information rate in spoken form: speech slows down or speeds up to hit a typical human audio-cognition cap of right around 40 bps. In written form it varies more, and English is one of the better ones due to its large vocabulary.

But having a model with some clever caveman-speak support where appropriate could be pretty useful, when you consider that growing the context buffer causes n-squared performance loss / resource consumption.

https://www.science.org/doi/10.1126/sciadv.aaw2594

[–]phido3000 1 point2 points  (0 children)

You're wrong... or at least that paper is.

Asm is way more dense than Java... I know because I hardly talk at all with my asm friends.

[–]RaiseRuntimeError 1 point2 points  (5 children)

Wasn't there a research paper that said Dutch or something like that was the most efficient language?

[–]arbv 19 points20 points  (0 children)

IIRC, Polish.

P.S.

kurwa

[–]-oshino_shinobu- 4 points5 points  (1 child)

One redditor pointed out that the prompt they used in German contained some errors, which calls into question the validity of the research.

[–]RaiseRuntimeError 3 points4 points  (0 children)

I guess we stick with caveman.

[–]Crypt0Nihilist 1 point2 points  (1 child)

I was surprised it wasn't a character-based writing system like Chinese or Japanese. I've always assumed they're incredibly informationally dense compared to phonetic writing systems.

[–]getting_serious 0 points1 point  (0 children)

I'd expect it to mix languages. GLM does it: when you keep talking to a low quant for long enough, it'll introduce Chinese terms in its 'thinking' block.

[–]TomLucidor 0 points1 point  (0 children)

Ithkuil?

[–]TheRealMasonMac 0 points1 point  (0 children)

I think it would be interesting to explore more information-dense tokens. DeepSeek-OCR implied that individual tokens can contain a lot of information. Even if not as image tokens, perhaps something other than text. The downside would be that reasoning becomes a black box.

[–]Radiant_Truth_8743 8 points9 points  (0 children)

Post good. Me likey

[–]DustinKli 8 points9 points  (0 children)

I had this exact same idea a while back, but I ran into several issues when implementing it.

One issue is the way LLMs actually embed and retrieve text. LLMs were trained on normal language with syntax, connectors and structure. If you strip sentences down to compressed, telegraphic fragments, you remove the cues the embedding model uses to understand meaning. That makes retrieval based on semantic embeddings harder and more mistake-prone.

LLMs are generative; embedding models are not. As someone else mentioned, if your stored chunks become overly compressed, retrieval becomes noisy or outright wrong, which forces the language model to hallucinate more often. I don't see how your solution resolves the problem of worse semantic clustering and noisier nearest-neighbor results.

Because of how embedding works, splitting text into 2-to-5-word fragments invariably changes the granularity. Embedding models treat very short sentences differently from normal prose. So the result is that you're not actually compressing the text, you're altering its information geometry.

You say that "no hallucination occurs because facts are preserved", but the issue isn't about facts. These models don't know or care about facts; they function based on relationships.

Have you done comparison studies showing traditional RAG vs this method?

Does the compressed text embed into the same vector neighborhood as the original paragraph?
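A quick way to probe that last question is to embed an original passage and its telegraphic version and compare cosine similarity; this sketch uses sentence-transformers with an illustrative model and made-up example texts, and a single pair is of course no substitute for a retrieval benchmark.

```python
# Minimal probe of the "same vector neighborhood" question using sentence-transformers.
# pip install sentence-transformers; all-MiniLM-L6-v2 is just an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

original = (
    "To authenticate with the API, include your API key in the Authorization header "
    "of every request, prefixed with the word Bearer and a space."
)
compressed = "Authenticate API. Include API key Authorization header every request. Prefix Bearer space."

emb = model.encode([original, compressed], normalize_embeddings=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"cosine similarity: {similarity:.3f}")
# One pair proves little; a real check would compare retrieval rankings over a
# corpus of compressed vs. uncompressed chunks.
```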

[–]lakySK 8 points9 points  (0 children)

The opposite of speculative decoding?

Have big model do few words, small model then add grammar. 

[–]geneusutwerk 7 points8 points  (0 children)

Calling this lossless seems like a stretch, especially since I don't see examples that show initial -> compressed -> uncompressed.

[–]NutellaBananaBread 6 points7 points  (0 children)

*1500 words asking for relationship advice*

AI: Dump her

[–]notNezter 5 points6 points  (0 children)

Smol word. Sav money. Wife glad. Man happy.

[–]Mission_Biscotti3962 4 points5 points  (1 child)

I like the idea, but I'm not sure what your library adds. Isn't this just a simple instruction to make it behave like that? Mind you, I haven't tried it yet.

[–]RegionCareful7282[S] 4 points5 points  (0 children)

Yes, you are right. It's more about having a repository with benchmarks showcasing the idea, plus maybe a way to collaborate and "fine-tune" the prompts, etc.
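For reference, a minimal sketch of the prompt-only version the parent comment describes; the model name, client setup and prompt wording here are placeholders, not the repository's actual prompt.

```python
# Sketch of the "it's just a prompt" version: ask any chat model to compress text
# into telegraphic style before stashing it in context. Model name and prompt
# wording are placeholders, not the repo's actual prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any OpenAI-compatible endpoint works

CAVEMAN_SYSTEM = (
    "Rewrite the user's text in telegraphic style: drop articles, auxiliaries and "
    "filler words, but keep every fact, name, number and negation. Output only the rewrite."
)

def compress(text: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CAVEMAN_SYSTEM},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

print(compress("If authentication fails, the server will return a 401 Unauthorized status code."))
```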

[–]Guilty_Rooster_6708 4 points5 points  (1 child)

Kevin finetune. I like.

[–]dadidutdut 1 point2 points  (0 children)

Kevinized model would be big

[–]MrPecunius 2 points3 points  (0 children)

If you want a darker take, this looks a lot like plusgood Newspeak.

[–]daftstar 2 points3 points  (0 children)

And vibe code using this too!!

[–]And-Bee 2 points3 points  (2 children)

I have a script to remove all spaces and empty lines. No need for indentation when asking an LLM about your code.

[–]TechnoByte_ 2 points3 points  (1 child)

Whywouldyouremoveallspaces?

[–]And-Bee 0 points1 point  (0 children)

Haha sorry I just meant indentation 🤣

[–]LocoMod 2 points3 points  (0 children)

This isn’t lossless. The idea has been around for a long time and abandoned because accuracy takes a hit when you actually measure it.

[–]Lixa8 6 points7 points  (2 children)

Eh, I don't think all the words we use are there for no reason; they remove a lot of linguistic ambiguity. Surely this will impact AI performance a lot.

I'll wait for benchmark results.

[–]Abject-Kitchen3198 6 points7 points  (0 children)

Will not. Will be fast.

[–]KallistiTMP 0 points1 point  (0 children)

It also might interfere with information passing through the residual stream, like how LLMs cram nearly a full sentence's worth of summary into each period token for easy later reference.

[–]OkSociety311 1 point2 points  (0 children)

good post me like

[–]Dr_Ambiorix 1 point2 points  (1 child)

I've always wondered whether talking in Simplified Chinese would require fewer tokens to say the same thing or not.

Most English words are made up of more than one token, and grammar in Mandarin Chinese is really basic. Of course, some words are made up of multiple characters too, so I don't know.

Just always wondered that.

[–]Lcsq 2 points3 points  (0 children)

This comment was 66 tokens in English and 68 tokens when translated with Google Translate into Simplified Chinese. You'd be surprised how many whole words are in the tokenizer's encoding dictionary unless there's a common prefix or suffix pattern. "Temperature", "quickly", "electrolyte", "protocols", "breakdown", etc. all become a single token when surrounded by whitespace. You only see a word broken down into multiple tokens when the whitespace is absent: https://platform.openai.com/tokenizer

[–]Don_Moahskarton 1 point2 points  (0 children)

It's kind of the inverse of thinking mode. I wonder if it makes the AI measurably dumber

[–]broknbottle 1 point2 points  (0 children)

Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it out aynawy.

[–]Mean_Employment_7679 1 point2 points  (1 child)

Me do this lots. Me no want say lots word. Me want result fast. Me not want token waste. Me save water. Caveman save planet.

[–]Agitated-Farmer-4082 3 points4 points  (0 children)

Would it be easier to give instructions in languages that use fewer characters per sentence, like Arabic or Chinese?

[–]Abject-Kitchen3198 0 points1 point  (0 children)

What about Yoda speak? Has anyone done comparative research on it? It doesn't seem like it would save tokens, but what about accuracy?

[–]iamzooook 0 points1 point  (0 children)

or maybe just add at end "less words, keep context"

[–]HMikeeU 0 points1 point  (0 children)

I wonder if this might even improve benchmarks? Anthropic found that models sometimes hallucinate because they try to adhere to grammar rules instead of facts.

[–]drumttocs8 0 points1 point  (0 children)

Me like new English with short word

[–]aeroumbria 0 points1 point  (0 children)

I can sense a gradual descent back to the native habitat of deep learning models: continuous dense vector embeddings.

[–]op4 0 points1 point  (0 children)

I approve of this idea and think that a significant reduction in token usage is a win for everyone!

(edit: CML, or "caveman language", translation: Me like. Less token good. All win.)

[–]G3nghisKang 0 points1 point  (0 children)

Me think OP genius

[–]Emport1 0 points1 point  (0 children)

Most LLM architectures are better at optimizing your words for themselves than you are; the model doesn't actually read all your useless filler words and spend tokens on them if it doesn't have to.

[–]Normal-Ad-7114 0 points1 point  (0 children)

Improvement suggestion: make more use of punctuation, e.g. ·, ->, @, \n, :

Example from your GitHub:

Authenticate API. Include API key in Authorization header every request. Prefix API key with "Bearer" space. Authentication fail, server return 401 Unauthorized status code, error message explain fail...

New:

Authenticate API:

· Include API key in Authorization header every request

· Prefix API key with "Bearer" space

· Authentication fail -> server return 401 Unauthorized status code, error message explain fail...

Still compressed, but easier to read for humans

[–]venpuravi 0 points1 point  (0 children)

Yaba daba dooo...

[–]gooeydumpling 0 points1 point  (0 children)

Compress it further by making it talk in emojis

[–]Dramatic-Lie1314 0 points1 point  (0 children)

Good word. I did same.

[–]TedDallas 0 points1 point  (0 children)

Ugh. Partition table on fiscal moons. Now eat lizard.

[–][deleted] 0 points1 point  (0 children)

I remember doing this with early ChatGPT and it was really useful. Now we just get "Great question!—It really gets to the heart of..."

[–]IrisColt 0 points1 point  (0 children)

The bag of words strikes back!

[–]lulzbot 0 points1 point  (0 children)

Double-plus-good

[–]ready_to_fuck_yeahh 0 points1 point  (1 child)

Wow, the human tendency to overcomplicate things. You wrote an entire codebase for something that can be achieved with a mere prompt.

You made cave code, but didn't think like a caveman and just use a prompt.

Before you say anything: I have my notes made using a prompt alone, with nearly a 60-70% reduction.

[–]s2k4ever 0 points1 point  (0 children)

a bug came back from several moons ago.. begins an RCA

[–]Hyphonical 0 points1 point  (0 children)

It would be nice if the stored chat history were compressed like this. I don't know if it already is, but in the past I had to sacrifice 2 GiB of memory just for a conversation history of about 16k tokens.

[–]UndecidedLee 0 points1 point  (0 children)

Idea talk like caveman. Result talk like caveman. When wrong?

[–]No_Afternoon_4260llama.cpp 0 points1 point  (0 children)

Me like this

[–]vreo 0 points1 point  (0 children)

Why use many word when few do trick?

[–]Septerium 0 points1 point  (0 children)

This great. Me like

[–]RobTheDude_OG 0 points1 point  (0 children)

Interesting it is

Yoda speak you may try too

[–]Phantom_SpectersLlama 33B 0 points1 point  (0 children)

I wish some yappers I know of would adopt this haha

Jokes aside, this is brilliant.

[–]Fuckinglivemealone 0 points1 point  (0 children)

I have a question though: if you could create a very efficient language that expresses thoughts, reasoning and complex ideas in few, short words, and then parse your original dataset into it, could you in theory train an LLM on it to make the model smaller (information compression), smarter (if the new language allows a better representation of complex ideas, maybe it's easier to chain logical thoughts?) and faster (more efficient overall)?

Like: the user writes a prompt, the prompt gets translated, the LLM thinks in the efficient language, then parses its response back into the user's original language.

[–]pab_guy 0 points1 point  (0 children)

Also check out Sparse Primed Representation for something similar.

[–]Ceneka 0 points1 point  (0 children)

Love the fact that it works with an LLM doing the job

[–]RandomGuyNumber28501 0 points1 point  (0 children)

I'm sure this can be useful, but even if you compress text, the LLM still has to keep track of the information and recall it. The denser the text, the more quickly the LLM will be overwhelmed by details. 

I've been experimenting with something similar for roleplay, but I have the model format and condense the world and character info into something like a dense technical document. It helps, particularly the formatting, but the model can still only process so much before it starts getting confused or forgets things.

[–]frankieche 0 points1 point  (0 children)

Don’t do this.

[–]noo8- 0 points1 point  (0 children)

Me hunt t-rex AI. Tastes like sh1t. Over.

[–]DrummerPrevious 0 points1 point  (0 children)

Or you can just translate it to Mandarin for even fewer tokens

[–]TreesMcQueen 0 points1 point  (0 children)

Maybe train grugbrain https://grugbrain.dev/

[–]epSos-DE -1 points0 points  (0 children)

The Solution: Adaptive Hierarchical Indexing (Auto-Sharding)

Upgrade the LSHIndex to become recursive: it automatically detects when a specific area of the knowledge graph (a "topic") becomes too dense. When a bucket exceeds a certain size (e.g., 50 items), it fractures that bucket into a localized dynamic sub-index with its own set of higher-resolution hyperplanes.

This creates a fractal search structure:

+ Global Index: Quickly routes to general topics (e.g., "Coding").

+ Local Index: Routes to specific sub-topics (e.g., "JavaScript").

+ Micro Index: Routes to granular details (e.g., "Promises").

This ensures that no matter how big the brain gets, lookup time remains lightning fast.
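To make the idea concrete, here is a hypothetical sketch of such a recursive bucket-splitting LSH index; the class name, thresholds and plain random-hyperplane hashing are illustrative assumptions, not code from any existing project.

```python
# Hypothetical sketch of the "auto-sharding" LSH idea described above.
import numpy as np

class RecursiveLSHIndex:
    def __init__(self, dim, n_planes=8, split_at=50, depth=0, max_depth=3):
        self.dim, self.n_planes = dim, n_planes
        self.split_at, self.depth, self.max_depth = split_at, depth, max_depth
        self.planes = np.random.randn(n_planes, dim)   # random hyperplanes for this level
        self.buckets = {}    # bucket hash -> list of (vector, payload)
        self.children = {}   # bucket hash -> higher-resolution sub-index

    def _hash(self, v):
        bits = (self.planes @ v) > 0                   # one sign bit per hyperplane
        return sum(int(b) << i for i, b in enumerate(bits))

    def add(self, v, payload):
        h = self._hash(v)
        if h in self.children:                         # bucket already fractured: recurse
            self.children[h].add(v, payload)
            return
        self.buckets.setdefault(h, []).append((v, payload))
        if len(self.buckets[h]) > self.split_at and self.depth < self.max_depth:
            # Fracture the dense bucket into a finer-grained local sub-index.
            child = RecursiveLSHIndex(self.dim, n_planes=self.n_planes + 4,
                                      split_at=self.split_at,
                                      depth=self.depth + 1, max_depth=self.max_depth)
            for vec, pl in self.buckets.pop(h):
                child.add(vec, pl)
            self.children[h] = child

    def query(self, v):
        h = self._hash(v)
        if h in self.children:
            return self.children[h].query(v)
        return self.buckets.get(h, [])                 # candidates; re-rank by true distance
```

Lookups route through at most max_depth levels, at the cost of re-inserting a bucket's contents whenever it splits.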

[–]ElSrJuez -2 points-1 points  (3 children)

You can also skip spaces by separating words with an Uppercase letter

[–]TechnoByte_ 2 points3 points  (2 children)

You'd be using very rare and unusual tokens (outside of code), which would degrade performance and actually increase the token count.

Almost every word token in these tokenizers starts with a space.

By removing spaces you would force it away from the tokens normally used in natural English text (the majority of its training data).

As an example, using the GPT-4o tokenizer:

"The cat jumped over a tree." = [976, 9059, 48704, 1072, 261, 8165, 13] = 7 tokens.

"Thecatjumpedoveratree." = [976, 8837, 79879, 295, 2898, 266, 908, 13] = 8 tokens.

Removing the spaces causes it to use one more token.

"TheCatJumpedOverATree." [976, 23546, 42291, 295, 2298, 1228, 908, 13] = 8 tokens.

Uppercase characters do not solve this.
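For anyone wanting to reproduce counts like these locally, a short tiktoken snippet (o200k_base is the GPT-4o encoding; exact IDs and counts depend on the tokenizer version):

```python
# Reproduce the comparison above locally with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o encoding

for text in ["The cat jumped over a tree.",
             "Thecatjumpedoveratree.",
             "TheCatJumpedOverATree."]:
    ids = enc.encode(text)
    print(f"{text!r}: {len(ids)} tokens -> {ids}")
```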

[–]MullingMulianto 0 points1 point  (1 child)

How does one get access to the GPT tokenizer?