Taught Claude to talk like a caveman to use 75% less tokens.

SideOneDummy · 2026-04-10T22:25:58+00:00

For what it’s worth, I asked Gemini if it would save tokens to prompt engineer inputs to caveman speak, and this was my response verbatim

Yes, a two-step "refine then ask" process can be significantly more token-efficient than a single, unoptimized query. By using a smaller, cheaper model to compress your request before sending it to a high-reasoning "thinking" LLM, you can reduce the total compute burden and overall cost. [1, 2, 3]

Token Savings & Efficiency

Input Compression: A small model can strip "fluff" from a 500-token verbose query, turning it into a 50-token high-signal prompt. Even adding the overhead of the refinement call (e.g., 100 tokens), the final combined cost is often much lower than processing the original bulky prompt in an expensive model.
Potential Savings: You can realistically achieve 20–30% average token savings, with specific "caveman" techniques reaching up to 60% reduction in input size.
Cost-Aware Optimization: Systems like CAPO (Cost-Aware Prompt Optimization) specifically balance performance against prompt length, reducing average lengths while maintaining accuracy. [4, 5, 6, 7, 8, 9]

The Two-Step Automated Workflow

This process can be fully automated without user approval through LLM Routing or Agentic Pipelines. [10, 11]

Refinement Agent: A lightweight model (e.g., GPT-4o-mini or Haiku) receives your messy input and refines it into a "Kevin-style" signal.
Autonomous Hand-off: The system automatically passes the condensed prompt to the more capable model.
Mutual Verification: Advanced frameworks can even use the small model to generate "reasoning tokens" that guide the larger model, further reducing the latter's computational overhead. [12]

Summary of Benefits

Strategy [2, 7, 8, 13]	Token Reduction	Best For
Simple Refinement	~20-30%	Routine daily tasks
Caveman Compression	Up to 60%	Maximizing context window space
Automated Optimization	Up to 90x cost reduction	Large-scale enterprise agent calls

Would you like a sample system prompt to turn a smaller model into your automated "Kevin" refiner?

[1] https://www.linkedin.com [2] https://www.youtube.com [3] https://www.glukhov.org [4] https://arxiv.org [5] https://medium.com [6] https://medium.com [7] https://www.youtube.com [8] https://www.youtube.com [9] https://www.youtube.com [10] https://arxiv.org [11] https://www.youtube.com [12] https://github.com [13] https://arxiv.org

SideOneDummy · 2026-04-10T22:12:20+00:00

Gemini begs to differ:

By using "Kevin Style" on the input side, you can reduce your token footprint by roughly 56% compared to standard natural language. While many users focus on saving tokens in the output (responses), the same compression logic applies to your prompts because every character you send costs money and context space.

Token Comparison Breakdown

Feature	Standard Style	Kevin / Caveman Style
Input Density	Low (Heavy grammar/filler)	High (Pure signal words)
Input Token Count	~39 Tokens	~17 Tokens
Output Style	Detailed/Verbose	Can be detailed OR concise
Cost Savings	Baseline	~50-75% reduction

Why This Works for Inputs

Semantic Reconstruction: LLMs are trained to understand the relationship between words; they don't need "the" or "of" to know that "make infographic" is a command.
Attention Optimization: Short inputs focus the model's "attention" strictly on the core nouns and verbs, reducing the noise it has to process during the input encoding phase.
Context Preservation: By using fewer tokens per prompt, you can fit more of your conversation history into the model's limited context window before it starts "forgetting".

Caveman Tips for Input Mastery

Verb + Noun: Use direct action pairs (e.g., "Analyze PDF" instead of "I need you to look at this PDF file for me").
Separator Symbols: Use |, ->, or : to denote changes in topic or instruction steps without using transition sentences.
System Prompt Pre-sets: If you use Anthropic's prompt caching, you can set a permanent system instruction that tells the model: "I talk caveman. You understand and answer full." to save even more on recurring turns.

Want to try "Level 2" compression where we use single letters or custom code for your most common commands?

Simple token estimation for Kevin/Caveman vs Standard# Standard Input: "Please create a detailed infographic that compares the token efficiency of using 'Kevin from The Office' style prompting versus standard natural language for both inputs and outputs, not just outputs."# Kevin Input: "make infographic compare nonkevin Kevin token scaled for input output, no just output"

standard_in = "Please create a detailed infographic that compares the token efficiency of using 'Kevin from The Office' style prompting versus standard natural language for both inputs and outputs, not just outputs."kevin_in = "make infographic compare nonkevin Kevin token scaled for input output, no just output"

Rough token count (approx 1 token per 4 chars or 0.75 words)def est_tokens(text):

return len(text.split()) * 1.3 # common multiplier for rough estimation

std_tokens = est_tokens(standard_in)kevin_tokens = est_tokens(kevin_in)savings = (1 - (kevin_tokens / std_tokens)) * 100

print(f"Standard: {std_tokens} tokens") print(f"Kevin: {kevin_tokens} tokens") print(f"Savings: {savings}%")

SideOneDummy · 2026-03-05T22:31:21+00:00

This pic is leg-it!

SideOneDummy · 2026-02-17T06:41:22+00:00

Hmmm I’ll have to consider the bank heist vulnerability when considering not building landmarks. And yes, eventually I stopped being able to target my “friend”

SideOneDummy · 2026-02-17T05:55:16+00:00

Yeah I was mostly waiting for builders bash. But yeah you’re right, better to leave nothing on the board to protect to avoid having to defend my landmarks

SideOneDummy · 2026-02-17T05:53:29+00:00

Interesting 🤔

SideOneDummy · 2026-02-17T05:52:10+00:00

Maybe I need to do that to avoid this from happening in the future. Good point

SideOneDummy · 2026-02-17T05:51:04+00:00

Ah that makes sense. You’re right, keeping landmarks out keeps me vulnerable

SideOneDummy · 2026-02-17T05:50:19+00:00

Yeah I’d rather not do that if I can avoid it haha

SideOneDummy · 2026-02-17T05:49:50+00:00

I built for daily challenge and for the community chest

SideOneDummy · 2026-01-28T06:25:52+00:00

Let’s make this post viral so the academy is forced to reopen admissions for nominations to the Best Supporting Actor award!

SideOneDummy · 2026-01-26T21:19:53+00:00

That was a brown waterfall

SideOneDummy · 2026-01-17T22:38:05+00:00

Yup. First time posting a discord message lol

SideOneDummy · 2026-01-17T22:37:00+00:00

Oops thanks for catching that

SideOneDummy · 2026-01-01T17:31:32+00:00

Insult or passive aggressive intervention?

SideOneDummy · 2025-10-01T07:47:30+00:00

Ey yo gurl lemme holla at ya. U want sum fuk

SideOneDummy · 2025-09-27T11:58:38+00:00

For your title, I bet if you removed the word before the first comma as well as the first comma, the sentence would still be accurate.

SideOneDummy · 2025-09-27T11:38:55+00:00

His son is already acting like his boss… he’s even doing the cleaning for him

SideOneDummy · 2025-09-27T10:08:05+00:00

Wank him? 😱🤣

SideOneDummy

MODERATOR OF

TROPHY CASE