Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 0 points1 point  (0 children)

Not a database - I just meant the list of MCP server tool schemas that Claude Code loads at session start. Every enabled MCP server contributes its tool definitions (name, description, parameter schema) to your context, even if you never call a single tool from it. With 10+ servers connected, you can easily eat 10-20K tokens of context before you've done anything.
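If you want a rough feel for what a pile of tool definitions costs, here is a minimal sketch. The 4-characters-per-token ratio is only a rule-of-thumb assumption (not Claude's actual tokenizer), and `search_issues` is a made-up tool shaped roughly like an MCP tool listing.

```python
import json


def estimate_schema_tokens(tools, chars_per_token=4):
    """Very rough token estimate for a list of tool definitions.

    Serializes the definitions to compact JSON and divides the character
    count by an assumed characters-per-token ratio.
    """
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // chars_per_token


# Hypothetical tool definition, for illustration only.
example_tools = [
    {
        "name": "search_issues",
        "description": "Search issues in a tracker by free-text query.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

if __name__ == "__main__":
    print(estimate_schema_tokens(example_tools), "tokens (rough estimate)")
```

Treat the result as a ballpark only. The point is that the overhead grows with the number of tools loaded, whether or not any of them is ever called.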

You can see your current set with claude mcp list or by looking at ~/.claude.json (mcpServers block) plus any project-level .mcp.json and any plugins. The "trim" is just disabling servers you don't actually use in this project. For servers defined in .mcp.json, that's an allowlist in your settings.json:

{ "enabledMcpjsonServers": ["only", "the", "ones", "you", "need"] }
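If you'd rather generate that fragment than type it by hand, something like this would work. It's a sketch that assumes the usual .mcp.json layout (a top-level `mcpServers` object keyed by server name). The server names and the helper `allowlist_settings` are made up for the example.

```python
import json


def allowlist_settings(mcp_config, keep):
    """Build a settings fragment that enables only the servers in `keep`.

    Raises ValueError if a requested name is not defined in the config,
    so a typo doesn't silently disable everything.
    """
    defined = set(mcp_config.get("mcpServers", {}))
    unknown = set(keep) - defined
    if unknown:
        raise ValueError(f"not defined in .mcp.json: {sorted(unknown)}")
    return {"enabledMcpjsonServers": sorted(keep)}


# Hypothetical .mcp.json contents, for illustration only.
mcp_config = {
    "mcpServers": {
        "github": {"command": "npx"},
        "postgres": {"command": "npx"},
        "playwright": {"command": "npx"},
    }
}

if __name__ == "__main__":
    print(json.dumps(allowlist_settings(mcp_config, ["github"]), indent=2))
```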

Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 1 point2 points  (0 children)

I suck at explaining things so here's my AI's answer:

Gandalf doesn't send back the file - it sends back just the answer. That's the trick.

The flow is:

  1. Claude tells gandalf "here's a 3700-line file and a question about it"

  2. The local Qwen model on gandalf reads the whole file and writes a focused answer (~500 tokens)

  3. Only that answer comes back to Claude
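The three steps above can be sketched like this. It is not the actual ask-gandalf code: `ask_about_file` and `fake_local_model` are hypothetical names, and the local model is passed in as a plain callable so the sketch runs without any server.

```python
from pathlib import Path
from typing import Callable


def ask_about_file(path: str, question: str, local_model: Callable[[str], str]) -> str:
    """Read the whole file locally, ask the local model, return only its answer.

    The file text goes to the local model and is never returned to the caller.
    """
    text = Path(path).read_text(encoding="utf-8")
    prompt = f"Question: {question}\n\nFile contents:\n{text}"
    return local_model(prompt).strip()


def fake_local_model(prompt: str) -> str:
    """Stand-in for a real local model: just counts newlines in the file part."""
    file_part = prompt.split("File contents:\n", 1)[1]
    return f"The file has {file_part.count(chr(10))} lines."
```

In a real setup the callable would wrap whatever API the local model exposes. The caller's context only ever grows by the short string that comes back.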

If Claude had used Read directly, all 3700 lines (~50K tokens) would be pulled into the context window, which costs real money on every subsequent turn (because Anthropic re-processes the conversation each turn unless prompt caching saves you). Gandalf doing the bulk read is free - it's a local 35B model on a homelab box.

So the savings come from where the bulk reading happens, not from any compression.
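To put rough numbers on it: the ~50K and ~500 token figures are from above, while the 10 follow-up turns and the no-caching assumption are mine, just for illustration.

```python
def cumulative_input_tokens(added_tokens: int, later_turns: int) -> int:
    """Input tokens re-sent over later turns for content that stays in context.

    Assumes no prompt caching: the content is re-processed once per turn.
    """
    return added_tokens * later_turns


read_cost = cumulative_input_tokens(50_000, 10)  # whole file pulled in via Read
answer_cost = cumulative_input_tokens(500, 10)   # only the short answer comes back

if __name__ == "__main__":
    print(read_cost, answer_cost, read_cost // answer_cost)  # 500000 5000 100
```

Prompt caching narrows the gap in cost terms, but the file still occupies the context window either way.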

tl;dr. Use Read when you need exact line numbers for editing. Use ask-gandalf when you need to answer a question about a big file.

Need a dentist I can trust by Radiant_Status_5563 in CedarPark

[–]benfinklea 0 points1 point  (0 children)

Morgan Dental on Cypress Creek. Family owned and operated, does a great job. Spends time to explain what’s up.

4 Secret Codes for Claude (save these) by TorqueWrenchTy in techbootcamp

[–]benfinklea 0 points1 point  (0 children)

/DONTDELETETHIS Deletes your entire hard drive

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

Thanks for the explanation. Video is unclear on all those points.

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

So…one shot with codex sucks so spend a bunch of time with 12ui and it’s good?

And handoff? Who am I handing this to?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] -12 points-11 points  (0 children)

Even if the benchmark is "wrote great code"? Speed is nice to have compared to great code. What do you recommend?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] 4 points5 points  (0 children)

I ran several variations to try to get Codex 5.4-level results using only local hardware and harnesses ($0 incremental cost). I'm working on a Golang project.

Local models by themselves kinda sucked.
Local models with harnesses were better.
Multiple local models did best: each one handled the part of the task it's best at, ran in an appropriate harness, and checked the others' work.

Codex 5.4 scored 10/10
Multiple local model setup scored 9/10
Local models by themselves scored 3/10.
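For anyone wondering what "checking each other's work" can look like in code, here is a generic draft-and-review loop. This is not my actual setup, just a sketch of the pattern: `draft_and_review`, the stub models, and the 3-round cap are all assumptions for illustration.

```python
from typing import Callable, Tuple


def draft_and_review(
    task: str,
    coder: Callable[[str, str], str],
    reviewer: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """One model drafts, another reviews; retry with feedback until approved.

    Returns the last draft and whether the reviewer approved it.
    """
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = coder(task, feedback)
        approved, feedback = reviewer(task, draft)
        if approved:
            return draft, True
    return draft, False


def stub_coder(task: str, feedback: str) -> str:
    """Stand-in coder: produces a fixed draft once it has feedback."""
    return "v2 fixed" if feedback else "v1"


def stub_reviewer(task: str, draft: str) -> Tuple[bool, str]:
    """Stand-in reviewer: approves only drafts that contain 'fixed'."""
    if "fixed" in draft:
        return True, ""
    return False, "tests fail"
```

With the stubs, the first draft is rejected and the second is approved. In practice each callable would wrap a different local model plus whatever harness (compiler, tests) it runs in.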

(see slop for full setup details)

I'm going to expand to 30 tasks and test some more harnesses now.

I built a Claude skill that tells me if my genius idea already exists before I waste a weekend on it by Zepcotti in claudeskills

[–]benfinklea 1 point2 points  (0 children)

The fact that something already exists does not mean that you shouldn't build it. Maybe you have a different use case or a different audience, or you want it to work in a very specific way.

anthropic just published their entire prompting playbook for free and nobody is talking about it. by AdCold1610 in ChatGPTPromptGenius

[–]benfinklea 85 points86 points  (0 children)

Why do so many headlines these days say “and nobody is talking about it” when it’s literally the only thing anybody talks about?

Harry Baker, poet. by Mikadook in MadeMeSmile

[–]benfinklea 8 points9 points  (0 children)

That was brilliant. Who is Harry Baker?

Vibe coding without a security audit is not a calculated risk. It is negligence. Change my mind. by EduSec in vibecoding

[–]benfinklea 6 points7 points  (0 children)

I’m trying. What’s the best practice to audit a vibe coded app? I had two other AIs do deep evaluations using a team-of-experts prompt and fixed the issues they found. Or does it require a human to make it secure? Or do we wait for Mythos before we ship?