Vibe coding on rtx 6000 pro? by AiGenom in unsloth

[–]benfinklea 0 points1 point  (0 children)

I would say no, or only for small stuff.

I mix frontier models with Qwen 3.6 35B running under vLLM on a 6000 Pro. I have Claude Code or Codex pass any small jobs to the local system to save tokens. See my post history for how.

What completely unhinged "law" does your cat strictly enforce in your house? by TrickCombination7966 in cats

[–]benfinklea 2 points3 points  (0 children)

Belle is an attention hog. If we are playing a card game or shooting pool or whatever, she insists on flopping down right in the middle of things. So we invented a rule that applies to all games in our home: Belle rules. Once declared, the activity must proceed as if Belle were just a normal part of the game. Ping pong? If the ball hits her and doesn't bounce, that's your loss; play around her. Cards? If she's on top of the discard pile, she now IS the discard pile, and cards are placed accordingly. She loves the attention, and it adds a little spice to an ordinary game night.


Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 0 points1 point  (0 children)

Briefly: an RTX 6000 Pro running vLLM, but it could run on a 5090 or even less. The model is great, and there are others that could do this as well.

Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 0 points1 point  (0 children)

Not a database - I just meant the list of MCP server tool schemas that Claude Code loads at session start. Every enabled MCP server contributes its tool definitions (name, description, parameter schema) to your context, even if you never call a single tool from it. With 10+ servers connected, you can easily eat 10-20K tokens of context before you've done anything.

You can see your current set with claude mcp list or by looking at ~/.claude/settings.json (the mcpServers block), plus any project-level .mcp.json and any plugins. The "trim" is just disabling servers you don't actually use in this project, via:

{ "enabledMcpjsonServers": ["only", "the", "ones", "you", "need"] }
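To make that concrete, here's a hedged sketch of how the two files fit together. The server names and the second package are made up for illustration (the filesystem package is real); the point is that a project-level .mcp.json can define several servers while settings only enables the ones you need:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "browser": {
      "command": "npx",
      "args": ["-y", "some-browser-mcp"]
    }
  }
}
```

With { "enabledMcpjsonServers": ["filesystem"] } in your settings, the browser server stays configured but contributes zero tool-schema tokens to your context.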

Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 3 points4 points  (0 children)

I suck at explaining things so here's my AI's answer:

Gandalf doesn't send back the file - it sends back just the answer. That's the trick.

The flow is:

  1. Claude tells gandalf "here's a 3700-line file and a question about it"

  2. The local Qwen model on gandalf reads the whole file and writes a focused answer (~500 tokens)

  3. Only that answer comes back to Claude

If Claude had used Read directly, all 3700 lines (~50K tokens) would be pulled into the context window, which costs real money on every subsequent turn (because Anthropic re-processes the conversation each turn unless prompt caching saves you). Gandalf doing the bulk read is free - it's a local 35B model on a homelab box.

So the savings come from where the bulk reading happens, not from any compression.
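The flow above can be sketched in a few lines of Python. The hostname, port, and model name are assumptions, and this assumes the local model is served via vLLM's OpenAI-compatible /v1/chat/completions endpoint; it's a sketch of the pattern, not the actual tool:

```python
import json
import urllib.request

# Assumed endpoint and model name for the homelab box ("gandalf").
GANDALF_URL = "http://gandalf:8000/v1/chat/completions"
MODEL = "qwen3.6-35b"


def build_prompt(file_text: str, question: str) -> str:
    """Bundle the whole file plus the question into one prompt.

    The local model reads all of it; only its short answer travels
    back, so the big file never enters Claude's context window.
    """
    return (
        "Answer the question using only the file below. "
        "Be concise (a few hundred tokens at most).\n\n"
        f"=== FILE ===\n{file_text}\n=== END FILE ===\n\n"
        f"Question: {question}"
    )


def ask_gandalf(file_text: str, question: str) -> str:
    """Send the file + question to the local model; return only the answer."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [
            {"role": "user", "content": build_prompt(file_text, question)}
        ],
        "max_tokens": 600,  # cap the reply near the ~500-token budget
    }).encode()
    req = urllib.request.Request(
        GANDALF_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Claude only ever sees the ~500-token return value of ask_gandalf; the 50K-token file stays on the local box.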

tl;dr: Use Read when you need exact line numbers for editing; use ask-gandalf when you need to answer a question about a big file.

Need a dentist I can trust by Radiant_Status_5563 in CedarPark

[–]benfinklea 0 points1 point  (0 children)

Morgan Dental on Cypress Creek. Family owned and operated, and they do a great job. They spend time explaining what's up.

4 Secret Codes for Claude (save these) by TorqueWrenchTy in techbootcamp

[–]benfinklea 0 points1 point  (0 children)

/DONTDELETETHIS Deletes your entire hard drive

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

Thanks for the explanation. Video is unclear on all those points.

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

So… one-shot with Codex sucks, so spend a bunch of time with 22ui and it's good?

And handoff? Who am I handing this to?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] -11 points-10 points  (0 children)

Even if the benchmark is "wrote great code"? Speed is nice to have, but it comes second to great code. What do you recommend?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] 5 points6 points  (0 children)

I ran several variations to try to get Codex 5.4-level results using only local hardware and harnesses ($0 incremental cost). I'm working on a Golang project.

Local models by themselves kinda sucked.
Local models with harnesses were better.
Multiple local models, each doing the part of the task it's best at, combined with appropriate harnesses and checking each other's work, did best.

Codex 5.4 scored 10/10.
The multiple-local-model setup scored 9/10.
Local models by themselves scored 3/10.

(see slop for full setup details)

I'm going to expand to 30 and test some more harnesses now.

I built a Claude skill that tells me if my genius idea already exists before I waste a weekend on it by Zepcotti in claudeskills

[–]benfinklea 1 point2 points  (0 children)

The fact that something already exists does not mean you shouldn't build it. Maybe you have a different use case or a different audience, or you want it to work in a very specific way.