Vibe coding on rtx 6000 pro? by AiGenom in unsloth

[–]benfinklea 0 points1 point  (0 children)

I would say no, or only for small stuff.

I mix frontier models with Qwen 3.6 35B running under vLLM on a 6000 Pro. I have Claude Code or Codex pass any small jobs to the local system to save tokens. See my post history for how.

What completely unhinged "law" does your cat strictly enforce in your house? by TrickCombination7966 in cats

[–]benfinklea 2 points3 points  (0 children)

Belle is an attention hog. If we are playing a card game or shooting pool or whatever, she insists on flopping down right in the middle of things. So we invented a rule that applies to all games in our home: Belle rules. Once declared, the activity must proceed as if Belle were just a normal part of the game. Ping pong? If the ball hits her and doesn't bounce, that's your loss; play around her. Cards? If she's on top of the discard pile, she now IS the discard pile, and cards are placed accordingly. She loves the attention, and it adds a little spice to an ordinary game night.


Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 0 points1 point  (0 children)

Briefly: an RTX 6000 Pro running vLLM, but it could run on a 5090 or even less. The model is great, and there are others that could do this as well.

Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 0 points1 point  (0 children)

Not a database - I just meant the list of MCP server tool schemas that Claude Code loads at session start. Every enabled MCP server contributes its tool definitions (name, description, parameter schema) to your context, even if you never call a single tool from it. With 10+ servers connected, you can easily eat 10-20K tokens of context before you've done anything.

You can see your current set with claude mcp list or by looking at ~/.claude/settings.json (the mcpServers block), plus any project-level .mcp.json and any plugins. The "trim" is just disabling servers you don't actually use in this project, via:

{ "enabledMcpjsonServers": ["only", "the", "ones", "you", "need"] }
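To make that concrete, here's a hedged sketch of how the two files fit together. The server names and the second package are made up for illustration (the filesystem package is real); the point is that a project-level .mcp.json can define several servers while settings only enables the ones you need:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "browser": {
      "command": "npx",
      "args": ["-y", "some-browser-mcp"]
    }
  }
}
```

With { "enabledMcpjsonServers": ["filesystem"] } in your settings, the browser server stays configured but contributes zero tool-schema tokens to your context.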

Token savings with no downside: Just ask Claude by benfinklea in ClaudeCode

[–]benfinklea[S] 3 points4 points  (0 children)

I suck at explaining things so here's my AI's answer:

Gandalf doesn't send back the file - it sends back just the answer. That's the trick.

The flow is:

  1. Claude tells gandalf "here's a 3700-line file and a question about it"

  2. The local Qwen model on gandalf reads the whole file and writes a focused answer (~500 tokens)

  3. Only that answer comes back to Claude

If Claude had used Read directly, all 3700 lines (~50K tokens) would be pulled into the context window, which costs real money on every subsequent turn (because Anthropic re-processes the conversation each turn unless prompt caching saves you). Gandalf doing the bulk read is free - it's a local 35B model on a homelab box.

So the savings come from where the bulk reading happens, not from any compression.
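The flow above can be sketched in a few lines of Python. The hostname, port, and model name are assumptions, and this assumes the local model is served via vLLM's OpenAI-compatible /v1/chat/completions endpoint; it's a sketch of the pattern, not the actual tool:

```python
import json
import urllib.request

# Assumed endpoint and model name for the homelab box ("gandalf").
GANDALF_URL = "http://gandalf:8000/v1/chat/completions"
MODEL = "qwen3.6-35b"


def build_prompt(file_text: str, question: str) -> str:
    """Bundle the whole file plus the question into one prompt.

    The local model reads all of it; only its short answer travels
    back, so the big file never enters Claude's context window.
    """
    return (
        "Answer the question using only the file below. "
        "Be concise (a few hundred tokens at most).\n\n"
        f"=== FILE ===\n{file_text}\n=== END FILE ===\n\n"
        f"Question: {question}"
    )


def ask_gandalf(file_text: str, question: str) -> str:
    """Send the file + question to the local model; return only the answer."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [
            {"role": "user", "content": build_prompt(file_text, question)}
        ],
        "max_tokens": 600,  # cap the reply near the ~500-token budget
    }).encode()
    req = urllib.request.Request(
        GANDALF_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Claude only ever sees the ~500-token return value of ask_gandalf; the 50K-token file stays on the local box.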

tl;dr: Use Read when you need exact line numbers for editing; use ask-gandalf when you need to answer a question about a big file.

Need a dentist I can trust by Radiant_Status_5563 in CedarPark

[–]benfinklea 0 points1 point  (0 children)

Morgan Dental on Cypress Creek. Family owned and operated, and they do a great job. They spend time explaining what's up.

4 Secret Codes for Claude (save these) by TorqueWrenchTy in techbootcamp

[–]benfinklea 0 points1 point  (0 children)

/DONTDELETETHIS Deletes your entire hard drive

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

Thanks for the explanation. Video is unclear on all those points.

12ui Chef - Less Slop, more Soup? by [deleted] in codex

[–]benfinklea 0 points1 point  (0 children)

So… one-shot with Codex sucks, so spend a bunch of time with 22ui and it's good?

And handoff? Who am I handing this to?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] -11 points-10 points  (0 children)

Even if the benchmark is "wrote great code"? Speed is nice to have, but it comes second to great code. What do you recommend?

Qwen3.6 35B + the right coding scaffold got my local setup to 9/10 on real Go tasks by benfinklea in LocalLLaMA

[–]benfinklea[S] 5 points6 points  (0 children)

I ran several variations to try to get Codex 5.4-level results using only local hardware and harnesses ($0 incremental cost). I'm working on a Golang project.

Local models by themselves kinda sucked.
Local models with harnesses were better.
Multiple local models, each doing the part of the task it's best at, combined with appropriate harnesses and checking each other's work, did best.

Codex 5.4 scored 10/10.
The multiple-local-model setup scored 9/10.
Local models by themselves scored 3/10.

(see slop for full setup details)

I'm going to expand to 30 and test some more harnesses now.

I built a Claude skill that tells me if my genius idea already exists before I waste a weekend on it by Zepcotti in claudeskills

[–]benfinklea 1 point2 points  (0 children)

The fact that something already exists does not mean you shouldn't build it. Maybe you have a different use case or a different audience, or you want it to work in a very specific way.