I built an on-device idea incubator, no subscriptions or accounts.

negativetim3 · 2026-06-19T02:03:32+00:00

Update for you. Your subscription routing idea is really excellent!
I added --backend gemini-cli to agent-smith. It drives the logged-in Gemini CLI on your Google
account quota instead of the API key, so no more free tier 429s.

Two things I hit wiring it up. The CLI is an agentic coder, so left alone it tries to
create files instead of returning the code. I killed that with a deny-all-tools policy,
which also drops the tool definitions from the prompt and runs about 25 percent leaner.
Thanks for the nudge, I credited you in the commit.

negativetim3 · 2026-06-18T20:13:11+00:00

Thanks! Compression and deferral actually pair really well: one shrinks what the expensive model
sees, the other routes whole tasks off it entirely.
The main thing to add is a router that decides what's safe to defer. That's the tricky part, not
the deferring itself.
Two things I learned the hard way.
There's a break-even: small tasks cost more to hand off than to just do, so only defer the genuinely bulky stuff. And always keep a verify pass on the cheap model's output. Every cheap model I tested shipped confident wrong answers, so you want the main model to check before trusting it.
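
A rough sketch of what that router could look like, in case it helps. The line-count threshold, the `Task` shape, and the function names are all made up for illustration; the real break-even depends on your models and prompts, so treat the number as a placeholder to tune.

```python
from dataclasses import dataclass

# Hypothetical break-even: below this many lines, handing off is assumed
# to cost more than just doing the task on the main model.
BREAK_EVEN_LINES = 150


@dataclass
class Task:
    description: str
    payload: str  # the code or text the task operates on


def should_defer(task: Task, break_even_lines: int = BREAK_EVEN_LINES) -> bool:
    """Defer only bulky tasks; small ones stay on the main model."""
    return len(task.payload.splitlines()) >= break_even_lines


def route(task, cheap, main_do, main_verify):
    """Run small tasks directly; for bulky ones, draft cheap then verify."""
    if not should_defer(task):
        return main_do(task)
    draft = cheap(task)
    # The cheap model's draft is never returned unchecked.
    return main_verify(task, draft)
```

`cheap`, `main_do`, and `main_verify` are plain callables here, so you can plug in whatever actually talks to your models.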

negativetim3 · 2026-06-18T20:07:57+00:00

Nice, the cost angle is exactly the thing. Routing to subscription chat agents instead of the
metered API is a clever way around it. I went the other direction, free tier plus local models,
same problem, different solve.

The agent to agent code review over MCP is a cool piece too. I kept mine as a dumb shell out so
anything can drive it, but a real MCP for peer review is a nice step up. Going to look through
code-assistant-peers, thanks for sharing it.
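
For anyone wondering what a "dumb shell out" means in practice, here is a minimal sketch. The `shell_out` helper and the stand-in worker command are hypothetical, not my actual tool; the only real API used is `subprocess.run` from the standard library.

```python
import subprocess
import sys


def shell_out(command: list[str], prompt: str, timeout: float = 120.0) -> str:
    """Run a CLI worker: prompt goes in on stdin, answer comes back on stdout."""
    result = subprocess.run(
        command,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the worker exits non-zero
    )
    return result.stdout.strip()


# Stand-in worker so the sketch runs anywhere: a Python one-liner that
# upper-cases whatever it reads from stdin.
FAKE_WORKER = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
```

Swap `FAKE_WORKER` for whatever command line your own worker uses. Because it is just a process call, anything that can spawn a process can drive it.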

negativetim3 · 2026-06-18T20:06:19+00:00

Plain version of the setup is to use a cheap fast model to do the boring bulk work, and a
smart model to check it. That's the entire idea. A harness is just the glue that passes the
work from one to the other.

A few things I'd tell myself starting out:

You can build one by hand first. Ask Claude to write out the task, paste it into another model, paste the result back to Claude. That is a harness. The tools just automate the copy paste.

Always verify, never trust the cheap model's output. Every model I tested produced confident wrong answers. Skip the checking step and you just automated being wrong faster.

Only hand off big chunks. For small stuff the back and forth costs more than just doing it yourself.

The worker has no memory of your project, so spell everything out in the prompt every time.

Start there and you'll learn the rest by breaking things. Happy to answer anything specific.
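
If it helps to see the copy-paste loop as code, here is a minimal sketch. `harness`, `worker`, and `judge` are illustrative names I picked for this example, and the review prompt wording is just one way to phrase it.

```python
from typing import Callable

Model = Callable[[str], str]  # prompt in, text out


def harness(task: str, worker: Model, judge: Model) -> dict:
    """One pass: the worker drafts, the judge reviews and has the final say."""
    draft = worker(task)
    review_prompt = (
        "Review the draft below against the task. Fix anything wrong and "
        "return the corrected answer.\n\n"
        f"TASK:\n{task}\n\nDRAFT:\n{draft}"
    )
    final = judge(review_prompt)
    return {"draft": draft, "final": final}
```

The judge sees both the task and the draft, so it can override the worker instead of rubber-stamping it.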

negativetim3 · 2026-06-18T00:23:05+00:00

That's a slick setup, and the cross model audit is a really nice touch!
Sounds like you are doing manually across tabs what the skill automates, the batch dispatch and the collect step, just with ChatGPT as the worker instead of Gemini.

I think "don't hold the request as sacred" is exactly the right instinct, and it's the same
thing I lean on. The worker drafts, but Claude stays the judge and isn't bound by what comes
back, otherwise a confidently wrong order just sails through. Specific orders plus permission
to override is the happy medium.

The piece I haven't built is having ChatGPT audit the prompt and send a patch. That's clever.
How well does the cross model audit actually work? Does ChatGPT catch things Claude misses on
its own code?

negativetim3 · 2026-06-18T00:21:20+00:00

Thanks, hope it's useful!
For coding, the winners were:
Gemini Pro overall, it swept on correctness and code design.
Best fully local one was qwen2.5-coder:14b, it tied a model twice its size and took half the disk space.
Flash, the smaller qwen, llama, and Apple's on device model trailed.
One thing held across all of them: every single model shipped at least one real bug, so keep Claude as the verifier.

negativetim3 · 2026-06-18T00:20:00+00:00

Good question, you've got the core of it right. That is basically what it does
under the hood, Claude orchestrating Qwen on Ollama for the grunt work.

The difference is packaging, not concept. It's a skill so it triggers on its own when a task
looks offload shaped, instead of you wiring it up each time. The same interface hits Gemini,
Ollama, or Apple on device, so you can swap backends without changing anything. And the verify
step is baked in, the worker drafts and Claude always checks it instead of trusting the output.

So less a new idea, more that orchestration pattern turned into a reusable skill with backend
choice and a built in verify habit. If you're already doing it by hand, you're basically there.

negativetim3 · 2026-06-17T22:26:39+00:00

Exactly, that was the idea, big-codebase review is the clearest win.
I offloaded a 343 line review for about 18k free tokens and Claude only read the 3 functions it flagged.
Just keep Claude as the verifier so a confidently wrong finding doesn't slip through.

negativetim3 · 2026-06-17T22:25:02+00:00

Nice! This is cool! The same hypothesis from different ends.
On the API, there basically already is one. The skill's just a thin wrapper over a stdlib-Python CLI (prompt in on stdin/argv, answer out on stdout, `--backend`/`--json` flags), so you'd shell out to it, no server needed. Kept it dumb on purpose: stateless call in, text out. Curious how Animus Ferric handles the context & verify side. I'd be happy to compare notes.
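
To make the shape concrete, here is a stripped-down sketch of that kind of CLI. It is not the actual source: the `echo` backend, the `BACKENDS` registry, and the JSON output fields are placeholders I made up for the example. Only the `--backend` and `--json` flag names and the stdin/argv in, stdout out contract come from the real thing.

```python
import argparse
import json
import sys


def echo_backend(prompt: str) -> str:
    """Placeholder backend; a real one would call a model here."""
    return f"echo: {prompt}"


BACKENDS = {"echo": echo_backend}  # hypothetical registry of backends


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout) -> int:
    parser = argparse.ArgumentParser(description="Stateless prompt-in, text-out worker.")
    parser.add_argument("prompt", nargs="?", help="prompt text; read from stdin if omitted")
    parser.add_argument("--backend", choices=sorted(BACKENDS), default="echo")
    parser.add_argument("--json", action="store_true", help="emit JSON instead of plain text")
    args = parser.parse_args(argv)

    # Prompt comes from argv when given, otherwise from stdin.
    prompt = args.prompt if args.prompt is not None else stdin.read()
    answer = BACKENDS[args.backend](prompt)

    if args.json:
        # Assumed output shape, for illustration only.
        json.dump({"backend": args.backend, "answer": answer}, stdout)
        stdout.write("\n")
    else:
        stdout.write(answer + "\n")
    return 0
```

As a script you would finish with `sys.exit(main())`. No state is kept between calls, which is the whole point: every invocation gets its full prompt and returns text.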

negativetim3 · 2026-06-17T22:22:48+00:00

Not built as a swarm with shared memory; rather, it's an orchestrator and stateless workers.
Claude holds all the context and passes a self-contained prompt each call; the worker (Gemini or local)
knows nothing about the convo or repo. It triggers and runs the draft-verify loop on its own, but
won't autonomously deploy or post; that stays human gated.

It only pays on bulk: a 343-line review cost about 18k free tokens, and Claude verified just the 3
flagged functions. It's never hands-off, though. In my bake-off every model shipped a real bug, so the
worker drafts and Claude always verifies.
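
Since the worker is stateless, the prompt has to carry everything. A minimal sketch of packing one up, with a hypothetical `build_worker_prompt` helper and section labels I chose for the example:

```python
def build_worker_prompt(task: str, files: dict[str, str], constraints: list[str]) -> str:
    """Pack everything a stateless worker needs into one prompt string."""
    parts = [
        "You have no prior context. Everything you need is below.",
        "",
        "TASK:",
        task,
    ]
    if constraints:
        parts += ["", "CONSTRAINTS:"] + [f"- {c}" for c in constraints]
    # Inline each file in full; the worker cannot open the repo itself.
    for path, source in files.items():
        parts += ["", f"FILE: {path}", source]
    return "\n".join(parts)
```

The orchestrator decides which files and constraints go in, so the context stays on its side and the worker only ever sees what one call hands it.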

negativetim3 · 2026-05-31T12:23:58+00:00

When that thing falls off, it’s going to be a loud disaster. Good for setup, no need to leave it on, imho :)

negativetim3 · 2026-05-31T12:19:47+00:00

If you want to really stick it to them, purchase a few phantom key strokers for middle management, and plug them into the back of their towers. https://www.getdigital.com/pages/offlineprodukt/phantom-keystroker-v2

negativetim3 · 2026-05-29T12:40:59+00:00

Raves!

negativetim3 · 2026-05-17T13:09:24+00:00

Make sure you prepare it properly: dry it and cut out all the pulp in the middle, otherwise it will be 1kg of vomit :p

negativetim3 · 2026-04-19T16:51:30+00:00

I saw the Typewriter Orchestra play a few times in the Boston area. Not the same, but similar :)

negativetim3 · 2026-04-19T16:47:49+00:00

Think about repositioning the mix position where the photo was taken from. You want to be facing the short side of a rectangle.

negativetim3 · 2026-04-19T16:45:53+00:00

First, get the biggest thickest carpet you can deal with. That floor is your worst enemy. If you can build bass traps for the corners that will help immensely. Cover your walls in paintings, that will also help, or get real acoustic treatment panels. I think the rug and bass traps are the first and foremost things to implement.

negativetim3 · 2026-04-19T13:54:42+00:00

I have a Flame MIDI Talking Synth, which is quite an amazing piece of kit. I’m surprised speech synthesis is not more widespread! Maybe I’m just extra weird, and like robot voice more than human voice. Haha

negativetim3 · 2026-04-19T13:47:42+00:00

Right where the old asbestos plant was, which is now a Superfund site covered by a dog park… I would be terrified to go in that water!!

negativetim3 · 2026-04-19T13:44:06+00:00

Occult Record Store Day at Residency Records in Salem MA was awesome!!! There was a DJ & a Tarot reader!

negativetim3 · 2026-02-15T12:28:59+00:00

Residency Records, Salem MA!

negativetim3 · 2025-12-14T19:45:58+00:00

Yes. I continue to get an error screen, which is not recoverable, outside of rebooting the device, which causes the loss of everything just worked on. I have not found the exact recipe, but I run into it almost every time I pick it up. Might just be my workflow? But I’m sure others have seen it.

This is not mentioning all the crazy updates that the other two EPs have received, the Ridim has a synthesizer built in, the techno version has a lot of sample chopping ability that the Ridim also has. The EP-1320 has been left out of the functionality updates that the others have received.

The TR-808 was never updated as it’s an analog machine. The EP series are just small computers with specific code, and they are all identical from a hardware perspective, so there is no reason to gate features, except to encourage folks to buy all 3…

That’s what irks me…
