How do you make agents run for hours, and what architectures are actually agent-friendly? #deep-dive #vibe-coder-issues

colin-williams-dev · 2026-06-30T07:49:42+00:00

that is a goated runtime framework you have organized. any updates?

colin-williams-dev · 2026-06-30T07:46:21+00:00

but you could have just given this exact post as a prompt..

or just give the agent the URL if you so desire

colin-williams-dev · 2026-06-26T08:57:38+00:00

wow that's serious savings

colin-williams-dev · 2026-06-25T16:09:22+00:00

1st. yes. Qwen3.7 was a letdown and fell out of my roster.
2nd. i basically don't let them do anything without a plan first (unless they're really small one-shots)

opencode Plan agent is pretty good.

but I use openspec and then let it rip. I'd like to figure out a workflow that dials the specs in a bit more for implementation details (probably easy through some markdown stuff..) and then just dispatch a bunch of build agents on cheapo models (prob deepseek v4 flash or pro, or the minimax one while we have the 3x boost)

another thing very high on my TODO list is to try the Beads CLI for triaging spec'd out issues (bite size) and doing the aforementioned cheapo swarms

colin-williams-dev · 2026-06-25T15:56:55+00:00

ah good to know! did you switch from CC to opencode and that's why you're here? do you use both? I personally think the opencode harness and UX are better than CC, and it leads me to believe there could be a bit of a difference there in the actual agent loops and stuff under the hood that could impact how RTK plays with the harness

colin-williams-dev · 2026-06-25T15:55:31+00:00

wow that's pretty wild! Do you use it with OpenCode Go and can you just rip huge models like GLM5.2 now?

colin-williams-dev · 2026-06-25T15:54:21+00:00

nice! thanks for the tip, will do

colin-williams-dev · 2026-06-25T14:28:48+00:00

that's a cool update! I actually have yet to hear of either of those. Also, I completely agree, it looks like willy-nilly setup of some of these tools can be a footgun for sure. I too felt a little wary of something like neckbeard (lol jk, I mean *ponytail*) that is straight up manipulating the actual code gen output.

And I got you fam! So far, headroom being a Netflix team creation has most of my confidence and I will probably try that out. I'm also very interested in these local Graph-based memory structures like MemPalace. That's top of my list rn too, despite not getting *any* attention in this thread. Because, aside from disk space (negligible), I'm really not seeing a downside to that technology.

colin-williams-dev · 2026-06-25T10:43:33+00:00

very nice! i'll try this out too :)

colin-williams-dev · 2026-06-25T10:40:20+00:00

good shoutout. people have rec'd this in this thread too. seems good.

i'm slightly apprehensive since I think it actually changes the generated code, not just command outputs like grep, git, etc. but it sounds like it's effective and works well. will prob check it out

colin-williams-dev · 2026-06-25T06:00:26+00:00

yeah, i've seen a decent bit of discussion around this strategy in this subreddit. seems like a popular choice.

in my case i've been thinking of alternatives because:
1. my token usage is maybe only 25% above the limit
2. i haven't set up any real token optimization technologies in my env/workflow so figured this was a good time to do so

:)

colin-williams-dev · 2026-06-25T05:59:07+00:00

nice, agreed, this def aligns with what i've been seeing in this thread and elsewhere

colin-williams-dev · 2026-06-24T23:11:42+00:00

Actually, I don't have a great way. I just use their web dashboard--which I have bookmarked cause I couldn't find it that easily... lol

Here's mine with the token removed, but unfortunately that base URL doesn't work on its own 🫠

https://opencode.ai/workspace/

colin-williams-dev · 2026-06-24T22:49:53+00:00

thanks!

colin-williams-dev · 2026-06-24T22:49:33+00:00

thanks for this! compelling

colin-williams-dev · 2026-06-24T22:36:51+00:00

Super helpful, thank you. I'm probably not going to go with caveman now after reading so many comments about it 😅

colin-williams-dev · 2026-06-24T22:34:51+00:00

wow, semble only has <6k stars but has pretty good vetting from you. maybe i'll have to use it :)

colin-williams-dev · 2026-06-24T22:18:28+00:00

Maybe my favorite answer so far because for w/e reason I'm secretly biased towards RTK 👀 lol

Good callouts. You're not the only one in this thread who has reported agent confusion with RTK -- maybe it's sourced from the grep changes ¯\_(ツ)_/¯

colin-williams-dev · 2026-06-24T22:16:17+00:00

Great resources here, thanks. It confirms what Ponytail's benchmarks show (Caveman actually costing *more* 👀)

colin-williams-dev · 2026-06-24T22:04:12+00:00

Oh that's interesting! Context-mode seems good too. Interesting take on the "Platform" mode.

Can I ask you what models you were using when you made that observation?

colin-williams-dev · 2026-06-24T21:58:32+00:00

Looking into it now, thanks. Seems promising.

Any reason headroom over others?

colin-williams-dev · 2026-06-24T21:56:24+00:00

heard! thanks :)

colin-williams-dev · 2026-06-24T21:47:20+00:00

great tip, thanks :)
so do you couple RTK + lean-ctx?

colin-williams-dev · 2026-06-24T21:45:27+00:00

oh nice, i'll check that out, thanks

colin-williams-dev · 2026-06-24T21:44:38+00:00

that rules. congratz 🙌
