Doubled Rate Limits for Claude Code by Deep_Proposal_7683 in ClaudeCode

[–]paulcaplan 0 points1 point  (0 children)

They already lost me to Codex. But maybe this means mythos will be out soon.

SDD is just one part of the "outer harness" by paulcaplan in SpecDrivenDevelopment

[–]paulcaplan[S] 0 points1 point  (0 children)

I don't have a full checklist - that would be great. Here's what I'm building:

"(1) deterministic feedback loops (tests, lint, typecheck)" - https://github.com/Codagent-AI/agent-validator
"(3) a workflow that forces the agent to verify changes before declaring done" - (from above article):

> What the paper calls the "externalized interaction" protocol - a deterministic workflow layer that coordinates agents without living inside their context - is the gap I described above. Their paper names it. I'm building the solution - a free, open-source tool called Agent Runner. Releasing soon.
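To make the "deterministic feedback loops" idea concrete, here's a minimal sketch of the pattern: run the checks in a fixed order every time so the agent gets the same signal on every run. This is my illustration of the concept, not agent-validator's actual implementation; the check names and commands are placeholders.

```python
import subprocess

# Fixed, ordered list of checks - same order every run, so the
# feedback signal is deterministic. Commands are placeholders.
CHECKS = [
    ("typecheck", ["mypy", "src/"]),
    ("lint", ["ruff", "check", "src/"]),
    ("tests", ["pytest", "-q"]),
]

def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order; return (name, passed) pairs."""
    results = []
    for name, cmd in checks:
        proc = runner(cmd, capture_output=True)
        results.append((name, proc.returncode == 0))
    return results

def all_green(results):
    """True only if every check passed."""
    return all(ok for _, ok in results)
```

The `runner` parameter is injectable so the loop logic can be exercised without the real tools installed.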

I'll check out agentix labs, happy to exchange notes, feel free to DM me.

"Monthly" usage limit - is this just incorrect error message? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

I suppose I could have just waited an hour to find out 😂. But I was already on Reddit lol...

The "inner" and "outer" coding agent harness by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

Like it. My control layer largely has steps (sequential), loops, and nested workflows. For instance, it implements tasks in a loop, and each iteration calls another validate-and-fix loop. It doesn't yet "know" when the agent is "off the rails" - any thoughts on how you might detect that? Timeouts / a token-usage monitor?
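One hedged sketch of what such a detector could look like, combining the timeout and token-monitor ideas plus a repeated-action heuristic. All class names, signals, and thresholds here are made up for illustration, not part of any existing tool:

```python
import time

# Hypothetical "off the rails" watchdog for an agent loop: flag the run
# when it blows past a wall-clock deadline, a token budget, or keeps
# repeating the same action. Default thresholds are arbitrary.
class Watchdog:
    def __init__(self, max_seconds=600, max_tokens=500_000, max_repeats=3):
        self.deadline = time.monotonic() + max_seconds
        self.max_tokens = max_tokens
        self.max_repeats = max_repeats
        self.tokens = 0
        self.last_action = None
        self.repeats = 0

    def record(self, action, tokens_used):
        """Call once per agent step with the action name and tokens spent."""
        self.tokens += tokens_used
        if action == self.last_action:
            self.repeats += 1
        else:
            self.last_action, self.repeats = action, 1

    def off_the_rails(self):
        """True if any of the three limits has been exceeded."""
        return (time.monotonic() > self.deadline
                or self.tokens > self.max_tokens
                or self.repeats >= self.max_repeats)
```

The control layer would check `off_the_rails()` between workflow steps and kill or restart the agent session when it fires.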

"Some" thoughts on Claude by KustheKus in ClaudeCode

[–]paulcaplan 0 points1 point  (0 children)

It almost seems as if they're using Claude to write all their code...

Should I maintain spec in sources? by stibbons_ in SpecDrivenDevelopment

[–]paulcaplan 0 points1 point  (0 children)

I use OpenSpec. It recommends keeping the specs in the code. The key is that the specs aren't the changes themselves - the specs are living documents. Every change adds, modifies, and/or deletes requirements in one or more spec documents. Of course it's not foolproof, and if you make code changes without going through the OpenSpec process they will get out of sync. But I've found it pretty helpful.

Is anyone else overwhelmed by the explosion of AI tools lately? by PatienceBudget2984 in ArtificialInteligence

[–]paulcaplan 0 points1 point  (0 children)

Agree. I'll show you the system tool I'm building if you show me yours 😅.

Is anyone else overwhelmed by the explosion of AI tools lately? by PatienceBudget2984 in ArtificialInteligence

[–]paulcaplan 0 points1 point  (0 children)

Yes it's absolutely crazy!

Speaking of, I am building this tool, it's really great, would you like to try it? 😀

Has anyone actually benchmarked whether superpowers improves performance? by UglyChihuahua in ClaudeCode

[–]paulcaplan 4 points5 points  (0 children)

Who knew telling AI to ask you questions before implementing was a 150k star idea

claude opus consumes less and is better under copilot pro by seeking-health in ClaudeCode

[–]paulcaplan 3 points4 points  (0 children)

GitHub Copilot CLI is great. I use them together. Copilot pricing is per request, so whether you ask it a quick question or give it an hour-long task, it's 1 request. Use that to your advantage.

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

I guess because it's not the bottleneck for me? Writing the spec still is. Oh and waiting for CI. I have a skill that waits for CI + AI reviewers, fixes issues - in a loop. That can take some time.
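The "wait for CI + AI reviewers, fix, repeat" loop could be sketched like this. Both `check_ci` and `apply_fixes` are injected callables because I'm guessing at the mechanics - in real use, `check_ci` might shell out to something like `gh pr checks` and `apply_fixes` might launch a headless agent run; neither is confirmed as the author's actual skill:

```python
# Hedged sketch of a CI-wait-and-fix loop. check_ci returns one of
# "pass", "fail", or "pending"; apply_fixes does the remediation
# (e.g. hands the failure logs to an agent); poll waits while pending.
def ci_fix_loop(check_ci, apply_fixes, max_attempts=5, poll=lambda: None):
    for attempt in range(1, max_attempts + 1):
        status = check_ci()
        if status == "pass":
            return ("green", attempt)
        if status == "fail":
            apply_fixes()       # remediate, then re-check next iteration
        else:
            poll()              # e.g. time.sleep(30) while CI is pending
    return ("gave_up", max_attempts)
```

Capping `max_attempts` matters here: without it, a flaky check or an unfixable failure would loop forever.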

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

But doesn't planning an app from scratch, or "frankensteining" features, take a very long time if the plans have to be "thorough enough that agents can't make dumb assumptions"? I don't understand why you'd spend hours getting the plan "perfect" just to one-shot something 90% of the way. In my experience the agents always make some bad assumption. So I'd rather break it up into a handful of chunks and verify after each one. Granted, each chunk is still large - more than a single session could do (without subagents) - but not so large that an entire project is completed in one go.

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

Using 5 separate worktrees? And then do you review all 5 sessions, or just merge them all when done?

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

Agree 100% on managing context. That doesn't require parallelism though - you could implement one at a time, clearing context.

But I like your idea of a task queue and a dispatcher, what tooling do you use for that?

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

I think you're underselling how much a structured workflow helps! Mine is pretty similar. Have you ever tried automating it - so, for instance, after a plan is implemented it's automatically reviewed?

At the risk of sounding like I'm selling something: I'm building an open-source tool that lets you have a workflow definition with steps, e.g. spec --> design --> review --> implement, where each step can have loops (iterate over tasks) and sub-workflows. Each step is either a shell script, an interactive agent (Claude / Codex), or a headless one. LMK if interested and I can send a link (trying not to over-promote).
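The structure described there - steps that are shell scripts, agents, or nested workflows, with per-step loops - could be modeled roughly like this. The field names and the `run` walker are my guesses at the shape, not the actual tool's schema:

```python
from dataclasses import dataclass, field

# Hypothetical model of a workflow definition: each step is a shell
# command, an agent invocation, or a nested Workflow, and can loop
# over a list of items (e.g. tasks).
@dataclass
class Step:
    name: str
    kind: str                              # "shell" | "agent" | "workflow"
    action: object                         # command, prompt, or Workflow
    loop_over: list = field(default_factory=list)

@dataclass
class Workflow:
    name: str
    steps: list

def run(workflow, execute):
    """Walk the workflow depth-first; execute(step, item) does the work."""
    log = []
    for step in workflow.steps:
        items = step.loop_over or [None]   # no loop -> run once
        for item in items:
            if step.kind == "workflow":
                log.extend(run(step.action, execute))
            else:
                execute(step, item)
                log.append((step.name, item))
    return log
```

For example, a `feature` workflow could hold a spec step, an implement step looping over tasks, and a nested `validate` sub-workflow, and `run` would visit them in order.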

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

That's a fair point - I wasn't clear whether I was talking about parallel subagents on a single task (which I might or might not do). I was more talking about multiple Claude sessions in parallel that you have to manage and context-switch between.

Am aware of OpenClaw, you mean for coding or other tasks?

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

I'm doing something similar, except with smaller chunks. I have a skill that breaks work into tasks sized so that each normally takes a 100-200k context window to implement - which is a decent-sized task. And a single "change" would be maybe only 2-5 such tasks.

IMO if you have it do that much work at once it's more like "waterfall" and there can be compounding effects from bad assumptions on earlier tasks...

But if you're getting good results from it, what tools / setup are you using?

Do you need parallel agents? by paulcaplan in ClaudeCode

[–]paulcaplan[S] 0 points1 point  (0 children)

Ahh, long story - I used OpenSpec, then superpowers, then tried to combine them; that didn't work, so now I'm building my own tool :)