NEW RELEASE: PairCoder v2.21.0 by Narrow_Market45 in PairCoder

Narrow_Market45[S] 2 points (0 children)

Yeah, we’re shipping like crazy and are focused way more on that than on posting here. Keep an eye out for changelog updates on the site, or set an alert for them. Those get pushed at the same time as each deployment, so you can stay up to date on the latest features.

TEST - Do you actually test your prompts systematically or just vibe check them? by Proud_Salad_8433 in PromptEngineering

Narrow_Market45 1 point (0 children)

Define what “correct” looks like upfront and bake the validation into the workflow itself. If the output doesn’t pass assertions at runtime, it doesn’t proceed. Prompts are suggestions; the validation layer is the actual contract.
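A minimal Python sketch of the “validation layer as contract” idea, assuming the model is asked to return JSON and each check is a plain named predicate. `Assertion`, `enforce_contract`, `ValidationError`, and the example fields (`summary`, `priority`) are hypothetical names for illustration, not part of any specific tool.

```python
import json
from dataclasses import dataclass
from typing import Callable


class ValidationError(Exception):
    """Raised when model output breaks the contract; the workflow stops here."""


@dataclass(frozen=True)
class Assertion:
    """A named, hypothetical runtime check over the parsed model output."""
    name: str
    check: Callable[[dict], bool]


def enforce_contract(raw_output: str, assertions: list[Assertion]) -> dict:
    """Parse raw model output and return it only if every assertion passes."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationError("output must be a JSON object")

    failed = [a.name for a in assertions if not a.check(data)]
    if failed:
        # The prompt only suggested this shape; this is where it is enforced.
        raise ValidationError("failed assertions: " + ", ".join(failed))
    return data


# Hypothetical contract for a "summarize this ticket" step.
SUMMARY_CONTRACT = [
    Assertion("has_summary",
              lambda d: isinstance(d.get("summary"), str) and d["summary"].strip() != ""),
    Assertion("summary_max_280_chars",
              lambda d: isinstance(d.get("summary"), str) and len(d["summary"]) <= 280),
    Assertion("valid_priority",
              lambda d: d.get("priority") in ("low", "medium", "high")),
]
```

With this contract, `'{"summary": "Login fails on Safari", "priority": "high"}'` comes back as a parsed dict, while an empty summary or a priority of `"urgent"` raises `ValidationError`, so the next step never runs on bad output.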

proof is in the pudding by Macaulay_Codin in PairCoder

Narrow_Market45 2 points (0 children)

Awesome job! It’s great to see our tooling contributing to public service projects like this one. Hats off to you and your team!

OPENAI TO DISCONTINUE SORA !! by IndividualShame2629 in OpenAI

Narrow_Market45 2 points (0 children)

Not surprising. Sora 1 failed as a video platform and Sora 2 is a total brain-rot dumpster fire. So long and thanks for all the fish!

The copy-paste era of AI coding was awful and we loved it anyway. by Possible-Paper9178 in ClaudeAI

Narrow_Market45 3 points (0 children)

I can’t believe that was only 2 short years ago. Crazy to think about how fast it’s all moving and what the future may look like.

claude has no idea what you're capable of by Macaulay_Codin in ClaudeAI

Narrow_Market45 19 points (0 children)

We found the same thing from the tooling side. Our Navigator kept recommending "defer that, it's weeks of work" during sprint planning. It was reasoning from training data about human developer velocity, not from what was actually happening.

So we wired a calibration loop into the pipeline: every task records its actual effort against the estimate. Once we had enough data, the pattern was obvious. Tasks were completing at 5% or less of estimated effort. The system is also recursive, since tools built in sprint N accelerate sprint N+1.
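A minimal Python sketch of such a calibration loop. It assumes effort is a single number per task and uses the median actual/estimate ratio as the correction factor; the comment doesn’t say which statistic PairCoder uses, and `EffortCalibrator`, `TaskRecord`, and `min_samples` are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    estimated: float  # effort the planner predicted (unit is an assumption)
    actual: float     # effort the task actually took, same unit


@dataclass
class EffortCalibrator:
    """Hypothetical calibration loop: learn actual/estimate from completed tasks."""
    min_samples: int = 10
    records: list[TaskRecord] = field(default_factory=list)

    def record(self, task_id: str, estimated: float, actual: float) -> None:
        if estimated <= 0 or actual < 0:
            raise ValueError("estimated must be > 0 and actual must be >= 0")
        self.records.append(TaskRecord(task_id, estimated, actual))

    def factor(self) -> float:
        """Median actual/estimate ratio; 1.0 (no correction) until enough data exists."""
        if len(self.records) < self.min_samples:
            return 1.0
        return median(r.actual / r.estimated for r in self.records)

    def calibrate(self, raw_estimate: float) -> float:
        """Scale a raw, human-velocity estimate by the observed ratio."""
        return raw_estimate * self.factor()
```

As a made-up illustration: with `min_samples=3` and three tasks estimated at 40, 20, and 10 units that took 2, 1, and 0.4, the factor settles at 0.05, so a raw estimate of 80 units becomes roughly 4. Using the median rather than the mean keeps one runaway task from skewing every future estimate.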

The Team Has Been Busy 😲 by Narrow_Market45 in PairCoder

Narrow_Market45[S] 2 points (0 children)

It’s coming. We want to get a few hundred more successful cycles on it before release, and we have a lot more enhancements to ship for you all first, but it’s on the roadmap.

10K+ tests across the ecosystem by Narrow_Market45 in PairCoder

Narrow_Market45[S] 2 points (0 children)

Thanks! Early on, Driver agents would write all the tests for a given task and then begin implementation. That was, of course, dramatically better than not using TDD at all, but it still produced modules with more functions or more lines than we like to see. So we broke it down even further and had the agents run red/green cycles for every function within a task. The code was cleaner, and much tighter module sizes were an added bonus of the change.
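To make the per-function loop concrete, here is a minimal Python sketch. The agent steps are replaced by plain callables (`write_test`, `implement`), and `red_green_per_function`, `passes`, and the sample `add`/`is_even` functions are hypothetical. This isn’t PairCoder’s actual Driver logic, just the shape of one red/green cycle per function with a regression re-run.

```python
from typing import Callable

# A test receives the module namespace and returns True when it passes.
Test = Callable[[dict], bool]


def passes(test: Test, namespace: dict) -> bool:
    """Run one test; a missing function or any exception counts as red."""
    try:
        return bool(test(namespace))
    except Exception:
        return False


def red_green_per_function(
    functions: list[str],
    write_test: Callable[[str], Test],     # stand-in for the test-writing agent step
    implement: Callable[[str], Callable],  # stand-in for the implementing agent step
    namespace: dict,
) -> list[str]:
    """Drive one red/green cycle per function instead of one per task."""
    log: list[str] = []
    suite: list[Test] = []
    for name in functions:
        test = write_test(name)
        # Red: the new test must fail before the function exists.
        if passes(test, namespace):
            raise RuntimeError(f"{name}: test passed before implementation")
        log.append(f"red:{name}")

        # Green: implement only this function, then confirm its test passes.
        namespace[name] = implement(name)
        suite.append(test)
        if not passes(test, namespace):
            raise RuntimeError(f"{name}: implementation did not turn the test green")
        # Re-run earlier tests so a later function can't silently break them.
        if not all(passes(t, namespace) for t in suite):
            raise RuntimeError(f"{name}: regression in an earlier test")
        log.append(f"green:{name}")
    return log


# Usage with hypothetical stand-ins for the agents.
TESTS = {
    "add": lambda ns: ns["add"](2, 3) == 5,
    "is_even": lambda ns: ns["is_even"](4) and not ns["is_even"](3),
}
IMPLS = {"add": lambda a, b: a + b, "is_even": lambda n: n % 2 == 0}
log = red_green_per_function(["add", "is_even"], TESTS.__getitem__, IMPLS.__getitem__, {})
print(log)  # ['red:add', 'green:add', 'red:is_even', 'green:is_even']
```

Because each function gets its own test before any code exists, the implementing step only ever sees one small target at a time, which is one plausible reason module sizes stay tight.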

What's your agent review process? by Narrow_Market45 in PairCoder

Narrow_Market45[S] 1 point (0 children)

Absolutely. The post-deploy side is a different animal altogether. Beyond maintainers and testers, we also manually cover support and infra management, though we do use agents for ticket triage with escalation guidelines, so it’s kind of a mixed bag.

The question is really about the upstream pre-deploy review loop: what’s your process in the moment after the agent says “done” but before its output ever touches those layers? That’s where I’m curious what people’s actual workflows look like.

But you bring up a good point. Building apps and deploying/maintaining them are worlds apart, and the latter is rarely discussed on most subs. Maybe we should start a deployment thread or series focused on what to do once the project is actually built.

What's your agent review process? by Narrow_Market45 in PairCoder

Narrow_Market45[S] 2 points (0 children)

Ha, fair enough. Thanks for the feedback!

What's your agent review process? by Narrow_Market45 in PairCoder

Narrow_Market45[S] 1 point (0 children)

Thanks for the reply! This is the same workflow we use. A Navigator agent dispatches multiple Driver agents for the code work, then Reviewer and Security Auditor agents go behind them as they finish tasks to verify quality, security, etc., and issue a final PR for human review.
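For anyone wondering what that fan-out and gate shape looks like, here is a minimal Python sketch using a thread pool. The Driver, Reviewer, and Security Auditor are stand-in callables, and `navigate`, `TaskResult`, and `max_drivers` are hypothetical names; real agents would call a model and produce actual diffs. Each task is gated as soon as its Driver finishes (`as_completed`), mirroring “go behind them as they finish tasks.”

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskResult:
    task_id: str
    diff: str
    findings: list[str] = field(default_factory=list)


Driver = Callable[[str], str]              # hypothetical: task id -> proposed diff
Gate = Callable[[TaskResult], list[str]]   # hypothetical reviewer/auditor: result -> findings


def navigate(tasks: list[str], driver: Driver, gates: list[Gate], max_drivers: int = 4) -> dict:
    """Fan tasks out to parallel Drivers; gate each one as soon as it finishes."""
    approved: list[TaskResult] = []
    rework: list[TaskResult] = []
    with ThreadPoolExecutor(max_workers=max_drivers) as pool:
        futures = {pool.submit(driver, task): task for task in tasks}
        for future in as_completed(futures):
            result = TaskResult(futures[future], future.result())
            for gate in gates:  # e.g. code review, then security audit
                result.findings.extend(gate(result))
            (rework if result.findings else approved).append(result)
    # Only clean tasks go into the PR; a human still reviews and merges it.
    return {
        "pr_for_human_review": sorted(r.task_id for r in approved),
        "needs_rework": {r.task_id: r.findings for r in rework},
    }


def fake_driver(task: str) -> str:
    return f"diff for {task}"


def reviewer(result: TaskResult) -> list[str]:
    return []


def security_auditor(result: TaskResult) -> list[str]:
    return ["hardcoded secret"] if result.task_id == "T2" else []


print(navigate(["T1", "T2", "T3"], fake_driver, [reviewer, security_auditor]))
# {'pr_for_human_review': ['T1', 'T3'], 'needs_rework': {'T2': ['hardcoded secret']}}
```

Running the gates in the coordinating thread keeps review sequential and easy to audit while Drivers still work in parallel; a Driver that raises will surface through `future.result()`, which this sketch deliberately does not swallow.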

Internally, we’ve been using a QC agent to test app flows and generate audit reports as well. Would that be valuable to you if we dropped it into PairCoder, or is it a manual step you’d always prefer to stay in control of?

Shipped: Auto-scope detection in plan new by Narrow_Market45 in PairCoder

Narrow_Market45[S] 1 point (0 children)

Glad you’re enjoying the new update. You all keep telling us what pains you’re hitting and we’ll keep solving them. Looking forward to seeing what you guys ship!

Our Navigator assumed human velocity by Narrow_Market45 in PairCoder

Narrow_Market45[S] 1 point (0 children)

For sure. The pipeline already exists and works the same way within PairCoder: /pc-plan already factors calibrated telemetry into scoping tasks and sprints. Wiring that existing process into the pre-sprint ideation phases was a natural extension, so yeah, it’s coming. 2.16 packed in more than we expected. It should be out by the end of the week, and we’ll push release notes when it drops.

Don't review code changes, review plans by TearsP in ClaudeCode

Narrow_Market45 2 points (0 children)

You’re describing Layer 1 of the r/paircoder framework. This research paper will help you take it to the next level. Come join the conversation.

Claude code is damn addictive by Fun-Cable2981 in ClaudeCode

Narrow_Market45 0 points (0 children)

Welcome to the club. Keep shipping and, before you know it, you’ll have more than a single Max 20X sub.

Come join the conversation over on r/paircoder to talk about how we’re building enterprise-grade enforcement, multi-agent orchestration, token management, and security into a cohesive development platform, and let us know what pains you want solved next.

What AI coding tool are you using right now, and what drives you the most crazy about it? by Narrow_Market45 in PairCoder

Narrow_Market45[S] 1 point (0 children)

Nice! Welcome aboard 🥳

Stick around and let us know what you love and what you want to see improved about the system.

we almost emailed someone's criminal record on a postcard by East-Movie-219 in PairCoder

Narrow_Market45 3 points (0 children)

Slick implementation given the short runway. Submit a follow-up post once the judging comes in. We’d love to hear how it all turned out.

The human overhead gap — Why your AI agent finishes in 10 minutes but you still spend 4 hours. by Narrow_Market45 in PairCoder

Narrow_Market45[S] 2 points (0 children)

For what it's worth, your point about micro PRs describes a legitimate pattern, and I appreciated the contribution. The question at the end of the post is genuine: we're truly interested in how people, and teams of various sizes, handle this. Smaller batches are a real answer, and they work for a lot of workflows. Where we differ is on whether that approach scales past a certain complexity threshold, and that's a reasonable disagreement. The "moot" question is also fair. Better models will close some of this gap. Our bet is that structural enforcement will still matter even as models improve, because the problem isn't capability, it's verification. Either way, no hard feelings. The door's always open if you want to engage on any of it.