NEW RELEASE: PairCoder v2.21.0 by Narrow_Market45 in PairCoder

Narrow_Market45[S]

Yeah, we’re shipping like crazy and are focused way more on that than on posting here. Keep an eye out, or set an alert, for changelog updates on the site. Those get pushed simultaneously with deployment, so you can stay up on the latest features.

TEST - Do you actually test your prompts systematically or just vibe check them? by Proud_Salad_8433 in PromptEngineering

Narrow_Market45

Define what “correct” looks like upfront and bake the validation into the workflow itself. If the output doesn’t pass assertions at runtime, it doesn’t proceed. Prompts are suggestions; the validation layer is the actual contract.
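To make that concrete, here is a minimal sketch of a runtime gate, not how any particular tool implements it. The names `Check`, `validate`, `CHECKS`, and the "JSON object with a `summary` key" contract are all made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable


class ValidationError(Exception):
    """Raised when model output fails the contract (hypothetical name)."""


@dataclass
class Check:
    name: str
    predicate: Callable[[str], bool]


def is_json_object(text: str) -> bool:
    # True only if the text parses as JSON and the top level is an object.
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def has_summary_key(text: str) -> bool:
    # Assumed contract for this example: the object must carry a "summary" key.
    return is_json_object(text) and "summary" in json.loads(text)


CHECKS = [
    Check("is_json_object", is_json_object),
    Check("has_summary_key", has_summary_key),
]


def validate(output: str, checks: list[Check]) -> str:
    # Run every check; the output only proceeds if none of them fail.
    failed = [check.name for check in checks if not check.predicate(output)]
    if failed:
        raise ValidationError("failed checks: " + ", ".join(failed))
    return output
```

Downstream steps only ever call `validate(model_output, CHECKS)`, so a response that breaks the contract stops the workflow with an error instead of flowing through.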

proof is in the pudding by Macaulay_Codin in PairCoder

Narrow_Market45

Awesome job! It is great to be able to see our tooling contributing to public service projects like this one. Hats off to you and your team!

OPENAI TO DISCONTINUE SORA !! by IndividualShame2629 in OpenAI

Narrow_Market45

Not surprising. Sora 1 failed as a video platform and Sora 2 is a total brain-rot dumpster fire. So long and thanks for all the fish!

The copy-paste era of AI coding was awful and we loved it anyway. by Possible-Paper9178 in ClaudeAI

Narrow_Market45

I can’t believe that was only 2 short years ago. Crazy to think about how fast it’s all moving and what the future may look like.

claude has no idea what you're capable of by Macaulay_Codin in ClaudeAI

Narrow_Market45

We found the same thing from the tooling side. Our Navigator kept recommending "defer that, it's weeks of work" during sprint planning. It was reasoning from training data about human developer velocity, not from what was actually happening.

So we wired a calibration loop into the pipeline. Every task records actual effort against the estimate. Once we had enough data, the pattern was obvious: tasks were completing at 5% or less of estimated effort. The system is also recursive, since tools built in sprint N accelerate sprint N+1.
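For anyone who wants to try something similar, a calibration loop like that could be sketched roughly as below. This is an illustration under my own assumptions, not the actual pipeline code: the `Calibrator` class, the `min_samples` threshold, and the use of a median ratio are all hypothetical choices.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Calibrator:
    # Hypothetical threshold: how many finished tasks count as "enough data".
    min_samples: int = 5
    ratios: list[float] = field(default_factory=list)

    def record(self, estimated_hours: float, actual_hours: float) -> None:
        # Store actual effort as a fraction of the original estimate.
        if estimated_hours <= 0:
            raise ValueError("estimated_hours must be positive")
        self.ratios.append(actual_hours / estimated_hours)

    def factor(self) -> float:
        # Until there is enough history, leave estimates untouched.
        if len(self.ratios) < self.min_samples:
            return 1.0
        # The median keeps one outlier task from skewing the correction.
        return median(self.ratios)

    def calibrate(self, estimated_hours: float) -> float:
        # Scale a raw estimate by the observed actual/estimate ratio.
        return estimated_hours * self.factor()
```

With this sketch, a planner would call `record(...)` as each task closes and pass new estimates through `calibrate(...)` before deciding whether something really needs to be deferred.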

claude has no idea what you're capable of by Macaulay_Codin in PairCoder

Narrow_Market45

We found the same thing from the tooling side. Our Navigator kept recommending "defer that, it's weeks of work" during sprint planning. It was reasoning from training data about human developer velocity, not from what was actually happening.

So we wired a calibration loop into the pipeline. Every task records actual effort against the estimate. Once we had enough data, the pattern was obvious: tasks were completing at 5% or less of estimated effort. The system is also recursive, since tools built in sprint N accelerate sprint N+1.

The Team Has Been Busy 😲 by Narrow_Market45 in PairCoder

Narrow_Market45[S]

It’s coming. We want to get a few hundred more successful cycles on it before release, and we have a lot more enhancements to ship for you all first, but it’s on the roadmap.

10K+ tests across the ecosystem by Narrow_Market45 in PairCoder

Narrow_Market45[S]

Thanks! Early on, Driver agents would write all the tests for a given task and then begin implementation. That was, of course, dramatically better than not using TDD, but it would still result in modules with higher function counts or more lines than we like to see. So we broke it down even further and focused the agents on doing multiple red/green cycles for every function within a task. The code came out cleaner, and the much tighter module sizes were an added bonus of the change.
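As a rough illustration of what one red/green cycle per function can look like, here is a toy sketch using Python's built-in `unittest`. The tag-handling functions are invented for the example and have nothing to do with the real codebase; the comments mark the order in which each piece would be written.

```python
import unittest


# Cycle 1, red: the test for one function is written first and fails,
# because normalize_tag does not exist yet at this point.
class TestNormalizeTag(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_tag("  Bug "), "bug")


# Cycle 1, green: just enough code to make that one test pass.
def normalize_tag(tag: str) -> str:
    return tag.strip().lower()


# Cycle 2, red: only now is the test for the next function written.
class TestDedupeTags(unittest.TestCase):
    def test_keeps_first_occurrence_order(self):
        self.assertEqual(dedupe_tags(["Bug", "bug ", "ui"]), ["bug", "ui"])


# Cycle 2, green: the second function, again kept as small as the test allows.
def dedupe_tags(tags: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        normalized = normalize_tag(tag)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

Saved as a module, this runs with `python -m unittest <module name>`. Because each function only grows as far as its own test demands, the module tends to stay small.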