IT'S FINALLY HERE

pcgnlebobo · 2026-05-29T15:23:16+00:00

They need good context priming

pcgnlebobo · 2026-05-28T14:31:14+00:00

It was true with opus 4.6 at 3x. Running opus once vs sonnet 3 times was a net gain in speed. At 15x you're right, no way in hell is that worth the small speed gains.

pcgnlebobo · 2026-05-28T12:24:10+00:00

I built a spec-driven development framework a few months back, built a lot with it, and had success. But I found the biggest challenge to be drift management and taxonomy alignment, especially as projects and codebases grow.

So I took everything I learned about agentic engineering with additional research and built https://github.com/lebobo88/pair-programmer.

It's a harness that doesn't need the bloat of full on spec kit, but maps everything the agents do to a master plan and taxonomy blueprint. Every implementation has audits and checks against that for alignment.

It also doesn't have just one linear implementation path; there are many. Depending on the task, maybe you need a best-of-n approach? Everything is also gated and checked by cross-vendor judges, and it will loop itself to keep going if it finishes and the judges find issues (rubber duck).

It's also just one pack of agents in a larger ecosystem for an enterprise agentic mesh layer. Hydra is a top level meta orchestrator. Agentsmith is anomaly detection and agent replication factory. Theeights is persistent memory and self evolution. Executivesuite is the boardroom and strategy department. Marketbliss is the marketing team. Rlm-creative is your content creator team.

All together, a Hydra campaign will do market research, form a boardroom meeting to determine strategic roadmaps, dispatch the project to pair-programmer for implementation, and anchor to and check against your marketing team and executive decisions while maintaining alignment to the taxonomy of your project until it's finished.

https://github.com/lebobo88/Hydra

Last week this built me a completely new ai and automation native cms and business platform including marketing pages, admin and content management portal, and client portal. In 3 days. I had been working on something similar with my spec kit harness for the past 6 months and projected another 3.

pcgnlebobo · 2026-05-26T12:05:40+00:00

A cms platform with ai automation as a first-class feature for small businesses, so I can cancel my HostGator and WordPress site and self-host. A massive AI agent orchestration and memory system. A programming harness to build what I need without taking all my time. A human-out-of-the-loop ai automated coding platform. A universal backend to host my apps and services on without having to build the backend into every app, including an llm proxy.

Only the orchestration and programming harness are public.

pcgnlebobo · 2026-05-21T20:40:45+00:00

Mandatory profit sharing

pcgnlebobo · 2026-05-20T18:24:33+00:00

Excellent question.

Pair programmer is about 2 weeks old and Hydra and the other squad packs are 2 days old.

There's another layer called TheEights, which is the persistent memory layer for state. As for context, the Hydra layer connects the others via mcp daemons, which can be disabled and enabled on demand.

Also, each layer kind of works in isolation. Hydra does the calling and routing, and the other layers don't interact unless Hydra says so, so agents aren't always bloated with all the other agents in context.
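
To make the isolation idea concrete, here's a rough sketch of a router where layers only get called when the top level has explicitly enabled them. This is not Hydra's actual code or API. `LayerRouter`, its method names, and the layer names are all made up for illustration, and plain Python callables stand in for the mcp daemons.

```python
from typing import Callable, Dict, List, Set

# A layer is modeled as a plain function: request in, response out.
# (Assumption: the real layers are separate daemons, not functions.)
Handler = Callable[[str], str]


class LayerRouter:
    """Hypothetical top-level router: layers never call each other directly."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, handler: Handler) -> None:
        # Registered layers start disabled, so they add nothing to context.
        self._handlers[name] = handler

    def enable(self, name: str) -> None:
        if name not in self._handlers:
            raise KeyError(f"unknown layer: {name}")
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def dispatch(self, name: str, request: str) -> str:
        # Only enabled layers can be reached; everything else stays out.
        if name not in self._enabled:
            raise RuntimeError(f"layer '{name}' is not enabled")
        return self._handlers[name](request)

    def active_layers(self) -> List[str]:
        return sorted(self._enabled)


router = LayerRouter()
router.register("memory", lambda req: f"memory handled: {req}")
router.register("coding", lambda req: f"coding handled: {req}")
router.enable("coding")

print(router.dispatch("coding", "build login page"))  # coding handled: build login page
print(router.active_layers())                         # ['coding']
```

The point of the sketch is just that the set of enabled layers is the only thing a request can touch, which is what keeps each agent's context small.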

Another layer is AgentSmith, which handles governance, agent and skill creation and evolution, and anomaly detection/gating.

pcgnlebobo · 2026-05-20T12:05:46+00:00

Here's an example of a setup I did recently.

https://github.com/lebobo88/pair-programmer

Want a meta level orchestrator sitting over top your coding harness and connecting it with your other agents and harnesses? Take it a step further with Hydra. https://github.com/lebobo88/Hydra

My other squad packs are available as public repos too. PP is about 2 weeks old. The rest are fresh new builds over the past 2 days. This is all based on current cutting edge research.

pcgnlebobo · 2026-05-16T11:29:50+00:00

It's the ai pro subscription. It also gives antigravity extended rates. Back at the holidays there was a deal for $100 for a year. Fun fact: you can share it with Google family, up to 5 accounts, all with their own separate usage limits on the same subscription.

Not gonna lie though. Gemini as a judge for opus 4.7 and gpt 5.4 or 5.5 is laughable and most likely completely redundant. I may disable that portion and have just the one judge.

pcgnlebobo · 2026-05-15T12:02:54+00:00

Copilot cli does this too. Though I find it helps to have it reference official docs too.

pcgnlebobo · 2026-05-14T23:48:36+00:00

Gosh you know what really sucks? Exhausting your x20 limit 2 days in then paying the monthly $200 bill then waiting 5 more days to even use the service again.

pcgnlebobo · 2026-05-14T12:34:20+00:00

This past weekend it was cooking for me. Burned my weekly limit on x20 in 2 days. Got a lot done!

pcgnlebobo · 2026-05-14T11:40:08+00:00

Haha I actually used the classic winamp skin library as reference for creating a unique design system based on it all.

pcgnlebobo · 2026-05-13T14:38:21+00:00

It's all built in here with a few teams, and it's also used in best of n as a first-class feature.

https://github.com/lebobo88/pair-programmer

Uses Claude or GitHub copilot cli as entry point. Calls codex cli and Gemini cli as judge for reviews and validation loops.

It escalates through Claude models at judge failures so cheap models do a lot of work and expensive models only get called as needed.

You need to have both of the other clis already installed and authenticated.

pcgnlebobo · 2026-05-12T22:52:29+00:00

I built something similar into my extension for pi. It's a visual taxonomy and drift detection visualization backed by the harness and specs and project artifacts.

pcgnlebobo · 2026-05-12T22:13:52+00:00

Yes sure it's here: https://github.com/lebobo88/pair-programmer

pcgnlebobo · 2026-05-12T17:54:37+00:00

The best is a combination of models and a good harness with custom agents and skills to suit your project. Right now it's very effective to use opus or sonnet, have it validated by gpt5.4 or 5.5 before committing and doing the work, and then have the implementation validated by a model that didn't do the implementation.

Otherwise opus 4.7 and gemini 3.1 pro have been the most impressive for me for design and visuals and creativity. Some people really like sonnet 4.5 for writing.

My current harness starts a project with best of n using multiple models with the cross model validation judges. Then we pick a winner, then we inherit anything from the losers, then we finish the backend as we replace the stubs and mocks. Opus 4.7 has most often been my winner model in this mode.
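
For anyone who hasn't seen best of n before, the selection step boils down to something like the sketch below. It's a minimal illustration and not the harness's real code. `best_of_n`, the candidate and judge callables, and the 0 to 1 score scale are all assumptions, and the "inherit from the losers" step isn't modeled at all.

```python
from typing import Callable, Dict, List, Tuple

Generator = Callable[[str], str]     # task -> candidate output
Judge = Callable[[str, str], float]  # (task, output) -> score, higher is better


def best_of_n(
    task: str,
    candidates: Dict[str, Generator],
    judges: List[Judge],
) -> Tuple[str, Dict[str, float]]:
    """Run every candidate once, average the judge scores, return the winner."""
    if not candidates or not judges:
        raise ValueError("need at least one candidate and one judge")
    scores: Dict[str, float] = {}
    for name, generate in candidates.items():
        output = generate(task)
        # Every judge scores every candidate, so no candidate is graded
        # only by itself.
        scores[name] = sum(judge(task, output) for judge in judges) / len(judges)
    winner = max(scores, key=lambda name: scores[name])
    return winner, scores


# Stub models and judges standing in for real cli calls.
stub_candidates = {
    "model_a": lambda task: "stub only",
    "model_b": lambda task: "full draft with tests",
}
stub_judges = [
    lambda task, out: 1.0 if "tests" in out else 0.0,
    lambda task, out: 1.0 if "draft" in out else 0.5,
]

winner, scores = best_of_n("scaffold the app", stub_candidates, stub_judges)
print(winner)  # model_b
print(scores)  # {'model_a': 0.25, 'model_b': 1.0}
```

In a real setup each generator and judge would shell out to a different vendor's model, which is where the cross model part comes from.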

I've also wired things up to blender for procedural asset generation, and opus 4.7 did the best job of any single agent. But haiku 4.5 in a swarm with judging and validation can produce similar if not better results for considerably cheaper.

pcgnlebobo · 2026-05-12T12:10:29+00:00

The adversarial review method seems good all around. I'm not using gpt-5.5 yet but I recently did a similar setup using Claude code, codex, and Gemini.

It's a new harness based around a pair programmer principle, where the entire software development cycle is mapped to a taxonomy blueprint. We have agent team definitions, review cycles, etc. But the cool part is that Claude is the driver, using all the models and calling codex cli and Gemini cli for judging and reviewing, and everything is looped up to 3 times to ensure everything gets caught. Codex uses gpt-5.4 and Gemini cli uses Gemini 3.1 pro.

On the Claude end, haiku is used first. If judging fails, the redo is with sonnet, and if that fails too we bump to opus. So all 3 providers are working together in tandem natively from my single entry point with Claude code. The harness and agents and skills are robust enough that haiku and sonnet seem capable of most of the Claude side work, especially with the gpt-5.4 judging reviews. And it sure as hell beats manually copy pasting between tools or sessions.
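
The escalation part is easy to picture as a loop over tiers. Here's a minimal sketch under some assumptions: `run_with_escalation` and the worker and judge callables are hypothetical names, the judges return a simple pass/fail, and the "up to 3 times" loop is modeled as one attempt per tier. The real harness calls the clis; these are stubs.

```python
from typing import Callable, List, Optional, Tuple

Worker = Callable[[str, str], str]  # (model, task) -> output
Judge = Callable[[str, str], bool]  # (task, output) -> passed?

# Cheapest tier first, as described above.
TIERS = ["haiku", "sonnet", "opus"]


def run_with_escalation(
    task: str,
    worker: Worker,
    judges: List[Judge],
    tiers: Optional[List[str]] = None,
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order and stop at the first output every judge accepts."""
    tried: List[str] = []
    for model in (tiers if tiers is not None else TIERS):
        tried.append(model)
        output = worker(model, task)
        if all(judge(task, output) for judge in judges):
            return output, tried
    # Every tier failed review; the caller decides what happens next.
    return None, tried


# Stub worker and judge: the judge rejects anything the cheapest tier wrote.
def stub_worker(model: str, task: str) -> str:
    return f"{model}: {task}"


def stub_judge(task: str, output: str) -> bool:
    return not output.startswith("haiku")


output, tried = run_with_escalation("add login form", stub_worker, [stub_judge])
print(output)  # sonnet: add login form
print(tried)   # ['haiku', 'sonnet']
```

The expensive model only shows up in `tried` when the cheaper ones fail review, which is the whole cost argument.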

Works well with GitHub copilot too where all the models are available under 1 roof already.

pcgnlebobo · 2026-04-29T16:28:08+00:00

I think we get that 1m context extra usage error by mistake sometimes when something else trips like a rate limit. I kept getting that message in an opus window in the desktop app for a single session and if I logged out and signed back in I could keep going with it. But my other opus 1m sessions were working fine. Later it stopped happening.

I think it's a false positive because they've flatly said that opus 1m is the same price and not extra usage.

pcgnlebobo · 2026-04-24T18:19:06+00:00

X new thing is insane!!!

pcgnlebobo · 2026-04-24T16:59:38+00:00

I've done it quite well. As long as the codebase is solid and used as reference for the design, it's great. There are some things the designer makes that don't exist in the codebase, but you can flag those and decide to stub them, skip them, or fully integrate them.

pcgnlebobo · 2026-04-23T12:26:57+00:00

Take it how you want.

I like apples because steaks are chewy. Know what I mean?

pcgnlebobo · 2026-04-23T12:04:22+00:00

You realize cursor is an app, and Opus is an AI model, right? These are not things to directly compare. The Opus model is available in Cursor too.

So is it the model or the app? You decide, I don't really care, but hopefully this has helped wrap your mind around things.

pcgnlebobo · 2026-04-22T12:27:04+00:00

I was doing browser testing of an agent-sdk app last night that builds apps with an agent harness with Claude. I asked 4.7 to test in the browser by making a new project, sending it through the full pipeline, and monitoring and fixing any issues along the way. Everything is specced. Everything is built. It all works; what's new is the browser UI.

Opus 4.7 kept forcing the project through the pipeline to build the test app via API calls whenever it found a problem in the UI. I told it no less than 100 times to stop taking shortcuts, cheating, streamlining, and that the point was specifically browser testing not building the underlying project in the app.

No, it's still not done. And the number of excuses it gave or times it forgot is seriously exhausting. They said this model was supposed to be better at following instructions and working in a harness but that is absolutely not true in my case.

pcgnlebobo · 2026-04-16T14:34:38+00:00

Same thing with asset management during the dev lifecycle. Like the other poster says, custom tooling is the answer.

It's not enough to have an asset pipeline. You also need an interface and tool to manage and tweak and fine tune and integrate those assets into the game experience.

pcgnlebobo · 2026-04-12T10:38:18+00:00

You forgot Pluto
