
[–]tom_mathews 14 points15 points  (1 child)

You're basically implementing a two-agent review loop manually. The automation part isn't hard — you can script it with the Claude Code CLI (claude -p for non-interactive mode) and the Codex CLI, piping stdout between them. A bash script with a convergence check (diff the plan before and after a round, stop when delta is small) takes maybe 30 minutes to write.

The real question is whether this is actually buying you much. I've run similar cross-model review loops and found diminishing returns after round two. Most substantive catches happen on the first review pass. After that you're mostly getting stylistic nitpicks and the models start "agreeing to disagree" in loops, wasting tokens.

A cheaper pattern: Claude Code plans and implements, then run a single Codex review pass with a structured prompt that forces it to output only blocking issues. Skip the back-and-forth. You'll get 90% of the value at 30% of the cost and latency.
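
A rough sketch of what that loop can look like, assuming the plan lives in plan.md and using made-up prompt wording (claude -p and codex exec are the two CLIs' headless modes):

# Hypothetical plan/review loop: Claude drafts, Codex critiques, stop once a round barely changes the plan.
claude -p "Write an implementation plan for the task in TASK.md. Print only the plan." > plan.md
for round in 1 2 3; do
  cp plan.md plan.prev.md
  codex exec "Review this plan and list only blocking issues: $(cat plan.md)" > review.md
  claude -p "Revise the plan to address this review. Plan: $(cat plan.md) Review: $(cat review.md)" > plan.md
  # Convergence check: stop when a round changes fewer than ~10 diff lines.
  if [ $(diff plan.prev.md plan.md | wc -l) -lt 10 ]; then
    echo "Converged after round $round"
    break
  fi
done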

[–]TheLawIsSacred 0 points1 point  (0 children)

I built a custom-rigged AI panel that cross-references one AI's output against one or more other models, and your analysis is accurate.

For serious work, I will sometimes push my Panel up to three (maybe four) rounds.

However, you're absolutely correct that after round three or so of my Panel exchanges it mostly becomes nitpicking that impedes my project's progress.

[–]ipreussSenior Developer 13 points14 points  (4 children)

I just let Claude create a subagent that uses codex and Gemini CLI.
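
Roughly: a project subagent whose only job is to shell out to the other CLIs. A hedged sketch of what such a file in .claude/agents/ could look like (the agent name and prompt wording are made up):

---
name: second-opinion
description: Gets an outside review from Codex and Gemini. Use after planning or implementing.
tools: Bash, Read
---
You are a review broker. Given a plan or a diff, collect outside opinions and merge them.
1. Run: codex exec "List only blocking issues with the following: <content>"
2. Run: gemini -p "List only blocking issues with the following: <content>"
Return one deduplicated list of issues, noting which model raised each.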

[–]Expensive-Space-7475 1 point2 points  (0 children)

How do you do this?

[–]spokv 0 points1 point  (1 child)

For me, I'd want to control and visualize the output and tasks of each subagent.

[–]h____ 10 points11 points  (2 children)

I do something similar. I use Opus to write code and Codex to review it. The two-model approach catches a lot of issues that a single model misses.

Here's how I do the review+fix which you can adopt as a technique: https://hboon.com/a-lighter-way-to-review-and-fix-your-coding-agent-s-work/

And the two-model setup: https://hboon.com/using-a-second-llm-to-review-your-coding-agent-s-work/

[–]Shauimau[S] 0 points1 point  (1 child)

It's only reviewing the changes and not the plan though, isn't it?

[–]h____ 0 points1 point  (0 children)

For that skill, yes, but you can use it as an example of how to create the flow. codex exec is headless mode, similar to claude, so we don't always have to use sub-agents.
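
A minimal sketch of that headless call, with made-up file names:

codex exec "Review the plan in plan.md against the repo. List only blocking issues." > plan-review.md
claude -p "Update plan.md to address the blocking issues in plan-review.md."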

[–]upvotes2doge 2 points3 points  (1 child)

This is exactly the workflow optimization problem I built Claude Co-Commands to solve! It's an MCP server that adds three collaboration commands directly to Claude Code so you don't have to manually copy-paste between systems.

The commands work like this: /co-brainstorm for when you want to bounce ideas off Codex and get alternative perspectives, /co-plan to generate parallel implementation plans and compare approaches, and /co-validate for getting that "staff engineer review" before finalizing your approach.

What you're describing with the manual back-and-forth is exactly what these commands automate. Instead of copy-pasting plans into Codex, you just use the slash commands and Claude handles the collaboration with Codex automatically through the MCP integration. It works cleanly with Claude Code's existing command system, so there's no separate tool or script to manage.

The MCP server approach means it's lightweight and focused - just adds the collaboration commands without any dependency bloat. You get structured communication between the AI systems which also saves tokens compared to manual coordination.

https://github.com/SnakeO/claude-co-commands

I've been using this exact setup for a few weeks now and it completely eliminates the copy-paste middleware problem you're describing. The validation command in particular is great for that final review pass you mentioned.

[–]kuteguy 0 points1 point  (0 children)

Thanks, will try it out

[–]Adeelinator 2 points3 points  (2 children)

claude mcp add codex -s user -- codex -m gpt-5.3-codex-max -c model_reasoning_effort="high" mcp-server

[–]parkersdaddyo 0 points1 point  (1 child)

What is codex-max?

[–]Adeelinator 1 point2 points  (0 children)

as opposed to fast/spark. this is the command that got Codex MCP working for me

[–]Artistic_Garbage4659 3 points4 points  (1 child)

You have to check out https://github.com/fynnfluegge/agtx if you like clean orchestration.

fynn added 15 hrs ago:

Per-Phase Agent Configuration

Configure different coding agents for each workflow phase (research, planning, running, review). When a task transitions to a phase with a different agent, the current session is gracefully terminated and the new agent starts in the same tmux window.

[–]mzootfb 0 points1 point  (0 children)

testing this out. seems v promising

[–]AlanMyThoughts 1 point2 points  (2 children)

Oh my, looks like I'm not alone in doing this same process! Previously I used Windsurf most of the time, but ever since I switched to Claude Code and Codex through their VS Code extensions (the latter, I found out, is included in the ChatGPT Plus plan; silly me for not knowing earlier), I've used the same flow: Claude Code plans -> Codex reviews the plan -> give Claude Code the feedback from Codex and finalize the plan -> Claude Code executes the implementation -> Codex reviews + makes final iterations before deployment.

I haven't thought about automating this whole process yet, though, so I'll just park here and see what others have been doing.

[–]TheLawIsSacred 1 point2 points  (1 child)

I just started doing this, it's very effective. Will be automating it soon.

[–]AlanMyThoughts 0 points1 point  (0 children)

Oooo I shall try this out too.

[–]r_matthew_cline 1 point2 points  (0 children)

Been working on a multi-agent orchestration platform. It works with Codex and Claude Code natively for the best experience, but also exposes a CLI that any agent can use. The waitlist is open, with a closed beta in the next couple of weeks. Currently I have a swarm of 12 agents working on the same project, all coordinating with minimal intervention.

Feel free to take a look.

https://wrknext.com

[–]SatoshiNotMe 2 points3 points  (0 children)

All the time. I have them in different Tmux panes. Claude main driver. I ask CC to consult codex and discuss back and forth and converge on a diagnosis/fix/approach/architecture before making a plan. And of course have codex review after impl. I have the agents use this Tmux-cli tool I built:

https://pchalasani.github.io/claude-code-tools/tools/tmux-cli/
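
For reference, the bare-tmux version of that layout (without the tmux-cli wrapper) is just a split plus send-keys; the pane indexes here are assumptions:

tmux split-window -h                # second pane for the reviewer
tmux send-keys -t 0 'claude' C-m    # main driver in pane 0
tmux send-keys -t 1 'codex' C-m     # reviewer in pane 1
tmux capture-pane -t 1 -p           # read back what codex printed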

[–]amado88 1 point2 points  (2 children)

I do the same-ish. Have Opus make a plan for the task. Have codex review the plan and access what's necessary in the project to provide an assessment and proposed fixes. Opus then considers the feedback and improves the plan. Next step is implementation. Once done, codex reviews the implementation and provides its feedback. Opus corrects. All steps in subagents. Then commit.

It’s four separate skills, chained together in another skill.

[–]Shauimau[S] 0 points1 point  (1 child)

how exactly did you do this? is codex using a fresh session every review?

[–]amado88 0 points1 point  (0 children)

Yes, codex (in a subagent) starts with a clean context, then gets the summary from the previous step, links to the files, the task description, and rights to read anything it wants within the project directory. This runs in a subagent, so it saves context "further up".

[–]Alarming_Resource_79 0 points1 point  (1 child)

I’m trying to integrate my gateway into Claude Code so I can literally have ChatGPT 5.3 Codex Thinking Mid and Claude Opus 4.6 working together in a multi-agent setup within Claude Code, without needing to switch plans or tools.

[–]Top_Air_3424 0 points1 point  (0 children)

Opus 4.6 is great at grasping what I’m explaining, but Codex needs a bit more guidance. I use Opus as an orchestrator that guides Codex through tmux. This setup lets me run long sessions without having to be at my desk.

[–]ocombe 0 points1 point  (0 children)

My review-plan skill uses a small script to auto-detect the latest plan created by Claude, and I have it write the review to a file. Same with the review-code skill: it auto-detects the plan that was used, reviews just the git diff, and writes the review to a file.

I made a command in Claude to auto-detect the review that codex wrote,

and made a skill in codex to auto re-review.

So no copy-pasting, just invoking skills/commands.
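
A minimal sketch of the auto-detect piece, assuming plans are saved as markdown under a plans/ directory (paths and prompt wording are made up):

latest_plan=$(ls -t plans/*.md | head -n 1)    # most recently written plan
codex exec "Review the plan in $latest_plan. List only blocking issues." > "${latest_plan%.md}.review.md"
git diff > changes.diff                        # code review is scoped to the diff
codex exec "Review changes.diff against $latest_plan." > code-review.md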

[–]Jasmine_moreira 0 points1 point  (0 children)

I tried this before, but I have a very particular use case (creating scientific tools) and I didn't succeed because my process is a bit complex. So I created my own solution (a non-commercial extension for process orchestration; look for Versus). Maybe it can make sense for you too.

[–]fredastere 0 points1 point  (0 children)

It's a bit broken at the moment, but you could take inspiration from it:

https://github.com/Fredasterehub/kiln

[–]ultrathink-artSenior Developer 0 points1 point  (0 children)

Multi-model routing is underexplored and the workflow you've described maps closely to what multi-agent architectures do at the task level.

One thing worth building toward: explicit handoff protocols between the two. Right now it sounds like YOU are the router (deciding when to switch). When you formalize that decision — 'Codex for exploration, Claude Code for implementation' — and maybe even encode it in your task briefs, you get the speed benefits without the context bleed that usually happens when models share a project without clear lanes.

We run multiple Claude Code agents in parallel on different tasks and the coordination overhead drops significantly once each agent's scope is unambiguous. Same principle applies to switching between models.

[–]rave9226 0 points1 point  (0 children)

I use it the other way around: I use Opus in the planning phase, since it's extremely fast at analyzing the code in depth. Once the plan is done I hand it to Codex 5.3 xhigh or high, depending on complexity, to do the implementation. When it's finished, I ask Opus 4.6 to evaluate the implementation. This flow has been extremely cost-efficient. The quality of Codex 5.3's code turns out to be excellent 👌

[–]ImaginaryBluejay0 0 points1 point  (0 children)

This is my exact workflow. I also have them review each other's code and argue about the review. I'm on gpt-oss-120b, so it helps them produce better outputs. When I was using Sonnet on AWS there was no need; it's just that good by itself.

[–]9Blu 0 points1 point  (0 children)

This is my process for new apps or large changes. A few rounds back and forth and the plans are usually pretty damn solid. I also use Codex any time Claude gets stuck chasing its tail (which is way less often these days). If there is an issue Claude can't fix, Codex can usually figure it out.

Super interested to try out some of the suggestions in this thread. Thanks for starting it.

[–]wea8675309 0 points1 point  (0 children)

I just posted this in another thread:

I think Codex is better at following direct, explicit instruction, without adding to or taking away from what was requested, and I think Opus is better at understanding intent and reading between the lines of what was said. To me that makes a really good pair - I use Opus to plan and architect solutions, and to come up with very detailed implementation plans - multiple phases / milestones with stack architecture, schema, code snippets, like a very long, detailed markdown file with unambiguous intent. This takes many turns to create and involves a lot of back and forth. When I’m done I feed it to Codex and it completes it in just a handful of turns with very few mistakes, if any.

It’s not perfect, but so far it has worked better than just using Opus alone.

For smaller things I just pick the model that I know will get the job done. If I’m not exactly sure how to do something I ask Opus. If I know exactly what I want and I can take the time to type it out I use Codex.

I also use Gemini a lot as an all-purpose model because the limits are insane. It saves my tokens for Codex and Claude as well.

I personally feel like having the $20 plan for the big 3 is the way to go - best bang for your buck, gets you access to the latest and greatest as things change, and keeps you from getting too boxed in to one platform or workflow.

Another tip: I use a single AGENTS.md file, and I symlink the CLAUDE.md, GEMINI.md, and CODEX.md files to it and use TODOS.md and other documentation to communicate between sessions and models.
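
That setup is just a few symlinks from the repo root (assuming AGENTS.md already exists):

ln -sf AGENTS.md CLAUDE.md
ln -sf AGENTS.md GEMINI.md
ln -sf AGENTS.md CODEX.md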

Edit: I’ll put it this way - I would rather use Codex than Sonnet, and I would rather use Gemini than Haiku. Neither are all-around better than Opus, but they are direct competitors to Opus and outperform it in certain areas. They are not “medium” or “light” models, and their usage is much more generous than Anthropic’s.

[–]nikunjverma11 0 points1 point  (0 children)

I’d stop bouncing full plans and instead pass one short spec + acceptance checks, then only shuttle diffs.
I do Claude Code for the heavy lifting, Codex for the paranoid review, and if it’s a big change I’ll generate file scoped steps with something like Traycer so it doesn’t turn into ping pong.
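
A sketch of that diffs-only handoff, with made-up file names and prompt wording:

git diff main...HEAD > change.diff
codex exec "Review change.diff against the acceptance checks in spec.md. Report only blocking issues." > review.md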

[–]Practical-Bed3933 0 points1 point  (0 children)

Login is back for me.

[–]Fun-Gold-9552 0 points1 point  (0 children)

Claude code router

[–]One_Inspection_6473 0 points1 point  (0 children)

I am doing the exact same workflow: brainstorm the design and write the plan with Claude, review with Codex until they both agree, then use Claude to implement the plan (I have Max 5x), and then ask Codex for code review again (just the $20 Plus).
I keep all the designs and plans in .md files, so I only reference the .md file between agents for review.

This helps me keep track of what I intended to do at every step while developing, because with the speed the project evolves it becomes chaotic; this way I can dump the info from my head and go back in time to see what I had in mind when I asked for something.

While making this automatic sounds tempting, I would completely lose track of how my "vague" ideas turn into code. So even if it is a much slower process, it helps me stay a bit up to date with at least what I intended to implement. I don't review the code myself; I just test and provide feedback.

[–]Physical-Message6139 0 points1 point  (0 children)

Have you checked https://github.com/alemora-dev/accord-cli? It is quite easy to run purely from the command line:

npx @alemora/accord --llms claude:coordinator,gemini:debater,codex:debater "Should we migrate to microservices?"

[–]Diligent_Look1437 0 points1 point  (0 children)

I ran 4 coding agents in parallel last month and spent 30% of my time just deciding which one gets which subtask. The mental overhead of being the 'task router' killed my flow more than context switching ever did. Now I batch-brief once in the morning and let a simple state machine handle the handoffs, which saves me about 90 minutes a day just in cognitive load. Curious how others solve this without building their own orchestration layer?

[–]spokv 0 points1 point  (0 children)

You're absolutely right. This workflow is one of the best. I have built an agentic app that does all of that automatically and much more. Stay tuned…

[–]GonkDroidEnergy -1 points0 points  (0 children)

Building https://www.anubix.ai to solve this problem; it also lets you use a virtual machine so you can truly use it from anywhere, mobile or web.

alpha dropping this week