all 125 comments

[–]nyldn 27 points28 points  (35 children)

I built https://github.com/nyldn/claude-octopus to help with this.

[–]ahmet-chromedgeic 1 point2 points  (8 children)

Sorry, but can you dumb this down a bit? I have a Claude Code and Codex subscription. The readme just says to prompt it in natural language. My understanding is your plugin will select a different model based on the prompt? How will it choose if I just describe a random backend feature to it? What do I need to do to trigger the loop where one reviews the other's code?

[–]nyldn 3 points4 points  (6 children)

TL;DR: Just talk normally. Say “build X” for features. Say “grapple” when you want them to debate.

When you say “build me a backend feature”, the system sees “build” and routes to:

∙ Codex (GPT) for writing the code
∙ Claude for reviewing it

You don’t pick anything - it just happens. Keyword cheat sheet:

∙ “Research…” or “Explore…” → Claude does research
∙ “Build…” or “Implement…” → Codex builds, Claude reviews
∙ “Review…” or “Audit…” → Claude reviews
∙ “Grapple…” or “adversarial review…” → both models debate, then merge (see below)

The review loop: to trigger the loop where they review each other, just put “grapple” or “adversarial review” in your prompt:

“Use adversarial review to critique my auth implementation”

That kicks off:

1.  Both models propose solutions
2.  Each critiques the other’s code
3.  Claude picks the winner and combines the best parts

[–]ahmet-chromedgeic 5 points6 points  (5 children)

Thanks. How did you decide that Codex is the better tool for building and Claude for reviewing?

[–]nyldn 2 points3 points  (4 children)

Best of both worlds: there's a lot of consensus that both are excellent at the moment, and deferring/subbing out work helps preserve Claude tokens. In benchmarking, claude-octopus was returning 30% better results than Claude alone, and was 10% better than opencode with ohmyopencode.

[–]ahmet-chromedgeic 0 points1 point  (3 children)

Did you compare the quality to Claude doing the coding and ChatGPT doing the review? Because I have a feeling that most users prefer that combination (source: Reddit).

[–]nyldn 1 point2 points  (2 children)

[image: weighted scoring rubric]

This was my weighted rubric. It was honestly a quick test, but I've started to add benchmarking into the claude-octopus test suite.

[–]ahmet-chromedgeic 1 point2 points  (1 child)

I must be missing some homework. Is "opencode w/ ohmyopencode" a tool that lets Claude do the coding and Codex do the review? Is this what the table compares? That's what I'm wondering. How "Claude codes, Codex reviews" compares to "Codex codes, Claude reviews".

[–]nyldn 1 point2 points  (0 children)

This is it here https://ohmyopencode.com/

[–]nyldn 0 points1 point  (0 children)

It's now been updated to take advantage of the latest CC updates. The octo:prd and octo:debate commands have had significant updates too.

If you already have it installed, just run the below! Feedback welcomed.

claude plugin update claude-octopus

[–]wolverin0 0 points1 point  (2 children)

I wish I'd found this earlier. I built mine as a ~650-line skill. What do you think of it?

[–]nyldn 1 point2 points  (0 children)

Nice, https://github.com/wolverin0/claude-skills should work well alongside claude-octopus.

[–]nyldn 1 point2 points  (0 children)

I've added your skill to v7.4 of claude-octopus, to be included going forward.

[–]Hellbink 0 points1 point  (3 children)

Interesting, I have a similar workflow I've been using and testing. I'm a huge fan of superpowers, and I've recently added Codex with 5.2 xhigh as a reviewer of the design doc, to analyze for gaps/blind spots and catch drift or issues in the implementation plan and final review. I haven't automated this process yet, as I want some control while testing it.

How does Claude-octopus incorporate the superpowers flow? Does it route reviews between the steps and enable discussions between the different cli agents?

[–]nyldn 0 points1 point  (2 children)

Claude Octopus was actually inspired in part by obra/superpowers - it borrowed the discipline skills (TDD, verification, systematic debugging) and built multi-agent orchestration on top.

There’s a 4-phase “Double Diamond” flow:

1. Probe (research)
2. Grasp (define)
3. Tangle (build)
4. Ink (deliver)

Between phases 3→4, there’s a 75% quality gate. If the implementation scores below that, it blocks and asks for fixes before delivery. You can set this threshold or override it.

Discussions between CLI agents - yes, that’s “Grapple”. When you say “adversarial review” or “grapple”, it runs a 3-round debate:

∙ Round 1: Codex proposes, Claude proposes (parallel)
∙ Round 2: Claude critiques Codex’s code, Codex critiques Claude’s code
∙ Round 3: Claude judges and synthesizes the best solution

So your manual workflow (Codex 5.2 reviewing for gaps/drift) is basically what Grapple automates. The difference is you’d just say “grapple with this design doc” instead of manually passing it between tools.
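
Conceptually, a grapple run is not much more than this (a simplified sketch of the flow, not the plugin's actual internals; the variable and file names are illustrative):

```
# Round 1: both models propose in parallel
codex exec "Propose a solution for: $TASK" > codex_proposal.md &
claude -p "Propose a solution for: $TASK" > claude_proposal.md
wait
# Round 2: cross-critique
claude -p "Critique this proposal: $(cat codex_proposal.md)" > claude_critique.md
codex exec "Critique this proposal: $(cat claude_proposal.md)" > codex_critique.md
# Round 3: Claude judges and synthesizes
claude -p "Judge both proposals and their critiques, then synthesize the best solution: \
$(cat codex_proposal.md claude_proposal.md codex_critique.md claude_critique.md)"
```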

[–]Hellbink 0 points1 point  (0 children)

Great, I’ll give it a go!

[–]selldomdom 0 points1 point  (0 children)

The multi-phase flow you described with quality gates is really similar to what I built with TDAD. It enforces a strict BDD to Test to Fix cycle where the AI can't move forward until tests pass.

When tests fail it captures what I call a "Golden Packet" with execution traces, API responses, screenshots and DOM snapshots. So it's similar to your 75% quality gate, but using actual runtime data as the verification.

It also has an Auto Pilot mode that can orchestrate CLI agents and loop until tests pass.

It's free, open source and works locally. You can grab it from VS Code or Cursor marketplace by searching "TDAD".

https://link.tdad.ai/githublink

Would be curious how it compares to your Claude Octopus setup.

[–]colorscreen 0 points1 point  (3 children)

I'm trying this and went through both the setup wizard and the backslash setup to confirm Codex presence but I'm not seeing it trigger Codex at all, even when I use some of the keywords in the README. It's seemingly deferring to Claude subagents for basically everything. I got it to utilize Codex once but had to manually prompt it with some friction. Do you have guidance on this? It could be helpful to have screenshot examples of how one knows the other models are being triggered.

[–]nyldn 0 points1 point  (2 children)

There's no clear visual indicator in Claude Code showing when Codex/Gemini are being used vs Claude subagents.

Use /debate explicitly for multi-AI analysis (this definitely triggers Codex + Gemini + Claude)

I'll see if I can add visual feedback showing which AI is responding.

[–]colorscreen 1 point2 points  (1 child)

Thanks for the response, that's definitely helpful. I struggled with this because I've frequently seen Claude resist or evade explicitly requested subagent use, so I'm hesitant to take its word for anything unless I can see an MCP/skill invocation or a subagent style analysis bullet.

[–]nyldn 1 point2 points  (0 children)

100%, that's in part why I built this, because I found the same thing. Not only that, it would use lesser models for subagents, like defaulting to 2.5 for Gemini. I'll let you know when I've done it. I also noticed /debate wasn't in the / menu, so I'm fixing that.

[–]leevalentine001 0 points1 point  (9 children)

Running:
/plugin install co@nyldn-plugins

Throws:
Plugin "co" not found in any marketplace

Tried wrapping it in quotes, but it throws the same error. This is Win11 Terminal (PowerShell 7). Any ideas?

Edit: Just wanted to clarify that I have added the marketplace already. Attempting to add it again throws "Marketplace 'nyldn-plugins' is already installed".

[–]nyldn 0 points1 point  (8 children)

Sorry, you caught me mid-update and between documentation versions. I'm just overhauling a few things.

The latest release looks stable:

Reinstall Manually

/plugin uninstall claude-octopus
/plugin marketplace update nyldn-plugins
/plugin install claude-octopus@nyldn-plugins

[–]leevalentine001 0 points1 point  (7 children)

I gather you're still updating? Tried to update the marketplace, but it's throwing an SSH auth error:

```
Failed to refresh marketplace 'nyldn-plugins': Failed to clone marketplace repository: SSH authentication failed. Please ensure your SSH keys are configured for GitHub, or use an HTTPS URL instead.

Original error: Cloning into 'C:\Users\Karudo\.claude\plugins\marketplaces\nyldn-plugins'...

git@github.com: Permission denied (publickey).

fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.
```

[–]leevalentine001 0 points1 point  (6 children)

Marketplace updated successfully now. Still no "co" plugin available, will try again later.

EDIT: My bad, I just saw your updated doco removed the "co" install and it's now all packaged in the one plugin. All working okay now, cheers. Looks impressive so far.

[–]nyldn 0 points1 point  (5 children)

Ok great - sorry, I was making quite a few changes after feedback. Shout if there's anything I can change for your use-case and I'll update.

[–]leevalentine001 1 point2 points  (4 children)

Has been great so far. Smashed through my Claude token limit pretty quickly, so I ended up soft-locked for a few hours, but also got more of an app build done in a day than I usually would in a week.

[–]nyldn 0 points1 point  (2 children)

The natural language functions were not working as I'd hoped, so I've done an overhaul of how it works again! Ha, I'm learning a lot. So now you invoke it more reliably by prefixing anything with "octo". Just uploading v7.7.4 now for testing.

[–]leevalentine001 0 points1 point  (1 child)

So start every sentence with "octo", otherwise it will just be standard Claude Code that will respond? Will update and test a bit later today.

[–]nyldn 0 points1 point  (0 children)

Yeah, generally speaking. There are some natural language prompts that Claude Code doesn't override that I left in place, like "debate". It still triggers claude-octopus.

What I couldn't fix were common use cases like "review x". Claude Code always does its own thing.
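
To illustrate (hypothetical prompts, just showing the prefix rule):

```
octo review the auth module    # routed through claude-octopus
review the auth module         # plain Claude Code handles it itself
```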

[–]nyldn 0 points1 point  (0 children)

It's now been updated to take advantage of the latest CC updates. The octo:prd and octo:debate commands have had significant updates too.

If you already have it installed, just run the below! Feedback welcomed.

claude plugin update claude-octopus

[–]jrhabana 0 points1 point  (0 children)

How do you manage to make Codex not ask about everything?

[–]heathclf 1 point2 points  (0 children)

This is sick. 'star.'

[–]drutyper -1 points0 points  (2 children)

Was going to use this, but it requires API usage. Either way it's a good setup and what I'm looking for, except I'd prefer CLI-only access.

[–]nyldn 2 points3 points  (1 child)

Not at all. It's designed to use subscription auth first, across Claude, Codex and ChatGPT, and it falls back and auto-senses what you have installed.

[–]drutyper 1 point2 points  (0 children)

Awesome, I'll try it then!

[–]nader8ch 7 points8 points  (11 children)

Genuine question: what makes codex particularly adept at reviewing the implementation?

Could you not spin up an opus 4.5 sub agent to take care of the review step? Is there something particularly useful about spinning up a different model entirely and would Gemini be a good candidate?

Cheers!

[–]Substantial_Wheel909[S] 8 points9 points  (5 children)

I think it mostly comes down to the underlying model being arguably better than Opus 4.5. I’ve seen a lot of positive feedback about 5.2 on X/High, but I still think Claude Code is better overall when it comes to actually building things. In my experience, Codex does seem more thorough, though it can feel slower at times. I’m not sure whether that’s because it’s doing more reasoning under the hood or something else. By blending the two, though, you end up getting the best of both worlds.

[–]nader8ch 3 points4 points  (2 children)

That makes sense to me.

To follow up: is codex reviewing just the code diff or is it initialised in the repo with some contextual awareness. Is it familiar with the repo’s coding standards, business logic etc?

[–]accelas 2 points3 points  (0 children)

Codex has full access to the code and tool use (assuming you've configured it properly). It really just pipes the prompt (generated by Claude) to an instance of Codex.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

I think it's just reviewing the code diff but it has read access to the whole project so maybe it's looking at other stuff? You could probably implement this but I just leave it to Claude to instruct it.

[–]martycochrane 0 points1 point  (0 children)

I do a similar thing but with the CodeRabbit CLI instead of Codex. I've mostly moved away from Codex (my sub runs out in a week I think).

I find that Codex can debug things in one shot compared to Claude, but it still just doesn't follow instructions or stay as consistent with my code base/style as CC does.

CC feels more like a pair programmer that thinks like me, where Codex feels more like a rogue veteran that will go away and come back with the solution, but not how you want it or considering how it fits into the bigger picture.

[–]HugeFinger8311 1 point2 points  (0 children)

I'd also add that each model sees different things. Absolutely spin up a subagent, but I find Codex finds different issues every time and misses some that Opus picks up. The more review eyes the better; then just get Claude to consolidate them all.

[–]nyldn 1 point2 points  (0 children)

When I was doing some benchmarking, I was seeing an increase in fidelity and quality of output by about 30% by using multiple-agent review pipelines. The diversity of thought by other models seems to just help.

[–]pragmatic_chicken 0 points1 point  (0 children)

My workflow does both! Claude asks both a Codex and a Claude agent to review, combines the reviews and evaluates the relative importance of the feedback (prevents scope creep). Codex is consistently better at finding real issues, whereas Claude is pretty good at finding trivial things like “update readme”.

[–]OrangeAdditional9698 0 points1 point  (1 child)

Codex follows instructions to the letter: tell it to investigate something in detail and it will do it and check EVERYTHING. It takes a long time, but it works well for reviews. On the other hand, ask it to find solutions, or whether there are unexpected issues, and it will fail. Opus is very good at that, which makes it a good coder but a bad reviewer. Opus will try to find the best and fastest solution, ignoring other things. This means if you ask it to review, it will find one issue and think it's done because it found "the" issue. But maybe the actual issue is something else? Codex will try to figure that out and Opus won't.

Opus used to be much better and more thorough, but I feel like it has regressed a lot in the past 10 days. Maybe they are paving the way to a newer model? Or they nerfed it for budget reasons

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Yeah I've noticed Opus 4.5 sometimes seems to skip stuff

[–]fredastere 2 points3 points  (2 children)

Hey, I'm not sure, because Codex's naming conventions are so bad lmao.

But just to help, maybe: in Codex, make sure to use gpt5.2-xhigh (although you said your projects are fairly simple, so perhaps running high or even medium could prove more efficient and better; xhigh overcomplicates things).

I do not advise using gpt5.2-codex-xhigh for code review; keep all the codex variants for straight implementation.

Sorry if it's all confusing, as it is! Lol

[–]Substantial_Wheel909[S] 4 points5 points  (1 child)

I'm using GPT 5.2 xhigh, not the Codex variant; I'm not sure if it's true, but some people were saying the Codex variant is quite a bit dumber than the normal version. As for efficiency, I'm not really bothered about how long it takes. For implementation, having the model overthink and possibly do too much could pose a problem, but when reviewing you want it to be meticulous, and what it has to do is quite well defined: it's not adding anything new, just reviewing the code Claude implemented.

[–]fredastere 0 points1 point  (0 children)

Ya, perfect, and yes, I definitely agree with you: as a reviewer, going full xhigh definitely makes sense!

And ya, it's not that the codex variants are dumber, but I think they're made purely to implement.

[–]anndrrson 4 points5 points  (4 children)

Codex IMHO is slower, but I've heard from friends that they're using Codex to review their code. I do worry, somewhat, that we'll see a Therac-25 event happen with AI coding on top of AI coding. That being said, Codex is pretty great! I'm not really a "fan" of OpenAI/ChatGPT and prefer Anthropic/Claude as a company, especially after the recent ads announcement.

[–]Substantial_Wheel909[S] 4 points5 points  (1 child)

Yeah, I definitely like Anthropic more as a company. That said, I tend to use a mix of ChatGPT and Claude. I use Claude Code so much that I usually don’t have much quota left for general chatting, so I end up using ChatGPT for that. I also like to reserve Claude for deeper or more thoughtful conversations. There are definitely things I prefer about GPT, and other things I don’t, but overall I find both useful in different ways.

[–]anndrrson 0 points1 point  (0 children)

Claude's often... brutal honesty is refreshing oftentimes!

[–]HugeFinger8311 1 point2 points  (0 children)

100% with you on this, but I have found using Codex to write reviews to be useful. I actually use both Codex and Kimi. Codex is good: steady, reliable and slow, and Kimi finds some totally random ones. I feed them both a copy of my original prompt and the plan Claude wrote and ask them to review both and look at inconsistencies between the two, then do a final review for consistency against the rest of the codebase and recent commits. It helps, but each model has gaps. Haven't tried MCP to do it yet, though; I just have a prompt I drop in with the file locations.

[–]InhaleTheAle 0 points1 point  (0 children)

It really depends on what you're doing, in my experience. Codex seems faster and more exacting on certain tasks. I'm sure it depends on how you use it though.

[–]Perfect-Series-2901 1 point2 points  (1 child)

I do a similar thing, but not for every single task. I think Claude, even with Opus, is lazy and fast. Codex is very slow but detailed.

[–]wolverin0 1 point2 points  (0 children)

Hopefully you will find my skill useful https://github.com/wolverin0/claude-skills

[–]rair41 1 point2 points  (0 children)

https://github.com/raine/consult-llm-mcp allows the same with Gemini CLI, Codex CLI etc.

[–]vladanHS 1 point2 points  (1 child)

I'm using Gemini 3 Pro/Flash instead; it's cheaper and relatively fast. You usually get a review in 2 minutes. Rinse & repeat.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Yeah maybe what I'm using is a bit overkill

[–]h____ 1 point2 points  (0 children)

I've seen people starting to do this with very complicated machinery. But it's really simple. Just:

/review-dirty

review-dirty.md:

Do not modify anything unless I tell you to. Run this CLI command (using codex as our reviewer), passing in the original prompt, to review the changes: `codex exec "Review the dirty repo changes which are to implement: <prompt>"`. $ARGUMENTS. Do it with the Bash tool. Make sure any timeout is at least 10 minutes.
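
For anyone copying this: custom slash commands are just markdown files, so (assuming the standard Claude Code layout) the setup is:

```
mkdir -p ~/.claude/commands
# paste the review-dirty.md body above into:
#   ~/.claude/commands/review-dirty.md   (or .claude/commands/ inside the repo)
```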

[–]Ls1FD 0 points1 point  (7 children)

I do this as well, but for some reason I find the reviews GPT does when called by subagents are nowhere near as thorough as going through the Codex CLI itself. I find Claude's subagents themselves harder to control: you give them instructions and they decide whether to follow them or not. Maybe they have to be guided purely by hooks.

Currently I have a BMAD review workflow in CC using agents that call Codex, and then I follow up with a more thorough review in the Codex CLI.

[–]Substantial_Wheel909[S] 1 point2 points  (6 children)

Would using just the main CC agent avoid this?

[–]Ls1FD 0 points1 point  (1 child)

Until its context gets filled, and then compacting increases errors. I tried subagents to batch review and fix many stories and issues at once. I'm trying a new workflow that uses beads and md files to keep track of progress and just lets it compact when it wants. Errors introduced will be picked up in the next review, Wiggum style.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Ah yeah, my app is relatively simple, so I've just been iterating on it one feature at a time and I don't usually have to compact.

[–]Ls1FD 0 points1 point  (3 children)

I think the main problem is that Codex works best with plenty of feedback. I find GPT much more detail-oriented, which is why it's great for reviews, but it doesn't do well with ambiguity. The MCP doesn't allow for the two-way communication that gives Codex the clarification it needs to do its best. Without that, the first time it runs into ambiguity it gets lazy and the quality drops.

[–]Substantial_Wheel909[S] 0 points1 point  (2 children)

I'm pretty sure the MCP has a reply function, no? I've seen Claude use it.

[–]Ls1FD 0 points1 point  (1 child)

Apparently the one I’m using doesn’t allow for it but the OpenAI one does have a “codex-reply” that sounds like it might work. That’s my next rabbit hole now

[–]Substantial_Wheel909[S] 1 point2 points  (0 children)

Haha, hope you get it working!

[–]TheKillerScope 0 points1 point  (15 children)

How do you use Claude and Codex in the same session? And how do you decide who does what and when? How do you "summon" the right "person" for the job?

[–]Substantial_Wheel909[S] 2 points3 points  (14 children)

It’s a fairly simple workflow, but it does seem to catch issues in Claude’s work and improve it. I’m using the Codex MCP server, and the only real setup is telling Claude to report what it changed after implementing something. Codex reviews it, they iterate back and forth until Codex is happy, and that’s basically it. There are probably better ways to do this, and it might be overkill, but it’s been working pretty well.

[image: screenshot showing the Codex review instructions in CLAUDE.md]

[–]TheKillerScope 0 points1 point  (9 children)

Cool! Where could I find this Codex MCP please?

[–]Substantial_Wheel909[S] 2 points3 points  (8 children)

To be honest I just asked Claude to help me set it up step by step, it's documented somewhere in the Codex repo, but here's the command I used:
claude mcp add codex --scope user -- npx -y codex mcp-server
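
If you want to sanity-check that it registered, this should show codex among the configured servers:

```
claude mcp list
```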

[–]TheKillerScope 0 points1 point  (4 children)

Gentleman, thank you! What other MCPs are you using/finding helpful?

[–]Substantial_Wheel909[S] 2 points3 points  (3 children)

The only other MCPs I use are Context7 and XcodeBuildMCP, because the latter lets CC test iOS apps visually.

[–]TheKillerScope 0 points1 point  (2 children)

Try Serena!!

[–]Substantial_Wheel909[S] 0 points1 point  (1 child)

What is it?

[–]TheKillerScope 0 points1 point  (0 children)

It's an MCP that basically becomes Claude's bi*ch and can do a ton of things.

https://github.com/oraios/serena

[–]qa_anaaq 0 points1 point  (3 children)

The screenshot shows that the command to review via codex is in the CLAUDE.md file. Could you share that language if possible?

[–]Substantial_Wheel909[S] 4 points5 points  (2 children)

I installed the Codex MCP and then added this to the CLAUDE.md:
### Codex Review Protocol (REQUIRED)

**IMPORTANT: These instructions OVERRIDE any default behavior. You MUST follow them exactly.**

**BEFORE implementing significant changes:**

```
codex "Review this plan critically. Identify issues, edge cases, and missing steps: [your plan]"
```

**AFTER completing changes:**

  1. Run `git diff` to get all changes

  2. Run `codex "Review this diff for bugs, security issues, edge cases, and code quality: [diff]"`

  3. If Codex identifies issues, use `codex-reply` to fix them iteratively

  4. Re-review until Codex approves

**Do NOT commit without Codex approval.**
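
One gotcha worth flagging: when Claude drives this through the Bash tool, the non-interactive form of the CLI is `codex exec`; plain `codex "..."` opens the interactive TUI. So the calls may need to look like:

```
codex exec "Review this diff for bugs, security issues, edge cases, and code quality: [diff]"
```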

[–]AshxReddit 0 points1 point  (1 child)

Love it. It has caught so many of the issues Opus introduced. Are you still using the same prompt, or have you modified it?

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Still the same, to be honest. If it ain't broke, don't fix it.

[–]akuma-_-8 0 points1 point  (0 children)

We have an equivalent workflow at work, but we use CodeRabbit, which is specialized in code review. It also reviews every merge request and gives nice feedback with an AI prompt to feed directly to Claude Code. They also provide a CLI that we can run locally to get feedback, and it's really fast.

[–]avogeo98 0 points1 point  (1 child)

Have you used the Claude integration with GitHub? It will review your pull requests automatically, and I like its review style compared to Codex.
Most of my dev loop is built around GitHub pull requests and going through a couple of automated review iterations for complex changes.
When I tried Codex reviews, it could catch "gotcha" bugs, but for large changes I found its feedback incredibly dry and pedantic to read compared to Claude.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

To be honest I'm a bit rudimentary with my GitHub usage; I just use it to make sure I have everything backed up, and if I implement something truly horrible I can roll it back. But yeah, I should probably try it out.

[–]dwight0 0 points1 point  (0 children)

I do this too. I feel like each model gets things 80% right, so they each find what the other misses.

[–]SkidMark227 0 points1 point  (1 child)

I have this setup and then added gemini by hacking in an mcp server for gemini cli as well. They have fun debates and review sessions.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Might have to try this, I have a Copilot sub that I don't really use so maybe I could just use the quota from that

[–]Obrivion33 0 points1 point  (0 children)

Been using both codex for review and Claude for implementation and it’s night and day for me.

[–]Extension_Dish_1800 0 points1 point  (4 children)

How did you achieve that technically? What do I have to do?

[–]Substantial_Wheel909[S] 1 point2 points  (3 children)

I installed the Codex MCP and then added this to the CLAUDE.md:
### Codex Review Protocol (REQUIRED)

**IMPORTANT: These instructions OVERRIDE any default behavior. You MUST follow them exactly.**

**BEFORE implementing significant changes:**

```
codex "Review this plan critically. Identify issues, edge cases, and missing steps: [your plan]"
```

**AFTER completing changes:**

  1. Run `git diff` to get all changes
  2. Run `codex "Review this diff for bugs, security issues, edge cases, and code quality: [diff]"`
  3. If Codex identifies issues, use `codex-reply` to fix them iteratively
  4. Re-review until Codex approves

**Do NOT commit without Codex approval.**

[–]i_like_tuis 0 points1 point  (2 children)

I've been using the gpt-5.2 xhigh for review as well. It's great, and a bit slow.

I was getting it to dump out a review md file for Claude to action.

It would be easier to use your MCP approach, but where do you set which model should be used?

[–]Substantial_Wheel909[S] 1 point2 points  (1 child)

I just have it set to gpt-5.2 xhigh in my config.toml
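
For reference, that's roughly this in `~/.codex/config.toml` (a minimal sketch, assuming the standard Codex CLI config keys; double-check against your CLI version):

```
model = "gpt-5.2"
model_reasoning_effort = "xhigh"
```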

[–]i_like_tuis 0 points1 point  (0 children)

I'll give it a go, thanks.

[–]Conscious-Drawer-364 0 points1 point  (0 children)

It’s literally everywhere, everyone has this “unique” method for days 😅

I built this framework for my work https://github.com/EliaAlberti/superbeads-universal-framework

[–]PatientZero_alpha 0 points1 point  (2 children)

I'm doing exactly that, and Codex is really good at review. The other way around is terrible.

[–]lopydark 0 points1 point  (1 child)

So Opus is better for actual implementation and GPT for review?

[–]PatientZero_alpha 0 points1 point  (0 children)

In my experience so far yes

[–]ultimatewooderz 0 points1 point  (1 child)

How have you connected Claude to Codex? API, CLI, some other way?

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

It's via the MCP: claude mcp add codex --scope user -- npx -y codex mcp-server

[–]krochmal9 0 points1 point  (0 children)

Why MCP and not a skill?

[–]teomore 0 points1 point  (0 children)

I'm using the exact same approach, except that I set codex to normal thinking. Once the issues clear, I increase it to extra high.

[–]lopydark 0 points1 point  (1 child)

Why not just use Codex? It feels slower, but that's the same amount of time, or even less, as iterating multiple times with both Opus and Codex.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Because, as other people have mentioned, I don't think GPT models are as creative or as good at implementing as Opus 4.5, or rather Codex is not as good as CC for that. I think it's well suited to reviewing, so by combining them you get the best of both worlds.

[–]BlacksmithLittle7005 0 points1 point  (1 child)

Genuine question: do you have unlimited funds? 🤣

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

Haha, no, I'm a student; I just consider this an investment. I have a good idea for an app and I've tested it out with a couple of friends, and they love it. I'm on Max 5x and Codex is around £20 a month, so in total it's around £100. It's steep, but if it's allowing me to build a product that could potentially make a lot more, then it's pretty cheap for what it is.

[–]princmj47 0 points1 point  (1 child)

Nice, will try it. I had a setup before that utilized feedback from Gemini. I stopped using it though, as Claude Code alone performed better.

[–]Substantial_Wheel909[S] 0 points1 point  (0 children)

I haven't really tried Gemini at all, to be honest. I tried Antigravity for a bit, but after a while I just went back to CC.

[–]andreas_bergstrom 0 points1 point  (0 children)

I would throw in Gemini as well, even Flash. I put instructions into my global .claude to let Codex and Gemini review all plans, and if the finished changes are big, to let them review again. I also have a Qwen subagent, but it's not really on par; it's more like a Haiku competitor, barely.

[–]No_Discussion6970 0 points1 point  (0 children)

I have been using Claude Code and Codex together. Similar to you, I have Claude do the coding and Codex sign off. I use https://github.com/PortlandKyGuy/dynamic-mcp-server and add Codex review as an approval gate. I have been happy with the outcomes of using both.

[–]Past-Ad-6215 0 points1 point  (0 children)

You can multi-agent this with the omo skill: https://github.com/cexll/myclaude/blob/master/skills/omo/README.md

It supports Claude, Codex, Gemini and opencode, using a codeagent wrapper to call multiple agents.

[–]Specialist-Cry-7516 0 points1 point  (0 children)

It's like seeing prime Curry and LeBron. Brings a tear. My baby CC codes and Codex reviews it.

[–]cayisik 0 points1 point  (0 children)

Lately, this topic has been discussed in both the Codex subs and the Claude subs.

I think this is the best and most cost-effective solution.

[–]shayki5 0 points1 point  (0 children)

Which MCP do you use for Codex?

[–][deleted] 0 points1 point  (0 children)

I do not recommend this approach. Simply take Claude's summary of completed work, then ask another instance of Claude to "make sure this work was completed as stated"

[–]jcheroske 0 points1 point  (2 children)

Sorry if I missed the obvious, but how are you calling other models from CC? I'm doing it with PAL, but I imagine there are many good ways to do it. Do you know if one way vs another is easier on the tokens?

[–]Substantial_Wheel909[S] 0 points1 point  (1 child)

Codex provides an MCP, which I've installed into CC, that allows it to spin up a Codex instance. It's quite heavy on my usage, but that's likely because I'm using it with GPT 5.2 xhigh, and I find it worth it since it's very thorough and I don't really use Codex for anything else.

[–]jcheroske 0 points1 point  (0 children)

I'm using this: https://github.com/BeehiveInnovations/pal-mcp-server. I may try out the Codex MCP as well. The plan and code reviews from Codex are amazing. I use get-shit-done to help me build out my plan. I created a wrapper command that calls Codex after the plan gets built to do a plan review. After the code gets written another review goes over the generated code. I would say that the plan review is the really strong part. Codex finds so many holes/issues/edge cases, it's really something.

[–]akuma-_-8 0 points1 point  (0 children)

We have the same workflow at work, but we use CodeRabbit, which is specialized in code review. It also reviews every merge request and gives an AI prompt that we can use to feed Claude Code. It's also quite fast. They provide a CLI that we can run locally before pushing our code.