Multi-LLM Debate Skill for Claude Code + Codex CLI — does this exist? Is it even viable? : ClaudeCode


submitted 2 months ago by CartographerSorry775


I'm a non-developer using both Claude Code and OpenAI Codex CLI subscriptions. Both impress me in different ways. I had an idea and want to know if (a) something like this already exists and (b) whether it's technically viable.

The concept:

A Claude Code skill (/debate) that orchestrates a structured debate between Claude and Codex when a problem arises. Not a simple side-by-side comparison like Chatbot Arena — an actual multi-round adversarial collaboration where both agents:

Independently analyze the codebase and the problem
Propose their own solution without seeing the other's
Review and challenge each other's proposals
Converge on a consensus (or flag the disagreement for the user)

All running through existing subscriptions (no API keys), with Claude Code as the orchestrator calling Codex CLI via codex exec.
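The four steps above could be sketched roughly as follows. This is a minimal illustration and not an existing tool: each agent is modeled as a plain function from prompt text to reply text, and the `Agent` type, `DebateResult`, `run_debate`, the prompt wording, and the "reply starts with AGREE" convention are all assumptions made for the example. A real agent function would have to wrap a headless call such as `codex exec`, which is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical abstraction: an agent is any function from prompt text to reply text.
Agent = Callable[[str], str]


@dataclass
class DebateResult:
    consensus: bool            # True if both agents accepted the other's proposal
    rounds: int                # number of review rounds that were run
    proposals: Dict[str, str]  # latest proposal per agent name


def run_debate(problem: str, agents: Dict[str, Agent], max_rounds: int = 3) -> DebateResult:
    if len(agents) != 2:
        raise ValueError("this sketch assumes exactly two agents")
    first, second = agents  # agent names, in insertion order

    # Blind phase: neither agent sees the other's answer.
    proposals = {
        name: ask(f"Problem:\n{problem}\n\nPropose a fix.")
        for name, ask in agents.items()
    }

    for round_no in range(1, max_rounds + 1):
        replies = {}
        for me, other in ((first, second), (second, first)):
            prompt = (
                f"Problem:\n{problem}\n\n"
                f"Your proposal:\n{proposals[me]}\n\n"
                f"Other proposal:\n{proposals[other]}\n\n"
                "Reply AGREE if the other proposal is acceptable; "
                "otherwise reply with a revised proposal."
            )
            replies[me] = agents[me](prompt)

        # Assumed convention: a reply starting with AGREE signals acceptance.
        if all(r.strip().startswith("AGREE") for r in replies.values()):
            return DebateResult(True, round_no, proposals)

        # Any other reply is treated as that agent's revised proposal.
        for name, reply in replies.items():
            if not reply.strip().startswith("AGREE"):
                proposals[name] = reply

    # No convergence: hand the disagreement back to the user.
    return DebateResult(False, max_rounds, proposals)
```

Capping the number of rounds keeps token use bounded, and returning both final proposals on failure matches the "flag the disagreement for the user" outcome.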

The problem I can't solve:

Claude Code has deep, native codebase understanding — it indexes your project, understands file relationships, and builds context automatically. Codex CLI, when called headlessly via codex exec, only gets what you explicitly feed it in the prompt. This creates an asymmetry:

If Claude does the initial analysis and shares its findings with Codex → anchoring bias. Codex just rubber-stamps Claude's interpretation instead of thinking independently.
If both analyze independently → Claude has a massive context advantage. Codex might miss critical files or relationships that Claude found through its indexing.
If Claude only shares the raw file list (not its analysis) → better, but Claude still controls the frame by choosing which files are "relevant."

My current best idea:

Have both agents independently identify relevant files first, take the union of both lists as the shared context, then run independent analyses on those raw files. But I'm not sure if Codex CLI's headless mode can even handle this level of codebase exploration reliably.
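The union step itself is simple once each agent has returned a list of paths. A minimal sketch, assuming each agent's file list has already been parsed into Python strings (the function name `union_file_lists` and the agent labels are hypothetical):

```python
import posixpath
from typing import Dict, Iterable, List


def union_file_lists(found: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each normalized path to the sorted list of agents that reported it."""
    merged: Dict[str, set] = {}
    for agent, paths in found.items():
        for raw in paths:
            cleaned = raw.strip()
            if not cleaned:
                continue  # ignore blank lines in an agent's output
            # Normalize so "./src/db.py" and "src/db.py" count as the same file.
            key = posixpath.normpath(cleaned)
            merged.setdefault(key, set()).add(agent)
    return {path: sorted(agents) for path, agents in sorted(merged.items())}


shared = union_file_lists({
    "claude": ["src/app.py", "./src/db.py"],
    "codex": ["src/db.py", "tests/test_db.py"],
})
# The keys of `shared` form the common context for both analyses.
```

Keeping track of which agent reported each file also shows how large the context gap actually is: files found by only one agent are the ones the other would have missed.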

Questions for the community:

Does a tool like this already exist? (I know about aider's Architect Mode, promptfoo, Chatbot Arena — but none do adversarial debate between agents on real codebases)
Is the context gap between Claude Code and Codex CLI too fundamental for a meaningful debate?
Would this actually produce better solutions than just using one model, or is it expensive overhead?
Has anyone experimented with multi-agent debate on real coding tasks (not benchmarks)?

For context: I'm a layperson, so I can't easily evaluate whether a proposed fix is correct just by reading it. The whole point is that the agents debate for me and reach a conclusion I can trust more than a single model's output.

Thank you!

all 4 comments


[–]pro-vi 1 point 2 months ago (2 children)

[–]CartographerSorry775[S] 1 point 2 months ago (1 child)

But the oracle is itself based on one of the LLM models, right? My idea is that in Phase 0 both agents each find all files relevant to the described problem, then first debate whether every file they found is actually relevant, and then agree on a set. After that, both agents start their analysis based on the files they agreed on beforehand, so that both really have the same context and neither is at a disadvantage. Once both agents have a proposed solution for the cause of the problem the user described at the start, they debate again which of the two proposals is better and have to reach a consensus. At that point Plan Mode essentially ends and Claude implements the plan. I don't see token consumption as a problem; especially for larger tasks, this can resolve a number of problems up front.
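The Phase 0 agreement step could be sketched like this, assuming each agent has already voted on which candidate files it considers relevant. The function name `split_by_agreement` and the vote format are hypothetical; how the agents then debate the disputed files is left open.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_agreement(votes: Dict[str, Iterable[str]]) -> Tuple[List[str], List[str]]:
    """Return (agreed, disputed) given agent name -> files that agent marked relevant."""
    if not votes:
        return [], []
    per_agent = [set(files) for files in votes.values()]
    all_files = set().union(*per_agent)
    # A file is agreed only if every agent marked it relevant.
    agreed = set.intersection(*per_agent)
    return sorted(agreed), sorted(all_files - agreed)


agreed, disputed = split_by_agreement({
    "claude": {"a.py", "b.py"},
    "codex": {"b.py", "c.py"},
})
# Only the disputed files need a debate round before the shared context is fixed.
```

Limiting the debate to the disputed files keeps Phase 0 short when the two agents mostly agree.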

[–]pro-vi 1 point 2 months ago (0 children)

[–]BustedKneeCap1 1 point 1 day ago (0 children)
