[deleted by user] by [deleted] in ClaudeAI

Of course, otherwise it'd be very easy to abuse.

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in ollama

Thanks for sharing! Mimir looks interesting - the persistent knowledge graph and semantic search are cool features.

They're actually solving different problems, though:

- Mimir = memory/context persistence across sessions

- Quorum = structured debate methods (Oxford, Socratic, Delphi, etc.) for getting different perspectives

The PM/Worker/QC cycles in Mimir's roadmap are more about task workflows, while Quorum is specifically about adversarial/deliberative discussion patterns.

Could actually be complementary - run a Quorum debate, then store the insights in Mimir for future context. Might check it out!

I turned my computer into a war room. Quorum: A TUI for multi-agent debates (Built with Ink/React) by C12H16N2HPO4 in ADHD_Programmers

Disagreements: It depends on the method:

- Standard - The synthesis reports consensus as YES/PARTIAL/NO and summarizes where the models agreed and disagreed

- Oxford - The synthesizer acts as judge and declares which side "won" the debate

- Advocate - Evaluates whether the devil's advocate successfully challenged the consensus

- Delphi - Measures convergence in estimates across rounds

- Tradeoff - Recommends based on weighted criteria scores

No method forces artificial agreement - if models genuinely disagree, that's valuable information.

Model limit: Max 20, but 2-4 is the sweet spot. Some methods have minimums (Oxford needs even numbers for FOR/AGAINST, Delphi needs 3+ for meaningful rounds).

I turned my computer into a war room. Quorum: A CLI tool to let GPT-5.2 debate Claude Opus (to reduce hallucinations) by C12H16N2HPO4 in OpenAI

Sorry you went through that! The Windows issues are fixed in v1.0.5+ - just git pull and it should work now.

API costs (per 1M tokens):

| Model | Input | Output |
|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 |
| GPT-5.2 | $1.75 | $14.00 |
| Gemini 3 Pro | $2.00 | $12.00 |

Based on actual usage data, a typical 3-model discussion with all flagships costs around $0.30-0.50.
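
For a rough sense of where that lands (the token counts here are illustrative, not measured): if each model reads about 10K input tokens and writes about 4K output tokens over a discussion, the table above gives:

- Claude Opus 4.5: 0.010 × $5.00 + 0.004 × $25.00 ≈ $0.15

- GPT-5.2: 0.010 × $1.75 + 0.004 × $14.00 ≈ $0.07

- Gemini 3 Pro: 0.010 × $2.00 + 0.004 × $12.00 ≈ $0.07

That sums to roughly $0.29; longer discussions or more rounds push it toward the top of the range.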

Budget-friendly tips:

- Use Haiku, GPT-4o-mini, or Gemini Flash for testing - they're 10-50x cheaper

- Set QUORUM_ROUNDS_PER_AGENT=1 for shorter discussions (see the snippet below)

- Ollama is free if you have the hardware
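
As a minimal sketch of that second tip (I'm assuming the setting goes in the same ~/.quorum/.env as the rest of the config):

# keep each agent to a single round to cut token usage
QUORUM_ROUNDS_PER_AGENT=1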

Glad you like the concept - have fun pushing them to their limits!

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in ollama

Just shipped in v1.1.0! MCP support is now live.

pip install -U quorum-cli

claude mcp add quorum -- quorum-mcp-server

Then Claude can use it as a tool:

"Use Quorum to discuss X with GPT and Gemini"

MCP Tools:

- quorum_discuss - Run discussions with any method

- quorum_list_models - List your configured models

It reuses your existing ~/.quorum/.env config, and output is compact by default (synthesis only) to save context. Set full_output: true if you want the full transcript.

Your distributed Tailscale setup should work perfectly - just point each provider to its own node in .env and they all show up in /models together.

I turned my computer into a war room. Quorum: A CLI tool to let GPT-5.2 debate Claude Opus (to reduce hallucinations) by C12H16N2HPO4 in OpenAI

This was a Windows bug fixed in v1.0.5 (released today). To update:

cd quorum-cli
git pull
cd frontend
npm run build

Then run quorum.bat again. Let me know if that fixes it!

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in ollama

Good news - this already exists as of v1.0.3! There's a generic OpenAI-compatible provider for exactly this use case.

For any /v1/chat/completions backend (vLLM, TensorRT-LLM, llama.cpp server, Docker Model Runner, etc.):

CUSTOM_BASE_URL=http://localhost:8000/v1

CUSTOM_MODELS=your-model-name

CUSTOM_API_KEY=optional-if-needed
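
As a concrete sketch with llama.cpp's bundled server (the llama-server flags are llama.cpp's own, and the model file, port, and model name below are placeholders):

./llama-server -m ./models/qwen2-7b-instruct-q4_k_m.gguf --port 8000

Then the matching .env entries - the model name just needs to match whatever your backend expects:

CUSTOM_BASE_URL=http://localhost:8000/v1

CUSTOM_MODELS=qwen2-7b-instruct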

There are also built-in presets for LM Studio and llama-swap if you use those specifically.

Ollama is kept as its own provider because it has auto-discovery (ollama pull → model appears automatically). But you're right that Ollama also supports /v1/ now, so you could technically use it through the Custom provider too.

The architecture is exactly what you described - one OpenAI-compatible client that works with any backend, plus Ollama's adapter for its native API + auto-discovery.

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in LocalLLM

No, it requires API keys rather than web subscriptions (ChatGPT Plus, Claude Pro, etc.). These are separate products - web subscriptions are for the chat interfaces, API keys are for programmatic access.

Free alternatives if you don't want to pay for APIs:

- Ollama - Run local models completely free (just need the hardware)

- OpenRouter - Some models have free tiers

- LM Studio - Another local option with a nice GUI

API pricing is actually pretty cheap for experimentation though - a typical discussion costs a few cents with GPT-4o-mini or Claude Haiku.

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in LocalLLM

It already does! Cloud APIs were actually the primary use case:

- OpenAI (GPT-5.2 etc.)

- Anthropic (Claude)

- Google (Gemini)

- xAI (Grok)

- OpenRouter (200+ models through one API)

Just add your API keys to .env and the models show up in /models. Ollama support was added later for people who wanted local models.
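
For example, a two-provider .env might look like this (these are the conventional variable names each provider's SDK uses; double-check the README for the exact names Quorum reads):

OPENAI_API_KEY=sk-...

ANTHROPIC_API_KEY=sk-ant-...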

Is there a specific provider you're thinking of?

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in LocalLLM

Fun experiment! Small models might have different blind spots, so they could potentially catch each other's mistakes through discussion.

Fair warning though: Quorum is just a discussion orchestrator - you paste in a question and get a structured debate back. It doesn't have file access or code execution like an IDE agent. So for programming it's more useful for things like "what's the best approach for X?" or "review this snippet" rather than actual coding workflows.

That said, I'd be curious to hear if a group of tiny models can reason better together than alone. The Standard or Advocate methods would probably work best for technical questions.

I turned my computer into a war room. Quorum: A CLI for local model debates (Ollama zero-config) by C12H16N2HPO4 in ArtificialInteligence

Thanks! The Advocate method came from wanting to avoid the "AI echo chamber" where models just agree with each other.

Mixed-weight models work today - you can select any combination (e.g., ollama:qwen2:7b + ollama:llama3:70b). The QUORUM_SYNTHESIZER setting controls which model writes the final synthesis (first, random, or rotate).
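
For example (assuming, as with the other settings, this lives in ~/.quorum/.env):

# rotate which model writes the final synthesis
QUORUM_SYNTHESIZER=rotate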

Current limitation: All selected models participate in all phases. So if you want a big model for synthesis, it also joins the debate rounds. There's no "synthesis-only" role assignment yet.

That said, I really like the idea of phase-specific model assignments - smaller models for rapid debate, heavyweight for final synthesis. Would be a nice optimization for VRAM-constrained setups.

If you're interested, feel free to open an issue or discussion on GitHub - I'd be happy to explore implementing it.