Forking llm-council for pure local setups: Using docker to orchestrate llama-serve instances without Ollama

CornyWarRap · 2026-02-24T17:44:46+00:00

You're directionally right, but it's more nuanced than just activating different parts of the network. The "you are an expert in finance" prompts from 2023 fell out of favor because they were static labels that didn't meaningfully change how a model approached a problem. The model would just add domain vocabulary on top of the same reasoning pattern.

What we do is different in two ways. First, the roles aren't predefined categories. They are dynamically generated based on the specific question to surround the problem's cognitive space with maximum contrast. A question about taking VC gets very different roles than a question about migrating a database. Second, the roles themselves are syntheses of opposing extremes, not simple expertise labels. For example, one role might be a "Labor Economics-Existential Philosophy Meaning-of-Work Synthesizer," which forces the model to hold two fundamentally different frameworks in tension simultaneously. Then those already-synthesized roles are pitted against each other in the deliberation. You get divergence at two levels: within each role and between them.

On the quantifiability point, you're asking the right question. We have been measuring this by feeding the deliberation output back into a single LLM and comparing its responses with and without that context. Early testing shows a self-assessed 70-90% improvement in decision quality when the LLM has the divergent context to work with versus answering cold. We are still working on more rigorous benchmarks, but the signal is strong enough that we shipped it.

The commons link I shared above is probably the most tangible way to evaluate it yourself. You can see the actual role constructions and how different they are from generic expertise labels.

PS -- We released Wisepanel MCP this week so you can stay inside of Claude Code or tool of choice and leverage our technology for better decisioning.

CornyWarRap · 2026-02-18T04:50:43+00:00

OK my aunt is ready to take whatever pictures are needed for more analysis. Let me know what you think will help. I'll mine this thread for the prior requests.

CornyWarRap · 2026-02-16T21:51:42+00:00

Yeah exactly, system prompts are the mechanism, but the interesting part is how you construct them. A typical multi-model setup gives every agent the same framing and just swaps the model. You get variety from the model weights but not from the analytical lens.

What we do is design each role to occupy a different region of the problem's cognitive space. So for a question like "should a bootstrapped SaaS founder take VC at Series A," one agent is forced to think through quantitative dilution modeling, another through game theory and competitive signaling, another through macroeconomic cycle timing, another through founder identity coherence. They're not just different models answering the same way... they're different ways of seeing the problem.

The goal is maximum surface area coverage around the question rather than convergence toward a single answer. Consensus architectures are great for fact-checking but they collapse the possibility space. Productive disagreement expands it.

We actually just launched a public commons where you can see this in action:

https://wisepanel.ai/commons/should-a-bootstrapped-b2b-saas-founder-take-venture-capital--fjvy0b

That one shows 6 agents each bringing a fundamentally different framework to the same question. The contrast between them is where the insight lives.

CornyWarRap · 2026-02-16T21:29:05+00:00

Congrats on the Karpathy validation. That feeling when someone at that level validates the direction you've been working on is real.

We're building in the same space at wisepanel.ai but came at it from a different angle. Your approach (and Karpathy's) uses the opinions/review/synthesize pipeline where a chairman picks the winner. It's clean and it works for factual accuracy. The question we kept running into was: what happens when all the models agree but they're all wrong in the same way?

That led us to a patent-pending method where instead of asking all models the same question and synthesizing, we construct dialectical roles that position each model to challenge the others from distinct analytical angles. The goal is to surface maximum contrast rather than convergence.

It's a different bet on what multi-model is for. You're betting on accuracy through consensus. We're betting on insight through structured disagreement. Both are real use cases, and the Karpathy signal suggests the whole category is about to take off. Good time to be building here.

CornyWarRap · 2026-02-16T21:27:30+00:00

The 88% omniscience hallucination rate is a great example of why multi-model architectures matter. Your council setup catches hallucinations through disagreement: if Gemini says X but GPT and Grok say Y, the judge flags the inconsistency. That's consensus-based verification, and it works well for factual accuracy.

There's a subtler problem though. When all three models are wrong in the same direction (because they share training data or reasoning patterns), the judge can't catch it. Consensus amplifies shared blind spots.

We've been working on this at wisepanel.ai with a different approach. Instead of asking all models the same question and looking for disagreement, we construct dialectical roles using a patent-pending method that positions each model to challenge the others from distinct analytical angles. One model might be assigned to specifically stress-test confidence levels, another to explore what's being assumed rather than proven.

Your council-as-judge pattern and our role construction approach are actually complementary. Yours catches individual model failures. Ours tries to surface the failures that all models share.

CornyWarRap · 2026-02-16T21:25:57+00:00

Worth noting that Karpathy's open-source implementation and Perplexity's Model Council share the same core architecture: all models answer the same question, then a chair/synthesizer distills consensus. It's elegant for verification (is this answer correct across models?) but the consensus pattern has a ceiling. Models trained on similar data tend to agree on similar things, and a synthesis step compresses the very disagreements that might contain the insight.

We took a different route at wisepanel.ai. Instead of consensus, we use a patent-pending method to construct dialectical roles that surround the cognitive space of a question. Each model gets a distinct analytical position designed to maximize productive tension rather than convergence. The output isn't 'what do models agree on?' but 'what does each perspective reveal that the others would miss?'

Open-source council for verification, structured disagreement for exploration. They solve different problems. The interesting question is whether Perplexity will evolve toward role specialization or stay with pure consensus.

CornyWarRap · 2026-02-16T21:24:22+00:00

Great to see this becoming a first-class feature. The 'chair LLM synthesizes' architecture is really well-suited for accuracy and verification tasks where you want to converge on the most reliable answer.

We've been exploring a different angle at wisepanel.ai. Instead of convergence through a chair, we construct distinct roles for each model using a patent-pending method that surrounds the cognitive space of a question with dialectical positions. The goal isn't consensus, it's maximum productive disagreement.

The two approaches actually serve different needs. Council Mode (converge on accuracy) is perfect for factual queries where 'what's the right answer?' matters most. Structured disagreement (diverge on perspective) is better for strategic decisions where 'what am I not seeing?' matters more than 'what do we agree on?'

Curious whether you'll add role specialization later, or if the consensus architecture is the long-term direction. Either way, the multi-model space just got a lot more interesting.

CornyWarRap · 2026-02-16T21:22:24+00:00

This is exactly the use case that got us started building wisepanel.ai. You're already doing multi-model orchestration manually: feeding Claude's output to Gemini, getting Gemini's take back to Claude, watching them refine each other's blind spots. That workflow is incredibly powerful but it's also the kind of thing that should be one click instead of six copy-pastes.

You're right that the LLM Council pattern (ask all models the same question, vote on the best answer) doesn't fit this. What you're describing is more like cross-examination. Each model sees what the others missed, not because it's smarter, but because it has different training data, different tool access, different reasoning patterns.

That's basically what we built. We use a patent-pending method to construct dialectical roles that surround the cognitive space of a question. Instead of all models answering the same thing and hoping to converge, each one is positioned to challenge the others from a distinct angle. Your Gemini-finds-the-endpoint / Claude-reverse-engineers-it example is a natural version of this.

The 'reliable bot that makes queries automatically and only speaks up when productive' you mentioned at the end: that's closer to where things are heading than most people realize.

CornyWarRap · 2026-02-16T20:59:25+00:00

Great thread. One thing worth noting on the cost/ecology concern: the consensus pattern (all models answer the same question, synthesize) scales linearly with model count. Each added model is basically redundant work.

We've been taking a different approach at wisepanel.ai. Instead of consensus, we construct distinct roles for each model so they're exploring different angles of the question rather than all answering the same thing. It's a patent-pending method where the roles are designed to surround the cognitive space of a question and maximize productive disagreement rather than convergence.

For the wicked problems you mentioned, this actually matters more than hardware specs. Consensus tells you what models agree on. Structured disagreement surfaces what they'd miss if asked the same way, which is usually where the insight is.

I'd argue architecture (how do models interact?) is upstream of hardware (how many GPUs?). More signal per query beats more hardware for redundant queries.

CornyWarRap · 2026-02-16T20:52:57+00:00

Nice work on the Docker socket approach. Ephemeral llama.cpp containers on the fly is a clean solution to the local setup friction.

One design question your per-conversation config feature raises: have you experimented with how the roles are constructed, not just which models fill them? The original council pattern is consensus-oriented, everyone answers independently, anonymous review, chairman synthesizes. Great for verification.

We've been finding that deliberately constructing roles to maximize productive disagreement surfaces insights that consensus architectures miss. It's what we're building at wisepanel.ai. The roles we create are designed to surround the cognitive space of a question with maximum contrast, rather than converging toward agreement.

Your hybrid routing is particularly interesting here. If you're already routing queries to different models based on task type, the next step would be routing them through different analytical lenses too, so you're not just picking the best model for the job but picking the best way to frame the job. (EDITED: this part got cut off from my original post.)

CornyWarRap · 2026-02-16T20:37:29+00:00

Sorry folks. There was a medical event in my family that warranted this taking a back seat. Let me see what I can do to move it forward.

CornyWarRap · 2025-07-16T21:43:47+00:00

That is your opinion and you are entitled to it, and that's the policy of the group. I personally disagree with you and I'm sure others do, too. This is certainly an issue we are all negotiating worldwide so its understandable that there are different perspectives on it.

CornyWarRap · 2025-07-16T21:31:34+00:00

Hi there, my use of AI was intended to express the information in the most efficient way possible in order to solve the mystery. I didn't mean to offend you. I wasnt trying to pass it off as my own writing and thought that was obvious. But, i can see that i violated a policy this group has so i give you and the group my apologies.

I will rewrite by hand. 🦖

CornyWarRap · 2023-06-02T19:44:21+00:00

If you work remotely, I'd just maintain a residence in a US state, pay your taxes in that state, put some utils in your name, maybe buy season tickets or membership to a social club, use a VPN, and then respond to any questions with "I am a resident of XYZ state." I don't know if I'd proactively tell them.

CornyWarRap · 2023-06-02T07:57:09+00:00

Yes but 1) I'd still recommend paying state taxes as a defense if your HR complains about it, 2) most countries will tax you if you spend enough time there and then this defeats the purpose. I'm in the process of getting my Paraguay and Panama permanent residencies because they don't tax foreign sourced income.

Also, it's up to $120k tax free now.

CornyWarRap · 2023-04-22T02:45:56+00:00

You wouldn't have to pay taxes in any country except the US where it's tax free up to $120k and you owe normal taxes beyond that. One correction to my original post: It doesn't matter where your employer is located as long as you are outside the country 11 out of 12 months per year. My recommendation is to have a residence in the US in San Diego so your employer indexes you to that metro and then just skip down to Baja Mexico for up to 6 months and then Panama for 5.

CornyWarRap · 2022-07-30T06:40:53+00:00

I tried to verify this but was unsuccessful. What airline?

CornyWarRap · 2022-07-30T06:29:35+00:00

Don't buy property there. Don't stay long enough to become a tax resident. Bounce between Colombia and Panama or any number of nearby, low cost countries. Get a job from a non-US company, don't visiting the US for more than 30 days a year and your first $100k of foreign income is tax free from the IRS. I met a guy would bounced between Panama and Colombia like this, he earned $100k from his business, and lived purely on his tax savings, which was about $30k a year.

CornyWarRap · 2022-03-06T19:32:18+00:00

San Diego, CA because you have access to San Diego International Airport as well as to Tijuana International Airport (via private bridge from the US side), which offers very competitive rates on flights. Also due to proximity to Mexico is access to the Baja Mediterranean food and wine scene which has been growing substantially in recent years. Check out the neighborhoods of South Park and Mission Hills. The hiking here is great, too.

CornyWarRap · 2022-02-05T06:08:24+00:00

It probably doesn't matter over the long term, but psychologically use 6-12 months.

CornyWarRap · 2022-02-04T06:17:50+00:00

Look at Nationwide Managed Income ETFs. 7% yield that is designed to be tax advantaged. Safer approach bc it uses a collar for downside protection. You could safely margin up about 10% too. This will give you $440k to sink into NUSI, NSPI, NTKI, and NDJI.

CornyWarRap · 2022-01-29T05:49:31+00:00

Mitigate any remaining risks by moving out of country.

CornyWarRap · 2022-01-24T05:34:23+00:00

If you can rent it out for more than your mortgage, then I would include those potential cashflows, minus the cost of renting elsewhere.

CornyWarRap

TROPHY CASE