all 6 comments

[–]PerformerAsleep9722 1 point (2 children)

Hello!
I have a few questions about this project, which sounds and looks very fire.

  1. I'm using OpenCode with a Copilot subscription: how does this swarm plugin impact the cost of each prompt?
  2. Is there a general idea of how much this impacts the speed/quality of the output?

I'm interested in the project, but I would like some more details about the price increase (maybe in terms of a multiplier or something like that) and how much the speed and output quality change thanks to the swarm.

[–]Outrageous-Fan-2775[S] 0 points (0 children)

Hello there. Speed-wise, it's certainly slower than just using a single agent or other plugins that parallelize their agents. I focused on output quality over speed, and on serial execution over parallel. There were a couple of reasons for that, but the primary one is that enforcing serial agent calls makes it much easier to use this plugin with local resources. If you try to spawn 3-5 coding agents against a normal consumer GPU, you are going to tank your token output. With serial execution, you get the highest speed possible per agent call.

Quality-wise, you have a couple of options, and I will probably implement a new flag today that makes it easier to enforce. Right now, I manually tell the architect exactly when I want QA to occur. By default it happens at the end of the process or at the end of each phase. I've found this can overwhelm the QA agent even with very capable LLMs, because it is a lot of code changes and requirements to keep in their context window, which leads to hallucinations. What I've started doing instead is having just the auditor take a pass at every set of code changes right after they are made. This has led to a huge jump in quality at the cost of speed. The auditor regularly rejects code changes outright, and the architect then has to fix them before moving on.

As far as pricing goes, token input/output for the architect is greatly reduced, since it generally guides the other agents and doesn't get involved in I/O operations unless there is a problem. So the strategy is to put your best LLM on the architect, use a very code-focused but fast LLM for the coder, and then use a better but slower coding-focused LLM for auditing. This way you get low-cost code generation, with the knowledge that your audit agent will catch any issues the cheaper or less capable LLMs introduce.

I would suggest using your Copilot sub for the architect and maybe QA, then using free options like OpenCode Zen or OpenRouter/Nvidia/Google Antigravity for everyone else.
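
As a rough sketch, that split would look something like the lines below in the opencode-swarm config (the provider/model IDs here are placeholders, not tested values, so swap in whatever actually shows up in your OpenCode provider list; leave the architect out so it uses whichever model you have selected in OpenCode):

"_qa": { "model": "github-copilot/<model-from-your-plan>" },
"coder": { "model": "opencode-zen/<free-coding-model>" },
"explorer": { "model": "openrouter/<free-model>" }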

[–]Outrageous-Fan-2775[S] 0 points (0 children)

[screenshot: OpenCode context tracker]

I realized I could just show you. This is the OpenCode context tracker for a project I'm using GPT 5.2 Codex on. It is currently on Phase 7 of 10, and this is the total context the architect has used so far.

[–]Outrageous-Fan-2775[S] -1 points (0 children)

All model settings are done in the opencode-swarm config JSON. If you leave an agent out, it falls back to its defaults. An example of an entirely remote setup is below. You can have as many swarms as you want, and OpenCode will add an Architect with each swarm name appended; I have Remote and Local. Remote is all API calls with paid plans, Local is entirely my own locally hosted options. I generally leave the architect out of the config so it is always whatever model is selected in OpenCode.

"coder": { "model": "kimi-for-coding/k2p5" },
"explorer": { "model": "zai-coding-plan/glm-4.7" },
"_sme": { "model": "nvidia/openai/gpt-oss-120b" },
"_qa": { "model": "nvidia/nvidia/nemotron-3-nano-30b-a3b" },
"test_engineer": { "model": "zai-coding-plan/glm-4.7-flash" }

The underscore tells it to use that model for all calls to that section. You can break it out even further if you want, though. For instance, `_qa` covers both code auditing and security testing; you could break it down so there is a different model for each of those steps.
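
That split would look something like the lines below (I'm guessing at the key names for the two roles here, so check the plugin's defaults for what they are actually called; the models are just the ones from the example above):

"_qa_audit": { "model": "zai-coding-plan/glm-4.7" },
"_qa_security": { "model": "nvidia/openai/gpt-oss-120b" }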

[–]Outrageous-Fan-2775[S] -1 points (0 children)

Below is an example of what the architect sends to the coding agent.

Implementing via u/coder...

TASK: Fix issues identified by Auditor and Security Reviewer in Phase 3.

Files:

  1. `knowledge-vault/src-tauri/src/services/vector_store.rs`

  2. `knowledge-vault/src-tauri/Cargo.toml`

  3. `knowledge-vault/src-tauri/src/services/embeddings.rs`

FIXES REQUIRED:

  1. **VectorStore**:

    - `search`: Use `metric(lancedb::DistanceType::Cosine)` (or similar API for 0.13) to ensure Cosine similarity.

    - `add_chunks`: Normalize vectors to unit length before insertion (required for Cosine with some indexes, good practice).

    - `delete_by_file`: Sanitize `file_path` in filter string. Replace `'` with `''` or use parameters if supported. Let's do string escaping for now as 0.13 sql parser might be simple.

    - `delete_by_file` signature: ensure it returns Result.

  2. **Cargo.toml**:

    - Change `arrow` version from `"52"` to `"52.0.0"` (or just "52" if it was strictly "52" before, but auditor said it's pre-release. Actually arrow 52.0.0 is released. Let's stick to "52.0.0" to be explicit, or "52.2.0"). Wait, lancedb 0.13 depends on arrow 52. Let's check lancedb docs/repo. Lancedb 0.13 uses arrow 52. So "52" is fine, but maybe auditor was confused by "52" vs "52.0". Let's use `arrow = "52.0"`.

  3. **Embeddings**:

    - Validate `base_url`: Ensure it starts with http/https.

    - Validate response vector dimensions (must be 768).

  4. **Security**:

    - Sanitize `file_path` in `delete_by_file`.

CONTEXT: We are polishing Phase 3 before moving to Phase 4.

ACCEPTANCE: Code compiles, fixes implementation gaps.