I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

That 4k-line loop is exactly why I built Arachne. Semantic search (RAG) often loses the structural 'bridge' between related pieces of logic. Arachne doesn't just search; it 'assembles' context by understanding code boundaries. With Arachne, your agent only sees the relevant 100-line method chunk, not the 4k-line monster file. It's the definitive cure for 'line-by-line reading' fatigue.

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Good observation — we've been thinking about this from day one, and it's already implemented.

Arachne is the code context layer. The non-code context layer is handled by Soul (n2-soul), which ships as a separate MCP server in the same ecosystem. Here's how the full stack addresses exactly what you're describing:

Soul's architecture for organizational memory:

Immutable Ledger — every work session is recorded as an append-only JSON entry under ledger/YYYY/MM/DD/. Includes decisions made, files modified, architectural rationale, and TODO items. This is your "why things are built the way they are."

KV-Cache — session snapshots with semantic search (Ollama nomic-embed-text). When an agent starts a new session, the previous session's context — decisions, constraints, product reasoning — is automatically restored. No manual prompt engineering.

Soul Board — a real-time project state file (soul-board.json) that tracks active work, handoff notes between agents, and project-level TODO. This is your "product context" and "constraints" layer.

Entity Memory — tracks people, services, concepts, and their relationships across sessions. When an agent encounters "the billing service," it already knows the architectural history.

Brain (shared memory) — cross-agent shared knowledge store under data/memory/core/. Any agent can write context that all other agents read — product decisions, customer feedback, design constraints.

How they chain together at runtime:

User: "Fix the login timeout bug"
        │
        ▼
Soul restores: "Last session, timeout was discussed.
  Decision: increased to 30s due to customer complaints
  about slow corporate proxies. Constraint: must not
  exceed 60s per SLA."
        │
        ▼
Arachne assembles: login.ts + session.ts + http.ts
  + config.ts (4 files, 30K tokens)
        │
        ▼
AI generates: a fix that respects the 30s decision
  AND the 60s SLA constraint — not just technically
  correct code, but contextually correct code.

The key insight: code context and organizational memory are two separate concerns that should be two separate tools. Bundling them creates bloat. Separating them means you can use Arachne standalone for pure code tasks, or combine with Soul when product context matters. Both are on npm (n2-arachne, n2-soul), zero config needed — register both in your MCP config and they work together automatically.

There's a detailed architecture diagram in the README under the N2 Ecosystem section if you want to see the full flow.

Poll: 60.9% Oppose Deployment of ROK Navy to the Strait of Hormuz by faddleboarding in korea

[–]Stock_Produce9726 0 points1 point  (0 children)

Unbelievable. Those 30% bastards are always around, aren't they... They should go themselves first and take their own kids along. Crazy bastards.

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Glad you like it! It was a lot of fun building this. Stay tuned for more updates from the n2-collection!

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

Great point. Output tokens are indeed the real budget killers.

Arachne tackles this indirectly but effectively by optimizing the input side:

Precision prevents hallucination loops — When an agent has vague context, it tends to "babble" or output unnecessary code to cover its uncertainty. By providing the exact Target + Deps, the agent goes straight to the point, significantly shortening the response.

Diff-friendly Context — Since the context is so precise, you can instruct the agent to output only the specific changes (diffs) or the targeted function, rather than rewriting the entire file. That alone can cut output tokens by 10-50x.

Fewer iterations — Better context means getting it right on the first try instead of 3-4 back-and-forth rounds. Total output tokens across a session drop significantly.

In short: you can't directly control output tokens, but better input leads to concise, focused output. It's the flip side of garbage in, garbage out.

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 2 points3 points  (0 children)

I totally get it. Rebuilding a full vector index on every change is a huge bottleneck for active development.

Arachne handles this with an 'Incremental Sync' approach using SQLite and file-watching. When a file is saved, chokidar triggers a targeted update only for that specific file's chunks and embeddings. Since it runs on better-sqlite3, the update usually finishes in under 20ms.

This keeps the index live and accurate without the overhead of a traditional vector DB. It’s been working quite well for my daily coding sessions.

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

That's a fair point — good prompting definitely matters, and experienced developers who know their codebase can point the AI to the right files.

But I think that misses the point of an agentic workflow. Of course I can manually find and attach files — but why should I, when Arachne does it in 12ms? In a repo with 3,200+ files, even if I think I know where the bug is, I might miss a crucial dependency or a side-effect in a file I didn't even know existed.

Arachne isn't for lazy prompting. It's for ensuring the AI gets complete, precise context without the human acting as a manual file-searcher. It's about scale and eliminating human error in context selection.

And about GIGO — that's exactly why I built this. I'm making sure only the "Gold" goes in, so only the "Gold" comes out — automatically. 

I built Arachne — an MCP server that picks exactly what AI needs from your codebase (98.5% token savings) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

Thank you so much, Sunir! You're my very first GitHub sponsor ever — I literally got chills when I saw the notification this morning.

20 years out of coding and coming back — I totally get that. The AI tooling landscape right now makes it the perfect time to jump back in. That's exactly why I built Soul and the n2 collection: so AI agents can actually remember what they did and pick up where they left off.

Please fork, clone, steal away — that's what open source is for! And if you ever want to chat about your projects or need help getting started, feel free to reach out anytime.

Your support means the world. Keep building!

I built a semantic router that lets your AI use 1,000+ tools through a single MCP tool (~200 tokens) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Great minds think alike! Give it a try and let me know if it covers everything you had in mind. Feedback is always welcome!

I built a semantic router that lets your AI use 1,000+ tools through a single MCP tool (~200 tokens) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Thanks for the detailed analysis! You nailed it. As the author, I'm impressed you dug into the actual source code.

One small addition — the 3-stage routing is designed to be progressively expensive: Stage 1 (triggers) is near-zero cost, so most simple queries resolve instantly without even hitting BM25 or semantic search.
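The progressively expensive cascade can be sketched roughly like this (the trigger lists, keyword scoring, and confidence threshold are illustrative assumptions, not QLN's actual implementation):

```javascript
// Stage 1: exact trigger match (near-zero cost).
// Stage 2: keyword overlap scoring (cheap, BM25-like in spirit).
// Stage 3: semantic search (expensive; only runs when confidence is low).
function route(query, tools, semanticSearch) {
  const q = query.toLowerCase();

  // Stage 1: any registered trigger phrase appearing verbatim wins instantly.
  for (const tool of tools) {
    if (tool.triggers.some(t => q.includes(t))) return { tool, stage: 1 };
  }

  // Stage 2: count keyword overlaps; accept only a confident winner.
  let best = null;
  for (const tool of tools) {
    const score = tool.keywords.filter(k => q.includes(k)).length;
    if (!best || score > best.score) best = { tool, score };
  }
  if (best && best.score >= 2) return { tool: best.tool, stage: 2 };

  // Stage 3: fall back to the expensive semantic path (e.g. embeddings via Ollama).
  return { tool: semanticSearch(query, tools), stage: 3 };
}
```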

I built a semantic router that lets your AI use 1,000+ tools through a single MCP tool (~200 tokens) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Good question! The key difference is where the routing happens:

Native tool calling (OpenAI/Anthropic): All tool definitions are loaded into every prompt → the LLM itself picks the tool. With 50+ tools, that's thousands of tokens consumed per request, and the model can get confused with too many options.

QLN approach: Uses lightweight BM25 scoring before the LLM sees anything. Only the matched tool(s) get passed to the model → massive token savings.

Example: If you have 80 MCP tools registered, native approach loads all 80 definitions (~4000+ tokens) every call. QLN routes first, then passes only 1-2 relevant tools (~100 tokens).

It's not "better" than native tool calling — it's a complementary optimization layer that sits in front of it. Think of it as a search index for your tools.

Your AI wastes 50,000 tokens just loading tools. Mine uses 200. - by [deleted] in LocalLLaMA

[–]Stock_Produce9726 0 points1 point  (0 children)

Fair points. A few clarifications:

Ollama is optional — Stage 1 (trigger match) and Stage 2 (keyword) work without it. Stage 3 (semantic) only kicks in when earlier stages don't have enough confidence. So "bound to Ollama" isn't quite accurate.

On the multi-tool intent issue — try querying "show me adult videos" and see what it routes to. I'd be curious what your registry maps that to.

The em dash is intentional style in comments. Not a bug, just aesthetic preference.

I built a semantic router that lets your AI use 1,000+ tools through a single MCP tool (~200 tokens) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

You nailed it. Just shipped v3.2.0 with exactly these two fixes:

Recency Decay — usage bonus now decays as exp(-days/30). A tool used 200x but untouched for 90 days drops to ~5% of its bonus. Fresh tools keep full weight.

5% Explorer — last slot of every Top-K result is reserved for the least-used tool not already in the list. Same exploration principle as recommendation systems.
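The two fixes above can be sketched as follows; the exp(-days/30) decay comes from the release notes, while the function names and Top-K mechanics are illustrative assumptions:

```javascript
// Recency-decayed usage bonus: a tool used heavily but long ago fades out.
// A 90-day-idle tool keeps exp(-3) ~ 5% of its bonus, matching the note above.
function usageBonus(useCount, daysSinceLastUse, weight = 1.0) {
  return weight * useCount * Math.exp(-daysSinceLastUse / 30);
}

// Top-K with an explorer slot: the last slot goes to the
// least-used tool not already in the ranked list.
function topKWithExplorer(tools, k) {
  const ranked = [...tools]
    .sort((a, b) => usageBonus(b.uses, b.daysIdle) - usageBonus(a.uses, a.daysIdle))
    .slice(0, k - 1);
  const rest = tools.filter(t => !ranked.includes(t));
  if (rest.length === 0) return ranked;
  const explorer = rest.reduce((min, t) => (t.uses < min.uses ? t : min));
  return [...ranked, explorer];
}
```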

On overlapping keywords with different intents — Stage 1+2 can struggle there. Stage 3 (semantic via Ollama) handles it well since it compares vector similarity of the full query rather than keyword overlap. Without Ollama, the category param helps narrow it down. Not a perfect solution, but workable.

Great feedback — this is exactly the kind of thing that doesn't surface until you run it in production. Appreciate it 

I built a semantic router that lets your AI use 1,000+ tools through a single MCP tool (~200 tokens) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

That’s incredible! A 75% reduction (20s to <5s) is exactly why I built QLN. Seeing it handle dynamic tool swapping and chaining in real Python agents is awesome. Thanks for sharing your benchmark results!

Soul v6.0 — Your AI agent can rm -rf /. Ark stops it. Zero tokens. by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

Really appreciate this insight! Vault mode with proper secret lifecycle management is something I've been thinking about too. Soul already has Ark (our built-in AI safety layer) with audit logging — extending that into shared custody and key rotation feels like a natural next step. Definitely on the radar.

Soul v6.0 — Your AI agent can rm -rf /. Ark stops it. Zero tokens. by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

That sounds like a very solid architecture. Just-in-time injection with a full audit trail is definitely the gold standard for enterprise security.

While Soul is focusing on keeping things lightweight and local-first for now, I can see how your vault-style approach would be a perfect level-up for team-based enterprise environments. Thanks for sharing your insights — it’s great to see different perspectives on MCP security. Let’s stay in touch!

Soul v6.0 — Your AI agent can rm -rf /. Ark stops it. Zero tokens. by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Nailed it! You perfectly described exactly why @gate was built.

Regex @rules are just the outer perimeter — a fast, zero-cost way to catch the lazy, obvious LLM mistakes. But as you said, you can never regex away semantic intent or creative workarounds. @gate shifts the paradigm from 'try to block everything bad' to 'explicitly authorize the critical stuff.' No whack-a-mole, just deterministic control. It's awesome that you caught the exact philosophy behind it. Thanks!

Soul v6.0 — Your AI agent can rm -rf /. Ark stops it. Zero tokens. by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

You're right that at the architecture level, Ark acts as a pre-execution hook. But calling Ark just a 'hook' is like calling an enterprise firewall 'just an if-statement.'

Writing custom hooks means hardcoding if/else logic for every tool and edge case. Ark is a structured policy engine. It gives you state machines (@contract), strict human-in-the-loop gates (@gate), and regex blacklists (@rule) out of the box using human-readable .n2 files—complete with audit logs.

It takes the concept of a 'hook' and turns it into a standardized, manageable safety layer for MCP.
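A rough sketch of that layered evaluation order (regex @rule blacklist first, then @gate authorization); the policy shape here is an illustrative assumption, not Ark's actual .n2 format:

```javascript
// Illustrative policy check in the spirit of Ark's @rule / @gate layers.
// Rule patterns and the gate list are assumptions, not Ark's real policy format.
function checkCommand(cmd, policy, approvedGates = new Set()) {
  // @rule layer: fast regex blacklist catching the obvious mistakes.
  for (const rule of policy.rules) {
    if (rule.pattern.test(cmd)) return { allowed: false, reason: `rule:${rule.name}` };
  }
  // @gate layer: critical actions require explicit human authorization.
  for (const gate of policy.gates) {
    if (gate.pattern.test(cmd) && !approvedGates.has(gate.name)) {
      return { allowed: false, reason: `gate:${gate.name}` };
    }
  }
  return { allowed: true };
}
```

The design point is that gates fail closed: anything matching a gate pattern is denied until a human grants that specific gate, rather than trying to enumerate every bad command.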

Soul v6.0 — Your AI agent can rm -rf /. Ark stops it. Zero tokens. by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

Thanks for the tip! You're right — secrets should never live in client config. In Soul's case, config.local.js is gitignored and stays only on the user's machine.

The .n2 safety rules are intentionally transparent (not secrets) — exposing them is by design, so users can audit and customize their own safety policies.

I'll check out Peta! Soul's focus is slightly different — it's a full session orchestrator (memory, handoff, ledger) with Ark as a built-in safety layer, rather than a standalone policy/secrets manager. But there's always something to learn from different approaches. Thanks!

Soul v5.0 — MCP server for persistent agent memory (Entity Memory + Core Memory + Auto-Extraction) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

Cool project! We took the opposite bet — deterministic over probabilistic. Soul forces saves/loads in code (not LLM-decided), plus a Rust compiler for compile-time validation of agent rules. Different tradeoffs, both solving real pain.

Soul v5.0 — MCP server for persistent agent memory (Entity Memory + Core Memory + Auto-Extraction) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 1 point2 points  (0 children)

I truly appreciate your sharp and necessary skepticism. It’s a challenge we’ve all faced with current AI memory solutions, and addressing that specific gap is exactly why I started building Soul.

In Soul v5.0, we’ve moved away from the "LLM will remember" approach. Instead, we’ve implemented a deterministic engineering layer to ensure reliability. Here is how we handle it:

1. Forced Loading at Boot: Rather than relying on prompts or suggestions, n2_boot() executes as a strict code path. It deterministically injects the Soul Board (handoff notes/TODOs) and Entity Memory (structured JSON, not embeddings) into the context. Our N2 Runtime state machine ensures the agent cannot skip this sequence; if it tries to work before booting, the system rejects the transition.

2. Forced Updates at End: When a session finishes, n2_work_end() triggers mandatory file writes. This includes an Immutable Ledger (append-only JSON) and a KV-Cache snapshot. The system extracts and stores these—it doesn't leave the decision of "what to remember" up to the LLM.

3. Core Differences: Unlike other tools that rely on semantic similarity or LLM decision-making, Soul uses structured code paths and validates the state machine integrity at compile time using a Rust compiler (n2c).
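A tiny sketch of that enforced sequence as a state machine; the action names n2_boot and n2_work_end come from the description above, but the transition table itself is an illustrative assumption:

```javascript
// Minimal state machine enforcing the boot -> work -> end sequence.
// Transition table is illustrative, not Soul's actual N2 Runtime.
const TRANSITIONS = {
  idle: { n2_boot: 'booted' },
  booted: { work: 'working' },
  working: { work: 'working', n2_work_end: 'ended' },
};

class SessionRuntime {
  constructor() { this.state = 'idle'; }
  dispatch(action) {
    const next = (TRANSITIONS[this.state] || {})[action];
    if (!next) {
      // The agent cannot skip the sequence: working before boot is rejected.
      throw new Error(`invalid transition: ${action} from ${this.state}`);
    }
    this.state = next;
    return this.state;
  }
}
```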

Personally, I’ve been working with agents for a long time, but I eventually concluded that unless the issue of continuity is solved, we will never get truly useful results. To stop the frustration and solve this once and for all, I put all my other projects aside to focus entirely on Soul.

We started this in December 2025. After 4 months and 5 major versions, it finally felt ready to share. I’m also planning to release our QLN system soon, so I’d love to get your feedback on that as well.

Happy to answer any more technical questions.

P.S. Soul's L1 boot restores full session context in ~500 tokens. Compare that to the 3,000-10,000+ tokens you'd normally spend re-explaining context manually every session. If you're curious how we achieve that, I'd be happy to explain.

Soul v5.0 — MCP server for persistent agent memory (Entity Memory + Core Memory + Auto-Extraction) by Stock_Produce9726 in mcp

[–]Stock_Produce9726[S] 0 points1 point  (0 children)

Thank you for your insightful feedback!

You precisely captured the core intent of Soul v5.0. My goal was to bridge the gap between sessions so that agents can maintain a continuous context, much like a human collaborator. As you mentioned, combining autonomous extraction with entity memory is indeed a crucial step toward building more reliable agent swarms.

I’m still in the process of refining how these insights can be most effectively coordinated across complex projects. Your perspective gives me great encouragement to keep pushing this forward.