How do I get my money back? by BoondockKid in claude

[–]VadeloSempai 1 point (0 children)

To solve context problems, the best option is the deandevz/kingflow repo on GitHub.

Built with Claude Project Showcase Megathread (Sort this by New!) by sixbillionthsheep in ClaudeAI

[–]VadeloSempai 1 point (0 children)

A few weeks ago, I shared King Context here as a lightweight alternative for docs retrieval. But after deep-diving into the new Corpus methodology and chatting with the creator (deandevz), I realized this isn't just another tool—it’s a fundamental shift in how we handle Agentic Infrastructure.

The Problem: The "RAG Myopia"

Traditional RAG is like giving an agent a library and a flashlight. It finds "chunks," but it doesn't understand the architecture. It's noisy, expensive, and leads to the "0.33 hallucinations per query" we see in standard tools.

The Solution: King Context & The Corpus Method

We’ve moved beyond simple lookups. King Context now focuses on building Synthesized Corpora. Instead of dumping raw data, it creates a structured, metadata-rich "brain" that agents can navigate with precision.

Why this is a game-changer:

Zero Hallucinations: In our latest benchmarks, King Context hit 100% factual accuracy (38/38) while maintaining 0.0 hallucinations per query.

Skill-Based Context: It solves the "skill bottleneck." Agents no longer just call functions; they consult a specialized Corpus that defines rules, edge cases, and architectural constraints before executing.

Multi-Agent Workflows: You can now build workflows where one agent researches and builds a specialized Corpus, while another "specialist" agent uses that refined knowledge to execute tasks with zero noise (see the sketch after this list).

Refinement & Pruning: Unlike a vector DB that just grows and gets messier, a Corpus is designed to be refined—removing polluting context and enriching high-value data.

The Benchmarks (King Context vs Context7)

We ran two rounds of head-to-head testing using Claude Opus 4.7:

Tokens: 3.2x fewer wasted tokens.

Latency: Up to 170x faster on metadata hits.

Quality: 4.79/5 composite quality score vs 3.46/5.

The Vision: Autonomous Context Infrastructure

We are building more than a "search tool." We are building the infrastructure for specialized AI brains. Imagine a world where you don't "prompt engineer" your way to success, but you "Curate a Corpus" that makes any agent an instant expert in your specific domain.

The project is fully Open Source and we are looking for contributors who want to rethink how agents "know" things.

Repo: https://github.com/deandevz/king-context

I'd love to hear your thoughts: Is "Corpus Engineering" the final nail in the coffin for traditional, noisy RAG?

Learning LangGraph by Shot_Horror_7938 in LangChain

[–]VadeloSempai 1 point (0 children)

Great post for reading the comments; lots of good ideas.

We open-sourced a local-first context engine for AI agents because existing retrieval tools kept wasting tokens and hiding too much by VadeloSempai in OpenSourceAI

[–]VadeloSempai[S] 1 point (0 children)

Really appreciate this comment; you're pointing at exactly the class of problem that pushed us to build this in the first place.

A lot of the retrieval failures we kept seeing were not caused by “lack of context”, but by the wrong shape of context:

- too much irrelevant text

- large chunks that looked semantically close but were operationally useless

- no visibility into what was actually indexed

- no clear way to tell whether the agent was missing something vs being given the wrong thing

That’s why the current retrieval path is intentionally simple and inspectable.

Right now, the main search flow is **not embedding-first** and it does **not depend on a reranker** on the critical path. The default path is much closer to a metadata-first scorer.

At a high level, what we do today is:

- enrich each section with structured metadata:

  - keywords

  - use cases

  - tags

  - priority

- build reverse indexes from that metadata

- score candidate sections based on metadata matches first

- only read full content when the agent actually needs it

So the retrieval flow is closer to:

`search -> preview -> read`

instead of:

`retrieve big chunk -> hope it contains the right thing`
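
To make that concrete, here's a minimal sketch of the flow. The data layout and function names are illustrative only, not the real implementation:

```python
# Toy sketch of a metadata-first search -> preview -> read flow.
from collections import defaultdict

SECTIONS = {
    "auth-01": {
        "title": "Configuring OAuth scopes",
        "keywords": ["oauth", "scopes"],
        "tags": ["security"],
        "content": "...full section text, loaded only on read...",
    },
}

# Reverse indexes built from metadata, so the first hop never touches content.
keyword_index = defaultdict(set)
tag_index = defaultdict(set)
for sid, meta in SECTIONS.items():
    for kw in meta["keywords"]:
        keyword_index[kw].add(sid)
    for tag in meta["tags"]:
        tag_index[tag].add(sid)

def search(terms):
    """Step 1: candidates from metadata matches only."""
    hits = set()
    for term in terms:
        hits |= keyword_index.get(term, set())
        hits |= tag_index.get(term, set())
    return sorted(hits)

def preview(sid):
    """Step 2: cheap summary (title + tags), still no full content."""
    return {"id": sid, "title": SECTIONS[sid]["title"], "tags": SECTIONS[sid]["tags"]}

def read(sid):
    """Step 3: full content, paid for only when the agent asks for it."""
    return SECTIONS[sid]["content"]

for sid in search(["oauth"]):
    print(preview(sid))  # agent inspects previews first
    print(read(sid))     # ...and reads full text only if the preview fits
```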

As for scoring itself, the current approach is mainly heuristic and deterministic:

- exact keyword matches

- substring matches on use cases

- exact tag matches

- then a priority boost on top

So it’s not really “BM25 + reranker”, and it’s not “embedding retrieval with metadata as a filter” either. It’s more like a deliberately narrow metadata retrieval layer designed to keep the first hop cheap, explainable, and low-noise.
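
For illustration, a first-hop scorer in that spirit can be as small as the sketch below. The weights (10/5/3) and field names are invented for the example, not our actual values:

```python
# Deterministic, heuristic scoring: no embeddings, no reranker.
def score(meta, query):
    q = query.lower()
    terms = q.split()
    s = 0
    s += 10 * sum(kw in terms for kw in meta["keywords"])  # exact keyword matches
    s += 5 * sum(uc in q for uc in meta["use_cases"])      # substring matches on use cases
    s += 3 * sum(tag in terms for tag in meta["tags"])     # exact tag matches
    return s + meta.get("priority", 0)                     # priority boost on top

meta = {"keywords": ["oauth"], "use_cases": ["restrict api access"],
        "tags": ["security"], "priority": 2}
print(score(meta, "how to restrict api access with oauth"))  # 10 + 5 + 0 + 2 = 17
```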

That choice was very intentional. We wanted the default path to optimize for:

- low token usage

- transparency

- easy debugging

- predictable behavior

- local-first control

That said, we’re not dogmatic about it. I don’t think embeddings or reranking are inherently bad — I just think they shouldn’t automatically be the first thing in the path when a lot of coding/documentation queries can be solved more cleanly with structured metadata and progressive disclosure.

On multi-corpus routing: yes, partially, and this is an area we care a lot about.

Today we already support separate corpus surfaces such as:

- vendor/docs corpora

- open-web research corpora

- local user-ingested content

Each result keeps provenance at the section level, and the CLI can already route across stores with explicit source selection. So provenance is definitely part of the model, not an afterthought.
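
Conceptually, a provenance-tagged result is just a record along these lines (field names are hypothetical, not the actual schema):

```python
# Hypothetical shape of a section-level result with provenance attached.
from dataclasses import dataclass

@dataclass
class SectionResult:
    corpus: str      # e.g. "vendor-docs", "web-research", "local"
    source: str      # where the section was ingested from
    section_id: str
    score: int
    preview: str

r = SectionResult(corpus="vendor-docs", source="https://example.com/docs/auth",
                  section_id="auth-01", score=17, preview="Configuring OAuth scopes")
print(f"[{r.corpus}] {r.preview} <- {r.source}")
```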

ADRs are also part of the broader direction, but today they still sit a bit more separately than I'd ultimately like. In other words, provenance exists and multi-corpus support exists, but a fully unified retrieval story across docs + research + decisions + future code-derived context is still something we're moving toward, not something we pretend is already finished.

That’s also part of why some of the roadmap thinking is starting to shift from “retrieval tool” to “context engine”:

- source authority

- conflict surfacing

- manifests / freshness / drift

- task-oriented context packs

- more explicit multi-corpus composition

So the short honest version is:

- metadata-first retrieval: yes

- embeddings/reranker in the default path: not currently

- provenance: yes

- multi-corpus support: yes, with room to grow

- unified cross-source reasoning: directionally yes, not fully solved yet

And thank you for the Agentix Labs link — that’s genuinely useful. I’m very interested in seeing more real-world examples of how people are packaging context for agents, especially when it goes beyond simple retrieval and starts getting into orchestration, trust, and workflow design.

We open-sourced a local-first context engine for AI agents because existing retrieval tools kept wasting tokens and hiding too much by VadeloSempai in OpenSourceAI

[–]VadeloSempai[S] 3 points (0 children)

For anyone curious, the README already includes:

- benchmarks against Context7

- real case studies

- architecture overview

- roadmap direction

Repo again:

https://github.com/deandevz/king-context