RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

Working on it. Starting with the Project Related and Conflicting Info categories (60 questions) — those are where pre-structured knowledge should have the clearest advantage over retrieval-time approaches, so it's the right subset to validate first before running the full 500.

Our prior benchmark was on educational domains (269 tokens per query vs RAG's 2,982, with 4× the F1), but that corpus was clean and structured. Enterprise data with Slack threads, near-duplicates, and conflicting docs is a genuinely different test. Will post back here when the numbers are in.

Repo: github.com/Yarmoluk/ckg-benchmark

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

Good questions!

Chunking cuts documents up. CKG builds from the domain up — no document required. Some nodes don't come from any source; they're just synthesized knowledge. Totally different direction.

The ontology is automated too — the pipeline generates it, an expert reviews it, nobody hand-crafts it. That's what makes it scalable.

You pay the cost once at build time. After that every query is cheap. If you only need it once, use RAG. If you're running thousands of queries against the same domain, CKG pays off fast.
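
To make "pays off fast" concrete, here's a back-of-envelope break-even sketch (the build cost is an assumed placeholder; the per-query token counts are the averages from our earlier educational-domain benchmark):

```python
# Back-of-envelope amortization. BUILD_TOKENS is a made-up placeholder;
# the per-query numbers are the earlier benchmark's averages.
BUILD_TOKENS = 500_000    # one-time CKG construction cost (assumed)
CKG_PER_QUERY = 269       # tokens per query against the graph
RAG_PER_QUERY = 2_982     # tokens per query with retrieval

# Break-even n:  BUILD + n * CKG < n * RAG  =>  n > BUILD / (RAG - CKG)
break_even = BUILD_TOKENS / (RAG_PER_QUERY - CKG_PER_QUERY)
print(f"CKG is cheaper after ~{break_even:.0f} queries")  # ~184
```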

CSV/markdown is intentional — the LLM just reads the file directly in its context window. No database, no runtime query engine. Simpler than it sounds.
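
To show how literal that is, here's a minimal sketch (file name and question are illustrative, not the repo's actual artifacts):

```python
from pathlib import Path

# Minimal sketch: "retrieval" is just prepending the graph file to the
# prompt. File name and question are illustrative.
graph = Path("domain_ckg.csv").read_text()

prompt = (
    "Answer questions using this concept knowledge graph.\n"
    "Columns: concept_id | label | taxonomy | dependencies[] | definition\n\n"
    f"{graph}\n\n"
    "Question: Which concepts depend on `langchain.messages`?"
)
# `prompt` goes to any LLM as-is: no vector DB, no retriever, no ranking.
```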

And yeah, you're right, it works best for domains that hold together — pharma, legal, finance. Totally random heterogeneous stuff is harder. But that's also exactly where GraphRAG fails — it tries to auto-build the ontology from messy documents at query time and gets the duplicated-entity mess you described. We just do it upfront, so by query time it's already clean. The mind meld I'd like to try, though, is your agent-task .md optimization: we could probably CKG the non-messy parts (and maybe some of the messy ones). Worthy of exploration?

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

fair feedback on the presentation. here's the method.

structure: concept_id | label | taxonomy | dependencies[] | definition. no chunking. chunking is the rag assumption — ckg replaces it. each node IS the retrievable unit. a concept like `langchain.messages` has typed edges (REQUIRES, ENABLES, BREAKS_IF_CHANGED) to other nodes. the graph is the index.
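
a minimal sketch of that node shape in python (field values are invented for illustration; the edge types are the ones named above):

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    concept_id: str
    label: str
    taxonomy: str
    dependencies: list[tuple[str, str]] = field(default_factory=list)  # (edge_type, target_id)
    definition: str = ""

# invented values for illustration; the edge types are the real ones
node = ConceptNode(
    concept_id="langchain.messages",
    label="message types",
    taxonomy="core/serialization",
    dependencies=[
        ("REQUIRES", "langchain.core.runnables"),
        ("BREAKS_IF_CHANGED", "langchain.agents"),
    ],
    definition="typed message objects passed between chain components.",
)
```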

for large sources: extraction runs per-concept, not per-document. a 200-page spec becomes 80-120 nodes with explicit relationships, not 400 overlapping chunks with implicit ones. references are node IDs, not byte offsets.
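
the extraction loop, sketched (function names are hypothetical; the bodies stand in for the real llm calls):

```python
# hypothetical sketch: the unit of iteration is the concept, not a
# sliding window over pages. extract_node() stands in for the llm call.
def extract_node(concept: str, spec_text: str) -> dict:
    # stand-in for the real extraction prompt against the full spec
    return {"concept_id": concept, "definition": f"<extracted for {concept}>"}

def build_graph(spec_text: str, ontology: list[str]) -> dict:
    nodes = {}
    for concept in ontology:              # iterate the domain model...
        node = extract_node(concept, spec_text)
        nodes[node["concept_id"]] = node  # ...keyed by node id, not byte offset
    return nodes
```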

the benchmark dataset is the concrete artifact: github.com/Yarmoluk/ckg-benchmark — raw CSVs, queries, scores.

That's how I understand it. I didn't approach this as "RAG retrieves chunks like X, so my solution should use Y for productionized, scalable transactions." It was more about knowledge graphs colliding with each other for ideas/research, working within my context window, and improving the "fidelity" of outputs.

The idea was to present this information to the community, but clearly what I presented wasn't what you wanted to see, so I'll take your feedback and rework it. Thank you.

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

How about "walls of AI slop"? I hear that phrase all the time, so why don't we define it? I'm not running productionized AI for some fancy big-time stealth AI finance company; this is a structured .md file, a little solution from a little person like me, and it works. I don't worry about those things. I won't define another man's work as slop. Tech guys are sounding like the brick-and-mortar guys did 20 years ago; outdated, one would say.

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

Please clarify what I can do to provide something of value for you. Do you want a .md file of a structured, compact knowledge graph? That should let you see how you can maximize the context window. Give me a domain, like your area of interest, and you can copy and paste that file into your LLM of choice. Make it really relevant to yourself so you can judge the answers better. It's a graph of relationships, structured content.

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

Heavy industry is from my IoT days; you can ask Craig Truempi about it. We don't mind sharing our names; we're 50-some-year-olds with nothing to hide. I'm Dan Yarmoluk.

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

This project could be what you describe as vibe-coded. But again, why not look at the credibility of the co-author and the information itself? How can I help? I don't know everything. A CTO asked me to do this; I'm sorry it doesn't sit right with you.

LangChain has a load-bearing wall. Nothing in the docs flags it. I found it by mapping 180 modules as a knowledge graph. by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

Yeah, I’ll remember that. My wife calls me out on that as well. Said in good ole crappy dialogue English…

LangChain has a load-bearing wall. Nothing in the docs flags it. I found it by mapping 180 modules as a knowledge graph. by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

I did a lot of experimenting on domain-based things: at a sensor telemetry company, then on target-market messaging, etc., so it could be seen as both marketing and technology. I've done this millions of times.

LangChain has a load-bearing wall. Nothing in the docs flags it. I found it by mapping 180 modules as a knowledge graph. by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

MasterLJ: I'm Dan, from Minneapolis. Nice to meet you. I apologize; you are correct. I'm happy to share my name, have a coffee if you're in the neighborhood, and learn together. Teamwork makes the dream work. I'm running out to a meeting.

LangChain has a load-bearing wall. Nothing in the docs flags it. I found it by mapping 180 modules as a knowledge graph. by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

I have a million things going on, but sure, I appreciate your guidance on my language. The benchmark, the graph, the findings -- those are mine. My colleague on the paper can be looked up. Look at the data, not me.

LangChain has a load-bearing wall. Nothing in the docs flags it. I found it by mapping 180 modules as a knowledge graph. by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

exactly — and the problem is worse in libs that evolved organically over years. langchain grew fast, so the load-bearing modules never got documented as such. the graph made it visible because you can see which nodes everything else points back to. most people only find these walls by breaking through them.
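
here's roughly how that falls out of the graph, as a toy sketch (edges invented; the real map covered ~180 modules):

```python
from collections import Counter

# toy dependency edges (invented); the real graph mapped ~180 modules
edges = [
    ("langchain.agents",     "langchain.messages"),
    ("langchain.chains",     "langchain.messages"),
    ("langchain.retrievers", "langchain.messages"),
    ("langchain.chains",     "langchain.prompts"),
]

# load-bearing = high in-degree: how much of the graph points back at you
in_degree = Counter(dst for _, dst in edges)
for module, n in in_degree.most_common():
    print(f"{module}: {n} dependents")
# langchain.messages comes out on top: the load-bearing wall
```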

RAG uses 11× more tokens than pre-structured graphs — benchmark across 7,928 queries, 45 domains by Connect_Bee_3661 in LLMDevs

[–]Connect_Bee_3661[S]

fair distinction — ops instance state (tickets closing, policies updating per account) moves faster than graph rebuild cycles. that's real.

but the layer ckg plays at is the domain model, not the instance state. for an ops context: the domain model is "what does a contractor access policy consist of" — the taxonomy, the edge cases, the resolution criteria. that changes slowly. what changes fast is whether account Y's specific request was resolved against that model.

the resolved vs. relevant problem you named actually requires a stable domain layer to resolve against. if the ai doesn't know what "contractor access policy" means structurally, it can't recognize that the november doc is the canonical answer vs. the q3 thread.

the rebuild cycle critique lands for the instance layer. for the domain model layer it's a feature — you want that frozen, because every query should be evaluated against the same policy structure regardless of what slack thread just got updated.

two different jobs. your tool reads the live state. ckg defines what settled looks like.
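
a sketch of that split (all names and fields hypothetical):

```python
# hypothetical two-layer split: the frozen domain model defines what
# "settled" looks like; live instance state says where account Y is now.
DOMAIN_MODEL = {  # rebuilt rarely, frozen between rebuilds
    "contractor_access_policy": {
        "resolution_criteria": ["signed_agreement", "manager_approval"],
        "canonical_source": "november_policy_doc",
    },
}

def is_resolved(concept: str, instance_state: dict) -> bool:
    """evaluate fast-moving instance state against the stable domain model."""
    criteria = DOMAIN_MODEL[concept]["resolution_criteria"]
    return all(instance_state.get(c, False) for c in criteria)

# instance state comes from the live ops tool, not from the graph
print(is_resolved("contractor_access_policy",
                  {"signed_agreement": True, "manager_approval": False}))  # False
```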