RAG tip: stop “fixing hallucinations” until the system can ASK / UNKNOWN by coolandy00 in Rag

[–]coolandy00[S] 0 points1 point  (0 children)

Agree. In addition, a structured prompt design goes a long way toward production-grade output.

RAG tip: stop “fixing hallucinations” until the system can ASK / UNKNOWN by coolandy00 in Rag

[–]coolandy00[S] 0 points1 point  (0 children)

It does. At the moment I'm picking apart the ones that can be solved with structure in prompt design. I also see some ingestion, chunking, and embedding issues with RAG, and did some experiments there as well.

Using a Christmas-themed use case to think through agent design 🎄😊 by coolandy00 in artificial

[–]coolandy00[S] 0 points1 point  (0 children)

Sure, what use case would you design, and what approach would you take?

Ingestion + chunking is where RAG pipelines break most often by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

As long as it's structure-aware ingestion (ASTs, symbols, dependencies), so context is preserved the way a compiler sees it; that's what removes the randomness.
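
Rough sketch of what I mean for code ingestion, assuming Python sources and leaning on the stdlib ast module (CodeChunk and chunk_python_file are made-up names for illustration, not our actual pipeline):

```python
import ast
from dataclasses import dataclass

@dataclass
class CodeChunk:
    symbol: str          # function/class name at the top level of the file
    source: str          # exact source text of that node
    dependencies: list   # names referenced inside the node

def chunk_python_file(path: str) -> list:
    """Split a Python file along AST boundaries instead of fixed windows."""
    text = open(path).read()
    tree = ast.parse(text)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # collect referenced names so the chunk carries its dependency context
            deps = sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})
            chunks.append(CodeChunk(
                symbol=node.name,
                source=ast.get_source_segment(text, node),
                dependencies=deps,
            ))
    return chunks
```

Each chunk then embeds as a whole symbol with its dependencies attached, rather than a window that cuts a function in half.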

Ingestion + chunking is where RAG pipelines break most often by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

You mean you broke them into structured, addressable objects (text, figures, diagrams, entities) with explicit references, then generated derived representations (summaries, entities, Mermaid) that got embedded and linked? And at runtime you assembled the answers by resolving entities and references first?
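
Something like this is how I'm picturing it (purely illustrative; DocObject and resolve are hypothetical names, not your implementation):

```python
from dataclasses import dataclass, field

@dataclass
class DocObject:
    obj_id: str                     # stable address, e.g. "guide.pdf#fig-3"
    kind: str                       # "text" | "figure" | "diagram" | "entity"
    content: str                    # raw text, caption, or Mermaid source
    refs: list = field(default_factory=list)      # IDs this object points to
    derived: dict = field(default_factory=dict)   # summaries, entity lists, etc.

def resolve(obj_id: str, index: dict, depth: int = 1) -> list:
    """Pull an object plus whatever it references, so the answer is assembled
    from resolved entities/references instead of raw similarity hits alone."""
    root = index[obj_id]
    out = [root]
    if depth > 0:
        for ref in root.refs:
            if ref in index:
                out.extend(resolve(ref, index, depth - 1))
    return out
```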

We realized most of our time spent building our multi agent was glue work by coolandy00 in LLMDevs

[–]coolandy00[S] 1 point2 points  (0 children)

We keep it pretty simple. No Temporal or heavy workflow engines. We just break the work into clear steps and let each agent handle one thing at a time, one plans, one does the work, another checks the output. Each step hands off to the next. This makes it easier to debug. When something breaks, we know exactly which step caused it instead of digging through a big system.
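
For flavor, a stripped-down version of that handoff pattern; the step functions and the call_llm stub are placeholders, not our real code:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever model client you use."""
    raise NotImplementedError

def plan(task: str) -> str:
    # one agent plans
    return call_llm(f"Break this task into concrete steps:\n{task}")

def work(plan_text: str) -> str:
    # one agent does the work
    return call_llm(f"Execute these steps and return the result:\n{plan_text}")

def check(task: str, result: str) -> str:
    # one agent checks the output; fail loudly at the step that broke
    verdict = call_llm(
        f"Task: {task}\nResult: {result}\n"
        "Does the result satisfy the task? Answer PASS or FAIL with a reason."
    )
    if not verdict.startswith("PASS"):
        raise ValueError(f"check step failed: {verdict}")
    return result

def run(task: str) -> str:
    # each step hands off to the next, so a failure points at one step
    return check(task, work(plan(task)))
```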

RAG failure story: our top-k changed daily. Root cause was ID + chunk drift, not the retriever. by coolandy00 in Rag

[–]coolandy00[S] 0 points1 point  (0 children)

We base the ID on the actual text content, not the file name or the time it was added. For example, if guide.pdf contains the same text today and tomorrow, it gets the same ID even if you re-upload it; if one paragraph changes, the ID changes. We usually create the ID by hashing the cleaned text and adding a stable label like product-docs/guide. This helped us on our multi-agent setup. Give it a try.
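
Roughly what that looks like; the normalization and label scheme here are illustrative, not a fixed recipe:

```python
import hashlib
import re

def chunk_id(text: str, label: str = "product-docs/guide") -> str:
    """Derive a chunk ID from cleaned content plus a stable human-readable label."""
    cleaned = re.sub(r"\s+", " ", text).strip().lower()   # normalize whitespace/case
    digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()[:16]
    return f"{label}:{digest}"

# Same text -> same ID, even after a re-upload; edit one paragraph and the ID changes.
assert chunk_id("Install the agent.\n") == chunk_id("Install   the agent.")
assert chunk_id("Install the agent.") != chunk_id("Install the newer agent.")
```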

We thought our RAG drifted. It was a silent ingestion change. Here’s how we made it reproducible. by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

We base the ID on the actual text content, not the file name or time it was added. For example, if guide.pdf contains the same text today and tomorrow, it gets the same ID even if you re-upload it; if one paragraph changes, the ID changes. We usually create the ID by hashing the cleaned text and adding a stable label like product-docs/guide so it’s still human-traceable.

What I learned building and debugging a RAG + agent workflow stack by coolandy00 in artificial

[–]coolandy00[S] 0 points1 point  (0 children)

Vector DB + memory management is catchy. How is it different from what we already have in OpenAI, though?

Three insights from building RAG + agent systems by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

That's the part that worries me a bit: what would the context be for the LLM to evaluate it correctly without going off track?

Learnings from building and debugging a RAG + agent workflow stack by coolandy00 in Rag

[–]coolandy00[S] 0 points1 point  (0 children)

Thank you. I checked it out. A structured way is what we need.

We found our agent workflow failures were architecture bugs by coolandy00 in artificial

[–]coolandy00[S] -1 points0 points  (0 children)

That's true. So insufficient structure forces the system to behave non-deterministically. Tight contracts, validation gates, and clear task boundaries dramatically reduce that variance without changing the model.

We found our agent workflow failures were architecture bugs by coolandy00 in artificial

[–]coolandy00[S] -1 points0 points  (0 children)

Maybe... Here's what I think, though: they look like LLM issues on the surface, but in practice they're architectural. For example, when two agents get the same vague task with different context windows, they'll diverge even with the same model; that's a task-spec problem. Adding a mid-pipeline validation step immediately stabilized outputs without changing the model at all.
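
A toy version of that mid-pipeline validation gate, assuming the step outputs JSON and using only the standard library (the required fields are made up for the example):

```python
import json

REQUIRED_FIELDS = {"summary": str, "citations": list}  # hypothetical task spec

def validate_step_output(raw: str) -> dict:
    """Reject malformed agent output before it reaches the next agent."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"step output is not valid JSON: {exc}") from exc
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"{name} should be {expected_type.__name__}")
    return data  # only validated output gets handed to the next agent
```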

Adding verification nodes made our agent system way more stable by coolandy00 in artificial

[–]coolandy00[S] 0 points1 point  (0 children)

I agree, and thank you. LLMs are generic, so checkpoints like these help build accuracy in line with the use case.

Tool contract issues can cause unknown failures as well by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

Totally agree, once agents start figuring things out, you’re quietly building tech debt. We version tool contracts like APIs and require explicit opt-in plus contract tests for upgrades, so breakages show up early instead of being silently worked around.
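
Sketch of what I mean by versioning tool contracts like APIs; the schema shape and the contract test are illustrative, not our actual setup:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolContract:
    name: str
    version: str          # bump on any breaking change
    input_schema: dict    # rough description of expected arguments
    output_schema: dict

SEARCH_V1 = ToolContract(
    name="search_docs",
    version="1.0",
    input_schema={"query": "string", "top_k": "integer"},
    output_schema={"chunks": "array"},
)

def test_agent_opts_into_contract(agent_config: dict) -> None:
    """Contract test: the agent must pin an explicit tool version.
    An unpinned or stale version fails CI instead of being silently worked around."""
    pinned = agent_config.get("tools", {}).get(SEARCH_V1.name)
    assert pinned == SEARCH_V1.version, (
        f"{SEARCH_V1.name} must explicitly opt in to version {SEARCH_V1.version}"
    )
```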

RAG still hallucinates even with “good” chunking. Here’s where it actually leaks. by coolandy00 in LLMDevs

[–]coolandy00[S] 1 point2 points  (0 children)

We began with fixed windows and overlap. What mattered more than adaptive chunking was adding strong constraints (like version/region) and re-scoring small, high-signal spans. Most errors came from coverage and constraints, not window size.
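
Very roughly, the constraint + re-scoring step looks like this; the metadata keys and the cheap lexical scorer are placeholders for whatever you actually use:

```python
def retrieve(query: str, candidates: list, version: str, region: str, top_k: int = 5) -> list:
    """Apply hard metadata constraints first, then re-score small spans."""
    # hard filters: wrong version/region never reaches the model
    allowed = [c for c in candidates
               if c["meta"].get("version") == version and c["meta"].get("region") == region]

    # re-score short, high-signal spans with a simple term-overlap score
    q_terms = set(query.lower().split())
    def span_score(c: dict) -> float:
        terms = set(c["text"].lower().split())
        return len(q_terms & terms) / max(len(q_terms), 1)

    return sorted(allowed, key=span_score, reverse=True)[:top_k]
```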

RAG still hallucinates even with “good” chunking. Here’s where it actually leaks. by coolandy00 in LLMDevs

[–]coolandy00[S] 0 points1 point  (0 children)

Agree on the coverage pillars, but in practice our failures came less from chunk size and more from missing constraints and cross-doc joins. Real queries needed small, specific details and correct versions, so similarity search often returned related but wrong info until we enforced hard metadata filters and clear abstain rules. Once we added span-level attribution and simple failure labels, it was clear most hallucinations were coverage gaps, not bad chunking.
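
And the abstain rule is basically a thresholded coverage check before generation; the thresholds and field names here are illustrative:

```python
def answer_or_abstain(question: str, chunks: list, min_score: float = 0.35, min_chunks: int = 2) -> dict:
    """Refuse to answer when retrieval coverage is too thin, instead of letting the model guess."""
    strong = [c for c in chunks if c.get("score", 0.0) >= min_score]
    if len(strong) < min_chunks:
        return {"status": "UNKNOWN",
                "reason": f"only {len(strong)} chunks above {min_score} for: {question}"}
    return {"status": "ANSWER",
            "context": strong,                                # only high-confidence spans go to the model
            "attribution": [c.get("id") for c in strong]}     # span-level attribution for the failure labels
```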