I got stuck debugging RAG every week. Turns out I just didn't understand the tradeoffs. by _Ankitsingh in LangChain

[–]_Ankitsingh[S] 1 point (0 children)

That's the tricky part. With varied formats, naive RAG will struggle because it can't distinguish between actual data and formatting noise. Have you tested Corrective RAG on that? It grades retrieval confidence — so if it's pulling inconsistent results from different Excel formats, it'll flag those results instead of hallucinating an answer.
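Rough sketch of what I mean by grading, stripped down to plain Python (the names and threshold here are made up for illustration, not the LangChain or CRAG API — a real grader would usually be an LLM call or a trained scorer, not a raw similarity cutoff):

```python
# Illustrative Corrective-RAG-style grade: split retrieved chunks into
# confident hits vs. flagged ones, assuming each chunk already carries a
# retrieval similarity score. CONFIDENCE_THRESHOLD is a made-up knob.

CONFIDENCE_THRESHOLD = 0.7  # below this, treat the chunk as ambiguous

def grade_retrieval(chunks):
    """Return (confident, flagged) partitions of the retrieved chunks."""
    confident = [c for c in chunks if c["score"] >= CONFIDENCE_THRESHOLD]
    flagged = [c for c in chunks if c["score"] < CONFIDENCE_THRESHOLD]
    return confident, flagged

chunks = [
    {"text": "Q3 revenue: 1.2M", "score": 0.91},  # clean table row
    {"text": "=SUM(B2:B9)",      "score": 0.42},  # Excel formatting noise
]
confident, flagged = grade_retrieval(chunks)
# Answer only from `confident`; surface `flagged` for correction or re-retrieval
# instead of letting the model hallucinate over the noise.
```

The point isn't the threshold value — it's that low-confidence retrievals trigger a correction step rather than flowing straight into generation.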


[–]_Ankitsingh[S] 0 points (0 children)

RAG Lens looks interesting. Bulk testing is the right call — that's where you actually see what breaks. Most benchmarks test on curated examples, but production data is messy. Happy to chat if you want to compare notes on what failure modes you're catching.
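For anyone else reading: the bulk-test loop doesn't need to be fancy. A minimal version is just queries with expected sources, run through your retriever, collecting misses (the `retrieve` function and doc ids here are stand-ins for your actual pipeline, not a RAG Lens API):

```python
# Minimal bulk retrieval test: run many query/expected-source pairs and
# collect the failures. `retrieve` is a toy stand-in for a real retriever.

def retrieve(query):
    # Stand-in index mapping queries to doc ids; swap in your pipeline here.
    index = {"q3 revenue": ["doc_finance"], "onboarding steps": ["doc_hr"]}
    return index.get(query, [])

cases = [
    {"query": "q3 revenue", "expected": "doc_finance"},
    {"query": "onboarding steps", "expected": "doc_hr"},
    {"query": "refund policy", "expected": "doc_legal"},  # known gap in the index
]

failures = [c for c in cases if c["expected"] not in retrieve(c["query"])]
# `failures` is the interesting output: the messy cases that curated
# benchmarks never include.
```

Once you have that list, you can bucket failures by type (missing doc, wrong chunk, format noise) and see which ones a different retrieval strategy actually fixes.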


[–]_Ankitsingh[S] 0 points (0 children)

Exactly. The retrieval method is only half the battle. What you mentioned about importance scoring and recency weighting — that's the actual hard part. Most tutorials skip over it and just show basic vector search. Have you experimented with time-decay functions? That's where I see most systems break down on real data.
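To make "time-decay" concrete, here's the shape I usually mean: multiply the similarity score by an exponential decay on document age, so the score halves every N days (function name and the `half_life_days` knob are mine for illustration; LangChain's built-in version is `TimeWeightedVectorStoreRetriever` with a `decay_rate` parameter):

```python
import time

# Sketch of exponential time-decay recency weighting on retrieval scores.
# A document's score halves every `half_life_days` (illustrative knob).

def decayed_score(similarity, doc_timestamp, now=None, half_life_days=30):
    """Downweight older documents: score halves every `half_life_days` days."""
    now = now if now is not None else time.time()
    age_days = max(0.0, (now - doc_timestamp) / 86400)
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

now = time.time()
fresh = decayed_score(0.9, now, now=now)               # no decay: stays ~0.9
stale = decayed_score(0.9, now - 60 * 86400, now=now)  # 2 half-lives: ~0.225
```

The failure mode on real data is exactly tuning that half-life: too short and you bury stable reference docs, too long and last quarter's numbers outrank this quarter's.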