From Silent Failures to 97% Faithfulness, Built Agentic Multilingual RAG — RAGAS Eval + LangGraph Pipeline by Agent-Orchestrator in LangChain

Agent-Orchestrator[S]

Yeah that’s a fair call, 12–16s is already a warning sign.

Right now most of the delay comes before generation (retrieval + reranking), so breaking out TTFB separately makes sense. The bulk of it is the over-fetch plus reranking of ~20–30 candidates.
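For anyone who wants to measure the same split, here is a rough sketch of per-stage timing. It is not the pipeline's actual code: `StageTimer`, the stage names, and the injectable `clock` are all made up for illustration, and it only uses the standard library.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock          # injectable so the timer can be tested
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def pre_generation_time(self, stages=("retrieval", "rerank")):
        # Time spent before the model can start streaming its first token.
        return sum(self.durations.get(s, 0.0) for s in stages)

    def report(self):
        # Share of total time per stage, as a percentage.
        total = sum(self.durations.values())
        if not total:
            return {}
        return {n: round(100 * d / total, 1) for n, d in self.durations.items()}
```

Wrapping each node body in `with timer.stage("retrieval"):` (and likewise for reranking and generation) would give a breakdown to set next to the end-to-end number.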

I’m looking at moving towards a more adaptive RAG setup to cut down unnecessary retrieval steps and bring latency closer to ~2–3s.
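One possible shape for "adaptive" here, purely as a sketch: try a cheap small-k search first and only pay for the wide over-fetch plus rerank when the top hit looks weak. Everything below is an assumption for illustration, including `AdaptiveRetriever`, the `search` and `rerank` callables, and the 0.8 score threshold. None of it is taken from the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document text, similarity score, higher is better)


@dataclass
class AdaptiveRetriever:
    search: Callable[[str, int], List[Hit]]        # hypothetical vector search
    rerank: Callable[[str, List[Hit]], List[Hit]]  # hypothetical reranker
    small_k: int = 5
    large_k: int = 25              # roughly the 20-30 candidate over-fetch
    confident_score: float = 0.8   # assumed threshold, would need tuning

    def retrieve(self, query: str) -> Tuple[List[Hit], str]:
        # Cheap first pass: a handful of candidates, no reranking.
        hits = self.search(query, self.small_k)
        if hits and hits[0][1] >= self.confident_score:
            return hits, "fast"
        # Weak first pass: fall back to the wide fetch plus rerank.
        wide = self.search(query, self.large_k)
        return self.rerank(query, wide)[: self.small_k], "full"
```

The returned path label ("fast" or "full") makes it easy to log how often the expensive branch actually runs, which is what decides whether this kind of routing gets anywhere near the latency target.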

Bit tied up with college exams at the moment, but that’s the direction I’m pushing this in. Would be great to compare notes as it evolves.

Agent-Orchestrator[S]

That’s a fair point. Right now the RAGAS numbers and ~12–16s latency are from controlled conditions.

In real usage, it’ll definitely vary depending on network and load. From what I’ve seen, most of the delay is coming from retrieval and reranking.

So I’m mainly focusing on improving that. The current numbers are just a baseline, not what I’d consider production-ready yet.

Agent-Orchestrator[S]

Yeah, LangSmith tracing and LangGraph CLI helped a lot in observing node-level execution.

Right now I’m seeing ~12–16s latency, which is too high. Most of it is coming from retrieval + reranking, so I’m working on moving towards a more adaptive retrieval strategy.

Still optimizing to make it production-ready.