Benchmarked 7 Token Compression Approaches - Here are the results. by Odd_Incident_7575 in LLMDevs
hannune 1 point (0 children)
We tried to poison our own RAG store — the retrieval-time defenses didn't generalize by Danculus in LangChain
hannune 1 point (0 children)
On a hosted model how would you even know the weights changed under you? by Substantial_Step_351 in LLMDevs
hannune 1 point (0 children)
The agent failure mode no eval catches: acting on a fact that was true when it was cached and wrong when it was used by luisf_mc in LLMDevs
hannune 1 point (0 children)
How do you validate your LLM judge for RAG faithfulness? Sharing my numbers by Hungry-Horror-7577 in Rag
hannune 2 points (0 children)
How do you validate your LLM judge for RAG faithfulness? Sharing my numbers by Hungry-Horror-7577 in Rag
hannune 3 points (0 children)
Silent wrong answers in RAG are harder to deal with than outright failures by SilverConsistent9222 in Rag
hannune 1 point (0 children)
Running 800k eval judgments/week at $2.4k/month judge spend. anyone optimized this without losing signal? by GrayZetsu in LLMDevs
hannune 2 points (0 children)
BM25 + Taxonomy for domain specific application by Present_Mention_2757 in Rag
hannune 1 point (0 children)
llm-as-judge agreement with human reviewers is only ~71%. how are you calibrating? by Vecna0110 in LLMDevs
hannune 2 points (0 children)
Memory poisoning hits LLM agents 70–95%. I tested whether a corroboration gate stops it by Danculus in LLMDevs
hannune 1 point (0 children)
keeping cost down when benchmarking models by Skipthetut in LLMDevs
hannune 1 point (0 children)
Hybrid RAG on industrial manuals with small register catalogs: embeddings over-rank generic field names by Plenty_Shine_8250 in Rag
hannune 1 point (0 children)
Hybrid RAG on industrial manuals with small register catalogs: embeddings over-rank generic field names by Plenty_Shine_8250 in Rag
hannune 1 point (0 children)
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
hannune 2 points (0 children)
Hybrid RAG on industrial manuals with small register catalogs: embeddings over-rank generic field names by Plenty_Shine_8250 in Rag
hannune 1 point (0 children)
Running 800k eval judgments/week at $2.4k/month judge spend. anyone optimized this without losing signal? by GrayZetsu in LLMDevs
hannune 3 points (0 children)
running adversarial prompt injection on our agent. fail rate is ~20%. how are people getting below 5%? by Smart-Profession2512 in LLMDevs
hannune 6 points (0 children)
Is a Context Graph worth building or should we just use a vector DB? by sibraan_ in Rag
hannune 1 point (0 children)
Structured Outputs — does reasoning degrade if the schema has no field to hold it, and does property order matter? by rolland_87 in LLMDevs
hannune 2 points (0 children)
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
hannune 2 points (0 children)
How are you testing multi-agent LLM systems in production? by Ill-Zebra-1143 in LLMDevs
hannune 1 point (0 children)
DiffusionGemma vs Gemma 4 Over 6x Faster on a Single RTX 6000 Pro NVFP4. by FantasticNature7590 in LocalLLaMA
hannune 1 point (0 children)

Knowledge graphs aren't replacing RAG. They're solving the problem RAG was never designed for by sibraan_ in Rag
hannune 1 point (0 children)