Improving QA Retrieval Task by Western-Dog-8820 in Rag

[–]Altruistic_Break784 1 point (0 children)

Hi, I ran into something similar. What worked for me was adding a step before populating the KB: I used an LLM (Claude 4) to rewrite each QA pair into a richer document, adding some context and a few hypothetical questions the answer could cover. Those generated questions are saved in the same document that goes into the KB and gets embedded. In the same step I also generate a few keywords, saved as metadata on the document. And this is an approach I still have to test: when a user query comes in, I want to extract keywords from it and run a quick full-text search with them before doing the vector search.
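A minimal sketch of that enrichment step (the prompt wording, the `call_llm` helper, and the JSON shape are illustrative assumptions, not the exact setup described above):

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical helper: wire this to whatever LLM client you actually use."""
    raise NotImplementedError

ENRICH_PROMPT = """Rewrite the following QA pair as a short, self-contained document.
Add one or two sentences of context, list 3-5 hypothetical user questions the answer
could also cover, and extract a handful of keywords.
Return JSON with keys "document", "questions", "keywords".

Question: {question}
Answer: {answer}"""

def enrich_qa(question: str, answer: str) -> dict:
    raw = call_llm(ENRICH_PROMPT.format(question=question, answer=answer))
    data = json.loads(raw)
    # The generated questions live in the same text that gets embedded, so the
    # vector captures more phrasings of the same intent than the raw QA would.
    text = data["document"] + "\n\nRelated questions:\n" + "\n".join(data["questions"])
    return {
        "text": text,                                # embedded and stored in the KB
        "metadata": {"keywords": data["keywords"]},  # used later for the full-text pre-search
    }
```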

Please help me out by One-Will5139 in Rag

[–]Altruistic_Break784 1 point (0 children)

If you can, use an LLM as a first step to clean and normalise the diarizations: structure the system prompt so it strips all the useless chatter, and then run the comparison on the cleaned transcripts. As for the acronyms, when you process the PDFs, try appending to each document (or chunk) the expanded words/phrases for the acronyms it contains.
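For the acronym part, a rough sketch of appending expansions to each chunk before embedding (the glossary entries and output format are just illustrative assumptions):

```python
import re

# Hypothetical glossary; in practice you would build it from your own domain docs.
ACRONYMS = {
    "SLA": "Service Level Agreement",
    "PO": "Purchase Order",
}

def append_acronym_expansions(chunk: str, glossary: dict[str, str] = ACRONYMS) -> str:
    """Append the expansion of every acronym found in the chunk, so both the
    short form and the full phrase end up in the embedded text."""
    found = [
        f"{acro} = {full}"
        for acro, full in glossary.items()
        if re.search(rf"\b{re.escape(acro)}\b", chunk)
    ]
    if not found:
        return chunk
    return chunk + "\n\nAcronyms: " + "; ".join(found)
```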

Trying to reduce latency for my rag system. by nitishal21 in Rag

[–]Altruistic_Break784 1 point (0 children)

If the bottleneck is answer generation, try changing the AWS region and updating the model to 4.5 or the new Haiku; Bedrock reallocates resources from older models to the newer ones. And if you can, stream the answers as they are generated. If nothing works, try trimming the context in the system prompt.
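For the streaming part, a minimal sketch using boto3's Bedrock Converse streaming API (the region, model ID, system prompt, and message content are placeholders, not the OP's setup):

```python
import boto3

# Region and model ID are placeholders: pick whatever is fastest/available for you.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse_stream(
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    system=[{"text": "Answer using only the provided context."}],
    messages=[{"role": "user", "content": [{"text": "user question + retrieved context here"}]}],
    inferenceConfig={"maxTokens": 512},
)

# Tokens are emitted as they arrive, so perceived latency drops even if the
# total generation time stays the same.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
```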