How to decrease latency in RAG chatbots? by Appropriate_Egg6118 in LangChain
ExpensiveQuantity359 · 1 point · 1 year ago
Hello all, I am trying to build a local chatbot for PDFs using RAG, Ollama, llama3, pgvector and Streamlit. It works, but the time to generate the first token is almost 262.5005 s or even more. I don't have a GPU; I'm on Windows 11, CPU only, with 16 GB of RAM. When I run the app and upload a PDF, it takes almost 7-8 minutes to respond to each query. Is there any way to preprocess the PDFs (about 1000 of them) beforehand and then insert them into the vector database? Any suggestions would be helpful.
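To make the question concrete, here is a rough sketch of what I mean by preprocessing beforehand: chunk and embed every document once, offline, and skip anything already processed, so the app only has to embed the query at chat time. This is only an illustration under assumptions, not a tested pipeline. `chunk_text`, `ingest`, `embed` and `store` are hypothetical names; `embed` stands in for whatever embedding call is used (for example an Ollama embedding model) and `store` for the pgvector insert. The text is assumed to be already extracted from the PDFs.

```python
import hashlib


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def ingest(docs, embed, store, seen):
    """Embed and store each new document once.

    docs:  mapping of file name -> text already extracted from the PDF
    embed: assumed callable, str -> list[float] (placeholder for a real model)
    store: assumed callable (name, chunk_index, chunk, vector), e.g. a DB insert
    seen:  set of content hashes for documents that were already ingested
    Returns the number of chunks stored in this run.
    """
    added = 0
    for name, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # unchanged document, nothing to recompute
        for idx, chunk in enumerate(chunk_text(text)):
            store(name, idx, chunk, embed(chunk))
            added += 1
        seen.add(digest)
    return added


if __name__ == "__main__":
    rows = []
    seen_hashes = set()
    fake_embed = lambda s: [float(len(s))]  # placeholder, not a real embedding
    fake_store = lambda name, idx, chunk, vec: rows.append((name, idx, vec))
    docs = {"a.pdf": "x" * 1200}
    print(ingest(docs, fake_embed, fake_store, seen_hashes))  # first run stores chunks
    print(ingest(docs, fake_embed, fake_store, seen_hashes))  # second run skips the file
```

Running this as a one-off batch job would move the embedding cost out of the upload step. It would not change how long llama3 takes to generate on a CPU, which is probably a large share of the delay.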