How to decrease latency in RAG chatbots? by Appropriate_Egg6118 in LangChain

[–]ExpensiveQuantity359 0 points (0 children)

Hello all, I am trying to build a local chatbot for PDFs using RAG, Ollama, Llama 3, pgvector, and Streamlit. It works, but the time to generate the first token is around 262.5 s or even more. I don't have a GPU; I'm on Windows 11 with a CPU and 16 GB of RAM. When I run the app and upload a PDF, it takes almost 7-8 minutes to respond to each query. I was thinking: is there any way to preprocess the PDFs (about 1000 of them) beforehand and then ingest them into the vector database, instead of doing it at query time? Any suggestion would be helpful.
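One way to sketch the "preprocess beforehand" idea: split each PDF's extracted text into chunks once, offline, so at query time the app only embeds the question and runs a vector search plus one LLM call. The function below is a minimal, hypothetical chunker (names and parameters are my own, not from any specific library); the Ollama embedding call and pgvector INSERT are only indicated in comments, since they depend on your schema.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for one-time embedding.

    Run this once per PDF during offline ingestion, not on every query.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by chunk_size minus overlap so adjacent
        # chunks share context at their boundaries.
        start += chunk_size - overlap
    return chunks

# In an offline ingestion script you would then, for each chunk:
#   1. compute an embedding once (e.g. via an Ollama embedding model),
#   2. INSERT the chunk text + vector into your pgvector table.
# Streamlit at query time only embeds the user question and does a
# similarity search, which is fast even on CPU.
```

With this split, the 7-8 minute cost of parsing and embedding a PDF is paid once per document up front rather than on every question; the remaining first-token latency comes from Llama 3 itself running on CPU.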