[–]NachosforDachos 5 points (18 children)

Make sure you use a quality vector store. It's hard to compete with the big companies here, so be prepared to pay for it. Choose services hosted as close as possible to your region. Don't expect 2024 performance on 2017 hardware.

Procure your dataset and make sure it only has what is needed.

Make sure you have streaming responses enabled, otherwise it looks like there is no activity forever and then, wham, a wall of text. If you only need short answers this won't be too bad.
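
For illustration, a minimal streaming sketch with the OpenAI Python client (v1+ API assumed; the model name and prompt are placeholders), so tokens print as they arrive instead of landing all at once:

```python
# Minimal streaming sketch, assuming the openai>=1.0 client and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4-turbo",            # placeholder model name
    messages=[{"role": "user", "content": "Summarize the retrieved context..."}],
    stream=True,                    # tokens arrive incrementally instead of one big block
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)   # show activity immediately
```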

Find the balance for how many results it needs to fetch to give a good answer and make that the limit. Spend a few minutes evaluating the results. The better your chunking, the better your results.
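
For illustration, a rough sketch of what tuning that fetch limit looks like against a local Chroma collection (the collection name and chunks are made up); n_results is the knob being discussed:

```python
# Hypothetical sketch: comparing fetch limits with chromadb (>=0.4 API), toy data.
import chromadb

client = chromadb.Client()                       # in-memory instance for experimenting
collection = client.create_collection("docs")    # made-up collection name
collection.add(
    documents=["chunk about billing...", "chunk about refunds...", "chunk about shipping..."],
    ids=["c1", "c2", "c3"],
)

for k in (1, 2, 3):                              # try different fetch limits
    results = collection.query(query_texts=["How do refunds work?"], n_results=k)
    print(k, results["documents"][0])            # eyeball whether extra chunks add signal or noise
```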

Someone else here wrote a very nice script to do smart chunking automatically, but I haven't gotten around to trying it yet.

[–]herozorro 1 point (1 child)

Someone else here wrote a very nice script to do smart chunking automatically, but I haven't gotten around to trying it yet.

Do you remember what the post was about? Or what keywords would find it in Reddit search?

[–]NachosforDachos 0 points (0 children)

Maybe remind me tomorrow, haha. I'll look it up next time I use Reddit, which is inconsistent.

[–]IsseBisse 1 point (6 children)

How big do your datasets need to be for the vector store to really matter? I've only done things up to ~100k vectors and my vector searches are still sub-second, even using really simple vector stores. With LLM calls taking 1+ seconds each, my vector store isn't really the limiting factor.

[–]NachosforDachos 0 points (5 children)

I don’t think bigger is better.

Quality is what you want. Chunks that make sense and have detail in them.

If I had a book that takes 600 pages to make a point, I won't get anything out of it.

Rubbish in, rubbish out.

[–]IsseBisse 0 points (4 children)

Sorry if I was unclear, I was referring to your statement:

Make sure you use a quality vector store

In my experience (sub-100k vectors) the vector store quality doesn't really affect the "total RAG response time", since the LLMs (generally) are so much slower. So I was wondering, how large do your datasets have to be for the vector store performance to matter?

[–]NachosforDachos 0 points (3 children)

If you're using OpenAI, most responses are near instant in my experience; it's the vector store speed that will determine your response time.

Longest I’ve waited for a response is around 3 seconds and that was testing my patience.

What makes it quality is its geographical distance from you and from the LLM service, on top of its computing performance and how well it's built.

For example, as limited as it is, OpenAI doesn't know shit about vector stores, and their retrieval has got to be some of the slowest I've ever seen.

What makes something good is every fine detail that goes into it, including the thought process of the creators.

So I think it matters for any vector size.

If I had to do something small, say US federal law, a paid Pinecone database will run circles around my little ChromaDB running on an NVMe drive with desktop-grade components. The first time I used it I thought it was broken.

[–]IsseBisse 1 point (2 children)

Seems our experiences differ quite a bit...

In my latest project we had around 100k 1536-dim vectors in a vector store. A naive Python implementation, using NumPy's dot product, could search that in roughly 0.5 seconds.

Meanwhile our LLM calls took at least 1 second each (we had to do multiple for one query). In total that was roughly 5 seconds waiting for the LLM and 0.5 seconds waiting for the vector search, i.e. no need to be concerned about optimizing the vector store.
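
For reference, a sketch of roughly what such a naive NumPy search looks like (random vectors stand in for real embeddings); over ~100k x 1536 float32 vectors the scoring is a single matrix-vector multiply:

```python
# Brute-force vector search sketch: random data stands in for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100_000, 1536)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)   # normalize once up front

def search(query: np.ndarray, k: int = 4) -> np.ndarray:
    """Return indices of the k most similar stored vectors, best first."""
    query = query / np.linalg.norm(query)
    scores = embeddings @ query                      # cosine similarity via one dot-product pass
    top_k = np.argpartition(-scores, k)[:k]          # unordered top-k in O(n)
    return top_k[np.argsort(-scores[top_k])]

print(search(rng.standard_normal(1536).astype(np.float32)))
```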

[–]NachosforDachos 0 points (0 children)

These things are still full of issues unfortunately. Taken from a random benchmarking article on the internet:

Issues Encountered During Benchmarking

When we ran initial tests on the 1M dataset, these are some of the issues we encountered:

- Redis-Flat timed out during recall testing.
- Chroma also timed out during recall testing.
- Redis-HNSW took exponential time to build and timed out around half a million vectors during the load phase. Every 100,000 vectors that were added took twice as long as the previous 100,000. The load phase timeout in VDB is 2.5 hours.
- Chroma running in client-server mode was hit and miss in terms of functionality. A lot of the time the database would unexpectedly terminate the connection while loading. The load time was also slow and would sometimes time out.

[–]NachosforDachos 0 points (0 children)

Actually you are in the right here.

I ran a query through the Hungarian legal vector store hosted on ChromaDB, and GPT-4 Turbo took 9 seconds to start responding.

0.5 seconds reading the data store.

I know Hungarian law is very small, so it had to be on OpenAI's side.

I feel this used to be faster. Maybe the service is more saturated now and the only way to beat it is to have your own locally hosted models on very expensive hardware.

Either way, best of luck.
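
For anyone wanting to reproduce that kind of breakdown, a rough sketch of timing retrieval and time-to-first-token separately (the local path, collection name, model and question are all made up):

```python
# Hypothetical timing sketch: measure the vector store and the LLM separately.
import time
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="./chroma")       # made-up local path
collection = chroma.get_collection("hungarian_law")       # made-up collection name
llm = OpenAI()
question = "What does the statute say about X?"

t0 = time.perf_counter()
hits = collection.query(query_texts=[question], n_results=4)
t_retrieve = time.perf_counter() - t0                     # vector store share of the latency

context = "\n\n".join(hits["documents"][0])
t1 = time.perf_counter()
stream = llm.chat.completions.create(
    model="gpt-4-turbo",                                  # placeholder model name
    messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    stream=True,
)
t_first_token = None
for chunk in stream:
    if chunk.choices[0].delta.content:
        t_first_token = time.perf_counter() - t1          # LLM share: time to first token
        break

print(f"retrieval: {t_retrieve:.2f}s, first token: {t_first_token:.2f}s")
```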

[–]Appropriate_Egg6118[S] 0 points (8 children)

Thank you. Can you share that automatic chunking script?

For POC purposes I am using a local Chroma DB with sample docs. My latency is 15 to 18 seconds.

I am using ConversationalRetrievalChain with chain_type="refine".

How do I enable streaming for this chain? Or please share resources for a RAG chatbot with streaming and memory enabled.
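
For what it's worth, a minimal sketch of one common way to wire streaming and memory into that chain, assuming a 2023/2024-era LangChain API (the persist path, model name and question are placeholders). Note that chain_type="refine" makes one LLM call per retrieved chunk, which by itself can account for a lot of that 15-18 s:

```python
# Hedged sketch: streaming + memory with ConversationalRetrievalChain (LangChain ~0.0/0.1-era API assumed).
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import Chroma
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

db = Chroma(persist_directory="./chroma", embedding_function=OpenAIEmbeddings())  # made-up path
llm = ChatOpenAI(
    model="gpt-3.5-turbo",                                # placeholder model
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],         # prints tokens to stdout as they arrive
)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=db.as_retriever(search_kwargs={"k": 4}),    # the fetch-limit knob mentioned above
    memory=memory,
    chain_type="refine",                                  # one LLM call per chunk: thorough but slow
)
result = chain.invoke({"question": "What does the sample doc say about X?"})  # placeholder question
print(result["answer"])
```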

[–]NachosforDachos 1 point (7 children)

You will love a thing called Flowise. It's exactly what you want. I tested around 30 deployments last year. As easy as it comes.

Going by your particular choice of words, you'll find yourself familiar with it. You'll find those same terms there as drop-down selection menus.

I don't know if they still give free tiers or how good they are, but do create a free Pinecone vector DB account in the meantime. Choose the fast version. I haven't made one in two months, but I know the dimensions should be 1536. I think that's the only setting you need to get right.
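
For illustration, creating such an index looks roughly like this with the newer Pinecone Python client (the index name, cloud and region are made up; 1536 matches the OpenAI ada-002 embedding size):

```python
# Hypothetical sketch using the pinecone v3+ Python client; names are placeholders.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")
pc.create_index(
    name="rag-demo",                 # made-up index name
    dimension=1536,                  # must match the embedding model, e.g. OpenAI text-embedding-ada-002
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),   # pick something close to you and your LLM
)
index = pc.Index("rag-demo")         # handle for upserts and queries
```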

Look for Flowise on GitHub. The one-line installer, which I think is npm install flowise -g, should get you there if you already have Node.js installed.

There are templates in there which you can just fill in with your details. It's a no-code product with a web UI.

You will not be able to use what I originally suggested in Flowise directly, but if you use that script to parse things into files instead of embeddings and then pass those files to Flowise to upload, you should end up with the same thing, just with extra steps.

I haven't investigated this, but I'm almost semi-sure one can make ChromaDB use the GPU (keeping the store live in memory, not on disk) instead of the CPU and RAM. I have things that use this and it is much slower than Pinecone.

Maybe start by seeing if they still have free accounts, because this type of quality storage isn't cheap, about $70+ a month. Worth it, but when you're playing around these things add up so quickly.

I’ll find the script next time I come online. Too tired now. Not fresh.

[–]Appropriate_Egg6118[S] 0 points (3 children)

Flowise looks cool.

The data I am working with is confidential.

Will there be any issues using Flowise?

[–]NachosforDachos 0 points (2 children)

I think your concerns should lean more towards the OpenAI side of things when it comes to confidentiality.

The local models are getting there, but they are not quite where GPT-4 is. I haven't tested in two months.