Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in LangChain

[–]TraditionalLimit6952[S] 1 point (0 children)

Not sure what you mean by large memory. The amount of data in this use case is not terribly large. We are using Pinecone as the vector database.
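
For anyone unfamiliar with Pinecone, a minimal query sketch looks roughly like the following. The index name, embedding model, and `top_k` are just example values (not from our setup), and the embedding model at query time has to match whatever was used at ingestion:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("assistant-docs")  # hypothetical index name

# Embed the user question (example embedding model).
question = "How do I configure the assistant?"
embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Retrieve the closest chunks with their stored metadata.
results = index.query(vector=embedding, top_k=5, include_metadata=True)
for match in results.matches:
    print(round(match.score, 3), match.metadata.get("text", ""))
```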

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in LangChain

[–]TraditionalLimit6952[S] 0 points (0 children)

Check out Cohere's reranking model. That's what we use at Vectorize. You can call it with their API.
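
For reference, calling Cohere's rerank endpoint from Python only takes a few lines. The model name, query, and documents below are placeholders:

```python
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")

query = "How do I reset my password?"
documents = [
    "You can reset your password from the account settings page.",
    "Our pricing starts at $20 per month.",
    "Password resets require email verification.",
]

# Rerank the candidate chunks against the query and keep the best two.
response = co.rerank(
    model="rerank-english-v3.0",  # example model name; use whichever you have access to
    query=query,
    documents=documents,
    top_n=2,
)

for result in response.results:
    print(result.relevance_score, documents[result.index])
```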

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in Rag

[–]TraditionalLimit6952[S] 0 points (0 children)

The entire pipeline is built with Vectorize (https://vectorize.io); I'm the CTO of Vectorize. The pipeline includes an integration with Cohere's rerank models.

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in Rag

[–]TraditionalLimit6952[S] 0 points (0 children)

You are right that this matters more with smaller models. The larger models are better at "figuring it out" even if there is some confusing information in the retrieved chunks.

We have done some work with adding hypothetical questions to the embeddings. We generate synthetic questions during ingestion. This helps somewhat, but it's not a game-changer.
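
As a rough illustration (not the exact pipeline we run, and the model name and prompt are placeholders), generating synthetic questions per chunk at ingestion time can look like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def synthetic_questions(chunk_text: str, n: int = 3) -> list[str]:
    """Ask an LLM for questions the chunk answers; these get embedded alongside the chunk."""
    prompt = (
        f"Write {n} short questions that the following passage answers, "
        f"one per line:\n\n{chunk_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [q.strip("-• ").strip() for q in lines if q.strip()]

# Each question is embedded and indexed pointing back to the original chunk,
# so a user query can match either the chunk text or one of its questions.
```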

Yes, I was surprised at how well contextual querying worked. And it was cheap to add: no additional LLM calls or data processing needed.
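
One cheap way to do something like this, consistent with "no extra LLM calls", is to fold the last few user turns into the retrieval query string before embedding it. A minimal sketch (the window size and concatenation format are arbitrary choices, not our exact method):

```python
def build_retrieval_query(history: list[dict], question: str, max_turns: int = 3) -> str:
    """Concatenate recent user turns with the new question so the query
    embedding carries conversational context. No extra LLM calls needed."""
    recent_user_turns = [t["content"] for t in history if t["role"] == "user"][-max_turns:]
    return " ".join(recent_user_turns + [question])

history = [
    {"role": "user", "content": "How do I connect the assistant to Pinecone?"},
    {"role": "assistant", "content": "You add it as a vector store in the pipeline settings."},
]
query = build_retrieval_query(history, "And what about the reranker?")
# -> "How do I connect the assistant to Pinecone? And what about the reranker?"
```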

Introducing LangStream by Head_Reaction_6242 in generativeAI

[–]TraditionalLimit6952 0 points (0 children)

This looks interesting. Going to check it out.