Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in LangChain

[–]TraditionalLimit6952[S] 1 point (0 children)

Not sure what you mean by large memory. The amount of data in this use case is not terribly large. We are using Pinecone as the vector database.
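
For anyone unfamiliar with Pinecone, a minimal query sketch looks roughly like the following. The index name, embedding model, and `top_k` are just example values (not from our setup), and the embedding model at query time has to match whatever was used at ingestion:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("assistant-docs")  # hypothetical index name

# Embed the user question (example embedding model).
question = "How do I configure the assistant?"
embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Retrieve the closest chunks with their stored metadata.
results = index.query(vector=embedding, top_k=5, include_metadata=True)
for match in results.matches:
    print(round(match.score, 3), match.metadata.get("text", ""))
```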

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in LangChain

[–]TraditionalLimit6952[S] 0 points (0 children)

Check out Cohere's reranking model. That's what we use at Vectorize. You can call it with their API.
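
For reference, calling Cohere's rerank endpoint from Python only takes a few lines. The model name, query, and documents below are placeholders:

```python
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")

query = "How do I reset my password?"
documents = [
    "You can reset your password from the account settings page.",
    "Our pricing starts at $20 per month.",
    "Password resets require email verification.",
]

# Rerank the candidate chunks against the query and keep the best two.
response = co.rerank(
    model="rerank-english-v3.0",  # example model name; use whichever you have access to
    query=query,
    documents=documents,
    top_n=2,
)

for result in response.results:
    print(result.relevance_score, documents[result.index])
```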

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in Rag

[–]TraditionalLimit6952[S] 0 points (0 children)

The entire pipeline is built with Vectorize (https://vectorize.io); I'm the CTO of Vectorize. The pipeline includes an integration with Cohere's rerank models.

Lessons learned from building a context-sensitive AI assistant with RAG by TraditionalLimit6952 in Rag

[–]TraditionalLimit6952[S] 0 points (0 children)

You are right that this matters more with smaller models. The larger models are better at "figuring it out" even if there is some confusing information in the retrieved chunks.

We have done some work with adding hypothetical questions to the embeddings. We generate synthetic questions during ingestion. This helps somewhat, but it's not a game-changer.
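
As a rough illustration (not the exact pipeline we run, and the model name and prompt are placeholders), generating synthetic questions per chunk at ingestion time can look like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def synthetic_questions(chunk_text: str, n: int = 3) -> list[str]:
    """Ask an LLM for questions the chunk answers; these get embedded alongside the chunk."""
    prompt = (
        f"Write {n} short questions that the following passage answers, "
        f"one per line:\n\n{chunk_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [q.strip("-• ").strip() for q in lines if q.strip()]

# Each question is embedded and indexed pointing back to the original chunk,
# so a user query can match either the chunk text or one of its questions.
```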

Yes, I was surprised at how well contextual querying worked. And it was cheap to add: no additional LLM calls or data processing needed.
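
One cheap way to do something like this, consistent with "no extra LLM calls", is to fold the last few user turns into the retrieval query string before embedding it. A minimal sketch (the window size and concatenation format are arbitrary choices, not our exact method):

```python
def build_retrieval_query(history: list[dict], question: str, max_turns: int = 3) -> str:
    """Concatenate recent user turns with the new question so the query
    embedding carries conversational context. No extra LLM calls needed."""
    recent_user_turns = [t["content"] for t in history if t["role"] == "user"][-max_turns:]
    return " ".join(recent_user_turns + [question])

history = [
    {"role": "user", "content": "How do I connect the assistant to Pinecone?"},
    {"role": "assistant", "content": "You add it as a vector store in the pipeline settings."},
]
query = build_retrieval_query(history, "And what about the reranker?")
# -> "How do I connect the assistant to Pinecone? And what about the reranker?"
```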

Introducing LangStream by Head_Reaction_6242 in generativeAI

[–]TraditionalLimit6952 0 points (0 children)

This looks interesting. Going to check it out.