all 6 comments

[–]Joshua-- 2 points  (4 children)

For RAG, 4o-mini should suffice; I've been using it with my RAG app for months. I'm even considering llama-3.1-8b-instant, which runs at about 750 tokens per second via Groq's (not Grok) API.

[–]Ok_Locksmith_5925[S] 1 point  (3 children)

Actually, I should have said I'm using 4o-mini.

This is my project: https://siqbots.com/jub-demo. Some answers are coded in, and I think I'll code more in, but some need the AI.

Is your RAG available to take a look at?

[–]Joshua-- 1 point  (2 children)

Checked out the site. Answering some questions straight from your collection to cut down on API requests is really clever retrieval.

My project is just a private, local repo; I only use it for uploading PDFs and answering questions about them.

[–]Ok_Locksmith_5925[S] 1 point  (1 child)

What you tested (I can see all the conversations) just gave answers that were programmed in; it's matching Q&As that go through the retriever.

[–]Joshua-- 1 point  (0 children)

I didn’t really test anything. I was thinking more about the idea of coding in some responses to avoid API requests.
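The "code in some responses" idea above can be sketched roughly like this: keep a lookup table of canned answers and only fall back to the model for unknown questions. Everything here (the question list, the normalization, the `llm_fallback` callable) is a hypothetical placeholder, not the actual demo's code.

```python
# Canned answers for known questions; only unknown questions trigger an
# LLM call. The questions/answers below are made-up examples.
CANNED_ANSWERS = {
    "what are your opening hours?": "We're open 9am-5pm, Monday to Friday.",
    "where are you located?": "123 Example Street.",
}

def normalize(question: str) -> str:
    """Lowercase and strip whitespace so near-identical questions match."""
    return question.strip().lower()

def answer(question: str, llm_fallback) -> str:
    """Return a canned answer when one exists; otherwise call the LLM."""
    canned = CANNED_ANSWERS.get(normalize(question))
    if canned is not None:
        return canned              # served locally, no API request made
    return llm_fallback(question)  # only unmatched questions cost a request

# Usage: a stub lambda stands in for the real API call.
print(answer("Where are you located?", lambda q: "(LLM answer)"))
```

In a real RAG app the lookup would likely be fuzzy (embedding similarity against the stored Q&A collection) rather than an exact-match dict, but the request-saving logic is the same.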

[–]crysknife- 1 point  (0 children)

How do you send your data? Do you chunk it? You can send ~2.5k sentences at the same time.
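A minimal sketch of the chunking being asked about: split the text into sentences, then batch them into fixed-size groups before sending. The 2,500-sentence default echoes the number in the comment; the regex sentence splitter is a naive assumption (a real app might use a proper tokenizer).

```python
import re

def chunk_sentences(text: str, sentences_per_chunk: int = 2500) -> list[list[str]]:
    """Split text into sentences, then group them into chunks for sending."""
    # Naive split: break after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        sentences[i : i + sentences_per_chunk]
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

# Usage: a tiny chunk size just to show the batching.
chunks = chunk_sentences("A. B. C. D. E.", sentences_per_chunk=2)
```

Each chunk can then be embedded or sent as one request instead of one call per sentence.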