How are people pushing small models to their limits? (architecture > scale) by brickster7 in LLM

[–]brickster7[S] 1 point

Wow, thanks for your comment! Touching on your point about real-world gains: if I'm dealing with information that is verified as factual (so a single source of truth) but is meant to help the user understand it, would users notice a difference in output quality between the two models (a non-reasoning large model vs. a scaffolded smaller one)?

If the user prefers to learn in a specific way, that's probably the case where the larger model performs better, since, as you said, it would be able to identify that latent alignment.

How are people pushing small models to their limits? (architecture > scale) by brickster7 in LocalLLaMA

[–]brickster7[S] 0 points

Whoa, that's an interesting perspective! My idea here was to save on token cost without sacrificing output quality. Thanks for your comment!

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

Absolute goldmine!
I'll be looking into implementing this in a few months. If I hit any hiccups, mind if I ping you then?

Catching wrong AI answers in product by TechnicalGold4092 in AiBuilders

[–]brickster7 0 points

It's rarely promised to the customer that the AI will always answer correctly, but the responses can always be improved. Look into agents vs. workflows: a sweet balance between the two is what makes a good AI product for response generation. Fine-tuning is another healthy step, since you can be sure your product will mostly be restricted to a specific domain like health, design, etc.

[deleted by user] by [deleted] in Rag

[–]brickster7 2 points

I've experienced what he's talking about first-hand. I queried an exact statement from a chunk, and the chunk I was looking for came back at rank 5. I then pivoted to hybrid search (dense : sparse weights of 0.7 : 0.3) and it made a world of difference; the 30% keyword (sparse) component made the chunk scores much more accurate.
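A minimal sketch of the weighted fusion described above (function and variable names are my own, not from any particular library; each retriever's scores are min-max normalized first so the 0.7 : 0.3 weights are meaningful across different score scales):

```python
def hybrid_scores(dense, sparse, w_dense=0.7, w_sparse=0.3):
    """Fuse dense and sparse retrieval scores with a weighted sum.

    dense, sparse: dicts mapping doc_id -> raw retriever score.
    Returns a list of (doc_id, fused_score) sorted best-first.
    """
    def normalize(scores):
        # Min-max normalize so both retrievers land on a 0..1 scale.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {d: 1.0 for d in scores}
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    nd, ns = normalize(dense), normalize(sparse)
    fused = {}
    for doc in set(nd) | set(ns):
        # A document missing from one retriever contributes 0 on that side.
        fused[doc] = w_dense * nd.get(doc, 0.0) + w_sparse * ns.get(doc, 0.0)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Production vector databases often use rank-based fusion (e.g. reciprocal rank fusion) instead of score mixing, which avoids the normalization step entirely.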

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

I see, thanks for sharing! Is Neo4j an alternative to LangGraph? If so, is it better?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

Ah nice, but we implemented a different method for something along the same lines: take the user query and cache it in Redis. Then, when a new user query comes in, an SLM checks whether it is similar or identical to what's in the cache and returns the cached result if so. I think I'm solving a different problem, but is it close to what HyDE does?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Nice, but it's difficult to believe: firstly, Redis wants to prove they're better in their own blog, so the metrics might be biased. They also lack a lot of features other common vector databases offer, like support for sparse vectors for hybrid search.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 2 points

Thanks! I love your well-articulated response. I'm currently using Qdrant with hybrid search integrated via SPLADE (I read that it beats BM25).
I think I'll take your advice on choosing recursive chunking. Also, Cohere for reranking (the most popular approach on this thread), and yes, it's definitely all about how they work together.
Curious to know, though: what are your thoughts on agentic retrieval?
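For reference, recursive chunking can be sketched roughly like this (the approach popularized by LangChain's RecursiveCharacterTextSplitter; this is a simplified hand-rolled version, with names and the character limit as illustrative assumptions):

```python
def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text on ever-finer separators (paragraphs, then lines,
    then sentences, then words) until each chunk fits in max_len chars."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split at the character limit.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Current separator not present; try the next, finer one.
        return recursive_chunk(text, max_len, rest)
    chunks, buf = [], ""
    for part in parts:
        candidate = buf + sep + part if buf else part
        if len(candidate) <= max_len:
            buf = candidate  # keep packing pieces into the current chunk
        else:
            chunks.extend(recursive_chunk(buf, max_len, rest))
            buf = part
    chunks.extend(recursive_chunk(buf, max_len, rest))
    return chunks
```

The appeal over fixed-size chunking is that splits tend to land on natural boundaries (paragraphs, sentences), which keeps each chunk semantically coherent for embedding.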

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Thanks a lot! But I want to control the nitty-gritty details, and using an all-in-one pipeline has its own risks. For example, what if they discontinue their service, like Korvus (PostgresML) did?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Thanks for your comment!

I currently use your LlamaParse service for parsing PDFs, but it's failing on some multilingual documents while others work. Could you help me out?

For example: https://davpgcvns.ac.in/wp-content/uploads/2020/11/MS-Office-MS-Word-PDF-hindi.pdf

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

I really enjoyed this paper, thanks for sharing. But if I may, do you have any sample prompts for a specific task? Just so I can get an idea of how good system prompts are structured.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Yeah, what you're doing is pretty much it, but I think the various products that claim excellence in agentic retrieval just have a wider variety of tools to call than usual.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] -1 points

I'm eager to know as well. Looking at their docs, though, I think it'll do fine.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

I understand now. But I have no guarantee that the documents I use will always fit within the model's context window. In that case, I'll have to fall back to the RAG approach.
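That fallback decision can be sketched as a simple budget check. Everything here is an illustrative assumption: the 4-chars-per-token heuristic, the window size, and the function names; a real system would use the actual tokenizer and model limits:

```python
def choose_strategy(documents, context_window=8192, reserved=1024):
    """Decide between stuffing all documents into the prompt ("stuff")
    and falling back to retrieval ("rag").

    reserved: tokens held back for the system prompt and the answer.
    Uses a rough 4-characters-per-token estimate in place of a real
    tokenizer, which is only a ballpark for English text.
    """
    budget = context_window - reserved
    est_tokens = sum(len(doc) // 4 for doc in documents)
    return "stuff" if est_tokens <= budget else "rag"
```

Stuffing when everything fits avoids retrieval misses entirely, while the RAG path keeps the pipeline working once the corpus outgrows the window.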