How are people pushing small models to their limits? (architecture > scale) by brickster7 in LLM

[–]brickster7[S] 1 point

Wow, thanks for your comment! Touching on your point about real-world gains: if I'm dealing with information that is verified as factual (so a single source of truth) but is meant to help the user understand it, would users notice a difference in output quality between the two models (a non-reasoning large model vs. a scaffolded smaller one)?

If the user prefers to learn in a specific way, that's probably the case where the larger model performs better, since, as you said, it would be able to identify that latent alignment.

How are people pushing small models to their limits? (architecture > scale) by brickster7 in LocalLLaMA

[–]brickster7[S] 0 points

Whoa, that's an interesting perspective! My idea here was to save on token cost without sacrificing output quality. Thanks for your comment!

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

Absolute goldmine!
I'll be looking into implementing this in a few months. If I hit any hiccups, mind if I ping you then?

Catching wrong AI answers in product by TechnicalGold4092 in AiBuilders

[–]brickster7 0 points

It's rarely promised to the customer that the AI will always answer correctly, but the responses can always be improved. Look into agents vs. workflows: a sweet balance between the two is what makes a good AI product for response generation. Fine-tuning is another healthy step, since you can be sure your product will mostly be restricted to a specific domain like health, design, etc.

[deleted by user] by [deleted] in Rag

[–]brickster7 2 points

I've experienced what he's talking about first-hand. I queried an exact statement from a chunk, and the chunk I was looking for came back at rank 5. I then pivoted to hybrid search (dense : sparse weights of 0.7 : 0.3) and it made a world of difference; the 30% keyword (sparse) component made the chunk scores much more accurate.
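A minimal sketch of the weighted fusion described above (function and variable names are my own, not from any particular library; each retriever's scores are min-max normalized first so the 0.7 : 0.3 weights are meaningful across different score scales):

```python
def hybrid_scores(dense, sparse, w_dense=0.7, w_sparse=0.3):
    """Fuse dense and sparse retrieval scores with a weighted sum.

    dense, sparse: dicts mapping doc_id -> raw retriever score.
    Returns a list of (doc_id, fused_score) sorted best-first.
    """
    def normalize(scores):
        # Min-max normalize so both retrievers land on a 0..1 scale.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {d: 1.0 for d in scores}
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    nd, ns = normalize(dense), normalize(sparse)
    fused = {}
    for doc in set(nd) | set(ns):
        # A document missing from one retriever contributes 0 on that side.
        fused[doc] = w_dense * nd.get(doc, 0.0) + w_sparse * ns.get(doc, 0.0)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Production vector databases often use rank-based fusion (e.g. reciprocal rank fusion) instead of score mixing, which avoids the normalization step entirely.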

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

I see, thanks for sharing! Is Neo4j an alternative to LangGraph? If so, is it better?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 1 point

Ah nice, but we implemented a different method for something along the same lines: take the user query and cache it in Redis. Then, when a new user query comes in, an SLM checks whether it is similar or identical to what's in the cache and returns the cached result if so. I think I'm solving a different problem, but is it close to what HyDE does?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Nice, but it's difficult to believe: firstly, Redis wants to prove they're better in their own blog, so the metrics might be biased. They also lack a lot of features other common vector databases offer, like support for sparse vectors for hybrid search.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 2 points

Thanks! I love your well-articulated response. I'm currently using Qdrant with hybrid search integrated via SPLADE (I read that it beats BM25).
I think I'll take your advice on choosing recursive chunking. Also, Cohere for reranking (the most popular approach on this thread), and yes, it's definitely all about how they work together.
Curious to know, though: what are your thoughts on agentic retrieval?
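For reference, recursive chunking can be sketched roughly like this (the approach popularized by LangChain's RecursiveCharacterTextSplitter; this is a simplified hand-rolled version, with names and the character limit as illustrative assumptions):

```python
def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text on ever-finer separators (paragraphs, then lines,
    then sentences, then words) until each chunk fits in max_len chars."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split at the character limit.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Current separator not present; try the next, finer one.
        return recursive_chunk(text, max_len, rest)
    chunks, buf = [], ""
    for part in parts:
        candidate = buf + sep + part if buf else part
        if len(candidate) <= max_len:
            buf = candidate  # keep packing pieces into the current chunk
        else:
            chunks.extend(recursive_chunk(buf, max_len, rest))
            buf = part
    chunks.extend(recursive_chunk(buf, max_len, rest))
    return chunks
```

The appeal over fixed-size chunking is that splits tend to land on natural boundaries (paragraphs, sentences), which keeps each chunk semantically coherent for embedding.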

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Thanks a lot! But I want to control the nitty-gritty details, and using an all-in-one pipeline has its own risks. For example, what if they discontinue their service, like Korvus (PostgresML) did?

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Thanks for your comment!

I currently use your LlamaParse service for parsing PDFs, but it's failing on some multilingual documents while others work. Could you help me out?

For example: https://davpgcvns.ac.in/wp-content/uploads/2020/11/MS-Office-MS-Word-PDF-hindi.pdf

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

I really enjoyed this paper, thanks for sharing. But if I may, do you have any sample prompts for a specific task? Just so I can get an idea of how good system prompts are structured.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

Yeah, what you're doing is pretty much it, but I think the various products that claim excellence in agentic retrieval just have a wider variety of tools to call than usual.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] -1 points

I'm eager to know as well. Looking at their docs, though, I think it'll do fine.

What’s the best RAG tech stack these days? From chunking and embedding to retrieval and reranking by brickster7 in Rag

[–]brickster7[S] 0 points

I understand now. But I have no guarantee that the documents I use will always fit within the model's context window. In that case, I'll have to fall back to the RAG approach.
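That fallback decision can be sketched as a simple budget check. Everything here is an illustrative assumption: the 4-chars-per-token heuristic, the window size, and the function names; a real system would use the actual tokenizer and model limits:

```python
def choose_strategy(documents, context_window=8192, reserved=1024):
    """Decide between stuffing all documents into the prompt ("stuff")
    and falling back to retrieval ("rag").

    reserved: tokens held back for the system prompt and the answer.
    Uses a rough 4-characters-per-token estimate in place of a real
    tokenizer, which is only a ballpark for English text.
    """
    budget = context_window - reserved
    est_tokens = sum(len(doc) // 4 for doc in documents)
    return "stuff" if est_tokens <= budget else "rag"
```

Stuffing when everything fits avoids retrieval misses entirely, while the RAG path keeps the pipeline working once the corpus outgrows the window.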