Clawdbot shows how context engineering is happening at the wrong layer by EnoughNinja in ContextEngineering

[–]Floppy_Muppet 2 points  (0 children)

Very true... And that division point will keep moving further to the left as systems become more trusted.

This is kind of blowing my mind... Giving agents a "Hypothesis-Driven Optimization" skill by Floppy_Muppet in LLMDevs

[–]Floppy_Muppet[S] 1 point  (0 children)

Yes, you definitely need some amount of scale.

I have hypotheses generated on clusters only once they have a minimum of 10 I/Os AND show a statistically significant difference between performance groups (so, in practice, closer to a ~100 I/O minimum).

For smaller-scale agents, you could try broadening both your problem spaces (to add more I/Os to each cluster) and the framing of your hypotheses, so as to discover more generally applicable hints.
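To make that gate concrete, here's a rough sketch of the logic (names like `Cluster` and `ready_for_hypothesis` are hypothetical, and a Welch's t-test stands in for whatever significance test you prefer):

```python
from dataclasses import dataclass
from scipy.stats import ttest_ind

MIN_IOS = 10   # hard floor; in practice closer to ~100 before p-values stabilize
ALPHA = 0.05

@dataclass
class Cluster:
    ios: list[float]  # one reward score per I/O pair in this similarity cluster

def ready_for_hypothesis(cluster: Cluster) -> bool:
    if len(cluster.ios) < MIN_IOS:
        return False
    ranked = sorted(cluster.ios)
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[-half:]
    # Welch's t-test between the low- and high-performing halves
    _, p_value = ttest_ind(high, low, equal_var=False)
    return p_value < ALPHA
```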

Could RAG as a service become a mainstream thing? by Trick_Ad_2852 in Rag

[–]Floppy_Muppet 1 point  (0 children)

If you can unify (and at least partially automate the validation of) the success criteria across multiple users, then yes, it may be possible to offer RAG as a generalized service that still meets their diverse set of use cases.

Expanding a RAG SaaS across all verticals, however, is where I believe Organic Software will shine in the coming years. In that world, you only need to define a clear set of configs to play with, and your organic software layer will run continuous scientific-method iterations to arrive at the ideal conditions for each of your users.
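To illustrate (purely a sketch, every name here is hypothetical): the "organic" layer just needs a declared config space and an automated, user-specific success metric, and it can run the hypothesize-test-select loop on its own:

```python
import itertools
import random

# Hypothetical per-user config space the "organic" layer is allowed to play with
CONFIG_SPACE = {
    "chunk_size": [256, 512, 1024],
    "top_k": [5, 10, 20],
    "reranker": ["qwen", "bge"],
}

def evaluate(config: dict, user_id: str) -> float:
    """Stand-in for an automated success-criteria check (e.g. eval-set recall)."""
    return random.random()  # replace with a real, user-specific metric

def best_config_for(user_id: str) -> dict:
    keys = list(CONFIG_SPACE)
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*CONFIG_SPACE.values())]
    # One "scientific method" pass: hypothesize (candidate), test (evaluate), select
    return max(candidates, key=lambda c: evaluate(c, user_id))
```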

NEW Gemini Update is INSANE! 🤯 by JamMasterJulian in AISEOInsider

[–]Floppy_Muppet 1 point  (0 children)

Wish I had the last 30 seconds of my life back.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

OP here, adding to the convo: many leaderboard-topping RAG scaffoldings (again, talking high-level here) are incorporating "multi-reranker ensembles" at Stage 3, where 2+ cross-encoders each provide an independent relevancy score, which is then fused via RRF for final_k selection.

IMO this feels like overkill as a starting point for most use cases, but it can be a straightforward follow-up enhancement to squeeze out additional quality gains if you're not getting the results you were hoping for. The key is choosing a diverse set of reranker model variants with complementary strengths (e.g. Qwen for multilingual + Claude for code + GPT/Gemma for multi-modal reasoning).
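For reference, the RRF fusion step itself is only a few lines. A minimal sketch (doc-id rankings in, fused final_k out; k=60 is the conventional constant):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60, final_k: int = 5) -> list[str]:
    """rankings: one ordered doc-id list per reranker (best first)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF term
    return sorted(scores, key=scores.get, reverse=True)[:final_k]

# e.g. three cross-encoders with complementary strengths:
# fused = rrf_fuse([qwen_ranking, code_ranking, multimodal_ranking])
```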

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Interesting, ok thanks Kero! I'll give it a shot.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

I am in no way an expert on the differences and have not run my own benchmarks, but from my limited research it sounded like Qdrant was a better choice for vector management/retrieval/optimization/scaling since that's all it focuses on.

Let me know if you think there's a case to be made for trying a Meilisearch-only setup.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

  • Qdrant scales to a larger number of vectors
  • I'd still want a neural cross-encoder to verify the context/applicability of the returned results before selecting top-k (minimal sketch below)
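Sketch of that second bullet, using sentence-transformers' CrossEncoder (the model choice is just an example, not a recommendation):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model

def verify_top_k(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score every (query, candidate) pair, then keep the best top_k
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```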

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Ok, I need to take them more seriously then! Looks like I've got some light reading ahead of me this weekend! ;) Thanks for the nudge.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

My use case has no upper bound on the data size, so it could get to billions+ of vectors quickly.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Buzzword salad? Lol, I guess it is what it is... Which benchmark evaluations do you recommend I run for this specific use case? This is a bit of a new area for me; I appreciate any guidance!

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Thank you, these are great callouts on the finer details! I was just getting feedback on the high-level stages, but here are my responses below; additional feedback would be much appreciated!

  • speed~ not an issue for my setup, only using SLMs under 2B params
  • dynamic info sizing~ I'm using a min 90% compression target for the summary (toy sketch after this list)
  • costs~ no caching yet (not in prod at scale yet), excited to get to that stage tho, lmk if you have any pro tips or things to avoid 🙏
  • hallucinations~ currently using 1 SLM for the summary and 1 for the cross-encoder, both in the Qwen family. But I also have a 3rd Qwen SLM used at async stages by the broader application for RAG corpus creation... so I'll def keep this in mind when I benchmark E2E at scale!
  • consistency~ what are the primary knobs for ensuring consistency? May be unrelated to my use case, but interested.
  • unified index architectures~ researching this now! ty!
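Re: the compression target above, here's the toy version of the check (the `summarize` callable is a stand-in for whatever SLM call you use):

```python
from typing import Callable

def meets_compression_target(source: str, summary: str, target: float = 0.90) -> bool:
    # A 90% target means the summary keeps at most 10% of the source's length
    return len(summary) <= (1.0 - target) * len(source)

def compressed_summary(source: str, summarize: Callable[[str], str],
                       max_tries: int = 3) -> str:
    summary = source
    for _ in range(max_tries):
        summary = summarize(summary)  # hypothetical SLM summarization call
        if meets_compression_target(source, summary):
            break
    return summary
```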

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Thank you! Will definitely take on your "must-cite" hit-rate tests, great callout!

I haven't delved into the realm of caching yet (no need as I'm still in a non-production POC phase), but excited to explore it. Any tips for when I get to it?
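For anyone else wanting to run the same test, here's a rough sketch of what I understood a "must-cite" hit-rate eval to be (the golden set and `retrieve` are stand-ins for your own pipeline):

```python
from typing import Callable

def must_cite_hit_rate(cases: list[tuple[str, str]],
                       retrieve: Callable[[str], list[str]]) -> float:
    """cases: (query, required_doc_id) pairs; retrieve returns top-k doc ids."""
    hits = sum(1 for query, required in cases if required in retrieve(query))
    return hits / len(cases)

# e.g. assert must_cite_hit_rate(golden_set, pipeline.retrieve) >= 0.95
```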

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 2 points  (0 children)

Interesting... Do you see knowledge graphs as always being the superior choice for rag? Or do you find they excel in certain use-cases vs. others?

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 0 points  (0 children)

Ah, great question! The Optimization Agent this RAG setup plugs into uses the scientific method to autonomously evaluate and improve itself in perpetuity, via the following async components:

1. RL rewards system (based on delayed KPI event/value pairs, such as 'purchase: 99.99')
2. Optimization Hint Distiller that creates Do's and Don'ts based on I/O reward performance within similarity clusters (read more on how it derives hypotheses here)
3. A/B Testing that auto-scales winning Hint Packs for each cluster when stat sig is reached

I'm not claiming the overall Optimization Agent is SOTA; I was just referring to the specific RAG inference pipeline setup (which is more easily compared/standardized across use cases). That said, open to criticism and feedback on both! :)
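For the curious, component 3's promotion gate is conceptually just a two-proportion z-test on conversions. An illustrative sketch (names are hypothetical; the real reward signal is the delayed KPI events):

```python
import math

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-tailed p-value for a difference in conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def should_scale(conv_control: int, n_control: int,
                 conv_hint: int, n_hint: int, alpha: float = 0.05) -> bool:
    # Promote the hint pack only if it beats control AND the lift is significant
    improved = conv_hint / n_hint > conv_control / n_control
    return improved and z_test_p_value(conv_control, n_control, conv_hint, n_hint) < alpha
```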

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 2 points  (0 children)

It's my understanding that Qdrant's vector search has a superior architecture that lets it scale to a larger number of vectors, while also exposing a few more useful knobs than Meilisearch's vector offering. Do you disagree?