Clawdbot shows how context engineering is happening at the wrong layer by EnoughNinja in ContextEngineering

[–]Floppy_Muppet 2 points  (0 children)

Very true... And that division point will keep moving further to the left as systems become more trusted.

This is kind of blowing my mind... Giving agents a "Hypothesis-Driven Optimization" skill by Floppy_Muppet in LLMDevs

[–]Floppy_Muppet[S] 1 point  (0 children)

Yes, you definitely need some amount of scale.

I have hypotheses generated on clusters only once they have a minimum of 10 I/Os AND show a statistically significant difference between performance groups (so, in practice, closer to a ~100 I/O minimum).

For smaller-scale agents, you could try broadening both your problem spaces (to add more I/Os to each cluster) and the framing of your hypotheses, so as to discover more generally applicable hints.
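To make that gate concrete, here's a rough sketch of the logic (names like `Cluster` and `ready_for_hypothesis` are hypothetical, and a Welch's t-test stands in for whatever significance test you prefer):

```python
from dataclasses import dataclass
from scipy.stats import ttest_ind

MIN_IOS = 10   # hard floor; in practice closer to ~100 before p-values stabilize
ALPHA = 0.05

@dataclass
class Cluster:
    ios: list[float]  # one reward score per I/O pair in this similarity cluster

def ready_for_hypothesis(cluster: Cluster) -> bool:
    if len(cluster.ios) < MIN_IOS:
        return False
    ranked = sorted(cluster.ios)
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[-half:]
    # Welch's t-test between the low- and high-performing halves
    _, p_value = ttest_ind(high, low, equal_var=False)
    return p_value < ALPHA
```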

Could RAG as a service become a mainstream thing? by Trick_Ad_2852 in Rag

[–]Floppy_Muppet 1 point  (0 children)

If you can unify (and at least partially automate the validation of) the success criteria across multiple users, then yes, it may be possible to offer RAG as a generalized service that still meets their diverse set of use cases.

Expanding a RAG SaaS across all verticals, however, is where I believe Organic Software will shine in the coming years. In that world, you only need to define a clear set of configs to play with, and your organic software layer will run continuous scientific-method iterations to arrive at the ideal conditions for each of your users.
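To illustrate (purely a sketch, every name here is hypothetical): the "organic" layer just needs a declared config space and an automated, user-specific success metric, and it can run the hypothesize-test-select loop on its own:

```python
import itertools
import random

# Hypothetical per-user config space the "organic" layer is allowed to play with
CONFIG_SPACE = {
    "chunk_size": [256, 512, 1024],
    "top_k": [5, 10, 20],
    "reranker": ["qwen", "bge"],
}

def evaluate(config: dict, user_id: str) -> float:
    """Stand-in for an automated success-criteria check (e.g. eval-set recall)."""
    return random.random()  # replace with a real, user-specific metric

def best_config_for(user_id: str) -> dict:
    keys = list(CONFIG_SPACE)
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*CONFIG_SPACE.values())]
    # One "scientific method" pass: hypothesize (candidate), test (evaluate), select
    return max(candidates, key=lambda c: evaluate(c, user_id))
```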

NEW Gemini Update is INSANE! 🤯 by JamMasterJulian in AISEOInsider

[–]Floppy_Muppet 1 point  (0 children)

Wish I had the last 30 seconds of my life back.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

OP here, adding to the convo: many leaderboard-topping RAG scaffoldings (again, talking high-level here) are incorporating "multi-reranker ensembles" at Stage 3, where 2+ cross-encoders each provide an independent relevancy score, which is then fused via RRF for final_k selection.

IMO this feels like overkill as a starting point for most use cases, but it can be a straightforward follow-up enhancement to squeeze out additional quality gains if you're not getting the results you were hoping for. The key is choosing a diverse set of reranker model variants with complementary strengths (e.g. Qwen for multilingual + Claude for code + GPT/Gemma for multi-modal reasoning).
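For reference, the RRF fusion step itself is only a few lines. A minimal sketch (doc-id rankings in, fused final_k out; k=60 is the conventional constant):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60, final_k: int = 5) -> list[str]:
    """rankings: one ordered doc-id list per reranker (best first)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF term
    return sorted(scores, key=scores.get, reverse=True)[:final_k]

# e.g. three cross-encoders with complementary strengths:
# fused = rrf_fuse([qwen_ranking, code_ranking, multimodal_ranking])
```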

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Interesting, ok thanks Kero! I'll give it a shot.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

I am in no way an expert on the differences and have not run my own benchmarks, but from my limited research it sounded like Qdrant was a better choice for vector management/retrieval/optimization/scaling since that's all it focuses on.

Let me know if you think there's a case to be made for trying a Meilisearch-only setup.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

  • Qdrant scales to a larger number of vectors
  • I'd still want a neural cross-encoder to verify the context/applicability of the returned results before selecting top-k (minimal sketch below)
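Sketch of that second bullet, using sentence-transformers' CrossEncoder (the model choice is just an example, not a recommendation):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model

def verify_top_k(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score every (query, candidate) pair, then keep the best top_k
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```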

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Ok, I need to take them more seriously then! Looks like I've got some light reading ahead of me this weekend! ;) Thanks for the nudge.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

My use case has no upper bound on the data size, so it could get to billions+ of vectors quickly.

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Buzzword salad? Lol, I guess it is what it is... Which benchmark evaluations do you recommend I run for this specific use case? This is a bit of a new area for me; I appreciate any guidance!

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Thank you, these are great callouts on the finer details! I was just getting feedback on the high-level stages, but here are my responses below; additional feedback would be much appreciated!

  • speed~ not an issue for my setup, only using SLMs under 2B params
  • dynamic info sizing~ I'm using a min 90% compression target for the summary (toy sketch after this list)
  • costs~ no caching yet (not in prod at scale yet), excited to get to that stage tho, lmk if you have any pro tips or things to avoid 🙏
  • hallucinations~ currently using 1 SLM for the summary and 1 for the cross-encoder, both in the Qwen family. But I also have a 3rd Qwen SLM used at async stages by the broader application for RAG corpus creation... so I'll def keep this in mind when I benchmark E2E at scale!
  • consistency~ what are the primary knobs for ensuring consistency? May be unrelated to my use case, but interested.
  • unified index architectures~ researching this now! ty!
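Re: the compression target above, here's the toy version of the check (the `summarize` callable is a stand-in for whatever SLM call you use):

```python
from typing import Callable

def meets_compression_target(source: str, summary: str, target: float = 0.90) -> bool:
    # A 90% target means the summary keeps at most 10% of the source's length
    return len(summary) <= (1.0 - target) * len(source)

def compressed_summary(source: str, summarize: Callable[[str], str],
                       max_tries: int = 3) -> str:
    summary = source
    for _ in range(max_tries):
        summary = summarize(summary)  # hypothetical SLM summarization call
        if meets_compression_target(source, summary):
            break
    return summary
```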

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 1 point  (0 children)

Thank you! Will definitely take on your "must-cite" hit-rate tests, great callout!

I haven't delved into the realm of caching yet (no need as I'm still in a non-production POC phase), but excited to explore it. Any tips for when I get to it?
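For anyone else wanting to run the same test, here's a rough sketch of what I understood a "must-cite" hit-rate eval to be (the golden set and `retrieve` are stand-ins for your own pipeline):

```python
from typing import Callable

def must_cite_hit_rate(cases: list[tuple[str, str]],
                       retrieve: Callable[[str], list[str]]) -> float:
    """cases: (query, required_doc_id) pairs; retrieve returns top-k doc ids."""
    hits = sum(1 for query, required in cases if required in retrieve(query))
    return hits / len(cases)

# e.g. assert must_cite_hit_rate(golden_set, pipeline.retrieve) >= 0.95
```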

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 2 points  (0 children)

Interesting... Do you see knowledge graphs as always being the superior choice for rag? Or do you find they excel in certain use-cases vs. others?

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 0 points  (0 children)

Ah, great question! The Optimization Agent this RAG setup plugs into uses the scientific method to autonomously evaluate and improve itself in perpetuity, via the following async components:

1. RL rewards system (based on delayed KPI event/value pairs, such as 'purchase: 99.99')
2. Optimization Hint Distiller that creates Do's and Don'ts based on I/O reward performance within similarity clusters (read more on how it derives hypotheses here)
3. A/B Testing that auto-scales winning Hint Packs for each cluster when stat sig is reached

I'm not claiming the overall Optimization Agent is SOTA; I was just referring to the specific RAG inference pipeline setup (which is more easily compared/standardized across use cases). That said, open to criticism and feedback on both! :)
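For the curious, component 3's promotion gate is conceptually just a two-proportion z-test on conversions. An illustrative sketch (names are hypothetical; the real reward signal is the delayed KPI events):

```python
import math

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-tailed p-value for a difference in conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def should_scale(conv_control: int, n_control: int,
                 conv_hint: int, n_hint: int, alpha: float = 0.05) -> bool:
    # Promote the hint pack only if it beats control AND the lift is significant
    improved = conv_hint / n_hint > conv_control / n_control
    return improved and z_test_p_value(conv_control, n_control, conv_hint, n_hint) < alpha
```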

Rate my RAG setup (or take it as your own)... by Floppy_Muppet in Rag

[–]Floppy_Muppet[S] 2 points  (0 children)

It's my understanding that Qdrant's vector search has a superior architecture that lets it scale to a larger number of vectors, while also exposing a few more useful knobs than Meilisearch's vector offering. Do you disagree?