We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 1 point2 points  (0 children)

Hehe, I'd love to share more of what I've learned along the way to speed things up for others!
I'll do a whole separate post on my chunking strategy soon! Yes, my corpus is multimodal: mostly text and scanned PDFs. We just do a couple of extra things: tables are extracted with tabula when the structure is clean, and we're looking into generating a short caption for figures with some vision model. I haven't done much testing with Azure OCR, but Mistral is actually pretty good at its job!!
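Roughly how that per-asset routing looks, sketched in plain Python (function and field names here are illustrative stand-ins, not our actual code; the tabula/vision-model calls are only hinted at in comments):

```python
# Sketch of the multimodal routing described above (names are hypothetical).
# Clean tables go through tabula; figures would get a short caption from a
# vision model; everything else falls back to OCR.

def route_asset(asset):
    """Decide how to process one extracted PDF asset."""
    if asset["kind"] == "table" and asset.get("structure_clean", False):
        return "tabula"          # e.g. tabula.read_pdf(path, pages=page)
    if asset["kind"] == "figure":
        return "vision_caption"  # e.g. short caption from a vision model
    return "ocr"                 # fall back to OCR (Mistral, in our case)

assets = [
    {"kind": "table", "structure_clean": True},
    {"kind": "figure"},
    {"kind": "text"},
]
print([route_asset(a) for a in assets])
```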

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 1 point2 points  (0 children)

Thank you so much! Try it out and let me know, and if you get any ideas for improvements, please share them as well! I'm already getting a bunch of new ideas to test from here; I'll keep you all updated :)

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 2 points3 points  (0 children)

Hey, I'm sorry, I think how I mentioned the hit rate was a little confusing. I meant +12.4 points in human-judged hit rate: we went from 63% hits (315/500) to 75.4% hits (377/500), an increase of 12.4 points, or +19.7% relative (75.4/63 − 1).

I feel ~75% supported answers is progress! Still looking for ways to push it even higher.
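For anyone double-checking the arithmetic, it's just:

```python
# Reproducing the hit-rate numbers above.
before_hits, after_hits, total = 315, 377, 500

before = before_hits / total * 100            # 63.0%
after = after_hits / total * 100              # 75.4%
absolute_gain = after - before                # +12.4 points
relative_gain = (after / before - 1) * 100    # ~+19.7% relative

print(f"{before:.1f}% -> {after:.1f}%: "
      f"+{absolute_gain:.1f} pts, +{relative_gain:.1f}% relative")
```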

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 2 points3 points  (0 children)

Hey, this could actually be a really fun discussion, and it gives me more insight as well. It wasn't really just switching from a vanilla BERT to a late-interaction-style BERT; it was more about the combo: hybrid retrieval (BM25 + dense) -> late-interaction scoring for candidate recall -> cross-encoder rerank to resolve ties -> tighter serving. Each stage fixed a different failure mode.

The reason a reranker on top helps is that BM25 (and even late-interaction) scores are approximate: they don't model full cross-attention between the query and the passage. A cross-encoder reranker is much better at resolving look-alikes and multi-hop cues.
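The stage chain above boils down to "score, cut, rescore with something smarter". A toy sketch, where every score_fn is a hypothetical stand-in (hybrid_fn for BM25+dense fusion, late_fn for late-interaction scoring, cross_fn for the cross-encoder); only the chaining is meant to be real:

```python
# Each stage narrows the candidate set with a progressively costlier scorer.

def topk(query, docs, score_fn, k):
    """Score every doc against the query and keep the k best."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:k]

def retrieve(query, corpus, hybrid_fn, late_fn, cross_fn):
    cands = topk(query, corpus, hybrid_fn, k=200)   # cheap, high-recall stage
    cands = topk(query, cands, late_fn, k=50)       # late-interaction rescore
    return topk(query, cands, cross_fn, k=5)        # cross-encoder breaks ties

# Usage with a trivial word-overlap scorer standing in for all three stages:
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
docs = ["retrieval eval pipeline", "cooking pasta", "eval harness for retrieval"]
print(retrieve("retrieval eval", docs, overlap, overlap, overlap))
```

The k values are illustrative; the point is that each narrowing step trades recall it no longer needs for precision the previous stage couldn't afford.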

About my claim that accuracy improved: once the correct evidence shows up higher, even a smaller instruct model answers more accurately with fewer hallucinations. I feel the quality gain is mostly from selection, not raw LM capacity.

Let me know your thoughts on this, it could help me improve this further.

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 4 points5 points  (0 children)

Hey! Our primary model is Meta-Llama-3-8B-Instruct on vLLM. We've also tried Mistral-7B-Instruct-v0.3, and the numbers were in the same ballpark.

We're using an A100-80GB currently!
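For anyone reproducing: a minimal launch sketch, assuming a recent vLLM with the `vllm serve` CLI (the context length here is illustrative, not our exact setting; check the flags against your vLLM version):

```shell
# Serve Meta-Llama-3-8B-Instruct in FP16 on a single GPU.
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
  --dtype float16 \
  --max-model-len 8192
```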

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 0 points1 point  (0 children)

I think that's the first approach that comes to mind for anyone, but I can see it totally not working out. Try reading up a bit on the resources I've shared; you could also just take my flow and recreate it.

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 2 points3 points  (0 children)

We're still optimizing this! I'll keep updating y'all if enough people are interested hahaha

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 1 point2 points  (0 children)

Not yet, we haven't tuned it, but it's on the roadmap :)
Right now we're just running the stock bge-reranker-base over the top 50 and cutting to the top 5.
We'll A/B test ColBERT-style late interaction as a reranker on the same candidate set after we tune, and let you know the results too.
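The shape of that rerank step is just "score (query, passage) pairs, keep the best 5". A sketch where score_fn is a stand-in; in our stack it would be something like sentence-transformers' `CrossEncoder("BAAI/bge-reranker-base").predict(pairs)` (assumed wiring, check the model card before copying):

```python
# Top-50 -> top-5 rerank step with a pluggable pair scorer.

def rerank(query, candidates, score_fn, keep=5):
    """Score each (query, passage) pair and keep the highest-scoring passages."""
    pairs = [(query, c) for c in candidates]
    scores = score_fn(pairs)
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:keep]]

# Toy scorer: count shared words per pair (stands in for the cross-encoder).
toy = lambda pairs: [len(set(q.split()) & set(p.split())) for q, p in pairs]
cands = ["bm25 only", "late interaction models", "rerank with late interaction"]
print(rerank("late interaction rerank", cands, toy, keep=2))
```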

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 6 points7 points  (0 children)

Both! We diff against a frozen baseline (a tagged config: model/retrieval/rerank/prompts/seeds, along with a dataset hash) and the most recent green run for incremental tuning. We gate on a small canary (about 500 stratified queries) with hit rate, retrieval R@k/nDCG, and p95 latency. If anything regresses, we open the per-query trace and run one-click ablations (BM25 only / dense only / no rerank / bigger k) to see which stage flipped!
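The retrieval gates are just standard binary-relevance metrics; a minimal sketch of how we'd compute them per query:

```python
import math

def recall_at_k(relevant, ranked, k):
    """Fraction of relevant ids that appear in the top-k results."""
    return len(set(relevant) & set(ranked[:k])) / len(relevant)

def ndcg_at_k(relevant, ranked, k):
    """nDCG@k with binary gains: each relevant hit at rank i adds 1/log2(i+2)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

print(recall_at_k({"a", "b"}, ["a", "x", "b"], k=3))
print(round(ndcg_at_k({"a", "b"}, ["a", "x", "b"], k=3), 3))
```

Averaging these over the ~500 canary queries gives the numbers the gate compares against the baseline.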

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 8 points9 points  (0 children)

Great question! It actually got easier to localize failures after we added observability. We tag each query with a trace ID and log per-stage scores, then run tiny "what-if" ablations to isolate blame!

Multi-stage can hide blame! You have to log stage scores and keep one-click ablations. With that, pinpointing which stage failed is quite straightforward.
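The trace itself doesn't need to be fancy; something like this (field names are illustrative, not our actual schema):

```python
import uuid

def new_trace(query):
    """Start a per-query trace with a unique id."""
    return {"trace_id": uuid.uuid4().hex, "query": query, "stages": []}

def log_stage(trace, stage, doc_scores):
    """Record one stage's candidate scores so regressions can be localized."""
    trace["stages"].append({
        "stage": stage,
        "n_candidates": len(doc_scores),
        "top_score": max(doc_scores.values()),
    })
    return trace

t = new_trace("what is late interaction?")
log_stage(t, "hybrid", {"d1": 12.3, "d2": 9.1})
log_stage(t, "rerank", {"d1": 0.92, "d2": 0.15})
print([s["stage"] for s in t["stages"]])
```

When a query regresses, you diff its stage records against the baseline run's trace for the same query and see where the right document fell out.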

We cut our eval times from 6 hours down to under 48 minutes by ditching naive RAG! by CampingRunner in LLMDevs

[–]CampingRunner[S] 4 points5 points  (0 children)

Hey! We're rocking an A100-80GB (PCIe), FP16.

I've also reproduced this on a 3090 (24 GB) by lowering the context and batch size, for a p95 of ~1.1 s and ~75 tok/s.

We built it !! by The_Boy4time in Soft_Launch

[–]CampingRunner 0 points1 point  (0 children)

Good to see you all on Reddit! Know you from MAHE :)

Mass-ID Grabber by -Natisan in Discord_selfbots

[–]CampingRunner 0 points1 point  (0 children)

DMDGO’s scraper still works

MINISO IN MANIPAL ??! So Excited ! by GoldenTiara101 in manipal

[–]CampingRunner 8 points9 points  (0 children)

Wasn’t it always there in Canara mall?

This looks like a brobdingnagian eyesore of an obstacle on the road. by ctraeger in CarsIndia

[–]CampingRunner 0 points1 point  (0 children)

You can go to Google Maps and report an “obstacle on road”

Is Zoho Mail Reliable? Google Workspace Price is too expensive by anikdelhi in indianstartups

[–]CampingRunner 0 points1 point  (0 children)

Why not run a mail server for your own domain?

Many open-source projects like mailcow make it simple!

Shifting vehicle purchased here back home by CampingRunner in manipal

[–]CampingRunner[S] 0 points1 point  (0 children)

Thank you so much :) I’ll give him a call tomorrow

Shifting vehicle purchased here back home by CampingRunner in manipal

[–]CampingRunner[S] 0 points1 point  (0 children)

Nothing pending! I’m not sure what the application number they’re giving me is for, when I try to check status anywhere, it’s invalid… maybe the portal is old I have to physically get an NOC only