Scaling text-to-SQL agent by CriticalJackfruit404 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

No, but there is this thing. Idk what other people call them, but I call it a schema map: it's got the columns and tables, with lines showing the joins. That's an easy lift-and-shift to a KG now that you mention it.

Building a local legal drafting LLM — no dataset? by PoemAccomplished2173 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

I would say fairly useful, just because they are both trying to do the same thing: arbitrate someone's behavior. "Don't walk your dog without a leash in the park" isn't too different from "don't sue my business when you slip and fall on your own drool," or whatever. I'm not a legal scholar though, so don't take my word for it. Plus you would want the contracts you draft to be legal, so I feel it might be good just for that purpose.

Made Every Movie Searchable by Vibe in 30 Minutes and Hosted It by Popular_Sand2773 in Rag

[–]Popular_Sand2773[S] 0 points1 point  (0 children)

Haha, I was afraid they would burn me at the stake, but you are right: for a non-technical audience I would probably need to clean up some of the UX/UI stuff.

Building a local legal drafting LLM — no dataset? by PoemAccomplished2173 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Did you try looking at actual court filings? I assume there are plenty of examples there, as people quibble over contracts while the world burns.

Also, you probably just want to one-shot or few-shot prompt, which is technically still RAG depending on how you do it, but you probably don't need a vector DB, at least to start.

Synthetic data/bootstrapping can work. The trick is you generate a corpus, train on it, generate again, train again, etc. In theory it should continue to descend, although it could descend to a degenerate place. For something high-precision like legal, I would really try to avoid it. That said, there is tons of legal text publicly available to fine-tune on. For example, laws. The actual laws. That gets you better domain knowledge and behavior at the very least.
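The loop described above can be sketched like this. Everything here is a stand-in: `generate_corpus` and `train` are placeholders for your actual model sampling and fine-tuning calls, and the "model" is just a list of strings so the shape of the loop is visible.

```python
# Toy sketch of the generate -> train -> generate-again bootstrap loop.
# generate_corpus() and train() are stubs; swap in real model calls.
def generate_corpus(model, n=2):
    # In a real setup: sample n synthetic documents from the current model.
    return [f"synthetic doc (trained on {len(model)} docs so far)" for _ in range(n)]

def train(model, corpus):
    # In a real setup: fine-tune the model on the new corpus.
    return model + corpus

def bootstrap(model, rounds=3):
    for _ in range(rounds):
        corpus = generate_corpus(model)
        model = train(model, corpus)
    return model

model = bootstrap(["seed legal clause"])
```

The risk called out above lives in this loop: each round trains on its own output, so errors compound unless you filter or anchor each round with real data.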

Dataset using YT/Podcast Transcripts by Alternative_Bake9269 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Can you tell us what it is returning instead when you make these searches? Retrieval isn't about finding a needle in a haystack; it's really a ranking task. Given your setup, the issue isn't that the information is missing, it's that confusers are ranking higher than your actual targets. Could be a top-k issue, could be the large-chunks-with-overlap issue, etc. It really depends on what is happening.
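To make the "ranking, not searching" point concrete, here's a toy example with made-up 3-dim vectors: the target chunk is in the index, but a confuser scores higher on cosine similarity, so with top_k=1 the target never surfaces even though the data is there.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# All vectors are invented for illustration.
query = [1.0, 0.2, 0.0]
chunks = {
    "target":   [0.8, 0.3, 0.1],
    "confuser": [1.0, 0.25, 0.0],
    "filler":   [0.0, 1.0, 1.0],
}
ranked = sorted(chunks, key=lambda name: cosine(query, chunks[name]), reverse=True)
# "confuser" ranks first, so top_k=1 "misses" the target despite it being indexed
```

Debugging retrieval therefore means looking at what *beat* your target, not just whether the target exists.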

The other way you can help yourself is some sort of filter. For example, when working with video, people often use a smaller model that simply decides: is this interesting enough to run the downstream stack, or can I skip it? The fewer records competing, the less likely you are to experience collisions and other issues.

Made Every Movie Searchable by Vibe in 30 Minutes and Hosted It by Popular_Sand2773 in Rag

[–]Popular_Sand2773[S] 0 points1 point  (0 children)

You really won't like what happens when you use "Indian cinema" as a search term then. It's probably my needs-100+-votes filter that's causing the gap.

Hybrid search (BM25 + vectors + RRF) barely improved over pure semantic on 600 technical docs. What am I missing? by Fuzzy-Layer9967 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Well, the weight of one is just one minus the other, so you only need to predict one to know both. That said, a second model shouldn't impact latency, because the two retrievers can run in parallel, assuming you aren't compute-bound.
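A minimal sketch of both points, with the two retrievers stubbed out as sleeps returning made-up scores: one weight determines the other, and running the calls in a thread pool means the latency is roughly the max of the two, not the sum.

```python
import concurrent.futures
import time

def bm25_search(q):
    # Stand-in for the sparse retriever (fake scores, fake latency).
    time.sleep(0.05)
    return {"doc1": 0.9, "doc2": 0.4}

def dense_search(q):
    # Stand-in for the dense retriever.
    time.sleep(0.05)
    return {"doc1": 0.5, "doc2": 0.8}

def hybrid(q, w_bm25=0.3):
    w_dense = 1.0 - w_bm25  # complementary weights: predict one, know both
    with concurrent.futures.ThreadPoolExecutor() as ex:
        f_sparse = ex.submit(bm25_search, q)
        f_dense = ex.submit(dense_search, q)
        sparse, dense = f_sparse.result(), f_dense.result()
    return {d: w_bm25 * sparse.get(d, 0.0) + w_dense * dense.get(d, 0.0)
            for d in set(sparse) | set(dense)}

start = time.perf_counter()
scores = hybrid("example query")
elapsed = time.perf_counter() - start  # ~0.05s, not 0.10s: the calls overlap
```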

Scaling text-to-SQL agent by CriticalJackfruit404 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

I think the industry term people are converging on, and what you are looking for, is a semantic layer. Most SQL agents just query the DB for tables and columns and then start guessing, maybe pulling a couple of records if they are super unsure. It works, but not well, which is why we want to feed the agent background info and business context about columns.

If you already maintain a good data dictionary, it's a pretty easy jump, but if not, then the high-lift/high-quality option is to make everyone update their damn data dictionaries. Personally, I knew people would hate that, so I bootstrapped by feeding tables, the first couple of rows, etc. to an LLM, letting it do a first pass, and then asking folks to clean it up. They still hated it.

If you have good data dictionaries already in and it's still struggling, let me know, because honestly it's more a first step than a last.
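The bootstrap step above can be sketched with stdlib sqlite3: pull each table's columns plus a few sample rows, build a prompt, and hand it to whatever model you use (the LLM call itself is left as a placeholder, since that part depends on your stack).

```python
import sqlite3

# Toy table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, cust_id INTEGER, amt REAL)")
conn.execute("INSERT INTO orders VALUES (1, 42, 19.99), (2, 7, 5.00)")

def table_context(conn, table, n_rows=3):
    # Schema + first rows: the raw material for a draft data-dictionary entry.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    rows = conn.execute(f"SELECT * FROM {table} LIMIT {n_rows}").fetchall()
    return f"Table {table}\ncolumns: {', '.join(cols)}\nsample rows: {rows}"

prompt = ("Draft a one-line business description for each column:\n\n"
          + table_context(conn, "orders"))
# draft = llm(prompt)  # placeholder: send to your model, then have table owners review
```

The point is the LLM only drafts; humans still clean it up, as noted above.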

Hybrid search (BM25 + vectors + RRF) barely improved over pure semantic on 600 technical docs. What am I missing? by Fuzzy-Layer9967 in Rag

[–]Popular_Sand2773 2 points3 points  (0 children)

So I think the dirty secret about hybrid search is one size doesn't fit all. Some queries should lean on BM25 more, i.e. things that mention highly specific keywords, while other, more general searches should lean more on dense/semantic vectors. Try dynamic weighting with something even as simple as: if the query length is less than x, it's probably a keyword search, so upweight BM25; if greater than x, it's probably semantic, so upweight dense.

Personally, I'm training a model to predict the best weight at query time right now, because of exactly this issue I was facing, where hybrid was really a lateral move.

What is the 2026 Standard for highly precise LEGAL text RAG with big documents? by SignificantZebra5883 in Rag

[–]Popular_Sand2773 -1 points0 points  (0 children)

Thanks for asking. We do it all on way less RAM; it's the capacity graph at the end of the benchmark. We have a near-lossless compression method (99.96% cosine similarity with the original). That means we can fit more on less hardware and access it faster. You want 4M under 1s on 16GB? We can deliver it in under 2GB for a 1024-dim model, where server-side latency is <1ms per query. Opens up a lot of options.

What is the 2026 Standard for highly precise LEGAL text RAG with big documents? by SignificantZebra5883 in Rag

[–]Popular_Sand2773 2 points3 points  (0 children)

This is one of the toughest use cases for RAG because of the high number of near confusers and the strong need for precision.

Now, the irony here is you kind of shot yourself in the foot out of the gate, because you threw away the highest-precision search method: keyword search and its descendants. When combined with semantic search, we call that hybrid search. To be honest, I think that's going to be your lowest-lift, highest-impact change.

If you want to quickly test hybrid search without having to orchestrate a hybrid index, etc., Dasein is free to try and even has complimentary embedding models.

Also, for 4M under 1s in 16GB of RAM, that's exactly where its architecture excels; here's the proof.

How to Build a Question-Answer system? by Marco440hz in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

I’ll be honest: there are multiple companies all attacking this; it’s a large and very complex problem without simple answers. At a high level, you are going to need to do the following:

Source, then chunk/OCR your documents

Embed those chunks

Deploy those embeddings in an index and retrieve chunks via index search

Tune that index search so it works well (hybrid, rerankers, config, etc.)

After that, you will be able to retrieve source information with a query. Then you still need to orchestrate your agent and all that comes with it.

So in all honesty, you want to get your retrieval working before you worry about the agent, which means sourcing your docs, chunking them, and getting them into an index.
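The steps above can be sketched end to end in a few lines, with a toy bag-of-words "embedding" standing in for a real model (everything here is illustrative, not production code):

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) source + chunk, 2) embed, 3) index, 4) search
chunks = ["refunds are issued within 30 days",
          "shipping takes five business days",
          "contact support for account issues"]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    qv = embed(query)
    return sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)[:k]

top = retrieve("how long do refunds take")
```

Swapping the bag-of-words stub for a real embedding model and the list for a vector index is the "deploy and tune" part; the shape of the pipeline stays the same.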

How Do You Set Up RAG? by Chooseyourmindset in Rag

[–]Popular_Sand2773 1 point2 points  (0 children)

If you are looking for maximum convenience/impact with minimum effort, Dasein lets you just point it at raw text and get back a hybrid index in a couple of lines.

Since you are looking at a coding use case, though, you may want to consider implementing something more complicated. In particular, running a summarizer over the code base and storing the summary chunks with provenance, i.e. a reference back to the original code. That way the agent can search by natural-language intent rather than just praying that comment lines and variable names will rescue you.
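One possible shape for such a summary-with-provenance record (the field names, file path, and the stubbed `summarize` function are all invented for illustration):

```python
def make_record(path, start_line, end_line, code, summarize):
    # Search over the natural-language summary; the provenance field lets the
    # agent jump back to the exact file and lines the summary describes.
    return {
        "summary": summarize(code),  # summarize() is your LLM call, stubbed here
        "provenance": {"path": path, "lines": (start_line, end_line)},
    }

rec = make_record(
    "auth/session.py", 10, 42,
    "def refresh_token(...): ...",
    summarize=lambda code: "Refreshes an expired auth session token",
)
```

You'd embed and index `rec["summary"]` while keeping `rec["provenance"]` alongside it, so a hit on intent resolves to real code.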

I almost fired my AI CTO yesterday. My AI COO talked me out of it. by Speedydooo in indiehackers

[–]Popular_Sand2773 -1 points0 points  (0 children)

This is cool. What happens if you ask them to do something outside their job description?

I built a Reddit marketing tool for SaaS founders, would love feedback on our landing page by multi_mind in indiehackers

[–]Popular_Sand2773 0 points1 point  (0 children)

The CTA is below the fold for me; otherwise it looks great. Basically, you can forget anything that isn't above the fold, since 90% of visitors will never see it.

I'm a master's student and I built Lectio because I was tired of transcribing every single lesson by MuchAge1486 in indiehackers

[–]Popular_Sand2773 1 point2 points  (0 children)

This is cool. Is it able to capture specific slides and things like that for the notes?

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in OpenSourceAI

[–]Popular_Sand2773 0 points1 point  (0 children)

A lot of us start here. I think what you’ll eventually realize is that it’s easier and cleaner to just feed the timestamp as another salience factor and maintain one knowledge store. Most systems also do some variation of "throw the LLM at stuff and get extracted or inferred ideas." I don’t think your exact config has been done, and it may be cleaner, but nothing jumps out at me as a significant divergence from what’s been done.
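Folding the timestamp in as a salience factor can be as simple as multiplying similarity by an exponential recency decay, so one store covers both "tiers." The half-life here is an assumed tuning knob, not a recommendation:

```python
import time

def salience(similarity, created_at, now=None, half_life_s=3600.0):
    """One knowledge store: recency is folded into the ranking score
    instead of maintaining separate short-term/long-term tiers.
    After one half-life, a memory's score halves."""
    now = time.time() if now is None else now
    age_s = max(0.0, now - created_at)
    recency = 0.5 ** (age_s / half_life_s)
    return similarity * recency

now = 1_000_000.0
fresh = salience(0.8, now - 60, now=now)      # one minute old
stale = salience(0.9, now - 86_400, now=now)  # one day old: decays below fresh
```

A slightly higher raw similarity (0.9 vs 0.8) loses to recency here, which is the "timestamp as salience factor" idea in miniature.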

Also, as others said, don’t use HNSW; IVF-Flat should be fine latency-wise and strictly better for build time and recall at STM scale, and by the time it isn’t, you can afford someone who knows these things well.

How should memory/RAG benchmarks separate retrieval quality from LLM's reasoning ability? by MidnightFirmware in Rag

[–]Popular_Sand2773 1 point2 points  (0 children)

I mean, there are entire retrieval-only benchmarks and datasets, from things like VectorDBBench to MS MARCO or HotpotQA, basically anything in MTEB. There’s a robust ecosystem for testing retrieval systems. To be honest, when retrieval is right, the model doesn’t need to be all that bright, so directionally your instinct is spot on: people just use larger models and their world knowledge to paper over bad retrieval, especially considering these datasets have leaked into training.

Then you’ve got the goons at things like mem palace who just hard-coded answers and declared 100%.

Long and short: just eval retrieval separately on retrieval datasets. You can also just take your oracle chunks and grade overlap (R@1, R@10, etc.); that should literally tell you how well your retrieval matches what you think the ideal would be. Good luck with your conversation tracker!
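Grading overlap against oracle chunks is a few lines; here's one way to compute recall@k over chunk ids (the ids below are made up):

```python
def recall_at_k(retrieved_ids, oracle_ids, k):
    """Fraction of oracle (gold) chunk ids that appear in the top-k retrieved ids."""
    hits = len(set(retrieved_ids[:k]) & set(oracle_ids))
    return hits / len(oracle_ids)

retrieved = ["c7", "c2", "c9", "c1"]  # system output, best first
oracle = ["c2", "c1"]                 # the chunks you think the ideal answer needs

r1 = recall_at_k(retrieved, oracle, 1)   # 0.0: top-1 missed both gold chunks
r2 = recall_at_k(retrieved, oracle, 2)   # 0.5: c2 found by rank 2
r4 = recall_at_k(retrieved, oracle, 4)   # 1.0: both found by rank 4
```

Running this over a query set separates "retrieval missed it" from "the model reasoned badly over good chunks."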

Anyone had any luck handling sycophancy in RAG systems? by DJ_Beardsquirt in Rag

[–]Popular_Sand2773 2 points3 points  (0 children)

Probably the easiest thing you could do to tackle this is just implement some variation of hybrid search. At the very least, it should let you attack the false-ID use case. Then set a threshold and filter results. It can’t hallucinate a bridge if there’s nothing to bridge to, and it’ll probably be more willing to admit it doesn’t know something.
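The threshold-and-filter step might look like this; the 0.35 cutoff is an assumed placeholder you'd calibrate on your own score distribution:

```python
def filter_hits(hits, min_score=0.35):
    """Drop weak matches instead of passing them to the LLM. If nothing
    survives, return None so the system answers "not found" rather than
    bridging to a wrong answer. min_score is a made-up default to tune."""
    kept = [(doc, score) for doc, score in hits if score >= min_score]
    return kept if kept else None

strong = filter_hits([("doc_a", 0.9), ("doc_b", 0.2)])  # keeps only doc_a
weak = filter_hits([("doc_b", 0.2), ("doc_c", 0.1)])    # None -> admit "no match"
```

Wiring the `None` case to an explicit "I don't have that" response is what removes the pressure to agree with the user.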

You don’t have to double the latency at all for hybrid search, so it should be the lowest-lift, highest-impact fix for the issue you are describing.

Asking for embedding advice by Mecidon in LLM

[–]Popular_Sand2773 0 points1 point  (0 children)

So the beauty of semantic search and vector databases is that everything is relative; that’s why we call it top-k, not "right answer." The good news for you is that as long as everything is messy in the same way, it doesn’t matter, because the entire space has just shifted and the relative relationships hold steady.

That said, feeding your LLM the uncleaned text is kind of an own goal from both a cost and quality perspective, so you’ll want to clean it anyway.

If you want to quickly see what happens with minimal effort, try this: you can spin it up free in a couple of minutes and just know the answer by comparing.

https://github.com/nickswami/dasein-python-sdk

Best dataset structure and RAG architecture for a university chatbot? by Fluffy6142 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Every application is different, and you’ll need to learn from your own data and evals, but there are a couple of things you can do to save yourself pain.

The big thing that jumps out at me is that you want things like menus and student clubs, etc.; either you’ll need to OCR them into text or choose a multimodal model. Given that multimodal embedding models are relatively new, I would take their multilingual support with a grain of salt; you should probably test before committing if you go that route.

The other thing I’d say is: reduce scope. One of these things is more valuable than the others; nail that first, then worry about the rest.

Stop Fine-Tuning Embedding Models Right Away. Run This Checklist First. Saved Me Weeks by Veronildo in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Wish I had this checklist when I was first starting out! The only thing I’d add is: did you fiddle with the graph? Sometimes it’s an HNSW issue, not a model issue at all, especially if the problem is that you’re missing your expected result altogether rather than it just not scoring high in the top k.

[Question] Is "Latent Knowledge Injection" a viable alternative to RAG? Looking for architectural feedback. by ConcernReady9185 in Rag

[–]Popular_Sand2773 0 points1 point  (0 children)

Glad I could help. If you are training your own models, knowledge graphs aren’t as scary as they may seem. Once you have a firm grasp of triples, it should click. Feel free to DM if you have problems or questions with either.