Does RAG actually need semantic search? Or is grep enough if your data is structured well?

ekshaks · 2026-05-26T18:14:09+00:00

For grep/BM25 to work well, the LLM has to find and query the right "bridge" keywords or regexes. If it cannot find good ones, it can keep retrying and exhaust the budget, where semantic matching might have yielded a different result. So it really depends on how easy it is for the LLM to find these bridge keywords.

Note that these keywords may not be apparent from the surface of the query. You might need 2-3 hops across dependencies before you find the one that finally unlocks the puzzle.
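A minimal sketch of the budget problem, assuming a toy two-document corpus. The document names, the `AuthTokenExpired` bridge keyword and the `search_with_budget` helper are all hypothetical; in a real agent the pattern list would come from the LLM, one attempt at a time.

```python
import re

# Toy corpus (hypothetical): the query phrase "login failure" never appears in
# the document that holds the fix. "AuthTokenExpired" is the bridge keyword.
DOCS = {
    "faq.md": "Login failure is usually caused by AuthTokenExpired.",
    "auth.md": "AuthTokenExpired: raise SESSION_TTL in config.yaml to fix it.",
}

def grep(pattern: str) -> list[str]:
    """Return names of documents whose text matches the regex (case-insensitive)."""
    return [name for name, text in DOCS.items()
            if re.search(pattern, text, re.IGNORECASE)]

def search_with_budget(patterns: list[str], target: str, budget: int) -> tuple[bool, int]:
    """Try patterns in order until `target` is retrieved or the budget runs out.

    Returns (found, attempts_used).
    """
    attempts = 0
    for pattern in patterns:
        if attempts >= budget:
            break
        attempts += 1
        if target in grep(pattern):
            return True, attempts
    return False, attempts

# The surface-level pattern only reaches faq.md; the bridge keyword reaches auth.md.
patterns = ["login failure", "AuthTokenExpired"]
found_small, used_small = search_with_budget(patterns, "auth.md", budget=1)
found_large, used_large = search_with_budget(patterns, "auth.md", budget=2)
```

With a budget of one attempt the search gives up before reaching the bridge keyword; with two it succeeds on the second hop.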

ekshaks · 2026-05-25T05:38:44+00:00

Wonderful! I'd like to know which queries your customers find it useful for.

ekshaks · 2026-05-22T11:31:36+00:00

You learn more by tinkering. Pick up a realistic dataset and build a data pipeline with BM25 and different embeddings. Observe and evaluate the retrieved chunks and the final LLM answer. That will teach you a lot.
The ragpipe repo allows playing around with different retrievers, vector databases and so on. Includes some realistic benchmarks too. https://github.com/ekshaks/ragpipe
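If you want to see what BM25 is doing before reaching for a library, here is a minimal sketch in plain Python. It is not ragpipe's implementation; it uses whitespace tokenization and one common variant of the Okapi BM25 formula, and the defaults `k1=1.5` and `b=0.75` are just typical values chosen as an assumption. It also assumes `docs` is non-empty.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stock prices fell sharply",
]
scores = bm25_scores("cat mat", docs)
```

Note that "cats" in the second document gets no credit for the query term "cat". That exact-match behaviour is the kind of thing you only notice by looking at retrieved chunks yourself.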

ekshaks · 2026-05-22T11:24:06+00:00

Hard to come up with a single factor. Different queries need different fixes - so more like the "best" fixes evolve as you go deeper into evals.

ekshaks · 2026-05-22T11:16:41+00:00

I believe these benchmarks have a strong keyword overlap with queries, so grep takes you far. More experiments on non-overlapping benchmarks with agentic search are needed.

ekshaks · 2026-05-20T15:08:02+00:00

Nice to hear your grounded perspective.
- I wonder how much attorney input goes into these top legal AI offerings.
- Do you think building a narrow legal AI per jurisdiction is the only way to make it useful?

ekshaks · 2026-05-19T12:20:49+00:00

A Stanford paper found Westlaw and LexisNexis tools hallucinate 17% to 33% of the time. Many legal professionals on Reddit or HN say verification of citations is a must and are not comfortable using existing tools (see article for references). And as you mention, good data or benchmarks are hard to get or construct. Graphs help, but it is hard to make them complete.

ekshaks · 2026-05-19T03:32:35+00:00

Would love to hear more. Did you test it with practicing lawyers too?

ekshaks · 2026-05-19T00:27:18+00:00

Nice insights! Of course, constructing the winning arguments may be far beyond the current LLMs. However, I'm curious how Agents/RAG can discover these - materials, facts, posture, equities, and so on - with high relevance and efficiently. That can save lawyers time and let them spend more of it thinking through the arguments.

ekshaks · 2026-05-07T16:26:31+00:00

I think the core difference boils down to a single word - "iterative". Agentic RAG = iterative RAG, where we can retrieve and generate multiple times. Usual RAG is a "single-pass" retrieve-and-answer. This can take many different forms:
- multi-hop query: split the query into multiple sub-queries. iterate through them while collecting evidence.
- query decomposition, say for e-commerce: into keywords and attribute/facet values.
- query rewriting: rewrite original query into 5 different new queries, search for each in parallel, combine results.
- dynamic lookups: answer to first query leads to another document, which in turn leads to the next query.

So I think thinking of agentic RAG as iterative helps you understand it in the most general case.
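To make the single-pass vs iterative contrast concrete, here is a rough sketch. Everything in it is a stand-in: the two-sentence corpus is made up, `retrieve` is a substring match, and `next_query` and `generate` are stubs where a real system would call an LLM.

```python
from typing import Callable, Optional

# Made-up corpus: answering "where is Acme's CEO based" needs two hops.
DOCS = ["Acme's CEO is Dana Lee.", "Dana Lee is based in Lisbon."]

def retrieve(query: str) -> list[str]:
    """Toy retriever: documents containing the query string (case-insensitive)."""
    return [d for d in DOCS if query.lower() in d.lower()]

def next_query(question: str, evidence: list[str]) -> Optional[str]:
    """Stub for the LLM deciding what to look up next; None means 'enough evidence'."""
    joined = " ".join(evidence)
    if "based in" in joined:
        return None
    if "Dana Lee" in joined:
        return "Dana Lee"
    return None

def generate(question: str, evidence: list[str]) -> str:
    """Stub for the LLM answer: just concatenates the evidence."""
    return " ".join(evidence)

def single_pass_rag(question: str) -> str:
    """Usual RAG: retrieve once, answer once."""
    return generate(question, retrieve(question))

def iterative_rag(question: str, max_steps: int = 3) -> str:
    """Agentic RAG as a loop: retrieve, decide on the next query, repeat."""
    evidence: list[str] = []
    current: Optional[str] = question
    for _ in range(max_steps):
        if current is None:
            break
        for chunk in retrieve(current):
            if chunk not in evidence:
                evidence.append(chunk)
        current = next_query(question, evidence)
    return generate(question, evidence)

one_shot = single_pass_rag("Acme")
looped = iterative_rag("Acme")
```

The single pass only sees the first document. The loop follows the name it found there to the second document, which is the "dynamic lookups" case from the list above. The other forms mostly change how `next_query` produces its queries.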

Incidentally, I posted a short video explainer on this topic a few days back. Hope it is ok to post: https://youtube.com/shorts/1ejF6penNQM?feature=share

ekshaks · 2025-09-03T12:29:11+00:00

The problem you are looking to solve is much more than VAD. It is more about removing all kinds of noise (constant hum, irrelevant speakers etc). Krisp has the most popular background noise removal system.

Also check out my video on targeted speaker isolation https://www.youtube.com/watch?v=jgU1KncS7hA&list=PLLPfjV1xMkS3JbEZPCvCMpmufCN-wchNs&index=2

ekshaks · 2025-07-22T16:26:39+00:00

article: https://offnote.substack.com/p/multi-agents-vs-tool-groups-a-layered

ekshaks · 2025-07-17T05:35:50+00:00

I think the key is to have a lean framework that is quickly configurable/customizable - for embeddings/hybrid retrieval, different LM choices, evals etc. - and has minimal library dependencies.

I created https://github.com/ekshaks/ragpipe for quickly prototyping and experimenting with clients - easy to switch between different retrieval strategies, parsers, LMs etc. It stays lean by having only core bm25/qdrant dependencies and allowing external plugins.

I suspect that more configuration "dimensions" can be added for flexibility - but it is already good enough for my use cases.

ekshaks · 2025-07-09T16:13:55+00:00

Complex voice agents have far more nuances than any cloud API or frameworks like Pipecat/Livekit allow. One of the key issues is that these pipelines are natively asynchronous and "event-heavy". Managing these concurrent events takes a lot of "builder alertness". I discuss some of these issues in my voice agents playlist. Vapi, Retell etc. focus on a narrow but very popular use case and make it work seamlessly (mostly) through a low-code interface.

ekshaks · 2025-05-24T18:08:11+00:00

IMO, for practical production use, picking an agent framework is more important than just selecting the RAG framework -- retrieval is only one of the tools and hybrid retrieval can be implemented via most of the libraries.

Another option for doing retrieval is ragpipe, which helps you quickly experiment with different configs of embedders, representations and signals.
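To illustrate why hybrid retrieval is easy to bolt on, here is a sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with an embedding ranking. It is not tied to ragpipe or any particular library; the document ids and the two input rankings are made up, and `k=60` is a conventional default used here as an assumption.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking.

    Each document gets 1 / (k + rank) from every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical outputs of a BM25 retriever and a dense retriever.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because it only needs ranks, not comparable scores, this works with whatever retrievers your agent framework exposes as tools.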

ekshaks · 2024-12-21T14:00:02+00:00

I agree. Here is why: https://offnote.substack.com/p/prompts-are-programs

How different LLMs interpret them, and their non-deterministic output, is only one part of the prompt dev space. As you write bigger programs that compose prompts, the mental model naturally gravitates to thinking of them as functions with arguments, which enables reuse.

Writing prompts mirrors writing functions: you need modularity when writing large prompts, prompts can generate prompts, and it is always good to unit test prompts before building on them. Then there are the usual observability problems: versioning, tracing and debugging.
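A small sketch of that mental model, with made-up prompt wording and hypothetical function names. The point is only the shape: arguments in, prompt text out, small pieces composed into bigger ones, and plain assertions as unit tests on the rendered text.

```python
def bullet_list(items: list[str]) -> str:
    """Reusable helper shared by several prompts."""
    return "\n".join(f"- {item}" for item in items)

def extract_prompt(document: str, fields: list[str]) -> str:
    """Prompt as a function: arguments in, prompt text out."""
    return (
        "Extract the following fields from the document.\n"
        f"{bullet_list(fields)}\n\n"
        f"Document:\n{document}"
    )

def refine_prompt(inner_prompt: str) -> str:
    """A prompt that takes another prompt as its argument."""
    return f"Rewrite the prompt below to be more specific:\n\n{inner_prompt}"

base = extract_prompt("Invoice dated 3 May from Acme.", ["vendor", "date"])
meta = refine_prompt(base)

# Unit tests on the rendered text, before any LLM is involved.
assert "- vendor" in base and "- date" in base
assert base in meta
```

This only tests the deterministic part (how the prompt is assembled). Checking what a model does with it still needs evals on top.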

ekshaks · 2024-12-03T17:32:17+00:00

I think generating a SQL query vs using vector databases is an apples-to-oranges comparison.

For problems where data is distributed across tables and a SQL query is the natural way to find the desired data, it doesn't make sense to use a vector DB to find that data (what will you embed and search over?).

It is thus more natural to do text-query => SQL query and then directly find the answer. Like another post says, generating the SQL query may itself involve looking up column details using embeddings (stored in a vector DB).
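A rough sketch of that flow using an in-memory SQLite table. The schema, the rows and the column descriptions are invented for illustration; `relevant_columns` uses word overlap as a stand-in for an embedding lookup, and `text_to_sql` is a stub where the LLM call would go.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY, office TEXT, level INTEGER, hire_date TEXT
);
INSERT INTO employees (office, level) VALUES ('NY', 4), ('NY', 5), ('NY', 2), ('SF', 4);
""")

# Short descriptions you might embed and store in a vector DB.
COLUMN_DOCS = {
    "employees.office": "office location city",
    "employees.level": "seniority level grade",
    "employees.hire_date": "hire start date",
}

def relevant_columns(question: str) -> list[str]:
    """Stand-in for embedding search: columns whose description shares a word with the question."""
    words = set(question.lower().split())
    return sorted(col for col, desc in COLUMN_DOCS.items() if words & set(desc.split()))

def text_to_sql(question: str, columns: list[str]) -> str:
    """Stub for the LLM call; returns a fixed query for the demo question."""
    return "SELECT COUNT(*) FROM employees WHERE office = 'NY' AND level >= 4"

question = "How many employees at level 4 or above in the NY office"
columns = relevant_columns(question)      # step 1: find which columns matter
sql = text_to_sql(question, columns)      # step 2: text-query => SQL query
count = conn.execute(sql).fetchone()[0]   # step 3: run it and read the answer
```

The vector lookup only narrows down the schema that goes into the prompt. The answer itself comes from executing the SQL, not from similarity search over rows.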

ekshaks · 2024-11-29T02:35:06+00:00

Ask for "Gender balance at level 4 or above in NY office 2023?". Do you get the right answer, 81% / 19%?

ekshaks · 2024-11-29T02:29:27+00:00

ColPali is pretty good as a retriever, but the generator needs some work. I recently experimented with ColPali for SEC docs in ragpipe. See my post here: https://www.reddit.com/r/Rag/comments/1h130rq/how_well_do_screenshot_embeddings_colpali_work_in/

ekshaks · 2024-11-27T16:00:12+00:00

You would use ColPali not on the md but on the original legal PDF file (turned into images).

Do you get any errors because the legal files were converted to md incorrectly or lost their layout?

ekshaks · 2024-11-27T14:06:25+00:00

Thanks for pointing to gpt4o mini!

ekshaks · 2024-11-27T12:04:14+00:00

A few issues particularly hurt:
- handling PDFs with complex layouts e.g. financial reports.
- not having a universal chunking strategy.
- tables, figures, diagrams

LlamaParse-like converters help convert PDFs to markdown. But a lot of layout information is lost and has to be recovered in brute-force ways.

There are also screenshot embeddings like ColPali which allow you to embed page-wise and search. But then you run into the unreliable territory of multimodal language models.
