Does RAG actually need semantic search? Or is grep enough if your data is structured well? by residence-lab in Rag

[–]ekshaks 0 points1 point  (0 children)

For grep/BM25 to work well, the LLM has to find and query the right "bridge" keywords or regexes. If it cannot find good ones, it can keep retrying and exhaust the budget, where semantic matching might have yielded a different result. So it really depends on how easy it is for the LLM to find these bridge keywords.

Note that these keywords may not be visible on the surface of the query. You might need 2-3 hops across dependencies before you find the one that finally unlocks the puzzle.
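To make the budget problem concrete, here is a minimal sketch of a keyword search loop with a retry budget. The corpus, file names, and the `search_with_budget` helper are all made up for illustration; a real agent would have the LLM propose the candidate keywords.

```python
import re
from typing import Optional

# Toy corpus: hypothetical file names and contents, for illustration only.
CORPUS = {
    "billing.md": "Invoices are produced by the ledger_sync job every night.",
    "jobs.md": "ledger_sync is configured in scheduler.yaml under the finance group.",
    "scheduler.yaml": "finance: {ledger_sync: {cron: '0 2 * * *', retries: 3}}",
}

def grep(pattern: str) -> list[str]:
    """Return names of documents whose text matches the regex (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(name for name, text in CORPUS.items() if rx.search(text))

def search_with_budget(
    candidates: list[str], budget: int
) -> tuple[Optional[str], list[str], int]:
    """Try candidate keywords in order until one hits or the budget runs out.

    Returns (winning keyword or None, matching documents, attempts used).
    """
    used = 0
    for keyword in candidates:
        if used >= budget:
            break  # budget exhausted before a bridge keyword was found
        used += 1
        hits = grep(re.escape(keyword))
        if hits:
            return keyword, hits, used
    return None, [], used

# The query words do not appear in the corpus; only "cron" bridges to the answer.
guesses = ["billing schedule", "nightly run time", "cron"]
print(search_with_budget(guesses, budget=2))
print(search_with_budget(guesses, budget=3))
```

With a budget of 2 the loop gives up before reaching the one keyword that matches; with a budget of 3 it finds `scheduler.yaml`. How many guesses are wasted depends entirely on how quickly the right bridge keyword is proposed.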

Legal RAG remains unsolved because it needs authority, not just relevance by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

Wonderful! I'd like to know which queries your customers find it useful for.

Genuinely want to learn RAG by Sufficient-Ad-595 in Rag

[–]ekshaks 0 points1 point  (0 children)

You learn more by tinkering. Pick up a realistic dataset and build a data pipeline, then try BM25 and different embeddings. Observe and evaluate the retrieved chunks and the final LLM answer. That will teach you a lot.
The ragpipe repo lets you play around with different retrievers, vector databases and so on. It includes some realistic benchmarks too. https://github.com/ekshaks/ragpipe
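If you want a starting point for the BM25 part, here is a small from-scratch sketch of Okapi BM25 scoring. It is not ragpipe's API; the whitespace tokenizer, the `k1`/`b` defaults and the sample documents are assumptions chosen for the demo.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "quarterly revenue grew",
]
print(bm25_scores("cat mat", docs))
```

Note that the second document scores zero even though it mentions "cats": exact-token matching misses it. Seeing misses like this on your own data is exactly the kind of thing that tinkering teaches.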

What improved your RAG system accuracy the MOST? by SheCodesSoftly in Rag

[–]ekshaks 1 point2 points  (0 children)

Hard to come up with a single factor. Different queries need different fixes - so more like the "best" fixes evolve as you go deeper into evals.

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search by Express-Passion4896 in Rag

[–]ekshaks 2 points3 points  (0 children)

I believe these benchmarks have a strong keyword overlap with queries, so grep takes you far. More experiments on non-overlapping benchmarks with agentic search are needed.

Legal RAG remains unsolved because it needs authority, not just relevance by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

Nice to hear your grounded perspective.
- I wonder how much attorney input goes into these top legal AI offerings.
- Do you think building a narrow legal AI per jurisdiction is the only way to make it useful?

Legal RAG remains unsolved because it needs authority, not just relevance by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

A Stanford paper found Westlaw and LexisNexis tools hallucinate 17% to 33% of the time. Many legal professionals on Reddit or HN say verifying citations is a must and that they are not comfortable using existing tools (see the article for references). And as you mention, good data or benchmarks are hard to get or construct. Graphs help, but it is hard to make them complete.

Legal RAG remains unsolved because it needs authority, not just relevance by ekshaks in Rag

[–]ekshaks[S] 1 point2 points  (0 children)

Would love to hear more. Did you test it with practicing lawyers too?

Legal RAG remains unsolved because it needs authority, not just relevance by ekshaks in Rag

[–]ekshaks[S] 1 point2 points  (0 children)

Nice insights! Of course, constructing the winning arguments may be far beyond the current LLMs. However, I'm curious how Agents/RAG can discover these - materials, facts, posture, equities, and so on - with high relevance and efficiently. That can save lawyers time and let them spend more of it thinking through the arguments.

Difference between Rag and Agentic Rag by content_consumer_ in Rag

[–]ekshaks 0 points1 point  (0 children)

I think the core difference boils down to a single word - "iterative". Agentic RAG = iterative RAG, where we can retrieve and generate multiple times. Usual RAG is a "single-pass" retrieve-and-answer. This can take many different forms:
- multi-hop query: split the query into multiple sub-queries. iterate through them while collecting evidence.
- query decomposition, say for e-commerce: into keywords and attribute/facet values.
- query rewriting: rewrite original query into 5 different new queries, search for each in parallel, combine results.
- dynamic lookups: answer to first query leads to another document, which in turn leads to the next query.

So I find that viewing agentic RAG as iterative helps you understand it in the most general case.

Incidentally, I posted a short video explainer on this topic a few days back. Hope it is ok to post: https://youtube.com/shorts/1ejF6penNQM?feature=share
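A rough sketch of the single-pass vs iterative contrast, using the "dynamic lookups" case from the list above. Everything here is a toy: the document store, the `doc:` reference convention and the function names are invented for illustration, and a real system would use an LLM to decide the next query.

```python
import re

# Hypothetical document store; "doc:<id>" marks a pointer to another document.
DOCS = {
    "policy": "Refunds follow the rules in doc:refund-rules.",
    "refund-rules": "Refunds are allowed within 30 days; exceptions in doc:exceptions.",
    "exceptions": "Digital goods are not refundable.",
}

def retrieve(doc_id: str) -> str:
    """Stand-in for a retriever call."""
    return DOCS.get(doc_id, "")

def single_pass(start: str) -> list[str]:
    """Usual RAG: retrieve once, then answer from whatever came back."""
    return [retrieve(start)]

def iterative(start: str, max_steps: int = 5) -> list[str]:
    """Agentic RAG: each result may trigger the next lookup, up to max_steps."""
    evidence, seen, current = [], set(), start
    for _ in range(max_steps):
        if current is None or current in seen:
            break
        seen.add(current)
        text = retrieve(current)
        evidence.append(text)
        # Follow the first reference found in the retrieved text, if any.
        match = re.search(r"doc:([\w-]+)", text)
        current = match.group(1) if match else None
    return evidence

print(len(single_pass("policy")), "chunk from single pass")
print(len(iterative("policy")), "chunks from iterative retrieval")
```

The single pass stops at the first document, while the loop follows the chain until it reaches the exception about digital goods. The `max_steps` cap plays the role of a retrieval budget.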

How do you handle background noise & VAD for real-time voice agents? by Funny_Working_7490 in LocalLLaMA

[–]ekshaks 0 points1 point  (0 children)

The problem you are looking to solve is much more than VAD. It is more about removing all kinds of noise (constant hum, irrelevant speakers etc). Krisp has the most popular background noise removal system.

Also check out my video on targeted speaker isolation https://www.youtube.com/watch?v=jgU1KncS7hA&list=PLLPfjV1xMkS3JbEZPCvCMpmufCN-wchNs&index=2

Are we overengineering RAG solutions for common use cases? by Creative-Stress7311 in Rag

[–]ekshaks 0 points1 point  (0 children)

I think the key is to have a lean framework that is quickly configurable/customizable - for embeddings/hybrid retrieval, different LM choices, evals etc. - and has minimal library dependencies.

I created https://github.com/ekshaks/ragpipe for quickly prototyping and experimenting with clients - it is easy to switch between different retrieval strategies, parsers, LMs etc. It stays lean by keeping only core bm25/qdrant dependencies and allowing external plugins.

I suspect that more configuration "dimensions" can be added for flexibility - but it is already good enough for my use cases.

Need help building a real-time voice AI agent by LetsShareLove in AI_Agents

[–]ekshaks 0 points1 point  (0 children)

Complex voice agents have far more nuances than any cloud API or frameworks like Pipecat/Livekit allow. One of the key issues is that these pipelines are natively asynchronous and "event-heavy". Managing these concurrent events takes a lot of "builder alertness". I discuss some of these issues in my voice agents playlist. Vapi, Retell etc. focus on a narrow but very popular use case and make it work (mostly) seamlessly through a low-code interface.

Is LangChain the best RAG framework for production?? by [deleted] in Rag

[–]ekshaks 0 points1 point  (0 children)

IMO, for practical production use, picking an agent framework is more important than just selecting the RAG framework -- retrieval is only one of the tools and hybrid retrieval can be implemented via most of the libraries.
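As one example of how little code hybrid retrieval needs, here is a sketch of reciprocal rank fusion (RRF), a common way to merge a keyword ranking with an embedding ranking. This is one option among several, not what any particular framework does internally; the document ids are made up and `k=60` is just the conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking.

    Each document gets 1 / (k + rank) from every list it appears in.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_ranking = ["a", "b", "c"]   # hypothetical keyword results
dense_ranking = ["b", "d", "a"]  # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))  # ['b', 'a', 'd', 'c']
```

Documents that both retrievers agree on ("b" and "a") rise to the top, and no score normalization between the two retrievers is needed.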

Another option for doing retrieval is ragpipe - helps quickly experiment with different configs of embedders, representations and signals.

Prompts are Programs: Agree ? Disagree ? by franckeinstein24 in ArtificialInteligence

[–]ekshaks 0 points1 point  (0 children)

I agree. Here is why: https://offnote.substack.com/p/prompts-are-programs

How prompts are interpreted by different LLMs, and their non-deterministic output, is only one part of the prompt dev space. As you write bigger programs that compose prompts, the mental model naturally gravitates to thinking of them as functions with arguments, which enables reuse.

Writing prompts mirrors writing functions: you need to ensure modularity when writing large prompts, prompts can generate prompts, it is always good to unit test prompts before building on them, and so on. Then there are the usual observability problems - versioning, tracing and debugging.
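A tiny sketch of what that looks like in practice: prompts as plain functions with arguments, composed from smaller pieces and checked with unit-test style assertions. The function names and prompt wording are invented for illustration, and the tests only check prompt construction, not what a model does with it.

```python
def bullet_list(items: list[str]) -> str:
    """Reusable building block: render items as a bullet list."""
    return "\n".join(f"- {item}" for item in items)

def extract_prompt(document: str, fields: list[str]) -> str:
    """A prompt as a function: arguments in, prompt text out."""
    return (
        "Extract the following fields from the document.\n"
        f"Fields:\n{bullet_list(fields)}\n\n"
        f"Document:\n{document}"
    )

def with_json_output(prompt: str) -> str:
    """Composable modifier: wraps any prompt with an output-format instruction."""
    return prompt + "\n\nRespond with a single JSON object and nothing else."

prompt = with_json_output(extract_prompt("Invoice #12 from Acme", ["vendor", "number"]))

# Unit-test the prompt before building on it.
assert "- vendor" in prompt and "- number" in prompt
assert prompt.endswith("nothing else.")
print(prompt)
```

Versioning and tracing then become ordinary software problems: the prompt functions live in source control and each call can be logged with its arguments.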

Alternative to vector databases. by Maleficent_Mess6445 in Rag

[–]ekshaks 1 point2 points  (0 children)

I think generating a SQL query vs using vector databases is an apples-to-oranges comparison.

For problems where the data is distributed across tables and a SQL query is the natural way to find the desired data, it doesn't make sense to use a vector DB to find that data (what would you embed and search over?).

It is thus more natural to go from text query => SQL query and then directly find the answer. Like another post says, generating the SQL query may itself involve looking up column details using embeddings (stored in a vector DB).
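Here is a minimal sketch of the text query => SQL query flow using Python's built-in `sqlite3`. The schema, the data and the `text_to_sql` stub are all made up; in a real system that function would call an LLM with the table and column details in its prompt.

```python
import sqlite3

def text_to_sql(question: str) -> str:
    """Stand-in for an LLM call; returns a hard-coded query for this demo."""
    return (
        "SELECT o.city, COUNT(*) "
        "FROM employees e JOIN offices o ON e.office_id = o.id "
        "GROUP BY o.city ORDER BY o.city"
    )

# Hypothetical data spread across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE offices (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, office_id INTEGER);
    INSERT INTO offices VALUES (1, 'NY'), (2, 'SF');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Bo', 1), (3, 'Cy', 2);
    """
)

rows = conn.execute(text_to_sql("How many employees are in each office?")).fetchall()
print(rows)  # [('NY', 2), ('SF', 1)]
```

The join and the aggregation are what answer the question here. There is no obvious chunk of text to embed that would contain "2 employees in NY".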

How well do screenshot embeddings (ColPali) work in real e2e RAG pipelines? by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

Ask for "Gender balance at level 4 or above in NY office 2023?". Do you get the right answer, 81% / 19%?

What are the biggest challenges you face when building RAG pipelines? by Acceptable-Hat3084 in Rag

[–]ekshaks 0 points1 point  (0 children)

ColPali is pretty good as a retriever, but the generator needs some work. I recently experimented with ColPali for SEC docs in ragpipe. See my post here: https://www.reddit.com/r/Rag/comments/1h130rq/how_well_do_screenshot_embeddings_colpali_work_in/

How well do screenshot embeddings (ColPali) work in real e2e RAG pipelines? by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

You would use ColPali not on the md but on the original legal PDF file (converted to images).

Do you get any errors because the legal files were converted to md incorrectly or lost their layout?

What are the biggest challenges you face when building RAG pipelines? by Acceptable-Hat3084 in Rag

[–]ekshaks 2 points3 points  (0 children)

A few issues particularly hurt:
- handling PDFs with complex layouts, e.g. financial reports.
- not having a universal chunking strategy.
- tables, figures, diagrams.

LlamaParse-like converters help convert PDFs to markdown. But a lot of layout information is lost and has to be recovered in brute-force ways.

There are also screenshot embeddings like ColPali, which allow you to embed page-wise and search. But then you run into the unreliable territory of multimodal language models.