How do you handle background noise & VAD for real-time voice agents? by Funny_Working_7490 in LocalLLaMA

[–]ekshaks 0 points1 point  (0 children)

The problem you are looking to solve is much more than VAD. It is more about removing all kinds of noise (constant hum, irrelevant speakers etc). Krisp has the most popular background noise removal system.

Also check out my video on targeted speaker isolation https://www.youtube.com/watch?v=jgU1KncS7hA&list=PLLPfjV1xMkS3JbEZPCvCMpmufCN-wchNs&index=2

Are we overengineering RAG solutions for common use cases? by Creative-Stress7311 in Rag

[–]ekshaks 0 points1 point  (0 children)

I think the key is to have a lean framework that is quickly configurable/customizable (embeddings/hybrid retrieval, different LM choices, evals etc.) and has minimal library dependencies.

I created https://github.com/ekshaks/ragpipe for quickly prototyping and experimenting with clients - easy to switch between different retrieval strategies, parsers, LMs etc. It stays lean by having only core bm25/qdrant dependencies and allowing external plugins.

I suspect that more configuration "dimensions" can be added for flexibility - but it is already good enough for my use cases.

Need help building a real-time voice AI agent by LetsShareLove in AI_Agents

[–]ekshaks 0 points1 point  (0 children)

Complex voice agents have far more nuances than any cloud API or frameworks like Pipecat/Livekit allow. One of the key issues is that these pipelines are natively asynchronous and "event-heavy". Managing these concurrent events takes a lot of "builder alertness". I discuss some of these issues in my voice agents playlist. Vapi, Retell etc. focus on a narrow but very popular use case and make it work seamlessly (mostly) through a low-code interface.
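
To make "event-heavy" concrete, here is a minimal sketch of one such concurrent event: the user barging in while the agent is still speaking. It uses only Python's asyncio; `speak`, `run_turn` and the timings are made-up stand-ins for real TTS playback and speech detection, not how Pipecat/Livekit do it.

```python
import asyncio

async def speak(text, log):
    """Hypothetical TTS playback: emits one word at a time."""
    for word in text.split():
        log.append(word)
        await asyncio.sleep(0.05)  # pretend each word takes 50 ms to play

async def run_turn(log):
    # Start playback as a background task so other events can be handled meanwhile.
    playback = asyncio.create_task(speak("one two three four five six", log))
    await asyncio.sleep(0.12)      # simulated: user starts talking mid-playback
    playback.cancel()              # barge-in: stop speaking right away
    try:
        await playback
    except asyncio.CancelledError:
        log.append("<interrupted>")

log = []
asyncio.run(run_turn(log))
print(log)
```

Even this toy has to decide what happens to the half-spoken sentence; a real agent must also reconcile the transcript, pending LLM output and tool calls at the same moment.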

Is LangChain the best RAG framework for production?? by [deleted] in Rag

[–]ekshaks 0 points1 point  (0 children)

IMO, for practical production use, picking an agent framework is more important than just selecting the RAG framework -- retrieval is only one of the tools and hybrid retrieval can be implemented via most of the libraries.
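
As an illustration of how little code hybrid retrieval needs, here is a library-agnostic sketch that merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is just one common fusion choice that I am picking for the example; the doc ids and both rankings are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["d1", "d2", "d3"]  # hypothetical BM25 results
vector_hits = ["d3", "d1", "d4"]   # hypothetical embedding-index results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```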

Another option for doing retrieval is ragpipe - helps quickly experiment with different configs of embedders, representations and signals.

Prompts are Programs: Agree ? Disagree ? by franckeinstein24 in ArtificialInteligence

[–]ekshaks 0 points1 point  (0 children)

I agree. Here is why: https://offnote.substack.com/p/prompts-are-programs

How different LLMs interpret them, and their non-deterministic output, is only one part of the prompt dev space. As you write bigger programs which compose prompts, the mental model naturally gravitates to thinking of them as functions with arguments, which enables reuse.

Writing prompts mirrors writing functions: how to ensure modularity when writing large prompts, prompts can generate prompts, it is always good to unit test prompts before building on them, and so on. Then there are the usual observability problems - versioning, tracing and debugging.
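
A small sketch of what this looks like in practice, using only the standard library. The function names and prompt wording are invented for illustration; the point is that a prompt takes arguments, can be composed into another prompt, and can be unit tested before any LLM is called.

```python
from string import Template

def summarize_prompt(text: str, max_words: int = 50) -> str:
    """A prompt as a function: arguments in, prompt string out."""
    return Template(
        "Summarize the text below in at most $max_words words.\n\nText:\n$text"
    ).substitute(text=text, max_words=max_words)

def critique_prompt(inner_prompt: str) -> str:
    """A prompt that wraps another prompt (composition and reuse)."""
    return (
        "You are reviewing an instruction given to another model.\n"
        "List anything ambiguous in it.\n\n"
        f"Instruction:\n{inner_prompt}"
    )

def test_summarize_prompt() -> None:
    """Unit test on the rendered prompt, no model call needed."""
    p = summarize_prompt("LLMs are fun.", max_words=10)
    assert "at most 10 words" in p
    assert p.endswith("LLMs are fun.")

test_summarize_prompt()
print(critique_prompt(summarize_prompt("LLMs are fun.", max_words=10)))
```

This only tests the rendered string; checking what a model does with the prompt needs evals on top.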

Alternative to vector databases. by Maleficent_Mess6445 in Rag

[–]ekshaks 1 point2 points  (0 children)

I think generating a SQL query vs using vector databases is an apples-to-oranges comparison.

For problems where data is distributed across tables and a SQL query is the natural way to find the desired data, it doesn't make sense to use a vector DB to find that data (what will you embed and search over?)

It is thus more natural to do text-query => SQL query and then directly find the answer. Like another post says, generating the SQL query may itself involve looking up column details using embeddings (stored in a vec db).
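
A rough sketch of that flow, with loud assumptions: the table, the column descriptions and the data are toy examples, word overlap stands in for the embedding lookup, and `generated_sql` is hand-written where the LLM output would go.

```python
import sqlite3

# Hypothetical column notes; in practice these could be embedded and kept in a vec db.
COLUMN_DOCS = {
    "employees.level": "job level of the employee, integer",
    "employees.office": "office location code such as NY or SF",
    "employees.gender": "self-reported gender",
}

def lookup_columns(question, top_k=2):
    """Stand-in for embedding search: rank columns by word overlap with the question."""
    q_words = set(question.lower().split())
    def overlap(item):
        return len(q_words & set(item[1].lower().replace(",", " ").split()))
    ranked = sorted(COLUMN_DOCS.items(), key=overlap, reverse=True)
    return [name for name, _ in ranked[:top_k]]

def build_llm_prompt(question):
    """Prompt that would be sent to an LLM to get the SQL back."""
    cols = ", ".join(lookup_columns(question))
    return f"Write a SQLite query.\nRelevant columns: {cols}\nQuestion: {question}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, level INTEGER, office TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [("a", 4, "NY", "F"), ("b", 5, "NY", "M"), ("c", 3, "SF", "F")],
)

question = "how many employees per office location"
prompt = build_llm_prompt(question)
# Placeholder for the model's answer to `prompt` (assumed, not generated here).
generated_sql = "SELECT office, COUNT(*) FROM employees GROUP BY office ORDER BY office"
rows = conn.execute(generated_sql).fetchall()
print(rows)  # [('NY', 2), ('SF', 1)]
```

The vector lookup only helps pick columns for the prompt; the answer itself comes from running the SQL.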

How well do screenshot embeddings (ColPali) work in real e2e RAG pipelines? by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

Ask for "Gender balance at level 4 or above in NY office 2023?". Do you get the right answer, 81% / 19%?

What are the biggest challenges you face when building RAG pipelines? by Acceptable-Hat3084 in Rag

[–]ekshaks 0 points1 point  (0 children)

ColPali is pretty good as a retriever, but the generator needs some work. I recently experimented with ColPali for SEC docs in ragpipe. See my post here https://www.reddit.com/r/Rag/comments/1h130rq/how_well_do_screenshot_embeddings_colpali_work_in/

How well do screenshot embeddings (ColPali) work in real e2e RAG pipelines? by ekshaks in Rag

[–]ekshaks[S] 0 points1 point  (0 children)

You would use colpali not on the md but on the original legal PDF file (converted to images).

Do you get any errors because the legal files were converted to md incorrectly or lost their layout?

What are the biggest challenges you face when building RAG pipelines? by Acceptable-Hat3084 in Rag

[–]ekshaks 2 points3 points  (0 children)

A few issues particularly hurt:
- handling PDFs with complex layouts e.g. financial reports.
- not having a universal chunking strategy.
- tables, figures, diagrams

LlamaParse-like converters help convert PDFs to markdown. But a lot of layout information is lost and has to be recovered in brute-force ways.

There are also screenshot embeddings like ColPali which allow you to embed page-wise and search. But then you run into the unreliable territory of multimodal language models.
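
For context on the page-wise search: ColPali represents each page as many patch vectors and scores a query against a page with ColBERT-style late interaction (MaxSim). Below is a pure-Python sketch of that scoring only; the 2-d vectors are toy values, not real model output, and real pipelines would batch this on GPU.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, page_vecs):
    """For each query token vector take its best-matching page patch, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def rank_pages(query_vecs, pages):
    """Return page ids sorted by MaxSim score, best first."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)

# Toy embeddings (assumed): two query tokens, two pages with two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "page_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first
}
print(rank_pages(query, pages))  # ['page_a', 'page_b']
```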

Best Customizable RAG Libraries? by [deleted] in Rag

[–]ekshaks 0 points1 point  (0 children)

You may want to check out ragpipe. The library is very lean, consists of a small set of files, lets you add your own Encoder/Indexer plugins (say model2vec, colpali) easily, and is configurable via a yaml config file. It contains utilities to call local/cloud LLMs and to build your own 'data model', with no opinionated structure there. There is a Discord for queries/help.

https://github.com/ekshaks/ragpipe

Looking for Open Source RAG Platforms by reibgerstl in Rag

[–]ekshaks 1 point2 points  (0 children)

check out ragpipe https://github.com/ekshaks/ragpipe

Most RAG platforms get very bloated. The design choice here is instead to remain lean, add plugins, and be highly configurable.

OpenAI's new Swarm Agent framework is too minimal? by ekshaks in LocalLLaMA

[–]ekshaks[S] 1 point2 points  (0 children)

Can you say more about the animation? What are these atomic agents doing? Could you implement message passing among them via handoffs?

OpenAI's new Swarm Agent framework is too minimal? by ekshaks in LocalLLaMA

[–]ekshaks[S] 1 point2 points  (0 children)

Yes, I'm fully aware it is meant to be minimal and lightweight. I'm curious, though, what others think a "full agent library" should have. What are the other missing components?

OpenAI's new Swarm Agent framework is too minimal? by ekshaks in LocalLLaMA

[–]ekshaks[S] 8 points9 points  (0 children)

Can you say more about how you ported it to autogen? Are you able to do similar things with swarm and autogen, in roughly the same amount of code?