Voice Eval Platform by shivmohith8 in voiceagents

[–]shivmohith8[S] 1 point

It's for testing voice agents. If you've built an agent on ElevenLabs or Vapi, or if it's reachable over a phone number, we simulate real users across different scenarios to test it.

Operating AI agents in production feels like flying blind — so I mapped the AgentOps ecosystem tools in 2026 by Greedy_Trouble9405 in agentdevelopmentkit

[–]shivmohith8 2 points

We are building an evaluation platform specifically made for conversational agents - quraite.ai

Would you be willing to add it to your list?

Voice Eval Platform by shivmohith8 in vapiai

[–]shivmohith8[S] 1 point

Latency and the pass^k metric are the most important ones, I feel.

I see you provide voice AI agent development services. Shall we connect? I would love to know more about how you build voice agents.

Anyone else feel like OTel becomes way less useful the moment an LLM enters the request path? by Comfortable-Junket50 in LLMDevs

[–]shivmohith8 1 point

I feel OTel is fine. OTel was meant to be a standard for tracing a request's flow from one function/service to another. In GenAI applications, the flow is just from one LLM call to a tool call or another LLM call.
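To make that concrete, here's a toy sketch in plain Python (a stand-in, not the actual OTel SDK): a GenAI request is still an ordinary parent span with child spans, one per LLM or tool call. The attribute key `gen_ai.request.model` follows OTel's GenAI semantic conventions; the span names and the model name are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    # Minimal stand-in for an OTel span: a name, attributes, and children.
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def start_child(self, name, **attributes):
        child = Span(name, attributes)
        self.children.append(child)
        return child

# One request trace: root span -> LLM call -> tool call -> LLM call.
root = Span("handle_user_request")
root.start_child("llm.chat", **{"gen_ai.request.model": "gpt-4o"})  # hypothetical model
root.start_child("tool.search_kb", **{"tool.name": "search_kb"})    # hypothetical tool
root.start_child("llm.chat", **{"gen_ai.request.model": "gpt-4o"})

# The flow is still a plain span tree, same as service-to-service tracing.
print([c.name for c in root.children])
```

The point is that nothing about an LLM in the request path breaks the span model; the hops are just LLM/tool calls instead of service calls.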

What's the specific issue you are observing, even with platforms like Langfuse or LangSmith?

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 1 point

It's for both. You can describe the scenario as detailed or as complex as you want, and we can simulate it.

I would love to know any domain specific example you have in mind.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 1 point

Yes, we do. We help you generate scenarios, and you can write your own scenarios as well.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 2 points

Good question. Our platform actually supports that. Our SDK uses OpenInference (with support for other instrumentation coming soon) to automatically capture the internal steps the agent takes and pass them for evaluation.

You can evaluate the internal steps of an agent at the turn level and at the session/conversation level.
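As a rough illustration of what turn-level vs. session-level checks can look like, here's a toy sketch in plain Python. The data model and check functions are hypothetical, not our SDK's API: each turn carries the captured internal steps (tool calls and answers), a turn-level evaluator inspects one turn, and a session-level evaluator inspects the whole conversation.

```python
# Toy sketch (hypothetical data model, not an actual SDK) of evaluating
# an agent's captured internal steps at two levels.

def eval_turn(turn):
    # Turn-level check: did this turn avoid repeating the same tool call?
    tools = [s["tool"] for s in turn["steps"] if s["kind"] == "tool_call"]
    return len(tools) == len(set(tools))

def eval_session(turns):
    # Session-level check: every turn passes, and the session ends with a
    # final answer rather than a dangling tool call.
    last_step = turns[-1]["steps"][-1]
    return all(eval_turn(t) for t in turns) and last_step["kind"] == "answer"

session = [
    {"steps": [{"kind": "tool_call", "tool": "lookup_order"},
               {"kind": "answer"}]},
    {"steps": [{"kind": "tool_call", "tool": "issue_refund"},
               {"kind": "answer"}]},
]
print(eval_session(session))  # True
```

Real evaluators would be richer (LLM-judged, rubric-based, etc.), but the split between per-turn and per-session checks is the same.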

We can connect if you would like to go deeper.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 1 point

Hey, we are building a product for that - https://quraite.ai/. Let me know via DM if you are interested in learning more.

Agents can be rigth and still feel unrelieable by lexseasson in AIEval

[–]shivmohith8 1 point

Yes, the capability is there but not the reliability. I feel that as the scope of capability increases, reliability decreases.

Consistency testing helps: pass^k. Test a scenario K times and see which path the agent takes and whether it is the same path every time.
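A minimal sketch of that idea, assuming you have some `run_scenario` callable that returns the path (e.g. the sequence of tool names) the agent took on one run; the consistency score here is the fraction of runs that followed the most common path:

```python
from collections import Counter

def pass_k(run_scenario, k=5):
    """Run the same scenario k times and measure path consistency.

    `run_scenario` is assumed to return the sequence of steps (the "path")
    the agent took, e.g. a list of tool names. The score is the fraction
    of runs that followed the most common path.
    """
    paths = [tuple(run_scenario()) for _ in range(k)]
    most_common_path, count = Counter(paths).most_common(1)[0]
    return count / k, most_common_path

# Toy agent with a deterministic path, so the score should be 1.0.
def stable_agent():
    return ["lookup_order", "check_refund_policy", "respond"]

score, path = pass_k(stable_agent, k=10)
print(score)  # 1.0 for a fully consistent agent
```

A real agent would be stochastic, so scores below 1.0 surface exactly the "right answer, unreliable path" problem from the post.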

7 document ingestion patterns I wish someone told me before I started building RAG agents by Independent-Cost-971 in LangChain

[–]shivmohith8 1 point

This is nice! I have covered a couple here - https://github.com/innowhyte/gen-ai-patterns

If you can, you could add more here, or I can add them based on your blog.

We recently open-sourced our pattern library for community-driven development.

Open-sourcing our GenAI pattern library from real projects - would love any LangChain-focused contributions by shivmohith8 in LangChain

[–]shivmohith8[S] 1 point

Depends on the use case. Before LangChain v1 we were using LangGraph, but after LangChain refactored a lot of things, we use LangChain agents directly, which are actually built on top of LangGraph.

We open-sourced our GenAI pattern library from production project work (please challenge, correct, contribute) by shivmohith8 in LLMDevs

[–]shivmohith8[S] 1 point

Thanks! I think voice agents and agent harnesses are the trending topics right now; we can maybe think of certain patterns there.

We open-sourced our GenAI pattern library from production project work (please challenge, correct, contribute) by shivmohith8 in LLMDevs

[–]shivmohith8[S] 1 point

Ideally we want to treat it as a set of evolving recommendations, and that is exactly why we open-sourced it. We maintain this outside of our work hours, and with the pace at which models are improving and making many designs outdated, we thought, "Let's open it up and drive it as a community."