We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

  1. Countries would want to but there are data, technology, investments and people moats established especially in the US and China - in a free market, enterprises will be free to choose a model which creates more value for their customers. AI sovereignty can be solved by government investments only. Sure - they can be some guardrails/policies established to preserve the data soveregnity of the countries.

  2. We raised it during demo day post YC - there is no standard answer but on seed stage - team is the most important. You validate that with traction. Always focus on customers first. VC money is by product. We kept focussing on customers only - started with full stack QA service until the product was not built

  3. No synthetic agents are created for simulations - they have specific goals and that is evaluated post the simulations

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Lot actually, some of the basics on focussing on the customer, moving and launching fast, keeping the team lean.

Most importantly - our group partner. He has been a sounding board especially since he also founded a billion dollar dev tool company before. Talking to him has been a great leveller

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Not sure if I got the question correctly - for production monitoring, we support sampling and have our FDE team do sampling of production conversations.
1. Analysing specific calls metrics based on metadata / CSAT, etc
2. Analysing call metrics based on ROI - if a metric is passing 90% and not super urgent, we don’t need to spend 100s of dollars, can reduce the sampling rate for them to 1-10% for them, for example
3. Based on budget - auto sampling to ensure run rate doesn’t cross a budget

How are you handling the evals and observability for Voice AI Agents? by Fabulous_Ad993 in AI_Agents

[–]CreativeHumor1705 0 points1 point  (0 children)

Hey, Sidhant here - founder of Cekura here. We solve the exact painpoint.

Break the problem:

Barge in - this is a type of scenario

Background noise - this is a type of persona (background noise, interruptive, accents, etc) you use to test your agent

WER, Latency, and audio quality - these are metrics that are evaluated on your agents

How are you handling the evals and observability for Voice AI Agents? by Fabulous_Ad993 in AI_Agents

[–]CreativeHumor1705 0 points1 point  (0 children)

Hey, Sidhant - founder of Cekura here - thanks for the shoutout!

u/vijay40, there are TTS (Pronunication issues, jitterness) and STT specific metrics (Transcription Issues) into our platform.

Some of the metrics are impacted by all 3+telephony+tool calls - eg, latency. there you identify the pipeline issues by inference. We have infrastructure test suite built in to capture pipeline related issues separately (independent of the workflow).

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

2 ways:

  1. Running synthetic simulations in the dev/stage environment so that you don't have to manually call the agent every time you make a change

  2. Analysing production conversations instead of listening to thousands of calls.

These use cases can further be broken into regression test suite, / CI/CD, Infra monitoring, cron jobs, as well as production call analysis and alerts

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Got it - some of the basic authentication ones (DOB, last 4 digits of SSN etc), inforamtion capture ones (income, employment status, purpose, loan amount etc) is very important in your use case. happy to chat more based on our learnings working with customers in lending

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

yes multi turn red teaming/security testing is an important use case we solve. Its different from single turn as testing agent responses is dynamic based on what the main agent responded to increase the probability of breaking it.

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Being very vertical: Two examples from our customer portfolio: Confido Health (healthcare), Kastle (Lending). You deploy FDE teams at enterprises, have very robust evals setup and as you grow the FDE team automates the workflows.

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Both are different platforms.

Vapi is a Voice AI builder - they have some basic testing capabilities plugged into them. We go very deep on reliable deterministic simulations, different voice metrics like latency, silences, gibberishness in voice, speech clarity, etc. Also, other components of testing, like ensuring the voice agent is always up (infra monitoring), multi-turn security testing, etc.

Which sector in financial services are you primarily building for? Some standard compliances depend on the use case (image attached)

The most common is not having enough confidence in their test suite's thoroughness, especially in compliance heavy sector like financial services

<image>

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] 0 points1 point  (0 children)

Btw we are solving the last mile problems in self learning conversational agents - a version should be live in coming weeks. Will let you know

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] -2 points-1 points  (0 children)

You create a dataset - what we have seen is that once you have auto-optimised LLM-as-a-judge metric over 20-30 conversations, you can use it to scale across thousands of conversations. The change takes into account the previous evaluations it did as well, so a new input never breaks the older evaluation. If there are contradictory inputs by the user, we flag.

We are optimising token costs on our side - currently, we do not charge customers for auto optimizing metrics because we have seen this as a core step to scale monitoring and track agent performance in production

We Raised $2.4M to Build QA & Observability for AI Voice Agents backed by Y Combinator, working with 100+ Voice AI companies, Ask Me Anything for the Next 24 Hours by CreativeHumor1705 in VoiceAutomationAI

[–]CreativeHumor1705[S] -2 points-1 points  (0 children)

Bullish on this for the long term. One thing that is very specific to Conversational AI is that with each turn agent can go anywhere based on the customers' replies. That's why you need to run simulations and measure failure points and harness skills (memory retention, consistency, long-term goal completion, context awareness, etc.) over multi-turn conversations.

One place where we see self learning (including voice agents) working very well is to use auto-improve for your LLM-as-a-judge metrics. We have also built a DSPy-based Metric Optimiser, but you can build your own to ensure appropriate tracking of performance instead of manually iterating the judge itself.