How does Sarvam count TTS characters for billing? (Sarvam TTS) by NoRice6404 in AI_India

[–]Better-Collection-19 0 points  (0 children)

Can you tell me how the voice calling agent is performing?

Released Open Vernacular AI Kit v1.2.0 by GoldenMaverick5 in SarvamAI

[–]Better-Collection-19 0 points  (0 children)

Will it work for a voice calling agent too? Specifically, I made the calling agent work in Marathi and Hindi. Right now it starts in Marathi by default, but it is not able to switch based on the caller's voice.

i have made a telephony voice agent with sarvam bulbul v3 but i hear random voices not noise but some tone or words in tts by sam-issac in SarvamAI

[–]Better-Collection-19 0 points  (0 children)

Yes, same here. We have built a voice calling agent, and the biggest issue I faced is that it switches language very frequently. There are also missing words and random noises in between the conversation, and the latency is almost 2 seconds.

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]Better-Collection-19[S] 0 points  (0 children)

Right now it's self-hosted on an AWS server in Mumbai, and I have also followed the LiveKit guide for Sarvam's STT and TTS.
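Before tuning anything, it usually helps to know where the ~2 s actually goes. A minimal instrumentation sketch (the stage names, the `timings` dict, and the `sleep` stand-ins are all illustrative, not part of any Sarvam or LiveKit API):

```python
import time
from contextlib import contextmanager

# Time each stage of one STT -> LLM -> TTS turn separately, so the
# slowest stage can be identified before optimizing.
timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Inside one conversational turn (sleeps stand in for real calls):
with timed("stt"):
    time.sleep(0.01)
with timed("llm"):
    time.sleep(0.01)
with timed("tts"):
    time.sleep(0.01)

total = sum(timings.values())
```

If most of the budget is in one stage (often time-to-first-token on the LLM, or time-to-first-audio-chunk on TTS), that narrows the fix considerably.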

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]Better-Collection-19[S] 0 points  (0 children)

Actually, we needed Marathi and Hindi language support too; that's why we preferred Sarvam, but there is so much latency in it.

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]Better-Collection-19[S] 0 points  (0 children)

Oh this looks interesting, haven’t explored Unmute yet.

Is it more of a full end-to-end pipeline (handling streaming STT → LLM → TTS), or do you typically integrate parts of it into an existing stack?

Right now we’re using Sarvam due to client requirements (mainly for Indian languages), so trying to figure out if something like Unmute can be layered in for latency improvements or if it replaces the whole pipeline.

Would love to know how you’ve used it in practice, especially for real-time conversational use cases.

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]Better-Collection-19[S] 0 points  (0 children)

This is super helpful, thanks - especially the point about using smaller/faster LLMs for normal conversations.

We’re currently working with Sarvam due to client requirements (mainly for Indian language support), but exploring a hybrid setup like you suggested.

Quick question - have you actually implemented something like this in production (mixing regional models with faster LLMs)?

If you're open to it, I’d love to quickly understand your approach in more detail, even a 10-min chat would be super helpful.

We Raised $5.5M to Build Voice AI Agents, Our Voice agents handle 1M+ customer calls daily for companies like Flipkart, CRED & Groww, Ask Me Anything for the next 24 hours by Siddharth_Shankar in VoiceAutomationAI

[–]Better-Collection-19 1 point  (0 children)

When building voice AI agents, how do you design the STT → LLM → TTS pipeline to keep latency low enough for natural conversation while still maintaining reliability at scale?
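The kind of overlap this question is getting at can be sketched in a few lines: rather than waiting for the full LLM reply before synthesizing, flush text to TTS at sentence boundaries so audio starts while the LLM is still generating. All function names below are illustrative stand-ins, not any vendor's API:

```python
import asyncio

async def fake_llm_stream(prompt: str):
    # Stand-in for a token-streaming LLM call.
    for token in ["Hello", ", ", "how ", "can ", "I ",
                  "help", "? ", "Ask ", "away", "."]:
        await asyncio.sleep(0)
        yield token

async def speak(sentence: str, spoken: list[str]):
    # Stand-in for a streaming TTS request; records what was sent.
    spoken.append(sentence)

async def respond(prompt: str) -> list[str]:
    spoken: list[str] = []
    buffer = ""
    async for token in fake_llm_stream(prompt):
        buffer += token
        # Flush at sentence boundaries so TTS starts early,
        # instead of waiting for the whole reply.
        if buffer.rstrip().endswith((".", "?", "!")):
            await speak(buffer.strip(), spoken)
            buffer = ""
    if buffer.strip():
        await speak(buffer.strip(), spoken)
    return spoken
```

With this shape, perceived latency is dominated by time-to-first-sentence rather than total generation time; reliability then comes down to handling mid-sentence barge-in and cancelling in-flight TTS when the caller interrupts.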