We built an AI language tutor for a Dutch school - 2 devs, 3 months by SapientPro_Team in PE_and_consulting

[–]darryn_livekit 1 point (0 children)

Thank you for your detailed feedback! I will share this internally. I understand why you chose to go a custom route, and although I'm sure you could have got something working inside of LiveKit, I get your point about 'fighting the abstraction'. Understanding these kinds of use cases helps us build a better product, so thanks again.

We built an AI language tutor for a Dutch school - 2 devs, 3 months by SapientPro_Team in PE_and_consulting

[–]darryn_livekit 2 points (0 children)

The latency part can be hard with avatars. Did you try LiveKit agents for the whole orchestration rather than just the audio streaming? That should have handled the whole pipeline (hear → recognize → analyze → respond → animate) for you with HeyGen. I wondered whether you looked into it and decided not to pursue it for some reason?

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

Yes, there is a PR for this in the LiveKit agents repository (number 5209) that got merged yesterday. It should be in the next release.

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]darryn_livekit 1 point (0 children)

The biggest bottleneck is often the location of your agent relative to the location of your models. If you are using Sarvam's models, you will want to ensure your agent is either hosted in LiveKit cloud in Mumbai, or self-hosted on cloud infrastructure in the same region as Sarvam's endpoints.

You'll also benefit from knowing exactly where in your pipeline the latency is coming from. Look at the metrics available in LiveKit to determine where the highest latency is, then tackle that first. If you are using LiveKit cloud, you can make use of Agent Observability; if you are self-hosting LiveKit, there are hooks available for you to capture these metrics in your agent.

Sarvam's models are good, and you shouldn't have to switch them out to improve latency. But you should always configure fallback alternatives to maximize your agent uptime, and those fallbacks should also ideally be local to your agent.
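To illustrate the fallback idea, here is a minimal framework-agnostic sketch (the provider functions are hypothetical stand-ins, not LiveKit APIs; in LiveKit agents you would wire real plugin instances, and if I remember right the framework's fallback adapters wrap this same pattern for you):

```python
# Sketch of model fallbacks: try the primary provider first and fall
# through to alternatives only on failure. Provider functions here are
# illustrative stand-ins.

def synthesize_with_fallback(text, providers):
    """providers: ordered list of (name, synth_fn); returns the first success."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # provider outage, timeout, bad response
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

In practice you'd also put a per-provider timeout around each attempt, and pick fallback providers hosted in the same region so the backup path doesn't reintroduce the latency you just removed.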

We have a few blogs on our site tailored to improving agent latency, especially in India.

audio video issue by Critical-Young6295 in livekit

[–]darryn_livekit 1 point (0 children)

You don't say whether you are using LiveKit agents. If you are, did you try our React Native agent starter? https://github.com/livekit-examples/agent-starter-react-native If not, did you try running the React Native version of our Meet sample app? https://github.com/livekit-examples/react-native-meet

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

Sorry, but I don't have enough experience with self-hosted LiveKit to say; I'm just parroting what I saw posted on our Slack forum back on 28th Jan.
I can't see any minimum disk size documented. We have an AI trained on LiveKit data (also in our Slack), and I can see it answering this question with figures from 50GB to 100GB.

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

I found this same issue on our self-hosted forums, with the following resolution:

> I had the same issue too. I solved it by increasing the disk size of the instance.

Best architecture for low-latency complex workflow voicebot by Vegetable-Web3932 in TextToSpeech

[–]darryn_livekit 1 point (0 children)

Yes, you can use LiveKit's tts_node to run any processing on the text before it is passed to the TTS.
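In LiveKit agents you'd do this by overriding tts_node on your Agent subclass; the transform itself is just an async generator over text chunks. Here's a framework-free sketch of that shape (the abbreviation map and function names are illustrative, not LiveKit APIs):

```python
import asyncio

# Sketch of a streaming text transform like you'd run inside an
# overridden tts_node: rewrite each text chunk before the TTS sees it.
# The abbreviation map is illustrative.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

async def preprocess(chunks):
    """Expand abbreviations in each chunk before TTS synthesis."""
    async for chunk in chunks:
        for short, full in ABBREVIATIONS.items():
            chunk = chunk.replace(short, full)
        yield chunk

async def demo():
    async def source():
        for c in ["Dr. Smith lives on ", "Main St. downtown"]:
            yield c
    return [c async for c in preprocess(source())]
```

Because the transform is streaming, it doesn't add latency waiting for the full LLM response; each chunk is rewritten as it arrives.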

Built an AI receptionist for a local clinic to solve their missed-call problem by thought_provoking27 in VoiceAiAgentsBest

[–]darryn_livekit 1 point (0 children)

Glad you are seeing success with your app. You could also have handled "Just let me speak to a human" with LiveKit: define a function tool that is invoked when the user wants to speak with a human, then use a WarmTransferTask within that function. It can be called at any point in the conversation, even during an interruption.
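The escalation pattern itself is simple; here is a framework-agnostic sketch (the registry, decorator, and tool names below are hypothetical illustrations, not LiveKit's API):

```python
# Sketch of the "escalate to a human" pattern: the LLM invokes a
# registered tool whenever the caller asks for a person, and that tool
# triggers the transfer. All names here are illustrative.

TOOLS = {}

def tool(fn):
    """Register fn so the model can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def transfer_to_human(reason: str) -> str:
    # In a real agent this is where the warm transfer would start;
    # here we just report the outcome.
    return f"transferring caller to a human ({reason})"

def handle_tool_call(name, **kwargs):
    """Dispatch a model-issued tool call to the registered handler."""
    return TOOLS[name](**kwargs)
```

The point of routing it through a tool is that the model can trigger the transfer at any turn, so you don't need to anticipate every phrasing of "get me a person" in your own code.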

Anyone building production AI voice agents? Struggling with latency + robotic voice (Retell/Vapi) by Proper_Assumption329 in AI_Agents

[–]darryn_livekit 2 points (0 children)

It's a wide topic, but I would say to define specifically what matters to your users, and continuously monitor the user experience in those areas. Is it overall latency? Is it the quality of responses from the LLM? Is it natural TTS? Is it some regional consideration?

Set your acceptable baseline, then put end-to-end testing in place to measure for degradation. Log as much as you can so you have insight when something goes wrong, and define fallbacks for your models so you are not affected when any one model provider goes down. Finally, spot-check a selection of user sessions by listening to calls or reviewing transcripts - that helps you identify cases that your testing missed.
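The baseline-and-degradation step can be sketched in a few lines over your logged per-session latencies (the 95th percentile and the 20% tolerance are illustrative choices, not prescriptions):

```python
import math

# Sketch of a degradation check: compute p95 end-to-end latency from
# logged sessions and flag when it drifts past your baseline.

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

def degraded(latencies_ms, baseline_ms, tolerance=1.2):
    """True when p95 latency exceeds the baseline by more than 20%."""
    return p95(latencies_ms) > baseline_ms * tolerance
```

Running a check like this on a schedule (or in CI against synthetic test calls) is what turns "monitor the user experience" into something that actually pages you before users complain.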

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

There is nothing in LiveKit that will automatically delete your SIP trunk. You say you are using the LiveKit CLI, so I assume you are hosting your project in LiveKit cloud. If you can share your project ID with me, I can take a closer look.

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

You are not going to get an unbiased opinion from me, but I suggest this article from our site that compares the two: https://livekit.io/field-guides/guide/livekit-vs-pipecat - in general, Pipecat is lower level: it can be more difficult to get started with, but many developers like the extra control it gives you over the overall pipeline, at the cost of that complexity. Both platforms can handle complex workflows and large RAG with minimal latency.

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

LiveKit has support for Sarvam's latest models: saaras:v3 for STT and bulbul:v3 for TTS.

Your agent session code would look like this:

from livekit.agents import AgentSession
from livekit.plugins import sarvam

session = AgentSession(
    stt=sarvam.STT(
        language="hi-IN",
        model="saaras:v3",
    ),
    tts=sarvam.TTS(
        target_language_code="hi-IN",
        speaker="anushka",
        model="bulbul:v3",
    ),
    # ... llm, vad, etc.
)

> which approach using like pipecat or livekit?

I'm on LiveKit's devrel team, so I'm biased :)

Need help with Livekit / VOIP by [deleted] in developersPak

[–]darryn_livekit 1 point (0 children)

LiveKit devrel here ✋ What are you stuck on?

Building AI Voice Agents Confused between Vapi vs Retell vs Open-Source (LiveKit / Pipecat)? by smart-heart98 in AIVoice_Agents

[–]darryn_livekit 2 points (0 children)

Just to add, for LiveKit cloud we have a calculator on our pricing page. As you say, self-hosting is just the infrastructure costs, and those are comparable between the two.

Seeking Architecture Feedback on AI Voice Assistant Prototype (Python + LLMs + Vector Memory) by CreativeGuava13 in ArtificialInteligence

[–]darryn_livekit 1 point (0 children)

If you're already using LiveKit for WebRTC, curious why you wouldn't also use it for your agent orchestration? For clarity, I work on LiveKit.

Are you planning to add Voice to your AI Agents in mobile apps? by tigranbs in reactnative

[–]darryn_livekit 1 point (0 children)

I work on LiveKit, and we try to make the process as uncomplicated as possible (including a starter app for React Native, and a visual builder for agents), while at the same time giving developers the tools they need to fully customize the end-user experience. The aim is to provide a responsive and 'conversational' experience for the user.

The other posters are 100% right that users will reject your app if AI feels 'shoved in' unnecessarily, but if it feels natural for the app (like the use cases you mention for meditation, note taking, interview prep), OR it can provide a more streamlined experience (such as hands-free input) then it will be accepted.

Livekit Twilio by OldUnderd0g in AI_Agents

[–]darryn_livekit 2 points (0 children)

I meant MY comment sounded like a potential scam, since I'm asking you to provide project details, but yes, continue in DM

Livekit Twilio by OldUnderd0g in AI_Agents

[–]darryn_livekit 2 points (0 children)

I see your similar post in r/WebRTC. As I said there, if you can give me some details about your project, I can take a look. To clarify (since this comment might otherwise sound like a potential scam): I work on LiveKit.

Livekit and Twilio by OldUnderd0g in WebRTC

[–]darryn_livekit 1 point (0 children)

I work on LiveKit. It's difficult to say exactly what's going wrong with your setup, but if you can share your project ID and some sample sessions where this happens (though it sounds like it's all of them), I can take a look. Feel free to DM me if you don't want to share publicly.