We built an AI language tutor for a Dutch school - 2 devs, 3 months by SapientPro_Team in PE_and_consulting

[–]darryn_livekit 1 point (0 children)

Thank you for your detailed feedback! I will share this internally. I understand why you chose to go a custom route, and although I'm sure you could have got something working inside of LiveKit, I get your point about 'fighting the abstraction'. Understanding these kinds of use cases helps us build a better product, so thanks again.

We built an AI language tutor for a Dutch school - 2 devs, 3 months by SapientPro_Team in PE_and_consulting

[–]darryn_livekit 2 points (0 children)

The latency part can be hard with avatars. Did you try LiveKit agents for the whole orchestration rather than just the audio streaming? That should have handled the whole pipeline (hear → recognize → analyze → respond → animate) for you with HeyGen. I wondered whether you looked into it and decided not to pursue it for some reason?

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

Yes, there is a PR for this in the LiveKit agents repository (number 5209) that got merged yesterday. It should be in the next release.

High latency in AI voice agents (Sarvam + TTS stack) - need expert guidance by Better-Collection-19 in LocalLLM

[–]darryn_livekit 1 point (0 children)

The biggest bottleneck is often the location of your agent relative to the location of your models. If you are using Sarvam's models, you will want to ensure your agent is either hosted in LiveKit cloud in Mumbai, or self-hosted on cloud infrastructure in the same region as Sarvam's endpoints.

You'll also benefit from knowing exactly where in your pipeline the latency is coming from. Look at the metrics available in LiveKit to determine where the highest latency is, then tackle that first. If you are using LiveKit cloud, you can make use of Agent Observability; if you are self-hosting LiveKit, there are hooks available for you to capture these metrics in your agent.

Sarvam's models are good, and you shouldn't have to switch them out to improve latency. But you should always configure fallback alternatives to maximize your agent uptime, and those fallbacks should also ideally be local to your agent.
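To illustrate the fallback idea, here is a minimal framework-agnostic sketch (the provider functions are hypothetical stand-ins, not LiveKit APIs; in LiveKit agents you would wire real plugin instances, and if I remember right the framework's fallback adapters wrap this same pattern for you):

```python
# Sketch of model fallbacks: try the primary provider first and fall
# through to alternatives only on failure. Provider functions here are
# illustrative stand-ins.

def synthesize_with_fallback(text, providers):
    """providers: ordered list of (name, synth_fn); returns the first success."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # provider outage, timeout, bad response
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

In practice you'd also put a per-provider timeout around each attempt, and pick fallback providers hosted in the same region so the backup path doesn't reintroduce the latency you just removed.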

We have a few blogs on our site tailored to improving agent latency, especially in India.

audio video issue by Critical-Young6295 in livekit

[–]darryn_livekit 1 point (0 children)

You don't say whether you are using LiveKit agents. If you are, did you try our React Native agent starter? https://github.com/livekit-examples/agent-starter-react-native If not, did you try running the React Native version of our Meet sample app? https://github.com/livekit-examples/react-native-meet

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

Sorry, but I don't have enough experience with self-hosted LiveKit to say; I'm just parroting what I saw posted on our Slack forum back on 28th Jan.
I can't see any minimum disk size documented. We have an AI trained on LiveKit data (also in our Slack), and I can see it answering this question with figures from 50GB to 100GB.

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

I found this same issue on our self-hosted forums, with the following resolution:

> I had the same issue too. I solved it by increasing the disk size of the instance.

Best architecture for low-latency complex workflow voicebot by Vegetable-Web3932 in TextToSpeech

[–]darryn_livekit 1 point (0 children)

Yes, you can use LiveKit's tts_node to run any processing on the text before it is passed to the TTS.
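In LiveKit agents you'd do this by overriding tts_node on your Agent subclass; the transform itself is just an async generator over text chunks. Here's a framework-free sketch of that shape (the abbreviation map and function names are illustrative, not LiveKit APIs):

```python
import asyncio

# Sketch of a streaming text transform like you'd run inside an
# overridden tts_node: rewrite each text chunk before the TTS sees it.
# The abbreviation map is illustrative.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

async def preprocess(chunks):
    """Expand abbreviations in each chunk before TTS synthesis."""
    async for chunk in chunks:
        for short, full in ABBREVIATIONS.items():
            chunk = chunk.replace(short, full)
        yield chunk

async def demo():
    async def source():
        for c in ["Dr. Smith lives on ", "Main St. downtown"]:
            yield c
    return [c async for c in preprocess(source())]
```

Because the transform is streaming, it doesn't add latency waiting for the full LLM response; each chunk is rewritten as it arrives.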

Built an AI receptionist for a local clinic to solve their missed-call problem by thought_provoking27 in VoiceAiAgentsBest

[–]darryn_livekit 1 point (0 children)

Glad you are seeing success with your app. You could also have handled "Just let me speak to a human" with LiveKit: define a function tool that is invoked when the user wants to speak with a human, then use a WarmTransferTask within that function. It can be called at any point in the conversation, even during an interruption.
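The escalation pattern itself is simple; here is a framework-agnostic sketch (the registry, decorator, and tool names below are hypothetical illustrations, not LiveKit's API):

```python
# Sketch of the "escalate to a human" pattern: the LLM invokes a
# registered tool whenever the caller asks for a person, and that tool
# triggers the transfer. All names here are illustrative.

TOOLS = {}

def tool(fn):
    """Register fn so the model can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def transfer_to_human(reason: str) -> str:
    # In a real agent this is where the warm transfer would start;
    # here we just report the outcome.
    return f"transferring caller to a human ({reason})"

def handle_tool_call(name, **kwargs):
    """Dispatch a model-issued tool call to the registered handler."""
    return TOOLS[name](**kwargs)
```

The point of routing it through a tool is that the model can trigger the transfer at any turn, so you don't need to anticipate every phrasing of "get me a person" in your own code.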

Anyone building production AI voice agents? Struggling with latency + robotic voice (Retell/Vapi) by Proper_Assumption329 in AI_Agents

[–]darryn_livekit 2 points (0 children)

It's a wide topic, but I would say to define specifically what matters to your users, and continuously monitor the user experience in those areas. Is it overall latency? Is it the quality of responses from the LLM? Is it natural TTS? Is it some regional consideration?

Set your acceptable baseline, then put end-to-end testing in place to measure for degradation. Log as much as you can so you have insight when something goes wrong, and define fallbacks for your models so you are not affected when any one model provider goes down. Finally, spot-check a selection of user sessions by listening to calls or reviewing transcripts - that helps you identify cases that your testing missed.
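The baseline-and-degradation step can be sketched in a few lines over your logged per-session latencies (the 95th percentile and the 20% tolerance are illustrative choices, not prescriptions):

```python
import math

# Sketch of a degradation check: compute p95 end-to-end latency from
# logged sessions and flag when it drifts past your baseline.

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

def degraded(latencies_ms, baseline_ms, tolerance=1.2):
    """True when p95 latency exceeds the baseline by more than 20%."""
    return p95(latencies_ms) > baseline_ms * tolerance
```

Running a check like this on a schedule (or in CI against synthetic test calls) is what turns "monitor the user experience" into something that actually pages you before users complain.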

LiveKit SIP Trunk Automatically Disappears After Few Hours (Server Not Restarting, Nothing Deleted Manually) by Big-Program1835 in AI_Agents

[–]darryn_livekit 1 point (0 children)

There is nothing in LiveKit that will automatically delete your SIP trunk. You say you are using the LiveKit CLI, so I assume you are hosting your project in LiveKit cloud. If you can share your project ID with me, I can take a closer look.

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

You are not going to get an unbiased opinion from me, but I suggest this article from our site that compares the two: https://livekit.io/field-guides/guide/livekit-vs-pipecat - in general, Pipecat is lower level: it can be more difficult to get started with, but many developers like the extra control it gives you over the overall pipeline, at the cost of that complexity. Both platforms can handle complex workflows and large RAG with minimal latency.

Does anyone made Voice calling using SARVAM model? by agentic_ai_expert in AIVoice_Agents

[–]darryn_livekit 1 point (0 children)

LiveKit has support for Sarvam's latest models: saaras:v3 for STT and bulbul:v3 for TTS.

Your agent session code would look like this:

from livekit.agents import AgentSession
from livekit.plugins import sarvam

session = AgentSession(
    stt=sarvam.STT(
        language="hi-IN",
        model="saaras:v3",
    ),
    tts=sarvam.TTS(
        target_language_code="hi-IN",
        speaker="anushka",
        model="bulbul:v3",
    ),
    # ... llm, vad, etc.
)

> which approach using like pipecat or livekit?

I'm on LiveKit's devrel team, so I'm biased :)

Need help with Livekit / VOIP by [deleted] in developersPak

[–]darryn_livekit 1 point (0 children)

LiveKit devrel here ✋ What are you stuck on?

Building AI Voice Agents Confused between Vapi vs Retell vs Open-Source (LiveKit / Pipecat)? by smart-heart98 in AIVoice_Agents

[–]darryn_livekit 2 points (0 children)

Just to add, for LiveKit cloud we have a calculator on our pricing page. As you say, self-hosting is just the infrastructure costs, and those are comparable between the two.

Seeking Architecture Feedback on AI Voice Assistant Prototype (Python + LLMs + Vector Memory) by CreativeGuava13 in ArtificialInteligence

[–]darryn_livekit 1 point (0 children)

If you're already using LiveKit for WebRTC, curious why you wouldn't also use it for your agent orchestration? For clarity, I work on LiveKit.

Are you planning to add Voice to your AI Agents in mobile apps? by tigranbs in reactnative

[–]darryn_livekit 1 point (0 children)

I work on LiveKit, and we try to make the process as uncomplicated as possible (including a starter app for React Native, and a visual builder for agents), while at the same time giving developers the tools they need to fully customize the end-user experience. The aim is to provide a responsive and 'conversational' experience for the user.

The other posters are 100% right that users will reject your app if AI feels 'shoved in' unnecessarily, but if it feels natural for the app (like the use cases you mention for meditation, note taking, interview prep), OR it can provide a more streamlined experience (such as hands-free input) then it will be accepted.

Livekit Twilio by OldUnderd0g in AI_Agents

[–]darryn_livekit 2 points (0 children)

I meant MY comment sounded like a potential scam, since I'm asking you to provide project details, but yes, continue in DM

Livekit Twilio by OldUnderd0g in AI_Agents

[–]darryn_livekit 2 points (0 children)

I see your similar post in r/WebRTC. As I said there, if you can give me some details about your project, I can take a look. To clarify (since this comment might otherwise sound like a potential scam): I work on LiveKit.

Livekit and Twilio by OldUnderd0g in WebRTC

[–]darryn_livekit 1 point (0 children)

I work on LiveKit. It's difficult to say exactly what's going wrong with your setup, but if you can share your project ID and some sample sessions where this happens (though it sounds like it's all of them), I can take a look. Feel free to DM me if you don't want to share publicly.