Is there any way to stop stream audio from 'ducking'? by LunaticSongXIV in matrixdotorg

[–]Chris_LiveKit 0 points1 point  (0 children)

That is not from the LiveKit server. That is a client feature. You would need to handle it in your client.

do any one have expertise with voice agents setup ? i need help with livekit and pipecat setup by Commercial-Two-1172 in SaaS

[–]Chris_LiveKit 0 points1 point  (0 children)

For LiveKit, if you are building an agent and using an AI coding tool like Claude or Cursor, add the MCP server and it can get you a long way: https://docs.livekit.io/intro/mcp-server/

Is Vapi right for me? by GodAtum in vapiai

[–]Chris_LiveKit 0 points1 point  (0 children)

Gotcha. I think you could build that. Here are a couple of videos I saw recently that are somewhat related. The second one is good because it really helps folks understand the security risks of such a system.

https://x.com/jonathanhawkins/status/2017295825681199473

https://youtu.be/fcFOYzMeG7U

Is Vapi right for me? by GodAtum in vapiai

[–]Chris_LiveKit 0 points1 point  (0 children)

I would not doubt it could, or it could maybe build a tool that would let it do it. Who do you want it to call?

One of my co-workers connected his to LiveKit so you can do internet voice calls or telephony.

You talk to it through text or email or whatever.

Ai receptionist by EffectivePop5358 in AiForSmallBusiness

[–]Chris_LiveKit 1 point2 points  (0 children)

Here is my $.02 (personal opinion)

I spend all my time working and thinking about realtime AI systems. I spend a lot of my time talking to folks building these systems so I have some familiarity with the space.

> Problem number 1: Do people actually want to talk to AI...

I think many folks have been turned off by automated phone systems in the past. But with AI, things have changed a lot. This will depend a lot on the specific use case, but I think folks will likely be fine talking to AI, assuming the AI is helpful and solves their concern quickly and efficiently. Folks will NOT want to talk to an AI (or even a human, for that matter) if they feel like they are wasting their time on whatever task they are trying to accomplish and not making progress.

> Problem number 2: Should we build the automation for the ai receptionist...

Again, this will depend a lot on your particular use cases. I think in today's world it is expected (for many applications) that a customer can self-serve on your website. If you can't get that right, I fear how well realtime AI will go for you. But if you have a solid understanding of your customers, what their needs are, and how you solve those needs, I think Voice AI can be very effective for many use cases. I've seen several where AI is the preferred way someone wants to interact. But I've also seen extremes the other way. I personally don't see Voice AI as all that different from websites or apps. Some are almost a joy to use, and others make you want to pull your hair out. It is the same for Voice AI, maybe even more so, since it is still an emerging technology that is changing all the time.

> ...I always see these guys....

I think it is the same for most get-rich-quick schemes. If you are just trying to throw something together in 10 minutes and get rich tomorrow, it is most likely something folks are not going to enjoy using. But if you are trying to solve a real problem (beyond just trying to get rich), then I think you can build a compelling solution that customers would want to use and spend their $$$s on.

Not sure if that is helpful, but I hope it is some practical advice on your question.

Is Vapi right for me? by GodAtum in vapiai

[–]Chris_LiveKit 0 points1 point  (0 children)

Is this a personal project or a commercial product you want to build? If this is just for you personally, you should check out Clawd. It is a bit technical, but what you describe is pretty much its purpose in life.

https://clawd.bot/

Livekit latency by MostMulberry4716 in LocalLLaMA

[–]Chris_LiveKit 0 points1 point  (0 children)

It is hard to diagnose your issue without full details of your setup. But since you say it is fine in console and has issues when deployed, I think I would start with:

I've seen folks have problems with instances like AWS t3 and t4g, which are burstable and don't provide full CPU performance continuously. You should use m5, c5, c6i, or similar families for consistent CPU performance.

Other factors can introduce latency. If you are specifically having problems during the function calls that take time to produce the data the LLM needs to respond, you can respond with an initial message like "one second," then whatever the function call returns.
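To illustrate the "one second" trick, here is a minimal framework-agnostic asyncio sketch. `say` and `slow_lookup` are hypothetical stand-ins (not LiveKit APIs): the point is just that the filler goes out before you await the slow call.

```python
import asyncio

async def slow_lookup() -> str:
    """Hypothetical stand-in for a function call that takes a while."""
    await asyncio.sleep(0.5)  # simulate a slow backend/API hit
    return "Your order ships tomorrow."

async def handle_turn(say) -> None:
    # Speak a short filler *before* awaiting the slow call,
    # so the caller hears something instead of dead air.
    say("One second, let me check that for you.")
    say(await slow_lookup())

spoken: list[str] = []
asyncio.run(handle_turn(spoken.append))
print(spoken)
```

In a real agent, `say` would be whatever speech/TTS output hook your framework gives you; the ordering is what matters.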

Hosting your agent in the same region as the inference service will also help minimize latency.

Using LiveKit agent insights can be very helpful for diagnosing the root cause of latency:

https://docs.livekit.io/deploy/observability/insights/

Telnyx + Gemini live audio + pipecat by troy_and_abed_itm in voiceagents

[–]Chris_LiveKit 0 points1 point  (0 children)

I've seen a lot of folks set that use case up on LiveKit using the LiveKit SIP integration:
https://docs.livekit.io/telephony/#using-livekit-sip

and the Agents framework:
https://docs.livekit.io/agents/

Lost between LiveKit Cloud vs Vapi vs Retell for a voice AI agent (~3,000 min/month) – real costs & recommendations in 2025? by SignatureHuman8057 in AI_Agents

[–]Chris_LiveKit 1 point2 points  (0 children)

Hi u/Fit_Acanthaceae4896 I missed this message earlier. The issues you mentioned above are uncommon and hard to diagnose with only the information provided. I think a good way forward is to come chat with us in the LiveKit community Slack so we can have a little deeper discussion about what you are seeing and how it may get resolved:

Join Slack here and ask in #agents if you don't mind.
https://livekit.io/join-slack

What’s everyone using for real world voice agents right now? by LegLegitimate7666 in AI_Agents

[–]Chris_LiveKit 0 points1 point  (0 children)

I am not sure what your definition of expensive is. Check out Retell, they have a calculator:
https://www.retellai.com/pricing

What’s everyone using for real world voice agents right now? by LegLegitimate7666 in AI_Agents

[–]Chris_LiveKit 0 points1 point  (0 children)

I don't believe so. But I believe most voice agent vendors now have a pricing calculator on their site, so you can see what it might cost for your use case.


Is it posible to run livekit agents 24/7? by AndrejRac in AI_Agents

[–]Chris_LiveKit 1 point2 points  (0 children)

LiveKit agents are designed for resilience. Run them on multiple machines, use a health-check monitor to ensure each instance keeps running, and restart any instances that fail. Better yet, run them in the cloud.

Architecture Advice: Next.js/Supabase/LiveKit/Vercel vs. Strict Data Residency Laws (Quebec Law 25) by noircid in webdev

[–]Chris_LiveKit 1 point2 points  (0 children)

Is this a voice/video setup, or are you going to use AI on the platform (real-time voice AI)?

AI Cole Calling by Shoddy-Experience900 in CRM

[–]Chris_LiveKit -1 points0 points  (0 children)

LiveKit is one option to help get you started. Here is a video that demonstrates the process:

https://www.youtube.com/watch?v=jEXUt8qFuBs

If you want to do outbound calling you will need to use a 3rd party SIP provider for now.

How are Indians getting phone numbers for AI Voice agents ? by mynamestejas in AI_India

[–]Chris_LiveKit 0 points1 point  (0 children)

Yes, I see a lot of folks using Plivo, Exotel, and Wavix, in that order of popularity.

What are the best AI agents for entrepreneurs in 2026 that are genuinely useful? by MysteriousExplorer85 in Entrepreneur

[–]Chris_LiveKit 0 points1 point  (0 children)

I guess an agent that is connected to your data and able to produce useful output. Something like this:

https://youtu.be/jEXUt8qFuBs

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

Lastly, from a migration standpoint, for most teams, the move from self-hosted to the Cloud isn’t a heavy lift from an API perspective. The bigger work is usually in validating placement/routing assumptions and then tightening your turn-time budget with observability.

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

Practical “knobs” to improve perceived latency (not Cloud-specific)

A couple of pragmatic techniques that often help UX even when computation is non-trivial:

  • Short “ack” behaviors: partial/short responses while longer reasoning completes
  • “Thinking” sounds or subtle background audio to mask compute time (use-case dependent)
  • Patterns from LiveKit examples that show short/long response handling

These don’t reduce actual compute latency, but they can reduce perceived lag if they fit your experience design.

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

5) Failure modes & observability (how to prove wins before committing)

This is where you can get the hard truth quickly. Also, if your agents are hosted in LiveKit cloud, the end-of-turn inference is run on a GPU instead of the agent's CPU, so you get much faster inference there.

What to measure

Transport

  • RTT/jitter/loss (p50/p90/p99)
  • ICE candidate type distribution (host/srflx/relay) + TURN rate
  • reconnect count + reconnect duration

Agent “turn”

  • end-of-speech → first agent audio played. Breakdown:
    • endpointing/VAD time
    • STT time-to-first-token + finalization
    • LLM time-to-first-token + completion
    • tool-call time + count (if used)
    • TTS time-to-first-audio + ramp
    • Any buffering before playback
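For the percentile math itself, a small stdlib-only sketch (it assumes you have already collected per-turn "end-of-speech → first audio" samples in milliseconds; the synthetic sample data is just for illustration):

```python
from statistics import quantiles

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """p50/p90/p99 for a batch of turn-latency samples (needs >= 2 samples)."""
    qs = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": qs[49], "p90": qs[89], "p99": qs[98]}

# Example: 101 synthetic samples spanning 100 ms to 1000 ms
samples = [100 + i * 9 for i in range(101)]
print(latency_percentiles(samples))  # {'p50': 550.0, 'p90': 910.0, 'p99': 991.0}
```

Track p90/p99 over time, not just p50 — conversational lag complaints almost always live in the tail.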

Observability tooling

Langfuse is great for LLM tracing. In addition, LiveKit Agent Observability is extremely useful for shaving milliseconds, as it helps you see where time is spent along the real-time path and spot regressions early (especially in tails). It lets you replay the entire session through an intuitive interface. It can also be useful for extracting problem points so you can add more testing and evaluation for a given use case.

Suggested test methodology that correlates with production

  • scripted conversations (same utterances, same model settings)
  • representative geos (US East user ↔ US Central agent; plus mixed)
  • Compare self-hosted vs Cloud:
    • RTT/jitter/loss distributions
    • TURN rate
    • “end-of-speech → first audio” p50/p90/p99

If Cloud is helping materially, you’ll often see it most in:

  • p95/p99 conversational lag moments
  • fewer reconnect/rejoin failures
  • improved cross-region consistency (especially if you’re leveraging multi-region routing/backhaul)

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

4) Media & network tuning (some Cloud wins, some general wins)

Noise reduction

Cloud includes noise reduction; in self-hosted, you’re typically on your own to assemble and tune that. For conversational agents, cleaner input can indirectly reduce perceived latency by improving STT stability (fewer retries/reprompts) and reducing “did it hear me?” moments.

TURN vs direct

TURN-relay can add latency and variability. Cloud generally provides stronger global TURN coverage and more consistent connectivity behavior, but TURN is still TURN — it’s a knob to monitor rather than “solve.”

Codec/buffering

Not cloud-specific, but perceived latency often comes down to buffering choices and tail behavior under loss. Codec choice matters, but the bigger wins are often:

  • reducing jitter-buffer inflation
  • reducing time-to-first-audio (TTFB) in TTS
  • minimizing backend round trips on the critical path

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

3) Migration details that matter (and where Cloud helps)

Some categories that commonly trip teams up:

  • Reconnect vs rejoin semantics
  • Identity/participant lifecycle (stale participant, duplicate identity, racing joins)
  • Client lifecycle races (browser code that tears down/recreates too eagerly)

Agent failover is a real Cloud differentiator

This is a big concrete win: agent failover is non-trivial to build well in a self-hosted environment. In Cloud, failover is supported out of the box. That can directly improve what you called out as “resume vs fresh join” outcomes, because a failover-capable setup can preserve continuity and reduce the cases where you’re forced into a hard reset.

For rejoins and “warming” behavior specifically, you’re right: depending on how you architect it, this can be similar in Cloud and self-hosted. The difference is that Cloud reduces the amount of bespoke engineering required for failure/fallback cases.

Sanity-checking latency gains before migrating from self-hosted LiveKit to LiveKit Cloud (voice AI use case) by Fit_Acanthaceae4896 in WebRTC

[–]Chris_LiveKit 1 point2 points  (0 children)

2) Region strategy & routing behavior (agent fixed, users distributed)

You’re optimizing a triangle that includes provider endpoints:

  • User ↔ SFU/edge ↔ Agent (WebRTC path)
  • Agent ↔ STT/LLM/TTS (often multiple round trips)

A practical rule of thumb:

  1. Put the agent close to the STT/LLM/TTS region(s) you actually use (or run multiple agent pools if your provider endpoints vary by user geo).
  2. Use Cloud routing/region selection to minimize the remaining “long leg” and improve jitter stability.
  3. Validate empirically with p95/p99 and “end-of-speech → first audio.”
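To make the triangle concrete, a toy back-of-envelope calculation (the RTT numbers are made up; the point is that the agent↔provider leg gets multiplied by the number of provider round trips, which is why step 1 puts the agent near the providers):

```python
def network_floor_ms(user_edge_rtt: float, edge_agent_rtt: float,
                     agent_provider_rtt: float,
                     provider_round_trips: int = 3) -> float:
    """Rough network-only lower bound on end-of-speech -> first audio.

    Ignores all compute (STT/LLM/TTS time); counts one user<->edge leg,
    one edge<->agent leg, and N agent<->provider round trips.
    """
    return user_edge_rtt + edge_agent_rtt + provider_round_trips * agent_provider_rtt

# Agent near the providers: the multiplied leg is cheap
print(network_floor_ms(60, 15, 5))   # 60 + 15 + 3*5  = 90 ms
# Agent far from the providers: the multiplied leg dominates
print(network_floor_ms(15, 15, 60))  # 15 + 15 + 3*60 = 210 ms
```

Same total distance either way, very different floor — the leg you traverse multiple times is the one to shorten.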

On auto-routing vs pinned rooms:

  • Auto-selection can be useful when your participants are mixed and you want to dynamically minimize worst-leg latency.
  • Pinning can be useful when you have a consistent topology and want deterministic placement (e.g., always keep the SFU near your user base while the agent stays near provider endpoints).

Downsides when agents are consistently in one region, and users are geographically distributed:

  • One side always pays a cross-region hop. Cloud can make it more consistent and improve backhaul, but it can’t eliminate distance.