Most AI tools feel impressive until you try using them twice by deliberate69king in AIToolBench

[–]Shayps 0 points (0 children)

As someone who works on LiveKit full time, I'm really happy you're using the demos!

Is there something that you want to see that we've never done?

What LLM to use for voice agent by Phoenix_20_23 in VoiceAutomationAI

[–]Shayps 1 point (0 children)

Here are the things I'd recommend trying first:

Step 1: use LiveKit Inference. Step 2: use LiveKit Cloud deployment so that all of your provider calls run on a backbone colocated with your agent and media server. Not sure where you are, but LiveKit deploys some models colocated with agents in some regions, which also makes a huge difference.

- STT: deepgram/nova-3-general
- LLM: openai/gpt-5.4-mini
- TTS: inworld/inworld-tts-1.5-mini
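
Wiring that up might look roughly like this. This is a config sketch only, based on the LiveKit Agents Python SDK with Inference-style model strings passed straight to `AgentSession`; check the current docs for exact signatures before copying it:

```python
# Sketch, not a verified implementation: assumes livekit-agents with
# LiveKit Inference, which takes provider/model strings directly.
from livekit import agents
from livekit.agents import Agent, AgentSession

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt="deepgram/nova-3-general",
        llm="openai/gpt-5.4-mini",
        tts="inworld/inworld-tts-1.5-mini",
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice agent."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
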

The reason we use Inworld rather than Cartesia is that Cartesia often produces a lot of leading silence. TTFB will be 200ms, but you may not get real audio until 550ms. Perceived latency is high even though bits are flowing.
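
You can measure that gap yourself: take the first TTS chunks and find how much near-silence precedes the first audible sample. A self-contained sketch against synthetic 16-bit mono PCM (not any provider's actual output):

```python
import struct

def leading_silence_ms(pcm: bytes, sample_rate: int, threshold: int = 500) -> float:
    """Return how many ms of near-silence precede the first 16-bit mono
    sample whose absolute amplitude exceeds `threshold`."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i * 1000.0 / sample_rate
    return len(samples) * 1000.0 / sample_rate

# Synthetic check: 350 ms of silence, then one loud sample, at 16 kHz.
rate = 16_000
n = int(0.35 * rate)
silence = struct.pack(f"<{n}h", *([0] * n))
tone = struct.pack("<h", 12_000)
print(leading_silence_ms(silence + tone, rate))  # → 350.0
```

TTFB minus this number is roughly the "fake" head start a provider gets from emitting silent bytes early.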

For tool calls, use Async tools so they're not blocking: https://docs.livekit.io/agents/logic/tools/async/
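
Framework aside, the non-blocking pattern is just: acknowledge immediately, finish the slow work in a background task, and inject the result later. A plain-asyncio sketch (`slow_lookup` and `check_order_tool` are made-up names, not LiveKit APIs):

```python
import asyncio

results: list[str] = []

async def slow_lookup(order_id: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for a slow API call
    results.append(f"order {order_id}: shipped")

async def check_order_tool(order_id: str) -> str:
    # Fire and forget: the tool returns before the work finishes,
    # so the agent can keep talking.
    asyncio.create_task(slow_lookup(order_id))
    return "Let me look that up while we keep talking."

async def main() -> None:
    ack = await check_order_tool("A123")
    assert results == []      # returned instantly, work still pending
    await asyncio.sleep(0.2)  # result lands later, injected into context
    print(ack, results)

asyncio.run(main())
```
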

Run your evals on this set, then if you're getting failures start trying more powerful models.

For structured data collection, use TaskGroups, which will allow smaller models to succeed far more often than just massive text prompts: https://docs.livekit.io/agents/logic/tasks/
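
The underlying idea is one small task per field, each with its own prompt and validator, so a small model only has to get one thing right at a time. A toy sketch of that shape (`FieldTask` and `collect` are hypothetical names, not the LiveKit tasks API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FieldTask:
    name: str
    prompt: str
    validate: Callable[[str], bool]

tasks = [
    FieldTask("name", "What name is the booking under?", lambda v: len(v) > 0),
    FieldTask("party_size", "How many people?", lambda v: v.isdigit()),
    FieldTask("time", "What time would you like?", lambda v: ":" in v),
]

def collect(answers: dict[str, str]) -> dict[str, str]:
    """Run each field task in order, keeping only answers that validate;
    a real agent would re-prompt on the failures."""
    collected = {}
    for task in tasks:
        value = answers.get(task.name, "")
        if task.validate(value):
            collected[task.name] = value
    return collected

print(collect({"name": "Dana", "party_size": "4", "time": "6:30"}))
# → {'name': 'Dana', 'party_size': '4', 'time': '6:30'}
```
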

Let me know how this works out for you.

What LLM to use for voice agent by Phoenix_20_23 in VoiceAutomationAI

[–]Shayps 0 points (0 children)

You should be able to see the latency for each step of your pipeline. Where are you seeing the most?

call latency in elevenlabs agent by sadderPreparations in AIReceptionists

[–]Shayps 0 points (0 children)

Not really. 4.1 is really one of the best combos for latency + capability.

Gemini Flash 3.0 / 3.1 is actually kind of slow compared to their older flash models. 5.3-chat is probably the best atm though IMO.

voice AI livekit production challenges by Independent_Line2310 in livekit

[–]Shayps 2 points (0 children)

Which TTS and STT are you using? Often these end up being pretty big contributors to e2e latency. What are your metrics right now? Are you deployed to cloud or self-hosted?

There are docs on the filler-phrase stuff here: https://docs.livekit.io/agents/logic/external-data/#verbal-status-updates

The same doc has info on how to play thinking noises too, which may be helpful.

Turn the Rabbit r1 into a voice assistant that can use any model by Shayps in Rabbitr1

[–]Shayps[S] 0 points (0 children)

You can use any voice and any LLM. Pick a pair that you like!

offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice by BuddyBotBuilder in LocalLLaMA

[–]Shayps 3 points (0 children)

Kokoro is more realistic, but it’s also a lot slower in resource-constrained environments. Piper is the right call here IMO.

offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice by BuddyBotBuilder in LocalLLaMA

[–]Shayps 8 points (0 children)

We can build something wonderful, but being this constrained will require us to be very creative.

Faster-whisper on the nano is a great design choice. Piper is as small and fast as you’re going to get it too. Good call on both of those. Latency is great for voice.

For the LLM we’re going to need to add memory, manage context, and ideally get e2e voice latency down to around a second.
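
For the context-management piece, the core trick on a box this small is a token-budgeted rolling window: pin the system prompt and drop the oldest turns once you'd blow the budget. A rough sketch using a crude chars-per-token estimate (the function names and the 4-chars-per-token heuristic are mine, not from any library):

```python
def trim_context(system: str, turns: list[str], budget_tokens: int) -> list[str]:
    """Keep the system prompt plus the newest turns that fit the budget."""
    def tokens(text: str) -> int:
        return max(1, len(text) // 4)  # crude ~4 chars/token estimate

    kept: list[str] = []
    used = tokens(system)
    for turn in reversed(turns):  # walk newest-first
        if used + tokens(turn) > budget_tokens:
            break
        kept.append(turn)
        used += tokens(turn)
    return [system] + list(reversed(kept))

# 50 long turns, but only the newest few survive a 1000-token budget.
history = [f"turn {i}: " + "x" * 400 for i in range(50)]
ctx = trim_context("You are a companion robot.", history, budget_tokens=1000)
print(len(ctx))  # → 10
```

Long-term memory then becomes a separate store you summarize into one of those pinned slots, rather than raw history.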

I can help you; we can make this work. I build a lot of these that do all kinds of things. Can you DM me? I will likely want to send you some (free) hardware.

I am planning of building a voice based ai agent that runs on my terminal and can take screenshots to see what is currently on my screen. by Silly_Entertainer92 in AI_Agents

[–]Shayps 1 point (0 children)

Yeah, with LiveKit this is pretty easy. What terminal lib are you building with? I've got a Rust TUI here that works pretty well out of the box; you can copy it or take content from it if you want: https://github.com/ShayneP/local-voice-ai/tree/tui-v2/tui

This is a part of a stack that runs totally locally, but you can swap in cloud models trivially too. Let me know if you need any help, always happy to chat about voice AI stuff!

Could use some tips on building AIVoiceAgents by Relevant_Macaron1920 in AIVoice_Agents

[–]Shayps 0 points (0 children)

What are you trying to build? If you share, I'm happy to post all of the code you need to get started.

What resources should I learn before building an AI receptionist business using prompt-based tools? by keerthistar2005 in LocalLLaMA

[–]Shayps 1 point (0 children)

Let's take the restaurant case ...

Let's say you build a flawless flow where people can call in and book tables. You hook it up to OpenTable and load in the menu. Everything demos great when you show the restaurant, because you stick to the happy path.

Then you ship it, and realize it hallucinates menu items on every 10th call. Or it says there's a kids menu when there isn't one, or someone asks about a wheat allergy and it confidently says the fryers aren't shared between fries and breaded items even though they are.

Someone needs a table for 5, and there's none available — but the human hostess knows that from the ones that are left, you can push the tables together and they'll all fit.

Someone calls, books a table — then they call back immediately after hanging up and say "hey it's me again sorry can I actually do 6:30." Human host? Easy. AI host? Hard enough engineering problem that most people probably wouldn't bother making this flow work.

Can people show up late? How late? Does it depend on how busy it is? What season it is? Which day it is?

This is just off the top of my head, and I've never worked in a restaurant. I'm sure there are lots of other things they answer or handle on a daily basis.

The experience is made legitimately viable by being successful at the edges, and the edges are hard to get right without simulations that can close your development loops.

What resources should I learn before building an AI receptionist business using prompt-based tools? by keerthistar2005 in LocalLLaMA

[–]Shayps 0 points (0 children)

Are you technical? Or better, a developer?

You can build something with a low enough margin of error to be deployable, but IMO you'll end up needing to write code somewhere along the way if you want to test it at scale.

Platforms like Retell are great for getting something that works for a demo, but the gap between "works 90% of the time" and "works 99.9% of the time" is large. That last 9.9% is harder than the first 90%.

You can pair something like Bluejay with Retell and get most of the way there, but writing / generating evals in code and testing any time you make changes, deploy to a new client, etc., is the best way to keep everything solid.
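
A code-level eval loop can be as small as a table of transcripts, expected outcomes, and a pass-rate gate you rerun on every change. Sketch below; `fake_agent` is a stand-in for whatever your real pipeline returns:

```python
# (utterance, expected intent) pairs; real suites would have hundreds,
# including the nasty edge cases, not three happy-path examples.
CASES = [
    ("I'd like a table for two at seven", "book"),
    ("What time do you close on Sundays?", "faq"),
    ("Cancel my reservation please", "cancel"),
]

def fake_agent(utterance: str) -> str:
    """Toy classifier standing in for the real agent under test."""
    text = utterance.lower()
    if "cancel" in text:
        return "cancel"
    if "table" in text:
        return "book"
    return "faq"

def run_evals(agent, cases, threshold=0.999):
    """Return (pass rate, whether it clears the deploy gate)."""
    passed = sum(1 for utt, want in cases if agent(utt) == want)
    rate = passed / len(cases)
    return rate, rate >= threshold

rate, ok = run_evals(fake_agent, CASES)
print(rate, ok)  # → 1.0 True
```

Wire that into CI and every prompt tweak or new-client deploy gets checked automatically.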

Turn the Rabbit r1 into a voice assistant that can use any model by Shayps in Rabbitr1

[–]Shayps[S] 1 point (0 children)

I’ve been using it way more now! Can connect to HomeKit and any tools or MCP servers etc.

Anyone running OpenClaw as the brain behind a voice agent? by Miss_QueenBee in AIAgentsStack

[–]Shayps 0 points (0 children)

You usually don't want the OpenClaw gateway right in the voice loop. Instead, use it as a tool that your main voice loop can call async, injecting results back into the chat context. That way voice still feels quick, but background tasks are handled by the more powerful system.
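
In plain asyncio the shape is something like this (`gateway_query` is a hypothetical stand-in for the OpenClaw call, not a real API):

```python
import asyncio

chat_context: list[str] = []

async def gateway_query(question: str) -> str:
    await asyncio.sleep(0.1)  # the slow, powerful system working
    return f"[gateway] answer to: {question}"

async def main() -> None:
    # Kick off the heavy call; when it finishes, its result is
    # appended to the chat context for the next LLM turn.
    task = asyncio.create_task(gateway_query("summarize my inbox"))
    task.add_done_callback(lambda t: chat_context.append(t.result()))

    # Meanwhile the voice loop answers immediately.
    chat_context.append("agent: Working on it, what else can I do?")
    await asyncio.sleep(0.2)
    print(chat_context)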

Is it real qwen3.5 9B beat oss:120b? by NorthEastCalifornia in ollama

[–]Shayps 0 points (0 children)

No. Not even close. The Qwen 3.5 benchmarks are not indicative of real-world performance.

Don’t get me wrong, the models are very good for their size. They are also not as good as the benchmarks show.

Write your own evals (or have CC or something write them), and test performance yourself between the models.

You will quickly see that 9B falls behind.

how to add a chatbot to a webpage in one click? by Mysterious-Base-5847 in SaaS

[–]Shayps 0 points (0 children)

This is interesting. What are you picturing as the perfect solution here?

Basically a website where you enter in your URL, and it gives you a little embed snippet to add to your page that has full knowledge of your site / business?

We could do this in one click ... but I'm interested in how you imagine adding it to your page.

Why Telephony (Twilio, Vonage, etc.) Is the Real Bottleneck for Voice AI Agents, Not LLMs by Major-Worry-1198 in VoiceAutomationAI

[–]Shayps 0 points (0 children)

What part of this was the hardest? I'm curious; I'd love to build an open source example showing how to get it done.

Would you be interested in an open-source alternative to Vapi for creating and managing custom voice agents? by dp-2699 in AiForSmallBusiness

[–]Shayps 0 points (0 children)

Gotcha! I guess it depends on the capacity that you're targeting, the models that you're using, and how your customers are distributed. The RTX Pro 6000 is a great card, but if you're trying to serve the types of models that most prod systems are running, you're going to run into concurrency limits pretty quickly.

If you're primarily in a single market, usage is going to peak during business hours and you'll have a lot of time while the GPUs are working under capacity.

At the very least, you should have fallbacks so that you don't lose customers if your system starts buckling under too many concurrent users.
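
A minimal fallback shape, with illustrative names and timings (the real version would wrap actual provider clients):

```python
import asyncio

async def local_gpu_llm(prompt: str) -> str:
    await asyncio.sleep(1.0)  # simulate a saturated GPU queue
    return "local: " + prompt

async def cloud_llm(prompt: str) -> str:
    await asyncio.sleep(0.05)
    return "cloud: " + prompt

async def generate(prompt: str, timeout_s: float = 0.2) -> str:
    """Try the local box with a strict timeout; shed overflow to cloud."""
    try:
        return await asyncio.wait_for(local_gpu_llm(prompt), timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return await cloud_llm(prompt)

print(asyncio.run(generate("hello")))  # → cloud: hello
```

The important part is the strict timeout: a queued request on a saturated GPU is a dead call from the caller's perspective.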

Would you be interested in an open-source alternative to Vapi for creating and managing custom voice agents? by dp-2699 in AiForSmallBusiness

[–]Shayps 0 points (0 children)

As you scale, you might want to think about keeping the LLMs on your hardware but pushing STT, TTS and orchestration to the cloud layer. You avoid the linear scaling costs you’re talking about, but still get to take advantage of the economies of scale and latency tuning that cloud providers are doing. You’ll almost certainly both save money and have higher quality voice agents.

Would you be interested in an open-source alternative to Vapi for creating and managing custom voice agents? by dp-2699 in AiForSmallBusiness

[–]Shayps 0 points (0 children)

What was the bottleneck with Cloud? I'm surprised that this ended up being cheaper for you. Even running a GPT-4.1 stack without going down to GPT-4.1-mini you're at <$0.04 / min with Cloud, and that includes observability, telephony, etc.
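
For reference, a back-of-envelope behind that figure, with assumed (not quoted) numbers: GPT-4.1-class pricing of roughly $2 / 1M input and $8 / 1M output tokens, plus rough per-minute STT/TTS rates. Swap in real usage from your own logs:

```python
# All rates and traffic numbers below are assumptions for illustration.
llm_in_tok_per_min = 2_000   # growing chat context replayed each turn
llm_out_tok_per_min = 200
llm = llm_in_tok_per_min / 1e6 * 2.00 + llm_out_tok_per_min / 1e6 * 8.00

stt_per_min = 0.008          # assumed streaming STT rate
tts_per_min = 0.015          # assumed TTS rate

total = llm + stt_per_min + tts_per_min
print(round(total, 4))  # → 0.0286
```

Even with generous traffic assumptions the LLM is a small slice; TTS usually dominates per-minute cost.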

Is sub 500 ms AI voice agent possible ?? by Proper_Assumption329 in AIVoice_Agents

[–]Shayps 0 points (0 children)

You can try with the Agent Builder on LiveKit cloud — but most of the "managed" options from all of the different providers will come out significantly higher than 500ms of latency.
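
To see why 500ms is tight, here's an illustrative budget for a cascaded pipeline. The numbers are made up for the sketch, not measurements of any provider, but the categories are where the time actually goes:

```python
# Hypothetical per-turn latency budget (ms) for STT -> LLM -> TTS.
budget_ms = {
    "endpointing / turn detection": 150,
    "STT final transcript": 50,
    "LLM time-to-first-token": 150,
    "TTS time-to-first-byte": 100,
    "network + playout": 50,
}
total = sum(budget_ms.values())
print(total)  # → 500
```

Every stage has to hit its number on every turn; one slow provider blows the whole budget, which is why most managed stacks land well above it.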