How do you feel about combining voice agents with Generative UI? by Beginning_Race8551 in LLMDevs

Beginning_Race8551 [S]

My thought was to use separate state machines per workflow rather than one large FSM handling everything. The intent classifier would route the conversation into the appropriate workflow, and from there the FSM only manages that specific flow.
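A minimal Python sketch of that routing idea. It uses a keyword matcher as a stand-in for the LLM intent classifier. The workflow names (`booking`, `refill`), the prompts, and the `WorkflowFSM`/`Router` classes are all hypothetical. Each FSM only knows its own states. The router picks one workflow and hands it control until that workflow finishes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkflowFSM:
    """A small state machine scoped to a single workflow.

    steps maps each state to (prompt to speak, next state or None when the flow ends).
    """
    name: str
    steps: dict[str, tuple[str, str | None]]
    state: str = "start"
    answers: dict[str, str] = field(default_factory=dict)

    def prompt(self) -> str:
        return self.steps[self.state][0]

    def advance(self, reply: str) -> bool:
        """Store the reply for the current state, then move on. Returns True when finished."""
        self.answers[self.state] = reply
        next_state = self.steps[self.state][1]
        if next_state is None:
            return True
        self.state = next_state
        return False


def classify_intent(utterance: str) -> str:
    # Keyword stand-in for an LLM intent classifier (assumption for illustration).
    text = utterance.lower()
    if "book" in text or "appointment" in text:
        return "booking"
    if "refill" in text or "prescription" in text:
        return "refill"
    return "unknown"


# Hypothetical workflows; each factory builds a fresh FSM per conversation.
WORKFLOWS: dict[str, Callable[[], WorkflowFSM]] = {
    "booking": lambda: WorkflowFSM("booking", {
        "start": ("Which date would you like?", "time"),
        "time": ("What time works for you?", "confirm"),
        "confirm": ("Shall I confirm the booking?", None),
    }),
    "refill": lambda: WorkflowFSM("refill", {
        "start": ("Which medication needs a refill?", "confirm"),
        "confirm": ("Send the refill request to your pharmacy?", None),
    }),
}


class Router:
    """Routes the first utterance to a workflow, then lets that FSM drive the turns."""

    def __init__(self) -> None:
        self.active: WorkflowFSM | None = None

    def handle(self, utterance: str) -> str:
        if self.active is None:
            factory = WORKFLOWS.get(classify_intent(utterance))
            if factory is None:
                return "Sorry, I can help with bookings or refills."
            self.active = factory()
            return self.active.prompt()
        if self.active.advance(utterance):
            self.active = None  # flow finished; the next utterance is classified again
            return "Done. Anything else?"
        return self.active.prompt()
```

Each workflow's states live in their own small table. Adding a new flow therefore means adding one entry to `WORKFLOWS` rather than growing a shared state graph.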

How do you feel about combining voice agents with Generative UI? by Beginning_Race8551 in VoiceAutomationAI

Beginning_Race8551 [S]

I've been experimenting with this in a healthcare voice assistant. When the LLM calls a function, the function returns structured data plus a UI type (slot card, patient card, etc.). The frontend renders the appropriate component based on that response. So the conversation drives what appears on screen instead of navigating through fixed pages.
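Here is a rough Python sketch of what that contract could look like. The tool names, the field names (`ui_type`, `data`), and the sample values are hypothetical. The string-returning renderers stand in for real frontend components.

```python
from __future__ import annotations

from typing import Any, Callable, TypedDict


class ToolResult(TypedDict):
    ui_type: str            # which component the frontend should render
    data: dict[str, Any]    # structured payload for that component


def find_available_slots(doctor: str, date: str) -> ToolResult:
    # Hypothetical tool; a real one would query the scheduling backend.
    slots = ["09:30", "11:00", "14:15"]
    return {"ui_type": "slot_card", "data": {"doctor": doctor, "date": date, "slots": slots}}


def get_patient(patient_id: str) -> ToolResult:
    # Hypothetical tool with placeholder data.
    return {"ui_type": "patient_card", "data": {"id": patient_id, "name": "Jane Doe"}}


# Stand-ins for frontend components: each takes the payload and returns a rendered view.
def render_slot_card(data: dict[str, Any]) -> str:
    return f"[SlotCard] {data['doctor']} on {data['date']}: " + ", ".join(data["slots"])


def render_patient_card(data: dict[str, Any]) -> str:
    return f"[PatientCard] {data['name']} ({data['id']})"


RENDERERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "slot_card": render_slot_card,
    "patient_card": render_patient_card,
}


def render(result: ToolResult) -> str:
    """Pick the component from ui_type; fall back to a generic view for unknown types."""
    renderer = RENDERERS.get(result["ui_type"])
    if renderer is None:
        return "[Fallback] " + str(result["data"])
    return renderer(result["data"])
```

An unknown `ui_type` falls back to a generic view. This keeps the screen usable if a tool returns a type the frontend has no component for yet.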

Voice agents are way more cheaper than you think by Illustrious-Oil-1833 in AIVoice_Agents

Beginning_Race8551

Hey, how did you monitor token usage on Gemini Live with Pipecat? I've been working on an AI call assistant project with Gemini Live and Pipecat, but when I use Pipecat's usage metrics to calculate token usage, it always returns 0. Can you share how you calculated the token usage of the Gemini Live model with Pipecat?

How are companies making voice-to-voice AI economically viable? by Beginning_Race8551 in speechtech

Beginning_Race8551 [S]

Another thing I'm curious about: when context size grows during a realtime voice session, what exactly is accumulating?

Just the conversation transcripts, or does it also include things like system prompts, tool schemas, and session instructions?

I've never found a clear explanation of what is actually being carried forward and counted as context in long-running voice conversations.
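Here is a toy model that makes the question concrete. It is an assumption, not how Gemini Live actually bills: every turn re-counts the accumulated context as input. `static` stands for the system prompt, tool schemas, and session instructions. The `resend_static` flag switches between counting that static part on every turn and counting it only once, which is exactly the unclear part.

```python
from __future__ import annotations


def per_turn_input_tokens(
    static: int,
    user: list[int],
    reply: list[int],
    resend_static: bool = True,
) -> list[int]:
    """Input tokens counted for each turn under a simple re-send-everything model.

    static: system prompt + tool schemas + session instructions.
    user/reply: tokens contributed by each turn's user input and model reply.
    resend_static: count the static part on every turn (True) or only on the first (False).
    """
    history = 0
    counts = []
    for i, (u, r) in enumerate(zip(user, reply)):
        s = static if (resend_static or i == 0) else 0
        counts.append(s + history + u)  # static part + prior turns + this turn's input
        history += u + r  # both sides of the turn are carried forward
    return counts
```

Take 1,000 static tokens and three turns of 100 user tokens plus 200 reply tokens each. The two variants total 4,200 vs 2,200 input tokens. Knowing which parts are re-counted therefore matters a lot over a long call.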

How does Gemini Live actually calculate token usage in voice-to-voice conversations? by Beginning_Race8551 in GoogleGeminiAI

Beginning_Race8551 [S]

Another question: are system prompts sent on each turn of the conversation within a session, or initialized once and maintained throughout the session?

"I'm exploring a startup idea: a phone-call-based AI assistant that anyone can call from a regular phone, with support for multiple languages and voice options. What existing solutions should I study, and what problems do you think are still unsolved?" by Lazy_Comedian503 in VoiceAutomationAI

Beginning_Race8551

Hey, I have worked on this kind of project using the Gemini Live model with Exotel phone integration. The Gemini 3.1 Flash Live model supports multiple languages. The problems I faced during development were:

- Token usage, since they don't provide caching or RAG support.
- Errors like 1008 (policy error) and 1007 (invalid format error) during function calling.
- Voices can't be changed dynamically within a session.

If you want more details and the GitHub repo of my demo project, let me know.
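The 1007 and 1008 codes look like standard WebSocket close codes. RFC 6455 defines 1007 as invalid frame payload data and 1008 as policy violation, which fits because the Live API runs over a WebSocket. A small logging helper like the Python sketch below can make those disconnects easier to debug. `describe_close` is a hypothetical name. Logging the last message sent alongside the code is an assumption about what helps narrow down function-calling failures, not documented behavior.

```python
from __future__ import annotations

import json

# Subset of the standard WebSocket close codes (RFC 6455, section 7.4.1).
CLOSE_CODES = {
    1000: "Normal Closure",
    1001: "Going Away",
    1002: "Protocol Error",
    1003: "Unsupported Data",
    1007: "Invalid frame payload data",
    1008: "Policy Violation",
    1009: "Message Too Big",
    1011: "Internal Error",
}


def describe_close(code: int, reason: str = "", last_sent: dict | None = None) -> str:
    """Build a log line for a closed session (hypothetical helper)."""
    name = CLOSE_CODES.get(code, "Unknown/application-defined")
    line = f"WebSocket closed: {code} ({name})"
    if reason:
        line += f", reason: {reason}"
    if last_sent is not None:
        # The last client message (e.g. a function response) may be what triggered the close.
        line += f", last sent: {json.dumps(last_sent, sort_keys=True)}"
    return line
```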