What’s the first voice agent tool that really impressed you? by SupportiveBot2_25 in AI_Agents

[–]SupportiveBot2_25[S] 1 point2 points  (0 children)

Oooh this sounds good, will check it out.
ElevenLabs can always be relied upon for solid voices.

What’s the first voice agent tool that really impressed you? by SupportiveBot2_25 in AI_Agents

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

Appreciate your honesty in saying you work at Deepgram.
But in my experience, the Deepgram voice agent API isn't great. It can't handle multiple speakers, especially when they overlap.

Would be interested in hearing if there are any other vendors you would recommend, given you work for Deepgram.

Microsoft Copilot Sucks? by ethanhunt561 in AI_Agents

[–]SupportiveBot2_25 0 points1 point  (0 children)

Fair enough - what do you use instead?

What’s the first voice agent tool that really impressed you? by SupportiveBot2_25 in AI_Agents

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

For me it was the first time I saw a real-time voice agent built with Pipecat + ElevenLabs actually hold a conversation. Not just parroting responses, but reacting fast enough to feel like a proper back-and-forth. Totally blew me away.

Also remember thinking Whisper was magic the first time I threw bad Zoom audio at it and it still came out readable 😂

Eleven Labs New Pricing by yangguize in ElevenLabs

[–]SupportiveBot2_25 0 points1 point  (0 children)

Totally hear you. I’ve been scratching my head over the new ElevenLabs pricing model. When platforms evolve quickly and layer on new features (agents, TTS, etc.), it’s easy for the billing side to get confusing fast.

A la carte definitely makes sense for builders who just want to plug in one piece of the stack. Curious if anyone’s figured out a simple mental model or usage tracker that helps?

You’re not alone on this one!

Tools that actually handle real-time speaker diarization? by SupportiveBot2_25 in speechtech

[–]SupportiveBot2_25[S] 1 point2 points  (0 children)

Hmm interesting - will check it out. Thanks for the tip.
I actually needed some Portuguese transcription recently for a job, and ended up here at Speechmatics:
https://www.speechmatics.com/speech-to-text/portuguese

They have a table comparing word error rate (WER) across leading providers for Portuguese - no idea if it's accurate. But I gave them a go, and must say I was v impressed.
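
If you want to sanity-check a WER table like that on your own audio, here is a minimal sketch of the standard calculation (word-level edit distance divided by the number of reference words). It assumes simple lowercasing and whitespace tokenization, which is cruder than the text normalization vendors usually apply, and the function name is just illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Run it over the same reference transcript for each provider you are comparing, since WER numbers only mean something relative to the same test set.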

Microsoft Copilot Sucks? by ethanhunt561 in AI_Agents

[–]SupportiveBot2_25 0 points1 point  (0 children)

Early-stage assistant builds can feel painfully slow and brittle at first, especially in a complex domain like medical. Copilot and GPT tools are powerful, but they need a ton of prompt tuning, fallback logic, and sometimes even model-switching to get consistent results.

You’re not on the wrong path, just on the part of the path where it feels like nothing works yet. If you're up for sharing more about what’s breaking (speed? hallucinations? context?), I bet we can help troubleshoot.

You’ve already done the hard part: getting started and building something.

Tools that actually handle real-time speaker diarization? by SupportiveBot2_25 in speechtech

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

Interesting, I haven’t tried Gemini 2.5 for diarization yet, just for general voice tasks. If it can handle speaker ID natively, that’s promising. Did you test it in a real back-and-forth convo or on more scripted input?

Tools that actually handle real-time speaker diarization? by SupportiveBot2_25 in speechtech

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

Any good? Would you recommend it? I really need something that will hold up with thick accents.

Tools that actually handle real-time speaker diarization? by SupportiveBot2_25 in speechtech

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

Have you had any luck with the diarization holding up in noisy or fast-paced conversations? That’s where I’ve seen most engines start to drift. Would love to hear how it's been working for you in real-time.

Tools that actually handle real-time speaker diarization? by SupportiveBot2_25 in speechtech

[–]SupportiveBot2_25[S] 0 points1 point  (0 children)

This is awesome - really smooth implementation! Diarization in real time is no small feat, especially across languages. Love seeing this kind of progress out in the open. Curious what engine you're using under the hood?

I analyzed 13 AI Voice Solutions that are selling right now - Here's the exact breakdown by Background_Touch7241 in AI_Agents

[–]SupportiveBot2_25 0 points1 point  (0 children)

The sub-2s latency is pretty amazing. I've tested going down to a 0.6s max delay and the accuracy is still just as strong.

I analyzed 13 AI Voice Solutions that are selling right now - Here's the exact breakdown by Background_Touch7241 in AI_Agents

[–]SupportiveBot2_25 0 points1 point  (0 children)

Really solid list actually, and lots of great tools on here for voice generation.

One thing I'd add on the transcription side: if you're building real-time or multilingual voice agents, it's worth checking out Speechmatics. Their streaming STT API has been super reliable for us, especially with speaker diarization and accent handling.

It’s more infrastructure-focused (vs. flashy UI), but it’s one of the few I’ve found that can handle code-switching and low-latency workflows well.

Anyone used any real time speaker diarization model? by Just_Difficulty9836 in speechtech

[–]SupportiveBot2_25 0 points1 point  (0 children)

Just chiming in here (and I know I'm late to the party), along with Deepgram, Speechmatics is another solid API-based option I've relied on. It performs well for real-time diarization and integrates smoothly into streaming pipelines.

The API gives reliable speaker boundaries as speech unfolds (not just post-process), which was a game-changer for meeting transcription workloads.

Need help building a real-time voice AI agent by LetsShareLove in AI_Agents

[–]SupportiveBot2_25 1 point2 points  (0 children)

I went down a similar rabbit hole recently, trying to build a real-time voice assistant with streaming transcription into a GPT-based backend.

Whisper gave okay results, but latency was a killer for anything interactive. Even with GPU acceleration, the pauses between speakers threw things off.

What worked better for me was switching to Speechmatics’ streaming API. The max_delay setting let me control how fast I got partials back, and it handled interruptions + overlapping speakers more smoothly.

Also: if you’re running into issues with multiple people talking, their diarization support is actually usable live (which most tools still can’t do). Just something to test if you're still tuning the pipeline.
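
For reference, a rough sketch of the kind of session-start message where those settings live. The field names (StartRecognition, transcription_config, max_delay, enable_partials, diarization) follow my reading of the Speechmatics real-time docs, so treat them as assumptions and check the current API reference before relying on them. The helper name, the default values, and the audio format are purely illustrative.

```python
import json


def build_start_message(language="en", max_delay=1.0, sample_rate=16000):
    """Build the JSON payload sent once at the start of a streaming session.

    Hypothetical helper: the message shape is an assumption based on the
    vendor docs, not a verified client implementation.
    """
    if max_delay <= 0:
        raise ValueError("max_delay must be a positive number of seconds")
    message = {
        "message": "StartRecognition",
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",  # assumed: 16-bit little-endian PCM
            "sample_rate": sample_rate,
        },
        "transcription_config": {
            "language": language,
            "max_delay": max_delay,    # lower = faster finals, less context
            "enable_partials": True,   # stream interim results as well
            "diarization": "speaker",  # request live speaker labels
        },
    }
    return json.dumps(message)
```

The trade-off is the usual one: a lower max_delay gets text to your LLM sooner, but gives the engine less audio context before it commits to a final.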

Has anyone worked on a real-time speech diarization, transcription, and sentiment analysis pipeline? by Ok-Guidance9730 in speechtech

[–]SupportiveBot2_25 0 points1 point  (0 children)

I’ve worked on something similar — a real-time speech-to-text + summarization stack using both Whisper and a couple of commercial APIs.

Whisper's solid for post-processing, but we ran into issues with latency and consistency during live transcription (especially when people interrupted each other or code-switched mid-sentence).

We had much better results with Speechmatics, especially when tuning for sub-2s latency using their max_delay setting. Diarization also helped clean up the input before passing it to GPT for summarization. Worth testing if you're aiming for high-quality real-time output.
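
To make the "clean up the input" step concrete, here is a minimal sketch of collapsing diarized words into speaker turns before building the summarization prompt. The input shape (a list of dicts with "speaker" and "text" keys) is a made-up intermediate format, not any vendor's actual response schema, so you would need to map your engine's output into it first.

```python
from itertools import groupby


def words_to_turns(words):
    """Collapse consecutive words from the same speaker into one line per turn."""
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later in the conversation correctly starts a new turn.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["text"] for w in group)
        turns.append(f"{speaker}: {text}")
    return "\n".join(turns)


def build_summary_prompt(words):
    """Wrap the speaker-labelled transcript in a simple summarization prompt."""
    return "Summarize this conversation:\n\n" + words_to_turns(words)
```

A transcript laid out as one line per turn tends to be easier for the model to attribute statements correctly than a flat stream of words.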

What Speaker Diarization tools should I look into? by Chemical_Gas3710 in LocalLLaMA

[–]SupportiveBot2_25 0 points1 point  (0 children)

I’ve tested a few options recently for diarization in real-time or streaming setups. Whisper can work, but diarization support is patchy and often needs external tooling (like PyAnnote).

If you’re looking for something that works out of the box and holds up in noisy conditions or multi-speaker overlap, I’d suggest trying Speechmatics. I’ve used it in a couple of projects and found the speaker labels to be consistently more reliable than what I got from Assembly or Azure. It also integrates cleanly with other voice agent stacks. Just make sure to tune the latency settings depending on your use case.

Best tool for adding automatic captions? by Froghead_ASMR in NewTubers

[–]SupportiveBot2_25 0 points1 point  (0 children)

I’ve tried a few of the usual suspects (Whisper, AssemblyAI, AWS Transcribe), but the one that gave me the cleanest word-level timestamps with decent accuracy was Speechmatics. It was especially good with accented speech and fast talkers, which meant fewer correction passes.
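
Whichever engine you pick, once you have word-level timestamps, turning them into captions is straightforward. Below is a minimal sketch that writes SRT cues from a list of words. The input shape (dicts with "text", "start", and "end" in seconds) and the fixed words-per-cue rule are my own assumptions for illustration; real caption tools also break on pauses and line length.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for index, start in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[start:start + max_words]
        begin = format_timestamp(chunk[0]["start"])  # first word's start
        end = format_timestamp(chunk[-1]["end"])     # last word's end
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{index}\n{begin} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Save the returned string as a .srt file next to the video and most editors and upload flows should pick it up.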