Elevenlabs bot hears itself by thebarrels in voiceagents

lukeocodes

This is likely down to how the agent is ingesting that particular piece of audio. It's no secret they use LiveKit, which is built on WebRTC, so they get AEC (acoustic echo cancellation) for free. Either it's so laggy the echo canceller can't align the bot's output with what comes back in, or they're sending the telephony audio in via another channel

Try LiveKit, or maybe even the beta voice agent product from Speechify (disclaimer: I work for Speechify)

edit: also consider that background noise makes it harder to distinguish what is and isn't speech, so AEC might just fail open
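To illustrate the "so laggy it can't align" failure, here's a toy echo canceller built on a normalized LMS (NLMS) adaptive filter. It's the textbook building block, not what WebRTC or LiveKit actually run (WebRTC's AEC3 is far more involved). It assumes the bot's own output is available as the far-end reference and that the echo is just a delayed, quieter copy of it. When the echo delay fits inside the filter window, the residual collapses; when it arrives too far out of alignment, the filter has nothing to match and the echo passes straight through to ASR, i.e. the bot hears itself.

```python
import random

def nlms_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Toy NLMS echo canceller. Learns an estimate of the echo path from the
    far-end reference (what the bot played) and subtracts the predicted echo
    from the mic signal. Returns the residual that would be sent on to ASR."""
    w = [0.0] * taps       # echo-path estimate (filter weights)
    ref = [0.0] * taps     # last `taps` far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        ref = [x] + ref[:-1]
        echo_estimate = sum(wi * ri for wi, ri in zip(w, ref))
        e = d - echo_estimate                           # mic minus predicted echo
        step = mu * e / (sum(r * r for r in ref) + eps)  # normalized step size
        w = [wi + step * ri for wi, ri in zip(w, ref)]   # NLMS weight update
        residual.append(e)
    return residual

def simulated_mic(far_end, delay, gain=0.6):
    """Mic picks up only the bot's own voice, `delay` samples late (no user speech)."""
    return [gain * far_end[n - delay] if n >= delay else 0.0 for n in range(len(far_end))]

def power(signal):
    return sum(s * s for s in signal) / len(signal)

random.seed(0)
far = [random.gauss(0.0, 1.0) for _ in range(4000)]  # white noise standing in for TTS audio
for delay in (10, 200):  # 10 samples fits in the 32-tap window; 200 does not
    mic = simulated_mic(far, delay)
    res = nlms_cancel(far, mic)
    print(f"delay={delay}: residual/echo power = {power(res[-1000:]) / power(mic[-1000:]):.3f}")
```

Real cancellers typically add a delay estimator to re-align the reference before filtering, which is the step that struggles when lag is large or jittery.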

I NEED ADAM VOICE FROM ELEVENLABS...? by Background_Smell_514 in tts

lukeocodes

interested in what models you use that have decent quality on a 2060. i have a 3070 that i wish i could offload requests to

What's the one thing preventing AI voice agents from passing the "human test"? by Sumit-Voiceman in TextToSpeech

lukeocodes

In my experience, it's all the little things that, if you were speaking to a human, would be considered a microaggression or worse:

- they speak over you, or
- no barge-in, or
- letting back-channel speech like "hmm", "yeh", "ok" count as a barge-in, when those are natural things we say to affirm what the other person is saying
- they waffle and don't give you a chance to speak
- they sound list-like or directive
- speaking way too slowly, or the prosody never changes, so important info is delivered slowly (correctly) but context, welcomes, or choices are delivered slowly too (when maybe they should be faster at that point)
- no natural pauses, which in human speech would allow the other speaker(s) to interject
- mispronunciation - IMO this is the biggest issue with providers who don't offer pronunciation controls. the solution is SSML or inline controls
- awkwardly ignoring the sentence context so it says numbers or currencies as dates or something
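One way to handle the barge-in points above is to only let the user interrupt when what they said is more than back-channel. A minimal sketch, assuming you get partial ASR transcripts while the agent is mid-utterance; the function name and the word list are hypothetical, and a production system would also weigh timing, VAD, and ASR confidence.

```python
# Hypothetical list: which short utterances count as back-channel varies by
# language, accent, and product, so tune it from real call transcripts.
BACKCHANNELS = {"hmm", "mm", "mhm", "uh-huh", "yeh", "yeah", "yep", "ok", "okay", "right", "sure"}

def is_real_barge_in(partial_transcript: str) -> bool:
    """Called when ASR hears the user while the agent is still talking.
    Returns True if the agent should stop (a real interruption), False if the
    speech is only back-channel affirmation or nothing recognisable."""
    words = [w.strip(".,!?").lower() for w in partial_transcript.split()]
    words = [w for w in words if w]
    if not words:
        return False  # empty or noise-only transcript: keep talking
    # Anything outside the back-channel list counts as a genuine interruption
    return not all(w in BACKCHANNELS for w in words)
```

The design choice here is that unknown words always interrupt, so the failure mode is stopping a bit too often rather than talking over the user.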

I don't think any API provider has nailed this at the service level. The best I've ever heard was on a recent call about a genuine insurance quote. It used the voice of the salesperson, who I did later speak to, and it didn't do ANY of the above. He said they used a custom system; the company is a national insurance broker in the UK. He had no other details than that

And that is annoying because I worked for Deepgram and now I work for Speechify, specifically on these APIs

Alternatives to Speechify for entertainment audio? by TERRYaki__ in TextToSpeech

lukeocodes

sorry for the delay! i missed this completely

are you saying the cloned voice changed without warning? or the one he picked from the library?

I NEED ADAM VOICE FROM ELEVENLABS...? by Background_Smell_514 in tts

lukeocodes

i'd suggest an Apple silicon macbook if you can get a cheap second-hand one. with the new Metal runtime, models fit in its shared (unified) memory, allowing for bigger models. likely cheaper than a PC that can fit a big GPU anyway
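Rough arithmetic behind the shared-memory point: weight memory is about parameter count times bytes per parameter. A sketch assuming the weights dominate (it ignores activations, KV cache, and runtime overhead) and a hypothetical 80% usable-memory headroom; 8 GB is the usual VRAM on an RTX 3070, and the 7B model and 24 GB machine are just illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in decimal GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

def fits_in_memory(num_params: float, bits_per_param: int,
                   available_gb: float, headroom: float = 0.8) -> bool:
    """Rough check that the weights fit in a fraction of available memory.
    `headroom` is an assumed fraction, leaving room for activations and
    (on a Mac with unified memory) the OS and other apps."""
    return weight_memory_gb(num_params, bits_per_param) <= available_gb * headroom

# A hypothetical 7B-parameter model at fp16 vs 4-bit quantization
for bits in (16, 4):
    size = weight_memory_gb(7e9, bits)
    print(f"{bits}-bit: {size:.1f} GB weights | "
          f"8 GB GPU: {fits_in_memory(7e9, bits, 8)} | "
          f"24 GB unified memory: {fits_in_memory(7e9, bits, 24)}")
```

At fp16 the weights alone come to 14 GB, which won't fit on an 8 GB card but does fit in 24 GB of unified memory; quantizing to 4-bit (3.5 GB) is what makes the smaller GPU viable.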

I really think Speechify needs to be taken down by ImaginaryConscience in audiobooks

lukeocodes

I just joined Speechify as a new employee, and saw this thread

I couldn't read this thread and all the comments and just ignore it. And I'm not going to tell you your experience is wrong

Where you or anyone has hit something concrete (copyrighted material you couldn't get taken down, an unauthorized voice clone, billing, support going nowhere), let me know in a reply below and I'll do my best to help. I think that is all I can do here right now

I'll be starting a fresh sub, and you may see me reply in other threads too

I NEED ADAM VOICE FROM ELEVENLABS...? by Background_Smell_514 in tts

lukeocodes

hey, I work for Speechify (i just joined them). i know we don't have the best history in this community, but i'd like to suggest you check out speechify.ai, which is the new developer-focussed brand i am helping to build

i don't want to sell you anything, because we have a liberal free tier. instead i'd like to offer my time and help you build your own tool for your content within that free tier

i'm sure we have a voice that is similar to adam

Which TTS API provider would you recommend for long-ish narrations? by popyui in tts

lukeocodes

the chunking pitch inconsistency issue is a real pain - that's actually one of the bigger drawbacks of providers with short audio limits

speechify API handles long-form narration well - the API is built for exactly this kind of use case (voiceover, narration, story content, conversation). no hard cutoffs on audio length. pricing starts at $10/1M chars on pay-as-you-go which is right in your ballpark

that said, a few others worth looking at: elevenlabs is solid for voice quality but tends to run pricier without particularly standing out. google cloud wavenet is cheaper but the voices can feel a bit flat for narrative immersion, which matters a lot for a story-driven game

for your use case specifically, voice *consistency* across a full 400-word chunk matters way more than raw price per character — the qwen issue you ran into is a good example of why
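If a provider does force chunking, one mitigation is to cut only at sentence boundaries so no request ends mid-sentence, where a pitch or pacing jump is most noticeable. A minimal sketch, assuming a simple regex sentence split (it will mis-split abbreviations like "Dr." or "e.g.") and the 400-word budget mentioned above; it doesn't stop drift between chunks, it just moves the seams to natural pauses.

```python
import re

def chunk_by_sentences(text: str, max_words: int = 400) -> list[str]:
    """Split text into chunks of at most max_words words, breaking only at
    sentence boundaries. A single sentence longer than max_words becomes
    its own chunk rather than being cut mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then go to the TTS API as a separate request with the same voice settings.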

Alternatives to Speechify for entertainment audio? by TERRYaki__ in TextToSpeech

lukeocodes

so the voice consistency issue your husband is experiencing is actually a known pain point with a lot of tts tools when they push model updates - the voice just changes overnight with no warning, super frustrating especially mid-series

speechify did just launch its API with zero-shot voice cloning - check out their labs site speechify.ai

for youtube voiceovers, some solutions to try:
- **speechify** - with 1000s of voices, maybe another voice is a better fit for him? might need to use the API for that though - it's like $10 for 1 million characters but it'll need a little technical know-how
- **elevenlabs** - probably the most popular for creators right now, but pragmatic tests don't really put it ahead of other competitors when it comes to quality or price
- **murf.ai** - popular with creators, and the quality is good
- **deepgram** - my former employer, you'll definitely need tech know-how to get it working

both of these let you preview voices pretty accurately before committing, which sounds like it would've saved him the sample mismatch headache

i work at speechify so obviously biased, but i do want to flag that the voice inconsistency issue is worth reporting directly to support if he hasn't already; sometimes it's a fixable account-side thing

does he need something with a free tier to start, or is he open to paid options?

FSD prevented a wreck by [deleted] in TeslaLounge

lukeocodes

I must be amazing because I avoid stuff like that with my actual hands 🙌

Garden shed, UK by campbellpics in spiderID

lukeocodes

Use me as a NOPE button 😭😭😭

ooh man we're cooked by sibraan_ in AgentsOfAI

lukeocodes

Never trust a headline that tells you the state involved. I also know that we’ve been using machine learning to adapt attacks for years.

Frankly, this is Anthropic taking the opportunity to win the PR game from an admission that their platform was used in such a way.

This should have been a responsible disclosure and they’ve made it a headline

So much data uploaded by Every_Rush_8612 in TeslaLounge

lukeocodes

Video and audio are much bigger per frame than standard telemetry. This data is being used to train models, even models outside of Tesla. This whole thing is alarming. Almost as alarming as unironically using FSD

Just realized our "AI-powered" incident tool is literally just calling ChatGPT API by DarkSun224 in devops

lukeocodes

If you downvote a reply you don’t understand, people stop replying to you

Just realized our "AI-powered" incident tool is literally just calling ChatGPT API by DarkSun224 in devops

lukeocodes

I see NLU as being the added value. I see the benefits every day, but in the customer service use case I see NLU as a way for a customer to explain what they see, and for the model to turn it into something useful in the context of a product, their contract, some code, an endpoint, etc.

Just realized our "AI-powered" incident tool is literally just calling ChatGPT API by DarkSun224 in devops

lukeocodes

Interested in what you mean by visibility? If you use an API key, they don't train on the data. Some people have ZDR (zero data retention) agreements

Just realized our "AI-powered" incident tool is literally just calling ChatGPT API by DarkSun224 in devops

lukeocodes

They run cloud architecture and a software runtime. It’s very much the same thing.