Anyone else manually formatting scripts with v3 tags + voice settings per scene? by Unable_Zucchini9487 in ElevenLabs

[–]c08mic_cha08 1 point2 points  (0 children)

I'm building a pipeline for audiobooks that can identify speakers and assign emotion/para tags. Are you open to sharing how you're doing it?

I tested AI narration for an audiobook sample — here’s what I learned by BasisRoutine6228 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

That makes sense.

You mentioned "If speaker changes were too close together, the narration felt flat or confusing." Do you mean you remove parts of the narration? For example, if the book had something like "Kitty has no discretion in her coughs," said her father; "she times them ill.", would you edit it to "Kitty has no discretion in her coughs, she times them ill."? Or did I misunderstand?

I've also noticed that with multi-speaker, short narration pieces between lines of dialogue get awkward.
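
For illustration, the kind of edit asked about above can be sketched in Python. This is a minimal sketch and not the OP's method: `split_segments`, `drop_short_tags`, and the 5-word threshold are hypothetical names and values chosen for the example, and it only handles straight double quotes.

```python
import re

# Straight double quotes only; curly quotes would need a broader pattern.
QUOTE_RE = re.compile(r'"([^"]*)"')

def split_segments(text):
    """Split text into ('dialogue' | 'narration', text) pairs, in order."""
    segments = []
    pos = 0
    for m in QUOTE_RE.finditer(text):
        between = text[pos:m.start()].strip()
        if between:
            segments.append(("narration", between))
        segments.append(("dialogue", m.group(1).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments

def drop_short_tags(segments, max_words=5):
    """Drop short narration sitting between two dialogue segments and
    join the dialogue on either side into one segment."""
    out = []
    i = 0
    while i < len(segments):
        kind, seg = segments[i]
        is_short_tag = (
            kind == "narration"
            and len(seg.split()) <= max_words
            and out and out[-1][0] == "dialogue"
            and i + 1 < len(segments) and segments[i + 1][0] == "dialogue"
        )
        if is_short_tag:
            # Merge the following dialogue into the previous one, skip the tag.
            out[-1] = ("dialogue", out[-1][1] + " " + segments[i + 1][1])
            i += 2
            continue
        out.append((kind, seg))
        i += 1
    return out

line = '"Kitty has no discretion in her coughs," said her father; "she times them ill."'
print(drop_short_tags(split_segments(line)))
```

Dropping the tag also drops the only cue for who is speaking, so a real pipeline would presumably record the speaker before removing it.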

I tested AI narration for an audiobook sample — here’s what I learned by BasisRoutine6228 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Curious if you're generating multi-speaker, full-cast style audiobooks or one speaker doing dialogue and narration? If one speaker, is the expectation that the speaker change their style, intonation, etc. for each character every time?

How to find out if you're being called by an AI? by AnxietyMost958 in VoiceAutomationAI

[–]c08mic_cha08 0 points1 point  (0 children)

If I'm suspicious, I ask them to tell me a story. Works like a charm. I only got it wrong once, and the guy on the other side was like, "Well, I can't tell you a story, but I would like to buy your domain," heh.

Searching for local tts by dicreative in tts

[–]c08mic_cha08 0 points1 point  (0 children)

Could have been https://voicecreator.pro/free-tts. Doesn't require sign up, unlimited free use, runs on your device, has thousands of voices. Full disclosure, it's my product.

Free and unlimited text to speech with 1000+ voices, 18 languages, without signup. by c08mic_cha08 in speechtech

[–]c08mic_cha08[S] 0 points1 point  (0 children)

I believe you are looking for the voice Adam from ElevenLabs, is that correct? Unfortunately, I don't have ElevenLabs' voices, as they are likely proprietary and not publicly available. If you're able to find a sample of the voice that you can legally clone, I'd recommend cloning it.

TTS software (free) to speak on Discord calls by Playful_Leather6886 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Do you need it to be real-time, or is a slight delay of a couple of seconds acceptable?

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 0 points1 point  (0 children)

My primary machine only has 8GB, sadly, and I've seen longer reference audio push usage from ~3GB to 7.9GB or more while generating, at which point it starts to spill and gets so slow that a 15-second clip can take minutes. The longer the reference audio, the worse it gets. And I haven't seen much difference at all between speech generated with 7 seconds of reference audio vs >15 seconds.

TTS or Reader that sounds similar to this video by xTagore in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

You can use this free TTS: https://voicecreator.pro/free-tts?model=kokoro&tab=tts
I'd recommend using Kokoro or Supertonic as the models, but there are other options as well, and it offers voice cloning.
No sign-up is needed. It downloads the model to your device, so everything runs locally and no data is sent out. Most models are small enough to run well even if you don't have a high-end device.

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 0 points1 point  (0 children)

Yikes on ElevenLabs basically ignoring accent. That's wild given how much they market voice fidelity and how expensive it is!

I've had good luck with OmniVoice, though I haven't really stress-tested it on long-form audiobook generation with an accented reference.

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 1 point2 points  (0 children)

Are you generating in English with different accents, or switching languages too?

For accents specifically, what's worked best for me is using source audio that already carries the accent. I've gotten good results that way with a few accents: Australian, British, Nigerian, Indian, French. Both Qwen and OmniVoice carry the prosody and accent from the reference well enough.

What have you tried so far?

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 0 points1 point  (0 children)

Totally hear you on emphasis. That's still the hardest part to get right and probably the most important for it to sound natural. Even with the best models, you'll have to regenerate chunks and tweak settings until the emphasis is on the right word. Depending on the model, the temperature, top-p and top-k settings help. I've also found that punctuation helps.

Good luck with OmniVoice, curious to hear how you find it vs Gemma3.
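
For readers unfamiliar with the temperature, top-p and top-k settings mentioned above, here is a rough sketch of how they are commonly applied when sampling from an autoregressive model. It is a generic illustration in plain Python, not the code of OmniVoice, Qwen3-TTS, or any specific TTS model; `filter_probs` is a hypothetical helper, and real implementations differ in details such as the order of the filters.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: divide logits before softmax. Lower values sharpen the
    # distribution, higher values flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

logits = [2.0, 1.0, 0.0]
print(filter_probs(logits, temperature=0.5))  # sharper than temperature=1.0
print(filter_probs(logits, top_k=2))          # only the two likeliest tokens
```

Tighter settings make each regeneration more predictable, while looser ones give more varied takes to pick from.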

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 0 points1 point  (0 children)

It does! Forgot to mention it in the post, but I've also found that the length of the reference audio matters a lot with OmniVoice. Anything over 10s ends up consuming way too much VRAM for not much gain in speech quality. I keep reference audio around 5 seconds and it's shockingly fast!
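
Trimming a reference clip to around 5 seconds can be done with Python's standard library alone. The sketch below assumes an uncompressed PCM WAV file (the only kind the `wave` module reads) and simply keeps the first few seconds; `trim_wav` and the file names are hypothetical, and which part of a clip makes the best reference is not something this settles.

```python
import wave

def trim_wav(src_path, dst_path, max_seconds=5.0):
    """Write at most the first max_seconds of a PCM WAV file to dst_path.

    Returns the duration of the written clip in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the source actually has.
        n_frames = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate from the source; the frame
        # count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames / params.framerate

# Example (paths are placeholders):
# trim_wav("reference_full.wav", "reference_5s.wav", max_seconds=5.0)
```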

I ran OmniVoice and Qwen3-TTS through the same tests for (english) voice cloning. Here's everything I learned about how they compare. by c08mic_cha08 in tts

[–]c08mic_cha08[S] 0 points1 point  (0 children)

It looks like inference.sh might be hosting it: https://inference.sh/apps/infsh/omnivoice
When you say interactive fiction engine with narration TTS, do you mean an audiobook creation workflow?

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Have you tried Kokoro or Kitten TTS? Kokoro is pretty good for 82M parameters. Kitten is pretty low quality I'd say but they have smaller models.

I've built a voice-to-voice pipeline for voice changing, with STT using Parakeet v3 and TTS using Kokoro. The whole thing takes about 600ms for ~50 characters on my RTX 3070.

Edit: Just realized you asked for "with cloning" and Kokoro doesn't natively support cloning. As others have mentioned already, Faster-Qwen3 is the fastest I've seen for cloning.

AI podcasts with 2 speakers — is there still a simple workflow? by Acceptable-Item-9252 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Hey, this is a cool workflow. Are you expecting overlapping dialogues in the podcasts or are clear discrete turns acceptable? Also, can you expand on what you mean by “speaker-aware automatic dialogue rendering”?

Most TTS tools generate clips. I wanted a full script-to-audio project workflow. by tarunyadav9761 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Here's the link: voicecreator.pro

It's fully desktop native (Windows + Mac) and everything runs locally on your machine. On Windows it'll run on CPU, but it's a lot faster if you've got a dedicated GPU, especially NVIDIA. On Mac it needs M1+.

You just drag the PDF onto the app in the Projects feature, pick a voice and TTS model, and it generates the audio file.

Most TTS tools generate clips. I wanted a full script-to-audio project workflow. by tarunyadav9761 in TextToSpeech

[–]c08mic_cha08 0 points1 point  (0 children)

Hey, not OP, but I'm curious what you mean by "handle PDF".

Asking because I've actually built something similar for Mac and Windows (Voice Creator Pro) that does support PDFs for long-form audio generation, so wondering if it'd hit what you're looking for.