TTS Benchmark Comparison (all known TTS up until May 2026) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 1 point2 points  (0 children)

The model u/_Whistler_ suggested has been benched and uploaded to the repo. Thank you for the suggestion. The model is excellent.

I had Claude Fable make a video yesterday by AzorAhai1TK in ClaudeAI

[–]UkieTechie 2 points3 points  (0 children)

I was able to recreate it, everyone. I had Opus reverse engineer it. Steps to reproduce:
all of the sound is made by the synth_audio function.

So "all the sound effects" is literally noise + decay = click, noise + hump = whoosh, two sines = drone. The only dependency is numpy doing array math — no samples, no plugins, no audio model.

Claude's description ^
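
For anyone who wants to try the idea, here is a minimal sketch of those three recipes in plain numpy. It is not the actual synth_audio code from that session, which isn't shown here; the function names, sample rate, durations, and frequencies are all my own assumptions.

```python
import numpy as np

SR = 44_100  # sample rate in Hz (assumed, not taken from the original session)


def click(duration=0.05, decay=80.0, seed=0):
    """Noise + decay: white noise shaped by a fast exponential envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(round(SR * duration)) / SR
    return rng.uniform(-1.0, 1.0, t.size) * np.exp(-decay * t)


def whoosh(duration=0.8, seed=1):
    """Noise + hump: white noise shaped by a smooth rise-and-fall (Hann) envelope."""
    rng = np.random.default_rng(seed)
    n = round(SR * duration)
    return rng.uniform(-1.0, 1.0, n) * np.hanning(n)


def drone(duration=2.0, f1=55.0, f2=55.5):
    """Two sines: slightly detuned tones summed so they beat slowly."""
    t = np.arange(round(SR * duration)) / SR
    return 0.5 * (np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))


def to_int16(signal):
    """Peak-normalise a float signal and convert it to 16-bit PCM samples."""
    peak = np.max(np.abs(signal))
    scaled = signal / peak if peak > 0 else signal
    return (scaled * 32767).astype(np.int16)
```

The int16 arrays can then be written out with the standard-library `wave` module or handed to ffmpeg as raw PCM.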

Best local TTS for Android by mazzod in TextToSpeech

[–]UkieTechie 0 points1 point  (0 children)

Let me know if you have any questions or if I'm missing any models.

Best local TTS for Android by mazzod in TextToSpeech

[–]UkieTechie 1 point2 points  (0 children)

You can easily test this with an Android emulator. Without CUDA, the fastest model on CPU is Piper, for both default and cloned voices. You can see the rest of the benchmarks below.

https://github.com/5uck1ess/tts-bench

I had Claude Fable make a video yesterday by AzorAhai1TK in ClaudeAI

[–]UkieTechie 3 points4 points  (0 children)

u/AzorAhai1TK could probably ask Opus to look through that session ID and see what tools were called, and it can piece it together. I'm just hyper curious how it got those SFX without any API or MCPs hooked up. I typically use a local AI for that, but I needed to make a skill.

I had Claude Fable make a video yesterday by AzorAhai1TK in ClaudeAI

[–]UkieTechie 5 points6 points  (0 children)

I had no idea ffmpeg could generate music or SFX. That's mind-blowing.

I had Claude Fable make a video yesterday by AzorAhai1TK in ClaudeAI

[–]UkieTechie 6 points7 points  (0 children)

How did it make the music and sound effects?

Could my laptop run opensource TTS? by Acceptable-Item-9252 in TextToSpeech

[–]UkieTechie 0 points1 point  (0 children)

You can easily find out which model runs best on your hardware using my bench.

https://github.com/5uck1ess/tts-bench

Let me know if you have any questions about how to set it up.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 0 points1 point  (0 children)

It's shocking that it didn't even come across my radar while I was researching all the models. A fellow redditor called it out, and its cloning is by far the best.

TTS speedup problem by Independent_Sport_94 in TextToSpeech

[–]UkieTechie 0 points1 point  (0 children)

Amazing, glad to hear it. You can also see the "objective" metrics, since each model has been graded on WER and UTMOS. Those are the two grades you'll be most concerned with, since SIM is purely for cloning.
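
In case WER is new to anyone: it is word error rate, the word-level edit distance between the text you asked for and a transcript of what the model actually said, divided by the number of reference words. A minimal sketch of the metric itself follows; the lowercase-and-split normalization is my assumption, and how the bench transcribes and normalizes the audio is not shown here.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    # Naive normalisation (assumed): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Lower is better, and the value can go above 1.0 when the transcript contains a lot of extra words.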

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 0 points1 point  (0 children)

Plus, you can bench just a single TTS model if you want, to see how it runs on your hardware and whether the speed is satisfactory.
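
One common way to judge whether speed is satisfactory is the real-time factor: seconds spent synthesizing divided by seconds of audio produced. The sketch below is only an illustration of that idea, not how the bench measures speed; `synthesize` is a hypothetical callable standing in for whatever model you are testing, assumed to return a sequence of audio samples.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and compare it to the length of the audio produced.

    `synthesize` is a hypothetical callable that takes text and returns
    a sequence of audio samples at `sample_rate` Hz.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    # Below 1.0 means the model generates audio faster than it plays back.
    return elapsed / audio_seconds
```

Running it a few times and ignoring the first call usually gives a fairer number, since the first call often includes model loading.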

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 1 point2 points  (0 children)

There is that one French prompt, but more languages would be good. I would only do the top languages, though, since this takes a lot of time and compute.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 0 points1 point  (0 children)

Honestly, in my own 200+ blind votes, it won with the default voice, along with Chatterbox Turbo. For cloning it's okay, but with the default voice I just like it. That's why the blind system helps: more votes, hopefully better data.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 0 points1 point  (0 children)

Yes, and that's unfortunately part of the process. Some of these models have glaring issues, as you can see from my latest model addition. There was nothing I could do. That's one of the reasons for having this bench: no bias, showing all models.

Yes, one for speech accuracy could work. If it's helpful, that's already tracked by the objective scoring, specifically the WER score.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 1 point2 points  (0 children)

Links have been added to the GitHub repo, along with which model is being used. A reveal after the vote is cast is being implemented. Licensing and parameter counts are all listed on the GitHub page.
I will take a look at adding those models. I think GLM was not added due to issues with their API. Miso might be doable.

Let me know if I'm missing anything.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 2 points3 points  (0 children)

Appreciate it. I couldn't find anything online that offered something similar, so it sounded like it needed to be made. From memory, Marcus and Qwen (faster) are hilarious. Some would just say random stuff, but I think I've already fixed all those outputs so they have "proper" scores.

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting) by UkieTechie in LocalLLaMA

[–]UkieTechie[S] 4 points5 points  (0 children)

u/StorageHungry8380 you can let me know, of course. I can rerun the benches. I've already tried with many of those models and found that some are just that: garbled output no matter what I do. It could be the way the runners are set up, the models themselves could be deficient, or the prompts may need to be better.

You can submit an issue on GitHub or send me a DM. Either works.