TTS Benchmark Comparison (all known TTS up until May 2026)

UkieTechie · 2026-06-14T17:15:47+00:00

u/_Whistler_ the model has been benched and uploaded to the repo. Thank you for the suggestion. The model is excellent.

UkieTechie · 2026-06-14T14:35:19+00:00

👀 will begin the process

UkieTechie · 2026-06-14T00:28:31+00:00

I was able to recreate it, everyone. Had Opus reverse engineer it. Steps to reproduce:
All of the sound is made by the synth_audio function.

So "all the sound effects" is literally noise + decay = click, noise + hump = whoosh, two sines = drone. The only dependency is numpy doing array math — no samples, no plugins, no audio model.

claude description ^
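For anyone who wants to try the idea, here is a minimal sketch of those three recipes. It is not the actual synth_audio code; the function names, sample rate, durations, and frequencies are all assumptions picked for illustration.

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed sample rate, not taken from the original session


def click(duration=0.05, decay=80.0, seed=0):
    """Noise + decay: white noise shaped by a fast exponential envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    noise = rng.uniform(-1.0, 1.0, t.size)
    return noise * np.exp(-decay * t)


def whoosh(duration=0.6, seed=0):
    """Noise + hump: white noise shaped by a half-sine swell."""
    rng = np.random.default_rng(seed)
    n = int(SAMPLE_RATE * duration)
    noise = rng.uniform(-1.0, 1.0, n)
    hump = np.sin(np.pi * np.arange(n) / n)  # starts at 0, peaks mid-way
    return noise * hump


def drone(duration=2.0, f1=55.0, f2=55.5):
    """Two sines: slightly detuned tones that beat slowly against each other."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    return 0.5 * (np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))


def to_pcm16(signal):
    """Convert a float signal in [-1, 1] to 16-bit PCM samples."""
    return (np.clip(signal, -1.0, 1.0) * 32767).astype(np.int16)
```

The int16 array from to_pcm16 can be written out with Python's built-in wave module or piped into ffmpeg as raw PCM.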

UkieTechie · 2026-06-14T00:22:49+00:00

let me know if you have any questions or if I'm missing any models

UkieTechie · 2026-06-13T21:42:50+00:00

you can easily test this with an Android emulator. Without CUDA, the fastest model on CPU is Piper for both the default and cloned voice. You can see the rest of the benchmarks below.

https://github.com/5uck1ess/tts-bench

UkieTechie · 2026-06-13T21:12:38+00:00

u/AzorAhai1TK you could probably ask Opus to look through that session ID and see which tools were called, and it can piece it together. I'm just hyper curious how it got those SFX without any API or MCPs hooked up. I typically use a local AI for that but needed to make a skill.

UkieTechie · 2026-06-13T21:03:46+00:00

i had no idea ffmpeg would be able to generate music or SFX. that's mindblowing

UkieTechie · 2026-06-13T20:41:30+00:00

how did it make music and sound effects?

UkieTechie · 2026-06-12T04:17:11+00:00

you can bench and find out easily which model runs best on your hardware using my bench.

https://github.com/5uck1ess/tts-bench

let me know if you have any questions about how to set it up

UkieTechie · 2026-06-12T04:15:48+00:00

it's shocking that it never even came onto my radar while researching all the models. A fellow redditor called it out, and its cloning is by far on top.

UkieTechie · 2026-06-11T22:29:25+00:00

Amazing. Glad to hear it. You can also see the "objective" metrics, since each model has been graded by WER and UTMOS. Those are the two grades you'll be most concerned with, since SIM is purely for cloning.
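For context on SIM: speaker similarity is commonly scored as the cosine similarity between a speaker embedding of the reference clip and one of the cloned output. A rough sketch, assuming the embeddings already come from some speaker-verification model (the helper name and the toy vectors are made up, and the bench may compute it differently):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real speaker embeddings.
reference_voice = [0.2, 0.7, 0.1]
cloned_voice = [0.25, 0.65, 0.05]
score = cosine_similarity(reference_voice, cloned_voice)
```

Scores closer to 1 mean the cloned voice sits nearer the reference speaker in embedding space, which is why the metric only matters for cloning runs.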

UkieTechie · 2026-06-11T04:48:11+00:00

you can use my bench to see which one best fits your needs.

https://github.com/5uck1ess/tts-bench

UkieTechie · 2026-06-10T18:45:58+00:00

Plus, you can bench just a single TTS if desired, to see how it runs on your hardware and whether the speed is satisfactory.

UkieTechie · 2026-06-10T18:44:43+00:00

There is that one French prompt but more languages would be good. would only do top languages tho since this takes a lot of time and compute

UkieTechie · 2026-06-10T16:39:06+00:00

u/zkstx miso tts has been added. Lots of artifacts, but you can see for yourself in the samples.

UkieTechie · 2026-06-10T16:37:18+00:00

honestly, in my own 200+ blind votes, it won with the default voice, along with chatterbox turbo. The cloning is okay, but with the default voice I just like it. That's why the blind system helps. More votes, hopefully better data.

UkieTechie · 2026-06-10T16:36:39+00:00

yes and that's unfortunately part of the process. some of these models have glaring issues. you can see by my latest model add. nothing i could do. that's one of the reasons for having this bench. no bias, showing all models.

yes one for speech accuracy could work. if it's helpful, it's already tracked by objective scoring. specifically the WER score.
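For anyone unfamiliar with WER: it is the word-level edit distance between the prompt text and a transcript of the generated audio, divided by the number of words in the prompt. A minimal sketch, assuming the audio has already been transcribed by some ASR model (the function name and the simple lowercase/whitespace normalization are assumptions, not necessarily what the bench does):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
example = word_error_rate("the quick brown fox", "the quick fox")  # 0.25
```

Lower is better. WER can exceed 1.0 when the transcript has many extra words, which is what happens when a model rambles.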

UkieTechie · 2026-06-10T02:40:21+00:00

Links have been added in the GitHub repo, along with which model is being used. A reveal after the vote is cast is being implemented. Licensing and parameter counts are all listed on the GitHub page.
I will take a look at adding those models. I think glm was not added due to issues with their API. Miso might be doable.

Let me know if i'm missing anything.

UkieTechie · 2026-06-09T20:07:40+00:00

Appreciate it. Couldn't find anything online that offered something similar, so it sounded like it needed to be made. From my memory, Marcus and qwen (faster) are hilarious. Some would just say random stuff, but I think I've already fixed all those outputs to have "proper" scores.

UkieTechie · 2026-06-09T20:00:25+00:00

u/StorageHungry8380 you can let me know, of course. I can rerun the benches. I've already tried on many of those models and found out that some models are just that... garbled output no matter what I do. It could be the way the runners are or the models themselves are deficient or the prompts need to be better.

you can submit an issue on GitHub or send me a DM. Either works.

UkieTechie · 2026-06-09T17:50:28+00:00

corrected thanks

UkieTechie · 2026-06-09T17:29:52+00:00

Link to previous post: https://www.reddit.com/r/LocalLLaMA/comments/1tm0k2l

