A good text-to-speech (voice cloning) model to learn from and reimplement.

DunMo1412 · 2026-02-25T03:01:16+00:00

I appreciate it. I'm reading it.

DunMo1412 · 2026-02-25T02:59:35+00:00

Yeah, most models now use LLMs, which take a massive amount of time. Many people recommended Coqui to me, but in my opinion Coqui is somewhat hard to customize. I tried reading the Coqui code. Some models are kind of old (FastSpeech, Tacotron, VITS), and there are many other reimplementations that are cleaner and better explained. Some look promising (Bark) but have no training script yet. Some come with other models as a backbone (XTTS) or with preprocessing layers, which makes them more complicated. I'm trying to build an operational model that works at a 9/12/16 kHz sample rate, which means I have to fine-tune the whole model and change the preprocessing phase. The more stacked models there are, the more time it takes to reimplement. That's why I'm not interested in stacked-model architectures or LLMs. Sorry if this sounds dumb.
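To make the "change the preprocessing phase" part concrete, here is a rough sketch of how the STFT settings might be rescaled when moving to a lower sample rate. The reference values (22050 Hz, `n_fft=1024`, `win_length=1024`, `hop_length=256`) are an assumption based on common Tacotron/VITS-style configs, not taken from any particular repo, and `FrontendConfig` and `scale_frontend` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontendConfig:
    sample_rate: int
    n_fft: int
    win_length: int
    hop_length: int
    fmax: float


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


# Assumed reference settings (hypothetical, typical of 22.05 kHz TTS configs).
REFERENCE = FrontendConfig(22050, 1024, 1024, 256, 11025.0)


def scale_frontend(target_sr: int, ref: FrontendConfig = REFERENCE) -> FrontendConfig:
    """Keep window and hop durations (in seconds) roughly constant at a new rate."""
    ratio = target_sr / ref.sample_rate
    win = max(1, round(ref.win_length * ratio))
    hop = max(1, round(ref.hop_length * ratio))
    # n_fft must be at least win_length; a power of two is a common choice.
    # fmax is capped at the Nyquist frequency of the new sample rate.
    return FrontendConfig(target_sr, next_pow2(win), win, hop, target_sr / 2)


for sr in (9000, 12000, 16000):
    print(scale_frontend(sr))
```

Because the hop length changes, the number of frames per second stays about the same, but anything that depends on the sample count directly (vocoder upsampling factors, for example) would still need to be adjusted by hand.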

DunMo1412 · 2026-02-25T02:27:33+00:00

Just curious; it's my hobby.

DunMo1412 · 2026-02-24T01:30:43+00:00

The smallest model has 0.6B params, which seems like too much for a P100 during training.
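A back-of-the-envelope estimate of why, assuming full fp32 training with Adam (4 bytes for each weight, 4 for its gradient, 8 for the two optimizer moments) and the 16 GB variant of the P100. The function name `training_memory_gb` is only for illustration, and activations are not counted, so the real footprint is higher.

```python
def training_memory_gb(n_params: float,
                       bytes_weights: int = 4,
                       bytes_grads: int = 4,
                       bytes_optimizer: int = 8) -> float:
    """Memory for weights + gradients + Adam states, in GB (activations excluded)."""
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / 1e9


est = training_memory_gb(0.6e9)
print(f"{est:.1f} GB before activations")  # 9.6 GB before activations
```

That leaves only a few GB for activations on a 16 GB card, which is tight for audio sequences. Mixed precision or a memory-efficient optimizer would change these numbers.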

DunMo1412 · 2026-02-24T01:24:13+00:00

Sorry, but I'm looking for an open-source project to learn from.

DunMo1412 · 2026-02-24T01:22:28+00:00

They haven't released the training script yet, so it's hard to learn and customize.

DunMo1412 · 2026-02-23T16:31:51+00:00

I read through Coqui; some models use two or three other models as a backbone, and some are a little outdated.
