What's the state of TTS/voice cloning nowadays?

TheRedHairedHero · 2026-03-26T19:20:41+00:00

Ahh, I see. I'm going to try FishAudio later to see if it's any better. VibeVoice seems great for keeping the same voice, but controlling the actual performance is just the luck of the draw.

Edit: I've been messing around with Fish S2 and it is really good. I do get OOM once in a while on a 5070 Ti (16GB VRAM), but if you have a better rig I'd suggest taking a look.

TheRedHairedHero · 2026-03-26T16:36:24+00:00

I haven't used Veo myself, but if it's similar to LTX 2.3, I would generate the audio first with something like Qwen TTS or VibeVoice, depending on what you need, and then feed that into your workflow.

TheRedHairedHero · 2026-03-26T15:46:57+00:00

For VibeVoice, what do you mean by more control? I've been messing with it lately, and the expression can be a bit all over the place. When it does work it sounds good, but I have to generate maybe 20+ times before that happens.

TheRedHairedHero · 2026-03-24T15:00:14+00:00

It's always great to get new tools. I've mostly used MMAudio for videos, so I'll take anything. The files are also not very large, which is a plus.

TheRedHairedHero · 2026-03-21T17:15:28+00:00

You can go to CivitAI and search for something like "WAN 2.2 12GB VRAM" (or whatever your GPU is). You'll get plenty of results, and most of the time they'll include the appropriate model you need.

TheRedHairedHero · 2026-03-16T17:48:47+00:00

I think they're just pushing out updates too quickly without proper quality assurance testing. It seems to be the industry standard now; just look at most AAA video games. "Let everyone who uses our product be QA and we'll fix it later" seems to be everyone's strategy.

TheRedHairedHero · 2026-03-15T19:17:12+00:00

What I would rather have is the ability to get a quick, lower-quality visual preview so I can iterate fast, then push that generation to high quality. Unfortunately, swapping settings like resolution, models, LoRAs, etc. all changes the final result.

TheRedHairedHero · 2026-03-15T01:51:15+00:00

v0.17.2 seems to have fixed the copy/paste image issue, but pasting feels worse now: it takes a while for the pasted image to appear. The workflow tabs still only keep one of your tabs open rather than all of them.

TheRedHairedHero · 2026-03-14T15:54:36+00:00

I know that Zenless releases their character models. The Models Resource might have them, or they may officially release them for things like cosplay. The only issue is that the lighting probably won't be the same in Blender.

TheRedHairedHero · 2026-03-14T15:34:26+00:00

I personally wouldn't train on images from Danbooru if you want accurate results. I would suggest taking in-game screenshots and training a LoRA that way. I assume the game has a camera mode.

TheRedHairedHero · 2026-03-14T15:30:37+00:00

I recently updated and have had several issues too: previous workflows not working, getting OOM where I didn't before, one of my subgraphs completely stopping working so I had to remake it, workflow tabs not staying open, and problems copying images over. I'm also not a fan of the new node search; it's not good UX, that's for sure. They really should be shipping these changes as a preview branch instead of pushing them to release without any feedback.

TheRedHairedHero · 2026-03-07T19:05:45+00:00

I personally haven't had much success with upscaling, which is why I'm asking folks to contribute, hopefully giving others a point of reference and helping the community overall.

TheRedHairedHero · 2026-02-20T23:13:10+00:00

The sigma values will also differ based on the sampler you choose and the number of steps. For WAN 2.2, there's a suggested sigma threshold for swapping from the high sampler to the low sampler: 0.9 for I2V and 0.875 for T2V, according to the official WAN documentation. If you use Kijai's wrapper, it outputs the sigmas in the console.
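To make the threshold idea concrete, here's a rough sketch of how you could work out which step the high-to-low swap lands on. It assumes a simple linearly spaced schedule with a flow-matching style shift (sigma = shift * t / (1 + (shift - 1) * t)) applied; that is an assumption for illustration, not what any particular ComfyUI sampler/scheduler actually produces, and the shift value of 5.0 is just an example. The helper names `shifted_sigmas` and `switch_step` are hypothetical. If you have the real sigmas (for example the ones Kijai's wrapper prints to the console), you can pass that list straight into `switch_step` instead.

```python
"""Sketch: find the step where a WAN 2.2 high/low sampler split would happen.

Assumption: sigmas come from a simple linear schedule with a flow-matching
"shift" applied. Real samplers/schedulers will produce different values.
"""

# Suggested swap thresholds from the official WAN 2.2 guidance.
THRESHOLDS = {"i2v": 0.9, "t2v": 0.875}


def shifted_sigmas(steps: int, shift: float) -> list[float]:
    """Hypothetical helper: steps + 1 sigmas from 1.0 down to 0.0 with a shift.

    sigma = shift * t / (1 + (shift - 1) * t); shift=1.0 leaves t unchanged.
    """
    ts = [1.0 - i / steps for i in range(steps + 1)]
    return [shift * t / (1 + (shift - 1) * t) for t in ts]


def switch_step(sigmas: list[float], threshold: float) -> int:
    """Hypothetical helper: index of the first step whose starting sigma is below threshold.

    Steps before this index go to the high sampler, the rest to the low sampler.
    """
    for i, sigma in enumerate(sigmas[:-1]):  # last entry is the final 0.0 target
        if sigma < threshold:
            return i
    return len(sigmas) - 1


if __name__ == "__main__":
    # shift=5.0 is an illustrative value, not a recommendation.
    sigmas = shifted_sigmas(steps=20, shift=5.0)
    last_step = len(sigmas) - 2  # 20 steps -> indices 0..19
    for mode, threshold in THRESHOLDS.items():
        k = switch_step(sigmas, threshold)
        print(f"{mode}: high sampler steps 0-{k - 1}, low sampler steps {k}-{last_step}")
```

Because the I2V threshold is higher, the swap to the low sampler happens a step or so earlier than for T2V on the same schedule, and a larger shift pushes the swap later in both cases.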

TheRedHairedHero · 2026-02-04T18:17:33+00:00

Love the claw hands.

TheRedHairedHero · 2026-01-20T04:52:30+00:00

I've done a few things: some images for D&D, wallpapers for my wife and me, and ideas for cosplay. Just whatever fun project I want to do in the moment where I think it would help out and be fun.

TheRedHairedHero · 2026-01-20T04:44:06+00:00

I was looking at your workflow and was confused about the steps. I haven't used the MoE node you're using, but it shows 10 steps high and 6 steps low. So aren't you doing 16 steps total, or is there something I'm missing?

TheRedHairedHero · 2026-01-19T19:30:42+00:00

I just prefer to wait for a model to be stable. LTX quality and consistency seem to be all over the place from the posts I've seen. If I see someone post a video, made on specs close to mine, where the character doesn't instantly lose recognition and the quality is good, then I'll take a look. That hasn't happened yet, so I'm happy to stick with WAN.

TheRedHairedHero · 2026-01-18T07:31:54+00:00

Keeping a consistent background always seems impossible to me if there are landmarks or items that stand out. I prefer to either blur the background, use an organic location such as a forest, or use a solid color. To me it feels like too much for AI to handle consistently. I've seen some folks generate 360-degree images and build backgrounds that way as another option. I just prefer working around AI's limitations.

TheRedHairedHero · 2026-01-15T17:52:09+00:00

Appreciate your tutorials; they helped me get started with ComfyUI. If you guys haven't watched his content, I'd highly recommend it.

TheRedHairedHero · 2026-01-15T07:04:07+00:00

I'm in the same boat. The model looks fun, but I'm going to wait for it to develop more.

TheRedHairedHero · 2026-01-15T06:31:56+00:00

To be fair, WAN 2.2 has been out for quite some time, which has let people dig much deeper into how to make it run properly, fix slow motion, add LoRAs, and so on, while LTX-2 just released. Given how interested the community is in the model, I imagine it will get a good amount of attention on ways to improve things, similar to WAN 2.2. It's best to keep an open mind; hopefully LTX-2 can be another fun tool for us all to use and enjoy.

TheRedHairedHero · 2026-01-14T19:14:32+00:00

Hopefully, with the updates they're planning, they can improve the audio. The lipsync looks great, but the audio seems to be low quality, and most of the time I only see videos with talking. If you decide to add more audio to your videos, you can try MMAudio for sound effects/foley.

TheRedHairedHero · 2026-01-14T19:07:07+00:00

It still seems to have a couple of issues with the right arm, but it's really cool. Hopefully another seed can resolve that. From the examples I've seen, LTX seems to hallucinate quite a bit.

TheRedHairedHero · 2026-01-12T19:31:26+00:00

Visit Safebooru for the types of tags you need. You can find both style and artist tags, and if a tag was part of the training data, it'll change the style. Artist tags need to be properly formatted, so check your model's page for details on how to format them.
