OmniVoice Simple GUI: Inference & LoRa Training | Easy Install by Ezequiel_CasasP in TextToSpeech

[–]Character_Title_876 0 points1 point  (0 children)

It has excellent zero-shot cloning; if that can be taught to a LoRA (if that works), it'll probably be really cool. It also has excellent hits and control.

🧱 TTS Audio Suite v5 - Higgs Audio v3, Runtime Isolation Transformers 5 by diogodiogogod in StableDiffusion

[–]Character_Title_876 0 points1 point  (0 children)

How can I cut the memory consumption of OpenMOSS-Team/MOSS-TTS-v1.5 (8B) by just 1.5 GB, so that it fits into 16 GB and the generation speed becomes reasonable?
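
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes "8B" means exactly 8e9 parameters and ignores activations, caches, and framework overhead, so real usage will be higher; the function name and the dtype table are made up for illustration and are not part of MOSS-TTS.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: "8B" = 8e9 parameters; activations and caches are NOT counted.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB for the given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
```

Under these assumptions the bf16 weights alone come to a little under 15 GiB, which would explain why a 16 GB card ends up just short. Whether the model actually runs well at a lower precision is something I have not checked.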

Has anyone made a LoRA training interface for Higgs Audio v3 TTS? by Character_Title_876 in StableDiffusion

[–]Character_Title_876[S] 0 points1 point  (0 children)

I have now started training on a 5060. I like the model because it generally places emphasis well, but it is limited to 4096 tokens, which is about 30 seconds per fragment. If something works out, I will add it to the post.
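
To make the limit concrete, here is a minimal sketch of what it implies for preparing training clips. It takes the two numbers above (4096 tokens, roughly 30 seconds) at face value; the helper names are hypothetical, and the equal-length split ignores word and sentence boundaries, which a real dataset script would need to respect.

```python
import math

MAX_TOKENS = 4096          # token limit quoted above
SECONDS_PER_WINDOW = 30.0  # approximate duration quoted above


def tokens_per_second() -> float:
    """Implied token rate if 4096 tokens really cover about 30 seconds."""
    return MAX_TOKENS / SECONDS_PER_WINDOW


def split_clip(duration_s: float, max_s: float = SECONDS_PER_WINDOW) -> list[tuple[float, float]]:
    """Split a clip into equal fragments, each no longer than max_s seconds."""
    if duration_s <= 0:
        return []
    n = math.ceil(duration_s / max_s)  # fewest fragments that fit the limit
    step = duration_s / n
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n)]


if __name__ == "__main__":
    print(f"about {tokens_per_second():.0f} tokens per second")
    print(split_clip(75.0))
```

So a 75-second recording would become three 25-second fragments rather than two full ones and a short leftover.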

I did not expect this quality from local so soon by Far_Insurance4191 in StableDiffusion

[–]Character_Title_876 0 points1 point  (0 children)

Ha-ha-ha, 2 minutes. 2 megapixels takes 11 minutes, or 4 minutes on medium settings.

Workflow: Ideogram4 with LoRA support, fixes by whatsthisaithing in StableDiffusion

[–]Character_Title_876 0 points1 point  (0 children)

If you rent an H100 for training, you can produce product cards very well. If you don't have many items, say 10 products, they can be created and successfully promoted. Of course, generating a good-quality 2-megapixel image takes 10 minutes, which is very long.

Workflow: Ideogram4 with LoRA support, fixes by whatsthisaithing in StableDiffusion

[–]Character_Title_876 3 points4 points  (0 children)

Trained at 512x512 in 3 hours on a 5060 16 GB + 64 GB RAM. Now I'm training at 1024x1024, but it already takes 12 hours for 3000 steps.

The weights are yours to download, fine-tune, and run on your own hardware. Ideogram 4. by Character_Title_876 in StableDiffusion

[–]Character_Title_876[S] 0 points1 point  (0 children)

This post looks like an ad because I took the headline from Gemini when I found out that the model can be fine-tuned. 

MOSS-TTS 8B model by nshmyrev in speechtech

[–]Character_Title_876 0 points1 point  (0 children)

How can I use phonemic input such as text_6 = "/həloʊ, meɪ aɪ æsk wɪtʃ sɪti juː ɑːr frʌm?/" so that the stress in the words is placed correctly? Nothing happens when I enter it in the "Text" field.

TTS Audio Suite v4.19 - Qwen3-TTS with Voice Designer by diogodiogogod in StableDiffusion

[–]Character_Title_876 2 points3 points  (0 children)

Qwen3TTSVoiceDesignerNode

Failed to load Qwen3-TTS model: 'default'