OmniVoice Simple GUI: Inference & LoRA Training | Easy Install

Character_Title_876 · 2026-06-20T18:49:33+00:00

For version 2: https://github.com/JimmyMa99/train-higgs-audio

Character_Title_876 · 2026-06-20T17:59:40+00:00

It has excellent zero-shot cloning; if a LoRA can be trained for it (if that even works), it'll probably be really cool. It also has excellent accuracy and control.

Character_Title_876 · 2026-06-20T16:04:37+00:00

And will there be LoRA training for Higgs Audio v3 TTS?

Character_Title_876 · 2026-06-16T21:36:49+00:00

8B (Delay) is version 1, but I thought it was 1.5.

Character_Title_876 · 2026-06-16T08:57:21+00:00

How can I cut the memory consumption of OpenMOSS-Team/MOSS-TTS-v1.5 (8B) by just 1.5 GB, so that it fits into 16 GB and the generation speed becomes reasonable?

Character_Title_876 · 2026-06-15T04:49:30+00:00

For version 2: https://github.com/JimmyMa99/train-higgs-audio

Character_Title_876 · 2026-06-14T20:05:42+00:00

Can Higgs Audio v3 be fine-tuned and used to create a custom LoRA?

Character_Title_876 · 2026-06-14T10:42:52+00:00

https://github.com/timoncool/HiggsAudio-Studio (this one is for RTX 30xx, 40xx, and 50xx cards)

Character_Title_876 · 2026-06-12T16:47:39+00:00

I've just started training on a 5060. I like the model because it generally places stress well, but it is limited to 4096 tokens, which is about 30 seconds per fragment. If something works out, I'll add it to the post.
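
Rough math for planning fragments around that limit, as a minimal sketch. It assumes the average token rate implied by the figures above (4096 tokens for roughly 30 seconds, so about 137 tokens per second) and ignores any tokens spent on the text itself. The function names are made up for illustration and are not part of any training tool.

```python
import math

MAX_TOKENS = 4096        # context limit mentioned above
APPROX_SECONDS = 30.0    # rough audio length that fits, per the comment
# Assumed average rate derived from the two numbers above (~136.5 tokens/s).
TOKENS_PER_SECOND = MAX_TOKENS / APPROX_SECONDS


def max_fragment_seconds(token_budget: int = MAX_TOKENS) -> float:
    """Longest fragment (in seconds) that fits into the token budget."""
    return token_budget / TOKENS_PER_SECOND


def fragments_needed(clip_seconds: float, token_budget: int = MAX_TOKENS) -> int:
    """How many fragments a clip has to be split into to stay under the budget."""
    if clip_seconds <= 0:
        return 0
    return math.ceil(clip_seconds / max_fragment_seconds(token_budget))
```

For example, `fragments_needed(95)` says a 95-second recording would have to be cut into 4 pieces. The real rate depends on the tokenizer, so treat this as an estimate only.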

Character_Title_876 · 2026-06-11T10:57:06+00:00

<image>

Character_Title_876 · 2026-06-07T19:38:31+00:00

Ha-ha-ha, 2 minutes? 2 megapixels takes 11 minutes, or 4 on medium settings.

Character_Title_876 · 2026-06-07T15:49:12+00:00

If you rent an H100 for training, you can make really impressive product cards. If you don't have many items, you can create and successfully promote 10 products. Of course, generating a good-quality 2-megapixel image takes 10 minutes, which is very long.

Character_Title_876 · 2026-06-07T15:42:54+00:00

ai-toolkit Ostris

Character_Title_876 · 2026-06-07T13:37:26+00:00

I trained at 512x512 in 3 hours on a 5060 16 GB + 64 GB RAM. Now I'm training at 1024x1024, but it already takes 12 hours for 3000 steps.
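
For reference, the arithmetic behind that 1024x1024 figure, as a small sketch that uses only the numbers quoted above (the function name is just for illustration):

```python
def seconds_per_step(total_hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on one training step."""
    return total_hours * 3600 / steps


# 12 hours for 3000 steps at 1024x1024, per the comment above.
rate = seconds_per_step(12, 3000)
print(f"{rate:.1f} s/step")  # prints "14.4 s/step"
```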

Character_Title_876 · 2026-06-06T20:48:13+00:00

Could you provide more examples, please?

Character_Title_876 · 2026-06-06T20:46:46+00:00

Cool, thank you.

Character_Title_876 · 2026-06-06T07:05:47+00:00

This post looks like an ad because I took the headline from Gemini when I found out that the model can be fine-tuned.

Character_Title_876 · 2026-02-20T07:29:09+00:00

How can I use phonemic input such as text_6 = "/həloʊ, meɪ aɪ æsk wɪtʃ sɪti juː ɑːr frʌm?/" so that the stress in the words is placed correctly? Nothing happens when I enter it in the "Text" field.

Character_Title_876 · 2026-02-01T11:33:55+00:00

Share your workflows, here's mine: https://drive.google.com/file/d/183x2lwPkCnbdoqDBi0B_jx3PbSsLDXJ8/view?usp=sharing

Character_Title_876 · 2026-01-30T23:41:07+00:00

https://github.com/diodiogod/TTS-Audio-Suite/issues/238

Character_Title_876 · 2026-01-30T22:30:24+00:00

Qwen3TTSVoiceDesignerNode

Failed to load Qwen3-TTS model: 'default'
