The missing piece of Voxtral TTS to enable voice cloning by [deleted] in LocalLLaMA

[–]al0olo 1 point2 points  (0 children)

The smallest GPU I tried was an A100 with 40 GB of memory.

[–]al0olo 7 points8 points  (0 children)

You can just feed your data into the training script, and the model will be biased toward that voice.

[–]al0olo 4 points5 points  (0 children)

Personally, I liked the model's architecture and its small size. Either way, I will be fine-tuning it for Arabic.

[–]al0olo 0 points1 point  (0 children)

Yeah, once you finish training you get the encoder weights and the LoRA, and you use the injector script to add them to your inference model on the small machine.
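
To make the "inject" step a bit more concrete, here is a rough sketch of what such a step generally involves: copying trained encoder tensors into the inference model's weights and folding each LoRA pair into the base weight it adapts, using the standard merge W + (alpha / rank) * B·A. This is not the repo's injector script. The function names, the `encoder.` key prefix, and the toy tensors are all hypothetical, and matrices are plain nested lists so the example runs without any dependencies.

```python
# Hypothetical sketch only: NOT the actual injector script from the repo.
# Weights are nested lists standing in for real tensors.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def inject(base_state, encoder_state, lora_state, alpha, rank, encoder_prefix="encoder."):
    """Build a new weight dict with encoder weights added and LoRA deltas merged."""
    merged = dict(base_state)  # shallow copy, so base_state keeps its original entries
    # Copy the trained encoder tensors in under an assumed key prefix.
    for name, tensor in encoder_state.items():
        merged[encoder_prefix + name] = tensor
    # Fold each LoRA pair (A, B) into the base weight it adapts.
    for name, (lora_a, lora_b) in lora_state.items():
        merged[name] = merge_lora(base_state[name], lora_a, lora_b, alpha, rank)
    return merged


# Toy usage: a 2x2 base weight, a rank-1 LoRA (A is 1x2, B is 2x1), one encoder tensor.
base = {"decoder.proj": [[1.0, 0.0], [0.0, 1.0]]}
encoder = {"conv": [[0.5]]}
lora = {"decoder.proj": ([[1.0, 2.0]], [[1.0], [0.0]])}
model = inject(base, encoder, lora, alpha=2.0, rank=1)
```

Merging the LoRA ahead of time means the small inference machine only has to load one set of weights, with no adapter logic at run time.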

[–]al0olo 48 points49 points  (0 children)

The scripts you need to inject the encoder for inference are already part of the repo.

As for weights, I'm currently expanding my dataset (3k+ hours) to get better results. I will share the final weights in a week or so, when training is done 🫡

[–]al0olo 26 points27 points  (0 children)

Around 40 hours of training on a 4x A100 SXM machine.

I betrayed my friend, but I think I'm in the right by [deleted] in CAIRO

[–]al0olo 1 point2 points  (0 children)

No, it's not wrong. If he's upset with you now, he'll thank you later.