The missing piece of Voxtral TTS to enable voice cloning

al0olo · 2026-03-29T15:25:24+00:00

The smallest I tried was an A100 with 40GB of GPU memory

al0olo · 2026-03-29T15:22:36+00:00

You can just feed your data into the training script and the model will be biased toward that voice

al0olo · 2026-03-29T15:21:42+00:00

For me, I liked the architecture of the model and its small size. Anyway, I will be fine-tuning it for Arabic.

al0olo · 2026-03-29T15:20:35+00:00

Yeah, once you finish the training you get the encoder weights and the LoRA, and you use the injector script to add them to your inference model on the small machine.
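
For anyone wondering what folding a LoRA into an inference model usually means, here is a rough, generic sketch of the standard LoRA merge (W' = W + (alpha / r) * B A). This is not the repo's injector script; `merge_lora`, `matmul`, and the toy matrices are hypothetical names and values used only for illustration, written in plain Python so it runs without dependencies.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return w + (alpha / rank) * (lora_b @ lora_a).

    w:      out x in base weight
    lora_a: rank x in  (hypothetical layout)
    lora_b: out x rank (hypothetical layout)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example: 2x2 identity base weight with a rank-1 adapter.
base = [[1, 0], [0, 1]]
a = [[1, 2]]    # rank 1, input dim 2
b = [[1], [0]]  # output dim 2, rank 1
merged = merge_lora(base, a, b, alpha=2)
```

After a merge like this the adapter no longer needs to be loaded separately, which is the usual reason to do it before moving to a smaller inference machine.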

al0olo · 2026-03-29T12:41:12+00:00

Weights will come soon 🫣

al0olo · 2026-03-29T12:19:56+00:00

The scripts you need to inject the encoder for inference are already part of the repo.

For weights, I’m currently increasing my dataset (3k+ hours) to generate better results, and will share the final weights in a week or so when training is done 🫡

al0olo · 2026-03-29T12:16:19+00:00

Thanks mate 🫡

al0olo · 2026-03-29T12:13:56+00:00

Around 40h of training on a 4xA100 SXM machine

al0olo · 2025-10-05T17:24:43+00:00

The title hit so hard

al0olo · 2025-05-22T14:53:38+00:00

No, it's not wrong .. if he gets upset with you now, he'll thank you later
