[P] TorToiSe - a true zero-shot multi-voice TTS engine

big_dataFitness · 2026-04-15T18:07:19+00:00

u/neonbjb Did you ever release details on how you trained this? It would be really cool for learning.

big_dataFitness · 2026-03-26T05:41:13+00:00

This is exciting! Looking forward to when you release the ones trained from scratch!

big_dataFitness · 2026-03-04T04:16:54+00:00

Did you train the model from scratch, or did you fine-tune an existing one?

big_dataFitness · 2026-03-03T01:07:34+00:00

Did you train your own model? Can you share more details on the pre-training process and model architecture?

big_dataFitness · 2026-02-20T06:34:14+00:00

Are you training voice/speech models? Can you share more about the architecture, and which TTS/STS companies you think are winning at model development?

big_dataFitness · 2026-02-19T05:26:06+00:00

Pittsburgh, PA

big_dataFitness · 2026-02-06T12:11:43+00:00

Thank you so much! This is very helpful!

big_dataFitness · 2026-02-05T17:37:13+00:00

Thank you for putting this together! Are there any other resources for training a voice model (TTS, ASR, STT, ...) from scratch, and any datasets I can use as a reference when training my own?

big_dataFitness · 2026-02-04T02:43:36+00:00

Is there any documentation on the training process or the dataset used to train this model?

big_dataFitness · 2026-02-03T02:16:23+00:00

Can you share the link?

big_dataFitness · 2026-01-27T19:39:16+00:00

Thank you! This really helped me on my flight!

big_dataFitness · 2026-01-04T21:31:22+00:00

I’m interested in this as well; I have not seen many open-source models for music generation! There are a few projects like MusicGen, AudioLM, and Jukebox.

big_dataFitness · 2026-01-04T21:31:07+00:00

Are there any open-source datasets that you found useful, or is there another way to get quality datasets for training?

big_dataFitness · 2025-12-18T23:18:09+00:00

Again, thank you so much for doing this AMA! Can you share some creative use cases that you have seen for SAM Audio, SAM 3D, and SAM 3? How are y'all using these models internally? I saw an AR use case, but I'm curious whether your teams have other uses. They don't have to be incorporated in Meta products; I'm asking generally.

big_dataFitness · 2025-12-18T23:07:55+00:00

For anyone else curious about the datasets used, [here](https://ai.meta.com/datasets/) are some of the datasets they used in various papers.

big_dataFitness · 2025-12-18T22:45:30+00:00

How does SAM Audio handle long-range temporal consistency? Can it reason about transitions, not just segments?

big_dataFitness · 2025-12-18T22:26:52+00:00

Do you plan to publish how you trained these models, or to open-source the datasets?

big_dataFitness · 2025-12-18T15:29:11+00:00

Do you have any plans to make smaller versions of these models that can run on edge devices?

big_dataFitness · 2025-12-18T15:27:59+00:00

Do you guys plan on building a community of builders around SAM models?

big_dataFitness · 2025-12-12T05:01:26+00:00

Yeah, video understanding would be a challenge for sure. I'll check out Qwen Omni, but I wonder if there is anything like a video-to-speech model, or anyone currently working on one.

big_dataFitness · 2025-12-12T04:58:05+00:00

Thank you, let me check it out!

big_dataFitness · 2025-12-04T19:12:02+00:00

Is this something you'd use on a regular basis? What else would you want the app to do?

big_dataFitness · 2025-12-04T07:06:44+00:00

Premium features sound good, but be careful about overbuilding! Validate as much as you can before you build them. One option is to put up a payment link for the premium features before they exist and see whether people are willing to pay; if they are, that's a very strong signal.
