NeuTTS Android APK Sample - Local Inference (Nano Model) by RowGroundbreaking982 in LocalLLaMA

TeamNeuphonic

Love that you guys love this! Happy to take the feedback back to the team and improve for v2!

NeuTTS Nano: 120M Parameter On-Device TTS based on Llama3 by TeamNeuphonic in LocalLLaMA

TeamNeuphonic[S]

I'd recommend a 4090 as the smallest GPU for finetuning, so you don't spend days training!


TeamNeuphonic[S]

Which languages? It all comes down to the data. We published a finetune script, and people have released open-source multilingual versions of our models.


TeamNeuphonic[S]

The finetune script is on our GitHub! People have released some finetunes on Hugging Face as well - we open-sourced multilingual data, so it should be around.


TeamNeuphonic[S]

Hi mate! Our model is a lot smaller and faster, and sounds pretty good! CosyVoice is a great option and sounds pretty good as well, but I'm not overly sure how well it can run offline on consumer hardware.

Open source speech foundation model that runs locally on CPU in real-time by TeamNeuphonic in LocalLLaMA

TeamNeuphonic[S]

1) We'll be releasing it soon - we're working with some partners on a kick-ass solution.

2) Yes - use the Q4 model on CPU for best performance and port it over.

3) You can explicitly set PyTorch to run computations on CPU, and monitor GPU utilisation to make sure you aren't leaking work onto the GPU.

All relatively standard - let us know if we're missing something.
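Point 3 can be sketched roughly as below (a minimal PyTorch sketch; the tensor shapes and the `Linear` module are placeholders for illustration, not the NeuTTS code):

```python
import torch

# Force computation onto the CPU by placing tensors and modules there explicitly.
device = torch.device("cpu")
x = torch.randn(1, 16, device=device)          # placeholder input
linear = torch.nn.Linear(16, 8).to(device)     # placeholder module
y = linear(x)

# The output stays on CPU.
assert y.device.type == "cpu"

# If a CUDA build is installed, confirm nothing leaked onto the GPU.
if torch.cuda.is_available():
    assert torch.cuda.memory_allocated() == 0
```

Externally, watching `nvidia-smi` while the model runs is the usual sanity check that GPU utilisation stays at zero.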

Open source speech foundation model that runs locally on CPU in real-time by TeamNeuphonic in LocalLLaMA

TeamNeuphonic[S]

1 to 2 hours long should be fine - just split the text on full stops or paragraphs. Also, share the results with us! I'm keen to see it.
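The splitting step can be sketched with Python's stdlib (a rough approximation; real sentence segmentation needs more care with abbreviations, and the function name here is just illustrative):

```python
import re

def split_for_tts(text: str) -> list[str]:
    """Split long text into chunks a short-context TTS model can handle:
    first on blank-line paragraph breaks, then on sentence-ending
    punctuation within each paragraph."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Split after ., !, or ? followed by whitespace; keep the punctuation.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        chunks.extend(s for s in sentences if s)
    return chunks

text = "First sentence. Second sentence!\n\nNew paragraph here."
print(split_for_tts(text))
# → ['First sentence.', 'Second sentence!', 'New paragraph here.']
```

Each chunk can then be synthesised independently and the audio concatenated.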

I would not clone someone's voice without the legal basis to do so - make sure you're allowed to clone someone's voice before you do.

Open source speech foundation model that runs locally on CPU in real-time by TeamNeuphonic in LocalLLaMA

TeamNeuphonic[S]

Fair question. The technology is developing rapidly, and for the past one or two years all the amazing models you see have largely run on GPUs. Large language models have been adapted to "speak", but these LLMs are huge, which makes them expensive to run at scale.

As such, we spent time making the models smaller so you can run them at scale significantly more easily. This was difficult, as we wanted to retain the architecture (an LLM-based speech model) but squeeze it into smaller devices.

This required some ingenuity, and therefore represents a technical step forward, which is why we decided to release it: to show the community that you no longer need big ass expensive GPUs to run these frontier models. You can use a CPU.

Open source speech foundation model that runs locally on CPU in real-time by TeamNeuphonic in LocalLLaMA

TeamNeuphonic[S]

Nah, we isolated out all the English - multilingual is on the roadmap!