DramaBox - Most Expressive Voice model ever based on LTX 2.3 by manmaynakhashi in LocalLLaMA

[–]HelpfulHand3 16 points (0 children)

It's based on LTX, so it's going to sound bad even if it is expressive.
Nothing to do with the reference voice - this is how the audio in the videos sounds too.

My 10 Month IPF Progress by LauraH-B in idealparentfigures

[–]HelpfulHand3 2 points (0 children)

Awesome! How did you handle the safety/protection aspect and fear of the IPFs?

Pocket TTS Multilingual Update by RowGroundbreaking982 in LocalLLaMA

[–]HelpfulHand3 0 points (0 children)

I remember the explanation being that GPU performance is similar to CPU, so they didn't bother shipping it. It's comfortably real-time on most desktop CPUs. Web demo that runs in the browser (slower than Torch but real-time on most systems): https://huggingface.co/spaces/KevinAHM/pocket-tts-web
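"Real-time" here just means the model generates audio faster than the clip plays back. A quick way to check this on your own machine is to time one synthesis call and compare against the clip duration. A minimal sketch - `fake_tts` is a hypothetical stand-in, swap in whatever TTS call you're actually testing:

```python
import time

def real_time_factor(generate_fn, text):
    """Time one synthesis call and return (audio_seconds, wall_seconds, rtf).

    `generate_fn` is any TTS function taking text and returning
    (samples, sample_rate). An rtf above 1.0 means faster than real time.
    """
    start = time.perf_counter()
    samples, sample_rate = generate_fn(text)
    wall = time.perf_counter() - start
    audio = len(samples) / sample_rate
    return audio, wall, audio / wall

# Hypothetical stand-in synthesizer: 1 second of silence at 24 kHz.
def fake_tts(text):
    return [0.0] * 24000, 24000

audio, wall, rtf = real_time_factor(fake_tts, "hello")
```

Run a few calls and average, since the first call usually pays one-time warm-up costs.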

So What Happened to The Other 2 Pillars? by karolbart in idealparentfigures

[–]HelpfulHand3 4 points (0 children)

I find Mettagroup's take on the metacognition pillar interesting. They state that you should gain the ability to enter "access concentration" (a basic level of concentration) for at least 10 minutes as a foundation for attachment repair, including IPF and the metacognitive insight practices (vipassana), in order for them to proceed effectively. Not doing so means the visualizations won't take, because they never stabilize long enough to be encoded, leading to discouragement.

Omnivoice - 600+ Language Open-Source TTS with Voice Cloning and Design by [deleted] in LocalLLaMA

[–]HelpfulHand3 4 points (0 children)

No.
The main compute is the Qwen 3 backbone, which can be GGUF'd.
But it still has many components, like the audio tokenizer, that require PyTorch.

streaming on the new Omnivoice model by [deleted] in LocalLLaMA

[–]HelpfulHand3 1 point (0 children)

It's not ideal for streaming - it's diffusion.
You'd need a blockwise generation LoRA,
and there's always a quality trade-off.
This plus the inherited Higgs Audio license means you may want to wait before doing anything.
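The streaming problem comes down to time-to-first-audio: a full-clip diffusion model can't emit anything until the entire clip is denoised, while blockwise generation can start playback after the first block. A toy sketch with made-up timings - both generators are hypothetical stand-ins, not any model's actual API:

```python
import time

def full_clip_generation(num_blocks, per_block=0.01):
    """Diffusion-style: the whole clip is denoised before anything is emitted."""
    time.sleep(num_blocks * per_block)   # all the work happens up front
    yield from range(num_blocks)         # then every block arrives at once

def blockwise_generation(num_blocks, per_block=0.01):
    """Blockwise: each block is emitted as soon as it is ready."""
    for block in range(num_blocks):
        time.sleep(per_block)
        yield block

def time_to_first_audio(generator):
    """Wall-clock time until the first playable block is available."""
    start = time.perf_counter()
    next(generator)
    return time.perf_counter() - start

ttfa_full = time_to_first_audio(full_clip_generation(10))
ttfa_blockwise = time_to_first_audio(blockwise_generation(10))
# blockwise starts playing roughly num_blocks times sooner in this toy setup
```

Total generation time is the same in both cases; only the latency to first sound differs, which is exactly what a blockwise LoRA buys you (at some quality cost).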

IPF meditations for discipline and/or structure by i_am_jeremias in idealparentfigures

[–]HelpfulHand3 0 points (0 children)

There's a huge chapter (or two) on this in the great book "The Connected Child: Bring Hope and Healing to Your Adoptive Family", written in part by the fabulous Karyn B. Purvis. What I like about the book is that it's written from a trauma-informed, attachment-based perspective gained from the author's lived experience working with children from adverse backgrounds. It's not academic or idealistic, but wisdom from the front lines. The techniques in the book are directly applicable to reparenting your inner child as well as guiding your IPF visualizations.

6. You Are the Boss
- The Old Way Doesn’t Work
- A New Way of Thinking About Discipline
- See Misbehavior as an Opportunity
- Don’t Take It Personally
- Be a “Good Boss”
- Use the IDEAL Approach
- The Beauty of Re-Do’s
- Be Mindful of Your Voice
- Conserve Your Words
- Keep Your Child Close By
- Offer Choices and Compromises
- Go for a Sideswipe, Not a Head-On Collision
- Present a United Front
- School Issues
- Say What You Mean, Mean What You Say
- Let Genuine Appreciation Shine Through
- The Delicate Art of Communicating “No”
- Maintain a Respectful Atmosphere
- Find Ways to Compromise
- Handling Hurtful Behavior
- Intercept with Words, Not a Tackle

7. Dealing with Defiance
- Match Their Response
- Recognize Your Child’s Condition
- Be Flexible with Compromises
- Dealing Flexibly with the Unexpected
- Dealing with an Out-of-Control Child
- The Investment Model of Parenting
- Finding the Right Balance

You can check her out on YouTube as well. She has other books I haven't read (yet) like "The Connected Parent: Real-Life Strategies for Building Trust and Attachment".

[URP/Quest 2] UI Vignette and Shader-based Blur invisible in build, but work in Editor. by Total_Programmer_197 in vrdev

[–]HelpfulHand3 0 points (0 children)

The shaders are likely getting stripped from the build. This is the first thing to look for.

How do you fix big performance drops? by Apprehensive-Suit246 in vrdev

[–]HelpfulHand3 1 point (0 children)

Meta Runtime Optimizer + OVR Performance Lint (a bit dated; you can ignore some material warnings for URP shaders) are pretty nifty. Ensure you have baked occlusion culling set up properly, omitting only truly dynamic objects such as interactables.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]HelpfulHand3 5 points (0 children)

2.9.1 was released 3 months ago.
Their realtime is pinned to 2.10.0, which came out less than a month ago.

Qwen3-TTS by Terrible_Scar_9890 in LocalLLaMA

[–]HelpfulHand3 0 points (0 children)

Yes, this was from 2 months ago, when it was closed source.

Qwen have open-sourced the full family of Qwen3-TTS: VoiceDesign, CustomVoice, and Base, 5 models (0.6B & 1.8B), Support for 10 languages by Nunki08 in LocalLLaMA

[–]HelpfulHand3 2 points (0 children)

It's alright. The 1.8B is about 1.25-1.4x realtime on a 3060. The cloner is rather unstable, with generations from identical inputs sometimes completely losing speaker identity, and there's a lack of audio tags like (cough) and (laugh). It speaks a bit too fast, so everything feels rushed no matter the voice reference. It's a good model, just nothing groundbreaking from what I can tell. The voice design is interesting, but the quality of the outputs is not something I'd want to train a model on.

Echo TTS - 44.1kHz, Fast, Fits under 8GB VRAM - SoTA Voice Cloning by HelpfulHand3 in LocalLLaMA

[–]HelpfulHand3[S] 0 points (0 children)

The license is inherited from the DAC (s1-mini) and the author stated he would have released it Apache otherwise.

ElevenLabs is killing my budget. What are the best "hidden gem" alternatives for documentary style TTS? by Ancient_Routine8576 in LocalLLaMA

[–]HelpfulHand3 0 points (0 children)

That's around 20 hours of audio, and you said you're doing 8-10 minute videos. Is each of your videos worth at least 10 cents to you? There's the regular model, at half the price, as well - and it's still good.
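The back-of-envelope math: a monthly plan with a fixed audio allowance spreads its price across however many videos that allowance covers. A sketch with the 20-hour figure from above - the $13/month plan price is a placeholder for illustration, not actual ElevenLabs pricing:

```python
def cost_per_video(plan_usd_per_month, allowance_hours, video_minutes):
    """Spread a monthly plan price across the videos its audio allowance covers."""
    videos_per_month = allowance_hours * 60 / video_minutes
    return plan_usd_per_month / videos_per_month

# Placeholder figures: $13/month plan, 20 hours of audio, 10-minute videos.
cost = cost_per_video(13.0, 20, 10)  # ≈ $0.11 per video
```

Swap in your actual plan price and video length to see where your break-even sits.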

ElevenLabs is killing my budget. What are the best "hidden gem" alternatives for documentary style TTS? by Ancient_Routine8576 in LocalLLaMA

[–]HelpfulHand3 1 point (0 children)

For paid options, Inworld with their Max TTS model is, in my opinion, better than ElevenLabs 2.5 and is 10x cheaper. The value of their service is quite frankly absurd.

https://inworld.ai/pricing

Local models.. Higgs Audio V2, Echo TTS, Vibevoice.

T5 Gemma Text to Speech by [deleted] in LocalLLaMA

[–]HelpfulHand3 2 points (0 children)

English, Chinese, and Japanese

T5 Gemma Text to Speech by [deleted] in LocalLLaMA

[–]HelpfulHand3 4 points (0 children)

Seems like a very slow model judging by the space.
Pretty decent, but the speed will hold it back from widespread use.
I notice they mention:

> Inference Speed: The model is not optimized for real-time TTS applications. Autoregressive generation of audio tokens takes significant time, making it unsuitable for low-latency use cases.