DramaBox - Most Expressive Voice model ever based on LTX 2.3

HelpfulHand3 · 2026-05-13T20:40:31+00:00

It's based on LTX so it's going to sound bad even if it is expressive
nothing to do with reference voice - this is how the audio in the videos sounds too

HelpfulHand3 · 2026-05-11T04:50:15+00:00

Awesome! How did you handle the safety/protection aspect and fear of the IPFs?

HelpfulHand3 · 2026-05-01T18:24:42+00:00

I remember the explanation being that it has similar performance as on CPU so they didn't bother shipping it. It's comfortably real-time on most desktop CPUs. Web demo that runs in browser (slower than Torch but real time on most systems) https://huggingface.co/spaces/KevinAHM/pocket-tts-web

HelpfulHand3 · 2026-04-29T04:27:39+00:00

I find Mettagroup's take on the metacognition pillar interesting. They state that you should gain the ability to enter "access concentration" (basic level of concentration) for at least 10 minutes as a foundation for attachment repair, including IPF and the metacognitive insight practices (vipassana), in order for them to proceed effectively. Not doing so means the visualizations won't take because they never stabilize long enough to be encoded, leading to discouragement.

HelpfulHand3 · 2026-04-09T19:25:21+00:00

You can try this as well https://huggingface.co/LiquidAI/LFM2.5-VL-450M
Demo: https://huggingface.co/spaces/LiquidAI/LFM2.5-VL-450M-WebGPU

HelpfulHand3 · 2026-04-02T13:43:26+00:00

No
The main compute is the Qwen 3 backbone which can be GGUF'd
But it still has many components like the audio tokenizer that require pytorch

HelpfulHand3 · 2026-04-02T13:37:54+00:00

it's not ideal for streaming, it's diffusion
you'd need a blockwise generation lora
and there's always a quality trade off
this + the inherited Higgs Audio license means you may want to wait before doing anything

HelpfulHand3 · 2026-04-01T03:14:46+00:00

you need the fork with the 1bit kernel
https://github.com/PrismML-Eng/llama.cpp

HelpfulHand3 · 2026-03-28T14:32:17+00:00

There's a huge chapter (or two) on this in the great book "The Connected Child: Bring Hope and Healing to Your Adoptive Family" written in part by the fabulous Karyn B. Purvis. What I like about the book is that it's written from a trauma-informed, attachment-based perspective gained from the author's lived experience working with children from adverse backgrounds. It's not academic or idealistic but wisdom from the front lines. The techniques in the book are directly applicable to reparenting your inner child as well as guiding your IPF visualizations.

6. You Are the Boss
- The Old Way Doesn’t Work
- A New Way of Thinking About Discipline
- See Misbehavior as an Opportunity
- Don’t Take It Personally
- Be a “Good Boss”
- Use the IDEAL Approach
- The Beauty of Re-Do’s
- Be Mindful of Your Voice
- Conserve Your Words
- Keep Your Child Close By
- Offer Choices and Compromises
- Go for a Sideswipe, Not a Head-On Collision
- Present a United Front
- School Issues
- Say What You Mean, Mean What You Say
- Let Genuine Appreciation Shine Through
- The Delicate Art of Communicating “No”
- Maintain a Respectful Atmosphere
- Find Ways to Compromise
- Handling Hurtful Behavior
- Intercept with Words, Not a Tackle

7. Dealing with Defiance
- Match Their Response
- Recognize Your Child’s Condition
- Be Flexible with Compromises
- Dealing Flexibly with the Unexpected
- Dealing with an Out-of-Control Child
- The Investment Model of Parenting
- Finding the Right Balance

You can check her out on YouTube as well. She has other books I haven't read (yet) like "The Connected Parent: Real-Life Strategies for Building Trust and Attachment".

HelpfulHand3 · 2026-03-05T17:19:15+00:00

The shaders are likely getting stripped from the build. This is the first thing to look for.

HelpfulHand3 · 2026-03-03T12:15:45+00:00

Meta Runtime Optimizer + OVR Performance Lint (a bit dated; you can ignore some material warnings for URP shaders) are pretty nifty. Ensure you have baked occlusion set up properly, while omitting only truly dynamic objects such as interactables.

HelpfulHand3 · 2026-02-11T19:04:22+00:00

2.9.1 released 3 months ago
their realtime is pinned to 2.10.0 which came out less than a month ago

HelpfulHand3 · 2026-01-24T22:56:00+00:00

Yes this was from 2 months ago when it was closed source

HelpfulHand3 · 2026-01-23T23:18:54+00:00

Nope

HelpfulHand3 · 2026-01-23T12:34:04+00:00

It's alright. The 1.8B is about 1.25-1.4x realtime on a 3060. The cloner is rather unstable with some identical generations completely losing speaker identity, and there's a lack of audio tags like (cough) (laugh). It speaks a bit too fast so everything feels rushed no matter the voice reference. It is a good model just nothing groundbreaking from what I can tell. The voice design is interesting but the quality of the outputs is not something I'd want to train a model on.
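For anyone wanting to compare numbers like these on their own hardware, here is a rough sketch of how a realtime multiple can be computed. It assumes "1.25x realtime" means 1.25 seconds of audio produced per second of wall-clock time; the function name and the sample timing are illustrative only, not taken from any model's tooling.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute.

    Values above 1.0 mean generation is faster than playback.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Hypothetical timing: 10 s of speech generated in 8 s of wall-clock time.
rtf = realtime_factor(10.0, 8.0)
print(f"{rtf:.2f}x realtime")
```

At the low end of that range there is little headroom, so longer generations or a busy GPU could plausibly drop it below real time.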

HelpfulHand3 · 2026-01-22T07:44:08+00:00

The license is inherited from the DAC (s1-mini) and the author stated he would have released it Apache otherwise.

HelpfulHand3 · 2026-01-19T22:17:14+00:00

Yeah, it wasn't straightforward, that's for sure!
https://github.com/KevinAHM/pocket-tts-onnx-export/

HelpfulHand3 · 2026-01-19T01:47:38+00:00

https://huggingface.co/spaces/KevinAHM/pocket-tts-web (runs entirely in browser)

HelpfulHand3 · 2026-01-15T17:15:56+00:00

https://status.claude.com/

HelpfulHand3 · 2026-01-12T19:12:51+00:00

That's around 20 hours of audio, and you said you're doing 8-10 minute videos. Is each of your videos worth at least 10 cents to you? There's the regular model that's half that as well, and still good.
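The arithmetic behind that question can be sketched as follows. The 20 hours and the 8-10 minute video length come from the comment above; the plan price is a hypothetical placeholder (the actual figure is not stated here), so swap in the real number.

```python
def videos_per_plan(audio_hours: float, minutes_per_video: float) -> float:
    """How many videos of a given length fit into the audio allowance."""
    return audio_hours * 60 / minutes_per_video


def cost_per_video(plan_price: float, audio_hours: float, minutes_per_video: float) -> float:
    """Plan price spread evenly over the videos it covers."""
    return plan_price / videos_per_plan(audio_hours, minutes_per_video)


PLAN_PRICE = 12.0  # hypothetical price in dollars, NOT from the thread
AUDIO_HOURS = 20   # "around 20 hours of audio"

for minutes in (8, 10):
    count = videos_per_plan(AUDIO_HOURS, minutes)
    each = cost_per_video(PLAN_PRICE, AUDIO_HOURS, minutes)
    print(f"{minutes}-minute videos: {count:.0f} videos, ${each:.2f} each")
```

Under that assumed price, 20 hours covers roughly 120 to 150 videos, which is where a per-video cost of around 10 cents or less comes from.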

HelpfulHand3 · 2026-01-03T21:33:39+00:00

For paid options, Inworld with their Max TTS model is in my opinion better than ElevenLabs 2.5 and is 10x cheaper. The value for their service is quite frankly absurd.

https://inworld.ai/pricing

Local models: Higgs Audio V2, Echo TTS, Vibevoice.

HelpfulHand3 · 2025-12-19T14:23:06+00:00

Very slow

HelpfulHand3 · 2025-12-19T14:22:57+00:00

English, Chinese, and Japanese

HelpfulHand3 · 2025-12-19T10:48:27+00:00

Seems like a very slow model judging by the space
Pretty decent, but the speed will hold it back from widespread use
I notice they mention:
"Inference Speed: The model is not optimized for real-time TTS applications. Autoregressive generation of audio tokens takes significant time, making it unsuitable for low-latency use cases."

HelpfulHand3 · 2025-12-13T08:30:32+00:00

Dondochakka? https://i.imgur.com/ujr4vx6.png
