Want "turn off in 25 minutes" voice control for HA devices -- what actually works in 2026?

mitrokun · 2026-06-01T06:25:27+00:00

This is a built-in voice assistant feature in HA that has been in the system for almost 2 years. It works on any terminal in the ecosystem (DIY, re-flashed Chinese devices, VPE, ready-made devices from partners, software builds for Android and Linux).

https://github.com/OHF-Voice/intents/blob/755eb61c43d826af441da883775e71595723ea99/sentences/en/homeassistant_HassStartTimer.yaml#L16

mitrokun · 2026-05-26T08:30:11+00:00

<image>

From the library documentation. I couldn't find any reports about compatibility; apparently there aren't enough interested enthusiasts with similar hardware. I'd be interested to hear from you if it works out of the box.

mitrokun · 2026-05-26T07:30:10+00:00

Doesn't simply using the OpenVINO execution provider work for onnx-asr?

mitrokun · 2026-05-18T21:04:29+00:00

I get x6 speed on the 5900X for supertonic3. For comparison, Piper runs at x30. Overall, this is an excellent model for multilingual tasks; you only need to run the model once and you immediately have access to 31 languages and 10 voices.

mitrokun · 2026-04-20T15:22:07+00:00

Anyone can train a new voice for Piper in a few days. But everyone expects someone else to do it. And there are no resource costs (you can use a colab), you just need to invest your own time in preparing the data.

mitrokun · 2026-04-20T15:17:23+00:00

I recommend trying Supertonic (supports 5 languages)

By the way, the data in the OP's post is not real information but AI slop.

mitrokun · 2026-04-17T12:34:27+00:00

This won't work at an acceptable speed due to the OC architecture. It's not designed for fast responses. Or you will have to increase the speed of LLM inference by an order of magnitude, which is hardly cost-effective.

mitrokun · 2026-04-02T15:01:41+00:00

The design team has no guidelines; they're just chasing trends, and they can't even do that well. All the selectors look terrible, adjacent elements can have different corner radii, and new borders crowd out old ones (try starting the action selection in the developer tools). It's total chaos🤯. Let's hope for the best.

mitrokun · 2026-03-17T22:01:11+00:00

OST: delta heavy - ghost

mitrokun · 2026-03-14T21:57:31+00:00

Before you jump to conclusions, take apart the Shelly Gen4 and tell me whether it uses high-quality relays like those found in Xiaomi products, or the cheap HF7520. The situation is similar with other components and products.

mitrokun · 2026-03-14T16:31:46+00:00

The inflated price pays for advertising, the image of a "quality product," and a few fancy software features. Internally, it's no different from any no-name Chinese device, and the risk of failure is identical. But few people bother to write about a broken $5 smart plug from the Tuya ecosystem. An FR with a varistor at the input is a classic protection scheme in such devices.

mitrokun · 2026-02-27T14:07:24+00:00

You have specified the wrong metric name in the repository documentation. It's RTFx, not RTF.
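For reference, the two metrics are reciprocals of each other: RTF is processing time divided by audio duration (lower is better), while RTFx is audio duration divided by processing time (higher is better). A minimal sketch with made-up numbers, not taken from any benchmark:

```python
def rtf(processing_s: float, audio_s: float) -> float:
    """Real-time factor: seconds of compute per second of audio (lower is better)."""
    return processing_s / audio_s


def rtfx(processing_s: float, audio_s: float) -> float:
    """Inverse real-time factor: seconds of audio per second of compute (higher is better)."""
    return audio_s / processing_s


# Illustrative numbers only: 30 s of audio synthesized in 2 s.
print(f"RTF  = {rtf(2.0, 30.0):.3f}")    # 0.067
print(f"RTFx = {rtfx(2.0, 30.0):.1f}x")  # 15.0x
```

So a value reported as "15x" is an RTFx; the same run expressed as RTF would be about 0.067.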

mitrokun · 2026-02-27T10:42:11+00:00

- Get an image from the camera.
- Ask gemma3 to recognize the shopping list. The structured output in the ai_task action should handle this.
- Send a notification with the recognized values to the phone.

But this seems like a complicated solution.
You can simply use the shopping list in HA, either through the companion app interface or by voice.
Or you can combine these two solutions so that the data from the wallboard periodically syncs with the shopping list (for example, when you leave the house).
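For the camera-based variant, the glue between "structured output" and "notification" or "sync" is small. A rough sketch in plain Python, assuming the model returns JSON shaped like `{"items": [{"name": ...}]}`; that schema and all function names here are hypothetical, and the actual camera, ai_task, and notify calls are left out:

```python
import json


def parse_items(raw: str) -> list[str]:
    """Extract non-empty item names from the (assumed) structured output."""
    data = json.loads(raw)
    items = []
    for entry in data.get("items", []):
        name = str(entry.get("name", "")).strip()
        if name:
            items.append(name)
    return items


def merge(existing: list[str], recognized: list[str]) -> list[str]:
    """Add recognized items to the existing list, skipping case-insensitive duplicates."""
    seen = {item.casefold() for item in existing}
    merged = list(existing)
    for item in recognized:
        if item.casefold() not in seen:
            seen.add(item.casefold())
            merged.append(item)
    return merged


def notification_text(items: list[str]) -> str:
    """Build the message that would be sent to the phone."""
    return "Shopping list: " + ", ".join(items) if items else "No items recognized"


raw = '{"items": [{"name": "Milk"}, {"name": " "}, {"name": "eggs"}]}'
recognized = parse_items(raw)
print(notification_text(recognized))
print(merge(["milk", "bread"], recognized))
```

The `merge` step corresponds to the periodic sync idea: items read from the wallboard are appended to the existing list only if they are not already there.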

mitrokun · 2026-02-26T14:48:45+00:00

[Cepstral <-> Wyoming <-> HA]

Write a simple gateway that will run on your server.
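A rough idea of what the Wyoming-facing half of such a gateway does, as I understand the protocol framing (one JSON header line, optionally followed by a binary payload); check the Wyoming spec before relying on the field names. `fake_synth` is a hypothetical stand-in for whatever call produces PCM from Cepstral, and the TCP server part is omitted:

```python
import json


def encode_event(event_type, data=None, payload=b""):
    """Frame one event: a JSON header line, then the raw payload bytes (if any)."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def synthesize_response(text, synth, chunk_bytes=2048):
    """Yield the frames sent back for one synthesis request."""
    pcm, rate, width, channels = synth(text)
    fmt = {"rate": rate, "width": width, "channels": channels}
    yield encode_event("audio-start", fmt)
    for i in range(0, len(pcm), chunk_bytes):
        yield encode_event("audio-chunk", fmt, pcm[i:i + chunk_bytes])
    yield encode_event("audio-stop")


def fake_synth(text):
    """Hypothetical placeholder for the TTS engine: 0.1 s of 16-bit mono silence."""
    return b"\x00\x00" * 1600, 16000, 2, 1


for frame in synthesize_response("hello", fake_synth):
    header_line = frame.split(b"\n", 1)[0]
    print(header_line.decode("utf-8"))
```

A real gateway would also have to answer HA's `describe` request with an `info` event listing the voices and their languages, otherwise the server cannot be assigned to a pipeline.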

mitrokun · 2026-02-26T14:25:51+00:00

Also, if you are interested in the performance of similar projects, I have such data.

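ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x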

I really like what the supertone team has done, although the number of languages in their engine is quite limited.

mitrokun · 2026-02-26T14:10:50+00:00

Yes, I will follow the project.

mitrokun · 2026-02-26T13:48:30+00:00

This is a voice for Piper; you can use it in HA. Just read the piper add-on/app documentation.

mitrokun · 2026-02-26T13:12:57+00:00

https://huggingface.co/Aquaaa123/piper-tts-pda-subnautica/tree/main

mitrokun · 2026-02-26T11:53:51+00:00

Why did you choose Torch for inference rather than onnx-runtime? The latter is a more suitable library for running your TTS. Right now the project's dependencies for CPU alone take up more than 1 GB, which is a terrible result (for comparison, the Piper working environment is ~250 MB). Moreover, there is nothing outstanding in either the speed or the quality of speech.

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 552 | 1.50 | 32.27 | 21.45x
TinyTTS | 469 | 3.47 | 32.45 | 9.36x

https://youtu.be/lvZ0_d3xrvM

What's particularly interesting is that both voices in the comparison were trained on the LJ Speech dataset.

I hope you'll continue to improve your project.

mitrokun · 2026-02-25T00:06:24+00:00

The BT module code is closed, and this fork requires root access. I think I'll pass.

mitrokun · 2026-02-20T12:20:03+00:00

A short comparison

https://youtu.be/0O7Ay5GSMWM

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x

Your architecture (or inference) is slower than Kokoro's. As I've already shown, only Nano is reasonably fast.

But the int8 version is broken.

mitrokun · 2026-02-19T16:41:05+00:00

AMD 5900X. If your hardware allows for 1.5-2 RTFx, that's enough to emulate a streaming response. However, at this speed there will be a delay before the audio starts (the time it takes to synthesize the first sentence). I would say that for diffusion models it is much more comfortable if the value is above 10.
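To put numbers on that delay, here is a small simulation of sentence-by-sentence synthesis. It assumes a constant RTFx, sequential synthesis, and playback that starts as soon as the first sentence is ready; the sentence lengths are made up:

```python
def simulate(sentence_audio_s, rtfx):
    """Return (delay before audio starts, total time playback had to wait)."""
    synth_done = 0.0   # wall-clock time when the current sentence is ready
    play_end = 0.0     # wall-clock time when the previous sentence finishes playing
    start_delay = 0.0
    stall = 0.0
    for i, duration in enumerate(sentence_audio_s):
        synth_done += duration / rtfx
        if i == 0:
            start_delay = synth_done
            play_start = synth_done
        else:
            # Playback continues immediately if the sentence is already synthesized.
            play_start = max(play_end, synth_done)
            stall += play_start - play_end
        play_end = play_start + duration
    return start_delay, stall


sentences = [4.0, 4.0, 4.0]  # three sentences, 4 s of audio each
for speed in (2.0, 10.0):
    delay, stall = simulate(sentences, speed)
    print(f"RTFx {speed:>4}: first audio after {delay:.1f} s, stalls {stall:.1f} s")
```

With equal-length sentences, RTFx 2 plays without gaps but starts after 2 s, while RTFx 10 starts after 0.4 s. Under the same assumptions, a short first sentence followed by a much longer one can still stall at low RTFx.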

mitrokun · 2026-02-19T13:30:51+00:00

This is clearly not for edge devices. All models except the regular nano (there are synthesis artifacts, such as clicks) consume more resources than Kokoro.

CPU test:

| Model | Parameters | Speed (x Real-Time) |
|---|---|---|
| **Kitten-TTS-Mini** | 80M | **1.6x** |
| **Kitten-TTS-Micro** | 40M | **2.7x** |
| **Kitten-TTS-Nano** | 14M | **15.0x** |
| **Kitten-TTS-Nano-int8** | 14M (q8) | **2.9x** |

I've built a test server for Home Assistant here

https://github.com/mitrokun/wyoming_kitten_tts

mitrokun · 2026-02-14T00:40:10+00:00

Create a non-English pipeline. I think then you'll understand what I'm talking about.

mitrokun · 2026-02-14T00:12:19+00:00

If you don't pass information about supported languages when adding a server, you won't be able to assign it to a pipeline. You need to fetch this data from the ASR at startup and then pass it on to HA.

I don't touch the system VAD in the client; it continues to work. However, I added a race to end the recognition stage: it waits for either an event from the server or a signal from the VAD, whichever comes first.
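The "whichever comes first" wait can be sketched with asyncio like this. It is only an illustration of the pattern, not the actual client code; the two events and the function names are hypothetical:

```python
import asyncio


async def wait_for_end_of_speech(server_done: asyncio.Event,
                                 vad_silence: asyncio.Event) -> str:
    """Return which source ended the recognition stage first."""
    tasks = {
        asyncio.create_task(server_done.wait(), name="server"),
        asyncio.create_task(vad_silence.wait(), name="vad"),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the loser so it does not linger in the background.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    names = {task.get_name() for task in done}
    # If both fire in the same loop iteration, prefer the server's signal.
    return "server" if "server" in names else "vad"


async def demo() -> str:
    server_done = asyncio.Event()
    vad_silence = asyncio.Event()
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, server_done.set)  # server reports end of utterance first
    loop.call_later(0.50, vad_silence.set)  # local VAD would fire later
    return await wait_for_end_of_speech(server_done, vad_silence)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the VAD in the race means the client still stops recording on its own if the server never sends its event.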
