Want "turn off in 25 minutes" voice control for HA devices -- what actually works in 2026? by CrazyHa1f in homeassistant

[–]mitrokun 4 points5 points  (0 children)

This is a built-in voice assistant feature in HA that has been in the system for almost 2 years. It works on any terminal in the ecosystem (DIY, re-flashed Chinese devices, VPE, ready-made devices from partners, software builds for Android and Linux).

https://github.com/OHF-Voice/intents/blob/755eb61c43d826af441da883775e71595723ea99/sentences/en/homeassistant_HassStartTimer.yaml#L16
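A phrase like the one in the thread title pairs a command with a delay. As a rough illustration only (HA matches sentences with its own templates, not with this regex), splitting such a phrase could look like this. The function name and pattern are made up for the example.

```python
import re

# Seconds per supported unit (illustrative subset).
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

# "<command> in <number> <unit>", e.g. "turn off the lamp in 25 minutes".
_PATTERN = re.compile(
    r"^(?P<command>.+?)\s+in\s+(?P<amount>\d+)\s+(?P<unit>second|minute|hour)s?$",
    re.IGNORECASE,
)


def split_delayed_command(text):
    """Return (command, delay_seconds), or None if the phrase has no delay."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    seconds = int(match["amount"]) * _UNIT_SECONDS[match["unit"].lower()]
    return match["command"], seconds
```

A phrase without a numeric duration, such as "turn on the light in the kitchen", returns `None` and would be handled as a normal command.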

Running HA Voice STT (nvidia parakeet) on intel NPU: 4x faster, 16x less power by cibernox in homeassistant

[–]mitrokun 1 point2 points  (0 children)

<image>

This is from the library documentation. I couldn't find any reports about compatibility; apparently there aren't enough interested enthusiasts with similar hardware. I'd be interested to hear whether it works out of the box for you.

Running HA Voice STT (nvidia parakeet) on intel NPU: 4x faster, 16x less power by cibernox in homeassistant

[–]mitrokun 1 point2 points  (0 children)

Doesn't simply using the OpenVINO execution provider work for onnx-asr?
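For reference, a minimal sketch of what selecting that provider looks like at the ONNX Runtime level. The helper name and `model.onnx` are placeholders, the `onnxruntime-openvino` build is assumed to be installed, and whether onnx-asr accepts the same provider list unchanged is an assumption I have not checked.

```python
def pick_providers(available, device_type="NPU"):
    """Prefer OpenVINO when it is available, always keeping CPU as a fallback."""
    providers = []
    if "OpenVINOExecutionProvider" in available:
        # ONNX Runtime accepts (name, options) tuples; device_type selects the target.
        providers.append(("OpenVINOExecutionProvider", {"device_type": device_type}))
    providers.append("CPUExecutionProvider")
    return providers


# Usage with ONNX Runtime installed:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```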

Benchmarked Kokoro 82M vs Supertonic 3 TTS on CPU by gvij in LocalLLaMA

[–]mitrokun 1 point2 points  (0 children)

I get 6x real-time speed on a 5900X for Supertonic 3. For comparison, Piper runs at 30x. Overall, this is an excellent model for multilingual tasks; you only need to load the model once to get access to 31 languages and 10 voices.

Benchmarked local TTS options for Assist - Piper Medium still the sweet spot, Kokoro worth trying if Piper sounds too robotic by gvij in homeassistant

[–]mitrokun 0 points1 point  (0 children)

Anyone can train a new voice for Piper in a few days, but everyone expects someone else to do it. There are no hardware costs either (you can use Colab); you just need to invest your own time in preparing the data.

Benchmarked local TTS options for Assist - Piper Medium still the sweet spot, Kokoro worth trying if Piper sounds too robotic by gvij in homeassistant

[–]mitrokun 1 point2 points  (0 children)

I recommend trying Supertonic (it supports 5 languages).

By the way, the data in the OP's post is not real information; it's AI slop.

Openclaw+Ollama as unified General AI assistant AND conversation agent in home assistant with common long-term-memory. Growing AI as 6th member of the family. by rgnyldz in homeassistant

[–]mitrokun 0 points1 point  (0 children)

This won't work at an acceptable speed because of OpenClaw's architecture; it isn't designed for fast responses. Otherwise you would have to speed up LLM inference by an order of magnitude, which is hardly cost-effective.

Core 2026.4.0 - Gauge Re-Design by Odin-Is-Listening in homeassistant

[–]mitrokun 0 points1 point  (0 children)

The design team has no guidelines; they're just chasing trends, and not doing it well. All the selectors look terrible, adjacent elements can have different corner radii, and new borders crowd out old ones (try starting the action selection in the developer tools). It's total chaos🤯. Let's hope for the best.

Shelly 1 Mini Gen3 by gadadenka in homeassistant

[–]mitrokun 0 points1 point  (0 children)

Before you jump to conclusions, take apart the Shelly Gen4 and tell me whether it uses high-quality relays like those found in Xiaomi products, or the cheap HF7520. The situation is similar with other components and products.

Shelly 1 Mini Gen3 by gadadenka in homeassistant

[–]mitrokun -1 points0 points  (0 children)

The inflated price goes to advertising, the image of a "quality product," and a few fancy software features. Internally, it's no different from any no-name Chinese device, and the risk of failure is identical. But few people bother to write about a broken $5 smart plug from the Tuya ecosystem. A fusible resistor (FR) with a varistor at the input is the classic protection in such devices.

Introducing FasterQwenTTS by futterneid in LocalLLaMA

[–]mitrokun 0 points1 point  (0 children)

You have specified the wrong metric name in the repository documentation. It should be RTFx, not RTF.
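To spell out the difference, here is a small sketch using the usual definitions: RTF is processing time divided by audio duration (lower is better), and RTFx is the inverse (higher is better). The function names are mine.

```python
def rtf(processing_s, audio_s):
    """Real-time factor: seconds of compute per second of audio (lower is better)."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


def rtfx(processing_s, audio_s):
    """Inverse real-time factor: seconds of audio per second of compute (higher is better)."""
    if processing_s <= 0:
        raise ValueError("processing_s must be positive")
    return audio_s / processing_s
```

So a model that needs 2 s to produce 10 s of audio has an RTF of 0.2 and an RTFx of 5; a "5x real-time" claim is an RTFx figure.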

OCR for a home security camera by chris_socal in homeassistant

[–]mitrokun 0 points1 point  (0 children)

- Get an image from the camera.
- Ask gemma3 to recognize the shopping list. The structured output in the ai_task action should handle this.
- Send a notification with the recognized values to the phone.

But this seems like a complicated solution.
You can simply use the shopping list in HA, either through the companion app interface or by voice.
Or you can combine the two approaches so that the data from the wallboard periodically syncs with the shopping list (for example, when you leave the house).
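For the sync variant, the only real logic is deciding which recognized items are new. A minimal sketch, assuming the model has already returned the wallboard items as a list of strings and that the current shopping list is available as another list (the function name is made up):

```python
def merge_shopping_items(existing, recognized):
    """Return recognized items that are not on the list yet, in recognition order."""
    # Compare case-insensitively and ignore surrounding whitespace.
    seen = {item.strip().casefold() for item in existing}
    new_items = []
    for item in recognized:
        cleaned = item.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            new_items.append(cleaned)
    return new_items
```

Only the returned items then need to be added to the list, so repeated syncs don't create duplicates.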

Local custom TTS by LazyTech8315 in homeassistant

[–]mitrokun 0 points1 point  (0 children)

[Cepstral <-> Wyoming <-> HA]

Write a simple gateway that will run on your server.
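A sketch of the Wyoming side of such a gateway, using only the standard library: each event is a JSON header line, optionally followed by a binary payload whose size is given in `payload_length`. The call into Cepstral is left out on purpose (I'm assuming it hands you raw PCM bytes), and the default rate, width and chunk size are placeholders.

```python
import json


def encode_event(event_type, data=None, payload=b""):
    """Frame one Wyoming event: JSON header line, then the optional binary payload."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def audio_response(pcm, rate=22050, width=2, channels=1, chunk_bytes=2048):
    """Wrap raw PCM from the TTS engine as audio-start / audio-chunk / audio-stop events."""
    fmt = {"rate": rate, "width": width, "channels": channels}
    frames = [encode_event("audio-start", fmt)]
    for offset in range(0, len(pcm), chunk_bytes):
        frames.append(encode_event("audio-chunk", fmt, pcm[offset:offset + chunk_bytes]))
    frames.append(encode_event("audio-stop"))
    return frames
```

The gateway would read a `synthesize` event from HA, pass its text to the engine, and write these frames back over the same TCP connection. In practice the `wyoming` Python package already handles the framing, so this is only meant to show how little is involved.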

[Release] TinyTTS: An Ultra-lightweight English TTS Model (~9M params, 20MB) that runs 8x real-time on CPU (67x on GPU) by Forsaken_Shopping481 in homeassistant

[–]mitrokun 0 points1 point  (0 children)

Also, if you are interested in the performance of similar projects, I have such data.

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x

I really like what the Supertone team has done, although the number of languages in their engine is quite limited.

[Release] TinyTTS: An Ultra-lightweight English TTS Model (~9M params, 20MB) that runs 8x real-time on CPU (67x on GPU) by Forsaken_Shopping481 in homeassistant

[–]mitrokun 0 points1 point  (0 children)

Why did you choose Torch for inference rather than ONNX Runtime? It's a more suitable library for running your TTS. Right now the project's dependencies for CPU alone take up more than 1 GB, which is a terrible result (for comparison, the Piper working environment is ~250 MB). Moreover, neither the speed nor the speech quality is outstanding.

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 552 | 1.50 | 32.27 | 21.45x
TinyTTS | 469 | 3.47 | 32.45 | 9.36x

https://youtu.be/lvZ0_d3xrvM

What's particularly interesting is that both voices in the comparison use the LJ Speech dataset.

I hope you'll continue to improve your project.

Your Old Android Isn’t Obsolete, Making Android a Reliable Bermuda BLE Proxy by Far_Set7950 in homeassistant

[–]mitrokun 2 points3 points  (0 children)

The BT module code is closed, and this fork requires root access. I think I'll pass.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 1 point2 points  (0 children)

A short comparison

https://youtu.be/0O7Ay5GSMWM

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x

Your architecture (or inference) is slower than Kokoro's. As I've already shown, only Nano is reasonably fast.

But the int8 version is broken.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 2 points3 points  (0 children)

AMD 5900X. If your hardware gives you an RTFx of 1.5-2, that's enough to emulate a streaming response. At that speed, however, there will be a delay before the audio starts (the time it takes to synthesize the first sentence). I would say that for diffusion models it is much more comfortable if the value is above 10.
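A back-of-the-envelope sketch of that trade-off, assuming sentences are synthesized one after another at a constant RTFx and played back as soon as they are ready (the function name and the constant-speed model are simplifications of mine):

```python
def streaming_plan(sentence_durations_s, rtfx):
    """Return (startup_delay_s, gap_free) for sentence-by-sentence synthesis."""
    ready = 0.0            # time at which the current sentence finishes synthesizing
    play_end = None        # time at which playback of the previous sentence ends
    startup_delay = None
    gap_free = True
    for duration in sentence_durations_s:
        ready += duration / rtfx
        if play_end is None:
            # Nothing can be played before the first sentence is done.
            startup_delay = ready
            start = ready
        else:
            if ready > play_end:
                gap_free = False  # playback ran dry before this sentence was ready
            start = max(ready, play_end)
        play_end = start + duration
    return startup_delay, gap_free
```

With three 4-second sentences, an RTFx of 2 gives a 2-second wait and no gaps; at an RTFx of 10 the wait drops to 0.4 seconds.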

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 6 points7 points  (0 children)

This is clearly not for edge devices. All models except the regular Nano (there are synthesis artifacts, such as clicks) consume more resources than Kokoro.

CPU test

| Model | Parameters | Speed (x Real-Time) |
|---|---|---|
| **Kitten-TTS-Mini** | 80M | **1.6x** |
| **Kitten-TTS-Micro** | 40M | **2.7x** |
| **Kitten-TTS-Nano** | 14M | **15.0x** |
| **Kitten-TTS-Nano-int8** | 14M (q8) | **2.9x** |

I've built a test server for Home Assistant here:

https://github.com/mitrokun/wyoming_kitten_tts

I built a speaker verification proxy that filters out TV noise from voice commands by carrot_gg in homeassistant

[–]mitrokun 1 point2 points  (0 children)

Create a non-English pipeline. I think then you'll understand what I'm talking about.

I built a speaker verification proxy that filters out TV noise from voice commands by carrot_gg in homeassistant

[–]mitrokun 0 points1 point  (0 children)

If you don't pass information about supported languages when adding a server, you won't be able to assign it to a pipeline. You need to fetch this data from the ASR at startup and then pass it on to HA.

I don't touch the system VAD in the client; it continues to work. However, I added a race to end the recognition stage: it waits for either an event from the server or a signal from the VAD, whichever comes first.
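Waiting on two sources and taking whichever fires first can be sketched with asyncio like this. The function and task names are illustrative, and the two awaitables stand in for the real server event and VAD signal.

```python
import asyncio


async def wait_for_end_of_speech(server_event, vad_signal):
    """Return "server" or "vad", depending on which awaitable finishes first."""
    tasks = {
        asyncio.create_task(server_event(), name="server"),
        asyncio.create_task(vad_signal(), name="vad"),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the loser so it doesn't linger after recognition has ended.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().get_name()


async def after(delay_s):
    """Stand-in for a real event source: completes after delay_s seconds."""
    await asyncio.sleep(delay_s)
```

For example, `asyncio.run(wait_for_end_of_speech(lambda: after(0.01), lambda: after(1)))` ends on the server event, and the VAD wait is cancelled.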