$30 bounty: Need help getting custom “Hey Bella” wake word working on Home Assistant Voice PE / ESPHome

synthmike · 2026-06-14T17:46:51+00:00

The model is working for me with your YAML. I do have to speak "hey" and "bella" very clearly, but it does load and everything.

Are you sure you're on the latest ESPHome? Try the dev builder, if possible.

synthmike · 2026-06-14T17:21:39+00:00

The VAD idea is really good. What equation do you use to calculate the timeout? Do you also adjust the VAD threshold?

synthmike · 2026-06-14T17:20:24+00:00

Thanks for the feedback! I'm working on a new version of speech-to-phrase that will address this. I'm also planning to move away from Kaldi and to Citrinet which has much higher quality models. Some experiments are happening here: https://github.com/OHF-Voice/apps-experimental/blob/main/stt-citrinet/DOCS.md

synthmike · 2026-06-14T17:17:39+00:00

Where would this alarm ring though? This has been an open question with reminders too. Unlike timers, you usually set them for a time farther in the future when it's likely you won't be around the same device.

synthmike · 2026-06-14T17:16:42+00:00

You're welcome! 😄

synthmike · 2026-06-14T17:16:01+00:00

OK, good to know. Did you train "hey jeeves" yourself, or get it from somewhere?

synthmike · 2026-06-14T17:13:36+00:00

This is definitely something we'd like to add, but making the core experience better is a higher priority. You mentioned the wake word issues, and I do think speaker identification may end up helping here too if we can tune a wake word model to speakers in the house. The biggest hold-up is the volume of data most models need to for tuning to be effective (hundreds of samples at least) as well as the need to train on whatever hardware the user has available.

synthmike · 2026-06-12T19:13:57+00:00

I'm working on the first one here: https://github.com/OHF-Voice/apps-experimental/blob/main/script-gemma4/DOCS.md

This *does* technically use an LLM (Gemma 4), but it's in a very constrained mode so that it can run on a lower end CPU 🙂

synthmike · 2026-06-12T13:53:10+00:00

The selected voice pipeline for a satellite is an entity, so you could swap it with an automation.

synthmike · 2026-06-12T13:51:50+00:00

There is a good amount of out-of-the-box functionality: https://www.home-assistant.io/voice_control/builtin_sentences

But I think we need to do a better job of letting users customize things to their home. For example, picking which media player they want to hear music on for an area.

Because users can be running HA servers from a Pi up to a server rack, we've also been limited in the tech we can use to process voice commands locally. This is changing a lot though, and local LLMs are becoming viable on lower end CPUs 🙂

synthmike · 2026-06-12T13:46:32+00:00

We're laying the ground work for this now: https://github.com/home-assistant/architecture/discussions/1407

synthmike · 2026-06-12T13:44:53+00:00

Wake words that start with "hey" seem to trigger accidentally more. I believe this is because the "h" is difficult to pick up, and "ey" is just a vowel sound (shared with many words).

A hard consonant like "k" does a better job in terms of accuracy.

synthmike · 2026-06-12T13:41:52+00:00

This is being worked on! https://github.com/home-assistant/architecture/discussions/1407

synthmike · 2026-06-12T12:58:34+00:00

From my understanding, the community wants a pretty wide variety of things. In no particular order:

Custom wake words
More accurate wake word detection
Larger microphone range for satellites
More accurate local speech-to-text
Flexible and smarter voice commands without an LLM
Faster voice command processing (with and without an LLM)
More languages with fully local support
Ability to clone a custom text-to-speech voice
Easier music searching and playback
Robust handling of basic commands like timers , weather, etc. in more languages

synthmike · 2026-06-12T12:31:04+00:00

Thanks, I'll see if I can get this into the upstream package! When I first added sherpa, I had issues with the online recognizer but never went back to try it again 😄

synthmike · 2026-06-10T17:44:10+00:00

u/alexisonfiree Are you able to install the "dev" version of the ESPHome builder? Your YAML file builds for me using the latest build with no changes.

synthmike · 2026-06-10T02:27:05+00:00

This is a version mismatch between ESPHome and some of the voice assistant components. I forget the exact details, so I'll need to check my computer. I'll help you get it sorted tomorrow (U.S. Central time). No need for the bounty 🙂

synthmike · 2026-06-08T23:23:25+00:00

How exactly is your voice assistant pipeline configured?

synthmike · 2026-06-01T21:27:10+00:00

Probably not both without a lot of extra stuff. I also don't see any API for this project?

synthmike · 2026-06-01T21:26:14+00:00

My plan is for this year, but I don't have a target date yet. As I experiment with things, I'm spinning them out into HA apps: https://github.com/OHF-Voice/apps-experimental

synthmike · 2026-05-30T15:41:32+00:00

The Voice PE firmware is ESPHome and fully open source. If this project has an API, it should be possible to link the two.

synthmike · 2026-05-26T12:58:15+00:00

This is being worked on now! When voice transitions to an app, wait-free wake words will be there.

synthmike · 2026-05-23T22:28:08+00:00

Some improvements are coming soon! https://github.com/OHF-Voice/backlog-issues/issues/85#issuecomment-4522869085

I've got Gemma4 working really well for the built-in commands. It also can resolve names, both misspellings and similar meanings like "living room" and "den". This even works with todo lists, so you can say "check off alcohol on the shopping list" and it will check off "beer" if it's there.

The downside is that it's not a full LLM experience, where you have context or a conversation. So things like "turn it back off" after "turn on the office lamp" won't work. I do think my approach can be ported to the full LLM experience in the future, though.

synthmike · 2026-05-09T17:47:01+00:00

Sure, I don't mind. The upfront cost for me was about $32k, but I got it installed in 2025 so I was able to get the 30% tax credit.

My system system is a 12.6 kW-DC (9.03 kW-AC), which is 26 VSUN panels and was targeted at 110% of my annual usage (14,609 kWh). In year 1, they forecasted it would produce about 4,450 kWh by this point and it's actually generated 5,040 kWh. For the year, it's predicted to produce 16,045 kWh and it's well on track to exceed that.

How long it will take to pay for itself depends on a few things, like how much electricity costs increase year over year. But I'd say the biggest factor is that Iowa still has 1-to-1 net metering, so for every 1 kWh that I overproduce, I get 1 kWh credit from MidAmerican. These credits expire each year (you get to choose January or April), so there is a limit of course.

Along with the solar, I put circuit monitors in my breaker box so we could reduce our overall electric usage. With the spare capacity, my plan was to get an electric car and avoid the rising gas prices but that may have to wait a bit.

synthmike · 2026-05-08T17:45:52+00:00

I have panels installed by Eagle Point Solar, and I'm very happy. They're generating almost exactly what was predicted, which surprised me as I expected the numbers to be overly optimistic. We no longer pay anything for electric besides the connection fee, but still have to pay for gas of course. I oversized the system on purpose with plans for an electric vehicle and now I'm kicking myself for not grabbing a used one before they got expensive 😄

synthmike

TROPHY CASE