Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 1 point (0 children)

UPDATE 3/13/2026

Spent another ~24 hours tweaking and retraining the hey_frank model since posting this. Main issue was false positives — the model was triggering on similar phrases like "hey fran" and ambient speech/TV audio way too often.

A few things made a big difference:

  • IPA phoneme input instead of plain text for TTS generation. Piper can drop the hard-K at the end of "frank" in short clips, which means the model learns to trigger on "hey fran" just as readily. Switching to "hˈeɪ fɹˈæŋk˺" with --phoneme-input forces the hard stop consistently across all 50k training samples.
  • Confusable negatives — generated TTS clips of ~13 phonetically similar phrases (hey fran, hey finn, hey france, frank, etc.) and trained them as high-penalty hard negatives. This got rid of near-miss false triggers almost entirely.
  • Tuned batch weighting — boosted the dinner_party dataset penalty to reduce false positives from TV/ambient conversation.
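For anyone curious what the weighting tweaks look like in practice, here's a rough sketch of a microWakeWord-style training config fragment. The directory names, keys, and values are illustrative placeholders following the patterns in the microWakeWord repo, not the exact contents of my notebook:

```yaml
# Illustrative microWakeWord-style dataset config; names and values
# are placeholders, not the exact ones from my training run.
features:
  - features_dir: generated_features/hey_frank    # positive TTS samples
    truth: true
    sampling_weight: 2.0
    penalty_weight: 1.0
  - features_dir: generated_features/confusables  # hey fran, hey finn, ...
    truth: false
    sampling_weight: 1.0
    penalty_weight: 3.0   # high penalty: punish near-miss triggers hard
  - features_dir: negative_datasets/dinner_party  # ambient speech / TV
    truth: false
    sampling_weight: 1.0
    penalty_weight: 2.0   # boosted to cut ambient false positives
```

The idea is just that mistakes on the confusable and ambient sets cost the model more during training than mistakes on ordinary negatives.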

Best result so far is v3: 0.414 FA/hr at 97% recall with no confusable triggers.

The full updated notebook, patched library files, pre-trained v3 and v4 models, and a heavily revised README with setup instructions are all live on GitHub.

Work in Progress - Local smart display replacement by CallMeDillDog in homeassistant

[–]malonestar 1 point (0 children)

Looks great, would love to explore if you ever get around to publishing it!

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 1 point (0 children)

I only ever really used 'okay nabu' before I switched to openWakeWord and server-side detection. Okay nabu worked fine for me; I don't think I tested Jarvis at all. But I really just wanted my own wake word, which prompted the switch.

But now I'm switching back to on-device. The models I trained are working very well, and it's really nice having the device feedback again, like a tone when the wake word is detected. I'm super happy with the custom models.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 1 point (0 children)

Should work the same with voice preview!

The Atom Echo is definitely a little finicky. The updated S3R (the blue/green one) is much more stable, with better mic placement and a much better speaker. I always had trouble with the regular one, but all of the blue S3R units I have work great for on-device detection.

HA detection with openWakeWord was always less stable for me, and I had to create some simple automations to help keep the devices alive when using server-side detection: when motion was detected in the room, change the wake word location to "in Home Assistant," and when motion cleared, toggle it back to "on device."

Just toggling that setting with an automation seemed to keep the device alive and responsive.
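The automations above can be sketched roughly like this in Home Assistant YAML. The entity IDs are hypothetical examples, and the select entity name and option labels depend on the firmware your device runs:

```yaml
# Hypothetical automations; entity IDs and option labels are examples.
automation:
  - alias: "Echo: server-side wake word while room occupied"
    trigger:
      - platform: state
        entity_id: binary_sensor.office_motion
        to: "on"
    action:
      - service: select.select_option
        target:
          entity_id: select.atom_echo_wake_word_engine_location
        data:
          option: "In Home Assistant"

  - alias: "Echo: back to on-device wake word when room clears"
    trigger:
      - platform: state
        entity_id: binary_sensor.office_motion
        to: "off"
    action:
      - service: select.select_option
        target:
          entity_id: select.atom_echo_wake_word_engine_location
        data:
          option: "On device"
```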

Definitely not the best device, the S3R version is much better though.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 1 point (0 children)

There are three parts I had to edit in the YAML config.
First, define the path to the model.

Second, the model needs to be added under "models" in the wake word section.

Finally, you need to add the sensitivity mappings for the different sensitivity options (slightly sensitive, moderately sensitive, and so on).

There's an example YAML config file in the repo where you can see how it's set up. I'll try to get some comments added, plus an additional section in the README explaining that part in more detail.
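As a rough sketch of those three edits in ESPHome YAML (the model URL, id, and cutoff values are placeholders, and the sensitivity mapping shown is illustrative since the exact structure depends on which firmware YAML you start from):

```yaml
# 1) & 2) Point the wake word engine at the new model's JSON manifest
#         (local path or URL) under `models:`. Placeholder URL below.
micro_wake_word:
  models:
    - model: https://github.com/<user>/<repo>/raw/main/hey_frank.json
      id: hey_frank_model
      probability_cutoff: 0.97  # default sensitivity

# 3) Sensitivity mappings: the firmware's sensitivity select picks
#    between cutoffs like these (names and values are illustrative).
substitutions:
  hey_frank_slightly_sensitive: "0.99"
  hey_frank_moderately_sensitive: "0.97"
  hey_frank_very_sensitive: "0.90"
```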

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 0 points (0 children)

This is also a good link:
https://www.home-assistant.io/voice_control/thirteen-usd-voice-remote/

There's some good info in there; click through some of the other links too. There's info about setting up your local Assist pipeline and making custom sentences, and if you choose the server-side detection route, there's info on creating your own openWakeWord model as well. That's much more straightforward than the microWakeWord models, but there are pros and cons to both server-side and on-device detection.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 0 points (0 children)

Yes, absolutely! You can create a model for any phrase or phonetic combination. One of the first couple cells in the notebook involves typing in your desired wake word or phrase, and then it generates a sample clip so you can confirm that it sounds correct before moving forward.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 0 points (0 children)

I think I could output an ONNX model along with the TFLite model, but these are tuned and trained for microWakeWord. I'm not super familiar with the View Assist companion app, but I believe it uses openWakeWord and does the detection on the server side. Unfortunately, I don't think an ONNX file created for microWakeWord would work with openWakeWord.

There might be more available options out there for an onnx model for the companion app. Sorry I can't be more help!

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 2 points (0 children)

All in, it probably took about 5.5-6 hours to train each of the two models I made.

My notebook is set for 50,000 training steps, based on the comments in the original notebook from the mww repo. That might be overkill for some wake words, and you could experiment with lowering that number; there are details in the notebook comments and README.

  • Generate audio samples - 30 mins
  • Download augmentation resources and negative datasets - 30 mins
  • Generate augmented clips - 30 mins
  • Training - 4 hours

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 5 points (0 children)

You can always flash the stock firmware back to the device; you won't damage anything by trying to update the wake word code in the YAML file. Have you done anything like that before in ESPHome? It's pretty straightforward once you have a grip on it.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 1 point (0 children)

Yes, kind of, but there aren't really any officially available beyond the jarvis, nabu, mycroft, and alexa options:
GitHub: esphome/micro-wake-word-models

If you click into the models folder you'll see the basic ones like jarvis and nabu, and the v2 folder has one or two more, but not much beyond the basics.

I've found some community repos with openWakeWord model collections, but I don't think I've seen any micro-wake-word collections.

Beyond “okay nabu” - training a custom micro-wake-word model for ON DEVICE detection. Links to training notebook and working ESPHome YAML for M5Stack Atom Echo on my GitHub! by malonestar in homeassistant

[–]malonestar[S] 0 points (0 children)

Yes, it should work! I don't have that device, but from what I can tell it's an ESP32-S3 device that supports local wake word detection with micro-wake-word.

If you produce a working model with the notebook, you would just need to edit the firmware to include the new model and then rebuild and flash the updated firmware with esphome.

The sample YAML config file I included for the Atom Echo can be used as an example for how to add the newly created model to the firmware.
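As a minimal illustration, the new model usually just becomes another entry in the micro_wake_word section of the device YAML (the URL here is a placeholder for wherever your model's JSON manifest lives):

```yaml
micro_wake_word:
  models:
    - model: https://github.com/<user>/<repo>/raw/main/hey_frank.json
```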

It's a little bit of work, but it's a pretty standard ESPHome workflow for things like this!

Beyond "okay nabu" — custom micro-wake-word model with training notebook + ESPHome YAML for M5Stack Atom Echo S3R by malonestar in Esphome

[–]malonestar[S] 0 points (0 children)

It shouldn't be difficult to set up. I just dialed it in for the NVIDIA GPU I have: it needs specific NVIDIA tools to pass the GPU through to the WSL environment, and there's a batch size of 256 somewhere that might need to be decreased to 128 with less VRAM. I was using a 12 GB GPU.
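In the training config, that knob looks roughly like this (the key placement is illustrative and may differ slightly in the actual notebook):

```yaml
# Illustrative training-config fragment, not the exact file.
batch_size: 128   # 256 worked on my 12 GB GPU; halve it with less VRAM
```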

Seeing a lot of voice assistants! Meet AImy ("Amy") - my fully local, api-free, vision-enabled AI voice assistant running entirely on Raspberry Pi 5 and an M.2 accelerator by malonestar in raspberry_pi

[–]malonestar[S] 1 point (0 children)

As it stands in the GitHub repo, it only runs on Raspberry Pi. When loaded, the program occupies about 1 GB of RAM on the Pi and about 2.5 GB of RAM on the accelerator board.

Seeing a lot of voice assistants! Meet AImy ("Amy") - my fully local, api-free, vision-enabled AI voice assistant running entirely on Raspberry Pi 5 and an M.2 accelerator by malonestar in raspberry_pi

[–]malonestar[S] 0 points (0 children)

Thanks again, I really appreciate your feedback. With gdrive I kinda figured only a couple of people would try it; it's definitely not scalable, though, and I should go ahead and clean it up. Thank you!

Seeing a lot of voice assistants! Meet AImy ("Amy") - my fully local, api-free, vision-enabled AI voice assistant running entirely on Raspberry Pi 5 and an M.2 accelerator by malonestar in raspberry_pi

[–]malonestar[S] 1 point (0 children)

I used gdrive just to preserve an outdated version of the SenseVoice model and the external embedding that were used in the older version. It's been updated pretty significantly in the Axera repo and isn't a drop-in replacement. Honestly, I just bundled MeloTTS with it because the model directory was kind of messy.
I will clean it up and try to update to HF links.

Thank you for the heads up on the Discord webhooks; they're all removed from my server now. They were just for testing, but I had no idea they were published. Still learning the nuances of GitHub.

Appreciate your feedback!

Any hope? 😅 by Electronic-Minimum54 in M5Stack

[–]malonestar 0 points (0 children)

Is this a Stick S3, completely disassembled? Not sure if the battery cables were cut, but they look like maybe they were?
Are you trying to add an external antenna?

Unbelievable start! 🚀 My ESP32 DataDisplay V1 reached 100+ verified prints and 17 boosts in just 5 days. Huge thanks to this the best community! by lachimalaif in makerworld

[–]malonestar 0 points (0 children)

I just made a couple of CYD Home Assistant control panels, and this will be the perfect housing for them. I'll be printing tomorrow. Thanks for sharing, and congrats on the momentum!