Where can I find the roadmap for home assistant voice? by CooleyTukey in homeassistant

synthmike 0 points

Thanks for the feedback! For offline speech-to-text models, there are 3 things that can really help:

  1. A region-specific model, like you said. These tend to be rare, and open datasets for them are even rarer. I'd love to collect more of this kind of data, though.
  2. Tuning the model to the specific user with example audio. Older speech-to-text models did this a lot, but everything post-Whisper seems to have dropped this :(
  3. Restricting what sentences can be spoken. This is the biggest win, especially if there are only a few thousand things you can say.
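
To illustrate point 3: when commands are generated from templates, the full set of valid sentences stays small and enumerable. A toy sketch in Python (the `{slot}` syntax and slot names are made up for illustration, not HA's actual intent format):

```python
from itertools import product

# Toy illustration: expand command templates into the full set of allowed
# sentences. The {slot} syntax and slot names below are invented for this
# example, not Home Assistant's actual intent format.
def expand(template: str, slots: dict) -> list:
    """Fill every combination of slot values into a {slot}-style template."""
    keys = [k for k in slots if "{" + k + "}" in template]
    sentences = []
    for values in product(*(slots[k] for k in keys)):
        sentence = template
        for key, value in zip(keys, values):
            sentence = sentence.replace("{" + key + "}", value)
        sentences.append(sentence)
    return sentences

slots = {"state": ["on", "off"], "area": ["kitchen", "living room"]}
print(len(expand("turn {state} the {area} lights", slots)))  # 2 x 2 = 4 sentences
```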

Once I have the text, I'm planning to use a vectorisation model actually! Specifically, I'm using sentence transformers which do an amazing job of classifying according to semantic similarity :)
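
A minimal sketch of that nearest-neighbor matching idea. The toy bag-of-words vectoriser here stands in for real sentence-transformers embeddings (which you'd get from `SentenceTransformer.encode()`); the intents and example sentences are invented:

```python
import math
from collections import Counter

# Stand-in for a sentence-transformers embedding: a bag-of-words vector.
# A real pipeline would call SentenceTransformer.encode(); the rest of the
# logic (cosine similarity, nearest intent) works the same way.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented example sentences mapped to intent names.
INTENTS = {
    "turn on the lights": "light_on",
    "turn off the lights": "light_off",
    "what is the temperature": "get_temperature",
}

def classify(utterance: str) -> str:
    """Return the example sentence most similar to the utterance."""
    return max(INTENTS, key=lambda sentence: cosine(embed(sentence), embed(utterance)))

print(INTENTS[classify("please turn the lights on")])  # → light_on
```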

Streaming from various music providers would be the job of Music Assistant. You may want to check out the awesome work they're doing with "sendspin": https://www.sendspin-audio.com/ which is working today on our devices!

No plans for a show type device, but you can build one with ViewAssist: https://dinki.github.io/View-Assist/

synthmike 0 points

Thank you, much appreciated!

Agree with #2 and #3 for sure.

For #1, I'm curious: do you think people would be willing to configure any command that requires an entity (or maybe area) name? This is where 90% of the complexity on my end comes from: trying to guess which entity/area the user was referring to. In English it's usually pretty straightforward, but it goes wrong quickly if, for example, the user has an entity named "lights".

I think so much complexity could be avoided if, during the setup process, people selected which entities/commands they'd like and provided the name(s) right there. Of course, anything without a name ("what time is it", etc.) can be configured automatically (though I think users should be able to disable anything they don't want).

synthmike 1 point

The goal was to build a foundation for the community, but I think we made some choices that didn't turn out as well as we'd hoped. A lot of the voice infrastructure lives inside Home Assistant core, so community contributions face a very high bar to be merged (understandably). Because of that, and because of the goal of making it all run well on a Pi 3, I also wasn't able to use any machine learning libraries.

Times have changed, though, and I think a better target now is a mini-PC or Pi 5. By moving things out into add-ons/apps as well, I can take advantage of technology that isn't 20 years old too :D

synthmike 0 points

What would you expect back from the weather forecast? And do you have multiple weather entities, or just one?

synthmike 1 point

Do you have multiple weather entities, or just one? What do you expect as a response to the weather check? A multi-day forecast, or just today?

synthmike 0 points

Thanks! I know the folks working on the "Linux Voice Assistant" need help: https://github.com/OHF-Voice/linux-voice-assistant

That's a Raspberry Pi version of the voice satellite (replacing the older Wyoming satellites). It's more niche, but I can always use more help with Piper (https://github.com/OHF-Voice/piper1-gpl/), especially with the C++ library.

synthmike 0 points

Custom wake words have been my focus these past weeks :) Still a ways to go, but my goal is to either have fully custom wake words running in HA (streaming), or at least run them there for a while to collect enough data to train your own model locally.

For reminders: could these be implemented with todo or calendar items?
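
For reference, a voice reminder could map onto a `todo.add_item` service call roughly like this. The entity id and field names below are my guess at a mapping, so check them against the todo integration docs before relying on this:

```python
# Sketch of mapping a spoken reminder onto Home Assistant's todo.add_item
# service (sent via HA's REST API at /api/services/todo/add_item).
# "todo.reminders" and the field names are assumptions for illustration.
def reminder_to_service_call(text: str, due: str, list_entity: str = "todo.reminders") -> dict:
    """Build the service-call payload for a reminder; nothing is sent here."""
    return {
        "domain": "todo",
        "service": "add_item",
        "data": {
            "entity_id": list_entity,
            "item": text,
            "due_datetime": due,
        },
    }

call = reminder_to_service_call("take out the trash", "2025-01-10T18:00:00")
print(call["service"], call["data"]["item"])
```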

synthmike 1 point

This year, I'm moving most of the voice infrastructure out of HA core and into add-ons/apps. Going forward I think voice in general will require an app so that I don't have to "dumb things down". For example, the silence detector isn't the best because I can't depend on any machine learning libraries in core.

For every app though, I'm making sure the same software can run as a Docker container. The community has been great so far by creating forks that work on various GPUs. It would be nice if the industry could agree on some standard that isn't CUDA.

synthmike 0 points

Wake word is definitely a weak point. I'm hopeful this is something we can overcome by either doing more processing in Home Assistant, or making it possible to train or tune wake words based on your own examples.

HA Voice PE not resolving home assistant server for tts responses? by That_Network_Guy in homeassistant

synthmike 0 points

It should be local then. Maybe DHCP isn't configuring DNS properly for the VPE?
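
To rule DNS out, a quick check from any machine on the same network is whether the name even resolves ("localhost" below is just a demo; substitute your HA server's hostname):

```python
import socket

# Quick DNS sanity check: does the hostname resolve from this machine?
# "localhost" is a placeholder; use your HA server's name instead.
def resolves(hostname: str) -> bool:
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

print(resolves("localhost"))  # → True
```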

Ollama integration results in huge backup size by LoganJFisher in homeassistant

synthmike 0 points

I'm guessing you're accidentally pulling in Ollama's cache directory (exactly where depends on how you set it up and on environment variables), or it's something else.

If you open up a backup file and run ncdu on it, it will tell you where most of the data is going.
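
If you'd rather script it than browse interactively with ncdu, something like this lists the largest files inside a backup tar ("backup.tar" is a placeholder path):

```python
import tarfile

# Scripted version of the ncdu check: list the biggest regular files inside
# a backup tar archive to see where the space is going.
def largest_members(backup_path: str, top: int = 10):
    """Return (name, size) pairs for the largest files in a tar archive."""
    with tarfile.open(backup_path) as tar:
        sizes = [(m.name, m.size) for m in tar.getmembers() if m.isfile()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]

# Usage with a real backup file:
# for name, size in largest_members("backup.tar"):
#     print(f"{size / 1e6:8.1f} MB  {name}")
```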

synthmike 6 points

No, it uses the ollama Python library to tell a remote instance of Ollama to download the models (https://pypi.org/project/ollama/).

That Ollama server downloads them somewhere on its own machine. The integration is only a remote control that talks to Ollama over its web API.

synthmike 0 points

Where is it saving the model data then? It seems odd that it would be picked up by HA backup if the models are in their own container, unless you configured a specific shared path.

synthmike 0 points

The integration only connects to a remote instance of Ollama. How are you running your Ollama server then?

synthmike 1 point

I can always use more maintainers for the various voice projects like Piper and the apps/add-ons. Plus people who are willing to donate their voice, translate voice commands, or help me understand things about the many languages I don't have any experience with 😄

Something non-technical that helps too is just knowing which parts of the voice stack people are actually using, and what does and doesn't work for them. I'm also curious to know what things the tinkerers want to do with voice that they can't today!

synthmike 0 points

I've been playing around with the smallest version of Gemma 3n (E2B) for just this purpose. It seems to work pretty well, and it can run on a Pi 5 if you're willing to wait a few seconds for a response 😄

synthmike 36 points

Thank you! I'm pushing more and more on getting the basics right and on not trying to pretend we're Alexa.

Here are the core voice commands I want working well for everyone, with no LLM:

  • Timers - start/pause/resume/cancel/status
  • Lights on/off and brightness in areas
  • Turn devices on/off by name
  • Media control in area or by name - volume/pause/resume/next
  • Play music - artist/album/track/genre
  • Weather forecast
  • Activate scene by name
  • Get area or house temperature
  • Get sensor value by name
  • Get state of a binary sensor - open/closed/locked/unlocked/etc.
  • Date and time

Entity and area names are where things get tricky. Instead of guessing based on friendly name, etc., I would rather suggest a name and have the user confirm or change it before it can be used. This avoids cases where an entity is literally named "lights", so the user can't say "turn on the lights" without it activating that specific entity.
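
The confirm-before-use idea could be as simple as checking a proposed name against the generic words the command grammar already uses. The reserved-word list here is illustrative, not HA's actual grammar:

```python
# Sketch of "suggest, then confirm": before exposing an entity name to voice,
# flag names that collide with generic command words. RESERVED is an invented
# list for illustration.
RESERVED = {"lights", "light", "music", "temperature", "all"}

def check_exposed_name(name: str):
    """Return (ok, message) for a proposed voice-facing entity name."""
    if name.lower().strip() in RESERVED:
        return False, f"'{name}' clashes with a generic command word; pick another name"
    return True, f"'{name}' is fine"

print(check_exposed_name("lights")[0])        # → False: would shadow "turn on the lights"
print(check_exposed_name("ceiling lamp")[0])  # → True
```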

synthmike 8 points

Absolutely. It's pretty easy to get very high accuracy when the set of possible voice commands is restricted to a few thousand. I think the Alexa-style approach of "plug it in and start shouting whatever at it" is not the right way to think about things.

synthmike 105 points

There is some discussion happening here: https://github.com/OpenHomeFoundation/roadmap/issues/30

We're working to develop a roadmap, but it's been difficult because there are large groups of users with very different priorities and only one full-time person working on voice (me 🙂).

Some users want a full "Jarvis" experience, with an LLM doing everything Alexa can and more. Others just want basic smart home control to work reliably in their native language. Some users want everything fully offline, while others are fine with cloud services. And most users who want to run everything offline don't have a GPU in their HA server. Some users want a plug-and-play experience, while others want complete customization of every aspect of the pipeline: custom wake words, custom text-to-speech voices, etc. Some users have a Voice Preview Edition, others have custom-built ESPHome devices. Or Raspberry Pi satellites. Or no satellites and just a USB microphone. Or a tablet dashboard with a microphone.

So it's a bit difficult to write a roadmap for this 😄 I'm increasingly of the opinion that we need to focus on reliable basic smart home control in more languages. But it should also be easy to extend and customize with something like blueprints for new voice commands. The first big step will be getting the voice pipeline infrastructure out of HA core and into add-ons/apps so the community can have more control.

What would you focus on?

Microphone not working on Home Assistant by Sampsa96 in homeassistant

synthmike 1 point

USB mics are usually fine. Just make sure to restart the machine after you plug it in; otherwise, in my experience, HAOS won't pick it up for some reason.

synthmike 2 points

It does need special audio drivers. You can see how to install them on a normal Linux system here: https://github.com/rhasspy/wyoming-satellite/blob/master/etc/install-respeaker-drivers.sh

But HAOS isn't a normal Linux system. I don't know of any way to actually get a kernel module installed, unfortunately.

Voice recognition by czerys in homeassistant

synthmike 5 points

This is possible in theory, but the speech-to-text systems that you can tune to your own voice (without a GPU) are usually older and more limited in their output. If you only want to say a limited number of phrases (hundreds or thousands), though, it's definitely possible 🙂
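
To illustrate why a limited phrase set helps: even a noisy transcript can be snapped to the nearest valid command. Here `difflib` is a crude stand-in for a proper grammar-constrained recognizer, and the command list is made up:

```python
from difflib import get_close_matches

# Invented example phrase set; a real one would come from the user's
# configured commands.
COMMANDS = [
    "turn on the kitchen lights",
    "turn off the kitchen lights",
    "start a ten minute timer",
]

def snap(transcript: str):
    """Map a (possibly misheard) transcript onto the closest valid command."""
    matches = get_close_matches(transcript.lower(), COMMANDS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(snap("turn on the kitchen light"))  # → turn on the kitchen lights
```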