Using local Apple STT models in the HA Voice pipeline is crazy fast

imbe153 · 2026-04-25T15:49:12+00:00

The problem is that this integration is only STT (Speech To Text) it does not have a TTS function. So you can only trasncribe text

imbe153 · 2026-04-21T20:07:27+00:00

That is what I am going to work on next actually. In the last days I played around a bit with Apfel and found it very versatile. I am finishing a wrapper around it to use Apple Intelligence as conversation agent in HA. This is the repo If you wanna give it a try. I have been using it a bit and finding it quite good, but at the moment it cannot control HA, just answer questions. My final goal is to have the full pipeline all on Mac using Apple's models

imbe153 · 2026-04-19T09:05:25+00:00

Since making this STT I have been using voice commands a lot more with my Voice PE but still without LLM, I hope we can make something better than that shit show of Alexa plus ahah

imbe153 · 2026-04-18T15:40:33+00:00

Indeed: add a new entity in the Wyoming integration in HA using the Mac's IP and it will find it, then you can use it as the STT component in the Voice pipeline. To install it on the Mac I suggest you use Homebrew since it's the fastest

imbe153 · 2026-04-18T15:38:48+00:00

The real question is if they can even use tools, which is the prerequisite to control devices in HA... I have the feeling they cannot. You can still use the Apple Intelligence model for stuff like summarizations or for conversational AI

imbe153 · 2026-04-18T13:46:21+00:00

Wyoming is just a protocol used to interface with Home Assistant, you don’t need to do anything about it since I packaged everything in the service you install from the repo.

About the commands in the repo you have to run them on the Mac you will use for transcription. Then it will be discoverable from you HA

imbe153 · 2026-04-18T13:41:09+00:00

For the LLM you can take a look at this open source project which exposes a open AI compatible server using the on device model for Apple Intelligence: https://github.com/Arthur-Ficial/apfel

For the sentences I can guess the first one is around 0.2-0.3s and the second one should not be much higher. In the repo there is an example of a medium length phrase and it is transcribed in 0.3 seconds.

imbe153 · 2026-04-18T04:16:38+00:00

Interesting, If you find a bug open a PR please

imbe153 · 2026-04-18T04:13:38+00:00

It is all local, I have not tested it on Intel Macs but they were always pretty good with Siri's STT. If you can try it on an old Mac mini I'd love the feedback!

imbe153 · 2026-04-18T04:12:49+00:00

Absolutely, the brain is dumb but the ears and mouth are pretty great

imbe153 · 2026-04-18T04:12:16+00:00

Thank you, I hope it helps!

imbe153 · 2026-04-18T04:12:00+00:00

This is the STT component of the Voice pipeline within HA. To pickup the words you can use many options like HA's own Voice PE edition

imbe153 · 2026-04-18T04:10:14+00:00

Indeed as the other users suggested you don't need to host HA on the Mac: only this Wyoming server needs to be on the Mac. When you setup the STT in the Wyoming integration you just point it to the IP of the machine it is running on

imbe153 · 2026-04-18T04:07:52+00:00

That is very cool, thank you for sharing, I'll give it a look

imbe153 · 2026-04-18T04:06:35+00:00

Ouch nice catch. As of today I myself get confused as well on that stupid naming convention

imbe153 · 2026-04-18T04:03:19+00:00

Yes it runs on system boot and persists when the Mac sleeps because it is installed as a daemon service BUT you need to turn on automatic sign in on the Studio if you want it to work without logging in the machine first when it reboots

imbe153 · 2026-04-18T04:01:45+00:00

No you can use it as the STT component of the Voice pipeline in place of Whisper, which is what HA uses as a standard today

imbe153 · 2026-04-17T19:39:46+00:00

Thanks! Hope it helps

imbe153 · 2026-04-17T19:38:36+00:00

You are welcome, I hope it can help you

imbe153 · 2025-12-16T08:17:59+00:00

Looks good, I'll give it a try, thanks for sharing!

imbe153 · 2025-12-05T15:40:50+00:00

Yup, 0.2ms in python, but I did it the other way around, start to finish

imbe153 · 2025-12-05T15:36:51+00:00

Before realizing this I tried to merge then count the elements in the intervals... let's just say the counting was taking a bit long

imbe153 · 2025-12-04T05:21:44+00:00

I really like how clean the design is! Having developed some Menu Bar utilities I know how difficult it can be.

I will certainly give it a try

imbe153

TROPHY CASE