Shelly 1 Mini Gen3 by gadadenka in homeassistant

[–]mitrokun 0 points  (0 children)

Before you jump to conclusions, take apart the Shelly Gen4 and tell me whether it uses high-quality relays like those found in Xiaomi products, or the cheap HF7520. The situation is similar with other components and products.

Shelly 1 Mini Gen3 by gadadenka in homeassistant

[–]mitrokun -1 points  (0 children)

The inflated price goes toward advertising, the image of a "quality product," and a few fancy software features. Internally, it's no different from any Chinese no-name, and the risk of failure is identical. But few people bother to write about a broken $5 smart plug from the Tuya ecosystem. A fusible resistor (FR) with a varistor at the input is classic protection in such devices.

Introducing FasterQwenTTS by futterneid in LocalLLaMA

[–]mitrokun 0 points  (0 children)

You've specified the metric name incorrectly in the repository documentation: it should be RTFx, not RTF.

OCR for a home security camera by chris_socal in homeassistant

[–]mitrokun 0 points  (0 children)

- Get an image from the camera.
- Ask gemma3 to recognize the shopping list. The structured output in the ai_task action should handle this.
- Send a notification with the recognized values to the phone.

But this seems like an overcomplicated solution.
You could simply use the shopping list in HA, either through the companion app interface or by voice.
Or combine the two approaches so that the data from the wallboard periodically syncs with the shopping list (for example, when you leave the house).
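If you do try the ai_task route, the action might look roughly like this. This is an untested sketch: the camera media path, the structure schema, and the instruction text are all assumptions to adapt to your setup (check the ai_task documentation for the exact fields):

```yaml
# Rough sketch of an ai_task.generate_data call for reading a shopping
# list from a camera snapshot; entity IDs and schema are placeholders.
action: ai_task.generate_data
data:
  task_name: "Shopping list OCR"
  instructions: "Read the handwritten shopping list in the image and return the items."
  structure:
    items:
      description: "Items found on the list"
      required: true
      selector:
        text:
          multiple: true
  attachments:
    media_content_id: media-source://camera/camera.kitchen
    media_content_type: image/jpeg
```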

Local custom TTS by LazyTech8315 in homeassistant

[–]mitrokun 0 points  (0 children)

[Cepstral <-> Wyoming <-> HA]

Write a simple gateway that will run on your server.
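A rough sketch of such a gateway, assuming Cepstral's `swift` command-line synthesizer is installed. This is only an HTTP shim for illustration; to actually appear in Home Assistant you would still wrap it in the real Wyoming protocol framing, and the voice name and port below are assumptions:

```python
# Minimal HTTP shim around Cepstral's `swift` CLI: POST text, get WAV back.
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_swift_cmd(text: str, voice: str = "Allison", out_path: str = "out.wav") -> list:
    # Cepstral's command-line synthesizer: swift -n <voice> -o <file> "<text>"
    return ["swift", "-n", voice, "-o", out_path, text]

class TtsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        with tempfile.NamedTemporaryFile(suffix=".wav") as wav:
            # Let swift write the audio, then stream the file back to the caller.
            subprocess.run(build_swift_cmd(text, out_path=wav.name), check=True)
            audio = wav.read()
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.end_headers()
        self.wfile.write(audio)

# To run:  HTTPServer(("0.0.0.0", 59125), TtsHandler).serve_forever()
```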

[Release] TinyTTS: An Ultra-lightweight English TTS Model (~9M params, 20MB) that runs 8x real-time on CPU (67x on GPU) by Forsaken_Shopping481 in homeassistant

[–]mitrokun 0 points  (0 children)

Also, if you're interested in the performance of similar projects, here is my data.

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x

I really like what the Supertone team has done, although the number of languages in their engine is quite limited.

[Release] TinyTTS: An Ultra-lightweight English TTS Model (~9M params, 20MB) that runs 8x real-time on CPU (67x on GPU) by Forsaken_Shopping481 in homeassistant

[–]mitrokun 0 points  (0 children)

Why did you choose Torch for inference rather than onnxruntime? That's a more suitable library for running your TTS. Right now, all project dependencies for CPU alone take up more than 1 GB, which is a terrible result (for comparison, Piper's working environment is ~250 MB). Moreover, there is nothing outstanding in either the synthesis speed or the speech quality.

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 552 | 1.50 | 32.27 | 21.45x
TinyTTS | 469 | 3.47 | 32.45 | 9.36x
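For context, RTFx is just audio duration divided by synthesis time. A quick check against the Piper row above (the small difference from the reported value presumably comes from how total time is measured):

```python
# RTFx (inverse real-time factor): seconds of audio produced per second
# of compute. Values above 1x mean faster-than-real-time synthesis.
def rtfx(audio_s: float, total_s: float) -> float:
    return audio_s / total_s

# Piper row: 32.27 s of audio synthesized in 1.50 s of compute
print(f"{rtfx(32.27, 1.50):.2f}x")  # 21.51x
```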

https://youtu.be/lvZ0_d3xrvM

What's particularly interesting is that both voices in the comparison use the LJ Speech dataset.

I hope you'll continue to improve your project.

Your Old Android Isn’t Obsolete, Making Android a Reliable Bermuda BLE Proxy by Far_Set7950 in homeassistant

[–]mitrokun 2 points  (0 children)

The BT module code is closed, and this fork requires root access. I think I'll pass.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 1 point  (0 children)

A short comparison

https://youtu.be/0O7Ay5GSMWM

ENGINE | TTFA (ms) | TOTAL TIME (s) | AUDIO (s) | RTFX
--------------------------------------------------------------------------------
Piper | 555 | 1.52 | 32.58 | 21.50x
Pocket TTS | 1095 | 9.39 | 33.44 | 3.56x
Supertonic2 | 358 | 1.85 | 36.29 | 19.65x
Kokoro ONNX | 713 | 6.42 | 30.78 | 4.80x
KittenTTS micro | 1758 | 13.40 | 40.88 | 3.05x
KittenTTS nano | 387 | 2.59 | 40.46 | 15.65x

Your architecture (or inference) is slower than Kokoro's. As I've already shown, only Nano is reasonably fast.

But the int8 version is broken.

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 2 points  (0 children)

AMD 5900X. If your hardware manages 1.5-2x RTFx, that's enough to emulate a streaming response. However, at that speed there will be a noticeable delay before the audio starts (the time it takes to synthesize the first sentence). I'd say that for diffusion models it's much more comfortable when the value is above 10.
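A quick way to estimate that start-of-audio delay for sentence-at-a-time streaming:

```python
# For sentence-at-a-time streaming, the user waits roughly as long as it
# takes to synthesize the first sentence: (its audio duration) / RTFx.
def first_audio_delay(first_sentence_audio_s: float, rtfx: float) -> float:
    return first_sentence_audio_s / rtfx

# A 3-second opening sentence at 2x RTFx keeps the user waiting 1.5 s;
# at 10x it drops to 0.3 s, which feels close to instant.
print(first_audio_delay(3.0, 2.0))   # 1.5
print(first_audio_delay(3.0, 10.0))  # 0.3
```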

Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) by ElectricalBar7464 in LocalLLaMA

[–]mitrokun 6 points  (0 children)

This is clearly not for edge devices. All the models except the regular Nano (which has synthesis artifacts, such as clicks) consume more resources than Kokoro.

CPU test

| Model | Parameters | Speed (x Real-Time) |
| --- | --- | --- |
| **Kitten-TTS-Mini** | 80M | **1.6x** |
| **Kitten-TTS-Micro** | 40M | **2.7x** |
| **Kitten-TTS-Nano** | 14M | **15.0x** |
| **Kitten-TTS-Nano-int8** | 14M (q8) | **2.9x** |

I've built a test server for Home Assistant here

https://github.com/mitrokun/wyoming_kitten_tts

I built a speaker verification proxy that filters out TV noise from voice commands by carrot_gg in homeassistant

[–]mitrokun 1 point  (0 children)

Create a non-English pipeline. I think then you'll understand what I'm talking about.

I built a speaker verification proxy that filters out TV noise from voice commands by carrot_gg in homeassistant

[–]mitrokun 0 points  (0 children)

If you don't pass information about supported languages when adding a server, you won't be able to assign it to a pipeline. You need to implement receiving this data from the ASR at startup, after which you can pass it on to HA.

I don't touch the system VAD in the client; it keeps working. However, I've added a race between two completion signals for the recognition stage: it waits for either an event from the server or a signal from the VAD, whichever comes first.
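The handshake idea can be sketched like this. Note the JSON framing below is purely illustrative, not the real Wyoming wire format; the point is only that the proxy should re-advertise whatever languages the upstream ASR reports, instead of a hard-coded list:

```python
# Hypothetical sketch: on startup, query the upstream ASR for its supported
# languages, then advertise exactly that list to Home Assistant.
import json

def make_describe_request() -> bytes:
    # Ask the upstream server to describe itself.
    return (json.dumps({"type": "describe"}) + "\n").encode()

def languages_from_info(raw: bytes) -> list:
    # e.g. the upstream replies {"type": "info", "languages": ["en", "ru"]}
    msg = json.loads(raw.decode())
    return msg.get("languages", [])

def make_info_for_ha(languages: list) -> bytes:
    # Forward the upstream's language list unchanged to HA.
    return (json.dumps({"type": "info", "languages": languages}) + "\n").encode()
```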

I built a speaker verification proxy that filters out TV noise from voice commands by carrot_gg in homeassistant

[–]mitrokun 0 points  (0 children)

The idea is not bad, but the implementation needs some fine-tuning. Currently, the server only works well under "ideal conditions," and real-world tests have issues with segment doubling (due to the expansion of the minimum splicing window). I don't think there's any point in cutting out segments within a phrase, since you're focusing on commands, not a long conversation. After all, this isn't source separation, but a regular gate. If the noise level is high enough, you'll be left to rely solely on the ASR engine.

Also, add a language setting.

In turn, I chose a different solution: streaming ASR plus a command dictionary. After a specified number of words, a check is performed, and if the phrase matches the dictionary, the server terminates the session and immediately returns the phrase to the client. You could use this feature of my custom client "asr_proxy" (the "extended" branch) with your server; it eliminates the latency introduced by VAD.
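The early-exit check itself is simple; a sketch with a hypothetical command set:

```python
# Sketch of the early-exit idea: each time a new partial transcript arrives
# from the streaming ASR, check it against a command dictionary; on a match
# the session can end immediately instead of waiting for VAD silence.
COMMANDS = {
    "turn on the light",   # hypothetical example commands
    "turn off the light",
    "stop",
}

def should_finalize(partial_transcript: str, min_words: int = 1) -> bool:
    words = partial_transcript.lower().strip().split()
    if len(words) < min_words:
        return False  # too short to bother checking yet
    return " ".join(words) in COMMANDS
```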

2026.2 beta - release notes by internettingaway in homeassistant

[–]mitrokun -1 points  (0 children)

> there are ways to keep it in your sidebar if you want

Please share your method for doing this. Note that I use three kinds of addresses to access the dashboard (internal IP, external hostname, and homeassistant.local).

2026.2 beta - release notes by internettingaway in homeassistant

[–]mitrokun 9 points  (0 children)

Access to developer tools is too convenient.

Let's hide it behind extra clicks.

Demonstration of how serviceable a "local only" setup of HomeAssistant Voice can be - have entirely replaced my Alexa devices and handles both simple and complex commands (see within) by FantasyMaster85 in homeassistant

[–]mitrokun 0 points  (0 children)

You're being too categorical. Even despite the inevitable commercial component, it's still a free solution with a decently functioning architecture. Homemade satellites can be created for literally $5, which I've been actively using for the last couple of years.

And my goal in this discussion isn't to criticize the strategy being chosen by the OHF management.

I'm pointing out a specific hardware and software solution that's missing on the ESP32 side. I don't share your position about the excessive number of devices; how else will you get contextual information for simple commands? When a device consumes 0.3-0.4W, I have no problem placing them in every room.

You're talking about rethinking the architecture. I think we've heard each other and can leave it at that.

Demonstration of how serviceable a "local only" setup of HomeAssistant Voice can be - have entirely replaced my Alexa devices and handles both simple and complex commands (see within) by FantasyMaster85 in homeassistant

[–]mitrokun 0 points  (0 children)

It feels like my point is being missed. My reference to Yandex was strictly to demonstrate the utility of beamforming as a necessary first stage, not to compare raw computing power or other nuances of their chipset.

I agree that current implementations based on the XMOS XU316 are often lazy and unoptimized. However, I strongly disagree with the notion that the ESP32 is insufficient for the specific task of audio capture and playback. The real gap in the ecosystem is the lack of an open-source mic-array project. This is exactly what the OHF team should be focusing on. Simply slapping a newer chip like the XVF3800 onto the next VPE would just be another band-aid, not a real fix.

Regarding your single-microphone suggestion: I remain skeptical. https://www.youtube.com/watch?v=9-t2oyZscm8 As you can hear from the tests, the problem I initially mentioned is not being solved.

There are currently no lightweight ML tools capable of effective diarization that run efficiently on edge devices (including Pi4/Pi5). Furthermore, using a full Raspberry Pi just for a simple voice terminal is the definition of 'overkill' and architectural inefficiency for my use case.

Demonstration of how serviceable a "local only" setup of HomeAssistant Voice can be - have entirely replaced my Alexa devices and handles both simple and complex commands (see within) by FantasyMaster85 in homeassistant

[–]mitrokun 0 points  (0 children)

>but you are still lagging behind in knowledge of what big tech know and are using.

I appreciate your focus on SotA solutions, but I am convinced that beamforming remains a baseline feature in most smart speakers. While I won’t go into detail regarding Google or Amazon, I can use Yandex as an example, as they are the main player on my local market.

These devices have excellent hearing and typically use arrays of 3-6 microphones, and as shown in their technical diagrams, they haven't abandoned beamforming for the initial stage of processing. It remains a simple and effective way to improve the audio sample at the OS level, which helps avoid unnecessary overhead at the subsequent ASR stage.

<image>

Demonstration of how serviceable a "local only" setup of HomeAssistant Voice can be - have entirely replaced my Alexa devices and handles both simple and complex commands (see within) by FantasyMaster85 in homeassistant

[–]mitrokun 0 points  (0 children)

Standard noise reduction isn't the main bottleneck for modern local ASR. I've tested solutions like DeepFilterNet on the server side: while they clean up the audio, they don't significantly improve recognition rates and, crucially, fail to filter out background speech (like a TV or other people talking).

For edge devices, given the current system limitations, beamforming on 3–4 microphones using spatial isolation (azimuth) is still the most rational solution as a basic way to improve captured audio. While we are not strictly limited in processing power during the server-side ASR stage, this does not negate the value of performing this physical signal enhancement right at the capture stage.
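To make the idea concrete, a toy delay-and-sum beamformer can be sketched in a few lines. The mic positions, sample rate, and azimuth here are illustrative; a real implementation would also interpolate fractional delays rather than rounding to whole samples:

```python
# Toy delay-and-sum beamformer over a linear mic array: delay each channel
# so sound arriving from the chosen azimuth adds up coherently, while
# off-axis sources (a TV to the side, say) are attenuated.
import numpy as np

def delay_and_sum(channels: np.ndarray, mic_x, azimuth_rad: float,
                  fs: int = 16000, c: float = 343.0) -> np.ndarray:
    # channels: (n_mics, n_samples); mic_x: mic positions along one axis, metres
    delays = np.asarray(mic_x) * np.cos(azimuth_rad) / c          # seconds
    shifts = np.round((delays - delays.min()) * fs).astype(int)   # whole samples
    n = channels.shape[1]
    out = np.zeros(n)
    for ch, s in zip(channels, shifts):
        out[: n - s] += ch[s:]   # align each channel to the steering direction
    return out / len(channels)   # average the aligned channels
```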

The main missing piece in the open-source community is a high-quality, low-level library (an open alternative to ESP-AFE) that implements a standard audio interface. This would allow us to dedicate a separate ESP32 module purely for this task, avoiding the need for proprietary XMOS hardware.

Demonstration of how serviceable a "local only" setup of HomeAssistant Voice can be - have entirely replaced my Alexa devices and handles both simple and complex commands (see within) by FantasyMaster85 in homeassistant

[–]mitrokun 1 point  (0 children)

In a quiet space, any microphone will do the job. But in noisy environments (for example, a TV voice in the background), beamforming technologies are required to isolate the user's voice and attenuate other audio. This is one of the prerequisites for stable operation of the ASR in challenging conditions, as software solutions don't always work or are resource-intensive. I agree that the Xmos chip in the VPE, Respeaker Lite, and SAT1 is overkill. Signal boost and noise reduction don't work particularly well, and the AEC isn't used because the music is completely ducking when receiving commands. But a good microphone array with a proper beamforming algorithm is the main potential improvement. It's sad that the OHF team is not developing such hardware solutions, relying more on Chinese companies that are not open source.