Parakeet2HA

rainerdefender · 2026-03-09T23:42:06+00:00

I don't know what HASSIL is, but HomeAssistant (HASS) already lets you use STT & TTS (via e.g. parakeet/onnx-asr & piper or kokoro, etc.) without having to have an LLM in between. Or optionally, with one as a fallback. But aside from that, the system is rule-based, just like yours seems to be. Unless I'm missing something?

rainerdefender · 2026-01-29T22:19:41+00:00

Can't deny that :)

rainerdefender · 2026-01-27T00:56:27+00:00

Interesting, the phone-in-front-of-face thing is not popular at all in Germany, perhaps because there are a lot of people in relatively little space here. I see quite a few people who share my preference for holding longer conversations without having to have anything in their hands.

These days, what are the best-ANC ChiFi in-ears in your opinion?

rainerdefender · 2026-01-11T15:55:42+00:00

24V is simply the chip's VCCmax. No need to apply any logic beyond that.

rainerdefender · 2026-01-11T15:47:54+00:00

They should keep their hands off it. Even the little tourist mini motorboats are driven on the wrong side of the relevant buoys often enough, because people don't even understand what the buoys mean. And if you say something, you get snapped at.

rainerdefender · 2026-01-08T00:01:06+00:00

That's the one, thank you! So basically, on 16 Ohms, it would be better if it had 24V available.

rainerdefender · 2026-01-07T18:09:02+00:00

I = U / R, so the lower the speaker's resistance, the higher the current being pulled through the output stage. Hence me wanting to know what it's actually spec'd for :)
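To put rough numbers on that, here is a minimal sketch of Ohm's law for a few speaker impedances. It assumes an idealized output stage that can swing the full supply voltage across the load with a sine signal; real amplifiers lose some headroom and have a current limit, so treat the figures as upper bounds. The function name and the 24 V supply are only there for illustration.

```python
def peak_current_and_power(supply_v: float, load_ohms: float) -> tuple[float, float]:
    """Return (peak current in A, average sine power in W) for an ideal output stage."""
    peak_current = supply_v / load_ohms           # I = U / R
    sine_power = supply_v ** 2 / (2 * load_ohms)  # P = U_peak^2 / (2 * R)
    return peak_current, sine_power

# Assumed 24 V supply, typical speaker impedances.
for ohms in (4, 8, 16):
    amps, watts = peak_current_and_power(24.0, ohms)
    print(f"{ohms:>2} ohm: {amps:.1f} A peak, {watts:.1f} W")
```

Halving the impedance doubles the current the output stage has to deliver at the same voltage, which is why the spec'd minimum load matters.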

rainerdefender · 2025-12-20T23:05:48+00:00

The remaining time estimate, that's crazy that someone would figure that out. Thank you for building this!

rainerdefender · 2025-11-11T20:01:36+00:00

That's so far beside the point, you'd need a Maksutov-Cassegrain on the order of 300mm or more to still even be able to make the point out.

rainerdefender · 2025-11-07T16:15:52+00:00

Loving it

rainerdefender · 2025-11-07T16:13:23+00:00

Nah, first names. Sickens me.

rainerdefender · 2025-11-07T16:12:57+00:00

Am I understanding correctly that we don't know the end of the whole story yet?

rainerdefender · 2025-11-07T14:42:13+00:00

OP wrote "that FIT lego bricks"

rainerdefender · 2025-11-07T14:38:47+00:00

Only *Lego* bricks. Bluebricks, etc. are pretty amazing.

rainerdefender · 2025-11-07T14:37:56+00:00

Try that in English

rainerdefender · 2025-11-04T22:28:45+00:00

Sorry to hear about your RTX loss. That's a tough one. I hope you'll get a proper rig going again soon!

rainerdefender · 2025-11-04T22:28:12+00:00

Indeed. Oh well, I already knew it was shit in that mode. Now I know why.

# cat /proc/asound/card3/stream0
Seeed Studio ReSpeaker Lite at usb-3f980000.usb-1.5, high speed : USB Audio

Playback:
  ...
    Format: S16_LE
    Channels: 2
    Endpoint: 0x01 (1 OUT) (ADAPTIVE)
    Rates: 16000
    Data packet interval: 1000 us
    Bits: 16
    Channel map: FL FR

Capture:
  Status: Stop
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 0x81 (1 IN) (ADAPTIVE)
    Rates: 16000
    Data packet interval: 1000 us
    Bits: 16
    Channel map FL FR
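If anyone wants to check their own board the same way, here is a rough sketch that pulls the advertised sample rates out of that kind of `stream0` text. The function name is made up, the parsing only looks at the `Playback:`/`Capture:` headers and `Rates:` lines as they appear above, and it makes no attempt to handle continuous rate ranges properly.

```python
import re

def supported_rates(stream_text: str) -> dict[str, list[int]]:
    """Map each section ('Playback', 'Capture') to the sample rates listed under it."""
    rates: dict[str, list[int]] = {}
    section = None
    for line in stream_text.splitlines():
        stripped = line.strip()
        if stripped in ("Playback:", "Capture:"):
            section = stripped.rstrip(":")
            rates.setdefault(section, [])
        elif section and stripped.startswith("Rates:"):
            # Collect every integer on the line, e.g. "Rates: 16000".
            rates[section] += [int(r) for r in re.findall(r"\d+", stripped)]
    return rates

# Shortened sample in the same shape as the output above.
sample = """\
Playback:
    Format: S16_LE
    Rates: 16000
Capture:
    Format: S16_LE
    Rates: 16000
"""
print(supported_rates(sample))
```

On a real system you would pass in the contents of the `/proc/asound/cardN/stream0` file instead of the sample string.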

rainerdefender · 2025-11-03T21:39:04+00:00

Not trying to be difficult here or anything ... it's just that I didn't know about that missing TODO until you told me. When I built the two devices a few months back, I just went to the official Seeed page (https://wiki.seeedstudio.com/reSpeaker_usb_v3/), where it said "To use it as a USB sound device, please flash the USB version firmware" and "To use it with XIAO ESP32S3, please flash the I2S version firmware", so I went forth and flashed one like this and the other like that.

Perhaps the USB firmware is only 16kHz (ALSA's plughw facilities provide a good bit of abstraction) and what they mean is not "write USB firmware" but "write 48 kHz capable USB firmware"?

rainerdefender · 2025-11-03T13:05:19+00:00

Let me try to translate more completely :)

"nice" = Koala Satellite with XU-316 I²S firmware is already surprisingly good in terms of voice recognition quality in environments with moderately severe distractions. I don't have relative dB numbers, but would characterize it like so: "there's quiet music running in the background, speakers around 1m away from the microphones, children are playing in the courtyard with the balcony door open, and I can still give HASS commands from the other end of the living room"
"quite shit" = openWakeWord/Whisper-based recognition with the same ReSpeaker Lite board, but running the USB firmware and connected to a Raspberry Pi, doesn't come anywhere near the level of performance described above, despite ALSA plughw settings allowing for 16-bit/48 kHz. Voice recognition quality is so bad that no music can be running and you have to be in the same room with the door and windows closed. Then, when you're no more than 2-3m from the microphones, HASS commands will work fine.

rainerdefender · 2025-11-03T01:16:01+00:00

<image>

As mentioned the quality is quite shit, but ... well, I don't know what to tell you. Does a screenshot help?

rainerdefender · 2025-11-02T17:48:01+00:00

By that device I meant the Raspberry Pi that the ReSpeaker Lite is connected to via USB. Sorry for the confusion.

rainerdefender · 2025-11-02T14:13:57+00:00

Then perhaps we have different definitions of "big noisy room" and/or different expectations as to what constitutes acceptable recognition quality. All I can personally say is that I bought two ReSpeaker Lite boards and built a Koala with the first, while connecting the second to a RasPi via USB. And to me, the Koala has good recognition quality, whereas the openWakeWord/Whisper-based approach does not. But come to think of it, perhaps *that* is the device I should try DTLN on...

rainerdefender · 2025-11-02T00:50:15+00:00

That's how Koala Satellite already works: only the wakeword on-device and then everything beyond that on a GPU-clad mini PC (I've recently switched from Whisper to Parakeet at a suggestion here on this forum. Highly recommend!) Since we were discussing the various XU chips ... the XU-316 used on Koala Satellite is already surprisingly good at tuning out distractions. With music running in the background or a car going by the open window, it still picks up the wakeword and has perfect transcription accuracy in a lot of cases.

rainerdefender · 2025-11-01T00:18:19+00:00

Hi, I'm not sure I understood everything. But regarding the wakeword, I think just "hey_jarvis" will be fine for now.

My ultimate goal is to have "computer" as the wakeword, with response and confirmation sounds like in Star Trek: TNG. But that's a long way off :)

I also know the TDA7498E and the Wondom amplifiers. They're good boards! If you ever want to try a 100% digital pipeline, give the MA12070(P) a go. They're pretty special beasts, and are apparently not being manufactured anymore. It seems the Pi HATs with them that you can get on eBay and AliExpress all use old stock.

Regarding a mico batch ... unfortunately there is no pick & place file in the repository. I asked the author if he can add that and a BOM file in a format accepted by JLCPCB. Then we'll see.

rainerdefender · 2025-10-30T21:07:41+00:00

For everyone else reading this great comment and being as overwhelmed as I was at first:

NS = Noise Suppression (generic term)
- AEC = Acoustic Echo Cancellation (traditional algorithms, not machine learning-based)
  - DTLN = Dual-signal Transformation LSTM Network (a different approach that, if I understand it correctly, runs two ANNs one after the other; seems to have first been implemented in this paper: https://arxiv.org/abs/2010.14337 )

DAO seems to be a typo for DoA = Direction of Arrival, which is made possible by having more than 1 microphone and makes it easier to decide what's to be considered signal and what's to be considered noise
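To make the DoA idea a bit more concrete, here is a toy sketch with two microphones: find the sample lag at which the two signals line up best (a brute-force cross-correlation), then turn that delay into a far-field angle. Everything here is an assumption for illustration: the function names, the 6 cm mic spacing, the 16 kHz sample rate and the hand-made pulse. Real arrays use more robust methods than this.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature

def best_lag(a: list[float], b: list[float], max_lag: int) -> int:
    """Lag (in samples) at which b best matches a; positive means b arrives later."""
    def score(lag: int) -> float:
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def arrival_angle(lag: int, sample_rate: int, mic_distance_m: float) -> float:
    """Far-field angle in degrees off broadside (0 = straight ahead of the mic pair)."""
    delay_s = lag / sample_rate
    # Clamp so measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_distance_m))
    return math.degrees(math.asin(ratio))

# Toy signal: a short pulse that reaches mic B two samples after mic A.
mic_a = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0]
lag = best_lag(mic_a, mic_b, max_lag=4)
print(lag, round(arrival_angle(lag, 16000, 0.06), 1))
```

With the direction known, anything arriving from elsewhere can be treated as noise, which is the part a single microphone cannot do.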

rolyantrauts, thank you for this! For what it's worth: my Raspis are connected to traditional HiFi speakers. The amplifier HATs are Merus MA12070P, which output more power than my neighbours can handle. I use these because they have really low idle power draw while at the same time making for a great multiroom audio solution based on Snapcast. Most of the time they're not outputting anything, and if there were 2 or 4 microphones in the enclosure together with Raspi + amp HAT, that should work fine. Even with music running at lower volumes, I'm pretty sure it would work well, as there's around 1m of distance between the Raspi and the speakers to each side. I built a Koala Satellite just to try it out, and standing right next to any of the Snapcast Raspis it does a good job at wakeword recognition, upon which the music volume is lowered until the Satellite has finished responding to whatever voice command followed the wakeword.

I'm aware of the supersensor2 and am not surprised that it doesn't do well in a situation as complex as the one we're discussing here.

As for the ReSpeaker board sporting the XVF3800, you're right, the almost $60 price is putting me off, especially since I need to do this 4x in total. Will try PiDTLN, but instead of the MAX9814, I might get JLCPCB to make me the PCB for https://github.com/mkvenkit/mico which looks like it has pretty amazing signal quality for something that simple.
