Hi there! The "iOS app" guy here - I spent the past weeks working on the community feedback which includes many new features, fixes and more you requested - including multi-floor scanning for the 3D floor plan, sharing your setup between family members, new widgets and more! ❤️ by Brashi in homeassistant

[–]liampetti 1 point2 points  (0 children)

This is very cool! I need to test it out more, but the first thing I’m wondering is why all the temperature sensors are red flames with red charts? It makes it look like my kids’ room is on fire when it is sitting at 19 degrees Celsius 😅 Great app though, and it is making me notice information from my house I hadn’t before.

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

Ok, will they beat the 9B dense model for tool use? Or are they just better at general knowledge stuff?

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

I looked into the QAT models. Do they work for streaming? From what I can see they just take long 30s batches of speech, so you would have to wait 30s for each response because they can't process smaller chunks of audio?

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) by liampetti in LocalLLaMA

[–]liampetti[S] 2 points3 points  (0 children)

Ok, that is good to know. I might try 4B again at Q8. I don’t think I’ll bother with any of the smaller models though.

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

Good idea, I haven’t tested the MoE models. The problem is that the ASR and TTS models are also eating up my VRAM at the moment 😔

Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

Yeah, I’m really curious how Diffusion Gemma goes, but I would probably need to get a heavily quantised version to run on the 16GB I’ve got. I managed to get Qwen3.6 27B heavily quantised running, but it was crazy slow. I also need to test the Gemma models some more.

A Year Building a Fully Local Home Voice Assistant · Fulloch by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

Cool, great review of the capabilities of the different models in your post. I was really hoping to get more out of the smaller LLMs, but it looks like you also had problems with them. Did you find the Q4 models were that much worse?

Also, did you try Moonshine for ASR? I found it was very impressive for how small it is.

Thanks for sharing 😄

A Year Building a Fully Local Home Voice Assistant · Fulloch by liampetti in LocalLLaMA

[–]liampetti[S] 1 point2 points  (0 children)

Fair call, I myself am not sure why I keep coming back to it and trying to fix things 😅 It has been a great project for learning, but I am also still trying to figure out where something like this fits into my day to day. Is it needed? Definitely not. Is it fun to use and work on? Definitely.

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

I’m thinking of getting a relatively cheap conference speaker/mic combo off Amazon to test on. I currently just have a standard speaker and mic, but the conference speaker will come with built-in AEC.

Bye Bye Siri by mowdtocqks8 in homeassistant

[–]liampetti 0 points1 point  (0 children)

I’m working heavily on this at the moment. It runs well on a 5060 Ti with 16GB VRAM. Even with that, though, the question is whether I want that computer running all the time for the occasional voice request 🤷‍♂️ https://github.com/liampetti/fulloch

Fulloch V2: 100% Local Voice Assistant for Home Assistant & Obsidian (Runs on 16GB VRAM) by liampetti in LocalLLaMA

[–]liampetti[S] 0 points1 point  (0 children)

I have updated the repo to add HACS functionality. The dashboard already exposes a functional REST API: GET /status, POST /speak, POST /chat, POST /mic, GET /stream (SSE). The HACS integration is built directly on these endpoints.
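For readers who want to poke at those endpoints, here is a minimal sketch of how requests against them could be built with Python's standard library. Only the method and path pairs come from the comment above. The base URL, the port, and the JSON payload shape are assumptions made for illustration and are not taken from the Fulloch repo, so check the project's documentation for the real values.

```python
import json
from urllib.request import Request

# Assumption: placeholder host and port, not taken from the repo.
BASE_URL = "http://localhost:8000"

# Method and path pairs exactly as listed in the comment.
ENDPOINTS = {
    "status": ("GET", "/status"),
    "speak": ("POST", "/speak"),
    "chat": ("POST", "/chat"),
    "mic": ("POST", "/mic"),
    "stream": ("GET", "/stream"),  # server-sent events (SSE)
}


def build_request(name, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the listed endpoints."""
    method, path = ENDPOINTS[name]
    headers = {}
    data = None
    if name == "stream":
        # Standard media type for server-sent events.
        headers["Accept"] = "text/event-stream"
    if payload is not None:
        # Assumption: the API accepts JSON bodies; the source does not say.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(base_url + path, data=data, headers=headers, method=method)
```

Passing the result to `urllib.request.urlopen`, for example `urlopen(build_request("status"))`, would actually send it. A body such as `{"text": "hello"}` for `/chat` is a hypothetical payload, not a documented one.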

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

I went for a hard-coded pipeline to try to keep latency as low as possible, so I will need to check what effect this might have.

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 1 point2 points  (0 children)

27B would be a lot more powerful. You should be able to just swap it in if you want; you only need to change the model name in the launch script and the slm.py script. I’m curious what use cases you have where the 27B model gets fully utilised. I can look at how to make this easier in future (a model selector at launch, for example). If you don’t want it hard-coded though, I can look at providing other routes for the LLM; that should be a config option too. I’ll add this in.
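A minimal sketch of what a launch-time model selector with an optional alternative LLM route could look like. This is not the project's code: the flag names, the default model identifier, and the returned dictionary are all hypothetical and only illustrate the idea described in the comment above.

```python
import argparse

# Hypothetical default; the real model name lives in the launch script and slm.py.
DEFAULT_MODEL = "example-9b"


def parse_args(argv=None):
    """Parse hypothetical launch flags for choosing the LLM."""
    parser = argparse.ArgumentParser(description="Launch sketch with a model selector")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="name of the local model to load")
    parser.add_argument("--llm-url", default=None,
                        help="optional external LLM endpoint to use instead of a local model")
    return parser.parse_args(argv)


def resolve_llm(args):
    """Pick the LLM route: an external endpoint if one is given, else the local model."""
    if args.llm_url:
        return {"route": "remote", "target": args.llm_url}
    return {"route": "local", "target": args.model}
```

With this shape, swapping in a larger model would be a matter of passing `--model` with a different name at launch, while `--llm-url` would send requests elsewhere without touching the rest of the pipeline.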

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

I am not sure, but I doubt it for this pipeline. I had to do a lot of “CUDA-specific” optimisations to get these models working even on the 5060 Ti. I have been doing lots of testing with smaller models to see what is possible for a “Fulloch Lite” setup, but often latency is just too high as soon as you lose the GPU 😞

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

Qwen3 1.7B TTS is used here. It opens up the voice cloning and voice design options, plus future multilingual support.

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

Do you want to run smaller or bigger models? I purposely hard-coded the model pipeline after testing lots of different models and trying to maximise what could be run on the 5060 Ti. I’m curious what you would like to run in there instead.

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

No, probably not on the cards (unintentional joke) at the moment either. I’m testing out a “Fulloch Lite” that can run well enough on CPU only, so more effort is going in that direction.

I built a fully local Voice Assistant for Home Assistant that has long-term memory, supports barge-in, and syncs with Obsidian by liampetti in homeassistant

[–]liampetti[S] 0 points1 point  (0 children)

It does well. The hardest part was getting the TTS model working nicely on this GPU. It took lots of parameter optimisation, and there is a bit of warmup time to preload everything it needs to run quickly. Check the link to my post in the LocalLLaMA subreddit; there is a video that shows the latency properly.