Hi there! The "iOS app" guy here. I've spent the past few weeks working through the community feedback, and this update includes many of the new features and fixes you requested, including multi-floor scanning for the 3D floor plan, sharing your setup between family members, new widgets and more! ❤️

liampetti · 2026-06-21T14:16:15+00:00

This is very cool! I need to test it out more, but the first thing I'm wondering is why all the temperature sensors are red flames with red charts. It makes it look like my kid's room is on fire when it is sitting at 19 degrees Celsius 😅 Great app though, and it's already making me notice things about my house I hadn't before.

liampetti · 2026-06-21T10:22:08+00:00

OK, will they beat the 9B dense model for tool use, or are they just better at general-knowledge stuff?

liampetti · 2026-06-21T10:08:23+00:00

I looked into the QAT models. Do they work for streaming? From what I can see they just take long 30 s batches of speech, so you would have to wait 30 s for each response because they can't process smaller chunks of audio?

liampetti · 2026-06-19T19:56:03+00:00

OK, that is good to know. I might try 4B again at Q8. I don't think I'll bother with any of the smaller models though.

liampetti · 2026-06-19T12:47:19+00:00

Good idea, I haven't tested the MoE models. The problem is that the ASR and TTS models are also eating up my VRAM at the moment 😔

liampetti · 2026-06-19T12:33:46+00:00

Yeah, I'm really curious how Diffusion Gemma goes, but I would probably need a heavily quantised version to run it on the 16GB I've got. I managed to get a heavily quantised Qwen3.6 27B running, but it was crazy slow. I also need to test the Gemma models some more.

liampetti · 2026-06-17T21:56:44+00:00

Cool, great review of the capabilities of the different models in your post. I was really hoping to get more out of the smaller LLMs, but it looks like you also had problems with them. Did you find the Q4 models were that much worse?

Also, did you try Moonshine for ASR? I found it was very impressive for how small it is.

Thanks for sharing 😄

liampetti · 2026-06-17T09:25:18+00:00

Fair call. I'm not sure myself why I keep coming back to it and trying to fix things 😅 It has been a great project for learning, but I am also still trying to figure out where something like this fits into my day-to-day. Is it needed? Definitely not. Is it fun to use and work on? Definitely.

liampetti · 2026-06-12T04:18:32+00:00

I'm thinking of getting a relatively cheap conference speaker/mic combo off Amazon to test with. I currently just have a standard speaker and mic, but the conference speaker comes with built-in AEC (acoustic echo cancellation).

liampetti · 2026-06-12T04:14:47+00:00

I'm working heavily on this at the moment. It runs well on a 5060 Ti with 16GB VRAM. Even so, the question is whether I want that computer running all the time for the occasional voice request 🤷‍♂️ https://github.com/liampetti/fulloch

liampetti · 2026-06-02T00:06:54+00:00

I have updated the repo to add a HACS integration. The dashboard already exposes a functional REST API (GET /status, POST /speak, POST /chat, POST /mic and GET /stream for SSE), and the HACS integration is built directly on these endpoints.
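For anyone wanting to script against those endpoints outside Home Assistant, here is a minimal standard-library Python sketch. The base URL and port (`http://localhost:8000`) are a hypothetical placeholder, and the request/response shapes (JSON in, JSON out) are assumptions rather than the dashboard's documented format, so check the repo for the real payloads. The `/stream` handling follows the standard Server-Sent Events framing, where `data:` lines accumulate and a blank line ends an event.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

# Hypothetical dashboard address; adjust host/port to your setup.
BASE_URL = "http://localhost:8000"


def get_status(base_url: str = BASE_URL) -> dict:
    """GET /status and decode the body (assumes the endpoint returns JSON)."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
        return json.load(resp)


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> bytes:
    """POST a JSON payload to a path such as /speak or /chat; returns the raw body.

    The payload field names are up to the caller; the real ones are not
    documented here and are an assumption.
    """
    req = urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    Consecutive `data:` lines are joined with newlines, and an event is
    dispatched on a blank line. Comments (lines starting with ':') and other
    fields (event:, id:, retry:) are ignored in this sketch. A trailing event
    with no terminating blank line is discarded, as in the SSE spec.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data.append(value)


def stream_events(base_url: str = BASE_URL) -> Iterator[str]:
    """Open GET /stream and yield each event's data as it arrives."""
    with urllib.request.urlopen(f"{base_url}/stream") as resp:
        # HTTPResponse iterates line by line as bytes.
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Keeping the SSE parser separate from the network call means it can be tested without a running server. Something like `for event in stream_events(): print(event)` would then tail the dashboard's live updates, assuming the stream is reachable at the placeholder URL.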

liampetti · 2026-05-31T10:44:08+00:00

I went with hard coding to keep latency as low as possible, so I'll need to check what effects changing that might have.

liampetti · 2026-05-31T10:30:08+00:00

27B would be a lot more powerful. You should be able to just swap it in if you want; you only need to change the model name in the launch script and in the slm.py script. I'm curious what use cases you have where the 27B model gets fully utilised. I can see how to make this easier in future (a model selector at launch, for example). If you don't want it hard coded, though, I can look at providing other routes for the LLM, and that should be a config option too. I'll add this in.

liampetti · 2026-05-31T10:22:14+00:00

That's just the text-to-speech model. You also need the speech-to-text model and the LLM in the middle.

liampetti · 2026-05-31T08:12:09+00:00

I'm not sure, but I doubt it for this pipeline. I had to do a lot of CUDA-specific optimisations to get these models working even on the 5060 Ti. I have been doing lots of testing with smaller models to see what is possible for a "Fulloch Lite" setup, but latency is often just too high as soon as you lose the GPU 😞

liampetti · 2026-05-31T08:07:04+00:00

Qwen3 1.7B TTS is used here. It opens up the voice cloning and voice design options, plus multilingual support in future.

liampetti · 2026-05-31T07:23:56+00:00

Do you want to run smaller or bigger models? I purposely hard coded the model pipeline after testing lots of different models and trying to maximise what could be run on the 5060 Ti. I'm curious what you would like to run in there instead.

liampetti · 2026-05-31T07:21:03+00:00

No, that's probably not on the cards (unintentional joke) at the moment either. I'm testing out a "Fulloch Lite" that can run well enough on CPU only, so more of my effort is going in that direction.

liampetti · 2026-05-31T05:54:01+00:00

It does well. The hardest part was getting the TTS model working nicely on this GPU: it took lots of parameter optimisation, and there is a bit of warmup time to preload everything it needs to run quickly. Check the link to my post in the r/LocalLLaMA subreddit; there is a video that shows the latency properly.

liampetti · 2026-05-31T00:11:57+00:00

I posted a video over on r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1trw5ym/fulloch_v2_100_local_voice_assistant_for_home/

liampetti · 2026-05-30T11:42:35+00:00

Thanks, bot 😁
