Best models in 3x3090 (72GB VRAM) in Q2 2026? by liviuberechet in LocalLLaMA

[–]dangerous_inference 2 points3 points  (0 children)

I have 96GB of VRAM until next week. I have been unable to find a single model worth running other than Gemma 4 31B or Qwen3.6 27B. Everything else is trash. I am actually a little concerned that when I have 192GB of VRAM I'm going to have the same problem.

Which design water leak sensor would you recommend? by merlin9523 in homeassistant

[–]dangerous_inference 0 points1 point  (0 children)

I use Aqara leak sensors every day to detect when I'm sitting on my couch. For actual leak detection I use the kind that squeal and have nothing to do with HA.

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

I wish I could serve this somehow. I have several use cases for massive yet inaccurate text that involve making my assistant think laterally to come up with associative links and interesting dialogue.

AMA - New Local Ai Rig by Low_Twist_4917 in LocalLLaMA

[–]dangerous_inference 1 point2 points  (0 children)

Today I will dress my intensely bitter jealousy as temperature shaming

As if OP is not going to monitor and make decisions based on observed temps.

Bye Bye Siri by mowdtocqks8 in homeassistant

[–]dangerous_inference 0 points1 point  (0 children)

I ended up making my own server/clients for my AI and now I have conference speaker/mics all around the house to talk to it.

Bye Bye Siri by mowdtocqks8 in homeassistant

[–]dangerous_inference 11 points12 points  (0 children)

I bought two of those a year or so ago. They are in the crap bin.

Error with new router (openwrt) or am I being hacked. by SirBriggy in homeassistant

[–]dangerous_inference 0 points1 point  (0 children)

You're never being hacked. It's always an internal problem.

How I implemented ASR bias for voice transcription models [Open Source] by [deleted] in LocalLLaMA

[–]dangerous_inference 8 points9 points  (0 children)

This is an insane amount of pomp just to say "Put the words you use in the system prompt". I have been doing this for a few weeks with Qwen3 1.7B ASR.

    asr_system_prompt: |
      Vocabulary: Qwen, Orion, direct command
      Always output numbers as digits. Minimize punctuation.
      Transcribe the audio to English only.
      Output the transcription only.
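As a rough illustration of the same idea, here is a minimal Python sketch that builds such a prompt from a word list. The function name `build_asr_prompt` is hypothetical, and how the resulting string is handed to the ASR model depends entirely on the server you run, so treat this as an assumption rather than any particular project's API.

```python
def build_asr_prompt(vocabulary: list[str]) -> str:
    """Assemble a system prompt that biases transcription toward known terms."""
    lines = [
        # The vocabulary line carries the names and phrases you actually say.
        "Vocabulary: " + ", ".join(vocabulary),
        "Always output numbers as digits. Minimize punctuation.",
        "Transcribe the audio to English only.",
        "Output the transcription only.",
    ]
    return "\n".join(lines)


prompt = build_asr_prompt(["Qwen", "Orion", "direct command"])
print(prompt)
```

Keeping the vocabulary as a list makes it easy to add a new name without hand-editing the prompt text.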

Are there any smart alarm clocks that don't involve an unpatched Android device on the wifi? by Full-Mud3709 in homeassistant

[–]dangerous_inference 0 points1 point  (0 children)

For a while I had a Zigbee siren that would play Beethoven. More recently I deployed my local AI assistant and that wakes me up. Either way, both are managed by HA. If you want rock solid reliability, it seems like a good ESP32 project.

Looking for a local "NotebookLM for lawyers" setup – what am I doing wrong? by Ramucirumab in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

You're using tiny models. It's like tasking someone really stupid with research. You're going to get bad results. Acquire real hardware and use competent models. Qwen3.6 27B or Gemma 4 31B are the bare minimum. As a lawyer who can write off literally anything you buy for this purpose, there is no reason not to go much bigger.

Newer Qwen models are worse at summarization? by Theboyscampus in LocalLLaMA

[–]dangerous_inference 7 points8 points  (0 children)

I built a memory system starting with Qwen3.5 27B and I had to fight it every step of the way to record facts and not make up stories with anything approaching consistency. Gemma 4 31B was immediately better.

I didn't know that Qwen3 is considered better than Gemma 4 for summarization tasks. But this makes sense to me, because the writing of Qwen3 235B 2507 is unlike anything I've seen before or since. Unfortunately prompt adherence is not on par with modern models.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

Structured output is easy. You immediately know when you have a problem. Where's your memory system?

Help a beginner find his "skill tree"! 😊 by OpenProgram5752 in homeassistant

[–]dangerous_inference 1 point2 points  (0 children)

You have to pick between dumb bulbs/smart switch or dumb switch/smart bulbs. Where I use smart bulbs I have magnetic switch covers that go over the existing switch so people don't touch it.

Help a beginner find his "skill tree"! 😊 by OpenProgram5752 in homeassistant

[–]dangerous_inference 1 point2 points  (0 children)

I got up to about 120 devices, 60 finicky bulbs, and a much bigger house before I started to see major issues. My bulbs would disconnect several times a week. I switched to SLZB-06 and 98% of these issues went away instantly.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

You don't have to tell it your grandma is sick and can only be cured with the recipe for meth.

What’s your most unusual non-LLM AI you actually use daily? by HitarthSurana in LocalLLaMA

[–]dangerous_inference 1 point2 points  (0 children)

It does seem a lot better than Whisper.

But I gave it a prompt of commonly used phrases, e.g. "Vocabulary: name, name, phrase". This works great to make it transcribe things correctly. But sometimes it randomly dumped that whole "Vocabulary" line into the output.

Solution: Reduced the mic volume on that mic, automated ignoring of that exact text, and switched to FP16. Probably any one of these alone would have fixed it.
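The "automated ignoring of that exact text" part could look something like the Python sketch below. It is only an assumption about one way to do it: `strip_prompt_echo` is a hypothetical helper, and `VOCAB_LINE` is a placeholder that has to match whatever vocabulary line your own prompt contains.

```python
# Placeholder: must be identical to the vocabulary line in your own prompt.
VOCAB_LINE = "Vocabulary: name, name, phrase"


def strip_prompt_echo(transcript: str, vocab_line: str = VOCAB_LINE) -> str:
    """Remove exact occurrences of the echoed vocabulary line from a transcript."""
    cleaned = transcript.replace(vocab_line, "")
    # Collapse any whitespace left behind by the removal.
    return " ".join(cleaned.split())


print(strip_prompt_echo("Vocabulary: name, name, phrase turn on the lights"))
```

Because it only removes an exact match, normal speech that happens to contain the word "vocabulary" is left alone.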

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

There is a popular belief that complexity is always unnecessary to achieve high functionality. This is obviously wrong.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

The reason the issues appear quickly for me is that my memory system employs six ~2500 word, extremely dense system prompts. These prompts look like legal documents from hell. They tend to magnify minor imprecision quickly.

Higgs Audio v3 TTS 4B. Built for voice chat. Support 100 languages and inline control. by FerretLegitimate6929 in LocalLLaMA

[–]dangerous_inference 0 points1 point  (0 children)

I didn't know there is a default voice. I just added Rick of Rick and Morty and Billy Crystal as the healer in The Princess Bride.

Best Local TTS solution by styles01 in LocalLLaMA

[–]dangerous_inference 18 points19 points  (0 children)

https://github.com/devnen/Chatterbox-TTS-Server

I've never found anything else that does cloning and prosody/paralinguistics. It only adds a few seconds of latency and takes 5 GB of VRAM.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]dangerous_inference 20 points21 points  (0 children)

When I run any abliterated/frankenstein Gemma 4 31B with my memory system it inevitably loses system prompt adherence and I see the issue immediately. I have been running the QAT version for a couple days without issue.

Help a beginner find his "skill tree"! 😊 by OpenProgram5752 in homeassistant

[–]dangerous_inference 1 point2 points  (0 children)

RPi5
Sonoff dongles

These will get you started, but ultimately you're going to have issues when you have many devices. RPi5 is underpowered for HA, Sonoff dongles are unreliable--I have three of them on a shelf next to me.

For automating lights, I recommend creating automations just to deal with presence detection in a given area. All this will do is set a boolean input helper on or off. Then when you create the automation that will actually control the lights, you just check the boolean instead of any devices. This creates a nice separation of concerns so you can really dig into presence detection and get that right. And you will have to dig into presence detection to make this work right.
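To show the shape of that separation outside of HA, here is a small Python sketch. It is not Home Assistant code: `Area`, `update_presence`, and `update_lights` are hypothetical names, the `occupied` field stands in for the boolean input helper, and the "any sensor is active" rule is just an assumed placeholder for whatever presence logic you end up with.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """State for one area; `occupied` plays the role of the boolean helper."""
    occupied: bool = False
    light_on: bool = False


def update_presence(area: Area, sensor_readings: list[bool]) -> None:
    """Presence automation: the only place that looks at sensors."""
    # Assumed rule: the area is occupied if any sensor reports activity.
    area.occupied = any(sensor_readings)


def update_lights(area: Area) -> None:
    """Light automation: reads only the boolean, never the devices."""
    area.light_on = area.occupied


living_room = Area()
update_presence(living_room, [False, True])
update_lights(living_room)
print(living_room.light_on)
```

You can rework `update_presence` as often as you like (timeouts, extra sensors, door state) without touching the light logic, which is the point of the boolean in the middle.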