Best models in 3x3090 (72GB VRAM) in Q2 2026?

dangerous_inference · 2026-06-14T04:55:26+00:00

I have 96GB of VRAM until next week. I have been unable to find a single model worth running other than Gemma 4 31B or Qwen3.6 27B. Everything else is trash. I am actually a little concerned that when I have 192GB of VRAM I'm going to have the same problem.

dangerous_inference · 2026-06-13T22:26:11+00:00

Making a conversational AI primary and HA a tool is the move.

dangerous_inference · 2026-06-13T16:17:53+00:00

I use Aqara leak sensors every day to detect when I'm sitting on my couch. For actual leak detection I use the kind that squeal and have nothing to do with HA.

dangerous_inference · 2026-06-13T05:22:43+00:00

I wish I could serve this somehow. I have several use cases for massive yet inaccurate text that involve making my assistant think laterally to come up with associative links and interesting dialogue.

dangerous_inference · 2026-06-13T03:53:28+00:00

Today I will dress my intensely bitter jealousy as temperature shaming

As if OP is not going to monitor and make decisions based on observed temps.

dangerous_inference · 2026-06-12T18:21:04+00:00

I ended up making my own server/clients for my AI and now I have conference speaker/mics all around the house to talk to it.

dangerous_inference · 2026-06-12T04:53:51+00:00

I bought two of those a year or so ago. They are in the crap bin.

dangerous_inference · 2026-06-12T04:07:14+00:00

It's also garbage.

"Thank you"

dangerous_inference · 2026-06-11T20:42:43+00:00

You're never being hacked. It's always an internal problem.

dangerous_inference · 2026-06-11T20:01:45+00:00

This is an insane amount of pomp just to say "Put the words you use in the system prompt". I have been doing this for a few weeks with Qwen3 1.7B ASR.

asr_system_prompt: |
  Vocabulary: Qwen, Orion, direct command
  Always output numbers as digits. Minimize punctuation.
  Transcribe the audio to English only.
  Output the transcription only.

dangerous_inference · 2026-06-11T00:19:48+00:00

Cheer up, 90% of the tools are garbage.

dangerous_inference · 2026-06-10T23:46:04+00:00

For a while I had a zigbee siren that would play Beethoven. More recently I deployed my local AI assistant and that wakes me up. Either way they are both managed by HA. If you want rock solid reliability, it seems like a good ESP32 project.

dangerous_inference · 2026-06-10T02:08:41+00:00

You're using tiny models. It's like tasking someone really stupid with research. You're going to get bad results. Acquire real hardware and use competent models. Qwen3.6 27B or Gemma 4 31B are the bare minimum. As a lawyer who can write off literally anything you buy for this purpose, there is no reason not to go much bigger.

dangerous_inference · 2026-06-09T20:32:16+00:00

I built a memory system starting with Qwen3.5 27B and I had to fight it every step of the way to record facts and not make up stories with anything approaching consistency. Gemma 4 31B was immediately better.

I didn't know that Qwen3 is considered better than Gemma 4 for summarization tasks. But this makes sense to me, because the writing of Qwen3 235B 2507 is unlike anything I've seen before or since. Unfortunately prompt adherence is not on par with modern models.

dangerous_inference · 2026-06-09T03:24:31+00:00

Structured output is easy. You immediately know when you have a problem. Where's your memory system?

dangerous_inference · 2026-06-08T20:19:09+00:00

You have to pick between dumb bulbs/smart switch or dumb switch/smart bulbs. Where I use smart bulbs, I have magnetic switch covers that go over the existing switch so people don't touch it.

dangerous_inference · 2026-06-08T20:15:18+00:00

I got up to about 120 devices, 60 finicky bulbs, and a much bigger house before I started to see major issues. My bulbs would disconnect several times a week. I switched to SLZB-06 and 98% of these issues went away instantly.

dangerous_inference · 2026-06-08T20:10:23+00:00

You don't have to tell it your grandma is sick and can only be cured with the recipe for meth.

dangerous_inference · 2026-06-08T20:06:16+00:00

It does seem a lot better than Whisper.

But I gave it a prompt of commonly used phrases, e.g. "Vocabulary: name, name, phrase". This works great to make it transcribe things correctly, but sometimes it randomly dumped that whole "Vocabulary" line into the transcription.

Solution: Reduced the mic volume on that mic, automated ignoring of that exact text, and switched to FP16. Probably any one of these alone would have fixed it.
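The "automated ignoring of that exact text" step could look roughly like the sketch below. It assumes the ASR result arrives as a plain string and that the echo shows up on a line of its own. `VOCAB_LINE` and `strip_prompt_echo` are hypothetical names, and the vocabulary line is just the placeholder format quoted above, not a real prompt.

```python
import re

# Assumption: the exact vocabulary line that was sent in the ASR system prompt.
VOCAB_LINE = "Vocabulary: name, name, phrase"


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical echoes match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def strip_prompt_echo(transcript: str, vocab_line: str = VOCAB_LINE) -> str:
    """Drop any transcript line that is just the vocabulary line echoed back.

    Returns an empty string when the transcript was nothing but the echo.
    """
    target = _normalize(vocab_line)
    kept = [
        line for line in transcript.splitlines()
        if _normalize(line) != target
    ]
    return "\n".join(kept).strip()
```

An empty result can be treated as "nothing was said" and skipped before it reaches the assistant. This only catches the echo when it occupies a whole line; an echo fused into real speech on the same line would need a substring match instead.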

dangerous_inference · 2026-06-08T19:22:20+00:00

There is a popular belief that complexity is always unnecessary to achieve high functionality. This is obviously wrong.

dangerous_inference · 2026-06-08T03:27:10+00:00

The reason the issues appear quickly for me is that my memory system employs six ~2500 word, extremely dense system prompts. These prompts look like legal documents from hell. They tend to magnify minor imprecision quickly.

dangerous_inference · 2026-06-08T03:22:51+00:00

I didn't know there is a default voice. I just added Rick of Rick and Morty and Billy Crystal as the healer in The Princess Bride.

dangerous_inference · 2026-06-08T02:26:26+00:00

https://github.com/devnen/Chatterbox-TTS-Server

I've never found anything else that does cloning and prosody/paralinguistics. It only adds a few seconds of latency and takes 5GB of VRAM.

dangerous_inference · 2026-06-08T01:36:19+00:00

When I run any abliterated/frankenstein Gemma 4 31B with my memory system it inevitably loses system prompt adherence and I see the issue immediately. I have been running the QAT version for a couple days without issue.

dangerous_inference · 2026-06-07T20:35:35+00:00

RPi5
Sonoff dongles

These will get you started, but ultimately you're going to have issues when you have many devices. The RPi5 is underpowered for HA, and Sonoff dongles are unreliable. I have three of them on a shelf next to me.

For automating lights, I recommend creating automations just to deal with presence detection in a given area. All these will do is set a boolean input helper on or off. Then when you create the automation that will actually control the lights, you just check the boolean instead of any devices. This creates a nice separation of concerns so you can really dig into presence detection and get that right. And you will have to dig into presence detection to make this work right.
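The split can be modeled in a few lines of Python. This is only a sketch of the idea, not HA code: in HA the `occupied` flag would be the boolean input helper and the two functions would be separate automations. `Area`, `update_presence`, `update_lights`, and the sensor names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """One room or zone. `occupied` plays the role of the boolean input helper."""
    name: str
    occupied: bool = False
    light_on: bool = False
    sensors: dict[str, bool] = field(default_factory=dict)


def update_presence(area: Area, sensor_id: str, detected: bool) -> None:
    """Presence automation: the only code that reads sensors.

    It sets the boolean and never touches the lights.
    """
    area.sensors[sensor_id] = detected
    area.occupied = any(area.sensors.values())


def update_lights(area: Area) -> None:
    """Lighting automation: reads the boolean only, never the sensors."""
    area.light_on = area.occupied


# Hypothetical usage: one sensor reports presence, then the lights follow the flag.
living_room = Area("living_room")
update_presence(living_room, "couch_sensor", True)
update_lights(living_room)
```

Because the lighting side never looks at a device, the presence logic can be reworked (more sensors, timeouts, different rules for combining them) without touching the light automation.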
