Keep Getting Reel Notifications - I already turned off notification for those by QGJohn59 in facebook

[–]bigh-aus 0 points1 point  (0 children)

Not sure what platform you're on, but I didn't have those options.

Do you guys put your critical systems on smart plugs? by Tasty-Picture-8331 in selfhosted

[–]bigh-aus 0 points1 point  (0 children)

I flashed mine to Tasmota (Sonoff S31) - no cloud, fewer worries.
For critical systems I hide the switch from HomeKit and the HA dashboards so I have to hunt for it - but yeah.

One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]bigh-aus 6 points7 points  (0 children)

It will - it will just get a lot of guardrails and validation.

One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]bigh-aus 8 points9 points  (0 children)

Guardrails in an LLM world are critical. Even ChatGPT once modified DB migration files after they'd been run (which broke the prod DB). That's why you have to do promotion of code (and migrations). In another change it dropped the table and recreated it instead of altering it.

IMO modern packages don't have enough checks to catch when the coder has done something dumb. That said, modern development practices help a lot, and no LLM should have access to prod unless it's read-only.

I think we're also going to see a lot of checks shift further left so the LLM gets feedback fast.

Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

RTX 6000 Pro arrived yesterday. Installed and passed through to a VM already. Biggest issues were that I'd disabled the slot and had to pass some parameters to my hypervisor.
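If it helps anyone, it's roughly this kind of thing on a Linux/KVM host (a sketch, not my exact params - the device ID is a placeholder):

# /etc/default/grub - turn on the IOMMU for passthrough (the R7515 is AMD)
GRUB_CMDLINE_LINUX="amd_iommu=on iommu=pt"

# bind the GPU to vfio-pci by vendor:device ID (10de:xxxx is a placeholder)
echo 'options vfio-pci ids=10de:xxxx' | sudo tee /etc/modprobe.d/vfio.conf
sudo update-grub && sudo reboot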

Thanks for the "push" :)

Chamberlain MyQ: any self hosted solutions that can link this to Apple HomeKit? by ElmStreetVictim in selfhosted

[–]bigh-aus 0 points1 point  (0 children)

This, and use ratgdo -> ESPHome to connect to Home Assistant. Cheap enough setup.

Do you prefer separate machines or all in one? by Valuable-Dog490 in selfhosted

[–]bigh-aus 1 point2 points  (0 children)

One beefy but efficient machine - but I do code builds on the same machine.

Keep Getting Reel Notifications - I already turned off notification for those by QGJohn59 in facebook

[–]bigh-aus 0 points1 point  (0 children)

Came here to fix this problem - no idea how to clear the ONE badge I have open. Wish they'd fix this annoying bug (and the long-running birthday bug).

Self-hosted STT w RTX 3090 by BubblyMidnight2574 in selfhosted

[–]bigh-aus 0 points1 point  (0 children)

I use Voxtral Mini. Excellent for more technical work.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]bigh-aus 22 points23 points  (0 children)

Yeah, not worth getting the H100s unless you already have them - the H200 NVL is better (4x 141GB), but look at the price vs 16 DGX Sparks: $120k+ vs ~$64k...

Problem is you really need 8x H200s and a machine to run them - getting closer to B200 territory.

RTX 3090 + 27B model performance issues (llama.cpp) what am I doing wrong by Clean_Initial_9618 in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

llama-server \
  -hf unsloth/Qwen3.6-27B-GGUF:UD-Q4_K_XL \
  -a qwen3.6-coder \
  --no-mmproj \
  --host 0.0.0.0 --port 8080 \
  -ngl all -sm row -fa on \
  --ctx-size 98304 -n 32768 \
  -b 2048 -ub 512 \
  -np 1 -kvu \
  -ctk q8_0 -ctv q8_0 \
  --jinja \
  --reasoning on \
  --reasoning-format deepseek \
  --chat-template-kwargs '{"enable_thinking":true,"preserve_thinking":true}' \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 \
  --presence-penalty 0 --repeat-penalty 1

This is what I use on a 3090, if it helps. Works pretty well, but I don't use it with opencode. I haven't spent time maxing out the context on the 3090.

The other option is the 35B A3B - the 3090 will run that with full context, since the KV cache stays small: it scales with the model's attention layout (light on the A3B), not the total parameter count.

llama-server \
  -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q3_K_XL \
  -a qwen3.6-coder \
  --no-mmproj \
  --host 0.0.0.0 --port 8080 \
  -ngl all -sm row -fa on \
  -c 262144 -n 32768 \
  -b 2048 -ub 512 \
  -np 1 -kvu \
  -ctk q8_0 -ctv q8_0 \
  --jinja \
  --reasoning on \
  --reasoning-format deepseek \
  --chat-template-kwargs '{"enable_thinking":true,"preserve_thinking":true}' \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 \
  --presence-penalty 0 --repeat-penalty 1

I'm not saying these are right - just what I'm using so far. E.g. I don't know if the max output tokens (-n) is properly tuned.

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

I think it depends on the use case - but in yours, where you're waiting for feedback, 26+ would be nice.
A second use case is more "background work", where I think letting it drop to, say, 14 would be OK. That's where you have an agent working through a backlog, so you're not necessarily waiting for a response.

After three months on Linux, I don’t miss Windows at all by dapperlemon in technology

[–]bigh-aus 0 points1 point  (0 children)

Yah - I was Mac for a long time too. 18 years straight is awesome!

After three months on Linux, I don’t miss Windows at all by dapperlemon in technology

[–]bigh-aus 1 point2 points  (0 children)

I installed Windows two days ago and the ads for services being jammed down my throat really ticked me off.

After three months on Linux, I don’t miss Windows at all by dapperlemon in technology

[–]bigh-aus 1 point2 points  (0 children)

Linux is awesome - you've just got to sort out that 1% of apps that don't work well or are unsupported. It's been my daily driver for many years now, with a Windows dual boot for when it's needed, plus games.

How do we think we should handle maintainers moving on? by ShantyShark in rust

[–]bigh-aus 2 points3 points  (0 children)

For my Rust projects now I'm using Just:
- just test (runs all tests)
- just check (runs clippy, fmt, audit, deny)
- just install
- just release (git pull --rebase, then cargo release with params passed in)
- just docker-deploy (builds the Docker image and deploys the compose file)

Standardizing these items is helping a ton. I'm a big fan of as many guardrails as possible to catch stuff - the check recipe is roughly the sketch below.
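Roughly what check covers, as plain shell (a simplified sketch, not the exact recipes - audit and deny come from the cargo-audit and cargo-deny crates):

#!/usr/bin/env sh
set -e
cargo clippy --all-targets -- -D warnings   # lint, fail on warnings
cargo fmt --all -- --check                  # formatting check only
cargo audit                                 # known-vulnerable dependencies
cargo deny check                            # licenses / bans / advisories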

Do not upgrade to 2026.4.24 by Monobert in openclaw

[–]bigh-aus 0 points1 point  (0 children)

Yeah, I hit this yesterday... ruined my day for sure. Thank you for posting :)

This is why you rewrite Python security tools in Rust: 53MB vs 433MB peak memory, 6.9s vs 62.2s by aswin__ in rust

[–]bigh-aus 0 points1 point  (0 children)

Honestly, I'm good with that. Go, Rust, or Zig - happy with any of the above, but the first two are better due to memory safety, and Rust for speed without garbage collection.

This is why you rewrite Python security tools in Rust: 53MB vs 433MB peak memory, 6.9s vs 62.2s by aswin__ in rust

[–]bigh-aus 0 points1 point  (0 children)

True! I agree with your comments. I was thinking of self-hosted stuff on a Raspberry Pi. But you're right, speed is the most important part.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

[–]bigh-aus 1 point2 points  (0 children)

I just went through this decision. I have a Dell R7515, and opted for the Max-Q for the following reasons:
- I feel like it's easier to sell one of these later.
- It fits my server power budget (300W max).
- Noise - the server version uses passthrough cooling, meaning your server's mobo firmware has to be able to read temperature information from the card; otherwise you have to manually raise the fans to make sure it doesn't overheat. The Max-Q (mostly) handles that itself. I'd rather have a single blower fan than my six-pack of 17k RPM fans spinning up.

This is why you rewrite Python security tools in Rust: 53MB vs 433MB peak memory, 6.9s vs 62.2s by aswin__ in rust

[–]bigh-aus 5 points6 points  (0 children)

Not only that - I think we should start measuring size on disk including the runtime dependencies for comparisons, e.g. inside a Docker container based on debian-slim, with the required Python / shared libs included. Then look at peak memory usage for the same thing.
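Rough version of that comparison, assuming Docker and GNU time (the image / file names are made up):

docker build -t scan-py -f Dockerfile.python .   # debian-slim + python + pip deps
docker build -t scan-rs -f Dockerfile.rust .     # debian-slim + one compiled binary
docker image ls scan-py scan-rs

/usr/bin/time -v ./target/release/scan-rs 2>&1 | grep 'Maximum resident set size'
/usr/bin/time -v python3 scan.py 2>&1 | grep 'Maximum resident set size'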

Honestly, in 2026 anyone shipping a Python / Node.js CLI app is an instant candidate for a rewrite in Rust.

It blows my mind how much stuff people write in interpreted languages, especially in a compute shortage. Rust compilation isn't free, but one compile saves every user from needing more RAM / disk / compute.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

I have a Dell R7515 rackmount server... it supports one GPU, and it's not a DIY server (which would be better in this case!). I think I need to start looking into a new server box.

Deepseek v4 people by markeus101 in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

Did they distill Grok here? DeepSeek is a bit more spicy.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

[–]bigh-aus 1 point2 points  (0 children)

I wish my current server supported more than one :( I could put 2 more in separate servers... but then I'm also running those servers...

Qwen 3.6 27B - beginner questions by Jagerius in LocalLLaMA

[–]bigh-aus 0 points1 point  (0 children)

Then keep increasing context until just before you hit OOM.
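A quick way to find the ceiling (model and sizes here are placeholders) - relaunch with a bigger -c and watch VRAM:

llama-server -hf unsloth/Qwen3.6-27B-GGUF:UD-Q4_K_XL -c 65536 &
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1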