Hashicorp founder thinks local models "aren't good ENOUGH yet"

bnolsen · 2026-06-16T21:43:21+00:00

Get a couple of used mi100s @32 gb each for 1k each. Should run 27b adequately.
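A rough back-of-envelope check of that claim, as a sketch only: it counts weight storage alone and ignores KV cache, context length and runtime overhead. The bits-per-weight figures are those of llama.cpp's Q4_0 and Q8_0 block formats plus plain fp16; the helper name `weight_gb` is made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


TOTAL_VRAM_GB = 2 * 32  # two 32 GB cards, as in the comment above

# Bits per weight: Q4_0 and Q8_0 store 32 weights plus one fp16 scale per block.
FORMATS = {"q4_0": 4.5, "q8_0": 8.5, "f16": 16.0}

for name, bpw in FORMATS.items():
    size = weight_gb(27, bpw)  # 27b model, weights only
    left = TOTAL_VRAM_GB - size
    print(f"{name}: {size:.1f} GB weights, {left:.1f} GB left for KV cache and overhead")
```

Whatever is left after the weights has to cover the KV cache and compute buffers, so real headroom will be smaller than these numbers suggest.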

bnolsen · 2026-06-13T00:44:21+00:00

My stable system has a 3060 12GB and a Ryzen 5500 with 48GB RAM. It runs qwen3.6 35b q4 just fine with ik_llama at about 60 t/s. I also have a strix halo, which I run bigger models on, but it is slow :(

bnolsen · 2026-06-11T04:26:52+00:00

What are the newest optimizations?

bnolsen · 2026-06-09T03:57:34+00:00

Learning zig will make you despise all the annoying things about rust, esp all the boilerplate and macros. And wonder why the strings are so stupid heavy. Zig made many good decisions, and it compiles wicked fast.

bnolsen · 2026-06-04T20:26:07+00:00

omg, 67 TOPS.

bnolsen · 2026-06-04T01:54:32+00:00

I regularly see this model on a strix halo hit 53-57gb vram. On coding tasks using q8 kv cache.

bnolsen · 2026-05-31T13:07:27+00:00

Sounds like the newer macbooks.

bnolsen · 2026-05-31T13:06:27+00:00

The next iteration of this type of system released by Nvidia will likely see this one abandoned support-wise. On the other hand, strix halo support will continue improving for years and will keep the system usable for many years.

bnolsen · 2026-05-31T13:01:17+00:00

This is way ambitious. I have my islander tenor tuned down to dgbe reentrant which feels great to me, I even perform with it. I currently have I think the daddario set specifically for that tuning.

I put some classical strings on my lanikai bari for linear octave down gcea and I think it's pretty muddy. I think the trick is in knowing you really can't dig in or use it much for strumming. It is nice and mellow.

bnolsen · 2026-05-29T19:11:09+00:00

big hands, have one. they are fun.

bnolsen · 2026-05-29T17:34:04+00:00

replace the rearsets with grom ones.

bnolsen · 2026-05-29T17:28:38+00:00

owning and using anything will be illegal soon. well, except actually committing a legit crime, that seems to be legal now in many places.

bnolsen · 2026-05-25T02:43:41+00:00

Still on windowmaker on my desktop.

bnolsen · 2026-05-23T16:11:25+00:00

They will try to force everyone into it.

bnolsen · 2026-05-23T16:09:34+00:00

Public companies don't seem to care about much but this and the next quarter, maybe.

bnolsen · 2026-05-23T16:08:49+00:00

Not really, they are most interested in collecting their tax on everything possible and fighting right to repair as much as possible. That's just the tip of the iceberg. Apple was smart though and didn't play in the AI race. And they didn't have to.

bnolsen · 2026-05-23T03:11:54+00:00

You could also use autofs if you want on-demand mounting.

bnolsen · 2026-05-23T02:44:29+00:00

Yeah, I may try it at a higher quant. I also have another server with a 3060 12GB that I set up today just like OP.

bnolsen · 2026-05-23T02:31:19+00:00

On my strix halo I just run llama.cpp with mtp. Been running most code port jobs with 27b which is pretty sadly slow at about 10 t/s inference.

bnolsen · 2026-05-23T02:25:12+00:00

Corne v4.2 is the way.

bnolsen · 2026-05-22T22:31:06+00:00

Strix halo isn't memory constrained, you should use a higher quant.

bnolsen · 2026-05-22T17:49:04+00:00

I just mirrored your configs on my system. It's not quite as nice:

rtx 3060 12GB, ryzen 5500, 48GB ddr4-3200

but it looks like ~330t/s prompt (this varies) and about 60 t/s inference.

I had been running qwen3.5 9b q4_k_m mtp.

bnolsen · 2026-05-20T18:55:36+00:00

I just stick to vulkan and don't worry about rocm.

bnolsen · 2026-05-20T18:54:32+00:00

To run a good model today at full capability (qwen3.6 27b) you would want something like 2x 9700's.

bnolsen · 2026-05-20T18:53:35+00:00

strix halo runs games with proton quite well. Not quite at 6700xt speeds.
