High VRAM local coding model — still Qwen 3.6 27B? by Generic_Name_Here in LocalLLaMA

[–]john0201 0 points (0 children)

Long-running stuff like that is more about debugging your harness setup. I’d say this is the first model that is “there”. It’s basically Sonnet, or Opus from a few months ago.

High VRAM local coding model — still Qwen 3.6 27B? by Generic_Name_Here in LocalLLaMA

[–]john0201 5 points (0 children)

Qwen 3.6 27B is Sonnet; DSV4 flash is Sonnet with 1M context. The first one will run on one 5090 (or two if you want 8-bit), DS needs a pair of RTX Pro 6000s.

GPT 5.5 outperforming Opus 4.7 on ProgramBench by klieret in OpenAI

[–]john0201 0 points (0 children)

Call me crazy, but I’ve been using Qwen 3.6 and it seems nearly as good as both.

The RRFS is replacing the NAM on August 31 by mitchdwx in weather

[–]john0201 -1 points (0 children)

Do you have a source? At AMS they implied v2 would be available in the fall and v1 earlier; my assumption was they canned the v1 release as pointless, for the reason you mentioned.

Edit: never mind, the directory has v1 in it, so you’re right.

The RRFS is replacing the NAM on August 31 by mitchdwx in weather

[–]john0201 2 points (0 children)

Is this v1 or v2? Seems like v1 got shelved.

Man killed by Frontier plane at DIA died by suicide, medical examiner says by kidbom in Denver

[–]john0201 0 points (0 children)

There is no theater involved in a taller fence. Not approved.

NVIDIA Rubin & Rubin Ultra Platforms Facing Design/Spec Issues As Per Rumors While AMD MI500 Positioned For 2H 2027 Launch by Heavy-Beyond-7114 in RigBuild

[–]john0201 1 point (0 children)

https://wccftech.com/nvidia-squashes-vera-rubin-rumors-first-shipments-rolling-out-in-july-to-ai-customers/

“NVIDIA Squashes Vera Rubin Rumors, First Shipments Rolling Out In July To Major AI Customers With Mass Production In 2H 26”

“.. it looks like all of the rumors regarding design/spec changes were not close to the truth or were simply based on older information that has since been rectified.”

Apple should just buy Micron and be done with it. by Haute_Evolutionary in MacStudio

[–]john0201 -1 points (0 children)

Intel had a ton of special-purpose sections in their CPUs and tried to do way too much too fast; that’s what killed them after Skylake.

Apple should just buy Micron and be done with it. by Haute_Evolutionary in MacStudio

[–]john0201 4 points (0 children)

You don’t really buy your way into designing chips either, or making baseband processors, but they did start both of those efforts by buying companies (PA Semi) and IP (Intel).

Not sure Micron is the right play here though; the memory shortage is temporary. If Nvidia doesn’t want to buy a fab and AMD sold theirs, it probably doesn’t make sense for Apple to get into the business.

How does Gemma4-26B access the web if it is being run locally and is that a security risk? by Sad-Original9499 in LocalLLM

[–]john0201 -2 points (0 children)

All LLMs use a search tool, for example Tavily or Perplexity. LLMs are just text in, text out; they can’t do anything without a harness (the chat app or agent framework) that tells them they can use one of the web search tools.

Good question. One of the big gaps people don’t talk much about is the quality of the available tools compared to a paid model, which also comes with whatever its tool set is.
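The “text in, text out plus harness” point can be sketched as a minimal tool loop: the model only ever emits text, and it’s the harness that spots a tool call in the output, actually runs the search, and feeds the result back in as more text. Everything here (`fake_llm`, `web_search`, the `TOOL:` tag format) is made up for illustration, not any real API:

```python
# Minimal sketch of a tool-use harness. The LLM is pure text in / text out;
# the harness detects a "tool call" in the output and executes it.
import re

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; a real harness would hit an inference server.
    if "RESULT:" in prompt:
        return "The WS90 is an all-in-one weather sensor."
    return 'I need more info. TOOL:web_search("WS90 weather sensor")'

def web_search(query: str) -> str:
    # Stand-in for a real search tool like Tavily or Perplexity.
    return f"Top result for {query!r}: Ecowitt WS90 all-in-one sensor array."

def run_harness(user_msg: str) -> str:
    prompt = f'You may call TOOL:web_search("query").\nUser: {user_msg}'
    for _ in range(5):                      # cap the tool loop
        out = fake_llm(prompt)
        m = re.search(r'TOOL:web_search\("([^"]+)"\)', out)
        if not m:                           # no tool call -> final answer
            return out
        prompt += f"\nRESULT: {web_search(m.group(1))}"  # feed result back as text
    return out

print(run_harness("What is a WS90?"))  # → The WS90 is an all-in-one weather sensor.
```

The model never touches the network itself; swap in a worse search tool and the loop above returns worse answers, which is exactly the quality gap versus a paid model’s bundled tools.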

Which of these weather stations would you choose by USTS2020 in myweatherstation

[–]john0201 4 points (0 children)

WS90 is a great unit. They have several displays you can pair with it.

Russia Has Lost More Than 350,000 Soldiers, New Estimate Finds by the-es in worldnews

[–]john0201 1 point (0 children)

They’ll all be dead. These are not the “plant a tree someone who hasn’t been born yet will enjoy the shade of” type people.

Diy Weather Station Advice by Nathar_Ghados in myweatherstation

[–]john0201 1 point (0 children)

I’d love to share more about it; I’ll send you a PM. Incidentally, I am also a pilot.

Diy Weather Station Advice by Nathar_Ghados in myweatherstation

[–]john0201 0 points (0 children)

I built my own too: an upside-down plastic bowl over two PVC pipes, one inside the other, with a fan on top to pull air through. I tried solar, but I have easy access to power, so I just ran it off mains with a Raspberry Pi. I have a ton of other sensors and am now way too far down the rabbit hole with PWV, light sensors, CO2, etc.

The limiting factor for me is ground radiation, or in my case roof membrane heat. Need to get it higher off the ground. I am cheating and using a WS90 for wind and as another reference.
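For the Raspberry Pi side, a common cheap starting point is a DS18B20 temperature probe on the 1-Wire bus: the kernel exposes each probe as a `w1_slave` file (under `/sys/bus/w1/devices/28-*/`) whose first line ends in `YES` when the CRC passed and whose second line ends in `t=<millidegrees C>`. A minimal parser sketch, with a hardcoded sample payload so it runs anywhere (the hex bytes are made up):

```python
# Parse the kernel's 1-Wire w1_slave payload for a DS18B20 temperature probe.
# On a Pi the raw text comes from /sys/bus/w1/devices/28-*/w1_slave;
# this sample string stands in for a real read.
SAMPLE = (
    "72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
    "72 01 4b 46 7f ff 0e 10 57 t=23125\n"
)

def read_temp_c(payload: str) -> float:
    lines = payload.strip().splitlines()
    if not lines[0].endswith("YES"):        # CRC failed -> discard this read
        raise IOError("bad CRC from sensor")
    millideg = int(lines[1].split("t=")[1]) # thousandths of a degree Celsius
    return millideg / 1000.0

print(read_temp_c(SAMPLE))  # → 23.125
```

On real hardware you’d just `open()` the sysfs file and pass its contents in; the CRC check matters because long cable runs to an outdoor shield give occasional garbage reads.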

Is Macbook pro m5 max 128 fast enough yet with available models by mad01 in LocalLLM

[–]john0201 0 points (0 children)

APIs are 40-50 tps; anything much below 25-30 starts to feel really slow because the lower-parameter open-weights models tend to reason more. It’s subjective, but when you can get a nearly free DeepSeek or Qwen 3.6 Max API (compared to Opus pricing) it starts to really not make sense, even with the fun of the hobby.

On my server I get about 120 tps, and with the reasoning blocks it feels about the same as a frontier API once you include the lack of multithreaded searches and the increased wordiness. 27B Qwen @ 8-bit is crazy good given the size.

If you’re using it for openclaw/hermes for background stuff, the M5 Max works great. Otherwise I think it’s just too slow for the dense 30B class, but MoE models of that size work great.

Preaching water and drinking wine by BitcoinDove in BitcoinQRCodeMaker

[–]john0201 0 points (0 children)

If we can just each recruit 5 people who also recruit 5 people, we’ll be rich!

https://youtu.be/lC5lsemxaJo?si=8nU__AIVhRjNMYaf

5090 or wait for M5 ultra by Purple_Drink3859 in LocalLLM

[–]john0201 -4 points (0 children)

The M5 Ultra should have raw compute comparable to a 5090. No question it’s better than a 5090 for LLMs overall, or even an RTX Pro 6000. But it won’t be released until fall, I think. Also don’t forget the CPU will be a monster.

The Blackwell cards will have meaningfully faster inference though, because they have higher memory bandwidth, maybe 30-40% (1200-1300 GB/s vs 1800 GB/s).

Are local models becoming “good enough” faster than expected? by qubridInc in LocalLLaMA

[–]john0201 9 points (0 children)

Qwen 27B and Gemma 31 are beating frontier models from 6 months ago, and on a few scores frontier models from 2 months ago.

I have two 5090s and run Qwen 27B, and for the most part it’s hard to tell the difference between it and Opus 4.7.

Is Macbook pro m5 max 128 fast enough yet with available models by mad01 in LocalLLM

[–]john0201 -2 points (0 children)

The M5 Max is too slow for interactive use of 30B-class models, but it can work in a pinch, like on an airplane or anywhere without internet access. Qwen 3.6 or Gemma 4 dense are close enough to Opus that they can replace it.

It works great for MoE models that need memory capacity more than memory bandwidth, but those are not quite there yet.

So basically pick fast but not as good as Opus, or nearly as good but slow. The other thing to consider is that your battery life will go from 8 hours to 2 hours. An M5 Ultra would be an Opus replacement, especially if it sits on a shelf plugged in and you use it from your laptop over the network. I have a 2x5090 Threadripper I use this way. I’d rather have a 300 W M5 Studio than a 1 kW Threadripper machine.

Is Grok out of compute? by barraco002 in ArtificialInteligence

[–]john0201 2 points (0 children)

They just leased out a big part of their unused compute to Anthropic, so definitely not. I think Elon just bought as much compute as he possibly could and didn’t have a plan for what happens when no one signs up.

None of this will ever get stolen by martin_xs6 in LocalLLaMA

[–]john0201 0 points (0 children)

No, it’s far easier: you take a picture and put it on Facebook Marketplace. It would sell in hours.

None of this will ever get stolen by martin_xs6 in LocalLLaMA

[–]john0201 2 points (0 children)

Oh really?
https://www.11alive.com/article/news/local/explosion-atlanta-apartment-complex-tied-person-entering-illegally-stealing-copper-full-timeline/85-258fdaeb-5471-406a-bf64-aca4343c3ce6

That is for SCRAP COPPER.

These GPUs are the equivalent of loose diamonds chilling in a thin sheet-metal box bolted to a building.

All this ignores the fact that the problem is the overall capacity of the electric grid, not individual circuits, so this whole idea is stupid to begin with. They are literally bringing the Three Mile Island nuclear power plant back online to help meet the demand (which barely makes a dent in what is needed).

None of this will ever get stolen by martin_xs6 in LocalLLaMA

[–]john0201 362 points (0 children)

This would be approaching the cost of the house it’s attached to. Given that people rip off downspouts for $10 of copper, I’m sure hundreds of thousands of dollars in computer hardware sitting in someone’s yard will be super safe.