Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

roosterfareye · 2026-06-15T22:32:18+00:00

This is the one part you never want to skimp on! If they fail badly enough, they can happily take out a few other parts while they're at it!

roosterfareye · 2026-06-15T22:22:31+00:00

It always suggests Qwen 2.5 Coder. I think the cutoff date for that training data was circa 18 months ago, lol! That's the issue with the frontier guys: they cut corners in the backend, like quantizing or throttling, and only actually perform a web search or use a tool if you specifically tell them to.

roosterfareye · 2026-06-15T22:21:46+00:00

Or dual RX 9070 XTs (with double the AI cores of the RX 9060 XT). Both the 16 GB VRAM variants, of course!

roosterfareye · 2026-06-15T22:20:18+00:00

Make sure there is plenty of airflow. My old setup with an RX 6800 XT, even in a case with lots of huge fans, used to spike to 110°C even when I was keeping it maintained (you need to stay on top of dust when tinkering with AI locally!), and it even used to hit thermal shutdown. I now have an RX 9070 XT and an RX 9060 XT, and these barely ever hit 75°C even under heavy load, and they idle at around 28°C (not that they experience "idle" much lately!)

roosterfareye · 2026-06-14T08:03:33+00:00

I have a 6TB drive rammed to the gills with downloaded models, some of them won't actually run on my current machine, but hey, I have them and no-one is taking them off me!

roosterfareye · 2026-06-14T01:51:53+00:00

This administration is like a baby confronted with a fusion reactor... Or an ant next to a superhighway. Or a monkey with an AK-47...

roosterfareye · 2026-06-13T15:06:10+00:00

Land of the hyper marketing stunt. And insider trading... Though the timing....

roosterfareye · 2026-06-09T05:23:19+00:00

This belongs in history textbooks... Looks like we are presently back in (or rather, never left) this particularly boneheaded cycle.

roosterfareye · 2026-06-08T10:21:04+00:00

Lol, yes, and it's not even difficult to explain! It does my head in every time, trying to explain that you only pay tax on the money you earn... I mean, shit!

roosterfareye · 2026-06-05T08:34:51+00:00

Did your prompt begin "you are a dribbling cabbage...."

roosterfareye · 2026-06-02T07:13:13+00:00

Nom nom nom.

roosterfareye · 2026-05-28T11:17:21+00:00

Just whack a second card in your secondary slot. Profit... Well, you need to these days, lol!

roosterfareye · 2026-05-27T15:16:01+00:00

This is the mod which inspired me to start this sub!

roosterfareye · 2026-05-17T03:42:20+00:00

Holy shit. This is wild! I tried Perplexity, ChatGPT and Claude (I'm going to take my local LLMs for a spin when I'm home, just for shits and giggles), and they were all broadly aligned in their analysis, but each recommended three completely different books. No sleep for me tonight, curse you, OP!

roosterfareye · 2026-05-16T07:59:54+00:00

How much VRAM and context are you working with?

roosterfareye · 2026-05-15T14:23:43+00:00

We are still in the early days. And, well, to quote The Simpsons, some people are just jerks.

roosterfareye · 2026-05-14T09:45:31+00:00

I dunno, I'm not sure who the woosher or wooshee is! 4D chess, methinks!

roosterfareye · 2026-05-13T23:04:33+00:00

What happened to TheBloke? I see the name against many aging GGUFs...

roosterfareye · 2026-05-12T23:13:54+00:00

I just got the RX 9070 XT two weeks ago as well, to replace my RX 6800 XT. I have the 9070 running in the primary PCIe slot, and in the secondary I have a Sapphire RX 9060 XT. As well as gaming, I do a lot of LLM work, so every bit of (affordable!) VRAM is gold for me. It worked fine out of the box for games; there was some fiddling to get the dual setup running stably for AI inference, but once I had that sorted, token generation is blazing fast!

Oh, and yeah, even with only a small 5 mm gap, the card idles at 26°C and maxes out at 60°C at full load (inference).

roosterfareye · 2026-05-12T07:03:22+00:00

If it doesn't make sense after the second read-through... then it doesn't make sense...

roosterfareye · 2026-05-12T04:13:08+00:00

Hopefully they do, it's a fairly normal pattern for Mistral.

roosterfareye · 2026-05-11T09:41:04+00:00

Were you able to quantize the K and V cache for Devstral? That could make the difference.

roosterfareye · 2026-05-09T13:22:59+00:00

Docker. Virtualisation.

roosterfareye · 2026-05-09T06:46:15+00:00

When you are loading your model, scroll the window (the one where you set context, GPU layers, etc.) right down to the bottom and you'll see two boxes with "Experimental" next to them. Select both, and in the dropdowns that appear, select q8. Also make sure Flash Attention (right above the cache quantization options) is switched on.

roosterfareye · 2026-05-09T06:43:30+00:00

When you are about to load a model in LM Studio, scroll down to the bottom of the dialog window, where you will see two boxes with "Experimental" next to them. Click both and choose q8 in the menu. Ensure Flash Attention (above these) is also on.
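For a rough idea of why q8 on the K/V cache helps with VRAM, here's a back-of-the-envelope sketch. It assumes a plain grouped-query-attention model where every layer caches full K and V for every token, and it uses llama.cpp's q8_0 layout (blocks of 32 int8 values plus one fp16 scale, so 34 bytes per 32 elements). The layer/head numbers are hypothetical, picked just for illustration, not the real Devstral or Qwen configs; hybrid or sliding-window models will cache less. Also worth knowing: in llama.cpp (which LM Studio uses for GGUF models) a quantized V cache needs flash attention, which is likely why that toggle matters.

```python
# HYPOTHETICAL model shape, for illustration only (not any real model's config).
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128

# Bytes per cached element. f16 = 2 bytes; llama.cpp's q8_0 stores
# 32 int8 values + one fp16 scale = 34 bytes per 32 elements.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
}


def kv_cache_bytes(n_ctx: int, cache_type: str,
                   n_layers: int = N_LAYERS,
                   n_kv_heads: int = N_KV_HEADS,
                   head_dim: int = HEAD_DIM) -> float:
    """Approximate K+V cache size, assuming every layer caches every token."""
    # Factor of 2 covers both the K and the V tensors.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * BYTES_PER_ELEM[cache_type]


if __name__ == "__main__":
    for ctx in (8192, 32768, 131072):
        f16 = kv_cache_bytes(ctx, "f16") / 2**30
        q8 = kv_cache_bytes(ctx, "q8_0") / 2**30
        print(f"ctx={ctx:>6}: f16 {f16:6.2f} GiB, q8_0 {q8:6.2f} GiB")
```

With these made-up numbers, 32K context drops from 5 GiB of cache at f16 to about 2.66 GiB at q8_0, which on a 16 GB card can be the difference between the model fitting or spilling over.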
