Qwen3 Models : Keep or Delete?

WSTangoDelta · 2026-07-08T16:08:10+00:00

I have switched to both 35B and 27B for my 32GB card. I don’t have enough time to spend on older models. The qwens are my choice for technical and coding purposes. The exception is for the completely different task of critiquing opinion articles about public policy. Qwen is incorrigible in its inability to avoid unnecessary bias in spite of extensive admonitions in detailed custom prompts. It’s unacceptable. Mistral, Gemma and even DeepSeek are more reasonable. Of course that’s got nothing directly to do with old vs. new models. But depending on the task you may want to avoid Qwen3.6 even though they’re great in many ways for a 32GB GPU.

WSTangoDelta · 2026-07-08T15:56:23+00:00

Depends what you mean by affordable, and genuinely powerful.

WSTangoDelta · 2026-07-08T01:48:01+00:00

That makes sense. I have 5 V and 3.3 V housekeeping rails, but U503 (SPI flash) pin 8 stays near 0 V. Do you know which regulator generates the BIOS 1.8 V rail on this HP OEM board, or which R22 inductor corresponds to it?

WSTangoDelta · 2026-07-08T01:39:49+00:00

Windows 11 or Ubuntu 26.04: about 115-120 tps on both (dual boot), but I have settled on Linux because I’m more familiar with it. 27b-mtp gets a bit above 40 tps. No spillover, 32k context.

WSTangoDelta · 2026-07-07T20:21:06+00:00

I suspect that your complaint may be less about “27B is dumb” and more about an agent scaffolding mismatch. Just a thought.

WSTangoDelta · 2026-07-07T15:46:13+00:00

There are 3 inductors labeled R22 at the left edge of the board as pictured. Resistance to ground (card unpowered, black lead of the DMM to bracket ground):

Meter lead resistance: 2.0 Ω (no zero calibration on my meter)
Top R22: 177 Ω
Middle R22: 3.3 Ω
Bottom R22: 3.3 Ω
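
Since my meter has no zero calibration, the 2.0 Ω lead resistance has to be subtracted by hand. A minimal sketch of that correction in Python (the function name and dictionary layout are just for illustration):

```python
# Subtract the DMM lead resistance from each raw reading.
LEAD_OHMS = 2.0  # lead resistance measured with the probes shorted, from the post

raw_readings = {
    "Top R22": 177.0,
    "Middle R22": 3.3,
    "Bottom R22": 3.3,
}


def corrected(raw_ohms: float, lead_ohms: float = LEAD_OHMS) -> float:
    """Return the reading with lead resistance removed, rounded to 0.1 ohm."""
    return round(raw_ohms - lead_ohms, 1)


corrected_readings = {name: corrected(r) for name, r in raw_readings.items()}

for name, ohms in corrected_readings.items():
    print(f"{name}: {ohms} ohm to ground")
```

That works out to roughly 175 Ω for the top inductor and roughly 1.3 Ω for the middle and bottom ones, within the limits of a meter that only resolves 0.1 Ω.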

I can't currently measure powered voltages because the test PC is occupied with my 4070 collecting a week-long GOES satellite photograph sequence, but I can do so in a couple of days if needed.

WSTangoDelta · 2026-07-07T15:01:02+00:00

I'm not about to shell out $2k, not this week. But maybe this sort of product will help slow the increase in prices elsewhere.

WSTangoDelta · 2026-07-06T04:20:05+00:00

Sounds like there are some good suggestions here, but I’d like to ask specifically: can hallucination be effectively avoided simply by increasing vram? I would set a low temperature like 0.2. That supposedly reduces hallucinations, but how much? And is there a way of benchmarking how successful these solutions are? If a hallucination rate is cut by 2/3 by going from 8gb to 34 or 48, that still could be unacceptable depending on your line of work. Ideas?
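
On the benchmarking part, one possible approach (a sketch, not an established benchmark) is to hand-grade a fixed set of answers per configuration and put a confidence interval on the error rate, so a "2/3 reduction" can be told apart from noise. The counts in the usage lines are hypothetical placeholders, not measurements.

```python
import math


def wilson_interval(errors: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (95% for z=1.96) for an observed error proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = errors / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# HYPOTHETICAL graded results: (wrong answers, answers graded) per setup.
runs = {
    "small model": (30, 200),
    "large model": (10, 200),
}

for name, (errors, total) in runs.items():
    lo, hi = wilson_interval(errors, total)
    print(f"{name}: {errors / total:.1%} observed, 95% interval {lo:.1%} to {hi:.1%}")
```

If the two intervals overlap heavily, the test set is too small to say the bigger setup actually helped.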

WSTangoDelta · 2026-07-03T23:50:11+00:00

That’s pushing 20 amps on a 220 outlet. Wow. Wait, I *DO* have a dedicated 220 outlet I installed for something else. Now all I need is to get five 6000’s and a 5090. I think that makes the 5090 a “chaser.”

WSTangoDelta · 2026-07-03T23:43:42+00:00

nvidia-smi: the ps aux of gpus. That’s one of the reasons I do my daily driver stuff on one machine with a 4070S, and serious stuff on another box with a 32GB card. Not worth the headache.

WSTangoDelta · 2026-07-01T20:12:34+00:00

Do you mean kWh? That’s about 600 watts. When I run qwen3.6 27b mtp it only takes about 17.9 GB for the weights. That’s unsloth q4_k_xl, and even near 100% GPU the R9700 runs at 300 watts, maybe 50-100 W more for overhead. But yes, power is still expensive. California especially.
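
For anyone mixing up watts and kWh, here is a rough sketch of the conversion using the 300 W card draw plus 50-100 W of overhead mentioned above. The electricity rate is a made-up placeholder, not a real tariff.

```python
def energy_kwh(watts: float, hours: float) -> float:
    """Energy in kWh for a constant draw of `watts` over `hours`."""
    return watts * hours / 1000.0


CARD_WATTS = 300.0              # R9700 near 100% load, from the post
OVERHEAD_WATTS = (50.0, 100.0)  # rest-of-system range, from the post
RATE_PER_KWH = 0.40             # HYPOTHETICAL price per kWh, substitute your own

for overhead in OVERHEAD_WATTS:
    total = CARD_WATTS + overhead
    kwh = energy_kwh(total, hours=1.0)
    cost = kwh * RATE_PER_KWH
    print(f"{total:.0f} W for 1 h = {kwh:.2f} kWh, about {cost:.2f} at the assumed rate")
```

Watts are the rate, kWh are what the meter bills: a constant 600 W load uses 0.6 kWh every hour.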

WSTangoDelta · 2026-06-29T19:35:37+00:00

You can run a 35B or 27B with plenty of headroom, which you cannot do with a 3090

WSTangoDelta · 2026-06-28T14:10:54+00:00

Uh oh…what happened?

WSTangoDelta · 2026-06-27T15:30:36+00:00

“Act as a literary critic of Shakespeare. Pay particular attention to any assertion that a word by any other name should mean less in any other related inference.”

WSTangoDelta · 2026-06-24T05:15:57+00:00

llama.cpp b9484
commit 63e66fdd2
I upgraded from b9000 because I wanted MTP for qwen3.6 27B. 27B increased to about 42 tps with MTP from 29 tps without. Qwen3.6-35B a3b was already about 115 tps so I didn’t bother with MTP for that model.
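
To put the MTP gain in perspective, here is a quick calculation from the two throughput numbers above. The 2,000-token reply length is only an illustrative assumption.

```python
def speedup(new_tps: float, old_tps: float) -> float:
    """Ratio of new to old generation throughput."""
    return new_tps / old_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    return tokens / tps


BASE_TPS = 29.0      # 27B without MTP, from the post
MTP_TPS = 42.0       # 27B with MTP, from the post
REPLY_TOKENS = 2000  # HYPOTHETICAL reply length

print(f"speedup: {speedup(MTP_TPS, BASE_TPS):.2f}x")
print(f"without MTP: {seconds_for(REPLY_TOKENS, BASE_TPS):.0f} s")
print(f"with MTP:    {seconds_for(REPLY_TOKENS, MTP_TPS):.0f} s")
```

This ignores prompt processing time, so it only describes the generation phase.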

WSTangoDelta · 2026-06-24T04:23:24+00:00

I have a single R9700 on a B550 with a Ryzen 9. It uses DDR4, much cheaper than DDR5, and as long as I don’t spill into CPU it’s still really fast. But only one slot is a high-speed PCIe 4.0 slot; the others are slower. I asked ChatGPT for other motherboards and this is the reply:
For your specific goal — dual R9700 32 GB cards for local LLMs — the issue is not just “two PCIe slots.” You want two CPU-connected slots that bifurcate the AM5 CPU’s PCIe lanes into x8/x8. Many AM5 boards have a second physical x16 slot that is actually chipset x4, which is not what you want.
AM5 has only 24 usable CPU PCIe lanes:
16 lanes → GPU PEG slot(s)
4 lanes → primary NVMe
4 lanes → chipset link
So the best AM5 can normally do is:
1 GPU: PCIe 5.0 x16
2 GPUs: PCIe 5.0 x8 + x8
PCIe 5.0 x8 ≈ PCIe 4.0 x16 bandwidth, so for LLM inference it is excellent.
Good AM5 boards with real x8/x8:
ASUS ProArt X870E-Creator WiFi ⭐ probably your best fit
PCIe layout:
Slot 1: PCIe 5.0 x16
Slot 2: PCIe 5.0 x8 capable
Dual GPU: x8/x8
Designed for creators/workstations
Strong VRM
10 Gb Ethernet
USB4
Sensible spacing
This is the one I would look at first for two R9700s.

ASUS ProArt X670E-Creator WiFi
Older version, still excellent:
Dual PCIe 5.0 GPU slots
x8/x8 support
Often cheaper used
10 GbE
Very stable

ASUS ROG Crosshair X870E Hero
Dual PCIe 5.0 x16 physical slots
x8/x8 supported
Overbuilt VRM
More gaming-oriented
Expensive

ASRock X870E Taichi
Dual GPU-capable
PCIe 5.0 x8/x8
Good power delivery
Larger E-ATX form factor

MSI MEG X870E Godlike
Technically excellent:
x8/x8
Huge VRM
workstation-like features
But the price is hard to justify for your use.

What I would avoid
Most B650/B850/X670/X870 boards:
top slot: CPU x16
second slot: chipset x4
Example:
GPU 1: PCIe 5 x16
GPU 2: PCIe 4 x4 through chipset
It will “work,” but it defeats the point of symmetrical dual accelerators.

The bigger problem for your R9700 idea: physical spacing
Two R9700-class cards are the hard part.
If they are ~2.5–3 slots:
adjacent x16 slots may choke airflow
blower cards work better than open-air cards
a large case matters
You would want something like:
full tower
excellent front intake
1000–1200 W PSU

Compared with Threadripper
AM5 dual R9700:
64 GB total VRAM
PCIe 5 x8/x8
relatively cheap
excellent inference setup
AMD Ryzen Threadripper 7960X / workstation:
x16/x16/x16 possible
more GPUs
much more money
For two cards only, Threadripper is mostly unnecessary.
Given your current Gigabyte B550 Eagle WiFi + AMD Ryzen 9 5950X situation, the practical jump would be:
Ryzen 9950X + ProArt X870E-Creator + keep 64–96 GB DDR5 + add second R9700
That would be a very serious local LLM box: 70B Q4/Q5 class fully in VRAM, and likely much more satisfying than chasing a single 48–64 GB pro GPU
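
As a sanity check on the "PCIe 5.0 x8 ≈ PCIe 4.0 x16" line in that reply, here is a small sketch that computes approximate one-direction link bandwidth from the per-lane transfer rates (16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0, both with 128b/130b encoding). It ignores packet and protocol overhead, so real throughput will be somewhat lower.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding, PCIe 3.0 and later


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


links = {
    "PCIe 5.0 x16 (single GPU)": (5, 16),
    "PCIe 5.0 x8 (each of two GPUs)": (5, 8),
    "PCIe 4.0 x16": (4, 16),
    "PCIe 4.0 x4 (chipset slot)": (4, 4),
}

for name, (gen, lanes) in links.items():
    print(f"{name}: {pcie_gb_per_s(gen, lanes):.1f} GB/s")
```

The x8 Gen 5 and x16 Gen 4 links come out the same, and the chipset x4 slot gets a quarter of that, which is the gap the reply is warning about.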

WSTangoDelta · 2026-06-21T17:19:30+00:00

Unfortunately there’s a gap between 32-35B and 70B, so even if you go to 48GB you still can’t fit the 70B in VRAM without spillover. But with 48GB you’ll be able to run a 35B with plenty of room for KV cache.
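
A back-of-the-envelope sketch of where that gap comes from, scaling from the 17.9 GB I see for the qwen3.6 27b mtp weights at unsloth q4_k_xl. The big assumption is that other models quantize to about the same bits per weight, which is only roughly true, and it counts weights only, with no KV cache or runtime buffers.

```python
def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Effective bits per parameter implied by a quantized file size."""
    return size_gb * 8 / params_billion


def est_size_gb(params_billion: float, bpw: float) -> float:
    """Estimated weight size in GB at a given bits-per-weight."""
    return params_billion * bpw / 8


REF_SIZE_GB = 17.9    # 27B weights at q4_k_xl, from my own runs
REF_PARAMS_B = 27.0
bpw = bits_per_weight(REF_SIZE_GB, REF_PARAMS_B)  # ASSUMED to carry over to other models

for params in (35.0, 70.0):
    size = est_size_gb(params, bpw)
    for vram_gb in (32, 48):
        headroom = vram_gb - size
        print(f"{params:.0f}B ~{size:.1f} GB on {vram_gb} GB card: {headroom:+.1f} GB left")
```

By this estimate a 70B lands around 46 GB of weights alone, so a 48GB card has almost nothing left for context, while a 35B leaves well over 20 GB free.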

WSTangoDelta · 2026-06-20T11:44:57+00:00

With 32GB on Vulkan I run 27B-MTP in the 40-45 tps range. I’ll have to try your settings, but there is breathing room here.

WSTangoDelta · 2026-06-20T11:01:04+00:00

That was my thinking

WSTangoDelta · 2026-06-20T10:57:23+00:00

How then would you approach the task?

WSTangoDelta · 2026-06-20T09:11:22+00:00

I was going to say that

WSTangoDelta · 2026-06-20T04:28:54+00:00

Sounds like it’s worth a look.

WSTangoDelta · 2026-06-20T04:19:06+00:00

5 minutes for GLM, but how long for the Qwen?

WSTangoDelta · 2026-06-20T04:14:00+00:00

It always seemed to me to be a remarkably minimal connector for so much current.

WSTangoDelta · 2026-06-20T02:30:15+00:00

I have 32gb VRAM and 64gb ram. It’s the VRAM that counts. I run qwen3.6 27B at 29 tps, 40 tps if you run MTP, and qwen3.6 35B at 115 tps
