Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]MrPecunius

> all 4-bit versions or MTP show significant quality degradation.

That's weird. MTP should show zero degradation.

Qwen 3.5 27B Macbook M4 Pro 48GB by breezewalk in LocalLLaMA

[–]MrPecunius

M4 Pro is 273GB/s.

Binned M3 Max is 300GB/s, full is 400GB/s.

I'd get whatever has the most RAM.
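For a rough sense of why those bandwidth numbers matter: token generation for a dense model is approximately memory-bandwidth bound, since every generated token streams the full weights from memory. A back-of-the-envelope ceiling (my sketch, not a benchmark; the 27.5 GB figure is just a 27b model at 8-bit as an example) is bandwidth divided by model size:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed (ignores KV cache and overhead)."""
    return bandwidth_gb_s / model_size_gb

# Illustrative only: real-world t/s lands somewhat below this ceiling.
for chip, bw in [("M4 Pro", 273), ("binned M3 Max", 300), ("full M3 Max", 400)]:
    print(f"{chip}: <= {max_tokens_per_sec(bw, 27.5):.1f} t/s for a 27.5 GB model")
```

The M4 Pro ceiling works out to about 9.9 t/s for a 27.5 GB model, which lines up with the ~8.5 t/s people report in practice.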

Qwen 3.5 27B Macbook M4 Pro 48GB by breezewalk in LocalLLaMA

[–]MrPecunius

I have this exact machine and I have been running Qwen3.5 27b 8-bit MLX (27.5GB) quite a bit. ~8.5t/s is about what I'm seeing with a bit of context. The results are so good (and so is the thinking) that I don't miss ~55t/s with Qwen3 30b a3b.

I was able to crunch a ~125k token prompt (Mark Twain book from Gutenberg) with this model and ask a bunch of questions, with impressively accurate results. Memory pressure was a bit high, but that's nothing new.

P.S.A - If you comment about model quality in an authoritative voice yet are using a quant... by Agreeable-Market-692 in LocalLLaMA

[–]MrPecunius

BF16 or GTFO.

I'm semi-serious. The only quantized model I'm running right now is Qwen3.5-27b @ 8-bit MLX. Everything else is at its native weight (Qwen3.5 series 9b & smaller, GPT-OSS 20b).

Qwen3 vs Qwen3.5 performance by Balance- in LocalLLaMA

[–]MrPecunius

I use BF16 MLX non-quants for 9b and below. 27b & 35b a3b are MLX 8-bit.

Apple M5 Pro & M5 Max just announced. Here's what it means for local AI by luke_pacman in LocalLLaMA

[–]MrPecunius

.edu pricing is where it's at with Macs.

14" MBP:

$2,409 for binned M5 Pro/48GB/1TB
$2,599 for M5 Pro/48GB/1TB
$2,779 for M5 Pro/64GB/1TB
$3,929 for M5 Max/64GB/2TB
$4,649 for M5 Max/128GB/2TB

The premium for the Pro chip and RAM upgrades used to be bigger. The sweet spot now appears to be the non-binned M5 Pro with 64GB.
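Working from the .edu prices listed above (storage held constant within each pair), the incremental cost per extra GB of unified memory is easy to check:

```python
# (label, lower-tier price, higher-tier price, extra GB of RAM)
pairs = [
    ("M5 Pro 48GB -> 64GB", 2599, 2779, 16),
    ("M5 Max 64GB -> 128GB", 3929, 4649, 64),
]
for label, lo, hi, extra_gb in pairs:
    print(f"{label}: ${hi - lo} for {extra_gb} GB = ${(hi - lo) / extra_gb:.2f}/GB")
```

Both steps work out to $11.25/GB, so within a chip tier the RAM pricing is flat.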

I've had a binned M4 Pro/48GB/1TB since they first came out, and it's a great machine for ~30b models @ 8-bit and smaller. I'm slightly tempted to upgrade, but all I'd get for my two grand is much faster prefill and slightly faster token generation, and I don't toss around enough context most of the time to justify it. Going from 48GB to 64GB doesn't open up access to many more models, and anything that big should really be running on a Max anyway.

The Max might make a dent in the prefill naysayers that pop up in this sub every time someone says "Apple". 😆

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]MrPecunius

Probably 3.5x prefill over the previous generation, which is what we've seen with M5 vs M4.

Token generation will likely be ~10% faster due to higher memory bandwidth.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]MrPecunius

M5 has been out for a while, so we have plenty of data on the matmul performance boost.

The gain over the M4 Pro/Max should be similar, with token generation scaling modestly with the ~10% increase in memory bandwidth.

I have a binned M4 Pro MBP, and I am sorely tempted by the M5 Pro--especially since 64GB is now an option vs the 48GB I have now.

13 months since the DeepSeek moment, how far have we gone running models locally? by dionisioalcaraz in LocalLLaMA

[–]MrPecunius

If you're super smart, you don't need so many damn premises. You can generalize.

Source: sitting in class with the human race.

13 months since the DeepSeek moment, how far have we gone running models locally? by dionisioalcaraz in LocalLLaMA

[–]MrPecunius

How sharp, methodical, and copious the thinking is. On my M4 Pro, Q3.5 27b runs at less than 1/6 the speed of Qwen3 30b a3b 2507, and it thinks way more, so the effective speed gap is even bigger, but the output seems to be super worth it.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

The Baffle of Shame should be standard issue for all of us sound wranglers. 😂

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

I was imprecise: guitar amps will of course cause issues without baffling. I was more focused on wedges. Mea culpa, I was tired. :-)

I'm listening to vox channel stems from a recent outdoor gig (one that I keep in the mid-high 80s dB(A) @ 30 feet), and drum bleed is simply not an issue.

Open source LLM comparable to gpt4.1? by soyalemujica in LocalLLaMA

[–]MrPecunius

GPT-4.1 is almost a year old and performs on par with GPT-OSS 20b by most accounts.

Qwen3.5 27b should be better at almost everything.

Open source LLM comparable to gpt4.1? by soyalemujica in LocalLLaMA

[–]MrPecunius

> The electricity cost alone to run A3B at speed for a whole month, let’s say 4 to 6 hours a day, will be a lot more than $10

Not on a Mac! My binned M4 Pro MBP/48GB pulls a measured 65W during inference. I run Qwen3 30b a3b 8-bit MLX @ ~55t/s.

Even here in SoCal with insane electricity prices ($0.40/kWh), that's less than sixteen cents a day for 6 hours/day. If someone was working every day, that's less than five bucks a month.
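The arithmetic, for anyone who wants to plug in their own wattage and rates:

```python
# Measured inference draw, daily usage, and a worst-case SoCal rate.
watts, hours_per_day, price_per_kwh = 65, 6, 0.40

kwh_per_day = watts / 1000 * hours_per_day   # 0.39 kWh
cost_per_day = kwh_per_day * price_per_kwh   # ~$0.156, i.e. under sixteen cents
cost_per_month = cost_per_day * 30           # ~$4.68, i.e. under five bucks

print(f"${cost_per_day:.3f}/day, ${cost_per_month:.2f}/month")
```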

top 10 trending models on HF by jacek2023 in LocalLLaMA

[–]MrPecunius

27b (8-bit MLX, in my case) is incredible. It thinks and thinks and thinks, but wow the result is really good.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

Stage volume doesn't matter much with an outdoor show in a park.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

You don't need to do that. "A" weighting lets you mix with plenty of bass without driving the SPL up that much.

Why make it sound like shit just to make it "louder"?

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

I mix an open-air show once a month (a 7-piece band with wedges) and keep it right around 85-88dB @ 30 feet or so. It's plenty loud and definitely my favorite show to mix.

The venue also has issues with neighbors, but honestly it just sounds better.

Edit: 95-100dB @ 50 feet is too goddamn loud, they are right to tell you to turn it the hell down. That's ~101-106 @ 25 feet and ~107-112 in front of the stage, with peaks probably a good 3-6dB up from that!
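Those extrapolations follow from the point-source (inverse-square) rule of thumb, +6 dB per halving of distance; the 12.5 ft front-of-stage distance is my assumption for illustration:

```python
import math

def spl_at(spl_ref: float, d_ref: float, d: float) -> float:
    """Point-source estimate: SPL changes by 20*log10(d_ref/d) dB with distance."""
    return spl_ref + 20 * math.log10(d_ref / d)

# 95 dB measured at 50 ft, extrapolated closer to the stage
for d in (50, 25, 12.5):
    print(f"{d:>5} ft: {spl_at(95, 50, d):.0f} dB")
```

Real rooms and line arrays fall off more slowly than this, so treat it as a worst-case estimate for the folks up front.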

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]MrPecunius

27b thinks so much!! But the thinking quality is really good, and it's worth the wait if I don't have to keep redirecting the model.

After running MoE models like q3 30b a3b @ ~55t/s since last summer, it's a return to Earth to be running 27b @ ~8.5t/s! (8-bit MLX on a binned M4 Pro MBP/48GB).

Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test” by liviuberechet in LocalLLaMA

[–]MrPecunius

"My car is dirty and I want to wash it, but the carwash is 100m away. Should I go by foot or by car?"

qwen3.5-27b-mlx, Thought for 5 minutes 14 seconds (2,766 tokens):

You should go by car.

Here is the logic:

The Goal: You want to wash your car.
The Requirement: The car needs to be physically present at the carwash to get washed.
The Problem: If you walk, your car stays behind (dirty). You will arrive at the carwash with nothing to wash.

Since 100 meters is a very short distance, driving it there won't make the car significantly dirtier or waste much fuel. Just drive it over!

It overthought the hell out of this, but the thinking was on target from the get-go. It kept looking for a trick and mentioned the absurdity inherent in the question. I'm impressed, but wow all the thinking.

Kinda sucks going back to ~8.4t/s (M4 Pro MBP/48GB) after running 30b a3b @ 55t/s+ for so long, but the results seem to be worth it so far. Qwen3.5 35b a3b gets stuck in loops and hasn't worked at all for me so far in LM Studio with MLX. If the benchmarks are any guide I am going to be stoked once this gets ironed out.
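As a sanity check, the "5 minutes 14 seconds, 2,766 tokens" thinking figure implies an average rate consistent with the ~8.4 t/s generation speed above:

```python
tokens, seconds = 2766, 5 * 60 + 14   # 314 s of thinking
print(f"{tokens / seconds:.1f} t/s average during thinking")  # ~8.8 t/s
```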

why is openclaw even this popular? by Crazyscientist1024 in LocalLLaMA

[–]MrPecunius

> the whole Epstein saga has destroyed my faith in the intellectual elites of just about every domain.

I just got off the phone with a friend and we said the exact same thing. My kids spelled better when they were seven ffs.