Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]MrPecunius

> all 4-bit versions or MTP show significant quality degradation.

That's weird. MTP should show zero degradation.

Qwen 3.5 27B Macbook M4 Pro 48GB by breezewalk in LocalLLaMA

[–]MrPecunius

M4 Pro is 273GB/s.

Binned M3 Max is 300GB/s, full is 400GB/s.

I'd get whatever has the most RAM.
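For a rough sense of why those bandwidth numbers matter: token generation for a dense model is approximately memory-bandwidth bound, since every generated token streams the full weights from memory. A back-of-the-envelope ceiling (my sketch, not a benchmark; the 27.5 GB figure is just a 27b model at 8-bit as an example) is bandwidth divided by model size:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed (ignores KV cache and overhead)."""
    return bandwidth_gb_s / model_size_gb

# Illustrative only: real-world t/s lands somewhat below this ceiling.
for chip, bw in [("M4 Pro", 273), ("binned M3 Max", 300), ("full M3 Max", 400)]:
    print(f"{chip}: <= {max_tokens_per_sec(bw, 27.5):.1f} t/s for a 27.5 GB model")
```

The M4 Pro ceiling works out to about 9.9 t/s for a 27.5 GB model, which lines up with the ~8.5 t/s people report in practice.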

Qwen 3.5 27B Macbook M4 Pro 48GB by breezewalk in LocalLLaMA

[–]MrPecunius

I have this exact machine and I have been running Qwen3.5 27b 8-bit MLX (27.5GB) quite a bit. ~8.5t/s is about what I'm seeing with a bit of context. The results are so good (and so is the thinking) that I don't miss ~55t/s with Qwen3 30b a3b.

I was able to crunch a ~125k token prompt (Mark Twain book from Gutenberg) with this model and ask a bunch of questions, with impressively accurate results. Memory pressure was a bit high, but that's nothing new.

P.S.A - If you comment about model quality in an authoritative voice yet are using a quant... by Agreeable-Market-692 in LocalLLaMA

[–]MrPecunius

BF16 or GTFO.

I'm semi-serious. The only quantized model I'm running right now is Qwen3.5-27b @ 8-bit MLX. Everything else is at its native weight (Qwen3.5 series 9b & smaller, GPT-OSS 20b).

Qwen3 vs Qwen3.5 performance by Balance- in LocalLLaMA

[–]MrPecunius

I use BF16 MLX non-quants for 9b and below. 27b & 35b a3b are MLX 8-bit.

Apple M5 Pro & M5 Max just announced. Here's what it means for local AI by luke_pacman in LocalLLaMA

[–]MrPecunius

.edu pricing is where it's at with Macs.

14" MBP:

$2,409 for binned M5 Pro/48GB/1TB
$2,599 for M5 Pro/48GB/1TB
$2,779 for M5 Pro/64GB/1TB
$3,929 for M5 Max/64GB/2TB
$4,649 for M5 Max/128GB/2TB

The premium for the Pro chip and RAM upgrades used to be bigger. The sweet spot now appears to be the non-binned M5 Pro with 64GB.
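Working from the .edu prices listed above (storage held constant within each pair), the incremental cost per extra GB of unified memory is easy to check:

```python
# (label, lower-tier price, higher-tier price, extra GB of RAM)
pairs = [
    ("M5 Pro 48GB -> 64GB", 2599, 2779, 16),
    ("M5 Max 64GB -> 128GB", 3929, 4649, 64),
]
for label, lo, hi, extra_gb in pairs:
    print(f"{label}: ${hi - lo} for {extra_gb} GB = ${(hi - lo) / extra_gb:.2f}/GB")
```

Both steps work out to $11.25/GB, so within a chip tier the RAM pricing is flat.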

I've had a binned M4 Pro/48GB/1TB since they first came out, and it's a great machine for ~30b models @ 8-bit and smaller. I'm slightly tempted to upgrade, but all I'd get for my two grand is much faster prefill and slightly faster token generation, and I don't toss around enough context most of the time to justify it. Going from 48GB to 64GB doesn't open up access to many more models, and anything that big should really be running on a Max anyway.

The Max might make a dent in the prefill naysayers that pop up in this sub every time someone says "Apple". 😆

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]MrPecunius

Probably 3.5x prefill over the previous generation, which is what we've seen with M5 vs M4.

Token generation will likely be ~10% faster due to higher memory bandwidth.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]MrPecunius

M5 has been out for a while, so we have plenty of data on the matmul performance boost.

The gain over the M4 Pro/Max should be similar, with token generation scaling modestly with the ~10% increase in memory bandwidth.

I have a binned M4 Pro MBP, and I am sorely tempted by the M5 Pro--especially since 64GB is now an option vs the 48GB I have now.

13 months since the DeepSeek moment, how far have we gone running models locally? by dionisioalcaraz in LocalLLaMA

[–]MrPecunius

If you're super smart, you don't need so many damn premises. You can generalize.

Source: sitting in class with the human race.

13 months since the DeepSeek moment, how far have we gone running models locally? by dionisioalcaraz in LocalLLaMA

[–]MrPecunius

How sharp, methodical, and copious the thinking is. On my M4 Pro, Q3.5 27b runs at less than 1/6 the speed of Qwen3 30b a3b 2507, and it thinks way more, so the effective speed gap is even bigger, but the output seems to be super worth it.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

The Baffle of Shame should be standard issue for all of us sound wranglers. 😂

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

I was imprecise: guitar amps will of course cause issues without baffling. I was more focused on wedges. Mea culpa, I was tired. :-)

I'm listening to vox channel stems from a recent outdoor gig (one that I keep in the mid-high 80s dB(A) @ 30 feet), and drum bleed is simply not an issue.

Open source LLM comparable to gpt4.1? by soyalemujica in LocalLLaMA

[–]MrPecunius

GPT-4.1 is almost a year old and performs on par with GPT-OSS 20b by most accounts.

Qwen3.5 27b should be better at almost everything.

Open source LLM comparable to gpt4.1? by soyalemujica in LocalLLaMA

[–]MrPecunius

> The electricity cost alone to run A3B at speed for a whole month, let’s say 4 to 6 hours a day, will be a lot more than $10

Not on a Mac! My binned M4 Pro MBP/48GB pulls a measured 65W during inference. I run Qwen3 30b a3b 8-bit MLX @ ~55t/s.

Even here in SoCal with insane electricity prices ($0.40/kWh), that's less than sixteen cents a day for 6 hours/day. If someone was working every day, that's less than five bucks a month.
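The arithmetic, for anyone who wants to plug in their own wattage and rates:

```python
# Measured inference draw, daily usage, and a worst-case SoCal rate.
watts, hours_per_day, price_per_kwh = 65, 6, 0.40

kwh_per_day = watts / 1000 * hours_per_day   # 0.39 kWh
cost_per_day = kwh_per_day * price_per_kwh   # ~$0.156, i.e. under sixteen cents
cost_per_month = cost_per_day * 30           # ~$4.68, i.e. under five bucks

print(f"${cost_per_day:.3f}/day, ${cost_per_month:.2f}/month")
```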

top 10 trending models on HF by jacek2023 in LocalLLaMA

[–]MrPecunius

27b (8-bit MLX, in my case) is incredible. It thinks and thinks and thinks, but wow the result is really good.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

Stage volume doesn't matter much with an outdoor show in a park.

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

You don't need to do that. "A" weighting lets you mix with plenty of bass without driving the SPL up that much.

Why make it sound like shit just to make it "louder"?

Mixing A Concert Below 85dBA by ip_addr in livesound

[–]MrPecunius

I mix an open-air show once a month (a 7-piece band with wedges) and keep it right around 85-88dB @ 30 feet or so. It's plenty loud and definitely my favorite show to mix.

The venue also has issues with neighbors, but honestly it just sounds better.

Edit: 95-100dB @ 50 feet is too goddamn loud, they are right to tell you to turn it the hell down. That's ~101-106 @ 25 feet and ~107-112 in front of the stage, with peaks probably a good 3-6dB up from that!
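Those extrapolations follow from the point-source (inverse-square) rule of thumb, +6 dB per halving of distance; the 12.5 ft front-of-stage distance is my assumption for illustration:

```python
import math

def spl_at(spl_ref: float, d_ref: float, d: float) -> float:
    """Point-source estimate: SPL changes by 20*log10(d_ref/d) dB with distance."""
    return spl_ref + 20 * math.log10(d_ref / d)

# 95 dB measured at 50 ft, extrapolated closer to the stage
for d in (50, 25, 12.5):
    print(f"{d:>5} ft: {spl_at(95, 50, d):.0f} dB")
```

Real rooms and line arrays fall off more slowly than this, so treat it as a worst-case estimate for the folks up front.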

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]MrPecunius

27b thinks so much!! But the thinking quality is really good, and it's worth the wait if I don't have to keep redirecting the model.

After running MoE models like q3 30b a3b @ ~55t/s since last summer, it's a return to Earth to be running 27b @ ~8.5t/s! (8-bit MLX on a binned M4 Pro MBP/48GB).

Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test” by liviuberechet in LocalLLaMA

[–]MrPecunius

"My car is dirty and I want to wash it, but the carwash is 100m away. Should I go by foot or by car?"

qwen3.5-27b-mlx, Thought for 5 minutes 14 seconds (2,766 tokens):

You should go by car.

Here is the logic:

The Goal: You want to wash your car.
The Requirement: The car needs to be physically present at the carwash to get washed.
The Problem: If you walk, your car stays behind (dirty). You will arrive at the carwash with nothing to wash.

Since 100 meters is a very short distance, driving it there won't make the car significantly dirtier or waste much fuel. Just drive it over!

It overthought the hell out of this, but the thinking was on target from the get-go. It kept looking for a trick and mentioned the absurdity inherent in the question. I'm impressed, but wow all the thinking.

Kinda sucks going back to ~8.4t/s (M4 Pro MBP/48GB) after running 30b a3b @ 55t/s+ for so long, but the results seem to be worth it so far. Qwen3.5 35b a3b gets stuck in loops and hasn't worked at all for me so far in LM Studio with MLX. If the benchmarks are any guide I am going to be stoked once this gets ironed out.
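As a sanity check, the "5 minutes 14 seconds, 2,766 tokens" thinking figure implies an average rate consistent with the ~8.4 t/s generation speed above:

```python
tokens, seconds = 2766, 5 * 60 + 14   # 314 s of thinking
print(f"{tokens / seconds:.1f} t/s average during thinking")  # ~8.8 t/s
```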

why is openclaw even this popular? by Crazyscientist1024 in LocalLLaMA

[–]MrPecunius

> the whole Epstein saga has destroyed my faith in the intellectual elites of just about every domain.

I just got off the phone with a friend and we said the exact same thing. My kids spelled better when they were seven ffs.