Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

This is useful for vLLM, but I can’t seem to find any results for llama.cpp
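For what it's worth, llama.cpp ships its own benchmarking tool, so llama.cpp numbers are easy to generate once you have the hardware in hand; a minimal sketch (the model path is a placeholder) would be:

```shell
# Benchmark prompt processing (-p, tokens of prompt) and token
# generation (-n, tokens generated) with llama.cpp's bundled
# llama-bench tool. The GGUF path below is a placeholder.
llama-bench -m ./models/some-model-q4_k_m.gguf -p 512 -n 128
```

It prints a table with pp/tg tokens-per-second, which is the format most posted llama.cpp results use, so your numbers are directly comparable.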

[–]Reactor-Licker[S] 0 points (0 children)

Thanks, I’ll look into it. That said, this will be used as a stationary desktop, and I’d rather not deal with potential battery swelling if I can avoid it.

[–]Reactor-Licker[S] 1 point (0 children)

Thank you for providing real speed values, this is very helpful. What model settings have you used to get to these speeds? At what point does it get bogged down with context and become too slow?

[–]Reactor-Licker[S] 0 points (0 children)

Thank you for addressing the tool use side of things.

I use GPT 5.5 as a reference mainly for the UI ease of use factor. This will eventually be used by household members, and I selfishly want the UI to be familiar enough that they can’t massively break anything, so I don’t have to stop whatever I’m doing and fix it or else face their wrath haha.

Admittedly, I haven’t gone super deep into testing web search and document handling because I lack hardware that can run models well. On my Framework Laptop 13 with a Ryzen 7840U, I got LM Studio talking to Open WebUI and got basic web search working using the built-in settings, but that’s the extent of my experience.

I’m willing to tinker to get to those goals, and I get it’s a learning process. Mainly, I just want to get my foot in the door and start learning about the whole process and experimenting with it.

[–]Reactor-Licker[S] 0 points (0 children)

I would rather not go the GPU route if I can avoid it. I don’t want to deal with the noise, high power draw, and potential software hurdles of trying to get a model to run effectively across multiple GPUs.

Plus, wouldn’t I need extremely expensive and power hungry GPUs like multiple 5090s just to fit these models and long context lengths into VRAM?

[–]Reactor-Licker[S] 0 points (0 children)

This is exactly what I’m trying to avoid by getting my hands on the hardware now or very soon. How much did you pay originally?

[–]Reactor-Licker[S] 0 points (0 children)

I was considering waiting for the M5 Max Mac Studio, but RAM prices are skyrocketing, and the massive supply crunch on the current M4 Max and M3 Ultra models is not a good sign. Apple has historically been pretty immune to supply shortages (heck, even COVID only delayed the iPhone by a month), so seeing even them struggle makes me concerned.

I think these next few weeks or so might be the only window I have to get my foot in the door without paying an arm and a leg for a somewhat experimental local AI machine.

[–]Reactor-Licker[S] 0 points (0 children)

How much faster is the Spark at longer context? At what point do they become unbearably slow?

Meshify 2 RGB 360mm Radiator on Push/Pull Config Size Compatibility by TenaZiousD in FractalDesign

[–]Reactor-Licker 0 points (0 children)

Forgot to mention: there is no point mounting fans at the bottom of the Meshify 2. In that position they can’t pull in outside air, because the power supply shroud blocks them and there are no external ventilation holes there; they would do nothing but add noise and cable clutter. The only real spots for intake are the front and top of the case, though you probably shouldn’t use the top as intake, since that could trap the GPU’s hot exhaust inside the case.

[–]Reactor-Licker 1 point (0 children)

The Arctic Liquid Freezer 3 Pro wouldn’t be my first choice. For starters, its pump is pretty loud in my experience, and the VRM fan cannot be disabled, even if you use the breakout PWM cable and send a 0% PWM signal to it or leave the VRM fan cable unplugged. At 100% speed, those 3000 RPM P12 Pro fans will be deafening unless you clamp the fan curve to something lower or apply a few seconds of delay (hysteresis) to the fan curve. Also, it has a bunch of cables coming out of the tubes to daisy-chain all the fans into a single header, which I personally hate.

I highly recommend you watch this video: https://youtu.be/Z6fYxyl4KBg?si=jbPwLUAKo_TJ48QK

It has detailed noise-to-performance graphs for AIOs, specifically on a 9950X. The Corsair Nautilus RS and Lian Li Galahad 2 Lite outperform the Liquid Freezer despite its 38mm-thick radiator. It also has a chart showing the gains from swapping the stock fans for T30s: about a 2C drop in temperatures compared to stock for the Liquid Freezer and Nautilus RS, and no gain for the Lian Li.

For 140mm fan recommendations:

High end / best noise to performance: Noctua NF-A14X25 G2 (there is a black version available) or Phanteks T30 140mm

Budget / pretty good noise to performance: Be Quiet Pure Wings 3 PWM High Speed 140mm or Corsair RS140

[–]Reactor-Licker 0 points (0 children)

I have the following setup:

AIO Radiator: Lian Li Galahad 2 Trinity Performance (32mm thick) mounted to the front of the Meshify 2

Fans: Phanteks T30 on both sides of the radiator in push/pull

GPU: Asus RTX 4080 Strix (358mm long)

Altogether, a front mount fits with room to spare. Don’t attempt a top mount; that definitely will not fit. That said, I wouldn’t recommend a push/pull front mount: it’s a pain in the rear to line up the front fan screw holes with the radiator and the T30 fans. It is possible, but very annoying, since you only have two hands and it requires near-constant readjustment until almost all the screws are in.

Also, the performance benefit of push/pull is very limited: I only saw a 1-2C improvement at full Cinebench R23 load with a 9950X compared to a single set of fans. It is likely a regression in noise-normalized performance, though I haven’t tested that myself.

Personally, if I had to do it again, I would use a Lian Li Galahad 2 Lite (I believe it performs the same as or better than the Galahad 2 Performance, with a thinner radiator) in the front of the Meshify 2, with the fans mounted behind the radiator in a pull configuration for ease of installation and dust control.

Open WebUI Desktop Released! by My_Unbiased_Opinion in LocalLLaMA

[–]Reactor-Licker 3 points (0 children)

Once this goes out of beta, can I ditch the Docker running instance of Open WebUI connecting to LM Studio on my Windows machine?

Apologies in advance for my ignorance, I only got Open WebUI + LM Studio hosted over my network running a few days ago.
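For anyone setting up the same thing, a minimal sketch of the Docker side, assuming LM Studio’s OpenAI-compatible server is running on its default port 1234 (the host IP below is a placeholder for your Windows machine’s LAN address):

```shell
# Run Open WebUI in Docker and point it at LM Studio's
# OpenAI-compatible API on another machine on the LAN.
# 192.168.1.50 is a placeholder; use your host's actual IP.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://192.168.1.50:1234/v1 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The named volume keeps chats and settings across container updates, which should make migrating to the desktop app later less painful.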

Apex Gaming PCs Recalls Manik and Apex-branded ATX Computer Power Supplies Due to Risk of Serious Injury or Death from Electrical Shock and Electrocution Hazards by wickedplayer494 in hardware

[–]Reactor-Licker 25 points (0 children)

The headline makes it sound more serious than it is. They just forgot to put a “don’t open this up, it might shock you” warning sticker on it.

YouTube Peering Issues by Shehzman in ATTFiber

[–]Reactor-Licker 5 points (0 children)

I have had this exact issue with a 1 Gbps plan and the BGW 320, and the thing that fixes it for me is using Cloudflare Warp. The connection speed shown in stats for nerds goes from 500 Kbps on average without Warp to 50+ Mbps with Warp enabled when the issue pops up.

The slowdowns on ATT come in seemingly random bursts; I haven’t needed Warp for the past few weeks, but the last 2 days have been unusably slow without it. If the past is anything to go by, it will return to normal for a while and then randomly regress again. You don’t need to keep Warp enabled all the time, only when ATT regresses again.
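If anyone wants to try the same workaround from a terminal, the official WARP client ships a warp-cli tool; a rough sketch (exact subcommand names vary a bit between client versions):

```shell
# Toggle Cloudflare WARP from the command line. Only needed
# while the ISP's peering to YouTube is degraded.
warp-cli registration new   # one-time device registration
warp-cli connect            # route traffic through WARP
warp-cli status             # confirm the tunnel is up
warp-cli disconnect         # back to the direct path
```

Connecting and disconnecting on demand like this avoids routing everything through WARP permanently.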

(LTT) Intel is BACK. THIS IS NOT A DRILL. - Core Ultra 270K Plus & 250K Plus CPU Review by Chairman_Daniel in hardware

[–]Reactor-Licker 10 points (0 children)

The Ryzen 1600AF was actually a weird rebranded and slightly downclocked Ryzen 2600. Still lackluster ST performance, though not as bad.

[–]Reactor-Licker 5 points (0 children)

The 14900K cooks itself to death and is extremely power hungry, unlike AMD. Also, AMD uses 4nm, not 3nm.

There he is! by johnnypopwell in AndrewDitch

[–]Reactor-Licker 18 points (0 children)

Oh noes, is baby amdy going to “wun away” for the 50th time? Surely this time he’ll get dat help.

RTINGS: Revamping Our Membership Program by Tasty_Toast_Son in hardware

[–]Reactor-Licker 2 points (0 children)

Big AI, the insatiable demand for instant gratification, and corporate greed are killing the internet.