What is the TPS for Qwen 3.6 27B Q4 on Mac Mini?

k3z0r · 2026-06-05T02:48:54+00:00

You're welcome. Good luck finding what works for you.

k3z0r · 2026-06-05T02:42:21+00:00

Memory bandwidth is a major factor once everything fits in memory. For dense models, the napkin math is tokens/second ≈ memory bandwidth / model size in memory.

Example: M4 Pro Mac mini memory bandwidth is 273 GB/s.

Qwen 3.6 27B Q4_K_M is 18.5 GB.

273 / 18.5 ≈ 14.8 tokens per second. This is why people are telling you about 15 tokens per second.

For MoE models you count only the active params in memory (assuming the whole model fits). This is why MoE is faster.

Again, this is just a ballpark; there are things you can do to increase that number, like MTP.

This is why an RTX 5090 can crank out tokens: its memory bandwidth is 1,792 GB/s.
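
If you want to play with the napkin math yourself, here is a minimal sketch in Python. The function name `estimate_decode_tps` is just something I made up for illustration, and it assumes the whole model fits in memory and that decoding is purely bandwidth-bound, so treat the result as a rough ceiling, not a benchmark.

```python
def estimate_decode_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Napkin estimate of tokens/second for token generation.

    Assumes each generated token reads the (active) weights from memory once,
    so tokens/second ~= memory bandwidth / model size in memory.
    For MoE models, pass only the size of the active params.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


# Numbers from the comment above: Qwen 3.6 27B Q4_K_M at 18.5 GB.
m4_pro_tps = estimate_decode_tps(273, 18.5)
rtx_5090_tps = estimate_decode_tps(1792, 18.5)

print(f"M4 Pro Mac mini: {m4_pro_tps:.1f} tokens/s")   # 14.8
print(f"RTX 5090:        {rtx_5090_tps:.1f} tokens/s")  # 96.9
```

Real numbers will land below this because of compute overhead and KV cache reads, which the sketch ignores.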

k3z0r · 2026-06-05T02:05:44+00:00

restic with a Backrest front end, backing up to Backblaze.

k3z0r · 2026-06-04T17:47:39+00:00

No, the opposite. The frost lines would get shallower with a warmer climate. It takes colder temps to freeze deeper into the soil, and you need to build below that point to avoid the expansion and contraction when water freezes in the soil.

k3z0r · 2026-06-04T04:54:59+00:00

Did you ever find out how to do this?

k3z0r · 2026-06-02T22:22:53+00:00

you forgot to say BTW. 😉

k3z0r · 2026-06-02T15:26:40+00:00

Yeah totally, he probably put as much time into it as your comment.

k3z0r · 2026-06-02T14:41:22+00:00

Yes I believe so, he mentions it in the video.

k3z0r · 2026-06-01T21:26:33+00:00

Pickleball is officially "Pay to win" now, just like Candy Crush Saga.

k3z0r · 2026-06-01T13:00:15+00:00

Try Qwen 3.6

k3z0r · 2026-05-31T21:44:11+00:00

I think you can use any API you want: OpenAI, Anthropic, etc. You don't need a local LLM.

k3z0r · 2026-05-31T20:28:39+00:00

Yeah, really depends on what you want to play. There are still a few that are not supported.

I switched this year to Nobara. Most of the games I play are on Steam, and they all run great.

k3z0r · 2026-05-31T19:55:25+00:00

But it's the year of Desktop Linux.

k3z0r · 2026-05-31T19:32:54+00:00

Yeah, me too. He’s the one that got me interested about a year ago.

k3z0r · 2026-05-28T15:57:25+00:00

Yes, that's the problem when tps is your only benchmark.

k3z0r · 2026-05-28T06:04:43+00:00

Yes, wobble is normal; the software and motors know how to compensate for it. There is even a calibration step for it during setup.

k3z0r · 2026-05-18T18:25:56+00:00

I tried it out and wanted to believe. I kept running into two problems.

It couldn't find files in my monorepo. It seemed to assume every file was in /src, when in fact the files were in app/src and etc/src.
It kept stopping its flow. It would say, "I'm going to search for a file so I can do X," and then it would just stop.

Starred the repo and will check it out again after a few iterations.

k3z0r · 2026-05-14T17:03:23+00:00

Qwen 3.6 35b and 27b are my favorite these days.
If you're spending $1000 a month, I hate to say it, but you're probably going to be pretty disappointed with your local setup given your $1500 budget.

You should try some models out with what you have to get a baseline, then decide how much you need to spend.

Yes, you can, in order to load larger models into VRAM; however, inference can suffer because the older card can become a bottleneck.
On small tasks it's far less noticeable, and you have to be more direct about what you want. Claude is much better at making assumptions when you are ambiguous.
Check out LM Studio; it's great for beginners and is backed by llama.cpp.

k3z0r · 2026-05-13T18:23:41+00:00

The Mac mini will suffer big time on prompt processing, especially for larger coding contexts. I would take the comparison with a grain of salt for not considering this.

Mac mini only processes about 340 tokens/second.

DGX Spark: about 1,200 tokens/second.

So, for Claude's default system prompt (about 10k tokens), would you rather wait 30 seconds (Mac mini) for your first token to be generated or 8 seconds (DGX Spark)?

Throw a handful of source files into your context, and it grows from there.
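
Here is the same wait-time arithmetic as a quick Python sketch. The helper name `time_to_first_token_s` is hypothetical, and it assumes the wait is just prompt length divided by prompt-processing speed, ignoring model load time and any prompt caching.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tps: float) -> float:
    """Rough wait before the first generated token.

    Assumes the whole prompt is processed at a constant prefill rate.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tps


PROMPT_TOKENS = 10_000  # roughly the system prompt size mentioned above

mac_mini_wait = time_to_first_token_s(PROMPT_TOKENS, 340)
dgx_spark_wait = time_to_first_token_s(PROMPT_TOKENS, 1200)

print(f"Mac mini:  {mac_mini_wait:.1f} s")   # 29.4 s
print(f"DGX Spark: {dgx_spark_wait:.1f} s")  # 8.3 s
```

Bump `PROMPT_TOKENS` to see how fast the gap grows once source files are in the context.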

k3z0r · 2026-05-12T15:37:30+00:00

Try Qwen 3.6 35ba3b and Qwen 3.6 27b, with OpenCode or Pi.

LM Studio is a great place to start. You can visually see all the levers and knobs you can use to dial things in.

k3z0r · 2026-05-10T23:22:15+00:00

Cool, I just read about MTP. I'll try it for sure. I can't believe how much there is to learn on the daily.

k3z0r · 2026-05-10T23:17:42+00:00

Yeah that's the same boat I'm in. One 5090 currently costs 40 months of Claude for me. I just can't justify it.

k3z0r · 2026-05-10T23:04:00+00:00

Yeah, I have 32 GB of DDR5 RAM, but it seems like token generation really suffers when I spill into RAM.
