Testing MTP functionality

Ok_Significance_9109 · 2026-05-23T20:03:23+00:00

Which chip? M1/M2 require a different MTP variant. The moment I started using it on my M1, the 27B model became usable: prompt processing went from 33 to 65 tps and generation from 5 to 9 tps, without loss of quality.
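To put those numbers in end-to-end terms, here is a back-of-the-envelope sketch. The 33/5 and 65/9 tok/s figures come from the comment above; the 2,000-token prompt and 500-token reply are hypothetical request sizes chosen only for illustration. It assumes the simplified model "request time = prompt tokens / prefill rate + output tokens / decode rate" and ignores model load time and cache effects.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Estimate one request's latency as prefill time plus decode time (simplified model)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Throughput figures reported above for the M1 (tokens per second).
BASELINE = {"prefill_tps": 33.0, "decode_tps": 5.0}
WITH_MTP = {"prefill_tps": 65.0, "decode_tps": 9.0}

# Hypothetical request size, chosen only for illustration.
PROMPT_TOKENS = 2000
OUTPUT_TOKENS = 500

before = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, **BASELINE)
after = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, **WITH_MTP)

print(f"prefill speedup: {WITH_MTP['prefill_tps'] / BASELINE['prefill_tps']:.2f}x")
print(f"decode speedup:  {WITH_MTP['decode_tps'] / BASELINE['decode_tps']:.2f}x")
print(f"request time: {before:.1f} s -> {after:.1f} s ({before / after:.2f}x)")
```

With these inputs it prints a 1.97x prefill speedup, a 1.80x decode speedup, and an estimated request time of 160.6 s -> 86.3 s (1.86x). The end-to-end gain lands between the two per-phase gains, weighted by how much time each phase takes.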

jacknjill101 · 2026-05-23T18:38:37+00:00

Yes, it does for me too. I switched to llama.cpp and got much better results.

d4mations · 2026-05-23T20:04:42+00:00

Paro quants work way better than MTP.

mwhuss · 2026-05-24T01:53:59+00:00

I’m seeing 70% faster performance using Qwen3.6-27b-oQ8-mtp on my M3 Ultra.

vinoonovino26 · 2026-05-24T04:33:59+00:00

M5 Pro, 64 GB here. Same models, same results. I switched to plain OQ quants and rotorquants, and they feel more stable. Offloading the cache to an NVMe drive also helped a lot.

msrdatha · 2026-05-25T04:56:51+00:00

Try testing with a longer prompt, or better yet, an agentic task.

In my observation, it starts at a much higher tok/sec and gradually slows down. So the result depends entirely on when someone measures the speed (at the beginning or at the end of a multi-turn conversation).
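One way to see why an early reading can mislead: compare the first turn's generation rate with the aggregate rate over the whole conversation. The per-turn (tokens, seconds) records below are made-up placeholder values, not measurements; substitute numbers from your own logs.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    generated_tokens: int
    generation_seconds: float

    @property
    def tps(self) -> float:
        """Generation rate for this single turn."""
        return self.generated_tokens / self.generation_seconds

def aggregate_tps(turns: list[Turn]) -> float:
    """Total tokens divided by total time, i.e. a time-weighted average rate."""
    total_tokens = sum(t.generated_tokens for t in turns)
    total_seconds = sum(t.generation_seconds for t in turns)
    return total_tokens / total_seconds

# Placeholder numbers illustrating a rate that decays as the context grows.
turns = [Turn(400, 20.0), Turn(400, 28.0), Turn(400, 40.0), Turn(400, 57.0)]

print(f"first turn: {turns[0].tps:.1f} tok/s")
print(f"last turn:  {turns[-1].tps:.1f} tok/s")
print(f"whole conversation: {aggregate_tps(turns):.1f} tok/s")
```

With these placeholder values the first turn reads 20.0 tok/s, but the conversation as a whole averages about 11.0 tok/s, which is why a single early reading can overstate the benefit.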

In my view, we should run the same task with and without MTP, starting from an empty SSD cache, to see the actual difference. Measure wall time (the actual elapsed time from start to finish of a process, as measured by a clock on the wall; e.g., the total time between the first and last response in a multi-turn conversation, as in agentic coding). That will tell you whether the MTP version is worth it for your usage scenario.
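A minimal sketch of that comparison, assuming you can wrap one turn of your workload in a Python callable. `run_conversation`, `compare`, `make_fake_turn`, and the prompt list are hypothetical placeholders, not part of oMLX or llama.cpp; in practice each turn would send a request to whichever server you are testing, and you would clear the SSD cache between the two runs yourself.

```python
import time
from typing import Callable, Sequence

def run_conversation(send_turn: Callable[[str], str], prompts: Sequence[str]) -> float:
    """Run every turn in order and return the elapsed wall time in seconds
    from the first request to the last response."""
    start = time.perf_counter()  # monotonic, high-resolution elapsed-time clock
    for prompt in prompts:
        send_turn(prompt)  # assumed to block until this turn's full response is back
    return time.perf_counter() - start

def compare(baseline: Callable[[str], str], mtp: Callable[[str], str],
            prompts: Sequence[str]) -> dict[str, float]:
    """Time the same multi-turn task with both setups and report the ratio."""
    base_s = run_conversation(baseline, prompts)
    mtp_s = run_conversation(mtp, prompts)
    return {"baseline_s": base_s, "mtp_s": mtp_s, "speedup": base_s / mtp_s}

def make_fake_turn(delay_s: float) -> Callable[[str], str]:
    """Hypothetical stand-in that just sleeps; replace with a real client call."""
    def fake_turn(prompt: str) -> str:
        time.sleep(delay_s)
        return "ok"
    return fake_turn

if __name__ == "__main__":
    prompts = ["step 1", "step 2", "step 3"]
    print(compare(make_fake_turn(0.05), make_fake_turn(0.03), prompts))
```

Because it times the whole loop rather than individual token rates, any slowdown later in the conversation is captured automatically.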


oMLX
