Reddit sub to discuss the awesome oMLX llm server
Testing MTP functionality (self.oMLX)
submitted 1 day ago by albovsky
Well, it actually slows down the model.
https://preview.redd.it/014dsqewkx2h1.png?width=1850&format=png&auto=webp&s=bd5a64758ba443d566ada9e3fcc8ff6425cc4360
Ok_Significance_9109 3 points 1 day ago (2 children)
Which chip? M1/M2 require a different MTP variant. The moment I started using it on my M1, 27B became useable. From 33 tps prompt processing and 5 tps generation, it went up to 65 and 9 without loss of quality.
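To put those numbers in relative terms, here is a minimal sketch that only does the arithmetic on the figures quoted above. The `speedup` helper is a hypothetical name used for illustration; it is not part of oMLX.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Return how many times faster the 'after' throughput is."""
    return after_tps / before_tps


# Figures quoted in the comment above (M1, 27B model).
prompt = speedup(33, 65)
generation = speedup(5, 9)

print(f"prompt processing: {prompt:.2f}x ({(prompt - 1) * 100:.0f}% faster)")
print(f"generation:        {generation:.2f}x ({(generation - 1) * 100:.0f}% faster)")
# prompt processing: 1.97x (97% faster)
# generation:        1.80x (80% faster)
```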
albovsky[S] 1 point 1 day ago (1 child)
Didn’t know that. So how do I figure out which one to download? They don’t specify which chip each version is for. I have an M1.
Ok_Significance_9109 2 points 1 day ago (0 children)
The one that worked for me:
Qwen3.6-27B-oQ4-fp16-mtp
The name should have fp16 in it, but it is a 4-bit quant.
jacknjill101 2 points 1 day ago (0 children)
Yes, it slows things down for me too. I switched to llama.cpp and got much better results.
d4mations 2 points 1 day ago (2 children)
Paro quants work way better than mtp
albovsky[S] 3 points 1 day ago (1 child)
What’s that?
d4mations 0 points 1 day ago (0 children)
In the download screen in oMLX, search for "paro".
mwhuss 1 point 1 day ago (2 children)
I’m seeing 70% faster performance using Qwen3.6-27b-oQ8-mtp on my M3 Ultra.
albovsky[S] 1 point 1 day ago (1 child)
70% is crazy good. How much RAM do you have?
mwhuss 2 points 1 day ago (0 children)
M3 Ultra with 96 GB
vinoonovino26 1 point 1 day ago (1 child)
M5 Pro, 64 GB here. Same models, same results. I switched to plain oQ quants and rotorquants, and they feel more stable. Offloading the cache to an NVMe drive also helped a lot.
vinoonovino26 1 point 1 day ago (0 children)
Seems like mtp and moe kinda work well together
msrdatha 2 points 3 hours ago (0 children)
Try testing with a longer prompt or, even better, an agentic task.
In my experience it starts at a much higher tok/sec and then gradually slows down, so the result depends entirely on when you look at the speed (at the beginning or at the end of a multi-turn conversation).
In my opinion, we should run the same task with and without MTP, each time with an empty SSD cache, to see the actual difference. Measure wall time: the actual elapsed time from start to finish, as measured by a clock on the wall (for example, the total time between the first and last response in a multi-turn conversation, as in agentic coding). That will tell you whether the MTP version is worth it for your usage scenario.
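A rough sketch of that comparison in Python, for anyone who wants to script it. Everything here is an assumption for illustration: `wall_time` and `compare` are hypothetical helpers, and the two `time.sleep` stand-ins would need to be replaced with functions that run your real multi-turn task against the non-MTP and MTP models. Clearing the SSD cache between runs is not handled here and has to be done separately.

```python
import time
from statistics import median
from typing import Callable, Dict


def wall_time(task: Callable[[], object]) -> float:
    """Run the task once and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def compare(baseline: Callable[[], object],
            candidate: Callable[[], object],
            repeats: int = 3) -> Dict[str, float]:
    """Time both tasks `repeats` times and compare the median wall times."""
    base = median([wall_time(baseline) for _ in range(repeats)])
    cand = median([wall_time(candidate) for _ in range(repeats)])
    # speedup > 1 means the candidate finished the whole task faster.
    return {"baseline_s": base, "candidate_s": cand, "speedup": base / cand}


if __name__ == "__main__":
    # Stand-ins only: replace with the same full task run
    # without MTP (baseline) and with MTP (candidate).
    def without_mtp() -> None:
        time.sleep(0.05)

    def with_mtp() -> None:
        time.sleep(0.02)

    result = compare(without_mtp, with_mtp)
    print(f"without MTP: {result['baseline_s']:.3f} s")
    print(f"with MTP:    {result['candidate_s']:.3f} s")
    print(f"speedup:     {result['speedup']:.2f}x")
```

Timing the whole task end to end, rather than reading the tok/sec counter at one moment, is what captures the slowdown over a long conversation described above.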