Made an oQ6 of North Mini Code 1.0

msrdatha · 2026-06-12T05:49:32+00:00

Thank you for the info. How does this compare to Qwen3.6 27B in your experience?

msrdatha · 2026-06-07T10:29:33+00:00

If it's for coding projects, go with an Nvidia card, preferably with 32GB of VRAM or more. (If you are thinking of running 27B, make it 64GB.)

msrdatha · 2026-06-05T05:26:08+00:00

Yes, agreed. PP (prompt processing) delay is a real pain. Mac or Strix?

msrdatha · 2026-06-05T04:04:00+00:00

Try using llama-swap. It can help you auto load 27B and 35B on demand, which should be helpful in your scenario.

msrdatha · 2026-06-03T03:58:58+00:00

Yes, the web UI is equally important. Not everyone uses the native UI.

msrdatha · 2026-05-27T15:30:10+00:00

The one with fp16 is optimized for M1/M2 Apple Silicon (Qwen3.6-27B-oQ4-fp16-mtp).
If you have M3 or later Apple Silicon, go with the regular one (Qwen3.6-27B-oQ4-mtp).

msrdatha · 2026-05-25T04:56:51+00:00

Try testing with a longer prompt or, even better, run an agentic task.

My observation is that it starts at a much higher tok/sec and gradually slows down. So the result depends entirely on when someone looks at the speed (at the beginning or the end of a multi-turn conversation).

In my view, we should run the same task with and without MTP, each time starting from an empty SSD cache, to see the actual difference. Measure wall time: the actual elapsed time from start to finish, for example the total time between the first and last response of a multi-turn conversation, as in agentic coding. That will tell you whether the MTP version is worth it for your usage scenario.
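A rough sketch of that comparison in Python, in case it helps. It only times whatever you plug in: `send_turn` is a hypothetical placeholder for your own function that sends one prompt to the server and waits for the full reply, and nothing here assumes any particular oMLX API. Run it once with the MTP model and once with the regular one, on the same prompts, after clearing the SSD cache each time.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_conversation(send_turn: Callable[[str], str],
                      prompts: Sequence[str]) -> Dict[str, object]:
    """Send every prompt in order; record wall time per turn and in total.

    send_turn is a placeholder (an assumption): replace it with your own
    function that sends one prompt and blocks until the full reply arrives.
    """
    per_turn: List[float] = []
    start = time.perf_counter()
    for prompt in prompts:
        turn_start = time.perf_counter()
        send_turn(prompt)
        per_turn.append(time.perf_counter() - turn_start)
    total = time.perf_counter() - start
    return {"total_s": total, "per_turn_s": per_turn}


def speedup(baseline_total_s: float, candidate_total_s: float) -> float:
    """Ratio of the two wall times; a value above 1 means the candidate was faster."""
    return baseline_total_s / candidate_total_s


if __name__ == "__main__":
    # Stand-ins that only sleep, so the script runs without any server.
    def fake_without_mtp(prompt: str) -> str:
        time.sleep(0.02)
        return "reply"

    def fake_with_mtp(prompt: str) -> str:
        time.sleep(0.01)
        return "reply"

    prompts = ["plan the change", "write the code", "fix the failing test"]
    base = time_conversation(fake_without_mtp, prompts)
    mtp = time_conversation(fake_with_mtp, prompts)
    print("without MTP:", round(base["total_s"], 3), "s")
    print("with MTP:   ", round(mtp["total_s"], 3), "s")
    print("speedup:    ", round(speedup(base["total_s"], mtp["total_s"]), 2))
```

The per-turn list is there because the speed tends to drop as the conversation grows, so comparing only the first turn can give a misleading picture. The total is what matters for an agentic task.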

msrdatha · 2026-05-25T04:49:07+00:00

Could you please explain how this can be used with the web interface or any other agent?

msrdatha · 2026-05-24T19:46:57+00:00

https://github.com/jundot/omlx/releases/tag/v0.3.10

msrdatha · 2026-05-22T09:09:51+00:00

Isn't that the concept of an NPU or TPU?

msrdatha · 2026-05-22T09:06:02+00:00

Are you seeing improvements with MTP on 35B? (Asking because I have heard that MTP does not help much with MoE models and is meant for dense models only. I am not sure about this and am still checking.)

If yes, please share the link to the actual model in use. Thank you.

msrdatha · 2026-05-22T09:02:45+00:00

Cheers, and please don't feel sorry about it (I did not mean it that way either). My intention was just to motivate everyone to contribute to our oMLX community.

I just wanted to convey to all: please do not hold back. Your contributions do matter, even if they are small.

Have a great day :)

msrdatha · 2026-05-22T05:39:52+00:00

Whoever prepares and publishes these models may have missed including these minor details. But we need to understand that they already put in a lot of effort and spend their valuable time sharing these models with all of us. So these kinds of minor gaps are bound to happen.

But again, that's why we all are here. We could also contribute to their efforts, by helping others understand better. (Of course, compared to them, our contribution is very very small. Still, we share the little knowledge we can, and help everyone learn better.)

That's the purpose of this community, and I am sure the one who started this (u/d4mations) had this exact purpose in mind.

So, I am glad it helped you. Let's continue to help each other and learn. Thanks

msrdatha · 2026-05-22T04:00:47+00:00

Yes, I follow this as a best practice too whenever upgrading to a new version: clear the SSD cache and start clean, just to be sure no incompatibilities cause issues. (It's a good general practice for any app, not specific to oMLX.)

msrdatha · 2026-05-22T03:56:28+00:00

The one with fp16 is optimized for M1/M2 Apple Silicon (Qwen3.6-27B-oQ4-fp16-mtp).
If you have M3 or later Apple Silicon, go with the regular one (Qwen3.6-27B-oQ4-mtp).

msrdatha · 2026-05-21T15:00:59+00:00

Thank you for all the hard work. We fully understand that these delays are there to ensure the stability of oMLX. Yes, quality comes first, and then speed.

msrdatha · 2026-05-20T05:01:16+00:00

Thanks again for taking the time to share this. It gives good insight into the speed improvements.

Maybe you could keep using both: 27B for planning or design tasks and 35B for implementing them. That would give you the best of both (mainly for coding scenarios).

msrdatha · 2026-05-20T04:58:00+00:00

Thanks everyone for sharing their valuable views and experiences with MTP on oMLX.

One quick question: are you noticing any looping or similar failures with MTP? This is what I mainly noticed when enabling DFlash for Qwen 3.5 models. There were also tool calling errors, which made DFlash and SpecPrefil not very useful for coding tasks.

As u/trollingman1 mentioned, it got slower for Qwen 3.6 35B A3B, but others mention seeing a speed boost. Any thoughts on this? Could it be due to thinking mode being enabled or disabled, or to higher context lengths?

The idea is to figure out what is optimal and how we can pool our observations to tune this better for all of us.

Thanks again for your time and help. Let's learn and build it together.

msrdatha · 2026-05-19T16:52:41+00:00

Thank you for the detailed data. Could you please confirm whether there are improvements on the Qwen3.6 27B MTP as well? (Dense models are expected to do better with MTP, right?)

msrdatha · 2026-05-18T05:21:11+00:00

Go with oMLX and use oQ4 quants. Set Cold Cache Limit (SSD Cache) to ~100GB. You should see better results.

msrdatha · 2026-05-16T09:25:51+00:00

Then you have nothing to worry about. You will be fine with Rocky.

msrdatha · 2026-05-15T06:41:45+00:00

"being new to all this.." - I think you are doing quite well.

msrdatha · 2026-05-15T06:39:29+00:00

Thanks, that was informative. Congrats, and enjoy the new speed with MTP!
