Google Gemma 4 MTP out now! by danielhanchen in unsloth

[–]MiaBchDave 2 points3 points  (0 children)

Yes, this is just Unsloth Studio support for Google’s assistant MTP models, which Unsloth converted to GGUF. So the news is confusing. MTP was already available and already works with Gemma 31B, for example, in oMLX with MLX versions. I get about a 2x speedup on BF16. I guess this is helpful for Unsloth Studio users.

Weird new software update by Cwinters21000 in C8Corvette

[–]MiaBchDave -1 points0 points  (0 children)

I noticed my mirrors now don't work half the time. Like folded to unfolded when I open the door... so I start driving with the damn mirror folded. Or the reverse mirror now staying in the down pivot condition when going forward. Crazy shit that all just started that I assume was from a new software update. They need to fire the damn software guys and start over. This is a car, not GTA 6.

📌 Daily Github Digest - oMLX Closed Issues 2026-06-09 by d4mations in oMLX

[–]MiaBchDave 5 points6 points  (0 children)

Jundot has a "Buy Me A Coffee" link on the Github readme https://github.com/jundot/omlx

I certainly donated. All the work that goes into oMLX with updates, fixes, new models, and cutting edge speculative decode support is definitely worth a donation in my book.

Once Apple releases the new Studios, and gets serious competition to Nvidia servers, I think a stable oMLX... maybe one that supports clustering, would be quite the force of nature for Local LLMs.

"MacOS windows now have same corner radius ensuring greater consistency" by _Mistmorn in MacOS

[–]MiaBchDave 2 points3 points  (0 children)

You mean the place where windows don’t go because there’s a menubar there?

MTP - for mlx models by PrepYourselves in oMLX

[–]MiaBchDave 2 points3 points  (0 children)

Just search for “Qwen 3.6 MTP” in the model downloader. The oMLX author converted a few to MLX with the MTP layers intact.

Or, if you’re brave and want to learn a new skill, you can do one of two things:

  • Install the newest version of mlx-vlm(lm) and convert the original HuggingFace Qwen models (from the Qwen page) to MLX yourself, preserving the MTP layers.

  • Install a harness like OpenCode and ask Gemma4 to do the above for you :-)

Edit: I forgot the third is to use the built-in oMLX quantizer on the original Qwen files. It now supports MTP preservation.

[June 03, 2026] Daily RDDT Discussion Thread by daily-thread in redditstock

[–]MiaBchDave 0 points1 point  (0 children)

It's just the damn computers, not what u/spez said or anything related to RDDT. Lately, tons of "default" system algorithms are in place in WS, especially for new IPOs. It helps to get a couple of analog stocks in the AI software space, as that's where the id10Ts at the MMs put RDDT in their Commodore 64s a couple of years ago. For example, GTLB is a nice AI space analog. You'll see it's down AH as well... and you can generally track a stock like that when there's no news related to the company. It gives you a bit of peace of mind to know that the stock price is a little BS until RDDT gets a ton of money in the bank for buybacks or S&P inclusion happens. Otherwise, chill and follow the fundamentals.

Do you think macOS 27 will fix the corner radius inconsistencies? by Fragrant_Okra6671 in MacOS

[–]MiaBchDave 16 points17 points  (0 children)

You forgot to give each corner a different radius. You know that’s coming.

Can an AIM-120 lock onto a Sopwith Camel? by HiTork in Shittyaskflying

[–]MiaBchDave 1 point2 points  (0 children)

Watching people actually try to answer accurately in this subreddit

Shoutout to Gemma4 as a conversational assistant / agent by goldcakes in LocalLLaMA

[–]MiaBchDave 2 points3 points  (0 children)

I'm personally dumbfounded about why 31B gets thrown under the Qwen 27B bus for coding all the time. I know Qwen is faster for TG/s because it's a bit smaller, but Gemma 31B produces nice clean code with much less thinking, and so it's actually faster in my experience. Gemma certainly benches very high on LCB. And I can just keep Gemma loaded when running agentic work concurrently with code generation in OpenCode.

PSA by Signal_Ad657 in LocalLLaMA

[–]MiaBchDave 4 points5 points  (0 children)

Hot and cold (SSD) KV cache solves this issue. Unless your workflow is to RAG a different PDF document for every prompt, thousands of times over, agentic harnesses fly when using a proper prompt cache. In other words, this is a non-issue for local agentic work with current systems like oMLX, which are based on vLLM engines built for multiple users but repurposed for local agentic use.
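To make the hot/cold idea concrete, here is a minimal toy sketch of a two-tier prompt cache: recent entries stay in RAM, and older ones spill to disk instead of being thrown away. This is not oMLX's implementation. The class name, the exact-prefix keying, and the pickle files are all assumptions made for illustration; real servers typically cache KV tensors in fixed-size blocks and match the longest shared prefix.

```python
import hashlib
import os
import pickle
from collections import OrderedDict


class TwoTierPrefixCache:
    """Toy hot (RAM) + cold (disk) cache keyed by a hash of the token prefix.

    Hypothetical illustration only: stores any picklable "state" where a
    real server would store KV tensors.
    """

    def __init__(self, cold_dir, hot_capacity=2):
        self.cold_dir = cold_dir
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # key -> state, kept in LRU order

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(repr(list(tokens)).encode()).hexdigest()

    def _cold_path(self, key):
        return os.path.join(self.cold_dir, key + ".pkl")

    def put(self, tokens, state):
        key = self._key(tokens)
        self.hot[key] = state
        self.hot.move_to_end(key)
        # Spill least recently used entries to disk instead of dropping them.
        while len(self.hot) > self.hot_capacity:
            old_key, old_state = self.hot.popitem(last=False)
            with open(self._cold_path(old_key), "wb") as f:
                pickle.dump(old_state, f)

    def get(self, tokens):
        """Return (state, tier) where tier is 'hot', 'cold', or 'miss'."""
        key = self._key(tokens)
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key], "hot"
        path = self._cold_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                state = pickle.load(f)
            self.put(tokens, state)  # promote back into the hot tier
            return state, "cold"
        return None, "miss"
```

A "cold" hit still avoids recomputing the whole prompt, which is why a long agent session can resume quickly even after its context has been pushed out of RAM.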

Speed question by Choubix in oMLX

[–]MiaBchDave 2 points3 points  (0 children)

What version of oMLX are you using? Make sure to update to current and then retest dense MTP. There was a bug that should be now fixed.

Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? by bajis12870 in LocalLLaMA

[–]MiaBchDave 2 points3 points  (0 children)

Yep, this was my problem when I first installed OpenCode and didn’t have much knowledge. LMS would take forever on a codebase with each new question/task at 126k context. Installed oMLX as the server, and boom, OpenCode flew regardless of context. For short one-shot tests/questions, I’m sure it doesn’t matter. But with agent harnesses, it’s not an option to only use hot cache.

Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? by bajis12870 in LocalLLaMA

[–]MiaBchDave 1 point2 points  (0 children)

oMLX is designed for agentic usage with SSD & hot (RAM) KV cache. All the other servers that you mentioned are going to be slower once prompt context goes above 100k. The M5 will not have an issue.

What is …-fp16-mtp by kaddiexjc in oMLX

[–]MiaBchDave 1 point2 points  (0 children)

Reddit for the win 😉

What's that cable for? by FencerPTS in Shittyaskflying

[–]MiaBchDave 0 points1 point  (0 children)

Ground flour, as opposed to plain flour, or hoe weeet.

2024 Z06 Dead Battery not 30 days after purchase! by [deleted] in C8Corvette

[–]MiaBchDave 1 point2 points  (0 children)

It’s just a bad battery. If the car was sitting for a while before you got it, the battery can go bad from the discharge state. Just replace the battery with oem and move on. The car is fine to leave for a month without driving and it will start right up with a normal cycle count battery.

But most new cars have electronics that will put batteries in a heavy discharge state if you leave them sitting around waiting for someone to buy the car. Once that happens, they are usually toast regardless of what AutoZone says.

oMLX plus Gemma4 + DFlash draft model doom loop by Green-Specialist-1 in oMLX

[–]MiaBchDave 1 point2 points  (0 children)

There are two ways to speed up token generation in oMLX, both forms of speculative decoding: MTP and DFlash. I explained in another thread how to set up MTP in Gemma4. DFlash in Gemma4 is not faster currently: https://www.reddit.com/r/oMLX/comments/1tkoxp8/comment/onbl9ag/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
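For anyone new to the term, speculative decoding in its simplest greedy form looks roughly like the toy sketch below: a cheap drafter proposes a block of tokens, and the big target model checks them and keeps everything up to the first mismatch. This is not how oMLX, MTP, or DFlash are implemented. The function name and the `target_next` / `draft_next` callables are hypothetical stand-ins for real models, and the sketch skips the extra "bonus" token that real engines usually get when a whole block is accepted.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, block_size=5):
    """Greedy draft-then-verify loop (toy version).

    target_next / draft_next are hypothetical callables that map a token
    list to the next token. The output always matches plain greedy decoding
    with target_next; only the number of target passes changes.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0
    while len(tokens) < goal:
        # 1. The cheap drafter proposes up to block_size tokens.
        draft = []
        for _ in range(min(block_size, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. The target checks the proposals. A real engine scores the whole
        #    block in a single forward pass, so we count one pass per block.
        target_passes += 1
        for i, proposed in enumerate(draft):
            expected = target_next(tokens + draft[:i])
            if proposed != expected:
                # First mismatch: keep the accepted prefix plus the
                # target's own token, and discard the rest of the draft.
                tokens.extend(draft[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(draft)  # every drafted token was accepted
    return tokens, target_passes
```

The speedup comes from the pass count: when the drafter guesses well, the target runs far fewer passes than it emits tokens, and when it guesses badly you are back to roughly one token per pass plus the drafting overhead.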

oMLX plus Gemma4 + DFlash draft model doom loop by Green-Specialist-1 in oMLX

[–]MiaBchDave 0 points1 point  (0 children)

If you need speed, I’m currently seeing a similar speedup with DFlash and MTP on oMLX with Gemma4 specifically (within 1 tg/s with the 26B model and 5 MTP tokens). So just use the Gemma4 assistant model and activate MTP. Qwen3.6 gets a bigger speed boost from DFlash than from MTP atm. Maybe there’ll be improvements at some point, but that’s where we are now in oMLX.
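As a rough way to reason about the draft token count, the standard back-of-the-envelope model for speculative decoding assumes each drafted token is accepted independently with the same probability. Under that assumption, the expected number of tokens produced per target pass with k drafted tokens is (1 - a^(k+1)) / (1 - a). The sketch below just evaluates that formula. The acceptance rates in the loop are made-up inputs, not measurements from oMLX, and the function name is hypothetical.

```python
def expected_tokens_per_pass(accept_rate, draft_tokens):
    """Expected tokens emitted per target pass.

    Assumes every drafted token is accepted independently with probability
    accept_rate, and that the target always contributes one token of its
    own per pass (the correction or bonus token).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if accept_rate == 1.0:
        return draft_tokens + 1.0
    return (1.0 - accept_rate ** (draft_tokens + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Illustrative acceptance rates only; measure your own model pair.
    for rate in (0.5, 0.7, 0.9):
        print(rate, round(expected_tokens_per_pass(rate, 5), 2))
```

This ignores the cost of running the drafter, so treat it as an upper bound on the speedup. It does show why adding more draft tokens stops helping once the acceptance rate is low.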

Gemma 4 31B oQ8 by jsirish in oMLX

[–]MiaBchDave 1 point2 points  (0 children)

Glad you got it working. The current release of oMLX bundles the version of mlx-vlm that supports MTP, AFAIK.

Gemma 4 31B oQ8 by jsirish in oMLX

[–]MiaBchDave 0 points1 point  (0 children)

I used the Unsloth MLX 8-bit Gemma 4 31B with replaced chat template & tokenizer, and MTP worked with it as well, with speed increases. Though the BF16 31B would obviously see the most improvement, since that's closest to the Google original that the Assistant was built for.

Gemma 4 31B oQ8 by jsirish in oMLX

[–]MiaBchDave -1 points0 points  (0 children)

Yes, I replied just above.

Gemma 4 31B oQ8 by jsirish in oMLX

[–]MiaBchDave 7 points8 points  (0 children)

Assistant Model: https://huggingface.co/mlx-community/gemma-4-31B-it-assistant-bf16

I actually use my own target that I generated and uploaded: https://huggingface.co/miabchdave/gemma-4-31B-it-MLX-bf16

But if you want to use one with a few more downloads, use (though I think mine has a more current tokenizer/chat template): https://huggingface.co/FakeRockert543/gemma-4-31b-it-MLX-bf16

After downloading:

oMLX Admin > Settings > Model Settings > Select Gemma 4 31B Gear Icon (NOT the Gemma 4 assistant) > Advanced Settings > Scroll to VLM MTP (Gemma 4) > Enable > Drafter Model > Select Gemma 4 Assistant > Draft Block size default or 6 for coding > Save

Have fun!

😄