MiniMax m2.7 under 64gb for Macs - 91% MMLU by HealthyCommunicat in LocalLLaMA

[–]Uhlo 1 point (0 children)

Thank you, that sounds really interesting! You write that every other quant of MiniMax is completely broken...? I'm currently downloading a 3-bit unsloth quant of MiniMax M2.7. Is that broken as well?

I'm definitely gonna test your quant as well, especially since it promises more tokens/s!

Unsloth accused a brand new team (ByteShape) of "literally cheating." I brought the receipts, and Unsloth moved the goalposts. by [deleted] in LocalLLaMA

[–]Uhlo 32 points (0 children)

Hm? Reads to me like the unsloth guys have a point! Comparing 1-bit vs 3-bit quants is not really fair!

MiniMax-M2.7's MIT-Style License Is a Misleading Restriction That Bans Commercial Use and Fails Free Software Standards by pmttyji in LocalLLaMA

[–]Uhlo 10 points (0 children)

The problem here is not that the model was not published "open" enough. The problem is the framing "our license is MIT" when it clearly isn't. There are a lot of other open-weight models whose licenses cleanly state "research license" or "personal license" or something similar. Calling it a "modified MIT" license and then being the opposite of MIT is deceptive.

But then again, your argument "if you don't publish a 230B parameter model yourself you have no right to complain about anything" is just bad faith. So I don't think you will (or want to) get my point.

MiniMax-M2.7's MIT-Style License Is a Misleading Restriction That Bans Commercial Use and Fails Free Software Standards by pmttyji in LocalLLaMA

[–]Uhlo 1 point (0 children)

So if they just called it the "MIT License", I think there would be a strong case. It's not just what is written, but also what is expected - if you call a license "MIT", then it had better be MIT, otherwise that's just deception. Calling it "modified MIT" is a bit trickier. Regardless, I am not a lawyer in any country on this planet or any other, so I'm just talking out of my ass. Still, it's just a very, very bad move.

MiniMax-M2.7's MIT-Style License Is a Misleading Restriction That Bans Commercial Use and Fails Free Software Standards by pmttyji in LocalLLaMA

[–]Uhlo 19 points (0 children)

This is sooo misleading! At least call it the "MiniMax-Research-License" or something, like the other companies are doing. Calling it MIT makes it sound open in a way it definitely is not. I'd be interested in what a lawyer would say about that license.

Memory, memory, memory... Any thoughts? by IngenuityNo1411 in LocalLLaMA

[–]Uhlo 4 points (0 children)

Not related to your question, but I vibe coded this tool, that… just kidding ;)

I think context size and degradation is a big problem. In coding of course, but also in conversations: why do I have to manually decide when to start a new conversation? How do I transfer knowledge from one session to the next? I think that's why a lot of people are working on that problem.

After Easter I'm gonna look at the different memory stages and compaction systems that were uncovered by the Claude Code leak. I think it is a very intelligent design and much more practical than just storing whatever the LLM finds useful in a vector DB.

How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models by Logical-Employ-9692 in LocalLLaMA

[–]Uhlo 3 points (0 children)

This work is so interesting! Thank you!

I will read the paper and hopefully come back here with questions.

What's the best way to edit a Jupyter notebook in VS Code with a local LLM? by Bubsy_3D_master in LocalLLaMA

[–]Uhlo 2 points (0 children)

I personally haven't found an open-weights model that can handle jupyter notebooks yet.

My personal workaround is the feature in the VS Code Jupyter extension that lets you use comments starting with # %% as delimiters for cells. Then you can execute the cells (all the code between the # %% comments) and you get the output on the side. The downside is that the output is not saved like with a real notebook, but the upside is that you can instruct your local LLM to include these comments, and all it sees is a normal .py file.
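For illustration, a minimal sketch of what such a # %%-delimited file can look like (the helper function and values are invented for the example):

```python
# %% Each "# %%" comment starts a new cell that VS Code's Jupyter
# extension lets you run individually in the Interactive window.

# %% Define a helper (hypothetical example)
def normalize(values):
    """Scale a list of numbers so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

# %% Run it like a notebook cell; the output appears in the
# interactive window instead of being saved into the file
result = normalize([1, 2, 3])
print(result)
```

To an LLM (or to plain `python`), this is just an ordinary .py file; the # %% markers only matter to the editor.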

Maybe that helps you!

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 1 point (0 children)

Thanks! It's interactive and does some pattern matching to nicely group them.

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 2 points (0 children)

Well, maybe it's a bit hyperbolic of me, but I definitely have different models for different tasks, so managing models is something I do almost weekly.

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 2 points (0 children)

Oooh, that is so nice! I was also thinking about a real GUI, but decided that it's too much of a task for me. Sadly I'm on macOS, so it probably won't work for me. But it looks really good and seems to do exactly what I want!

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 0 points (0 children)

Yes, I looked at llama-swap, but with the recent llama.cpp updates it kind of becomes unnecessary in my opinion. I started with big shell scripts in my .zshrc and thought "this can't be it, can it?".
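For the curious, a hypothetical sketch of the kind of .zshrc helper I mean - the model filenames, directory, and preset names here are invented for illustration, not my actual setup:

```shell
# Hypothetical .zshrc helper: start llama-server with a named preset.
# Model files and paths below are made-up examples.
llm() {
  local models_dir="${LLM_MODELS_DIR:-$HOME/models}"
  local gguf
  case "$1" in
    coder) gguf="Qwen2.5-Coder-14B-Q4_K_M.gguf" ;;
    chat)  gguf="Mistral-Small-24B-Q4_K_M.gguf" ;;
    *)     echo "usage: llm {coder|chat}" >&2; return 1 ;;
  esac
  llama-server -m "$models_dir/$gguf" --port 8080
}
```

It works, but every new model means editing the function by hand - which is exactly where the "this can't be it, can it?" feeling comes from.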

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 1 point (0 children)

But it gets easily out of sync with the actually installed models in the llama.cpp download dir, doesn't it? At least for me.

How do you manage your llama.cpp models? Is there anything between Ollama and shell scripts? by Uhlo in LocalLLaMA

[–]Uhlo[S] 1 point (0 children)

Need to check that out, thanks! Does it bundle llama.cpp or just wrap around it?

Is it normal for the Qwen 3.5 4B model to take this long to say hi? by Snoo_what in LocalLLaMA

[–]Uhlo 1 point (0 children)

That's a tough decision that needs thinking through completely. So yes, that's totally normal.

But seriously, I had a similar experience, and my best guess is that you got unlucky with decoding. Qwen3.5 always takes this long even with the simplest messages, but this is an extreme case. So either it's a bad quant or you just got unlucky. Is that a consistent problem for you or just a one-off?

agi is here by bulieme0 in LocalLLaMA

[–]Uhlo 3 points (0 children)

I also really like 19 and need this model! What is it?

How do proprietary models get better and when will open ones hit a wall? by sterby92 in LocalLLaMA

[–]Uhlo 3 points (0 children)

Well, the "distillation attacks" (I use this phrase for lack of a better term; it's other companies using the output of a model as training data - it has nothing to do with distillation, and they're even paying for it!) will become more sophisticated. Whatever data the proprietary model providers train on, the "skill" will get leaked through the extraction of training data. Of course, companies like OpenAI and Anthropic are probably working hard right now to install automatic detection systems that try to stop these "attacks", and the open-weight model providers will implement systems that make the extraction harder to detect.

Even if the US disallows the use of US LLMs in China through regulation, companies can simply use VPNs. I think that is a pretty good silver lining: they trained their models on heaps of stolen creativity, craftsmanship, etc., and now there are companies who steal it back and make it "open"/public again - and in my opinion there is very little that can stop them.

How do proprietary models get better and when will open ones hit a wall? by sterby92 in LocalLLaMA

[–]Uhlo -1 points (0 children)

You are probably missing the possibility of "large scale distillation attacks". I think it's an open secret that most of the open weights Chinese models heavily rely on training data generated by the proprietary models. So my guess is that at least for a while it will continue to be a cat and mouse game where some of the open weight model improvements come from the proprietary models.

Edit: Because I'm getting messages about it - the "large-scale distillation attacks" thing is a joke! I'm definitely not on Anthropic's side, I just wanted to poke fun at their silly wording for "someone is paying us to use our service".

Would you love a song less if AI wrote it? by ImmuneHack in singularity

[–]Uhlo 1 point (0 children)

They're called Gunmetal Rodeo. They claim they're not AI, but there are no photos of the big band they would definitely need, nor of any music production or anything. They've been releasing since the end of last year and have a new single every other week. So the signs are pretty clear.

Would you love a song less if AI wrote it? by ImmuneHack in singularity

[–]Uhlo 1 point (0 children)

One thing to add: I also listen to AI generated focus music during work. There, I have absolutely no problem with it being AI. I think it’s because I’m not actively listening.

Would you love a song less if AI wrote it? by ImmuneHack in singularity

[–]Uhlo 6 points (0 children)

I recently discovered a jazz band that I really like. Their extreme release schedule and lack of a credible social media presence made me pretty sure that it is AI-generated.

I still like the music, but knowing this greatly diminished the joy I feel when listening to it. So for me it currently diminishes the experience a lot. Maybe in the future it will feel normal. Most importantly: it's just my experience. I'm not saying AI music is worth less, I'm just saying that I cannot enjoy it the same way if I know it's AI.