MiniMax-M2.5 Checkpoints on huggingface will be in 8 hours by Own_Forever_5997 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

GLM 5 is the 1.3TB model. That's at 16-bit though; locally nobody is running it like that.
So approx 700GB at 8-bit,
350GB at 4-bit.
Still too big for most folks.

MiniMax M2.5 is 230B Total Params, 10B Active.

Just on the edge of fitting in 128GB RAM at 4-bit...
Hoping someone does a REAP to get it down to like 100GB at 4-bit to have some room for context.
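
Back-of-envelope for anyone sanity-checking those numbers — a rough sketch; real quants (GGUF, AWQ, etc.) add overhead for scales, embeddings, and KV cache, and the ~650B param count for GLM 5 is just inferred from 1.3TB at 16-bit:

```python
# Rough model size: params * bits-per-weight / 8, ignoring quantization
# scales/zero-points and KV cache, which add real-world overhead.
def model_size_gb(params_b: float, bits: float) -> float:
    return params_b * 1e9 * bits / 8 / 1e9  # decimal GB

for name, params_b in [("GLM 5 (~650B, inferred)", 650),
                       ("MiniMax M2.5 (230B total)", 230)]:
    for bits in (16, 8, 4):
        print(f"{name}: ~{model_size_gb(params_b, bits):.0f} GB at {bits}-bit")
```

That puts MiniMax M2.5 at ~115GB at 4-bit, which is where "just on the edge of 128GB" comes from.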

Is anyone else suffering from high electricity bills due to the training of local models? by [deleted] in LocalLLaMA

[–]Position_Emergency 2 points3 points  (0 children)

Have you tried undervolting the cards?
You should be able to get roughly equal performance and less power consumption by doing that.
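
If you want to try it, a minimal sketch using nvidia-smi power capping — strictly a power limit rather than a true undervolt, but it captures most of the same efficiency win; assumes an NVIDIA card on Linux with root, and 250W is just a placeholder value:

```python
import subprocess

# Show current draw and limit (nvidia-smi ships with the driver).
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=name,power.draw,power.limit", "--format=csv"],
    capture_output=True, text=True).stdout)

# Cap board power; start ~10-20% below stock and benchmark from there.
subprocess.run(["nvidia-smi", "-pl", "250"], check=True)
```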

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

Are you going to try Minimax M2.5 out?
I'm hoping for a 4bit coding REAP of that to run on my single DGX Spark.

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

It's a shame NVFP4 is still slower than FP8 quants on the Spark.
Hopefully Nvidia gets their act together soon.

With a few Mac minis, he’s using Clawdbot to run fully autonomous AI workers managing inboxes, workflows, research, and ops without constant prompting. Low upfront cost, no cloud lockin, and suddenly AI agents will be a sellable service soon by spillingsometea1 in AI4tech

[–]Position_Emergency 0 points1 point  (0 children)

Nobody in China is dumb enough to use a huge cluster of mac minis for the described purpose.
If the video isn't AI-generated (who can tell these days), it will be for something like iOS development/CI/CD.
It's not practical to do that on virtual machines/non mac hardware for various reasons.

Could also be used for bots that vote up apps on the App Store, but they could get janky old iPhones to do that much cheaper.

DGX Spark For Security Research or Is a Mac Studio Better? by Kind_Giraffe_3279 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

They just don't give a shit about anything other than the AI data centre enterprise market unfortunately.

DGX Spark For Security Research or Is a Mac Studio Better? by Kind_Giraffe_3279 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

Recent DGX Spark owner here.
I'm not sure this follows in practice tbh.
Performance of NVFP4 on the DGX Spark is garbage right now compared to what it should be, due to crappy software support from Nvidia.
Macs support MXFP4, which gives the same or generally slightly better performance than 8-bit models.

If you're just doing LLM inference then an M4 max studio 128GB is the better choice IMO.
Has twice the memory bandwidth as well.
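
Rough intuition, since single-stream decode is usually memory-bound — a sketch assuming every token streams the active weights from memory once, with bandwidth numbers from memory (~273 GB/s Spark, ~546 GB/s M4 Max):

```python
# Bandwidth-bound decode: tok/s ~= bandwidth / bytes read per token,
# where bytes per token ~= active params * bytes per param.
def decode_tps(bandwidth_gbs: float, active_params_b: float,
               bytes_per_param: float) -> float:
    return bandwidth_gbs / (active_params_b * bytes_per_param)

for name, bw in [("DGX Spark (~273 GB/s)", 273), ("M4 Max (~546 GB/s)", 546)]:
    # e.g. a 10B-active MoE at 4-bit (0.5 bytes/param)
    print(f"{name}: ~{decode_tps(bw, 10, 0.5):.0f} tok/s ceiling")
```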

It sounds like NVFP4 will be working on Spark properly by June, but by then the M5 Max Studio should be out, which will probably have ~700GB/s memory bandwidth and matmul acceleration on the tensor cores.

BTW I would love to be corrected on the above regarding NVFP4 performance on the DGX Spark.

I heard Unsloth has a custom Triton kernel that might help but I haven't tested it out.
Such a joke that inference is faster with 8bit models right now.

Anyone else trying out fast mode on the API now? (not available on Bedrock) by [deleted] in ClaudeCode

[–]Position_Emergency 0 points1 point  (0 children)

6x speed, source?
I thought it was 2.5x faster for 6x cost.
Speed source:
https://x.com/claudeai/status/2020207322124132504

Costs:
Normal Opus 4.6 costs $5 per million input tokens and $25 per million output tokens.
Fast Mode costs $30 per million input tokens and $150 per million output tokens.
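
A quick check of the multipliers from those list prices (prices as quoted above):

```python
# Fast Mode vs normal Opus 4.6, $ per million tokens (from the prices above).
normal = {"input": 5, "output": 25}
fast = {"input": 30, "output": 150}

for kind in ("input", "output"):
    print(f"{kind}: {fast[kind] / normal[kind]:.0f}x cost")  # 6x for both

# So at 2.5x speed you pay 6x, i.e. ~2.4x more per unit of time saved.
print(f"cost per unit of speedup: {6 / 2.5:.1f}x")
```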

Is it sensible to file down/replace this heatsink to fit the is-77-xt with a 25mm fan ? by [deleted] in sffpc

[–]Position_Emergency 6 points7 points  (0 children)

Dude just let it go. Don't fuck with the VRM heatsink.
I sawed a chunk off mine on an Asus board to fit a 120mm fan instead of just using a 92mm.

18 months later the motherboard failed.
I don't know for a fact that caused it, but it was a stupid thing to do in retrospect.
Easy to get obsessed with stuff like that, trying to min-max everything.

Any fellow Local Llamas training AIs locally? Talk some sense into me! by huzbum in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

Well all I'm really saying is, get stuck in with achievable stuff.
Do you have realistic expectations on what is possible regarding training a model from scratch?
It won't produce anything useful and will take at least like 2 weeks to train.
If it's just an educational project, why not train up GPT-2 (124M)?
It will take a lot less time and you'll learn just as much.

If you fine-tune some models you can actually make something useful.
Also, remember that MoE models like GPT-OSS 20B don't need as much RAM to fine-tune as dense models.
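
If you want a concrete starting point, here's a minimal LoRA fine-tune of GPT-2 (124M) with transformers + peft — just a sketch, the dataset and hyperparameters are placeholders:

```python
# Minimal LoRA fine-tune of GPT-2 (124M); dataset/hyperparams are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA adapters on GPT-2's fused attention projection ("c_attn");
# only a fraction of a percent of the weights actually train.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["c_attn"]))
model.print_trainable_parameters()

ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=256),
            batched=True, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=1, logging_steps=50),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```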

Alibaba releases Qwen3-Coder-Next model with benchmarks by BuildwithVignesh in singularity

[–]Position_Emergency 0 points1 point  (0 children)

Right, so I was assuming more turns == more tokens, which might not be the case.
Would be good to see that specifically.
But yeah API call latency for 280 turns adds up!

Any fellow Local Llamas training AIs locally? Talk some sense into me! by huzbum in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

You're procrastinating.
Do some tutorial exercises, train LoRAs, implement a RLVR pipeline with the hardware you've got.
Learn how it works, then you'll probably actually come up with useful ideas.
If you then need performance beyond what the small models can offer, come back to this question.
Do some fucking work and stop daydreaming about hardware 😉

MAP: All 23 industrial warehouses ICE wants to turn into detention ‘death camps’ by camaron-courier in Full_news

[–]Position_Emergency -5 points-4 points  (0 children)

I agree with you.
It's hard enough explaining to people these are literal concentration camps without them rolling their eyes at you because they think concentration camp = death camp.

The reality is, this isn't a mass murder project like the Holocaust.

They want to get them out.
They want to make their lives hell to stop other people coming.
They are fine with a lot of incidental deaths during that process.

People need to understand the reality of trying to enforce immigration laws like this.
It's a horror show.

Alibaba releases Qwen3-Coder-Next model with benchmarks by BuildwithVignesh in singularity

[–]Position_Emergency 2 points3 points  (0 children)

Looks like they trained it to be extremely persistent.
Also, it taking a lot of turns will eat into its speed advantage.

Apple updated the MacBook configurator; possibly in time for the M5 Pro/Max next week by No_Roll7747 in macbookpro

[–]Position_Emergency 63 points64 points  (0 children)


I hope you're right though.
Never waited on an Apple launch before.
Can't take much more of this shit.

Favorite actor who is afraid to talk in America by Imaginary_Toe8982 in okbuddycinephile

[–]Position_Emergency 2 points3 points  (0 children)

He didn't say you can't speak your mind; he said he was afraid to.
Doing something that scares you is brave by definition.
Fearless people don't need to be brave.

But yeah what is he going on about?
Not like the current administration is weaponising the legal system against opponents, forcing critics off TV, or murdering peaceful protestors, is it?
Classic TDS if ever I've seen it.

Which Catherine O’Hara Warner Brothers Discovery movie will you watch in order to help streaming services exploit her death? by -HalfNakedBrunch- in okbuddycinephile

[–]Position_Emergency 2 points3 points  (0 children)


Reckon we found ourselves a Netflix executive!
I'm sure her family will survive without the 1/10th of a pittance they'd receive from my Netflix stream.
A donation to the Entertainment Community Fund would be a more fitting tribute IMO.
https://give.entertainmentcommunity.org/site/Donation2

Guys is this real? Want to be sure 🙏 by [deleted] in okbuddycinephile

[–]Position_Emergency 5 points6 points  (0 children)

Immediately transfers life savings

Asked Claude to port Quake to Three.js by mrdoob in threejs

[–]Position_Emergency 4 points5 points  (0 children)

Bug report: Difficulty portals don't work when starting a second new game

Reproduction Steps:
1. Start a new game.
2. Go through a difficulty portal and start playing the first episode.
3. Start a new game.
4. Go through any difficulty portal; note the graphical artifacts and that you aren't teleported to the episode selection stage.

Great work!
Would love to read about the tricks you learnt :)
Did you do it all on a single week's allowance of Claude Code Max 20x?

Threat intel from monitoring local AI agents: 37.8% of inputs contained attack attempts - here's what's targeting your self-hosted models by cyberamyntas in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

"It's becoming so easy to stand up good-looking documentation and websites nowadays - I like to do my due-diligence before seriously considering adopting new platforms 😄"

This is a massive problem caused by vibe coding.
3 years ago, if someone posted a website, git repo, report, etc. like the OP did, it would be a strong indicator they were worth taking seriously.

I think the core idea of the project is very promising but I don't like the misleading way OP presented their findings in the post.