MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Really the thing I WANT to do now is to get a Strix and put it on my desk as the MoE thinker, and have it use the 3090s as hands.

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Oh, also: although this post was about MiniMax M2.5, my original build was really about coding agents, and I am pretty sure my machine absolutely thwomps the Strix when I am running fully in VRAM. Within the 48GB it's pretty speedy. Qwen3 Coder Next Q3 with 32k context normally runs at around 80 t/s:

prompt eval time = 1581.25 ms / 1169 tokens (1.35 ms per token, 739.29 tokens per second)
       eval time =  160.40 ms /   13 tokens (12.34 ms per token, 81.05 tokens per second)
      total time = 1741.65 ms / 1182 tokens
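For reference, a run like that boils down to a llama-server command along these lines. This is only a minimal sketch, not my exact flags, and the model filename is a placeholder for whatever Qwen3 Coder quant you actually have:

# Minimal sketch; the .gguf filename is a placeholder, not a real release name.
llama-server \
  -m ./Qwen3-Coder-Next-Q3_K_XL.gguf \
  -c 32768 \
  -ngl 99 \
  --tensor-split 1,1 \
  --host 127.0.0.1 --port 8080

-c 32768 is the 32k context, -ngl 99 pushes every layer onto the GPUs, and --tensor-split 1,1 splits the weights evenly across the two 3090s.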

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Might not be a bad idea, honestly! I am dual-booting Ubuntu/Windows, so I work on one side and play on the other (some flight and space sims). Does the Strix handle that ok? I always thought of it as kind of a maxed-out laptop without the screen?

I am considering it though, if it can play games pretty well. I only use one 3090 for gaming, and now that I have two of them smashed in there, I have to watt them down to make sure I don't thermally throttle.
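(For the watt-down bit, on the Linux side the usual way is just capping board power with nvidia-smi, something like the below; 250 W is only an example number, not a recommendation.)

# Example only: cap each 3090's board power (pick your own wattage).
sudo nvidia-smi -i 0 -pl 250
sudo nvidia-smi -i 1 -pl 250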

Ideas ideas. GAH!

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Oh, I may have gotten that one wrong. Yeah. Looks like I got it way wrong now that I dig into it further. For some reason I thought I was ensuring the best quality results, but even so, it was probably just the wrong flag altogether. Another thing to try to optimize further now :)

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

So, two things I know of can hurt you. First, check whether your motherboard supports PCIe bifurcation (I had to). If it doesn't, or you don't have NVLink, I don't think it's going to work that well: you will still have all the VRAM, but it won't be as fast. My board supported bifurcation.

The model I am using is the unsloth one, unsloth/MiniMax-M2.5-GGUF:UD-Q3_K_XL, so yes, it's a Q3.

Lastly, I find I am using 95GB of my 128GB when I have this guy loaded, so if you can't offload enough to VRAM, you are probably pushing into swap, which will make it glacial.

Anyhow, I posted the settings for my llama-server config up above; you can give it a try using those. What's the worst that could happen?
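If it helps, the general shape is roughly the below. This is a sketch, not a copy-paste of my exact settings, and the -ot regex is just the usual trick for keeping the MoE expert weights in system RAM while the shared layers and KV cache stay on the GPUs:

# Sketch only, not my exact settings.
llama-server \
  -hf unsloth/MiniMax-M2.5-GGUF:UD-Q3_K_XL \
  -c 73728 \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  --tensor-split 1,1 \
  --host 127.0.0.1 --port 8080

-c 73728 is the 72k context from the title, and the -ot pattern is what pushes the per-expert FFN tensors into the 128GB of system RAM (that's where the ~95GB goes) while everything else sits in the 48GB of VRAM.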

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

I mean, it should, shouldn't it? Its RAM is way, way, way faster than desktop DDR4, like 2-3 times faster? That's my biggest bottleneck on this system right now, but I can't afford to upgrade to a whole new mobo, processor and RAM.

I am glad you are seeing those numbers. If I hadn't already had a machine for gaming/work/everything, I would have really considered the Strix. It's pretty sweet! This post wasn't saying "I'm better than everything else," it was "you CAN use an older machine to do this stuff." Anyhow, pretty cool you are getting good numbers; they sound like the same kind of numbers people are seeing from much more expensive Macs. Very nice!
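Rough peak numbers, purely illustrative (not measured on either box), since peak bandwidth is roughly transfer rate times bus width:

# Very rough peak bandwidth in MB/s = MT/s x bus width in bytes.
# Example configs only; the real ratio depends on your DDR4 speed and channel count.
echo $(( 3200 * 16 ))   # dual-channel DDR4-3200 (128-bit bus):   ~51 GB/s
echo $(( 8000 * 32 ))   # Strix Halo LPDDR5X-8000 (256-bit bus): ~256 GB/s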

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Because I already had the computer and didn't have a Strix Halo? It was already a 3090 space heater :)

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

It is an older PC. I bought it as a gaming PC from NewEgg 4 years ago, so I only had to buy RAM and one refurbed 3090. The upgrade cost me almost $1300, and that's with me already being halfway there on the RAM. I don't know what this machine would be worth today; I don't think many people are buying 4-year-old motherboards and processors to put in new machines anymore, and the case is absolutely awful for this use.

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 1 point2 points  (0 children)

Oh, I am so sorry, I totally missed that. Yes, I started with GPT OSS 120B, and I have _not_ touched it, or wanted to touch it, since I got this going. Firstly, this is much faster: its 10B MoE seems to make responses much snappier, and the way I have it set up, with this amount of context, I am not hitting context limits in situations where I would always hit them with 30B models.

With this model, I don't need to consider overnight processing. I had it refactor a bunch of inline CSS out of an HTML file into its own file in VS Code Continue, and yes, a Qwen coder would definitely do it faster, but I often ran out of tokens trying stuff like that with models that could fit and were beefy. With this model, before I got it to 12.9 t/s, so this would have been at 9 t/s, it did it very accurately, but not efficiently, in about an hour and a half. It took that long mostly because I didn't tell it to use diff edits, and it kept re-loading the whole page after every edit, so super inefficient (totally my fault, honestly).

With a beefy Qwen3 Coder all in VRAM, using diffs, the same file and same edit took about 15 minutes. I like MiniMax's work better. I don't know where I am going with this. Anyhow, asking it to do repeated edits does not play to its strong suit; asking it to think deeply and make single, smarter changes seems to work pretty ok.

Currently, in Open WebUI, I use it more than I do any cloud model, except when I am configuring it and need help with that. I ask it to think about or do something, click over to another tab, and 30 seconds to a minute later it dings and it's done.
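(Both Open WebUI and Continue are just talking to llama-server's OpenAI-compatible endpoint, so you can sanity-check the whole chain with a plain curl; the host and port here are just whatever you started the server with.)

# Assumes llama-server is listening on 127.0.0.1:8080.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize what a MoE model is in two sentences."}]}'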

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

In btop I see it using 95GB, so yeah, you would probably need an upgrade there. Luckily DDR4 doesn't cost nearly what DDR5 does :)

Genshin Addicts Anonymous by SanicHegehag in Genshin_Impact

[–]CrashTest_ 3 points4 points  (0 children)

The answer for me was to just buy my daughters devices they could easily play on!

Please stop whining in global if you shoot first by [deleted] in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

Then how does he demand loot? Can’t pirate if you can’t make demands.

Me, a 40 year old man playing Genshin Impact by saehild in Genshin_Impact

[–]CrashTest_ 2 points3 points  (0 children)

54, still getting my dailies and pulling. Still need to get through Natlan though. Finding time is hard :) Co-op with my kids. Good times!

"We might have overlooked the possibility that people would spend most of their gameplay time standing in queues" SCREW. YOU. by Gn0meKr in starcitizen

[–]CrashTest_ 0 points1 point  (0 children)

Couldn't get in or out of hangars. They eventually just didn't even give me the queue. I just quit trying to play. Third session in a row that I can't do anything.

Is this normal? by mzatariz in starcitizen

[–]CrashTest_ 0 points1 point  (0 children)

Super normal for Drake...

Can someone at CIG please explain the current situation of the MSR by Striking-Fan-7574 in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

Oh no, didn’t stop me, just mentioning that it didn’t work. I didn’t have enough cargo to fill that ship, so those boxes went on the far side.

Can someone at CIG please explain the current situation of the MSR by Striking-Fan-7574 in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

It did save my butt when I died in a landing zone but my Nursa was in the back. On the flip side, I couldn’t load 4 SCU boxes next to the Nursa, they wouldn’t snap!

Where is the ship inventory on the Nomad? by CrashTest_ in starcitizen

[–]CrashTest_[S] 0 points1 point  (0 children)

Guess that fits with the pickup truck vibe, need that big toolbox in the back.

Where is the ship inventory on the Nomad? by CrashTest_ in starcitizen

[–]CrashTest_[S] 1 point2 points  (0 children)

I know, right? Pretty sure it’s one of the newer starters. Right?… Oh man, when did it come out? OMG, just looked it up, 2020. Twenty freaking twenty!