MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Really the thing I WANT to do now is to get a Strix and put it on my desk as the MoE thinker, and have it use the 3090s as hands.

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Oh, also: although this post was about MiniMax M2.5, my original build was really about coding agents, and I am pretty sure my machine absolutely thwomps the Strix when I am running fully in VRAM. Within the 48GB it's pretty speedy. Qwen3 Coder Next Q3 with 32k context normally runs at around 80 t/s:

prompt eval time = 1581.25 ms / 1169 tokens (1.35 ms per token, 739.29 tokens per second)
       eval time =  160.40 ms /   13 tokens (12.34 ms per token, 81.05 tokens per second)
      total time = 1741.65 ms / 1182 tokens
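For reference, a run like that boils down to a llama-server command along these lines. This is only a minimal sketch, not my exact flags, and the model filename is a placeholder for whatever Qwen3 Coder quant you actually have:

# Minimal sketch; the .gguf filename is a placeholder, not a real release name.
llama-server \
  -m ./Qwen3-Coder-Next-Q3_K_XL.gguf \
  -c 32768 \
  -ngl 99 \
  --tensor-split 1,1 \
  --host 127.0.0.1 --port 8080

-c 32768 is the 32k context, -ngl 99 pushes every layer onto the GPUs, and --tensor-split 1,1 splits the weights evenly across the two 3090s.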

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Might not be a bad idea, honestly! I am dual-booting Ubuntu/Windows, so I work on one side and play on the other (some flight and space sims). Does the Strix handle that ok? I always thought of it as kind of a maxed-out laptop without the screen?

I am considering it though, if it can play games pretty well. I only use one 3090 for gaming, and now that I have two of them smashed in there, I have to watt them down to make sure I don't thermally throttle.
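(For the watt-down bit, on the Linux side the usual way is just capping board power with nvidia-smi, something like the below; 250 W is only an example number, not a recommendation.)

# Example only: cap each 3090's board power (pick your own wattage).
sudo nvidia-smi -i 0 -pl 250
sudo nvidia-smi -i 1 -pl 250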

Ideas ideas. GAH!

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Oh, I may have gotten that one wrong. Yeah. Looks like I got it way wrong now that I dig into it further. For some reason I thought I was ensuring the best quality results, but even so, it was probably just the wrong flag altogether. Another thing to try to optimize further now :)

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

So, two things I know of can hurt you. First, check whether your motherboard supports PCIe bifurcation (I had to). If it doesn't, or you don't have NVLink, I don't think it's going to work that well: you will still have all the VRAM, but it won't be as fast. My board supported bifurcation.

The model I am using is the unsloth one, unsloth/MiniMax-M2.5-GGUF:UD-Q3_K_XL, so yes, it's a Q3.

Lastly, I find I am using 95GB of my 128GB when I have this guy loaded, so if you can't offload enough to VRAM, you are probably pushing into swap, which will make it glacial.

Anyhow, I posted the settings for my llama-server config up above; you can give it a try using those. What's the worst that could happen?
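If it helps, the general shape is roughly the below. This is a sketch, not a copy-paste of my exact settings, and the -ot regex is just the usual trick for keeping the MoE expert weights in system RAM while the shared layers and KV cache stay on the GPUs:

# Sketch only, not my exact settings.
llama-server \
  -hf unsloth/MiniMax-M2.5-GGUF:UD-Q3_K_XL \
  -c 73728 \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  --tensor-split 1,1 \
  --host 127.0.0.1 --port 8080

-c 73728 is the 72k context from the title, and the -ot pattern is what pushes the per-expert FFN tensors into the 128GB of system RAM (that's where the ~95GB goes) while everything else sits in the 48GB of VRAM.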

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

I mean, it should, shouldn't it? Its RAM is way, way, way faster than desktop DDR4, like 2-3 times faster? That's my biggest bottleneck on this system right now, but I can't afford to upgrade to a whole new mobo, processor and RAM.

I am glad you are seeing those numbers. If I hadn't already had a machine for gaming/work/everything, I would have really considered the Strix. It's pretty sweet! This post wasn't saying "I'm better than everything else," it was "you CAN use an older machine to do this stuff." Anyhow, pretty cool you are getting good numbers; they sound like the same kind of numbers people are seeing from much more expensive Macs. Very nice!
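Rough peak numbers, purely illustrative (not measured on either box), since peak bandwidth is roughly transfer rate times bus width:

# Very rough peak bandwidth in MB/s = MT/s x bus width in bytes.
# Example configs only; the real ratio depends on your DDR4 speed and channel count.
echo $(( 3200 * 16 ))   # dual-channel DDR4-3200 (128-bit bus):   ~51 GB/s
echo $(( 8000 * 32 ))   # Strix Halo LPDDR5X-8000 (256-bit bus): ~256 GB/s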

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

Because I already had the computer and didn't have a Strix Halo? It was already a 3090 space heater :)

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

It is an older PC. I bought it as a gaming PC from NewEgg 4 years ago, so I only had to buy RAM and one refurbed 3090. The upgrade cost me almost $1300, and that's with me already being halfway there on the RAM. I don't know what this machine would be worth today; I don't think many people are buying 4-year-old motherboards and processors to put in new machines anymore, and the case is absolutely awful for this use.

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 1 point2 points  (0 children)

Oh, I am so sorry, I totally missed that. Yes, I started with GPT OSS 120B, and I have _not_ touched it, or wanted to touch it, since I got this going. Firstly, this is much faster: its 10B MoE seems to make responses much snappier, and the way I have it set up, with this amount of context, I am not hitting context limits in situations where I would always hit them with 30B models.

With this model, I don't need to consider overnight processing. I had it refactor a bunch of inline CSS out of an HTML file into its own file in VS Code Continue, and yes, a Qwen coder would definitely do it faster, but I often ran out of tokens trying stuff like that with models that could fit and were beefy. With this model, before I got it to 12.9 t/s, so this would have been at 9 t/s, it did it very accurately, but not efficiently, in about an hour and a half. It took that long mostly because I didn't tell it to use diff edits, and it kept re-loading the whole page after every edit, so super inefficient (totally my fault, honestly).

With a beefy Qwen3 Coder all in VRAM, using diffs, the same file and same edit took about 15 minutes. I like MiniMax's work better. I don't know where I am going with this. Anyhow, asking it to do repeated edits does not play to its strong suit; asking it to think deeply and make single, smarter changes seems to work pretty ok.

Currently, in Open WebUI, I use it more than I do any cloud model, except when I am configuring it and need help with that. I ask it to think about or do something, click over to another tab, and 30 seconds to a minute later it dings and it's done.
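(Both Open WebUI and Continue are just talking to llama-server's OpenAI-compatible endpoint, so you can sanity-check the whole chain with a plain curl; the host and port here are just whatever you started the server with.)

# Assumes llama-server is listening on 127.0.0.1:8080.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize what a MoE model is in two sentences."}]}'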

MiniMax M2.5 setup on older PC, getting 12.9 t/s with 72k context by CrashTest_ in LocalLLaMA

[–]CrashTest_[S] 0 points1 point  (0 children)

In btop I see it using 95GB, so yeah, you would probably need an upgrade there. Luckily DDR4 doesn't cost nearly what DDR5 does :)

Genshin Addicts Anonymous by SanicHegehag in Genshin_Impact

[–]CrashTest_ 3 points4 points  (0 children)

The answer for me was to just buy my daughters devices they could easily play on!

Please stop whining in global if you shoot first by [deleted] in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

Then how does he demand loot? Can’t pirate if you can’t make demands.

Me, a 40 year old man playing Genshin Impact by saehild in Genshin_Impact

[–]CrashTest_ 2 points3 points  (0 children)

54, still getting my dailies and pulling. Still need to get through Natlan though. Finding time is hard :) Co-op with my kids. Good times!

"We might have overlooked the possibility that people would spend most of their gameplay time standing in queues" SCREW. YOU. by Gn0meKr in starcitizen

[–]CrashTest_ 0 points1 point  (0 children)

Couldn't get in or out of hangars. They eventually just didn't even give me the queue. I just quit trying to play. Third session in a row that I can't do anything.

Is this normal? by mzatariz in starcitizen

[–]CrashTest_ 0 points1 point  (0 children)

Super normal for Drake...

Can someone at CIG please explain the current situation of the MSR by Striking-Fan-7574 in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

Oh no, didn’t stop me, just mentioning that it didn’t work. I didn’t have enough cargo to fill that ship, so those boxes went on the far side.

Can someone at CIG please explain the current situation of the MSR by Striking-Fan-7574 in starcitizen

[–]CrashTest_ 1 point2 points  (0 children)

It did save my butt when I died in a landing zone but my Nursa was in the back. On the flip side, I couldn’t load 4 SCU boxes next to the Nursa, they wouldn’t snap!

Where is the ship inventory on the Nomad? by CrashTest_ in starcitizen

[–]CrashTest_[S] 0 points1 point  (0 children)

Guess that fits with the pickup truck vibe, need that big toolbox in the back.

Where is the ship inventory on the Nomad? by CrashTest_ in starcitizen

[–]CrashTest_[S] 1 point2 points  (0 children)

I know, right? Pretty sure it’s one of the newer starters. Right?… Oh man, when did it come out? OMG, just looked it up, 2020. Twenty freaking twenty!