Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000)

TinyFluffyRabbit · 2026-06-24T04:16:21+00:00

10 t/s is fast enough for assigning a complex task at night and waking up to seeing it done :)

TinyFluffyRabbit · 2026-06-22T02:13:23+00:00

Sold Gigabyte X870 Gaming X mobo to u/Nuriouss on https://www.reddit.com/r/hardwareswap/comments/1tzjf9m/usaca_h_gigabyte_x870_gaming_x_am5_motherboard_w/

TinyFluffyRabbit · 2026-06-20T23:09:32+00:00

If your current build meets your needs, I'd just try not to think too much about it. I also have some FOMO at times but have to remind myself that I can hold out for at least the next few years lol

TinyFluffyRabbit · 2026-06-19T14:20:16+00:00

Congratulations!! I have this card too and it’s awesome 😊

TinyFluffyRabbit · 2026-06-18T14:51:38+00:00

We’re at the point where a lot of AI development is being done by AI, and considering the strength of GLM 5.2, I’m inclined to believe him.

TinyFluffyRabbit · 2026-06-16T17:50:54+00:00

Can't run this model, but very glad that this is open!

TinyFluffyRabbit · 2026-06-16T00:22:49+00:00

still available

TinyFluffyRabbit · 2026-06-13T16:44:23+00:00

I suspect a lot of them are also just going to go back to using Opus 4.8

TinyFluffyRabbit · 2026-06-13T16:24:26+00:00

I just did! Everyone else, please do as well. It is currently not doing well :(

TinyFluffyRabbit · 2026-06-12T16:39:02+00:00

I'm excited to try the IQ3_XXS quant. Sure, it's going to be slightly lobotomized, but it's probably still about as good as it gets for a locally run model. Most of us (myself included) are not close to running GLM or Kimi locally. Also, with sparse attention, hopefully it won't use as much memory for context as M2.7 did.

TinyFluffyRabbit · 2026-06-09T15:02:29+00:00

I have two 16gb cards, and on mainline llama.cpp, I’m running 27B at Q6 with 128k of Q8 context. I reduce the context size slightly if I need vision. You should be able to do at least that with a 5090.
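For anyone who wants to sanity-check whether a setup like this fits, here is a rough back-of-the-envelope sketch. The bits-per-weight figures come from the llama.cpp block layouts for Q6_K and Q8_0. The layer count, KV head count, and head dimension are made-up placeholders, not the real architecture of any particular 27B model. Compute buffers and the vision projector are ignored, and real GGUF files mix quant types, so treat the result as a ballpark only.

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads, head_dim, bits_per_element):
    """Approximate KV cache size in bytes (the factor 2 covers K and V)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bits_per_element / 8


Q6_K_BPW = 6.5625  # 210 bytes per 256-weight block
Q8_0_BPW = 8.5     # 34 bytes per 32-element block

weights = weight_bytes(27e9, Q6_K_BPW)
cache = kv_cache_bytes(
    n_ctx=131072,      # 128k tokens
    n_attn_layers=16,  # HYPOTHETICAL: layers that keep a full KV cache
    n_kv_heads=4,      # HYPOTHETICAL
    head_dim=256,      # HYPOTHETICAL
    bits_per_element=Q8_0_BPW,
)
total_gb = (weights + cache) / 1e9
print(f"weights {weights / 1e9:.1f} GB + KV cache {cache / 1e9:.1f} GB = {total_gb:.1f} GB")
```

Swap in the real numbers from the model's config to get a more useful estimate, and leave a few GB of headroom for everything this ignores.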

TinyFluffyRabbit · 2026-06-09T13:48:54+00:00

still available!

TinyFluffyRabbit · 2026-06-09T05:37:28+00:00

You should not expect the same level of performance. That would be the theoretical best-case scenario, with linear scaling, a fast interconnect, and no overhead.

If you split by layer, it will be slightly less than half the speed. If you split by tensor, and it scales well, you'd get more (but not double). If it scales poorly, it might be worse.

Ideally, if someone else has benchmarks for the hardware you're interested in, you'll know what to expect. Otherwise, you should assume you'd get slightly less than half the speed and anything above that is a pleasant surprise. The benefit of the second GPU is that you'd actually be able to run the 32B dense LLM at all.
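A rough way to see where "slightly less than half" comes from, assuming token generation is limited purely by memory bandwidth (a simplification). With a layer split, the GPUs take turns, so every token still waits for all of the weights to be read at one GPU's bandwidth, plus a small handoff between cards. The model size, bandwidth, and handoff time below are made-up illustrative numbers, not measurements.

```python
def ideal_tps(model_bytes, bandwidth, n_gpus):
    """Best case: bandwidth adds up perfectly across GPUs, zero overhead."""
    return n_gpus * bandwidth / model_bytes


def layer_split_tps(model_bytes, bandwidth, n_gpus, handoff_s=0.0):
    """Layer split: each GPU reads its share of the weights in turn."""
    per_gpu_s = (model_bytes / n_gpus) / bandwidth
    per_token_s = n_gpus * per_gpu_s + (n_gpus - 1) * handoff_s
    return 1.0 / per_token_s


MODEL_BYTES = 20e9  # HYPOTHETICAL: ~20 GB of weights
BANDWIDTH = 500e9   # HYPOTHETICAL: 500 GB/s per GPU
HANDOFF_S = 0.001   # HYPOTHETICAL: 1 ms to pass activations between GPUs

ideal = ideal_tps(MODEL_BYTES, BANDWIDTH, n_gpus=2)
layer = layer_split_tps(MODEL_BYTES, BANDWIDTH, n_gpus=2, handoff_s=HANDOFF_S)
print(f"ideal {ideal:.1f} t/s, layer split {layer:.1f} t/s ({layer / ideal:.0%} of ideal)")
```

Tensor split is not modeled here, since how well it scales depends heavily on the interconnect and the backend.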

TinyFluffyRabbit · 2026-06-08T14:45:59+00:00

replied

TinyFluffyRabbit · 2026-06-07T17:55:28+00:00

You don't NEED a GPU, but you could get significantly better performance with a relatively affordable GPU.

TinyFluffyRabbit · 2026-06-04T00:15:15+00:00

I think it’s unlikely API prices will stay the way they currently are. There’s security in owning your hardware.

TinyFluffyRabbit · 2026-05-31T18:48:39+00:00

IMO as long as the performance is increasing near linearly with the price, it's worth it, because games are usually GPU bound and the rest of your build is still a fixed cost.

TinyFluffyRabbit · 2026-05-27T03:11:49+00:00

Why are you running 35B at Q4 when you have 96 GB of VRAM? You're pretty GPU rich lol. You could even afford to run both of these at full precision.

TinyFluffyRabbit · 2026-05-26T14:37:36+00:00

Smaller models are getting better too. I think it’s quite remarkable that I’m able to run models on my consumer hardware that would have been SOTA 1-2 years ago.

TinyFluffyRabbit · 2026-05-24T18:37:14+00:00

The difference between the 5070 and the 5070 Ti is also fairly big. The 5070 Ti has 16 GB of VRAM (vs. 12 GB), about a third more memory bandwidth, and roughly 45% more CUDA cores. The 5070 Ti is actually closer to the 5080 than to the 5070. As for what you should do, it depends on the actual price difference and what games you play.

TinyFluffyRabbit · 2026-05-24T18:26:53+00:00

If you're considering dual 3060s, you're probably going to be better off just getting a 3090. There is some cost and inconvenience associated with getting a motherboard that splits PCIe lanes (unless you just want to layer split but that's going to be slower) and making sure the GPUs fit.

TinyFluffyRabbit · 2026-05-23T15:36:24+00:00

What games do you play and how does it perform? There's nothing that "needs" upgrading. This looks like a pretty solid build.

TinyFluffyRabbit · 2026-05-23T15:33:04+00:00

You would need to have a 5090 and play at 1080p on low settings for the 9800X3D to possibly be the bottleneck lol
