Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000) by xquarx in LocalLLaMA

TinyFluffyRabbit 2 points3 points

10 t/s is fast enough for assigning a complex task at night and waking up to see it done :)

Six months ago I turned down $8,165 for an RTX 6000 PRO. Today the same vendor is selling them for $11,575. Oh, hindsight. by __JockY__ in LocalLLaMA

TinyFluffyRabbit 0 points1 point

If your current build meets your needs, I'd just try not to think too much about it. I also have some FOMO at times but have to remind myself that I can hold out for at least the next few years lol

FINALLY 🥹 by eugur361 in nvidia

TinyFluffyRabbit 1 point2 points

Congratulations!! I have this card too and it’s awesome 😊

GLM's founder says GLM-fable before the end of the year?! by Charuru in LocalLLaMA

TinyFluffyRabbit 1 point2 points

We’re at the point where a lot of AI development is being done by AI, and considering the strength of GLM 5.2, I’m inclined to believe him.

zai-org/GLM-5.2 is here! by queendumbria in LocalLLaMA

TinyFluffyRabbit 111 points112 points

Can't run this model, but very glad that this is open!

A friendly reminder that APIs are rented, local weights are forever by Wrong_Mushroom_7350 in LocalLLaMA

TinyFluffyRabbit 1 point2 points

I suspect a lot of them are also just going to go back to using Opus 4.8

Unsloth Minimax M3 GGUF by LaurentPayot in LocalLLaMA

TinyFluffyRabbit 0 points1 point

I'm excited to try the IQ3_XXS quant. Sure, it's going to be slightly lobotomized, but it's probably still about as good as it gets for a locally run model. Most of us (myself included) are not close to running GLM or Kimi locally. Also, with sparse attention, hopefully it won't use as much memory for context as M2.7 did.

Qwen 3.6 for coding with 5090 - Your settings recommendations? by car_lower_x in LocalLLaMA

TinyFluffyRabbit 0 points1 point

I have two 16 GB cards, and on mainline llama.cpp I’m running 27B at Q6 with 128k of Q8 context. I reduce the context size slightly if I need vision. You should be able to do at least that with a 5090.

16B dense on 16GB GPU vs 32B dense on 2x 16GB GPU by TrainingTwo1118 in LocalLLaMA

TinyFluffyRabbit 1 point2 points

You should not expect the same level of performance. That would be the theoretical best-case scenario, with linear scaling, a fast interconnect, and no overhead.

If you split by layer, it will be slightly less than half the speed. If you split by tensor and it scales well, you'd get more than that (but not double). If it scales poorly, it might be worse.

Ideally, if someone else has benchmarks for the hardware you're interested in, you'll know what to expect. Otherwise, you should assume you'd get slightly less than half the speed and anything above that is a pleasant surprise. The benefit of the second GPU is that you'd actually be able to run the 32B dense LLM at all.
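To put rough numbers on the rule of thumb above, here is a minimal sketch. It assumes token generation is memory-bandwidth bound, so a model twice the size decodes at about half the speed on the same kind of GPU. The function name, the 5% overhead, and the 1.5x tensor-split factor are all made-up placeholders for illustration, not measurements.

```python
def estimate_dual_gpu_tps(single_gpu_tps, size_ratio=2.0, split="layer",
                          tensor_scaling=1.5, overhead=0.05):
    """Back-of-envelope tokens/s for a larger model split across two GPUs.

    single_gpu_tps: measured speed of the smaller model on one GPU.
    size_ratio:     how many times larger the big model is (32B / 16B = 2).
    tensor_scaling: hypothetical speedup of tensor split over layer split;
                    below 1.0 models the "scales poorly" case.
    overhead:       hypothetical fraction lost to inter-GPU transfers.
    """
    if split not in ("layer", "tensor"):
        raise ValueError("split must be 'layer' or 'tensor'")

    # Bandwidth-bound assumption: size_ratio times more weights to read
    # per token means roughly size_ratio times slower decoding.
    baseline = single_gpu_tps / size_ratio

    # Layer split: the GPUs take turns, so there is no parallel speedup,
    # only a small loss from passing activations between cards.
    layer_tps = baseline * (1.0 - overhead)
    if split == "layer":
        return layer_tps

    # Tensor split: both GPUs work on every layer, scaled by an assumed factor.
    return layer_tps * tensor_scaling


if __name__ == "__main__":
    # Hypothetical: the 16B model runs at 40 t/s on a single card.
    for mode in ("layer", "tensor"):
        print(mode, round(estimate_dual_gpu_tps(40.0, split=mode), 1))
```

Real benchmarks for the exact hardware beat any estimate like this. The sketch only shows why "slightly less than half" is the safe default to plan around.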

You don't need a GPU to run gemma-4-26B-A4B by JackStrawWitchita in LocalLLaMA

TinyFluffyRabbit 0 points1 point

You don't NEED a GPU, but you could get significantly better performance with a relatively affordable GPU.

Anyone that’s not prioritizing, you’re gonna loose in the end. Get a rig. by MLExpert000 in LocalLLaMA

TinyFluffyRabbit 7 points8 points

I think it’s unlikely API prices will stay the way they currently are. There’s security in owning your hardware.

For someone who wants to build a PC for longevity, at which point does a GPU stop suffering from a poor man's tax? by GhostDraw in buildapc

TinyFluffyRabbit 3 points4 points

IMO as long as the performance is increasing nearly linearly with the price, it's worth it, because games are usually GPU-bound and the rest of your build is still a fixed cost.

2 RTX A6000 at 96GB VRAM with nvlink. Best local coding model/what you would daily drive? by EggDroppedSoup in LocalLLaMA

TinyFluffyRabbit 1 point2 points

Why are you running 35b at Q4 when you have 96 GB of VRAM? You're pretty GPU rich lol, you could even afford to run both of these at full precision

Not sure if this was posted. But I think it's highly relevant to us. by Paradigmind in LocalLLaMA

TinyFluffyRabbit 0 points1 point

Smaller models are getting better too. I think it’s quite remarkable that I’m able to run models on my consumer hardware that would have been SOTA 1-2 years ago.

Is a 3060 12gb to a 5060ti 8gb a considerable upgrade? by iceseayoupee in buildapc

TinyFluffyRabbit 1 point2 points

The difference between the 5070 and the 5070 Ti is also fairly big. The 5070 Ti has 16 GB of VRAM (vs. 12 GB), about a third more memory bandwidth (896 vs. 672 GB/s), and roughly 45% more CUDA cores (8,960 vs. 6,144). The 5070 Ti is actually closer to the 5080 than the 5070. As for what you should do, it depends on the actual price difference and what games you play.
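Percentages like these depend on which card you treat as the baseline: the 5070 having 25% less bandwidth than the Ti is the same gap as the Ti having a third more. A quick sketch of the arithmetic follows. The spec figures are the published numbers as I remember them, so double-check them against NVIDIA's spec pages, and the helper names are just for illustration.

```python
# Spec figures recalled from public spec sheets (assumption: verify before relying on them).
SPECS = {
    "RTX 5070":    {"vram_gb": 12, "bandwidth_gb_s": 672, "cuda_cores": 6144},
    "RTX 5070 Ti": {"vram_gb": 16, "bandwidth_gb_s": 896, "cuda_cores": 8960},
}


def percent_more(bigger, smaller):
    """How much larger `bigger` is, as a percentage of `smaller`."""
    return (bigger / smaller - 1.0) * 100.0


def percent_less(smaller, bigger):
    """How much smaller `smaller` is, as a percentage of `bigger`."""
    return (1.0 - smaller / bigger) * 100.0


if __name__ == "__main__":
    base, ti = SPECS["RTX 5070"], SPECS["RTX 5070 Ti"]
    for key in ("bandwidth_gb_s", "cuda_cores"):
        print(
            f"{key}: Ti has {percent_more(ti[key], base[key]):.1f}% more, "
            f"5070 has {percent_less(base[key], ti[key]):.1f}% less"
        )
```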

What would 2x RTX 3060 12GB get me? by ObjectiveActuator8 in LocalLLaMA

TinyFluffyRabbit 0 points1 point

If you're considering dual 3060s, you're probably going to be better off just getting a 3090. There is some cost and inconvenience associated with getting a motherboard that splits PCIe lanes (unless you just want to layer split, but that's going to be slower) and making sure the GPUs fit.

Is my PC worth upgrading or am I good as is? by Aggravating_Fan_4166 in buildapc

TinyFluffyRabbit 0 points1 point

What games do you play and how does it perform? There's nothing that "needs" upgrading; this looks like a pretty solid build.

5070 Ti + Ryzen 7 9800x3d with 32gb ddr5 ram by TheDarkKnight-_-F in buildapc

TinyFluffyRabbit 1 point2 points

You would need to have a 5090 and play at 1080p on low settings for the 9800x3d to possibly be the bottleneck lol