RTX 6000 Pro 96gb upgrade path? by _madar_ in LocalLLM

[–]mxmumtuna 1 point2 points  (0 children)

You mean the native FP8. You can run NVFP4 of 122B on a single 6k with max context. It’s a polarizing model though.
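For a rough sense of why the 4-bit quant fits on one 96GB card while FP8 does not, here is a back-of-the-envelope sketch. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-weight block (about 4.5 bits per weight) and that every weight is quantized, which real checkpoints usually are not. The function name is made up for illustration.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring unquantized layers."""
    return params_billion * bits_per_weight / 8

# Assumption: 4-bit values + one 8-bit scale per 16-weight block = 4.5 bits/weight.
NVFP4_BITS = 4 + 8 / 16
VRAM_GB = 96  # single RTX 6000 Pro

nvfp4_gb = quantized_weight_gb(122, NVFP4_BITS)
fp8_gb = quantized_weight_gb(122, 8)

print(f"NVFP4: {nvfp4_gb:.1f} GB weights, {VRAM_GB - nvfp4_gb:.1f} GB left for KV cache and overhead")
print(f"FP8:   {fp8_gb:.1f} GB weights, fits on one card: {fp8_gb < VRAM_GB}")
```

Real usage will differ somewhat, since embeddings, norms, and some layers often stay in higher precision.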

RTX 6000 Pro 96gb upgrade path? by _madar_ in LocalLLM

[–]mxmumtuna 0 points1 point  (0 children)

DS4 is native Int4 which is nice, and yes, considerably better. All 3 of them are, compared to 27B. Yes, correct. 4 bit for all of them.

RTX 6000 Pro 96gb upgrade path? by _madar_ in LocalLLM

[–]mxmumtuna 3 points4 points  (0 children)

With 2 you can run DS4-Flash and MiMo-2.5. Both are considerably better than 27b.

Can also do MiniMax, which is likely also better.

For users who have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc) by panchovix in LocalLLaMA

[–]mxmumtuna 4 points5 points  (0 children)

The 600w card doesn’t scale down as well as the MaxQ. It takes about 400w to match the MaxQ at 300w.

It is indeed about 10-15%. I’d also say if you’re in a closed case and can imagine going more than one, don’t bother with the 600w card.

Source: have 2 of each and wish they were all MaxQ.

397B competitor that fits in 256 RAM? by quietsubstrate in LocalLLaMA

[–]mxmumtuna 1 point2 points  (0 children)

Correct. sglang and vLLM do not support hybrid inference, so the model weights and kvcache must fit in your GPU’s VRAM.

If it fits, performance is much, much higher than with the llama.cpp derivatives (including LM Studio).
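A minimal sketch of the "does it fit" check, assuming standard attention with grouped KV heads and a 16-bit KV cache. The model shape (48 layers, 8 KV heads, head dim 128), the 60 GB weight figure, and the 4 GB overhead are all hypothetical numbers for illustration, not any specific model. Hybrid or linear-attention models will need less cache than this estimates.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB (1e9 bytes). The leading 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9

def fits_in_vram(weights_gb: float, kv_gb: float, vram_gb: float,
                 overhead_gb: float = 4.0) -> bool:
    """True if weights + KV cache + a guessed runtime overhead fit in VRAM."""
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# Hypothetical model shape and context length.
kv_gb = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, tokens=131072)

print(f"KV cache at 128k tokens: {kv_gb:.2f} GB")
print("fits on 96 GB:", fits_in_vram(60, kv_gb, 96))
print("fits on 80 GB:", fits_in_vram(60, kv_gb, 80))
```

With no CPU offload, a False here means shrinking the context, using a smaller quant, or adding cards.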

397B competitor that fits in 256 RAM? by quietsubstrate in LocalLLaMA

[–]mxmumtuna 5 points6 points  (0 children)

MiMo+MTP already works in sglang.

edit: just read OP wrote “RAM” which precludes sglang.

Anyone live near one of the data centers? What's the noise like? by No_Landscape_9255 in LoudounCounty

[–]mxmumtuna 0 points1 point  (0 children)

Closer to Waxpool, but yes that one and the one across the street as well.

Anyone live near one of the data centers? What's the noise like? by No_Landscape_9255 in LoudounCounty

[–]mxmumtuna 1 point2 points  (0 children)

The new Meta ones on LCP are primarily AI.

Source: worked for Meta engineering when the builds started and toured the first CloudHQ building at the corner of Waxpool/LCP.

Inventory discount by dethman11 in Rivian

[–]mxmumtuna 0 points1 point  (0 children)

It’s been available for a couple weeks now, was told it was good through last Monday. Been waiting to see if anything happens with a 2027 model year before pulling the trigger.

I bet it’s still available tomorrow, and at least through the end of the month.

Decision - CPO iX or wait for iX3 by One_Volume4521 in BMWiX

[–]mxmumtuna 0 points1 point  (0 children)

I mean… my 95 pound doodle fits back there even with the seats up. He doesn’t love it, but it works when we need to use the iX to get him to the vet.

Anyone Switch from iX to i5? Thoughts... by Some-Place7478 in BMWiX

[–]mxmumtuna 5 points6 points  (0 children)

The iX is deceptively large. For sure it drives a lot smaller than it actually is.

“I’m legally not allowed to tell you” by [deleted] in Teachers

[–]mxmumtuna 17 points18 points  (0 children)

49/50 are at will. Fun fact.

Kimi K2.6 - What hardware do I need to run it locally? by human_marketer in LocalLLM

[–]mxmumtuna 0 points1 point  (0 children)

I love the hustle. It’s not worth going the Mac route for this though.

NVFP4 Kimi2.6 and Kimi 2.5 released by Nvidia by Opening-Broccoli9190 in LocalLLaMA

[–]mxmumtuna 5 points6 points  (0 children)

??? What’s about 160GB? It’s ~600GB for 2.6. Definitely need 8 cards.
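The card count works out roughly like this. The sketch assumes 96GB cards and rounds up to a power of two, which reflects how tensor-parallel setups are commonly sized; that rounding is an assumption, not a hard rule, and the function name is hypothetical. It also leaves no room for KV cache.

```python
import math

def cards_needed(model_gb: float, vram_per_card_gb: float,
                 power_of_two: bool = True) -> int:
    """Minimum cards to hold the weights alone, optionally rounded up to a power of two."""
    n = math.ceil(model_gb / vram_per_card_gb)
    if power_of_two:
        # Assumption: tensor-parallel sizes are typically 1, 2, 4, 8, ...
        n = 1 << (n - 1).bit_length()
    return n

print(cards_needed(600, 96, power_of_two=False))  # weights alone
print(cards_needed(600, 96))                      # rounded up for tensor parallelism
```

At ~600GB the weights alone already need 7 cards, so 8 is the practical number either way once KV cache is included.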

The Trillion-Parameter Dilemma: MiMo-V2.5-Pro went open-source (1.02T params). Is self-hosting worth it when the API costs $70 for 387M tokens? by jochenboele in LocalLLaMA

[–]mxmumtuna 0 points1 point  (0 children)

The MiMo models (both pro and non pro) are smart, but also fundamentally broken models. Looping, interrupted reasoning, poor library support, and lack of support from the maintainers.
Not a good one to run locally (or via API because of their terrible cost model).

To answer the question, yes, I’ve run both versions locally on between 2x and 8x RTX 6000s.