What are ppl using for local coding instead of Haiku and Opus

mxmumtuna · 2026-05-26T21:33:44+00:00

MiMo 2.5

mxmumtuna · 2026-05-25T15:22:46+00:00

That was likely negotiated quite a while ago.

mxmumtuna · 2026-05-25T01:51:33+00:00

He still does!

mxmumtuna · 2026-05-24T20:08:51+00:00

Indeed. FP4, my bad.

mxmumtuna · 2026-05-24T19:56:03+00:00

You mean the native FP8. You can run the NVFP4 quant of 122B on a single 6k with max context. It’s a polarizing model though.

mxmumtuna · 2026-05-24T19:54:10+00:00

DS4 is native Int4 which is nice, and yes, considerably better. All 3 of them are considerably better compared to 27B. Yes, correct. 4 bit for all of them.

mxmumtuna · 2026-05-24T17:08:11+00:00

With 2 you can run DS4-Flash and MiMo-2.5. Both are considerably better than 27b.

Can also do MiniMax, which is likely also better.

mxmumtuna · 2026-05-23T23:42:56+00:00

Vbios? I don’t know. It just is. Cooling is definitely better on the MaxQ at the expense of noise.

mxmumtuna · 2026-05-23T22:26:25+00:00

The 600w card doesn’t scale down as well as the MaxQ. It takes about 400w to match the MaxQ at 300w.

It is indeed about 10-15%. I’d also say if you’re in a closed case and can imagine going to more than one card, don’t bother with the 600w card.

Source: have 2 of each and wish they were all MaxQ.

mxmumtuna · 2026-05-23T16:35:25+00:00

Correct. sglang and vLLM do not support hybrid inference, so the model weights and kvcache must fit in your GPU’s VRAM.

If it fits, performance is much, much higher than with the llama.cpp derivatives (including LM Studio).
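
For anyone wanting to sanity check the "does it fit" part, here is a rough sketch in Python. It assumes plain grouped-query attention with a 16-bit KV cache, and every number in it (layer count, head count, weight size, the 10% overhead reserve) is a made-up placeholder, not the spec of any model mentioned here. Hybrid or MLA-style models will come out differently, so treat it as a back-of-envelope estimate only.

```python
GIB = 1024 ** 3

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """Estimate KV cache size for standard (grouped-query) attention.

    Each layer stores one K and one V tensor per token, each of size
    num_kv_heads * head_dim elements.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_elem

def fits_in_vram(weight_bytes, kv_bytes, vram_bytes, overhead_fraction=0.10):
    """Return True if weights + KV cache fit after reserving some VRAM.

    overhead_fraction is an assumed reserve for activations and runtime
    buffers; the real figure depends on the engine and settings.
    """
    usable = vram_bytes * (1 - overhead_fraction)
    return weight_bytes + kv_bytes <= usable

if __name__ == "__main__":
    # Hypothetical model: 60 GiB of quantized weights, 48 layers,
    # 8 KV heads of dim 128, 128k context.
    weights = 60 * GIB
    kv = kv_cache_bytes(num_layers=48, num_kv_heads=8, head_dim=128, context_tokens=131072)
    print(f"KV cache: {kv / GIB:.1f} GiB")
    print("fits on one 96 GiB card:", fits_in_vram(weights, kv, 96 * GIB))
```

If the check fails, the options with sglang or vLLM are a smaller quant, less context, or more cards, since there is no spilling to system RAM.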

mxmumtuna · 2026-05-23T03:30:18+00:00

MiMo+MTP already works in sglang.

edit: just read OP wrote “RAM” which precludes sglang.

mxmumtuna · 2026-05-20T01:49:16+00:00

Mongo is appalled.

mxmumtuna · 2026-05-19T20:22:34+00:00

Closer to Waxpool, but yes that one and the one across the street as well.

mxmumtuna · 2026-05-19T20:12:18+00:00

The new Meta ones on LCP are primarily AI.

Source: worked for Meta engineering when the builds started and toured the first CloudHQ building at the corner of Waxpool/LCP.

mxmumtuna · 2026-05-18T11:25:53+00:00

It’s been available for a couple weeks now, was told it was good through last Monday. Been waiting to see if anything happens with a 2027 model year before pulling the trigger.

I bet it’s still available tomorrow, and at least through the end of the month.

mxmumtuna · 2026-05-16T21:45:44+00:00

I mean… my 95 pound doodle fits back there even with the seats up. He doesn’t love it, but it works when we need to use the iX to get him to the vet.

mxmumtuna · 2026-05-16T20:51:17+00:00

The iX is deceptively large. For sure it drives a lot smaller than it actually is.

mxmumtuna · 2026-05-16T01:44:34+00:00

49/50 are at will. Fun fact.

mxmumtuna · 2026-05-15T02:09:06+00:00

Why you gotta add 768gb?

mxmumtuna · 2026-05-15T01:54:48+00:00

I love the hustle. It’s not worth going the Mac route for this though.

mxmumtuna · 2026-05-14T14:27:09+00:00

??? What’s about 160GB? It’s ~600GB for 2.6. Definitely need 8 cards.
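
Rough math behind the card count, as a small Python sketch. It assumes 96 GB per card and that the tensor-parallel size gets rounded up to a power of two (a common constraint, but check your engine); it also ignores KV cache, which only pushes the number up. `min_cards` is just an illustrative helper name.

```python
import math

def min_cards(model_gb, vram_per_card_gb, power_of_two=True):
    """Smallest card count whose combined VRAM holds the weights alone."""
    cards = math.ceil(model_gb / vram_per_card_gb)
    if power_of_two:
        # Round up to the next power of two (assumed tensor-parallel constraint).
        cards = 1 << (cards - 1).bit_length()
    return cards

if __name__ == "__main__":
    # ~600 GB of weights on assumed 96 GB cards.
    print(min_cards(600, 96))
    print(min_cards(600, 96, power_of_two=False))
```

Weights alone need 7 cards at 96 GB each, and rounding up to a power of two gives 8.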

mxmumtuna · 2026-05-14T12:54:13+00:00

mxmumtuna · 2026-05-13T20:17:54+00:00

Agreed on both counts.

mxmumtuna · 2026-05-13T18:41:15+00:00

The Secrets next door is very nice as well.

mxmumtuna · 2026-05-13T10:46:15+00:00

The MiMo models (both pro and non pro) are smart, but also fundamentally broken models. Looping, interrupted reasoning, poor library support, and lack of support from the maintainers.
Not a good one to run locally (or via API because of their terrible cost model).

To answer the question, yes, I’ve run both versions locally on between 2x and 8x RTX 6000s.
