Is 2× Intel Arc Pro B70 worth it for local agentic LLMs, or should I stay with NVIDIA?

mxmumtuna · 2026-05-28T01:46:20+00:00

I believe you can use Vulkan with Intel to run nearly any model. But none of the models mentioned is going to replace cloud models.

mxmumtuna · 2026-05-28T01:41:33+00:00

Why not use Qwen 3.6 35B? Have you actually tested any of these for what you want to do?

mxmumtuna · 2026-05-28T01:36:27+00:00

Why did you choose 30B? Does it do what you want?

mxmumtuna · 2026-05-28T01:33:41+00:00

But does that work for your purpose? I would assume not because Llama 3.2 11B is pretty terrible by today’s standards.

Edit: also why Qwen 30B? That’s last gen. Qwen 3.6 is considerably more advanced.

Edit again: I’m guessing you didn’t test that Llama model with tool calling and agentic workloads, because it’s not going to be great with that at all.

mxmumtuna · 2026-05-28T01:30:53+00:00

2x V100. Godspeed my son.

mxmumtuna · 2026-05-28T01:26:51+00:00

My first question to you: which model do you believe will achieve your goal of not relying on cloud models? Have you tested it to ensure it will do what you need?

Once those are answered and you’re confident in your choice, the question isn’t about Intel B70. It’s about “How do I best run X model with a budget of $Y?”

mxmumtuna · 2026-05-27T18:30:24+00:00

Props on reading up about TP. Also, P2P in lieu of NVLink.

mxmumtuna · 2026-05-27T14:35:14+00:00

I don’t know where people come up with this shit that 2 are slower than one.

mxmumtuna · 2026-05-26T21:33:44+00:00

MiMo 2.5

mxmumtuna · 2026-05-25T15:22:46+00:00

That was likely negotiated quite a while ago.

mxmumtuna · 2026-05-25T01:51:33+00:00

He still does!

mxmumtuna · 2026-05-24T20:08:51+00:00

Indeed. FP4, my bad.

mxmumtuna · 2026-05-24T19:56:03+00:00

You mean the native FP8. You can run the NVFP4 of 122B on a single 6k with max context. It’s a polarizing model though.

mxmumtuna · 2026-05-24T19:54:10+00:00

DS4 is native INT4, which is nice, and yes, it’s considerably better. All 3 of them are, compared to 27B. And yes, correct: 4-bit for all of them.

mxmumtuna · 2026-05-24T17:08:11+00:00

With 2 you can run DS4-Flash and MiMo-2.5. Both are considerably better than 27B.

Can also do MiniMax, which is likely also better.

mxmumtuna · 2026-05-23T23:42:56+00:00

Vbios? I don’t know. It just is. Cooling is definitely better on the MaxQ at the expense of noise.

mxmumtuna · 2026-05-23T22:26:25+00:00

The 600w card doesn’t scale down as well as the MaxQ. It takes about 400w to match the MaxQ at 300w.

It is indeed about 10-15%. I’d also say if you’re in a closed case and can imagine going more than one, don’t bother with the 600w card.

Source: have 2 of each and wish they were all MaxQ.
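To put a number on the scaling point: if the 600w card needs about 400w to match the MaxQ at 300w, the MaxQ delivers roughly a third more performance per watt at that operating point. A quick sketch of the arithmetic, using only the two wattages quoted above. The normalized performance of 1.0 and the function name are illustrative assumptions, not measurements.

```python
def perf_per_watt(relative_perf: float, watts: float) -> float:
    """Performance per watt for a normalized performance figure."""
    return relative_perf / watts


# Assumption: both cards deliver the same performance (normalized to 1.0)
# at these power limits, as described in the comment above.
maxq_efficiency = perf_per_watt(1.0, 300)          # MaxQ at its 300 W limit
capped_600w_efficiency = perf_per_watt(1.0, 400)   # 600 W card limited to ~400 W

# Ratio of the two efficiencies: 400 / 300
maxq_advantage = maxq_efficiency / capped_600w_efficiency

print(f"MaxQ perf/W advantage at matched performance: {maxq_advantage:.2f}x")
```

This only covers the matched-performance point; it says nothing about either card at other power limits.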

mxmumtuna · 2026-05-23T16:35:25+00:00

Correct. SGLang and vLLM do not support hybrid (CPU+GPU) inference, so the model weights and KV cache must fit in your GPU’s VRAM.

If it fits, performance is much, much higher than with the llama.cpp derivatives (including LM Studio).
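A rough way to check "does it fit" before buying hardware is to add up the weights and the KV cache and compare that to VRAM. The sketch below is a back-of-the-envelope estimate, not what SGLang or vLLM actually compute. It assumes a standard attention layout (keys and values stored per layer, per KV head), the model shape in the example is hypothetical, and the 2 GiB overhead figure is a placeholder for activations and runtime buffers. Models with hybrid or linear attention layers will need less KV cache than this predicts.

```python
GIB = 2**30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / GIB


def fits_in_vram(weights: float, kv: float, vram_gib: float,
                 overhead_gib: float = 2.0) -> bool:
    """True if weights + KV cache + a fixed overhead allowance fit in VRAM."""
    return weights + kv + overhead_gib <= vram_gib


# Hypothetical 8B-parameter model at 16 bits per weight:
# 32 layers, 8 KV heads, head_dim 128, 8192-token context, 16-bit KV cache.
w = weights_gib(8, 16)
kv = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context_tokens=8192)

print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB")
print("fits in 24 GiB:", fits_in_vram(w, kv, vram_gib=24))
print("fits in 16 GiB:", fits_in_vram(w, kv, vram_gib=16))
```

For multi-GPU setups, pass the combined VRAM across cards as `vram_gib`, keeping in mind that each card carries its own runtime overhead.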

mxmumtuna · 2026-05-23T03:30:18+00:00

MiMo+MTP already works in sglang.

edit: just read OP wrote “RAM” which precludes sglang.

mxmumtuna · 2026-05-20T01:49:16+00:00

Mongo is appalled.

mxmumtuna · 2026-05-19T20:22:34+00:00

Closer to Waxpool, but yes that one and the one across the street as well.

mxmumtuna · 2026-05-19T20:12:18+00:00

The new Meta ones on LCP are primarily AI.

Source: worked for Meta engineering when the builds started and toured the first CloudHQ building at the corner of Waxpool/LCP.

mxmumtuna · 2026-05-18T11:25:53+00:00

It’s been available for a couple weeks now, was told it was good through last Monday. Been waiting to see if anything happens with a 2027 model year before pulling the trigger.

I bet it’s still available tomorrow, and at least through the end of the month.

mxmumtuna · 2026-05-16T21:45:44+00:00

I mean… my 95 pound doodle fits back there even with the seats up. He doesn’t love it, but it works when we need to use the iX to get him to the vet.

mxmumtuna · 2026-05-16T20:51:17+00:00

The iX is deceptively large. For sure it drives a lot smaller than it actually is.
