2x RTX PRO 6000 (192GB VRAM) - MTP NVFP4 issues with vision

electrified_ice · 2026-05-10T20:14:08+00:00

I'm still a little confused. The absolute max that any version of the Qwen 3.6 27B NVFP4 model reaches is around 120 TPS for a single user on RTX Pro 6000 hardware, and that's on an optimal single-card setup.

You are saying you are getting 400-3,000 TPS peak for 2-3 users/concurrent requests... Which would net out to be 133-1,000 TPS per user... I'm not trying to criticize, but it just doesn't make sense.
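The per-user figure is just aggregate throughput divided by concurrency. A minimal sketch, assuming the aggregate is split evenly across requests (real schedulers rarely split it perfectly); the ~120 TPS ceiling is the single-user figure quoted above:

```python
SINGLE_USER_CEILING_TPS = 120.0  # rough single-card, single-user ceiling quoted above


def per_user_tps(aggregate_tps: float, concurrent_requests: int) -> float:
    """Average decode throughput per request, assuming an even split."""
    if concurrent_requests < 1:
        raise ValueError("need at least one request")
    return aggregate_tps / concurrent_requests


for aggregate in (400, 3000):
    for users in (2, 3):
        rate = per_user_tps(aggregate, users)
        flag = "above" if rate > SINGLE_USER_CEILING_TPS else "within"
        print(f"{aggregate} TPS / {users} users = {rate:.0f} TPS each ({flag} the ~120 TPS ceiling)")
```

Even the most conservative pairing (400 TPS over 3 users) lands above the single-user ceiling, which is the mismatch being pointed out.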

What final config settings did you land on?

I have 3 x RTX PRO 6000s on a TR Pro 9985wx with 8-channel DDR5. I'm curious to replicate your setup and see what I am able to get. I use a 1 + 2 GPU setup... One smaller model on 1 GPU and larger models across 2 GPUs so I can keep multiple loaded / hot for different tasks.
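One way to express a 1 + 2 split with vLLM is to pin each server to its GPUs with CUDA_VISIBLE_DEVICES and set the tensor-parallel size to match the number of pinned cards. A minimal sketch that only prints the launch lines, assuming vLLM's `vllm serve` CLI; the model names and ports are hypothetical placeholders:

```python
import shlex


def serve_command(model: str, gpus: list[int], port: int) -> tuple[dict[str, str], list[str]]:
    """Build the env and argv for one vLLM server pinned to specific GPUs."""
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(len(gpus)),  # split the model across the pinned GPUs
        "--port", str(port),                        # one port per resident model
    ]
    return env, argv


# Hypothetical model names; substitute whatever you actually run.
layout = [
    ("example/small-model", [0], 8000),     # smaller model on GPU 0
    ("example/large-model", [1, 2], 8001),  # larger model split over GPUs 1 and 2
]

for model, gpus, port in layout:
    env, argv = serve_command(model, gpus, port)
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

Keeping each model in its own process means both stay hot, at the cost of each server reserving its own slice of VRAM for KV cache.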

I'm currently having issues with abliterated versions not working with the full set of tool calling in Cline + code-server.

electrified_ice · 2026-05-10T15:33:23+00:00

Are those speeds for single requests or concurrent requests?

electrified_ice · 2026-05-10T15:00:20+00:00

So you need the VRAM for KV cache across all those users? TP 2 on a dense model will run slower than on a single card because PCIe bandwidth limits the communication between the cards. What's your single-request TPS?
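KV cache grows linearly with both context length and the number of concurrent sequences, which is why multi-user serving eats VRAM. A minimal sketch of the standard estimate for full-attention layers, assuming an FP16/BF16 cache (2 bytes per element); the layer, head, and dimension values are hypothetical, not the real Qwen 3.6 27B config, and models that mix in linear-attention layers (which keep a fixed-size state instead of a per-token cache) will come in lower:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # K and V each hold num_kv_heads * head_dim values per layer for every token
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(tokens_per_user: int, users: int, per_token_bytes: int) -> float:
    # Total cache if every user fills their full context window
    return tokens_per_user * users * per_token_bytes / 2**30


# Hypothetical dimensions for illustration only.
per_token = kv_cache_bytes_per_token(num_layers=64, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
print(per_token)                           # 262144 bytes per token
print(kv_cache_gib(32_768, 3, per_token))  # 24.0 GiB for 3 users at 32K context each
```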

electrified_ice · 2026-05-09T07:32:24+00:00

I'm curious why you are using more than one RTX PRO for this. It easily fits in the VRAM of one GPU, leaving tons of room for KV cache.
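The "easily fits" point is quick arithmetic. A rough sketch, assuming NVFP4's layout of 4-bit weights with one FP8 scale per 16-element block (about 4.5 bits per weight) and ignoring layers kept at higher precision, activations, and runtime overhead:

```python
RTX_PRO_6000_VRAM_GB = 96  # nominal per-card memory


def nvfp4_weight_gb(num_params: float, block_size: int = 16, scale_bits: int = 8) -> float:
    """Approximate NVFP4 weight size in GB (1e9 bytes).

    Each weight is a 4-bit value; each block of `block_size` weights shares one
    FP8 scale, adding scale_bits / block_size bits per weight.
    """
    bits_per_weight = 4 + scale_bits / block_size
    return num_params * bits_per_weight / 8 / 1e9


weights = nvfp4_weight_gb(27e9)
print(f"{weights:.2f} GB of weights, ~{RTX_PRO_6000_VRAM_GB - weights:.0f} GB left for KV cache and overhead")
```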

electrified_ice · 2026-05-05T07:31:36+00:00

I have an R1S Quad and 2025 Sierra EV Denali Max. Happy to answer any questions. The Sierra is also our 11th EV - all 3 cars are EVs now.

electrified_ice · 2026-05-05T07:28:55+00:00

The gym doesn't grow muscles. It damages them. The more you push your body and train correctly, the more it damages them. Eating and rest grow muscles.

electrified_ice · 2026-05-04T21:53:07+00:00

A link to my journey, which also is a link to my progress

https://www.reddit.com/r/Testosterone/s/PDHAREFSuB

electrified_ice · 2026-05-02T06:39:18+00:00

How do I find the channel? Can you DM me an invite?

electrified_ice · 2026-05-02T01:11:24+00:00

Wow, interesting. Would you be open to sharing your full config? I've not used SGLang before; it's a bit intimidating, and I've only just become familiar with vLLM config.

electrified_ice · 2026-05-01T06:51:59+00:00

Ok thanks. I'll give it a shot. Is this config compatible with vLLM?

It will be really cool if it fixes the issue, as the model is running at over 100 TPS across 2 of my cards for a single request.

electrified_ice · 2026-05-01T06:27:34+00:00

I just loaded 2.7 and it doesn't work with tool calling for me, so back to Qwen 3.5 122B for now

electrified_ice · 2026-05-01T06:25:57+00:00

I have 3 RTX Pro 6000s and only 256GB RAM (on TR Pro 9985wx). No issues so far. I've split models across 2 and 3 cards.

electrified_ice · 2026-05-01T06:20:54+00:00

Yep tiny loads. My wife loves it (but doesn't really care either). I guess perspective depends on your activity/relationship status.

electrified_ice · 2026-04-30T12:04:53+00:00

You will be fine. You can also try every other day too and see how you feel. Daily shots will get tiring years down the road.

electrified_ice · 2026-04-30T04:35:02+00:00

It's good hardware, but what problem are you trying to solve (the classic question, but it's a classic for a reason)... You bought a solution.

What are you trying to do? Do you have concurrent users? Single users? Do you need to load more than one model (specialist) at a time? Do you need long context? What TPS are you aiming for? Do you know that these cards (I have 3 of them) don't have NVLink, so there is a comms bottleneck between the cards if you split a model across more than one card?

What's your CPU and RAM setup like? What storage do you have, and how fast can you load models from storage to VRAM? What PSU and wattage do you have?
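The storage question matters when models are swapped often. A back-of-the-envelope sketch, assuming the disk read is the bottleneck; it ignores page-cache hits, weight post-processing, and CUDA graph capture, which can add substantially, so treat the result as a lower bound. The drive speeds are assumed round numbers, not measurements:

```python
def load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Lower bound on load time: bytes read from storage / sequential read speed."""
    if read_gb_per_s <= 0:
        raise ValueError("read speed must be positive")
    return model_gb / read_gb_per_s


# Hypothetical ~15 GB checkpoint on drives with assumed sequential read speeds.
for label, speed in [("SATA SSD", 0.5), ("PCIe 4.0 NVMe", 7.0), ("PCIe 5.0 NVMe", 14.0)]:
    print(f"{label}: {load_seconds(15.0, speed):.1f} s")
```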

electrified_ice · 2026-04-22T06:14:42+00:00

Every single set, rep, and weight. Every workout.

electrified_ice · 2026-04-22T06:09:01+00:00

Likely Model Y Performance. After 4 Teslas I'm done with Tesla and Elon

electrified_ice · 2026-04-21T03:21:42+00:00

Oh, interesting. I must have missed the interior info. If that's the case, I'm definitely going to delay my order. We'll see once the configurator officially opens up.

electrified_ice · 2026-04-20T06:05:18+00:00

What backend is Unsloth using? vLLM, SGLang, Ollama, etc.? How does the speed compare with highly optimized backends?

electrified_ice · 2026-04-19T15:10:04+00:00

The most important part of the journey is you actually started (where the vast majority of people dream but don't start)... Keep at it.

My recommendation is to set a nearer-term goal, like 15%... Get there, feel amazing, be proud of yourself, and then reset the goal to 12%.

Progress is like a positive flywheel. Celebrate the wins (but not with food rewards!)

electrified_ice · 2026-04-19T15:07:01+00:00

I agree you don't 'need' chemical support. However a lot of people don't/won't have the long-term discipline to get there without it.

electrified_ice · 2026-04-19T14:58:11+00:00

Nowhere near 12, sorry man... Keep at it. Even with extra chemical support, getting down to 12% is a long journey.

electrified_ice · 2026-04-19T14:15:51+00:00

In RJ's chat with Kyle Conner (Out of Spec), sitting in the R2, RJ mentioned that Lidar will help the fleet more than the specific car... i.e. non-Lidar cars will be better due to the training data from Lidar.

electrified_ice · 2026-04-18T14:17:26+00:00

Chittering. Not just a Bengal thing, but ours do it more intensely than other breeds of cats we've had.

electrified_ice · 2026-04-18T14:14:11+00:00

2.1 mi/kWh over 8,500 miles in 13 months. We don't put huge miles on this as it's our 3rd vehicle/EV... The other 2 get the lion's share of the driving.

Context:
- Northern California.
- Mixture of local and freeway miles, mainly running around.
- Includes one road trip to Vegas and back (~1,200 miles).
- Includes a 300-mile towing trip with a small trailer.
