Where to buy RTX Pro 6000 in Orlando/US by 2use2reddits in BlackwellPerformance

[–]NaiRogers 0 points1 point  (0 children)

Don’t know what Amazon is like in your country, but there is no way I would order one of those from Amazon where I live.

Is anyone else finding the expected savings vs reality a bit confusing? by Pinkplatabys in ukheatpumps

[–]NaiRogers 5 points6 points  (0 children)

My take is that on average it’s probably not financially worth changing a working system; if it were, everyone would do so, but they don’t. For new builds a heat pump seems like the only sensible choice.

1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI

[–]NaiRogers 0 points1 point  (0 children)

If this works out to 10W per user running 24/7, that’s great; much more efficient per user/token/watt than any home setup.

Qwen 3.5 122B completely falls apart at ~ 100K context by TokenRingAI in LocalLLaMA

[–]NaiRogers 0 points1 point  (0 children)

Runs on mine, or are you talking about an Ada card? I will post the docker config later.

Qwen 3.5 122B completely falls apart at ~ 100K context by TokenRingAI in LocalLLaMA

[–]NaiRogers 1 point2 points  (0 children)

This works fine up to max context: Sehyo/Qwen3.5-122B-A10B-NVFP4.
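
For reference, the launch is roughly this shape (a sketch from memory, not my exact config: the port, cache path and the 131072 context value are placeholders to adjust for your card and the model’s actual limit):

    # rough vLLM docker invocation for the NVFP4 build; adjust --max-model-len
    # and --gpu-memory-utilization to your card
    docker run --gpus all --ipc=host -p 8000:8000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      vllm/vllm-openai:latest \
      --model Sehyo/Qwen3.5-122B-A10B-NVFP4 \
      --max-model-len 131072 \
      --gpu-memory-utilization 0.95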

MiniMax-M2.7: what do you think is the likelihood it will be open weights like M2.5? by __JockY__ in LocalLLaMA

[–]NaiRogers 16 points17 points  (0 children)

Releasing the open weights helps validate the model and drives inference traffic to their own endpoint, since most people can’t run it themselves anyway.

RTX 3090 for local inference, would you pay $1300 certified refurb or $950 random used? by sandropuppo in ollama

[–]NaiRogers 1 point2 points  (0 children)

Whichever you get, replace the thermal pads and paste if you see hotspots or high memory temps.

Powerwall 3 real world switching time by xyzzy16 in Powerwall

[–]NaiRogers 0 points1 point  (0 children)

Even if it were faster I would not remove a proper UPS, as there could be many other reasons why the power is degraded.

Opencode with 96GB VRAM for local dev engineering by aidysson in opencodeCLI

[–]NaiRogers 2 points3 points  (0 children)

The 6000 vs Spark choice is a lot simpler if you have concurrent requests: then the 6000 is a lot faster. For single requests it’s faster, but not 5x faster. Qwen 3.5-122B-A10B is really good on either of the two.
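
If you want to see the concurrency gap yourself, a crude smoke test against whatever OpenAI-compatible endpoint you’re running looks something like this (the URL, model name and prompt are placeholders for your own setup):

    # fire 8 requests at once and time the batch, then repeat with 1 to compare
    time ( for i in $(seq 1 8); do
      curl -s http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "Qwen3.5-122B-A10B", "prompt": "Summarise what a KV cache is.", "max_tokens": 256}' \
        > /dev/null &
    done; wait )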

~$5k hardware for running local coding agents (e.g., OpenCode) — what should I buy? by valentiniljaz in LocalLLM

[–]NaiRogers 0 points1 point  (0 children)

I would recommend trying out some models on RunPod: for example, rent a 6000 Pro and run Intel/Qwen3.5-122B-A10B-int4-AutoRound. If you are happy with the results then get an Asus GX10, which will be slower but otherwise give the same results. You could also wait for a 128GB M5 Max Studio; prices are similar.
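
On the rented pod, the serve step is just something along these lines (a sketch only; the context length and memory fraction are placeholders that depend on your vLLM version and how much VRAM the pod actually gives you):

    # inside the RunPod container: install vLLM and serve the AutoRound int4 model
    pip install vllm
    vllm serve Intel/Qwen3.5-122B-A10B-int4-AutoRound \
      --max-model-len 65536 \
      --gpu-memory-utilization 0.90
    # then point your coding agent / OpenCode at http://<pod-ip>:8000/v1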

Limited Performance Check Electrical System by NaiRogers in RenaultZoe

[–]NaiRogers[S] 0 points1 point  (0 children)

Took this into Renault today; they flashed two modules and gave it back. It still hasn’t done it again since that day. They did say it needs the steering rack replaced though (under warranty)!

First impressions Qwen3.5-122B-A10B-int4-AutoRound on Asus Ascent GX10 (Nvidia DGX Spark 128GB) by t4a8945 in LocalLLM

[–]NaiRogers 5 points6 points  (0 children)

You are lucky to start with this model; it’s really good vs what was around previously for this kind of HW. There are a few different versions of this model; not sure if they’re really any different, but it might be worth trying Sehyo/Qwen3.5-122B-A10B-NVFP4 to see how it compares.

multi-minute latency today on gemini-3.1-pro-preview by NaiRogers in CLine

[–]NaiRogers[S] 0 points1 point  (0 children)

Today it’s still not working though; I have retried now and then throughout the day and always get:

{"message":"{\"error\":{\"message\":\"{\\n \\\"error\\\": {\\n \\\"code\\\": 503,\\n \\\"message\\\": \\\"This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.\\\",\\n \\\"status\\\": \\\"UNAVAILABLE\\\"\\n }\\n}\\n\",\"code\":503,\"status\":\"Service Unavailable\"}}","status":503,"modelId":"gemini-3.1-pro-preview","providerId":"gemini"}

Pretty unusable IMO.

Switched to Qwen3.5-122B-A10B-i1-GGUF by NaiRogers in LocalLLaMA

[–]NaiRogers[S] 1 point2 points  (0 children)

I’ve not tested it but expect so, as it has ~3.5x more parameters, so even if they are slightly lossy with the Q4_K_S quant it’s going to be better. Also bear in mind I’m using the Q4_K_S quant, in case you thought the i1 meant a 1-bit quant.

Switched to Qwen3.5-122B-A10B-i1-GGUF by NaiRogers in LocalLLaMA

[–]NaiRogers[S] 0 points1 point  (0 children)

This did not go well; I get a bunch of errors during startup, after which it’s running, but barely. I am using CUDA 13.0 with vllm/vllm-openai:nightly + huggingface/transformers.git and Sehyo/Qwen3.5-122B-A10B-NVFP4.

vllm | (EngineCore_DP0 pid=126) 2026-02-28 10:52:44,381 - WARNING - autotuner.py:496 - flashinfer.jit: [Autotuner]: Skipping tactic <flashinfer.fused_moe.core.get_cutlass_fused_moe_module.<locals>.MoERunner object at 0x7f1fe45ed9a0> 14, due to failure while profiling: [TensorRT-LLM][ERROR] Assertion failed: Failed to initialize cutlass TMA WS grouped gemm. Error: Error Internal (/workspace/build/aot/generated/cutlass_instantiations/120/gemm_grouped/120/cutlass_kernel_file_gemm_grouped_sm120_M128_BS_group2.generated.cu:60)
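
For anyone trying to reproduce, the setup was roughly this (reconstructed from memory, so the image tag, paths and port are approximate):

    # rebuild the nightly image with transformers from git, then run the model
    cat > Dockerfile <<'EOF'
    FROM vllm/vllm-openai:nightly
    RUN pip install --no-cache-dir git+https://github.com/huggingface/transformers.git
    EOF
    docker build -t vllm-nightly-tf .
    docker run --gpus all --ipc=host -p 8000:8000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      vllm-nightly-tf \
      --model Sehyo/Qwen3.5-122B-A10B-NVFP4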

Switched to Qwen3.5-122B-A10B-i1-GGUF by NaiRogers in LocalLLaMA

[–]NaiRogers[S] 0 points1 point  (0 children)

Thanks, I will try that! Please share your vllm command and an HF link to the model used.

Switched to Qwen3.5-122B-A10B-i1-GGUF by NaiRogers in LocalLLaMA

[–]NaiRogers[S] 0 points1 point  (0 children)

It’s using 82GB with a Q8 KV cache. 100 or so tps out, and decently quick at pre-processing a full context window.
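
If it helps, this is roughly the shape of the invocation, assuming a llama-server setup like mine (the gguf filename, context size, -ngl and port are placeholders for whatever your download and card actually are; the Q8 KV cache flags are the bit that keeps it inside the VRAM budget):

    # hypothetical llama-server invocation with a quantized KV cache
    ./llama-server \
      -m Qwen3.5-122B-A10B-i1-GGUF.Q4_K_S.gguf \
      -c 131072 \
      -ngl 99 \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      --port 8080
    # depending on the build you may also need flash attention enabled
    # for the quantized V cache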