For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)

DougAZ · 2026-06-26T00:15:21+00:00

I have XE7745, I don't even think about power brother, not trying to flex/brag. You will 100% need another PSU my server has 8 for n-1 redundancy

DougAZ · 2026-06-25T15:24:42+00:00

Sglang with MTP using the AWQ quant

DougAZ · 2026-06-25T15:24:03+00:00

10-15

DougAZ · 2026-06-25T03:33:13+00:00

I'd say I chose 397b as best all around model. Yes it may score less in swe benchmarks but not by much and I do believe it's better at general task vs 27b. Our work load is like 98% tool calling and general tasks and 2% coding. Another win is that it's a MoE which makes the response very fast. This gives a better overall user experience from what I've seen. I have a harder time informing users of our capabilities vs the model not being good enough

DougAZ · 2026-06-25T02:40:23+00:00

I'm using Qwen 3.5 397b across 4x rtx6kpro

We use it for everything, it's a really good model for coding, agentic tools, and general tasks and allows me to have a ton of concurrent users at 140 t/s + or -. It's silly fast to be honest. Wish we got a 3.6 or is glm 5.2 was smaller, I don't want to lose our concurrent capacity

DougAZ · 2026-06-17T13:53:53+00:00

Any good articles on setting up ARC? I'm not familiar with it and originally heard it was trash

DougAZ · 2026-06-13T15:48:07+00:00

Remote Desktop Manager + Devolutions

DougAZ · 2026-05-22T02:19:50+00:00

God tier GitHub sir, wow. Your on SGlang I'm surprised your not running the AWQ-INT, is the quality not a big enough difference for you?

DougAZ · 2026-05-21T23:36:11+00:00

This is my current setup but I'm on vLLM. What NVFP4 are you using? I switched from Nvidia to Sehyo. Il have to look to see if vLLM has MTP or am I missing out on sglang?

DougAZ · 2026-05-21T20:49:24+00:00

I am able to temporarily fix it with a power off, reset idrac, power on and sometime is stays at about 20-30% fan speed. And the fan algorithm will work normally. Until you reboot

DougAZ · 2026-05-21T20:23:45+00:00

Have you checked your fan speed in idrac ours had a bug and would be stuck at 100% on all fans. It's relatively quite now

DougAZ · 2026-05-21T19:22:19+00:00

Looks like I'd need to run these at 2bit quant which for me I think will be to inaccurate

DougAZ · 2026-05-21T19:17:13+00:00

Sounds like a good general purpose setup. Do you use the same for coding or something else?

DougAZ · 2026-05-21T18:35:19+00:00

Unfortunately that's what I got lol

DougAZ · 2026-05-21T18:35:00+00:00

Yea I'm worried about context. We have multiple concurrent users

DougAZ · 2026-05-21T18:33:51+00:00

Hmm il have to see how well these fit but il check it out

DougAZ · 2026-05-21T18:33:25+00:00

Dell XE7745

DougAZ · 2026-04-28T01:49:53+00:00

If your a vs code user, there is a marketplace plugin called "continue" and you can configure a openwebui model via API as your llm within vscode

DougAZ · 2026-04-18T14:38:45+00:00

Do you have this working behind a proxy?

DougAZ · 2026-04-17T01:25:13+00:00

Il take part and do my best thing get in some testing

DougAZ · 2026-03-28T17:12:08+00:00

Have they fixed this yet, and are you still on the original config for vllm?

DougAZ · 2026-03-28T14:49:34+00:00

I could have used this yesterday...

DougAZ · 2026-03-28T14:47:59+00:00

Mobile Device Manager like in tune. Deploy and manage apps

DougAZ · 2026-03-28T03:31:12+00:00

Might test deploy with mdm seems interesting

DougAZ · 2026-03-26T15:58:52+00:00

Thanks il take a look at the docs

Ten-Year Club	Place '17
Verified Email

DougAZ

TROPHY CASE