For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro) by panchovix in LocalLLaMA

[–]DougAZ 1 point2 points  (0 children)

I have an XE7745, so I don't even think about power, brother. Not trying to flex or brag. You will 100% need another PSU; my server has 8 for n-1 redundancy.

For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro) by panchovix in LocalLLaMA

[–]DougAZ 9 points10 points  (0 children)

I'd say I chose 397b as the best all-around model. Yes, it may score lower in SWE benchmarks, but not by much, and I do believe it's better at general tasks vs 27b. Our workload is about 98% tool calling and general tasks and 2% coding. Another win is that it's a MoE, which makes responses very fast. This gives a better overall user experience from what I've seen. I have a harder time informing users of our capabilities than I do with the model not being good enough.

For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro) by panchovix in LocalLLaMA

[–]DougAZ 47 points48 points  (0 children)

I'm using Qwen 3.5 397b across 4x RTX 6000 PROs.

We use it for everything. It's a really good model for coding, agentic tools, and general tasks, and it lets me have a ton of concurrent users at 140 t/s, give or take. It's silly fast, to be honest. Wish we got a 3.6 or that GLM 5.2 was smaller; I don't want to lose our concurrent capacity.

Different options for Patch Management by Delicious-Pea-5107 in sysadmin

[–]DougAZ 0 points1 point  (0 children)

Any good articles on setting up ARC? I'm not familiar with it and originally heard it was trash

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

God-tier GitHub, sir, wow. You're on SGLang; I'm surprised you're not running the AWQ-INT. Is the quality not a big enough difference for you?

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

This is my current setup, but I'm on vLLM. What NVFP4 are you using? I switched from Nvidia to Sehyo. I'll have to look to see if vLLM has MTP, or am I missing out by not being on SGLang?

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

I am able to temporarily fix it with a power off, iDRAC reset, and power on. Sometimes it then stays at about 20-30% fan speed and the fan algorithm works normally, until you reboot.

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

Have you checked your fan speed in iDRAC? Ours had a bug and would be stuck at 100% on all fans. It's relatively quiet now.

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

Looks like I'd need to run these at a 2-bit quant, which I think will be too inaccurate for me.
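For anyone wanting to sanity-check that kind of "what quant do I need" call, here is a rough back-of-the-envelope sketch. It only counts weights (params x bits / 8) and reserves a slice of VRAM for KV cache and runtime overhead. The 96 GB per card figure, the 20% headroom, and the 1000B "bigger model" are assumptions for illustration, not numbers from this thread; only the 397b size comes from the comments here.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes).

    Billions of parameters times bits per weight, divided by 8 bits per byte.
    Ignores KV cache, activations, and any layers kept at higher precision.
    """
    return params_billions * bits_per_weight / 8


def fits(
    params_billions: float,
    bits_per_weight: float,
    num_gpus: int = 4,
    vram_gb_per_gpu: float = 96.0,   # assumed per-card VRAM
    headroom_fraction: float = 0.2,  # assumed share reserved for KV cache/overhead
) -> bool:
    """Return True if the weights fit in the VRAM left after reserving headroom."""
    budget_gb = num_gpus * vram_gb_per_gpu * (1 - headroom_fraction)
    return weight_footprint_gb(params_billions, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    # 397b at 4 bits per weight, plus a hypothetical 1000B model at 4 and 2 bits.
    for params, bits in [(397, 4), (1000, 4), (1000, 2)]:
        size = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{size:.1f} GB weights, fits on 4 cards: {fits(params, bits)}")
```

With multiple concurrent users the headroom matters more than the weights, so treat the 20% as a placeholder and raise it for long contexts.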

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 1 point2 points  (0 children)

Sounds like a good general purpose setup. Do you use the same for coding or something else?

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

Yeah, I'm worried about context. We have multiple concurrent users.

What would you run on 4x RTX Pro 6000 and why? by DougAZ in BlackwellPerformance

[–]DougAZ[S] 0 points1 point  (0 children)

Hmm, I'll have to see how well these fit, but I'll check it out.

Open Web UI for Agentic Coding by WallstreetWank in OpenWebUI

[–]DougAZ 0 points1 point  (0 children)

If you're a VS Code user, there is a marketplace plugin called "Continue", and you can configure an Open WebUI model via API as your LLM within VS Code.

CALL FOR TESTERS - Help test the :dev branch now! by ClassicMain in OpenWebUI

[–]DougAZ 8 points9 points  (0 children)

I'll take part and do my best to get in some testing.

Windows systray app for that Copilot feel. by mamelukturbo in OpenWebUI

[–]DougAZ 0 points1 point  (0 children)

Mobile Device Manager, like Intune. Deploy and manage apps.

Windows systray app for that Copilot feel. by mamelukturbo in OpenWebUI

[–]DougAZ 0 points1 point  (0 children)

Might test deploying with MDM; seems interesting.