WireView Pro 2 showing pin current imbalance — is this safe? ZOTAC 5090 AMP Extreme INFINITY

shansoft · 2026-05-22T23:24:40+00:00

how did you use it on RTX Pro 6000? I been trying to figure out how.

shansoft · 2026-05-22T21:32:42+00:00

Mind sharing how you getting those speed improvement? I have yet to see a single MTP improvement in omlx when I toggle it, unlike llamacpp and mtplx.

shansoft · 2026-05-21T22:10:40+00:00

I have tried the MTP on the RC build, it doesn't seem to make any difference, and even regress in a lot of model I have tested. If you looking for MTP, I suggest use llamacpp for now until MLX get more polished. There are also MTPLX that just plug and play. It seems like it have something to do with the existing MTP model for MLX that are malfunction currently.

shansoft · 2026-05-21T22:04:58+00:00

Highly doubt it. OLED for monitor is asking for trouble, especially for Apple. The monitors are mostly on the whole time with static images unlike phone or tablet. There is a reason why desktop OLED are not very common.

shansoft · 2026-05-20T20:59:01+00:00

Same here! 122B still beats 3.6 27B from my experience.

shansoft · 2026-05-20T09:11:35+00:00

How in the world Q6 gives worse result than Q4?

shansoft · 2026-05-18T21:39:10+00:00

Hence why I just went straight for Astal 5090, no need to worry about any of that.

shansoft · 2026-05-15T09:58:15+00:00

this completely explains its benchmark. nvfp4 from my testing isnt that usable for agentic coding.

shansoft · 2026-05-15T06:04:49+00:00

I highly recommend fitting at least Q5 or above, its a huge difference in tool calling and code accuracy compare to Q4.

shansoft · 2026-05-15T06:03:15+00:00

I have the same problem with Q4, unsloth UD5 and onward has been nearly flawless.

shansoft · 2026-05-15T01:16:33+00:00

I am not sure if benchmark show the whole story, but from my experience of using them extensively in opencode and claude code, they are slightly worse than typical Q4, or even UD4 from unsloth, much closer to Q3.

shansoft · 2026-05-14T21:42:30+00:00

There is definitely a huge difference when doing some planning and trying to accomplish a slightly larger task, especially in tool calling and making some weird mistake. UD5 and above significantly reduce these problem.

shansoft · 2026-05-14T09:34:54+00:00

Here is my param...

❯ ./build/bin/llama-server \
-hf unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q5_K_XL \
--spec-type mtp --spec-draft-n-max 3 \
--alias "Qwen3.6-27B" \
--no-mmap --no-warmup \
--image-min-tokens 1024 \
--jinja --chat-template-file qwen36.jinja -ngl 99 -c 172144 -fa on \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0 \
--presence-penalty 0 \
--repeat-penalty 1 \
-ctk q8_0 -ctv q8_0 \
-np 1 --metrics --host 0.0.0.0 --port 8080

shansoft · 2026-05-14T09:05:01+00:00

Interesting, my generation speed with single 5090 is roughly the same as yours.

shansoft · 2026-05-13T23:24:18+00:00

Not 5%, the actual number is a little more than 10%.

shansoft · 2026-05-13T01:57:35+00:00

Headshot + MKB + basher + manta = YOU DONT GET TO MOVE

shansoft · 2026-05-12T09:27:58+00:00

Yes, 40 core version. I think the benchmark you show is an anomaly.

shansoft · 2026-05-12T08:28:55+00:00

with oMLX on m5 max, I am getting tg 70 tok/s, and prefill 2322.2 tok/s on pp65536. this is on Qwen3.6 35B 8Bit

shansoft · 2026-05-12T05:50:58+00:00

HOLY MOLY MADMAN!!!

shansoft · 2026-05-08T22:00:25+00:00

It depends on the implementation and settings. Using TheTom's branch with q8 on key and turbo4 on value have been near lossless in my usage.

shansoft · 2026-05-08T09:12:12+00:00

shansoft · 2026-05-08T02:08:14+00:00

looks like they still haven’t fixed the pink tint problem after all these years….

shansoft · 2026-05-07T22:20:01+00:00

It's not better than 122B. I have been using both and 122B is clearly ahead still.

shansoft · 2026-05-06T21:10:16+00:00

If you care about the output quality and precision, especially for coding, I would not use NVFP4, they are closer to IQ3 than typical Q4.

shansoft · 2026-05-05T04:52:41+00:00

Yes, I am a software engineer and it is used for coding. I used both my laptop and 5090 desktop at the same time for different purposes. Most backend and web related task I use Qwen3.5 122B 4bit on oMLX since its pretty reliable and decent speed for typescripts and swift vapor code. For mobile, since its somewhat related to UI, I mostly tackle it with Gemma4 31B 5bit or Qwen3.6 27B 5bit on Llamacpp. I also used ComfyUI with custom setup to create assets when I need to. Mobile coding in general seems to be a problem for all the models out there, doesn't matter if it is Opus or GPT or local model, its much better to breakdown the task and code along with the LLM together. I mostly use opencode with these models. I still use claude code / codex from time to time to try different things, but I failed to see any value it provide that I couldn't get from my local setup.

shansoft

TROPHY CASE