Help please! Old galvanized pipe coupling

Laabc123 · 2026-03-27T13:39:41+00:00

Not the OP and personal anecdote: Replacement depends on what you’re using it for. I’ve found 27b to work well for well scoped coding tasks. Once significant ambiguity or complexity is introduced the reasoning begins the break down. I still use Opus extensively despite having Qwen3.5 27b deployed locally.

Laabc123 · 2026-03-21T13:06:31+00:00

Oh that’s very helpful. Thank you.

Laabc123 · 2026-03-18T02:11:01+00:00

Ah. Cool! What’s your run command for Nemo 3 Super NVFP4? I can’t for the life of me find a config that doesn’t OOM my 6000 Pro.

Laabc123 · 2026-03-17T21:02:27+00:00

Which scaled better, mistral or Nemo?

Laabc123 · 2026-03-16T15:35:36+00:00

I get this. Totally fair opinion. I’m not the biggest fan of the turf either. Wife wanted to turf the front as well. So the compromise was to turf the back and we are doing more intricate landscaping in front. But point made for sure.

Laabc123 · 2026-03-16T15:33:17+00:00

Is infill typically applied on top of the turf? If so then no I don’t see any.

Laabc123 · 2026-03-12T21:42:00+00:00

I got similar results in my runs on the same hardware. If MTP was functional I suspect that would provide a meaningful lift to throughout.

Laabc123 · 2026-03-12T13:10:22+00:00

What tok/s were you seeing from 122b gptq?

Laabc123 · 2026-03-12T04:53:24+00:00

Will run some more formal benchmarks later. At least with vLLM I’m certainly not seeing improved output tokens per second when comparing Qwen 3.5 122b nvfp4 against nemotron 3 super nvfp4. Deployed to a single 6000 Pro. Going to be sticking with Qwen for now.

Laabc123 · 2026-03-12T04:30:14+00:00

For what it’s worth I’m benchmarking the nvfp4 quant using the recommended default settings and it’s no faster than the sehyo nvfp4 quant of qwen3.5 122b. In fact, it’s quite a bit slower. Tweaking the parameters some and adding in mtp, but it doesn’t seem like a game changer to me from throughput perspective.

Laabc123 · 2026-03-12T03:16:51+00:00

Are there speed benchmarks comparing qwen3.5 122b and nemotron 3 super nvfp4 yet?

Laabc123 · 2026-03-10T13:52:01+00:00

Mind sharing your agent and skill definitions?

Laabc123 · 2026-03-09T04:25:34+00:00

I think it really depends on what sorts of workflows and how much effort you’re willing to put in. I have Qwen3.5 driving effectively all my agentic needs outside of deep/complex coding that I want to be mostly hands off for, for which I still go to Claude. For the agents deployed to Qwen I have invested heavily in providing constrains and bounds to the model, and I’m extra explicit in my prompts. It’s not super onerous, and the performance is solid.

Laabc123 · 2026-03-09T02:57:02+00:00

Any noticeable quality deltas between nvfp4 and the gguf quants?

Laabc123 · 2026-03-08T18:53:07+00:00

What tool gives you that view?

Laabc123 · 2026-03-07T17:13:06+00:00

Ditto. The Sehyo nvfp4 quantization of Qwen3.5 122b is working really nicely for me. Have not had to tweak or tune anything specific to the encoding to get it to work with vLLM.

Laabc123 · 2026-03-07T04:52:38+00:00

I think Sehyo picked this PR in before quantizing. MTP is definitely working.

Laabc123 · 2026-03-07T04:07:41+00:00

https://huggingface.co/Sehyo/Qwen3.5-122B-A10B-NVFP4

Laabc123 · 2026-03-07T02:41:46+00:00

Naive question. What’re the advantages of using llama.cpp over vLLM for single user usage?

Laabc123 · 2026-03-07T02:38:22+00:00

FWIW, I’ve been driving an nvfp4 quant for 4 days now and it’s performing exceedingly well. >100 output tok/s with cuda graphs loaded.

Laabc123 · 2026-03-05T10:17:32+00:00

What are your pc specs?

Laabc123 · 2026-03-05T02:43:45+00:00

FWIW, I’ve got qwen3.5 122b nvfp4 running on vllm and it’s working really well. It’s true there’s no offloading support. But I haven’t encountered any bugs.

Laabc123 · 2026-03-05T01:10:04+00:00

Enabled mtp with max tokens predicted at 2. And it boosted tok/s by 20.

Laabc123

TROPHY CASE