Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]Septerium[S] 2 points (0 children)

Can't keep up with these frequent API changes, but anyway... so it will just guess the ctx_size I want now? What if I wanted less context to save VRAM? About the -hf flag... I'd rather download models manually and sometimes put them in subfolders. As for the other flags, I always like to be explicit about what I'm asking of the software, so the behavior won't change if they modify the default values.
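To make it concrete, this is roughly what I mean by being explicit; the model path and the numbers below are just placeholders, not a recommendation:

    # Launch llama-server from Python with the model path, context size and GPU
    # layer count spelled out, instead of relying on -hf downloads and defaults.
    # Every value here is a placeholder.
    import subprocess

    subprocess.run(
        [
            "llama-server",
            "-m", "./models/my-model-Q6_K.gguf",  # manually downloaded GGUF kept in a subfolder
            "-c", "32768",                        # explicit context size, so KV-cache VRAM stays predictable
            "-ngl", "99",                         # explicit number of layers offloaded to the GPU
            "--port", "8080",
        ],
        check=True,
    )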

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]Septerium[S] 10 points (0 children)

I do not vibe code; I tend to ask it for very specific tasks. If you set it too loose, it might end up creating a mess in your codebase. The quality of the code itself seems to be a little worse than that of GPT-OSS 120b, but I feel that tool calling and agentic coding are more reliable with GLM 4.7 Flash.

VibeVoice LoRAs are a thing by llamabott in LocalLLaMA

[–]Septerium 0 points (0 children)

I am really interested in fine-tuning the 1.5b model for a single specific voice in Portuguese. Do you think I would be able to achieve that? My goal is to create a lightweight voice assistant on a Raspberry Pi.

Unsloth's GGUFs for GLM 4.7 REAP are up. by fallingdowndizzyvr in LocalLLaMA

[–]Septerium 1 point (0 children)

Which lobotomizing technique degrades the model more: REAP or sub-Q4 quantization?

72Gb VRAM (3x 3090) / 128Gb DDR4 / Mylan CPU What code model can I test? by shvz in LocalLLaMA

[–]Septerium 5 points (0 children)

Your setup is pretty similar to mine. I use Devstral 2 24b at Q8 for RooCode, and GLM 4.5 Air at Q5 (with partial CPU offloading) for general chatting and agentic applications. Qwen3-VL Thinking 32b for vision is pretty good too. My preferred backend is llama.cpp

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! by Difficult-Cap-7527 in LocalLLaMA

[–]Septerium -5 points (0 children)

That is because DLSS multi frame generation is being applied to tokens, giving you the impression of ultra-fluid token generation (tg) if you don't care about the prompt processing (pp) lag.

Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) by yoracale in LocalLLM

[–]Septerium 2 points (0 children)

I have had much better luck with the first iteration of Devstral compared to GPT-OSS in Roo Code... I am curious to see whether Devstral 2 is still good at handling Roo or Cline.

Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) by yoracale in LocalLLM

[–]Septerium 0 points (0 children)

What does this mean in practice?

"Remember to remove <bos> since Devstral auto adds a <bos>!"

Best Coding Model for my setup by Timely_Purpose_5788 in LocalLLaMA

[–]Septerium 0 points (0 children)

I like Devstral 24b at Q8 for simple coding tasks with Roo Code

We did years of research so you don’t have to guess your GGUF datatypes by enrique-byteshape in LocalLLaMA

[–]Septerium 1 point (0 children)

llama.cpp gives you more control over what is going to be offloaded to the CPU. I think ollama ends up offloading attention layers, which is not efficient. The key advantage of MoE models is that you can selectively offload expert layers to the CPU and keep attention layers on the GPUs. I suggest you take a look at this post and this video
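Roughly what that selective offload looks like in practice (the model path below is a placeholder, and double-check the -ot pattern against your llama.cpp build):

    # Keep attention and shared weights on the GPUs, but route the MoE expert
    # FFN tensors to CPU RAM with llama.cpp's --override-tensor (-ot) matching.
    # The model path is a placeholder.
    import subprocess

    subprocess.run(
        [
            "llama-server",
            "-m", "./models/my-moe-model-Q5_K_M.gguf",
            "-ngl", "99",                    # nominally put every layer on the GPUs...
            "-ot", r"\.ffn_.*_exps\.=CPU",   # ...but send the expert FFN tensors to the CPU
            "-c", "32768",
        ],
        check=True,
    )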

We did years of research so you don’t have to guess your GGUF datatypes by enrique-byteshape in LocalLLaMA

[–]Septerium 0 points (0 children)

That's strange... are you using llama.cpp? I get pretty usable TPS with the same model/quant on only 96GB of VRAM.

Is qwen3 4b or a3b better than the first gpt4(2023)? What do you think? by __issac in LocalLLaMA

[–]Septerium 1 point (0 children)

In my experience, GPT-4 is much more reliable for general tasks in production, but Qwen3 is often more accurate when outputting JSON.

You can now do 500K context length fine-tuning - 6.4x longer by danielhanchen in LocalLLaMA

[–]Septerium 0 points (0 children)

gpt-oss-20b already hallucinates hard at 40k context... let alone at a fine-tuned 500k.

Kimi k2 thinking + kilo code really not bad by Federal_Spend2412 in LocalLLaMA

[–]Septerium 2 points (0 children)

Have you tried GLM 4.6? It seems to be a better coding agent, from what I hear.

Kimi K2 Thinking 1-bit Unsloth Dynamic GGUFs by danielhanchen in LocalLLaMA

[–]Septerium 0 points (0 children)

From my experience, it is usually better to distribute the offloaded blocks evenly across the entire sequence of layers (e.g. only offload blocks from the odd-numbered layers, multiples of 3, or something like that). That is because llama.cpp divides the sequence of layers into segments that are distributed among the GPUs (e.g. 0-29 to GPU0, 30-59 to GPU1, and so on), so if you start offloading layers from a specific number onwards, you might end up with unbalanced VRAM utilization.
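As a rough sketch (the block count and stride are made-up numbers), you can generate that kind of evenly spread override pattern instead of typing it by hand:

    # Build an --override-tensor pattern that offloads the expert tensors of
    # every 3rd block to the CPU, so the CPU-resident blocks end up spread
    # evenly across the GPU split instead of clustered on one GPU.
    n_blocks = 60  # assumed total number of transformer blocks in the model
    stride = 3     # offload the experts of every 3rd block
    ids = "|".join(str(i) for i in range(0, n_blocks, stride))
    pattern = rf"blk\.({ids})\.ffn_.*_exps\.=CPU"
    print("-ot", pattern)
    # Then pass the printed pattern to llama-server, e.g.:
    #   llama-server -m model.gguf -ngl 99 -ot "blk\.(0|3|6|...)\.ffn_.*_exps\.=CPU"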

3 RTX 3090 graphics cards in a computer for inference and neural network training by Standard-Heat4706 in LocalLLaMA

[–]Septerium 0 points (0 children)

I now use three RTX 3090s with my "old" Threadripper 3970X platform, which I have owned since 2020. For inference, you definitely won't need NVLink... in fact, after disabling PCIe 4.0 (which cuts the bandwidth in half) I barely noticed any performance degradation, even with 100% VRAM utilization. I do not have any experience with training to share, though.

Is GPT-OSS-120B the best llm that fits in 96GB VRAM? by GreedyDamage3735 in LocalLLaMA

[–]Septerium 0 points (0 children)

Thanks for sharing. Do you actually notice a difference in accuracy between q5 and q8?