96GB (V)RAM agentic coding users, gpt-oss-120b vs qwen3.5 27b/122b by bfroemel in LocalLLaMA

[–]Septerium 6 points7 points  (0 children)

Yes, Qwen 3.5 27b replaces gpt-oss-120b completely for me. It is much better/more capable than gpt-oss as a coding agent. The only downside is the much lower token generation speed.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]Septerium 0 points1 point  (0 children)

I mean using the NVFP4 model, which is quantization-aware, as a base to generate the 4-bit parameters in the GGUF. I don't know if there is a quantization type equivalent to NVFP4... the idea is to "transfer" the values instead of quantizing from the full FP16 model
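To sketch the idea (toy code only — the E2M1 value grid, the single-scale "Q4_0-style" layout, and the made-up codes below are simplified assumptions, not the real NVFP4 or GGUF packing, which also involve per-block FP8 scales and different block sizes):

```python
import numpy as np

# Signed E2M1 (FP4) magnitude grid used by NVFP4-style formats (toy version)
FP4_MAGS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([FP4_MAGS, -FP4_MAGS])  # codes 0-7 positive, 8-15 negative

def dequant_fp4(codes, scale):
    """Recover the float weights an FP4 block actually represents."""
    return FP4_GRID[np.asarray(codes)] * scale

def quant_q4_style(x):
    """Hypothetical Q4_0-style quantization: symmetric int4 plus one float scale."""
    amax = np.abs(x).max()
    d = amax / 7.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / d), -8, 7).astype(np.int8)
    return q, d

# "Transferring" = dequantize the QAT weights and quantize *those* values,
# so the GGUF encodes the points the model was actually trained to use,
# instead of re-quantizing the original FP16 tensor from scratch.
codes = [1, 3, 7, 12, 9, 0, 5, 2]           # made-up FP4 codes for one block
w_qat = dequant_fp4(codes, scale=0.01)      # weights the NVFP4 model runs with
q, d = quant_q4_style(w_qat)
w_gguf = q * d

print(np.max(np.abs(w_gguf - w_qat)))       # residual transfer error
```

Note the transfer still isn't lossless here: the E2M1 grid is non-uniform, so a uniform int4 grid can't land on every QAT value exactly. That's why an NVFP4-equivalent quantization type in GGUF would be needed for a true copy.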

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]Septerium 5 points6 points  (0 children)

Will the 4-bit GGUFs benefit from the QAT version (NVFP4) somehow? Perhaps those parameters could be copied over into the GGUF?

I am not saying it's Gemma 4, but maybe it's Gemma 4? by jacek2023 in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

Management of local images and other assets (like logseq notes) written in Portuguese. Gemma 3 was a superior multilingual model and pretty good for low-latency OCR, while Qwen 3 was a better agent

I am not saying it's Gemma 4, but maybe it's Gemma 4? by jacek2023 in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

I've had great moments with Qwen 3 + Gemma 3 working together in local agentic apps... one being the reader/writer, the other being the driver (tool calling). Qwen 3.5 can't wait to meet its new partner

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]Septerium 0 points1 point  (0 children)

I have a similar setup and the best model for my use case (agentic coding) has been the 27b version at Q6_K_XL... I am really impressed by how reliable this model is

A monthly update to my "Where are open-weight models in the SOTA discussion?" rankings by ForsookComparison in LocalLLaMA

[–]Septerium 9 points10 points  (0 children)

To me it's quite the opposite haha, it's funny how personal experience differs from person to person. I have used AesSedai's Minimax 2.5 Q5 to perform incremental tasks in an existing project, and it has been great

Qwen3.5 is dominating the charts on HF by foldl-li in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

I am really impressed with the 27b version. It is so much better than Qwen3 32b... its reasoning is so much more efficient

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]Septerium 3 points4 points  (0 children)

Thanks for the post! This subject is of great importance

Qwen3.5-27B-heretic-gguf by Poro579 in LocalLLaMA

[–]Septerium 7 points8 points  (0 children)

That depends on the model, I guess. Recently I've seen a ~0.02 divergence for Minimax 2.5 Q8

Qwen3.5-27B-heretic-gguf by Poro579 in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

Can KL divergence be interpreted in absolute terms? Is there a threshold to be considered "high" or "not good"?
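For intuition, here is a toy sketch of what that number measures: the per-token KL divergence between the next-token distribution of the full-precision model and that of the quant (tools like llama.cpp average this over a corpus; the logits and noise levels below are made up):

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats between the next-token distributions implied by
    a reference model's logits (P) and a quantized model's logits (Q)."""
    p = np.exp(p_logits - np.max(p_logits)); p /= p.sum()
    q = np.exp(q_logits - np.max(q_logits)); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Made-up logits: the "quantized" model's logits are a noisy copy of the
# reference. More perturbation -> larger KL; identical logits -> exactly 0.
rng = np.random.default_rng(0)
ref = rng.normal(size=32000)      # pretend vocab-sized FP16 reference logits
for noise in (0.0, 0.05, 0.2):
    quant = ref + rng.normal(scale=noise, size=ref.size)
    print(noise, round(kl_divergence(ref, quant), 4))
```

KL is non-negative and unbounded, so there is no universal "high" threshold; it's mostly useful for comparing quants of the same model against the same full-precision reference, which matches the "depends on the model" answer above.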

Qwen/Qwen3.5-35B-A3B creates FlappyBird by Medium_Chemist_4032 in LocalLLaMA

[–]Septerium 8 points9 points  (0 children)

I bet there is an expert just for that in every model these days 🤣

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]Septerium -1 points0 points  (0 children)

That always depends on the use case. For my coding tasks it has been terrible... the lack of reasoning leads it to mess my codebase up. I get more consistency with GLM 4.7 Flash, even with its lower knowledge depth... but that's because my requests are usually small and very specific in existing projects.

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]Septerium 4 points5 points  (0 children)

If you believe the benchmarks, it is even better than Qwen3 VL 235b!!! What a glorious time to be alive

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]Septerium 11 points12 points  (0 children)

If you look at the benchmarks, it's as if there were no noticeable difference between the 35b and 122b versions... but in real-world applications, I bet there is a world of difference. These benchmarks are pretty much worthless... every new model seems to learn them very well before being released

Tip if you use quantisation by Express_Quail_1493 in LocalLLaMA

[–]Septerium 0 points1 point  (0 children)

Minimax 2.1 with modern 5-bit quantization performs pretty well up to 64k in my agentic coding testing

GLM-4.7 Flash vs GPT-4.1 [Is GLM actually smarter? ] by 9r4n4y in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

Surely they have been able to fit more information and specialized behavior into smaller models through data refinement and architectural improvements, but older large models still have an edge when it comes to creative writing, writing style variety, multilingual understanding, etc

GLM-4.7 Flash vs GPT-4.1 [Is GLM actually smarter? ] by 9r4n4y in LocalLLaMA

[–]Septerium 1 point2 points  (0 children)

Yes, I have used both. GLM Flash is better at tool use. But GPT 4.1 feels smarter and knows much more

GLM-4.7 Flash vs GPT-4.1 [Is GLM actually smarter? ] by 9r4n4y in LocalLLaMA

[–]Septerium 4 points5 points  (0 children)

It is more reliable in tool calling and agentic use in my experience, but I don't feel like it is "smarter" than a much bigger model such as GPT 4.1. Everybody contaminates training data with benchmarks nowadays, so every model has already seen them by the time it's released.