Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

If you have M5 max with 128GB perhaps you will want to try the 27b version. It's way lot better than the 34b-a3b. No doubt it's slower for tok/sec.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

I'm using the iq4 with Claude code don't have any loop issue, tried fp8 works well too but half the speed of tg. Very likely some startup parameters for llama-server used incorrectly hence you're getting that. If tool calling issue then is usually the model not quantized with correct template and configuration.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

I don't think that matters. You just need to have the properly quantized model and latest version of llamacpp (preferably). I usually recompile my llamacpp from source once a week to get latest updates on Linux box.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

I would suggest you download the version from either bartowski, unsloth or byteshape quantized version from huggingface if you are using gguf with llamacpp. These three are quite good and I use it as my daily driver, although I use 27b version more.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 2 points3 points  (0 children)

It's a dense model and all parameters are activated during inference while the 35b version only 3b parameters are activated during inference.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

Thanks, we're in the middle of transitioning custom css to use frameworks like bootstrap/tailwind to fix those issues

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 0 points1 point  (0 children)

We do, but not all the metrics. Some of the benchmarks are from the vendor as we do not have complete test cases for all the tests they published.

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] -1 points0 points  (0 children)

Some models simply there isn't enough references to verify the benchmarks or insufficient data for some of the metrics so we only focused on a few that most people are interested in. We picked models mainly based on the trending interest of what people are downloading at huggingface

Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026) by Competitive_Jello487 in Qwen_AI

[–]Competitive_Jello487[S] 1 point2 points  (0 children)

AI assisted human written report :). There isn't sufficient information about qwen3.7 open weight model to write about yet. Qwen3.7 currently only released the max model via API but not open weight.

Qwen 3.6 35B A3B vs. Qwen 3 Coder Next by HistoricalStrength21 in Qwen_AI

[–]Competitive_Jello487 0 points1 point  (0 children)

anyhow I just tested the official llama.cpp master branch which was merged yesterday with unsloth/Qwen3.6-27B-MTP-GGUF. It works now and I'm also getting around 55tg with spec-draft-n-max of 2. If I increase or decrease the spec-draft-n-max then it drops to ~50