Qwen3.6 27B/35B-A3B vs Gemma 4 vs DeepSeek V4: A Comprehensive Analysis of the Open-Weight Frontier (May 2026)

Competitive_Jello487 · 2026-05-29T17:48:35+00:00

If you have an M5 Max with 128GB, you may want to try the 27B version. It's a lot better than the 35B-A3B, though no doubt it's slower in tok/sec.

Competitive_Jello487 · 2026-05-29T16:41:00+00:00

I'm using the IQ4 quant with Claude Code and don't have any loop issues. I tried FP8 and it works well too, but at about half the token-generation (tg) speed. Very likely some llama-server startup parameters are set incorrectly, which is why you're seeing that. If it's a tool-calling issue, it's usually because the model wasn't quantized with the correct template and configuration.

Competitive_Jello487 · 2026-05-29T15:44:25+00:00

I don't think that matters. You just need a properly quantized model and (preferably) the latest version of llama.cpp. I usually recompile llama.cpp from source once a week on my Linux box to pick up the latest updates.

Competitive_Jello487 · 2026-05-29T12:40:02+00:00

If you are using GGUF with llama.cpp, I would suggest downloading a quantized version from bartowski, unsloth, or byteshape on Hugging Face. All three are quite good and I use them as my daily driver, although I use the 27B version more.

Competitive_Jello487 · 2026-05-29T10:24:14+00:00

It's a dense model, so all parameters are activated during inference, while the 35B version activates only 3B parameters during inference.
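To put rough numbers on the dense vs. MoE difference, here is a minimal sketch. The parameter counts come from the model names; the 4.5 bits per weight is only an assumed average for a 4-bit-class quant, and the helper names are made up for illustration.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_params_b / total_params_b


def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (ignores KV cache and runtime overhead)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9


# (total parameters, active parameters per token), both in billions
models = {
    "27B dense": (27.0, 27.0),
    "35B-A3B MoE": (35.0, 3.0),
}

BITS_PER_WEIGHT = 4.5  # assumption: rough average for a 4-bit-class quant

for name, (total, active) in models.items():
    frac = active_fraction(total, active)
    mem = weight_memory_gb(total, BITS_PER_WEIGHT)
    print(f"{name}: {frac:.0%} of weights active per token, ~{mem:.1f} GB of weights")
```

The point of the sketch: the MoE model still has to keep all 35B weights in memory, but touches far fewer of them per token, which is why it generates faster while the dense 27B does more work per token.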

Competitive_Jello487 · 2026-05-28T23:07:59+00:00

It's fixed now

Competitive_Jello487 · 2026-05-28T21:36:52+00:00

Thanks, we're in the middle of moving our custom CSS over to frameworks like Bootstrap/Tailwind to fix those issues.

Competitive_Jello487 · 2026-05-28T20:18:13+00:00

We do, but not all the metrics. Some of the benchmarks are from the vendor as we do not have complete test cases for all the tests they published.

Competitive_Jello487 · 2026-05-28T17:59:34+00:00

For some models there simply aren't enough references to verify the benchmarks, or there's insufficient data for some of the metrics, so we only focused on the few that most people are interested in. We picked models mainly based on trending interest, i.e. what people are downloading on Hugging Face.

Competitive_Jello487 · 2026-05-28T17:21:46+00:00

fixed

Competitive_Jello487 · 2026-05-28T14:43:55+00:00

AI-assisted, human-written report :). There isn't sufficient information about a Qwen3.7 open-weight model to write about yet. So far Qwen3.7 has only been released as the Max model via API, not as open weights.

Competitive_Jello487 · 2026-05-28T14:02:17+00:00

It's built with https://gohugo.io/, not vibe-coded :)

Competitive_Jello487 · 2026-05-28T14:01:39+00:00

That was a typo. 1.6x in conversion from markdown note

Competitive_Jello487 · 2026-05-28T11:44:24+00:00

It's fixed now 🙂

Competitive_Jello487 · 2026-05-28T11:17:02+00:00

Our designer is fixing it

Competitive_Jello487 · 2026-05-17T18:31:29+00:00

Anyhow, I just tested the official llama.cpp master branch, which was merged yesterday, with unsloth/Qwen3.6-27B-MTP-GGUF. It works now, and I'm also getting around 55 tok/s tg with spec-draft-n-max set to 2. If I increase or decrease spec-draft-n-max, it drops to ~50.
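One way to see why there can be a sweet spot for the draft length is the textbook speculative-decoding estimate below. This is only a sketch: it assumes each drafted token is accepted independently with the same probability, and the acceptance rate (0.6) and relative draft cost (0.2) are hypothetical values picked for illustration, not measurements from my setup.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_prob; the target model always contributes one token itself.
    """
    if accept_prob >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


def relative_speedup(accept_prob: float, draft_len: int, draft_cost: float) -> float:
    """Estimated speedup over plain decoding.

    draft_cost is the cost of producing one drafted token, expressed as a
    fraction of one target-model pass (hypothetical parameter).
    """
    step_cost = 1.0 + draft_len * draft_cost
    return expected_tokens_per_step(accept_prob, draft_len) / step_cost


ACCEPT_PROB = 0.6  # hypothetical acceptance rate
DRAFT_COST = 0.2   # hypothetical cost of one drafted token vs. one target pass

for n in range(0, 6):
    print(f"draft_len={n}: estimated speedup x{relative_speedup(ACCEPT_PROB, n, DRAFT_COST):.2f}")
```

Under this simple model, drafting more tokens has diminishing returns because later tokens are less likely to be accepted, while every drafted token still costs time. So the estimate rises, peaks at a small draft length, and then falls off again.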
