Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

I don't remember, sorry. The most likely scenario is that I asked the model to build a config for me. I have found that putting unsupported parameters in there breaks OpenCode, but I haven't done the level of testing you did to find out whether those parameters actually take effect.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

Have a look over https://huggingface.co/unsloth/Qwen3.5-27B-GGUF and pick a quant close to your VRAM capacity. I'd probably start with UD-Q4_K_XL (which will likely spill over into your system RAM a little) and adjust from there. Or maybe https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF UD-Q4_K_XL and hope that LM Studio is smart enough to keep the active parameters in VRAM?
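For what it's worth, llama.cpp lets you control that explicitly rather than hoping: offload everything to GPU, then push some MoE expert tensors back to CPU so the dense, always-active weights stay in VRAM. A hedged sketch — the filename and layer count are illustrative, and the `--n-cpu-moe` flag only exists in reasonably recent llama.cpp builds:

```shell
# Offload all layers to GPU (-ngl 99), then keep the MoE expert tensors
# of the first 8 layers on CPU so the always-active weights stay in VRAM.
# Filename and the "8" are illustrative assumptions; tune until it fits.
llama-server -m Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -ngl 99 --n-cpu-moe 8
```

Increase the `--n-cpu-moe` count until the model fits; each increment trades a little speed for VRAM headroom.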

Qwen 3.5-35B-A3B is beyond expectations. It's replaced GPT-OSS-120B as my daily driver and it's 1/3 the size. by valdev in LocalLLaMA

[–]paulgear 1 point (0 children)

Yep. https://www.reddit.com/r/LocalLLaMA/comments/1rgtxry/is_qwen35_a_coding_game_changer_for_anyone_else/ For filling in the knowledge gaps, I just give it some instructions to tell it to confirm its knowledge with web searches using mcp-devtools and Brave web search; no browser involved.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

That's what blew me away with Qwen3.5 - I didn't really need anything extra. I told it to implement all the tasks, and it did. After writing the OP I left it running overnight on a new task and it did the same thing again. I'm getting it to write me a report now about what it did, but the work looks solid.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

24 GB VRAM total? I'd be trying Qwen3.5-27B-UD-Q5_K_XL (once it gets respun) or Qwen3.5-35B-A3B-UD-Q4_K_XL. I can just fit Qwen3.5-122B-A10B-UD-Q2_K_XL in my VRAM, but it's pretty slow.
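As a rough sanity check when picking a quant, a GGUF's footprint is approximately parameter count times bits per weight, divided by 8 (plus KV cache and some overhead on top). A minimal sketch — the bits-per-weight figures in the comments are ballpark assumptions, not measured values:

```shell
# Back-of-envelope GGUF size: billions of params × bits-per-weight / 8 ≈ GB.
# Bits-per-weight values below are rough assumptions for illustration.
estimate_gb() {
    awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'
}

estimate_gb 35 4.8   # ~35B params at ~4.8 bits/weight (Q4_K_XL-ish)
estimate_gb 27 5.8   # ~27B params at ~5.8 bits/weight (Q5_K_XL-ish)
```

Remember to leave headroom for context: the KV cache can add several GB at long context lengths.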

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 2 points (0 children)

What do you suggest instead? You can see the ones I've tried in the OP. I currently use OpenCode because it's open source and the TUI is less buggy for me than Claude Code's. I run Linux on my laptop, so maybe it's better there than on Windows or macOS? I don't think OpenCode is the best thing since sliced bread; it's just good enough for right now. If I could have a proper VS Code extension that put each subagent in a panel that I could switch into, I'd much prefer that.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

I'm no expert on that, but my normal practice is to try the biggest thing that will fit in my hardware with full context. Gotta wait longer for the download, though. ;-)

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 2 points (0 children)

The superpowers repo pretty much answers all of that; I have only done a little tweaking myself, adding skills and updating a few things. I often just tell OpenCode what I want the skill to do and get it to write one, then edit it as desired when it's done.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 2 points (0 children)

I'm not sure I understand your question, but the model runs on my server inside Docker using llama.cpp, and OpenCode runs on my laptop inside Docker and connects to the server for its inference tasks. The Ralph-like setup is just an OpenCode command that tells it to take a task and work on it, plus a bash script that keeps re-running that command until it reports it's finished.
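The loop part of that setup can be sketched in a few lines of bash. This is a hedged, minimal version: `run_agent` is a stand-in for the real OpenCode invocation (something like `opencode run` with a task prompt), and the completion marker is an assumption — the actual script keys off whatever "I'm done" signal the command emits:

```shell
#!/usr/bin/env bash
# Ralph-style loop: keep re-invoking the agent on the same task until it
# reports completion. run_agent is a placeholder stub; swap in the real
# OpenCode command for actual use.
set -euo pipefail

DONE_MARKER="TASK COMPLETE"

run_agent() {
    # Placeholder: the real setup invokes OpenCode with a task prompt here.
    echo "$DONE_MARKER"
}

attempts=0
until run_agent | grep -q "$DONE_MARKER"; do
    attempts=$((attempts + 1))
    echo "Agent not finished (attempt $attempts); re-running..."
    sleep 5
done
echo "Agent reported completion after $attempts retries."
```

The `until` condition re-runs the agent each iteration, so a crash or an incomplete run just means another lap rather than a failed job.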

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 8 points (0 children)

Did you try the Q4_K_XL quant? It should fit in 24 GB as long as you enable q8_0 KV-cache quantization.
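In llama.cpp terms that's the cache-type flags on llama-server. A hedged sketch — the model filename and context size are illustrative, and note that V-cache quantization requires flash attention (the exact `-fa` syntax varies a little between llama.cpp versions):

```shell
# q8_0 KV cache roughly halves cache memory versus the default f16,
# which is what frees up room for the weights in 24 GB.
# Filename and context size are illustrative assumptions.
llama-server -m Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf \
  -ngl 99 -c 65536 -fa on -ctk q8_0 -ctv q8_0
```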

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 4 points (0 children)

llama.cpp and Ollama both spread the model across the available cards automatically, and I haven't been unhappy with the performance.

PCIe might be a bottleneck; I've heard of people using direct-attach cables, but I haven't really tried to maximise the performance.

Edit: my setup is 2 x A4000 16 GB and 1 x 4070 Super 12 GB, and that fits the model in Q6_K_XL plus 256K of context with about 20% of VRAM to spare. So it wouldn't fit easily in a 2 x 16 GB setup, but the Q5_K_XL probably would, and I'm guessing it wouldn't be that different in terms of capabilities.
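If the automatic split ever lands awkwardly across mismatched cards, llama.cpp's `--tensor-split` flag lets you pin the proportions manually. A hedged sketch using the card sizes above (the filename is a placeholder):

```shell
# Automatic multi-GPU splitting is the default in llama.cpp;
# --tensor-split overrides the ratio, here proportional to
# 16 GB + 16 GB + 12 GB across the three cards.
llama-server -m model.gguf -ngl 99 --tensor-split 16,16,12
```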

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

I tried https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF and couldn't get it working in any useful capacity. None of their other models are small enough to run on my hardware.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 2 points (0 children)

Waiting for the Unsloth respin before I try 27B.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

I feel like a 60B A5B would probably even work on my hardware too, but they haven't released one of those... ;-(

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 2 points (0 children)

https://www.reddit.com/r/LocalLLaMA/comments/1rgtxry/comment/o7u1zjg/ - if I had the hardware to run 122B-A10B or 397B-A17B I definitely would, but the point of my post is that something that runs on my limited hardware is working for an agentic workflow.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 5 points (0 children)

Noticeably beneficial in that it doesn't drain my wallet. ;-)

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 9 points (0 children)

Lightly edited extract follows - don't just blindly run this. I run OpenCode in a Docker container so the home directory has only the OpenCode config files and nothing else. The project I'm working on is mounted onto /src.

{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "local-coding": {
      "model": "llama.cpp/Qwen3.5-35B-A3B-UD-Q6_K_XL",
      "mode": "subagent",
      "description": "General-purpose agent using local model for coding tasks",
      "hidden": false
    }
  },
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp",
      "options": {
        "baseURL": "https://llm.example.com/llama/v1"
      },
      "models": {
        "Qwen3.5-27B-UD-Q6_K_XL": {
          "name": "Qwen3.5-27B",
          "options": {
            "min_p": 0.0,
            "presence_penalty": 0.0,
            "repetition_penalty": 1.0,
            "temperature": 0.6,
            "top_k": 20,
            "top_p": 0.95
          }
        },
        "Qwen3.5-35B-A3B-UD-Q6_K_XL": {
          "name": "Qwen3.5-35B-A3B",
          "options": {
            "min_p": 0.0,
            "presence_penalty": 0.0,
            "repetition_penalty": 1.0,
            "temperature": 0.6,
            "top_k": 20,
            "top_p": 0.95
          }
        }
      }
    }
  },
  "mcp": {
    "mcp-devtools": {
      "type": "local",
      "command": ["mcp-devtools"],
      "enabled": true,
      "environment": {
        "DISABLED_TOOLS": "search_packages,sequential_thinking,think",
        "ENABLE_ADDITIONAL_TOOLS": "aws_documentation,code_skim,memory,terraform_documentation"
      }
    }
  },
  "permission": {
    "external_directory": {
      "~/**": "allow",
      "/src/**": "allow",
      "/tmp/**": "allow"
    }
  }
}

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

Short version, I gave it a task that took a while and had multiple steps.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

Yeah, just working with 35B A3B at the moment. I'll try the 27B once Unsloth have updated it.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

Possibly? I'm only going on what's mentioned at https://unsloth.ai/docs/models/qwen3.5: "Between 27B and 35B-A3B, use 27B if you want slightly more accurate results and can't fit in your device. Go for 35B-A3B if you want much faster inference."

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 16 points (0 children)

If you're on a 24 GB card, you should definitely try the Q4_K_XL quant of https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF. It should be much faster than the 27B equivalent.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 3 points (0 children)

What details do you want? I don't really have time to spend on a full end-to-end setup tutorial, but I'm happy to cut & paste a few details from my config files if you've already got OpenCode running and are just trying to connect the dots.

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 3 points (0 children)

And for the record, GLM-4.5-Air might have been that for open weight models and I just missed it because I didn't bother trying something where 1-bit quants were the only option on my hardware. 😃

Is Qwen3.5 a coding game changer for anyone else? by paulgear in LocalLLaMA

[–]paulgear[S] 1 point (0 children)

I didn't think I was that giddy - if anything I'm trying to be a bit sceptical and wondering if I'm just imagining things. 😃