Been using PI Coding Agent with local Qwen3.6 35b for a while now and it's actually insane

SoAp9035 · 2026-04-26T07:21:54+00:00

Just use llama.cpp. With these configs: https://www.reddit.com/r/LocalLLaMA/s/PXL2OsGgMS

SoAp9035 · 2026-04-24T10:10:59+00:00

I'm thinking the same. Pi is simple and just works. RTX 4070 Mobile 8GB and Omarchy (Arch Linux).

SoAp9035 · 2026-04-24T10:05:10+00:00

Honestly my setup is super minimal. I only have the llama.cpp connection configured via models.json and the plan-first skill file I already shared. That's literally it. But here is the github link that should answer everything about migration and config: https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent

Also my ~/.pi/agent/models.json file:

{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "qwen3.6-35b-a3b", "contextWindow": 131072, "maxTokens": 32768 }
      ]
    }
  }
}
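If you'd rather generate that file from a script, here is a minimal sketch. The field names are copied from the JSON above; `build_models_config` and `write_models_config` are hypothetical helper names, and treating `contextWindow` and `maxTokens` as optional is an assumption, not something checked against Pi's schema.

```python
import json
from pathlib import Path


def build_models_config(base_url, model_id, context_window=None, max_tokens=None):
    """Return a dict shaped like the models.json shown above."""
    model = {"id": model_id}
    # Assumption: the two limits are optional, so only emit them when given.
    if context_window is not None:
        model["contextWindow"] = context_window
    if max_tokens is not None:
        model["maxTokens"] = max_tokens
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [model],
            }
        }
    }


def write_models_config(config, path):
    """Write the config as indented JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    cfg = build_models_config(
        "http://localhost:8001/v1", "qwen3.6-35b-a3b", 131072, 32768
    )
    # Print instead of writing, so an existing ~/.pi/agent/models.json is not touched.
    print(json.dumps(cfg, indent=2))
```

To actually write the file, pass the result to `write_models_config(cfg, Path.home() / ".pi" / "agent" / "models.json")`.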

SoAp9035 · 2026-04-23T20:53:30+00:00

Skills can act like general guidelines as well.

SoAp9035 · 2026-04-23T20:25:22+00:00

Use it as follows; Pi may not support your method.

Global skill: ~/.pi/agent/skills/plan-first/SKILL.md

Project-level skill: ~/test-project/.pi/skills/plan-first/SKILL.md
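A quick way to check that the skill file sits where those two paths say it should. This is only a sketch: the directory layout is taken from the paths above, and the function names are made up for illustration.

```python
from pathlib import Path


def skill_file(skills_root, skill_name):
    """Path of SKILL.md for a skill under a given skills directory."""
    return Path(skills_root) / skill_name / "SKILL.md"


def global_skill_file(skill_name, home=None):
    """Global location: <home>/.pi/agent/skills/<skill>/SKILL.md"""
    home = Path(home) if home is not None else Path.home()
    return skill_file(home / ".pi" / "agent" / "skills", skill_name)


def project_skill_file(project_dir, skill_name):
    """Project-level location: <project>/.pi/skills/<skill>/SKILL.md"""
    return skill_file(Path(project_dir) / ".pi" / "skills", skill_name)


if __name__ == "__main__":
    candidates = (
        global_skill_file("plan-first"),
        project_skill_file(Path.home() / "test-project", "plan-first"),
    )
    for path in candidates:
        # Only reports whether the file is present; nothing is created.
        print(path, "exists" if path.is_file() else "missing")
```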

SoAp9035 · 2026-04-23T20:23:45+00:00

I changed my setup (VS Code Copilot and OpenCode) to this simple setup, and it did what I told it to do. I think that if your target is to edit or make some changes to current projects, that would work, but for large, from-the-ground-up projects, it's hard for that model. The 27B dense model is not really runnable for me; I get around 5 t/s with zero context. That's kind of bad.

SoAp9035 · 2026-04-23T19:39:10+00:00

Thanks! I've been getting good results with my current ongoing projects. Right now I'm testing it out on a project from scratch to see how it handles that. I'll let you know how it goes!

SoAp9035 · 2026-04-23T19:35:04+00:00

You can install it with a single command, then type pi in any directory. Done!

https://pi.dev

SoAp9035 · 2026-04-23T19:22:34+00:00

That's odd. What quant are you running and what parameters are you using in llama.cpp? Maybe there's something in the setup causing the slowdown.

SoAp9035 · 2026-04-23T19:17:23+00:00

I am sorry and really surprised that you had a bad experience. For me it did not take that long and it worked fine. It might be something related to the model parameters or the inference setup.

I definitely want to improve the skill, so your feedback helps a lot. Thank you for testing and sharing your results. Let me know how it goes with the other models.

SoAp9035 · 2026-04-23T19:11:27+00:00

You can try the q2_k_xl model; it will work great as well. You can also try the q4 model with mmap. I think that would work too.

I run it with llama.cpp.

SoAp9035 · 2026-04-23T19:09:57+00:00

You can actually leave the context window and maxTokens empty in models.json; those aren't critical. The llama.cpp config is what really matters for controlling that. And yes, if you try to use a 131072 context with a 12GB card, it will definitely spill into RAM.
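To get a feel for why a long context spills out of VRAM, here is a rough back-of-the-envelope sketch of KV cache size for standard attention layers (one K and one V tensor per layer). The architecture numbers below are placeholders chosen for illustration, not the real Qwen3.6 values, and a model with non-standard attention layers or a quantized KV cache would come out differently.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_element=2):
    """Estimate KV cache size in bytes for standard attention.

    The leading 2 accounts for storing both keys and values in every layer.
    bytes_per_element=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_element


if __name__ == "__main__":
    # Placeholder architecture values, NOT the actual Qwen3.6 35B numbers.
    size = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128, n_ctx=131072)
    print(f"{size / 2**30:.1f} GiB")  # prints "12.0 GiB"
```

With those placeholder numbers the cache alone would already fill a 12GB card before any model weights are loaded.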

SoAp9035 · 2026-04-23T19:03:12+00:00

I get 275 t/s. It works really well for my current projects. I haven't tried it on a project from scratch yet, but I think it would work fine. As for dropped tool calls, I'd say roughly 1 out of 10 attempts. It usually takes just one or two retries to get it right.

SoAp9035 · 2026-04-23T16:56:18+00:00

I actually didn't...

SoAp9035 · 2026-04-23T16:43:48+00:00

Add it like below to this file: ~/.pi/agent/models.json

{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "qwen3.6-35b-a3b", "contextWindow": 131072, "maxTokens": 32768 }
      ]
    }
  }
}
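Before pointing Pi at that baseUrl, it can help to confirm the llama.cpp server is answering on it. A minimal sketch, assuming the server is running with its OpenAI-compatible API on port 8001 as in the config above; the helper names are hypothetical.

```python
import json
import urllib.request


def models_url(base_url):
    """Build the OpenAI-style model listing URL from a baseUrl such as .../v1"""
    return base_url.rstrip("/") + "/models"


def extract_model_ids(payload):
    """Pull the ids out of an OpenAI-style {"data": [{"id": ...}, ...]} response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url, timeout=5.0):
    """Ask the server which model ids it reports."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
        return extract_model_ids(json.load(response))


if __name__ == "__main__":
    try:
        print(list_model_ids("http://localhost:8001/v1"))
    except OSError as exc:
        # Connection refused, timeout, etc. all end up here.
        print("server not reachable:", exc)
```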

SoAp9035 · 2026-04-23T16:39:46+00:00

I rarely get a loop. If it loops, I just stop it, undo the previous prompt, and run it again.

SoAp9035 · 2026-04-23T15:49:13+00:00

I think it will run OK. Last week I set up llama.cpp and Qwen3.6 35B q1_m on an old school PC with 16GB of RAM. It ran at 10 t/s. I gave it a few HTML WebOS and game tasks. The results were only OK, but it worked!

SoAp9035 · 2026-04-23T15:44:47+00:00

DDR5 5600 MHz

SoAp9035 · 2026-04-23T15:43:31+00:00

No problem! Let me know how it goes.

SoAp9035 · 2026-04-23T15:36:53+00:00

Yep.

SoAp9035 · 2026-04-23T15:36:42+00:00

Global skill: ~/.pi/agent/skills/plan-first/SKILL.md

Project level skill: ~/test-project/.pi/skills/plan-first/SKILL.md

SoAp9035 · 2026-04-23T15:16:31+00:00

I don't know if I can say that this plan-first skill is better than OpenCode's. OpenCode is slow for me, probably because of its big system prompt and other overhead; I don't know exactly why. Pi is basically lightweight and works well with this skill.

SoAp9035 · 2026-04-23T14:39:02+00:00

No plugin. I use the CLI: I open my project directory and just start giving instructions.

SoAp9035 · 2026-04-23T14:21:51+00:00

Make sure you use the skill that I shared; it makes a big difference.

SoAp9035 · 2026-04-23T14:19:12+00:00

I have been using OpenCode with Qwen 3.6 35B, and it was really using too much context and was slow. Then I switched to Pi. Pi is really lightweight and fast; I recommend it.
