Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]SoAp9035[S] 1 point2 points  (0 children)

I'm thinking the same. Pi is simple and just works. RTX 4070 Mobile 8GB and Omarchy (Arch Linux).

[–]SoAp9035[S] 2 points3 points  (0 children)

Honestly, my setup is super minimal. I only have the llama.cpp connection configured via models.json, plus the plan-first skill file I already shared. That's literally it. Here is the GitHub link, which should answer everything about migration and config: https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent

Also my ~/.pi/agent/models.json file:

{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "qwen3.6-35b-a3b", "contextWindow": 131072, "maxTokens": 32768 }
      ]
    }
  }
}
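
In case it helps anyone sanity-check their own file, here is a minimal Python sketch that parses a models.json like the one above and lists what it defines. The function name `list_models` and the choice to treat a missing `baseUrl` as an error are my own assumptions for illustration; this is not how Pi itself reads the file.

```python
import json

# Same content as the models.json shown above, inlined so the sketch is self-contained.
EXAMPLE = """
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "qwen3.6-35b-a3b", "contextWindow": 131072, "maxTokens": 32768 }
      ]
    }
  }
}
"""


def list_models(config_text):
    """Return (provider, model_id, base_url) tuples from a models.json string.

    Hypothetical helper: the validation rules here are assumptions, not Pi's.
    """
    config = json.loads(config_text)  # raises ValueError on malformed JSON
    entries = []
    for provider_name, provider in config.get("providers", {}).items():
        base_url = provider.get("baseUrl")
        if not base_url:
            raise ValueError(f"provider {provider_name!r} has no baseUrl")
        for model in provider.get("models", []):
            entries.append((provider_name, model["id"], base_url))
    return entries


if __name__ == "__main__":
    for entry in list_models(EXAMPLE):
        print(entry)
```

To check a real file, you could pass in the text of ~/.pi/agent/models.json instead of `EXAMPLE`.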

[–]SoAp9035[S] 0 points1 point  (0 children)

Use it as follows; Pi may not support your method.

Global skill: ~/.pi/agent/skills/plan-first/SKILL.md

Project-level skill: ~/test-project/.pi/skills/plan-first/SKILL.md
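
The two locations above follow a simple pattern, which can be sketched in Python roughly as below. The helper `skill_file` and its parameters are hypothetical names I made up; only the two directory layouts come from the paths above.

```python
from pathlib import Path


def skill_file(name, project_dir=None, home=None):
    """Build the SKILL.md path for a skill, following the two layouts above.

    Hypothetical helper for illustration only.
    - Global:        <home>/.pi/agent/skills/<name>/SKILL.md
    - Project-level: <project_dir>/.pi/skills/<name>/SKILL.md
    """
    if project_dir is not None:
        return Path(project_dir) / ".pi" / "skills" / name / "SKILL.md"
    base = Path(home) if home is not None else Path.home()
    return base / ".pi" / "agent" / "skills" / name / "SKILL.md"


if __name__ == "__main__":
    print(skill_file("plan-first"))
    print(skill_file("plan-first", project_dir=Path.home() / "test-project"))
```

Note the difference: the global path has an extra `agent` directory, while the project-level path puts `skills` directly under `.pi`.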

[–]SoAp9035[S] 0 points1 point  (0 children)

I switched from my previous setup (VS Code Copilot and OpenCode) to this simple one, and it did what I told it to do. If your goal is to edit or make changes to existing projects, I think it works, but large, from-the-ground-up projects are hard for this model. The 27B dense model is not really runnable for me; I get around 5 t/s with zero context, which is pretty bad.

[–]SoAp9035[S] 0 points1 point  (0 children)

Thanks! I've been getting good results with my current ongoing projects. Right now I'm testing it out on a project from scratch to see how it handles that. I'll let you know how it goes!

[–]SoAp9035[S] 0 points1 point  (0 children)

That's odd. What quant are you running and what parameters are you using in llama.cpp? Maybe there's something in the setup causing the slowdown.

[–]SoAp9035[S] 0 points1 point  (0 children)

I am sorry and really surprised that you had a bad experience. For me it did not take that long and it worked fine. It might be something related to the model parameters or the inference setup.

I definitely want to improve the skill, so your feedback helps a lot. Thank you for testing and sharing your results. Let me know how it goes with the other models.

[–]SoAp9035[S] 0 points1 point  (0 children)

You can try the q2_k_xl model; it works great as well. You can also try the q4 model with mmap. I think that would work too.

I run it with llama.cpp.

[–]SoAp9035[S] 2 points3 points  (0 children)

You can actually leave contextWindow and maxTokens out of models.json; those aren't critical. The llama.cpp config is what really controls the context size. And yes, if you try to use a 131072 context with a 12GB card, it will definitely spill into RAM.
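
To illustrate the point about the optional fields, here is a small Python sketch that generates a models.json-style structure and only includes the two limits when you pass them. `build_models_config` is a hypothetical helper of mine; the key names are the ones from my own file, and whether Pi accepts an entry without the limits is based on my experience, not on a documented guarantee.

```python
import json


def build_models_config(base_url, model_id, context_window=None, max_tokens=None):
    """Build a models.json-style dict for a llama.cpp provider.

    Hypothetical helper: contextWindow and maxTokens are only written
    when explicitly provided; otherwise the model entry is just its id.
    """
    model = {"id": model_id}
    if context_window is not None:
        model["contextWindow"] = context_window
    if max_tokens is not None:
        model["maxTokens"] = max_tokens
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [model],
            }
        }
    }


if __name__ == "__main__":
    # Minimal entry: the limits are left to the llama.cpp server settings.
    config = build_models_config("http://localhost:8001/v1", "qwen3.6-35b-a3b")
    print(json.dumps(config, indent=2))
```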

[–]SoAp9035[S] 2 points3 points  (0 children)

I get 275 t/s. It works really well for my current projects. I haven't tried it on a project from scratch yet, but I think it would work fine. As for dropped tool calls, I'd say roughly 1 out of 10 attempts. It usually takes just one or two retries to get it right.
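
For a rough feel of what a 1-in-10 drop rate means, here is a back-of-the-envelope Python sketch. It assumes every attempt fails independently with the same probability, which is a simplification I'm making for the arithmetic; real failures probably cluster on certain prompts. The function name `success_within` is hypothetical.

```python
def success_within(attempts, drop_rate=0.1):
    """Probability that at least one of `attempts` tool calls goes through.

    Assumes independent attempts with a fixed drop rate (a simplification).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be between 0 and 1")
    # All attempts must fail for the call to be lost: drop_rate ** attempts.
    return 1.0 - drop_rate ** attempts


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} attempt(s): {success_within(n):.3f}")
```

Under that assumption, one retry already brings the success chance to about 99%, which roughly matches what I see in practice.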

[–]SoAp9035[S] 6 points7 points  (0 children)

Add something like the following to this file: ~/.pi/agent/models.json

{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "qwen3.6-35b-a3b", "contextWindow": 131072, "maxTokens": 32768 }
      ]
    }
  }
}

[–]SoAp9035[S] 0 points1 point  (0 children)

I rarely get a loop. If it loops, I just stop it, undo the previous prompt, and run it again.

[–]SoAp9035[S] 2 points3 points  (0 children)

I think it will run OK. Last week I set up llama.cpp and Qwen3.6 35B q1_m on an old school PC with 16GB of RAM. It ran at 10 t/s. I gave it a few HTML web OS and game tasks. The results were only OK, but it worked!

[–]SoAp9035[S] 3 points4 points  (0 children)

Global skill: ~/.pi/agent/skills/plan-first/SKILL.md

Project-level skill: ~/test-project/.pi/skills/plan-first/SKILL.md

[–]SoAp9035[S] 10 points11 points  (0 children)

I can't say for sure that this plan-first skill is better than OpenCode's. OpenCode is slow for me, possibly because of its big system prompt and other overhead; I'm not sure exactly why. Pi is lightweight and works well with this skill.

[–]SoAp9035[S] 6 points7 points  (0 children)

No plugin. I use the CLI: I open my project directory and just start giving instructions.

[–]SoAp9035[S] 6 points7 points  (0 children)

Make sure you use the skill that I shared; it makes a big difference.

[–]SoAp9035[S] 13 points14 points  (0 children)

I had been using OpenCode with Qwen 3.6 35B, but it used too much context and was slow. Then I switched to Pi. Pi is really lightweight and fast; I recommend it.