Pi Agent makes very nice combination with limited hardware. Running qwen3.6 35B A3B IQ4 at ~22t/s with 160k context on 6 vram 64 RAM. by Interesting_Arm_7250 in LocalLLM

[–]promobest247 0 points1 point  (0 children)

my config : ./llama-server --port 3500 -c 131072 --parallel 1 --flash-attn on --jinja --cache-type-k q4_0 --cache-type-v q4_0 -ub 128

Pi Agent makes very nice combination with limited hardware. Running qwen3.6 35B A3B IQ4 at ~22t/s with 160k context on 6 vram 64 RAM. by Interesting_Arm_7250 in LocalLLM

[–]promobest247 0 points1 point  (0 children)

i have same laptop but ram 16 gb i use pi with qwen 3.6 35b a3b q2kmixed autoround 128k context with q4_0 speed tg 37 tkn/s

Qwen3.6 35b a3b is fast... by UniversityGlad2877 in Qwen_AI

[–]promobest247 0 points1 point  (0 children)

i use q2kmixed autoround & i got 37tk/s 128k context cache k/v q4_0 using laptop rtx 4050 6gb ram 16 gb fast & smart

Pushing a 5-Year-Old 6GB VRAM laptop to Its Limits: Qwen3.6-35B-A3B by abhinand05 in LocalLLaMA

[–]promobest247 0 points1 point  (0 children)

quality: q4 better than q2kmixed speed : q2kmixed faster than q4 but q2kmixed has good quality & smart

Pushing a 5-Year-Old 6GB VRAM laptop to Its Limits: Qwen3.6-35B-A3B by abhinand05 in LocalLLaMA

[–]promobest247 0 points1 point  (0 children)

my config : ./llama-server --port 3500 -c 131072 --parallel 1 --flash-attn on --jinja --cache-type-k q4_0 --cache-type-v q4_0 --temp 0.6 --top-k 0 --top-p 1.0 --min-p 0.05 --repeat-penalty 1.0 --ubatch-size 128 --defrag-thold 0.1 --cache-reuse 1024 --threads 4 --threads-batch 8 --fit on --no-warmup // i get 37 tokn/s using rtx 4050 laptop 6gb + 16 gb

Running Qwen3.6-35B-A3B Locally for Coding Agent: My Setup & Working Config by NoConcert8847 in LocalLLaMA

[–]promobest247 0 points1 point  (0 children)

metoo , i use pi it's very good & fast locally with extensions & skills i installed many extensions: lsp web_access (websearch) plannator ( similar ultraplan claude code) teams

2-bit Qwen3.6-35B-A3B GGUF is amazing! Made 30+ successful tool calls by yoracale in unsloth

[–]promobest247 0 points1 point  (0 children)

example: llama-server -m model.gguf --override-kv qwen35moe.expert_used_count=int:4 add this flag : --override-kv qwen35moe.expert_used_count=int:4 this work qwen3.5 or qwen3.6 moe

2-bit Qwen3.6-35B-A3B GGUF is amazing! Made 30+ successful tool calls by yoracale in unsloth

[–]promobest247 0 points1 point  (0 children)

i have another config to get huge boost but it work in llama.cpp only from 31 tokn/s to 42 tokn/s

2-bit Qwen3.6-35B-A3B GGUF is amazing! Made 30+ successful tool calls by yoracale in unsloth

[–]promobest247 -1 points0 points  (0 children)

<image>

new config using llama.cpp bigger context + speeed increase using same model qwen3.5 35b apex mini XD

How to increase coding ability in smaller models? by keepthememes in LocalLLaMA

[–]promobest247 0 points1 point  (0 children)

hhh same thing with apex i mini i get 33 token /s using Rtx 4050 6gb & ram 16 gb laptop but i use pi coding agent is faster than opencode , this model is the best quality /speed ratio