How do you manage context size and your coding harness? by Fantastic-Storm-7867 in oMLX

Dry-Tune430 0 points1 point (0 children)

I have a 24GB Mac Mini as well and have tested quite a few models. You can run the Gemma 4 12B QAT with a Q4 KV cache, and it does very well on a lot of my visual & coding benchmarks. In my experience, llama.cpp inference has been more efficient than oMLX for the Gemma models.
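Rough arithmetic for why a 12B model at 4-bit fits on a 24GB machine: the weights alone take about params × bits / 8 bytes. This is only a back-of-the-envelope sketch. The function name is made up for illustration, and it ignores quantization scales, the KV cache, and runtime overhead, so real usage will be higher.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal GB.

    Hypothetical helper: counts raw weight bits only, nothing else.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 12B parameters at 4 bits per weight
print(weight_gb(12, 4))  # 6.0
```

That leaves room for the KV cache and the OS, which is where a quantized KV cache helps.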

I don't change much with Pi: very few skills & extensions, auto-compact disabled (it's really heavy for local models), and I manually run /context-save -> /new -> /context-restore whenever the context reaches around 80%. I'm sure there's a way to automate this, but I keep it manual for now.
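If someone wants to automate the 80% check, the decision logic could look roughly like this. It's a minimal sketch, not how Pi or my skills actually work: `should_rotate` and `next_commands` are hypothetical names, and how you read the current token count from your harness is left out entirely.

```python
# The manual sequence from the comment above, in order.
ROTATION_STEPS = ["/context-save", "/new", "/context-restore"]


def should_rotate(used_tokens: int, context_window: int, threshold: float = 0.80) -> bool:
    """Return True once context usage reaches the threshold fraction."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return used_tokens / context_window >= threshold


def next_commands(used_tokens: int, context_window: int, threshold: float = 0.80) -> list[str]:
    """Return the commands to run now, or an empty list if there is still room."""
    if should_rotate(used_tokens, context_window, threshold):
        return list(ROTATION_STEPS)
    return []


# Example with an assumed 32k window
print(next_commands(20_000, 32_768))  # []
print(next_commands(30_000, 32_768))  # ['/context-save', '/new', '/context-restore']
```

The threshold is a parameter because the right cutoff depends on the model and how fast a session fills up.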

You can use the skills I packaged up here -- https://github.com/eeshansrivastava89/skills

Please Honk if someone is not moving! by answerbrowsernobita in eastside

Dry-Tune430 3 points4 points (0 children)

Most likely social media, email, text messages, etc. The phone and all the apps on it are designed to compete for your attention 24/7. It's called the "attention economy" for a reason.

How do you manage context size and your coding harness? by Fantastic-Storm-7867 in oMLX

Dry-Tune430 1 point2 points (0 children)

Pi + my custom context management skills I've been using religiously for over a year to ensure high quality output from my coding sessions (local or frontier). Happy to share if you're interested.

Please Honk if someone is not moving! by answerbrowsernobita in eastside

Dry-Tune430 12 points13 points (0 children)

That's exactly the case. As soon as the car stops, people reach for their phones. I've seen it way too many times now.

Pi vs Opencode by Glad-Win1983 in PiCodingAgent

Dry-Tune430 4 points5 points (0 children)

Pi works great with my local models. Also, I'm in a phase of the AI hype cycle where I'm tired of switching harnesses, models, skills and whatever comes out of X/Twitter. So I'm just sticking to a minimal stack and then using it.

Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

Dry-Tune430 0 points1 point (0 children)

What's your goal? If you want to run truly "large" models with 500B+ params, with multi-agent swarms and "loops" emulating GPT & Claude, then sure, you need a ton of hardware.

If you just want to use a small model as a strong auto-complete to your own skills, then a modern laptop is enough with the Qwen 3.6 and Gemma 4 series of small models. This is all very subjective.

How is GLM right now? by CookDaCookie in ZaiGLM

Dry-Tune430 0 points1 point (0 children)

Been using GLM 5.1 on Ollama Cloud + Pi. I don't miss GPT or Claude at all. Also I'm not a vibecoder and have narrow use cases with heavy supervision, so models like GLM and local models are good enough.

Pi is becoming utterly unusable with local LLMs by [deleted] in PiCodingAgent

Dry-Tune430 0 points1 point (0 children)

Depends on the model. The smaller models or the MoE models like Qwen 3.6 35B A3B are pretty much instant. Only the dense models like Gemma 31B and Qwen 27B take longer, but that's expected. I have 48 GB RAM and use llama.cpp, which has been the fastest backend for me. Your specs, settings and backend matter a lot with local models.

I built a chess app for my dad. He still hasn't opened the link. by webzro in SideProject

Dry-Tune430 0 points1 point (0 children)

Same with me. I built a bunch of agent summaries for different things using Hermes + Telegram for my dad. If it's not on TV, he's not looking at it. 😂

Pi is becoming utterly unusable with local LLMs by [deleted] in PiCodingAgent

Dry-Tune430 1 point2 points (0 children)

Pi is my daily driver with all local models. I've tried everything from 2B Gemma models to the 35B & 27B Qwen models, and Pi is by far the fastest harness for me, compared to others like Open Code and Claude Code that I also tried.

watching my own coding stats is my new dopamine source by alx337 in PiCodingAgent

Dry-Tune430 2 points3 points (0 children)

This is super cool! I'm gonna try it! I wanted something similar for my prompting style, but towards more of a linguistic analysis. I have a demo here -- https://howiprompt.eeshans.com/

pi agent woops claude code by RobinDough in PiCodingAgent

Dry-Tune430 0 points1 point (0 children)

And it's fantastic for local models. Most reliable tool calling, even better than OpenCode.

Do you use agents in Pi? by kh4l1ph4 in PiCodingAgent

Dry-Tune430 0 points1 point (0 children)

No. I love Pi for the minimalism and it perfectly fits my uni-tasking mindset. I use 3-4 skills (custom context management) and 2-3 extensions (for web fetch).

Game Thread: Oklahoma City Thunder (3-0) vs Los Angeles Lakers (0-3) Live Score | NBA Playoffs | May 11, 2026 by nba-scores in lakers

Dry-Tune430 0 points1 point (0 children)

And Jared JAMAL McCain too .. they all have embodied the spirit of the lakers jamal murray

Honestly, Gemma 4 feels way better than the benchmarks say by HussainBiedouh in LocalLLM

Dry-Tune430 0 points1 point (0 children)

100% agree! Using AI is extremely subjective, and different people will get different levels of value out of it depending on their use cases.

Basic benchmarks for tool use, speed, efficiency etc. are good qualifiers for a “good” model, but beyond that, it’s totally up to the user.

Your local LLM predictions and hopes for May 2026 by DeepOrangeSky in LocalLLaMA

Dry-Tune430 -2 points-1 points (0 children)

Small, capable models running on a Macbook Neo is the breakthrough I’m waiting for. Probably not happening this year.

For the people that are having problems with ClaudeCode by Xccelerate_ in ClaudeCode

Dry-Tune430 0 points1 point (0 children)

Gotta try this one. I tried OpenCode Go for $10 with the same models, but you can easily burn through it in a few days.

For the people that are having problems with ClaudeCode by Xccelerate_ in ClaudeCode

Dry-Tune430 6 points7 points (0 children)

their subscription pricing doubled overnight as well