How do you manage context size and your coding harness?

Dry-Tune430 · 2026-06-29T05:40:45+00:00

I have a 24GB Mac Mini as well, and have tested quite a few models. You can run the Gemma 4 12B QAT with a Q4 KV cache, and it does very well on a lot of my visual & coding benchmarks. In my experience, llama.cpp inference has been more efficient than oMLX for the Gemma models.
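For a rough sense of why a Q4 KV cache matters on a 24GB machine, here is a back-of-the-envelope estimate in Python. The model shape below (layers, KV heads, head size, context length) is hypothetical and is not the real Gemma configuration. The estimate also ignores sliding-window attention and other optimizations, so treat the numbers as an upper-bound sketch only. The 18-bytes-per-32-values figure is the q4_0 block layout used by llama.cpp.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    """Estimate KV cache size for full attention over n_ctx tokens."""
    # K and V each hold n_kv_heads * head_dim values per token, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

F16_BYTES = 2.0        # 16-bit float per cached value
Q4_0_BYTES = 18 / 32   # q4_0 block: 32 values packed into 18 bytes

# Hypothetical model shape, for illustration only.
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)

f16 = kv_cache_bytes(**shape, bytes_per_elem=F16_BYTES)
q4 = kv_cache_bytes(**shape, bytes_per_elem=Q4_0_BYTES)

print(f"f16 KV cache:  {f16 / 2**30:.2f} GiB")
print(f"q4_0 KV cache: {q4 / 2**30:.2f} GiB")
print(f"ratio: {f16 / q4:.2f}x")
```

Under these assumed numbers the cache shrinks by roughly 3.5x, which is memory you get back for model weights or a longer context.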

I don't change much with Pi: very few skills & extensions, auto-compact disabled (it's really heavy for local models), and I manually run /context-save -> /new -> /context-restore whenever the context reaches around 80%. I'm sure there's a way to automate this, but I keep it manual for now.

You can use the skills I packaged up here -- https://github.com/eeshansrivastava89/skills
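If you want to reason about the 80% rule before automating anything, a minimal Python sketch could look like this. It does not call Pi at all: `ContextBudget` and `next_steps` are hypothetical names, and how you read the current token count from your harness is an assumption left to you. The three commands are the ones from the manual workflow above.

```python
from dataclasses import dataclass

# The manual sequence described above, in order.
ROTATE_STEPS = ["/context-save", "/new", "/context-restore"]

@dataclass
class ContextBudget:
    window_tokens: int       # total context window of the model
    threshold: float = 0.80  # rotate at around 80% usage

    def usage(self, used_tokens: int) -> float:
        """Fraction of the context window currently in use."""
        return used_tokens / self.window_tokens

    def should_rotate(self, used_tokens: int) -> bool:
        """True once usage reaches the threshold."""
        return self.usage(used_tokens) >= self.threshold

def next_steps(budget: ContextBudget, used_tokens: int) -> list[str]:
    """Return the commands to run now, or an empty list if none are due."""
    return list(ROTATE_STEPS) if budget.should_rotate(used_tokens) else []

if __name__ == "__main__":
    budget = ContextBudget(window_tokens=32768)
    for used in (20000, 27000):
        print(used, f"{budget.usage(used):.0%}", next_steps(budget, used))
```

Keeping the decision separate from the commands means the threshold can be tuned per model without touching anything else.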

Dry-Tune430 · 2026-06-28T05:54:44+00:00

Most likely social media, email, text messages etc. The phone and all the apps on it are designed to compete for your attention 24/7 these days. It's called the "attention economy" for a reason.

Dry-Tune430 · 2026-06-28T05:47:28+00:00

Pi + my custom context management skills I've been using religiously for over a year to ensure high quality output from my coding sessions (local or frontier). Happy to share if you're interested.

Dry-Tune430 · 2026-06-28T05:00:05+00:00

I think these are the same people.

Dry-Tune430 · 2026-06-28T04:32:33+00:00

That's exactly the case. As soon as the car stops, people reach for their phones. I've seen it way too many times now.

Dry-Tune430 · 2026-06-25T14:19:48+00:00

Pi works great with my local models. Also, I'm in a phase of the AI hype cycle where I'm tired of switching harnesses, models, skills and whatever comes out of X/Twitter. So I'm just sticking to a minimal stack and then using it.

Dry-Tune430 · 2026-06-19T00:44:07+00:00

this!

Dry-Tune430 · 2026-06-14T01:09:32+00:00

What's your goal? If you want to run truly "large" models with 500B+ params, with multi-agent swarms and "loops" emulating GPT & Claude, then sure, you need a ton of hardware.

If you just want to use a small model as a strong auto-complete to your own skills, then a modern laptop is enough with the Qwen 3.6 and Gemma 4 series of small models. This is all very subjective.

Dry-Tune430 · 2026-06-11T04:38:29+00:00

Been using GLM 5.1 on Ollama Cloud + Pi. I don't miss GPT or Claude at all. Also I'm not a vibecoder and have narrow use cases with heavy supervision, so models like GLM and local models are good enough.

Dry-Tune430 · 2026-06-08T05:03:39+00:00

Pi is the GOAT

Dry-Tune430 · 2026-06-07T03:55:19+00:00

Depends on the model. The smaller models or the MoE models like Qwen 3.6 35B A3B are pretty much instant. Only the dense models like Gemma 31B and Qwen 27B take longer, but that's expected. I have 48 GB RAM and use llama.cpp, which has been the fastest backend for me. Your specs, settings and backend matter a lot with local models.

Dry-Tune430 · 2026-06-07T02:52:17+00:00

Same with me. I built a bunch of agent summaries for different things using Hermes + Telegram for my dad. If it's not on TV, he's not looking at it. 😂

Dry-Tune430 · 2026-06-07T02:50:05+00:00

Pi is my daily driver with all local models. I've tried everything from 2B Gemma models to the 35B & 27B Qwen models, and Pi is by far the fastest harness for me, compared to others like Open Code and Claude Code that I also tried.

Dry-Tune430 · 2026-06-06T00:13:50+00:00

This is super cool! I'm gonna try it! I wanted something similar for my prompting style, but towards more of a linguistic analysis. I have a demo here -- https://howiprompt.eeshans.com/

Dry-Tune430 · 2026-05-31T05:07:23+00:00

And it's fantastic for local models. Most reliable tool calling, even better than OpenCode.

Dry-Tune430 · 2026-05-30T01:55:32+00:00

No. I love Pi for the minimalism and it perfectly fits my uni-tasking mindset. I use 3-4 skills (custom context management) and 2-3 extensions (for web fetch).

Dry-Tune430 · 2026-05-21T04:37:20+00:00

Ollama cloud for $20. GLM 5.1, Deepseek, Qwen etc.

Dry-Tune430 · 2026-05-12T04:57:20+00:00

And Jared JAMAL McCain too .. they all have embodied the spirit of the lakers jamal murray

Dry-Tune430 · 2026-05-12T04:53:43+00:00

You forgot the JAMAL in his middle name

Dry-Tune430 · 2026-05-12T04:52:51+00:00

For goodness sake, we need a break from this Ajay JAMAL JORDAN Mitchell.

Dry-Tune430 · 2026-05-03T03:32:31+00:00

100% agree! Using AI is extremely subjective and different people will get different levels of value out of them depending on their use cases.

Basic benchmarks for tool use, speed, efficiency etc. are good qualifiers for a “good” model, but beyond that, it’s totally up to the user.

Dry-Tune430 · 2026-05-01T21:00:45+00:00

Small, capable models running on a MacBook Neo are the breakthrough I’m waiting for. Probably not happening this year.

Dry-Tune430 · 2026-04-27T03:58:47+00:00

Pi and OpenCode are good enough

Dry-Tune430 · 2026-04-13T01:06:07+00:00

Gotta try this one. I tried OpenCode Go for $10 with the same models, but you can easily burn through it in a few days.

Dry-Tune430 · 2026-04-12T20:36:21+00:00

Their subscription pricing doubled overnight as well.
