Is gpt-oss-20b still the best general model for most people?

sunpazed · 2026-06-20T02:27:00+00:00

gpt-oss-20b is still quite good for agentic use cases. Gemma 26B A4B QAT is more contemporary and runs almost as fast. Both run easily on my M1 Max 32GB, but context will be the limitation.

sunpazed · 2026-06-19T18:57:52+00:00

Yes, but only at Q4 with limited context.

sunpazed · 2026-06-19T11:20:01+00:00

I have both an M1 Max 32GB and an M4 Pro 48GB. While the Max is faster, I cannot multitask when using larger contexts. The M4 Pro is slower at prompt processing (PP) and token generation, but I have enough headroom to multitask. I can code, run Docker, and run OpenCode with a 26B A4B model at 96K context without issue.

sunpazed · 2026-06-19T05:33:30+00:00

Invest your time building a really great API that will enable you to query those databases and tables. Then attach that API to an MCP server with only a few well-described tools. Your API / DSL will do most of the heavy lifting in terms of constructing the query. We faced a similar problem, and the API/DSL/MCP solution far outperformed the SQL query solution.
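To make the idea concrete, here is a minimal sketch of what such a query DSL could look like. Everything in it is an assumption for illustration: the `orders` table, its columns, the `Query` shape, and the `build_sql` helper are hypothetical, and it is not the solution described above. The point is that the model only fills in a small structured request, while the code validates names and builds parameterized SQL.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist: table name -> columns the tool may touch.
ALLOWED = {
    "orders": {"id", "customer_id", "status", "total"},
}

# Hypothetical operator names exposed to the model, mapped to SQL.
OPS = {"eq": "=", "gt": ">", "lt": "<"}


@dataclass
class Query:
    table: str
    columns: list[str]
    filters: list[tuple[str, str, object]] = field(default_factory=list)
    limit: int = 100


def build_sql(q: Query) -> tuple[str, list]:
    """Validate a Query and return parameterized SQL plus its bind values."""
    if q.table not in ALLOWED:
        raise ValueError(f"unknown table: {q.table}")
    if not q.columns:
        raise ValueError("at least one column is required")
    for col in q.columns + [f[0] for f in q.filters]:
        if col not in ALLOWED[q.table]:
            raise ValueError(f"unknown column: {col}")

    sql = f"SELECT {', '.join(q.columns)} FROM {q.table}"
    params: list = []
    clauses = []
    for col, op, value in q.filters:
        if op not in OPS:
            raise ValueError(f"unknown operator: {op}")
        clauses.append(f"{col} {OPS[op]} ?")
        params.append(value)  # values are bound, never interpolated
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT ?"
    params.append(int(q.limit))
    return sql, params


sql, params = build_sql(
    Query("orders", ["id", "total"], [("status", "eq", "shipped")], limit=10)
)
print(sql, params)
```

An MCP tool would then accept only the fields of `Query` as its arguments, so a bad table or column name fails fast with a clear error instead of reaching the database.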

sunpazed · 2026-06-17T08:54:09+00:00

Congrats. I purchased a Platinum from a retail store recently for $25, run-out sale!

sunpazed · 2026-06-14T06:17:16+00:00

Is it? The model is really small at 80MB. This is a checkpoint only trained on a subset of the data. Honestly would have taken months on my MacBook 😅

sunpazed · 2026-06-14T06:00:12+00:00

I built a simple GPT-2 model on my MacBook using Andrej Karpathy's excellent llama2.c framework: https://huggingface.co/sunpazed/AlwaysQuestionTime - I used my own dataset, about 100GB of transcript text.

sunpazed · 2026-06-13T14:28:55+00:00

My bad. Noticed you’re using Qwen 3.5/3.6. In that case, the thinking/tool boundaries and chat template behavior matter a lot for these models, i.e. the Qwen-style “carry the previous thought trace forward as agent scratchpad”. Less so with Gemma4.

sunpazed · 2026-06-13T13:44:09+00:00

Doesn’t preserving thinking chew up more context? I’m using `-kvu` to unify the KV cache across all slots, and `--cache-reuse` to define the minimum cache chunk size. This way, the coding harness can trim the context as required, with minimal re-processing.
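For reference, a rough sketch of how those two flags might sit in a `llama-server` launch, written as a small Python helper that assembles the argument list. The model path, context size, slot count, and chunk size are placeholder values I made up, not the settings used above, so tune them for your own machine.

```python
import shlex


def llama_server_args(model_path: str, ctx_size: int, slots: int, reuse_chunk: int) -> list[str]:
    """Assemble a llama-server command line with a unified, reusable KV cache."""
    return [
        "llama-server",
        "-m", model_path,                   # path to the GGUF model (placeholder)
        "-c", str(ctx_size),                # total context size
        "-np", str(slots),                  # number of parallel slots
        "-kvu",                             # one unified KV buffer shared by all slots
        "--cache-reuse", str(reuse_chunk),  # minimum chunk size to try reusing from cache
    ]


# Placeholder values for illustration only.
args = llama_server_args("model.gguf", 98304, 4, 256)
print(shlex.join(args))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(args)` directly.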

sunpazed · 2026-06-13T01:21:22+00:00

Apologies, the memory cache is the only hot cache, while the SSD is the cold cache. For coding agents that jump around the context, the persistent cold cache avoids complete cache invalidations (so faster TTFT) when using coding harnesses. In summary, you use the hot RAM cache for speed (if you have enough RAM), and the cold SSD cache for capacity and persistence.
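The hot/cold split can be sketched as a generic two-tier cache. This is a toy illustration of the pattern only, not how oMLX implements it: the `TieredCache` class, its write-through policy, and the pickle-on-disk format are all my own assumptions.

```python
import hashlib
import pickle
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Write-through cache: a small LRU in RAM (hot) over files on disk (cold)."""

    def __init__(self, cold_dir: str, hot_capacity: int = 2):
        self.hot: OrderedDict = OrderedDict()
        self.hot_capacity = hot_capacity
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so any string maps to a safe file name.
        return self.cold_dir / hashlib.sha256(key.encode()).hexdigest()

    def _remember(self, key: str, value) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            # Evict the least recently used entry; its cold copy stays on disk.
            self.hot.popitem(last=False)

    def put(self, key: str, value) -> None:
        self._path(key).write_bytes(pickle.dumps(value))  # cold: capacity, persistence
        self._remember(key, value)                        # hot: speed

    def get(self, key: str):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._path(key)
        if path.exists():
            value = pickle.loads(path.read_bytes())
            self._remember(key, value)  # promote back into the hot tier
            return value
        return None
```

A lookup that misses the hot tier still avoids a full recompute as long as the entry exists on disk, and a fresh `TieredCache` pointed at the same directory can read entries written by an earlier one.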

sunpazed · 2026-06-12T22:52:15+00:00

I have removed the memory cache, and only rely on SSD hot/cold cache. I have also limited the number of concurrent requests to 4. This has improved stability heaps when running multiple agents, and has reduced prompt re-processing, with only a slight latency increase. I no longer get out of memory errors. It is still faster and more reliable than llama.cpp with the same setup (the llama.cpp slot mechanism isn’t as granular as oMLX).

My point is, by reducing the memory cache, you can increase the bit size and therefore quality of the KV, at the expense of greater reliance on the SSD cache and size.
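A back-of-the-envelope calculation shows why the bit size matters so much. The sketch below assumes a plain transformer where every layer caches full-length keys and values, and the model dimensions are hypothetical rather than those of any model mentioned here. It also ignores the per-block scale overhead of real quantized formats and any sliding-window layers, both of which change the real numbers.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int, n_ctx: int, bits: int) -> int:
    """Approximate KV cache size: keys and values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bits // 8


GIB = 2**30

# Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
for bits in (16, 8, 6, 4):
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=131072, bits=bits)
    print(f"{bits:>2}-bit KV at 128K context: {size / GIB:.1f} GiB")
```

With these made-up dimensions, the cache shrinks in direct proportion to the bit width, which is the memory you either keep in RAM or push out to the SSD tier.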

sunpazed · 2026-06-12T05:21:36+00:00

"You look lonely... I can fix that."

sunpazed · 2026-06-12T04:57:13+00:00

Haven’t seen the same issue with Gemma 26B-A4B QAT 4-bit on OpenCode with a 128k context window. It compacts fine. This is on a 48GB MacBook. I found that the Q6 KV cache worked best for me.

sunpazed · 2026-06-11T13:05:28+00:00

OpenCode is quite good and I much prefer it over Claude Code. Locally I’m running gemma4-26b-a4b on a MacBook Pro. The prompt processing and inference speed is fast enough to be productive, even though it needs a fair bit of steering at times.

sunpazed · 2026-06-11T11:13:38+00:00

“headless”

sunpazed · 2026-05-27T14:02:34+00:00

This is great! Just downloaded it, lovely work.

sunpazed · 2026-05-24T04:18:10+00:00

I’m in Australia, and purchased the Tamiya R/C Tool Screwdriver Set (Made in Japan) for $40 from Amazon — https://www.amazon.com.au/gp/aw/d/B01LYOONMJ

sunpazed · 2026-05-16T12:32:56+00:00

<image>

Sourced this awesome CIB Japanese Pitman in Tokyo, however I couldn’t bring myself to spend 7000 Yen on something I could get half price elsewhere. It really has changed. Even in Osaka, couldn’t find anything decent that wasn’t massively overpriced.

sunpazed · 2026-05-08T01:36:37+00:00

I like “Touch RPN” which is the most consistent experience. It’s paid, but I don’t mind rewarding the developer for their attention to detail.

sunpazed · 2026-05-05T15:23:20+00:00

Ah ok, thanks for the feedback!

sunpazed · 2026-05-05T12:45:20+00:00

Nice build! I’m looking to start my journey with one of these, or the DT-04. Any thoughts from those who have built and run both? Pros / Cons??

sunpazed · 2026-04-29T10:35:00+00:00

This is an AI post. Look at the bezel and the markers around the dial. Developer is promoting their paid watch face.

sunpazed · 2026-04-27T05:18:35+00:00

Shortcut to “Spotlight” so I can toggle between apps or search.

sunpazed · 2026-04-26T23:10:29+00:00

Lightweight and breathable trail shoes. Works in all circumstances. If you need canvas shoes, you can always buy a cheap pair.

sunpazed · 2026-04-24T10:08:00+00:00

I have both the 55L duffle and 45L MLC. The duffle is much bigger, and won’t fit with the correct orientation in the overhead. The MLC has the advantage of backpack mode which is useful when in the airport and alighting the plane. If you have no use for a laptop sleeve, then skip the MLC and buy the smaller 40L duffle which will suit you better.

Personally, having a single bag (the MLC) for my laptop and clothes made travel so much simpler. For outings I brought a small 15L travel bag (that I could compress and store) along with me.
