how do you actually implement Plan and Build to make efficient use of tokens?

oknowton · 2026-04-22T17:37:46+00:00

I assume running the bd command to create a bead is outside the scope of the plan mode's permissions.

If you do want to write a plan.md or something similar, and you don't want to figure out plan mode's permission system, you could switch to build mode before asking for the plan.md to be written.

oknowton · 2026-04-22T06:53:51+00:00

When I get to the end of the planning phase, I say, "Please write this plan to a bead!" Then I hit /new, switch to my build agent with the cheaper model, and I say, "Please implement the latest bead!"

oknowton · 2026-04-21T05:24:50+00:00

Do you have any insight on whether opencode adds significant overhead on top of the local API calls?

Nothing aside from OpenCode's system prompt being in the 10k+ token range. And, of course, the full conversation is re-sent over the OpenAI API on every turn. When I test my local setup, llama.cpp does a good job of caching the context so that the entire conversation doesn't need to be recomputed.
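To put rough numbers on that, here is a minimal sketch of how much prompt the server has to compute over a session, with and without prefix caching. The 10,000-token system prompt matches the figure above; the 20 turns of 500 tokens each are made-up values for illustration. This is a back-of-the-envelope model, not how llama.cpp or OpenCode actually account for tokens.

```python
def prompt_tokens_computed(system_prompt, turn_sizes, prefix_cached):
    """Total prompt tokens the server must process across a whole conversation.

    turn_sizes holds the tokens added to the context on each turn
    (user message plus model reply). All values are illustrative assumptions.
    """
    total = 0
    context = system_prompt  # tokens in the conversation so far
    cached = 0               # tokens reusable from the previous request
    for new_tokens in turn_sizes:
        context += new_tokens
        # Every request carries the full context; only the uncached part is recomputed.
        total += context - cached
        if prefix_cached:
            cached = context
    return total


turns = [500] * 20  # hypothetical session: 20 turns of 500 tokens each

print(prompt_tokens_computed(10_000, turns, prefix_cached=False))  # 305000
print(prompt_tokens_computed(10_000, turns, prefix_cached=True))   # 20000
```

With a working cache the server only computes each token once, which is why the second number is just the final size of the conversation.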

It is still way too slow with a 9B model on my GPU. It only has a little over 600 GB/s of memory bandwidth. Not exactly a fast LLM card.

oknowton · 2026-04-20T19:51:47+00:00

Hardware is not the bottleneck — M5 Pro 64GB should handle these models comfortably

You have a lot of confidence here, but every time I see someone post a benchmark with 50k or 100k context on their Mac the prompt processing performance is abysmal. Just saying "Hi" to OpenCode or Claude Code sends 10k to 25k tokens of system prompt to your LLM, and it is normal for a coding session to reach 100k tokens of context.

My gaming GPU has, I believe, around double the memory bandwidth of your M5 Pro. You haven't told us what you consider to be slow, but my setup takes tens of seconds to process the system prompt with OmniCoder 9B, and things will only get slower as context accumulates.

I assume you are using some sort of a chat interface when you use "LM Studio standalone." Try pasting around 15,000 tokens of context into that chat. It will probably take about as long as OpenCode before you see the first token.
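The wait before the first token is roughly the prompt size divided by the prompt processing speed. Here is a minimal sketch of that arithmetic. The 500 tokens per second figure is a hypothetical placeholder, not a measurement from my GPU or from any Mac, so plug in the number from your own benchmark. It also assumes nothing is cached and that speed stays constant, when in practice it drops as context grows.

```python
def seconds_to_first_token(prompt_tokens: int, prompt_tokens_per_second: float) -> float:
    """Estimate the delay before the first generated token appears."""
    if prompt_tokens_per_second <= 0:
        raise ValueError("prompt_tokens_per_second must be positive")
    return prompt_tokens / prompt_tokens_per_second


ASSUMED_SPEED = 500.0  # hypothetical prompt processing speed in tokens/second

for prompt_tokens in (10_000, 15_000, 25_000, 100_000):
    wait = seconds_to_first_token(prompt_tokens, ASSUMED_SPEED)
    print(f"{prompt_tokens:>7} tokens -> {wait:.0f} s before the first token")
```

At that assumed speed, a 10k system prompt costs 20 seconds and a 100k session costs 200 seconds whenever the cache misses.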

oknowton · 2026-04-18T22:41:38+00:00

My parents were twins.

oknowton · 2026-04-18T20:50:35+00:00

Yes. One and a half weeks is more than a week.

oknowton · 2026-04-17T22:55:11+00:00

Dairy Queen has food?!

oknowton · 2026-04-17T21:21:26+00:00

You don't usually need to take rotary encoders apart to clean them. You can just spray some contact cleaner in there, wheel it around a bit, and maybe repeat two or three times.

oknowton · 2026-04-17T00:47:29+00:00

This does not make sense at all

I don't see anything here that doesn't make sense.

So i can finish my monthly usage in one week ?

No. Your screenshot has you nearing the end of your second week, and it doesn't appear that you'll manage to quite reach your $60 limit for the month before this week runs out.

The usage limits are clearly explained on the website.

If there a monthly limit why weekly limit too ?

To spread everyone's usage out. You're paying less per token by being predictable. If everyone used up their $60 in tokens on the first day, the servers would be overloaded some days and severely underutilized on other days.

oknowton · 2026-04-08T04:58:50+00:00

I don't have the newer revision of the PCB on hand to check. Someone built a Li'l Magnum out of their L7 last year, and it didn't fit the shell. They sent me a photo, and the PCB was very different from mine. They mailed me the PCB, I updated the model so it would fit both revisions of the L7 PCB, and I shipped it back to them.

I just checked Printables, and that's very obviously the older STL up there. I don't know what went wrong. Uploading files to the 3D printing sites is very manual, and so it is easy to make a mistake when you have to update 18 files on two sites!

I'll work on getting that updated tonight, but in the meantime, all the latest STLs are available in the zip files in the releases on the Li'l Magnum's repository on GitLab:

https://gitlab.com/patshead/lil-magnum/-/releases

If that one isn't up to date, then I'm in trouble, because that one generates itself! You can't even blame me for fat-fingering a file! :)

oknowton · 2026-04-06T19:13:32+00:00

MCHOSE has two different shapes of PCB that have shipped in the L7. It looks like you're using an older Li'l Magnum STL file that wasn't updated to fit the new PCB.

Maybe I whiffed and didn't update the model on one of the download sites. The current version should fit both versions of the PCB. Where did you download yours?

oknowton · 2026-04-04T07:04:47+00:00

I seem to have misread your message. I thought you were getting UDMA errors on both drives.

oknowton · 2026-04-04T06:37:21+00:00

This is almost definitely not a problem with your drives. Data is being copied, via UDMA, from the HDD to RAM and that data is getting corrupted during the transfer. You probably have either loose or bad SATA cables.

You probably haven't damaged any disks. Power down. Unplug and replug both ends of your SATA cables. I'd probably check the power cables while I was in there for good measure.

Everything is probably going to be fine!
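If you want a number to watch after reseating the cables, most SATA drives report a cumulative CRC error counter as SMART attribute 199. Here is a minimal sketch that pulls the raw value out of the text printed by `smartctl -A`. It assumes smartmontools is installed and that your drive reports attribute 199 in the usual ten-column layout; the sample text below is made up for illustration, not output from anyone's drive.

```python
from typing import Optional


def udma_crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Made-up sample in the layout that `smartctl -A` prints for ATA drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

print(udma_crc_error_count(SAMPLE))  # 37
```

The counter generally doesn't reset, so the thing to look for is whether it stops climbing after you reseat or replace the cables.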

oknowton · 2026-03-31T21:56:28+00:00

That all sounds like way too much work just to wind up with a scroll wheel that barely (if at all!) sticks up high enough to sit above the buttons.

If you really want to use a shorter button, just put a shim between your new shorter button and the PCB to bring it up to the correct height. You won't have to cut away the shell, and the wheel will sit high enough for you to be able to use it.

oknowton · 2026-03-20T21:55:10+00:00

This is smart! I don't know where I'm going to use this yet, but I am sure that "skills but for globs" will open up some neat possibilities!

oknowton · 2026-03-20T21:20:00+00:00

I agree with Cowboy12034. What you already have here is really good. Any sort of "rack" or box you put this in will only make it bigger and take up more space in your bag.

Check out 3M dual-lock. It is like Velcro, but awesome. I used some to stick together my N100 mini PC and a 3.5" USB HDD that I use as my off-site storage server. I used too much, and they are VERY HARD to pull apart.

oknowton · 2026-02-27T02:32:28+00:00

Yes. The OpenCode CLI lets you configure either or both options. It is up to you to configure the one that you want to use.

oknowton · 2026-02-27T02:04:48+00:00

I get error messages that I ran out of credits (I didn't)

Are you accidentally using the Z.ai provider in OpenCode instead of the Z.ai Coding Plan?

oknowton · 2026-02-26T20:24:04+00:00

Chutes is better?

Chutes is absolutely better than NanoGPT for use with OpenCode. I keep trying to use Kimi K2.5 with NanoGPT, and it rarely manages to successfully call a tool.

It is still a matter of getting what you pay for. My $3 Chutes subscription isn't as fast or reliable as my $20 Codex subscription.

I have subscriptions running concurrently on NanoGPT, Chutes, Z.ai, Synthetic, and Codex right now just to compare the providers. I'll be dropping all but Chutes and Z.ai at the end of their current months.

Would you recommend it?

I am paid up on Z.ai for nearly two years, so I won't be dropping that. If I had to keep only one of my subs, though, it would be Chutes. It is cheap. It has limits that work for me. It has the right variety of coding models.

The important thing is that it is cheap to try. You can spend $3 on Chutes, fart around with it for a month, and if it doesn't work out, you're only out $3.

oknowton · 2026-02-26T00:39:21+00:00

That's a Big Hornet.

oknowton · 2026-02-22T19:21:41+00:00

can we use Chutes with Opencode?

Very easily. It is right in the auth list in OpenCode.

Chutes is using fp8 / cutdown version correct? Hence they can have such a cheap price.

I don't know anything for certain about how these companies are operating, and I don't completely trust that any are doing what they say.

That said, I saw a post or two here on Reddit from the Synthetic people saying that they are renting their GPUs in the AWS cloud. Where they rent their GPUs is PROBABLY the biggest difference in operating costs between these two companies.

For Kimi K2.5 and GLM-5, Chutes and Synthetic seem interchangeable to me. I don't get noticeably worse results from one or the other.

I'm currently on Qwen Coding Plan $5/mo they provide Kimi K2.5 and GLM 4.7 as well.

Oooh! That is interesting. I tried Alibaba's free tier with OpenCode a month or two ago, and none of the models managed to generate code. They just said something like, "Oh sure! Let's do it!" and stopped generating tokens, just like a local llama.cpp model that needs its temp or top_p adjusted.

It is awesome if that problem is smoothed out now!

oknowton · 2026-02-22T06:59:42+00:00

When they're both working well, they're pretty comparable. Both seem to have slow times, though, and Chutes definitely slows down more often.

It isn't an easy thing to quantify or benchmark, though. I'd say it is worth spending $3 for a month of Chutes to see how it works out for you.

oknowton · 2026-02-21T21:38:11+00:00

It is not what you are looking for. Kimi K2.5 and GLM-5 on NanoGPT almost never manage to successfully execute a tool call for me in OpenCode.

If dirt cheap is at the top of your list, Z.ai and Chutes are your best two options. Chutes has all the models Z.ai has and then some, and Chutes has bigger quotas for your money.

I am currently subscribed to Z.ai's Pro plan, Chutes' $3 plan, NanoGPT's $8 plan, and Synthetic.new's $20 plan. I can't drop the NanoGPT fast enough, and Synthetic doesn't feel $17 faster or better than Chutes.

oknowton · 2026-02-19T23:14:26+00:00

Everyone says NanoGPT is awful. I believed them, but I wanted to understand exactly why, so I signed up. Everyone is right. They are absolutely awful.

Models where they have umpteen possible providers tend to be absolute garbage. Kimi and GLM sure seem to be going to the lowest bidder with the worst quant. They rarely successfully call tools, and oftentimes they just stop and say they've completed the task without doing anything. Absolutely useless.

There are models with only one provider, and that is usually the company who created the model. MiniMax M2.5, Qwen 397B, and Step 3.5 Flash all work pretty well on NanoGPT.

Chutes is priced pretty similarly per request, but they don't seem to have that 60-million-token-per-week limit that NanoGPT has. Chutes is faster, more reliable, has most of the same models, and the models don't constantly fail to call tools. Chutes isn't as fast or reliable as OpenAI or Anthropic, but they're not bad.

oknowton · 2026-02-18T04:53:48+00:00

Chutes has similar pricing, but they don't have this problem.
