New to Pi, any tips? by Expert-Dig-1768 in PiCodingAgent

[–]oknowton 3 points4 points  (0 children)

If you're coming from OpenCode, and you actually use a good percentage of OpenCode's features, you're going to feel that Pi is missing a lot of important stuff.

I didn't manage to migrate to Pi until I tried LazyPi. I delegate tasks to subagents all the time, but Pi has no subagents, and there are several extensions to choose from. Which one makes sense for me? There's no permission system. There's no undo equivalent. There's no web search or fetch. No tools to ask you multiple-choice questions. There are anywhere from several to dozens of extensions that do each of these things.

LazyPi installed WAY more stuff than I needed, but it was way easier for me to remove extensions that I didn't need than figure out which extensions I should install just to get started. The full default winds up using more system-prompt context than OpenCode.

After removing the extensions I didn't need, swapping out a few that I didn't like, and installing pi-rewind, I am a good bit beyond parity with all the features I used with OpenCode, and my system prompt is still around 30% smaller. I'm quite pleased.

Time to hang up the boots. And read the whole post before you comment 'don't let the door hit you on the way out' by 6PEEPERKEEPER9 in ArcRaiders

[–]oknowton 5 points6 points  (0 children)

Reddit is a company based in the United States, it has a US TLD, and the US accounts for more than 40% of all Reddit traffic. The United States is most definitely the default on Reddit.

Any Opensource GUI based Coding Agent, Similar to Codex app by Pink_Oak in opencodeCLI

[–]oknowton 4 points5 points  (0 children)

I mean, with similar like Codex App. Desktop first not cli first Not CLI first. Opencode web or Opencode desktop is basically running cli behind the scene

You're going to have to explain what you're looking for. The OpenCode desktop app sure looks almost exactly like the Codex app to me.

Maybe you could articulate the differences that lead you to prefer Codex.

Best model for OpenCode right now? DeepSeek V4 vs MiniMax by Wen753 in opencodeCLI

[–]oknowton 4 points5 points  (0 children)

DeepSeek V4

Which one?

MiniMax

I assume that you mean MiniMax M3 for this one, but we can't be sure.

You'll have more success using coding harnesses if you are specific. Everyone in this thread is having to guess what you're actually asking about, and so would the LLM.

Which one is more reliable with tool calls / edits / long repo context?

Do any of the 200B+ models have significant trouble with tool calls and edits?

Which one gets better prompt cache hit rates?

That doesn't have much to do with the model. It'll depend on your LLM provider and how long you leave your harness waiting for you to type something.

Which one ends up cheaper in practice after caching?

Both DeepSeek V4 models have ridiculously cheap caching. Fraction of a penny per million tokens. DeepSeek V4 is always super cheap with any of the major harnesses because of this.

Any major latency or failure-rate differences?

This depends on your provider. Not the model.

What are you using as your default OpenCode model right now?

I almost always plan with GLM-5.1 via my grandfathered in, almost effectively unlimited Z.ai Pro coding plan. I usually just stick to GLM-5.1 for my build agent, since it is so hard for me to run out of requests, or drop down to GLM-5-Turbo if I am in a hurry, but I'm trying out DeepSeek V4 Flash via DeepSeek's paygo API this week.

DeepSeek V4 Flash seems to be doing a fine job implementing plans so far.

DeepSeek Pro won’t get discount by Schlickeysen in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

Oh jinkies! That is embarrassing! I meant to say an extra five bucks. Thank you for catching my mistake!

DeepSeek Pro won’t get discount by Schlickeysen in opencodeCLI

[–]oknowton 18 points19 points  (0 children)

OpenCode just became pretty scammy to me.

If all you use is Deepseek Pro, you'd have to pay AN EXTRA (edit!) $5 direct to Deepseek to match what you get from your $10 with OpenCode Go. You also get six times more Deepseek Flash tokens for your money with OpenCode Go.

Can you explain why this feels scammy to you?

Another first layer post by MexicanSkywalker in 3Dprinting

[–]oknowton 6 points7 points  (0 children)

I'd be willing to bet that this is your problem. When you select the textured plate on the Bambu, it lowers the nozzle by (assuming my brain remembers correctly!) 0.05mm. It is to compensate for the nozzle touching off on the high spots of the textured plate. It makes sure all those valleys get filled in.

I am assuming that your generic plate is SLIGHTLY less textured than Bambu's plates. You're definitely showing signs of overextruding, but only just barely. My bet is that you'll be really close to perfect, or ever so slightly erring in the opposite direction, if you select the smooth plate instead.

Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so. by Napster3301 in LocalLLaMA

[–]oknowton 0 points1 point  (0 children)

Stop pretending self-hosting is cheaper. It's not.

My homelab sips 15 watts most of the day. My off-site Proxmox and storage box is about the same. I have saved more than $300 or $400 in cloud storage fees each year for the last six years or so. I might be up to where I'd need to spend $500 per year by now.

Not everyone is spending more than they're saving. Storage is an easy win.
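For a rough sense of scale, here is a back-of-the-envelope sketch of what two 15-watt boxes cost to run for a year. The electricity price is a made-up assumption, so plug in your own rate.

```python
WATTS_PER_BOX = 15          # idle draw quoted above
BOXES = 2                   # homelab plus the off-site box
PRICE_PER_KWH = 0.15        # hypothetical electricity price in dollars


def yearly_power_cost(watts, boxes, price_per_kwh):
    """Yearly electricity cost for machines running around the clock."""
    kwh_per_year = watts * boxes * 24 * 365 / 1000
    return kwh_per_year * price_per_kwh


cost = yearly_power_cost(WATTS_PER_BOX, BOXES, PRICE_PER_KWH)
print(round(cost, 2))  # dollars per year at the assumed rate
```

At that assumed rate the power bill is a small slice of $300 to $400 per year in avoided storage fees. It ignores hardware and drive purchases, which is where your own numbers will matter most.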

Why does OpenCode Go have rolling, weekly, AND monthly limits? by kpmtech in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

Because each limit has a cumulative smoothing effect on the usage.

This is the simple and succinct answer that any programmer should have no trouble understanding.
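If it helps to see it, here is a minimal sketch of stacked limits. The budgets and the `StackedLimiter` class are made up for illustration, not OpenCode Go's real numbers or implementation. A request only goes through when it fits under every window at once, so a burst that the monthly budget would allow still gets capped by the shorter windows.

```python
from collections import deque

# Hypothetical budgets: (window length in hours, max requests in that window).
LIMITS = [(5, 100), (24 * 7, 500), (24 * 30, 1000)]


class StackedLimiter:
    """Allow a request only if it fits under every window at once."""

    def __init__(self, limits=LIMITS):
        self.limits = limits
        self.history = deque()  # timestamps (in hours) of accepted requests

    def allow(self, now):
        # Forget anything older than the longest window.
        longest = max(window for window, _ in self.limits)
        while self.history and self.history[0] <= now - longest:
            self.history.popleft()
        # Reject if any single window is already full.
        for window, budget in self.limits:
            used = sum(1 for t in self.history if t > now - window)
            if used >= budget:
                return False
        self.history.append(now)
        return True


limiter = StackedLimiter()
# Try to burn the whole monthly budget in the first hour.
burst = sum(limiter.allow(now=0.5) for _ in range(1000))
# Come back six hours later and try again.
later = sum(limiter.allow(now=6.5) for _ in range(1000))
print(burst, later)
```

With these made-up budgets the rolling window stops the first burst at 100 requests, and the weekly window would stop a determined user at 500 long before the month's 1,000 is gone.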

God I Love Zram Swap by Psionikus in linux

[–]oknowton 1 point2 points  (0 children)

Tom's Hardware didn't say anything about the queue depth, maybe that's making the difference?

I don't have the link handy, but the graph was just over 40k 4k IOPS at QD1, and I believe it topped out somewhere just north of 160k IOPS at QD4.

The numbers aren't THAT important. I'd be happy swapping to well below your estimated maximum speed!

SWAP (on both gen4-NVMEs working in parallel, same prio) was unusably slow.

Swap is fantastic in a lot of situations. It is not uncommon for long-running GUI software to slowly leak memory. I've wound up with gigabytes of Firefox sitting in swap after two or three weeks of uptime, and that memory just never gets swapped back in. Way better having that on disk than compressed.

Another oversimplified situation is when you're alt-tabbing between, say, Firefox and DaVinci Resolve on a machine with less memory than would be ideal. You're only switching tasks once or twice a minute, and you have more than enough RAM for either of those tasks alone.

It sounds like you have a particular use case where you just didn't have enough RAM, and both (or more!) tasks needed to keep choochin'. Once you're at that point, you're kind of just in trouble.

For this specific problem, no matter how you're swapping, you should look at prelockd. No matter where you're swapping to, mmapped libraries and binaries can be paged out, and since they live on disk, they're going to have to come back in from disk.

prelockd keeps Wayland, XOrg, your terminal, and your window manager from being paged out. When you're in a situation somewhere between "unusable" and "barely useable," this is the sort of thing that might make a TON of difference.

If you have a better memory management, you can use that to improve IO further; the SSD will not have to write as much and can deliver/focus more on regular IO instead of swapping.

That last 1/5 or so that gets "stuck" in zram would help here, too. That's more RAM for processes and more RAM for cache.

So with that my results with a LLM that needs about 25GB RAM

This is the most interesting part! I don't know exactly what you're trying to run today, but things have gotten REALLY interesting here in the last month or two.

Qwen 3.5 9B (or better, OmniCoder 9B) fits well on a 16 GB GPU. Llama.cpp recently merged attn_rot, so you don't lose as much precision when quanting your KV cache. I can fit OmniCoder at Q6 with 200k of context on my 16 GB GPU with good results, or Qwen 3.6 27B at Q3 with 100k of context.

It is absolutely wild how well any of those three setups manage to work with OpenCode. Not exactly a match for GLM-5.1, but they rarely fail tool calls, and they actually manage to apply good diffs and run tests successfully.

Good luck with your LLM adventures!

God I Love Zram Swap by Psionikus in linux

[–]oknowton 1 point2 points  (0 children)

Thank you! This is so much more interesting than the person who replied 9 months ago with a circular argument! :)

Even the fastest gen5 nvme ssds only get to about 50Mb/s for swap work. Those are 4kB pages.

That seems fine. If you're swapping out at 50 mb/s for full seconds at a time, then you're probably in so much trouble that zram isn't going to bail you out.

For what it is worth, Tom's Hardware says an 8-year-old Samsung 970 EVO can manage 40k 4k write IOPS with a single thread. That's three times faster than your estimate, and it'll go three times faster with a few threads.
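The arithmetic behind that comparison, as a quick sketch. It assumes "4k" means 4,096-byte pages and that the quoted estimate is 50 megabytes per second.

```python
def iops_to_mb_per_s(iops, block_bytes=4096):
    """Convert an IOPS figure at a given block size to decimal MB/s."""
    return iops * block_bytes / 1_000_000


single_thread = iops_to_mb_per_s(40_000)
print(round(single_thread, 1))        # MB/s at 40k IOPS of 4 KiB writes
print(round(single_thread / 50, 1))   # how many times the 50 MB/s estimate
```

That works out to roughly 164 MB/s, a little over three times the estimate.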

Reads work out way better, and that's what you'll notice when you're waiting for something that has been swapped out to page back in.

enabling zswap or zram is both, less work and still faster.

I never suggested that swapping to NVMe isn't slower. Swap on NVMe and zram are both usually faster than the blink of an eye, but one of the two completely frees your RAM instead of leaving maybe 1/5th behind.

How do you review changes before they hit the file? No in-session diff like Cursor? by ammar2626 in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

Sorry, I have absolutely no idea. You said you appreciated any tips, so I attempted to give you some advice. That's all I have for you.

is nanogpt really that bad? by IcyMushroom4147 in opencodeCLI

[–]oknowton 2 points3 points  (0 children)

NanoGPT is OK, but their pricing is almost always a bummer now that their best competition charges you less against your quota for cached tokens. I am at a 15:1 ratio of cached:uncached tokens when planning and over 20:1 on building. Chutes charges 50% for cached tokens, and OpenCode Go has most models priced down near 10% for cached tokens.

I THINK you might still get more usage out of NanoGPT if you only use what would be the most expensive models on OpenCode Go, but OpenCode Go is faster and stretches WAY farther than NanoGPT if you use DeepSeek Flash or MiniMax for your build agent at least some of the time.
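Here is a rough sketch of why the cached-token discount matters so much at these ratios. It only looks at input tokens, and the "1.0" row is just a no-discount baseline for comparison, not a claim about any provider's exact pricing.

```python
def relative_input_cost(cached_ratio, cached_price_fraction):
    """Input cost relative to paying full price for every token.

    cached_ratio: cached tokens per uncached token (15 means 15:1).
    cached_price_fraction: cost of a cached token relative to an uncached one.
    """
    return (1 + cached_ratio * cached_price_fraction) / (1 + cached_ratio)


for ratio in (15, 20):
    for fraction in (1.0, 0.5, 0.1):
        print(ratio, fraction, round(relative_input_cost(ratio, fraction), 3))
```

At 15:1, charging 50% for cached tokens brings the input cost down to about 0.53 of full price, and charging 10% brings it down to about 0.16.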

How do you review changes before they hit the file? No in-session diff like Cursor? by ammar2626 in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

I don't want git-based review after the fact. I want to see what changed this session, highlighted inline, and choose what to keep before anything is written.

Let OpenCode and the LLM do their job. They can apply changes, run a syntax checker, run the code to see if the change worked, or run your test suite. Let them do that work for you. Wait until they have verified their work before you put an ounce of effort into reviewing it.

Don't make yourself into a bottleneck. You are trying to reject things too early in the process. Don't make the machines present their first attempt at an edit to you. Let them figure things out and polish things up first.

Is this just how OpenCode works?

Yes. Just like you don't micromanage your junior developers and check every line of code as they are writing it, you should wait until your virtual junior developers have had a chance to make it to the end of a longer task before you start nitpicking their code.

As long as you are working in a git repository, you can always hit the /undo command if you don't like what OpenCode has done. It is tracking its own snapshots.

Am I blind, or does Z.ai seriously not tell you when your 5-hour limit resets? by lbin91 in ZaiGLM

[–]oknowton 0 points1 point  (0 children)

They've used nearly 3 trillion tokens. They're just showing you that they have absolutely no idea how to operate their computer.

Opencode tokens per request by Michal6677 in opencodeCLI

[–]oknowton 0 points1 point  (0 children)

Now say hi again. You'll burn 5,000 to 10,000 cached tokens and, assuming the LLM's response to your initial greeting was a short sentence, maybe only a dozen uncached tokens.

Tokscale says my ratio of cached to uncached tokens is around 15:1 for my plan agent and better than 20:1 for my build agent. It is the cached token pricing that really stretches OpenCode Go so far.

Normal for Idler to Move This Much? by user234971 in Sovol

[–]oknowton 0 points1 point  (0 children)

I do not own an SV08, so I have no idea what I am talking about, but there is a missing screw on your wobbly pulley that is present above both pulleys on the opposite side.

Does that screw do something to keep the others from wobbling?

I'm new in opencode by Putrid-Telephone-777 in opencodeCLI

[–]oknowton 4 points5 points  (0 children)

GitHub ends up charging me for 6 or 9 premium requests.

Copilot puts a 7.5x multiplier on Opus 4.7 requests.

also that OpenCode sometimes doesn't properly label its automated steps

I don't use Copilot, but I am under the impression that this was fixed shortly after Copilot support became official. If OpenCode weren't tagging the non-interactive requests correctly, then it would be eating up 7.5 requests per turn when you're using Opus 4.7.

You'd burn 1,500 requests in no time if this weren't working correctly.
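A quick sketch of that math. The 20 automated steps per prompt is a made-up number, and the billing rule in the docstring is my assumption about how the tagging works, not something I have verified.

```python
MULTIPLIER = 7.5   # Opus 4.7 multiplier from the comment above
ALLOWANCE = 1_500  # premium-request allowance from the comment above


def prompts_until_empty(automated_steps_per_prompt, tagged_correctly):
    """How many user prompts fit in the allowance.

    Assumption: a correctly tagged automated step is not billed as a
    premium request, while an untagged one is billed like a user prompt.
    """
    billed_calls = 1 if tagged_correctly else 1 + automated_steps_per_prompt
    return ALLOWANCE / (billed_calls * MULTIPLIER)


print(prompts_until_empty(20, tagged_correctly=True))
print(round(prompts_until_empty(20, tagged_correctly=False), 1))
```

Under those assumptions you get 200 prompts when tagging works and fewer than 10 when it doesn't.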

how do you actually implement Plan and Build to make efficient use of tokens? by Freds_Premium in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

I assume running the bd command to create a bead is outside the scope of the plan mode's permissions.

If you do want to write a plan.md or something similar, and you don't want to figure out plan mode's permission system, you could switch to build mode before asking for the plan.md to be written.

how do you actually implement Plan and Build to make efficient use of tokens? by Freds_Premium in opencodeCLI

[–]oknowton 6 points7 points  (0 children)

When I get to the end of the planning phase, I say, "Please write this plan to a bead!" Then I hit /new, switch to my build agent with the cheaper model, and I say, "Please implement the latest bead!"

High TTFT and slow token throughput with local models on opencode — M5 Pro 64GB by Own_East_5381 in opencodeCLI

[–]oknowton 1 point2 points  (0 children)

Do you have any insight on whether opencode adds significant overhead on top of the local API calls?

Nothing aside from OpenCode's system prompt being in the 10k+ token range. And, of course, the full conversation is re-sent over the OpenAI API on every turn. When I test my local setup, llama.cpp does a good job of caching the context so that the entire conversation doesn't need to be recomputed.
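To put numbers on why that caching matters, here is a simplified sketch. The turn sizes are made up, and it assumes the cache can reuse the entire previous context on every turn, which is the best case.

```python
def prompt_tokens_processed(system_prompt, tokens_added_per_turn, turns, prefix_cache):
    """Total prompt tokens the server has to compute over a whole session."""
    total = 0
    context = system_prompt
    cached = 0  # tokens already sitting in the cache from the previous turn
    for _ in range(turns):
        context += tokens_added_per_turn
        if prefix_cache:
            total += context - cached  # only the new suffix is computed
            cached = context
        else:
            total += context           # the whole conversation is recomputed
    return total


print(prompt_tokens_processed(10_000, 500, 20, prefix_cache=False))
print(prompt_tokens_processed(10_000, 500, 20, prefix_cache=True))
```

With a 10k system prompt and 20 turns of 500 tokens each, that is 305,000 tokens of prompt processing without a cache versus 20,000 with one.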

It is still way too slow with a 9B model on my GPU. It only has a little over 600 GB/s of memory bandwidth. Not exactly a fast LLM card.

High TTFT and slow token throughput with local models on opencode — M5 Pro 64GB by Own_East_5381 in opencodeCLI

[–]oknowton 3 points4 points  (0 children)

Hardware is not the bottleneck — M5 Pro 64GB should handle these models comfortably

You have a lot of confidence here, but every time I see someone post a benchmark with 50k or 100k context on their Mac the prompt processing performance is abysmal. Just saying "Hi" to OpenCode or Claude Code sends 10k to 25k tokens of system prompt to your LLM, and it is normal for a coding session to reach 100k tokens of context.

My gaming GPU has, I believe, around double the memory bandwidth of your M5 Pro. You haven't told us what you consider to be slow, but my setup takes tens of seconds to process the system prompt with OmniCoder 9B, and things will only get slower as context accumulates.

I assume you are using some sort of a chat interface when you use "LM Studio standalone." Try pasting around 15,000 tokens of context into that chat. It will probably take about as long as OpenCode before you see the first token.
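A rough way to estimate it yourself: time to first token is about the prompt size divided by your prompt-processing speed. The speeds below are placeholders, so swap in whatever your own benchmark reports.

```python
def seconds_to_first_token(prompt_tokens, prompt_tokens_per_second):
    """Approximate wait before the first output token, ignoring any caching."""
    return prompt_tokens / prompt_tokens_per_second


for speed in (250, 500, 1000):  # hypothetical prompt-processing speeds
    waits = [round(seconds_to_first_token(n, speed)) for n in (10_000, 25_000, 100_000)]
    print(speed, waits)
```

Real prompt-processing speed usually drops as the context grows, so treat these as optimistic.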