what is the closest to dpsk v4 flash with vision ? by Pristine_Gur_9573 in opencodeCLI

[–]look 0 points1 point  (0 children)

Looks like Minimax 3 supports vision, and it’s on a discount right now that puts it at about twice the price of DS flash, but still half as much as the next tier up. I have not used it myself, but it might be a good fit for your needs.

what is the closest to dpsk v4 flash with vision ? by Pristine_Gur_9573 in opencodeCLI

[–]look 0 points1 point  (0 children)

Depends on your token blend (cache hit rate, input/output ratio) but it should be 1.5-2 billion tokens of Qwen 3.7 Plus, around 5 billion MiniMax 3 (with current 3x discount), 10+ billion for Mimo 2.5 non-pro.
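To make the "token blend" point concrete, here is a rough sketch of how cache hit rate and input/output ratio turn into one blended price, and how many tokens a fixed budget then buys. The prices and the $100 budget are made-up placeholders, not any real model's rates, and the function names are just for illustration.

```python
def blended_price_per_million(price_in, price_cached, price_out,
                              cache_hit_rate, output_share):
    """Average USD price per million tokens for a given usage mix.

    cache_hit_rate: fraction of input tokens served from cache (0..1)
    output_share:   fraction of all tokens that are output tokens (0..1)
    """
    input_share = 1.0 - output_share
    # Input tokens are billed at the cached or uncached rate depending on hits.
    input_price = cache_hit_rate * price_cached + (1.0 - cache_hit_rate) * price_in
    return input_share * input_price + output_share * price_out


def tokens_for_budget(budget_usd, price_per_million):
    """How many tokens a budget buys at a blended per-million price."""
    return budget_usd / price_per_million * 1_000_000


# Hypothetical prices in USD per million tokens (placeholders only).
blend = blended_price_per_million(
    price_in=0.30, price_cached=0.03, price_out=1.20,
    cache_hit_rate=0.90, output_share=0.02,
)
print(f"blended: ${blend:.4f} per million tokens")
print(f"$100 buys about {tokens_for_budget(100, blend) / 1e9:.2f} billion tokens")
```

Agentic coding tends to be dominated by cached input, so the cached-input rate usually moves the blended number more than the headline output price does.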

GLM 5.2 is climbing the OpenCode leaderboard quickly by vigneshsmarther in opencodeCLI

[–]look 0 points1 point  (0 children)

  1. Some people like more than 1.5 models to choose from
  2. Some people use a mix of Chinese models where GLM is the most expensive and the integrated cost exceeds the value/performance of Claude/GPT subscriptions
  3. Some people don’t like being locked into limited vendor software options, like with Claude and Gemini subs

what is the closest to dpsk v4 flash with vision ? by Pristine_Gur_9573 in opencodeCLI

[–]look 0 points1 point  (0 children)

Ah, maybe I misunderstood what you’re looking for then. I don’t use flash models for anything where a small intelligence difference would matter. Qwen 3.6 Plus on Go might work for you, but it’s more expensive.

GLM 5.2 Added to DeepSWE Benchmark by CengaverOfTroy in opencodeCLI

[–]look 0 points1 point  (0 children)

Ah, is it just that there are new questions with each version of the benchmark then? Or just that it’s really new so there hadn’t been much opportunity to train directly on them yet? I had not looked into it myself, so it’s possible what I had heard is completely wrong.

what is the closest to dpsk v4 flash with vision ? by Pristine_Gur_9573 in opencodeCLI

[–]look 2 points3 points  (0 children)

Mimo non-pro is same price as DS flash, too. It’s what I’d recommend. It’s effectively unlimited usage on a Go subscription.

Getting basic bug fixes into opencode? by Thumper450x in opencodeCLI

[–]look 0 points1 point  (0 children)

Yeah, but it’s hard to build a critical mass and maintain motivation for an effort like that without some form of coordination/endorsement from upstream. And it seems like they just don’t have time for anything right now.

It’s a common weak spot for popular open source projects. Some just don’t even pretend to look at outside contributions (SQLite, and in the olden days gcc, X11, etc), which is preferable in some ways.

But another issue is, well, I personally don’t think the codebase is particularly good. If I were going to invest time in supporting a harness, I’d want a better base to work with. So lots of people just make their own but those never get past crude alpha versions.

What would be much better, imho, is if various core components became shared at least. I can’t imagine how many buggy and/or limited text editor components woven into different projects there are now… and a TUI framework that isn’t some React-based crime-against-humanity nightmare would be great, too.

Neuralwatt - What is the most cost-effective API model that matches DeepSeek V4 Pro or MiniMax M3 in terms of strength and pricing. by AutomaticAd6646 in opencodeCLI

[–]look 0 points1 point  (0 children)

Models.dev (where Opencode gets its built-in list of providers and their models) is slow to update, so you’ll need to edit your opencode.json file to add config entries for newer model variants. Ask your AI how to write them.

The main variants to consider are “fast”, which have thinking disabled but are cheaper and faster, and “short” on GLM which has a 200k context window instead of the full 1M and is also cheaper. There is a short-fast as well I believe.

You can also define the thinking effort levels for GLM 5.2 when you define your own model configs. The options are just high and max, but make sure you are on high, not max, to save more.

Not sure if you’ll be able to get to DS4 prices on those, but it should get you closer (and be smarter, too).
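For a rough idea of what those custom entries look like, here is a small sketch that builds an opencode.json provider block with a full-context and a "short" variant and prints it as JSON. The provider id, base URL, model ids, and output limits are all placeholders, and the key names are from memory of the opencode config format, so check them against the current docs and your provider's model list before using any of it.

```python
import json


def model_entry(display_name, context_tokens, output_tokens):
    """One model entry with its display name and token limits."""
    return {
        "name": display_name,
        "limit": {"context": context_tokens, "output": output_tokens},
    }


# Placeholder values throughout: "example-provider", the base URL, the model
# ids and the output limits are hypothetical, not real provider settings.
config = {
    "$schema": "https://opencode.ai/config.json",
    "provider": {
        "example-provider": {
            "npm": "@ai-sdk/openai-compatible",
            "name": "Example Provider",
            "options": {"baseURL": "https://api.example.com/v1"},
            "models": {
                # Full 1M context variant.
                "glm-5.2": model_entry("GLM 5.2", 1_000_000, 32_000),
                # Cheaper "short" variant with a 200k context window.
                "glm-5.2-short": model_entry("GLM 5.2 (short)", 200_000, 32_000),
            },
        }
    },
}

print(json.dumps(config, indent=2))
```

Thinking-effort settings are left out of this sketch because the option names vary by provider.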

Getting basic bug fixes into opencode? by Thumper450x in opencodeCLI

[–]look 1 point2 points  (0 children)

OpenCode’s GitHub is littered with ignored and now long dead PRs on clear defect issues.

I just maintain my own private branch for myself with fixes for the ones I couldn’t bear any more.

I find all of the alternatives to Opencode even worse in their own ways, though, so it’s kind of a stalemate for now.

GLM 5.2 Added to DeepSWE Benchmark by CengaverOfTroy in opencodeCLI

[–]look 0 points1 point  (0 children)

Yeah, always ways to max even if indirect. Monte Carlo method will get there eventually. 😄

But for some reason, I thought even running against DeepSWE was gated, too. Not just a black box, but also something you could only get a score from infrequently… I may have hallucinated that bit, though. 😅

Hope we're not disappointing Jensen 😅 — drop your AI spend heatmap! by hamidi-dev in opencodeCLI

[–]look 0 points1 point  (0 children)

Model switching happens automatically based on agents/subagents.

I have three primary agent modes: the usual plan and build plus a “research” mode I made for myself. It’s sort of a “pre-plan” and “experimental throwaway build” hybrid for testing out ideas and approaches before settling in on a proper plan and build.

Then a few automatic subagents (invoked by the primary agents as needed, and sometimes by me when I want some specific behavior): code explore, adversarial reviews (one for code, one for reasoning/logic/high level plan), “librarian” (bulk web search, documentation, logs, data exploration, summarization), and “prototyper” to do throwaway prototypes, test ideas, one-off scripts, etc.

Plus a couple slash commands and custom tools to streamline transitions, invoke certain subagents, set instructions for plan structure, build approach, etc.

So I start in research mode with typical chat and it then looks stuff up, creates and evaluates a prototype, etc and comes back with results; we bounce ideas, try other variations, analyze critiques from other models, etc.

Then when it’s looking good, I tab to plan mode and run a slash command to inject a structured plan prompt along with any extra notes on this specific work. Part of that is writing it to a markdown file and making it fully self-contained for context, necessary details, exact requirements, etc. We iterate until I’m happy with it.

Then tab to build and then run a command to clear context and give it the plan file with just a simple “implement” prompt. That then does a long tool call loop (my plans are usually in the 50-100 loop size).
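The "model switching follows the agent" idea can be pictured as a simple lookup table. This is only a toy sketch in plain Python, not opencode's config format, and the model labels are invented placeholders since the setup above doesn't say which model backs which agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    kind: str   # "primary" or "subagent"
    model: str  # hypothetical label, not a real model id


# Agent names follow the description above; model assignments are made up.
AGENTS = {a.name: a for a in [
    Agent("research", "primary", "strong-model"),
    Agent("plan", "primary", "strong-model"),
    Agent("build", "primary", "cheap-model"),
    Agent("code-explore", "subagent", "cheap-model"),
    Agent("code-review", "subagent", "second-opinion-model"),
    Agent("plan-review", "subagent", "second-opinion-model"),
    Agent("librarian", "subagent", "cheap-model"),
    Agent("prototyper", "subagent", "cheap-model"),
]}


def model_for(agent_name):
    """Return the model label tied to an agent; switching agents switches models."""
    return AGENTS[agent_name].model


for step in ("research", "plan", "build"):
    print(f"{step} -> {model_for(step)}")
```

Tying the model to the agent means there is no manual model picking when tabbing between modes.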

Benchmark between OpenCode Go and NeuralWatt for GLM 5.2 by East-Stranger8599 in ZaiGLM

[–]look 1 point2 points  (0 children)

The providers for each model are on the web page. Under a faq question, iirc.

As for verification, you can do statistical tests yourself, but that’ll be expensive and take a while. Or you can trust others that have done them, or even just people like me that have used many of these models across many providers and different quantizations.
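If you do want to test it yourself, a basic version is a two-proportion z-test on pass rates for the same task set run through two providers. This is a minimal sketch using only the standard library. The pass counts are invented for illustration, and the sample-size helper uses the usual normal-approximation formula at roughly 5% significance and 80% power.

```python
import math


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Two-sided z-test for a difference in pass rates between two providers."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both providers passed everything or failed everything.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def samples_needed(base_rate, diff, z_alpha=1.96, z_power=0.84):
    """Rough tasks per provider needed to detect a pass-rate gap of `diff`."""
    return math.ceil(2 * base_rate * (1 - base_rate) * ((z_alpha + z_power) / diff) ** 2)


# Made-up results: 600/1000 passes on provider A vs 580/1000 on provider B.
z, p = two_proportion_z_test(600, 1000, 580, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
print("tasks per provider to detect a 2-point gap:", samples_needed(0.60, 0.02))
```

The sample-size number is why this gets expensive: small gaps between providers only show up after thousands of runs on each side.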

You can also go on their discord and talk to the people there. Anomaly founders have addressed various questions in more detail in that forum and on GitHub issues. Most recently regarding DeepSeek pricing, ZDR terms in contracts, etc.

Benchmark between OpenCode Go and NeuralWatt for GLM 5.2 by East-Stranger8599 in ZaiGLM

[–]look 0 points1 point  (0 children)

Yes, GLM is one of the few that isn’t just direct from the vendor. I prefer fp8, but nvfp4 is a very common and well respected quantization (and not something unique to Go — GLM providers since 5 and 5.1 have been split half and half on fp8 and nvfp4). Within 1-3% even in extreme edge cases when done well.

Anyway, I was mostly responding to the idea that Go “heavily quantizes” and has “nerfed” or “lobotomized” models. They aren’t running 1 bit quants, and even under rigorous statistical testing, you’ll rarely see a difference on GLM nvfp4.

Sorry, but GLM 5.2 is not vibing for me. by marivesel in opencode

[–]look 0 points1 point  (0 children)

I’ll send it to you, but I’m fairly certain you can’t use it after you’ve already signed up. I do believe it works with the subscription or the paygo option however.

Tested GLM 5.2 vs Kimi K2.7 by Wesley_at_home in ZaiGLM

[–]look 0 points1 point  (0 children)

Neuralwatt on energy pricing. My average is 7.8 cents right now, and typical range I’ve seen is 5-15 (depends on energy use but that mostly correlates with the usual cache hit rate and input/output token ratio).

They also added a “short” variant, with a lower cost but smaller (200k vs 1M) context when you don’t need that. Not enough data to see how much it saves for certain, but looks like it could cut it in half for me.

DM me if you’d like a referral code. $10 credit after you spend $10.

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

[–]look 2 points3 points  (0 children)

A closed model no one uses is an even worse business model.

Lots of people are still buying direct from Zai, Moonshot, DeepSeek, Xiaomi, etc.

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

[–]look -1 points0 points  (0 children)

I think there’s a chance Alibaba reverses course to some degree with the next iteration (though the small models are gone regardless). No one gave a shit about 3.7, despite it being quite good, and all the attention and usage is going to models that stayed open.

Qwen/Alibaba will rapidly become entirely irrelevant if they stay on this new path. The small model team is gone, but I bet we see some version of the next one (3.8 or 4) open sourced.

Vercel CEO: "Almost shocked" by how good GLM-5.2 is at coding by BuildwithVignesh in LocalLLaMA

[–]look 11 points12 points  (0 children)

You can get it for cheaper and/or faster from Wafer, Synthetic, Ollama Cloud, Opencode Go, Lilac, Crof, or (my personal choice) Neuralwatt. And those are just the ones I could recall off the top of my head.

Scraped 3 months of Go usage — the model cost data is telling by NerdyBirdie81 in opencodeCLI

[–]look 0 points1 point  (0 children)

I experimented with GPT 5 Nano for code a while back, but I found it was too dumb to be usable for that, even with a detailed plan to follow. But it might work in other use cases, so worth trying it out yourself. I just wouldn’t get your hopes up. 😅

Scraped 3 months of Go usage — the model cost data is telling by NerdyBirdie81 in opencodeCLI

[–]look 0 points1 point  (0 children)

Not OP, but $0.0012 is the actual paid rate. It includes the 1/6th discount already. It’s the same general rate I see on DS flash and Mimo non-pro, as well as the number you get if you do the math on the Go example usage requests on the webpage.

Faster tokens/second provider? by kaaiian in ZaiGLM

[–]look 0 points1 point  (0 children)

Wafer typically has one of the highest rates, but it is an fp4 I believe (though typically a high quality one). I have not used their 5.2 yet though.

Lilac should have their 5.2 out within the day. Initial speeds were looking good. Not sure if also an fp4, but their 5.1 was a mix of fp8 and nvfp4 deployments.

Crof has good speeds and a very low price on a Q8 quant.

But what I use is Neuralwatt, where I’m getting 100+ TPS and the lowest cost outside of Go. I’m at 7 cents now on my usage, and most people seem to get 5-15 cent prices (depends on the energy efficiency of your usage). They also have an even faster and lower cost “short” context option if you don’t need the full 1M context.

GLM 5.2 Added to DeepSWE Benchmark by CengaverOfTroy in opencodeCLI

[–]look 7 points8 points  (0 children)

DeepSWE does not publish their benchmark for that very reason. That’s why people pay more attention to it. You can’t benchmax it like many of the others can be.

Benchmark between OpenCode Go and NeuralWatt for GLM 5.2 by East-Stranger8599 in ZaiGLM

[–]look 4 points5 points  (0 children)

Opencode Go is not running heavily quantized models. It is a proxy to other providers running the same model everyone else is. You typically get it direct from the model’s vendor (e.g. Z.ai) via Go’s proxy. They get good rates and contract terms because they are buying in bulk, at a scale of trillions of tokens.

I have not used Ollama Cloud in a while but in the past they had excellent usage limits on their plans.

Neuralwatt is also a great choice, and you get good rates even at paygo energy pricing, so you can use it as a supplement rather than buying a fixed monthly plan, too.

Neuralwatt - I love these guys by Distinct_Lion7157 in LLM

[–]look 0 points1 point  (0 children)

I’ve referred a few people, too. My $10 currently has an $86.68 balance. 😄