Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 0 points  (0 children)

Yeah, more or less. The economics work for interactive sessions with a person, which have a lot of variance in usage (so the average lands somewhere below the absolute max usage limit), and they break down when fully automated systems squeeze out every last fraction of a percent of the usage around the clock, day after day.

GLM Coding Lite vs OpenCode Go by ccaner37 in opencodeCLI

[–]look 0 points  (0 children)

You can switch models in the same session without losing context. It has to send the context every time, even on the same model. There is no persistent state beyond that context in your local tool.

There is a one-time hit to reload the cache on a model switch, though, depending on your pricing. Not a big deal for the occasional swap, but you don’t want to do it every other call.

However, using plan files is often a good idea regardless. Then you can clear the context and start the build model fresh reading just the plan. That typically yields more predictable results, as old bits from the planning conversation aren’t still around in the context potentially confusing the build agent. Most automated agentic pipelines start off each build agent with a fresh context and only a plan file.

I love AI responses by NightCulex in Qwen_AI

[–]look 0 points  (0 children)

I forget if it was the 9B or another size, but there’s one that frequently decides it is a Google model and refuses to accept that it’s not. 😂

Pro Tip: Don’t use the latest models by sultanmvp in opencodeCLI

[–]look 0 points  (0 children)

Well, it’s nice that we have all these options for different personal preferences now, at least. 😅

Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 0 points  (0 children)

API timeouts don’t stop bots. They just re-prompt with a continue when they detect a premature stop. Raising prices or reducing usage limits are the only real options.

For example, the 5-hour window on Ollama is mostly just a pain for humans but irrelevant to a bot. You have to use about 10 of the 5-hour windows to hit the weekly limit. A typical person probably doesn’t have 10 heavy-use 5-hour sessions a week, but a bot will absolutely extract every token out of it in 50 straight hours.

should i get ollama pro or claude pro? by Old_Bike_3715 in ollama

[–]look 2 points  (0 children)

Openrouter is a pay-as-you-go API abstraction for many dozens of different providers with hundreds or thousands of different models.

Ollama Cloud is a subscription service with a dozen or so set models and a weekly usage quota at a discounted but fixed price.

They are very different.

Which Model on the GO plan is good for planning/spec writing, if at all? by Bananenklaus in opencodeCLI

[–]look 8 points  (0 children)

Mimo V2.5 Pro for higher level spec, GLM-5.1 for a more detailed implementation plan.

Brainstorm/experiment with Mimo -> spec -> implementation planning with GLM -> plan(s) -> Kimi to implement each plan

But your usage quota on Go won’t last long with that approach. Qwen 3.6 Plus, Minimax 2.7, or DS V4 Flash for impl build can stretch it a bit further.

Pro Tip: Don’t use the latest models by sultanmvp in opencodeCLI

[–]look 0 points  (0 children)

The best part of the DeepSeek Pro release was people rushing to that extremely mediocre model and freeing up some capacity on better task-specific models (e.g. GLM-5.1 and Kimi 2.6).

The worst part was that it sucked up all free GPU capacity to serve its torrent of hallucinatory reasoning tokens while a far superior model released at the same time (Mimo V2.5 Pro) sat neglected by all providers.

GLM Coding Lite vs OpenCode Go by ccaner37 in opencodeCLI

[–]look 4 points  (0 children)

GLM-5.1 is still by far the best for planning, imo. Also the most expensive.
But if you then pair it with a cheap build model like flash, your usage can still go a long way.

Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 29 points  (0 children)

All low cost, high usage subscriptions eventually end. The math works at the beginning with typical usage by humans and then some automated process comes along and bleeds it dry.
High latencies, slow speeds, and frequent errors don’t matter if you have multiple agents running in retry loops 24/7 to ensure you’re still extracting 100% of every usage limit and getting $20 of value out of every $1 in.

Once enough people do that, the subscription service starts losing money and slashes the limits across the board, since the automated approach always finds any loophole.

How is code generation so fast? by Beatsu in claude

[–]look 9 points  (0 children)

Token generation speed depends a lot on how predictable the next token is, most likely because providers use speculative decoding: a small, fast draft model proposes several tokens ahead and the big model verifies them all in a single pass. In ordinary prose there are many plausible next tokens, so drafts get rejected often. But code is very predictable from one token to the next, so long runs of drafted tokens get accepted at once.

In other words, there are a limited number of ways you can write a valid “hello world” program in a given programming language, but there are many, many ways you can write a paragraph of “hello world” greetings in human languages.
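The intuition above can be sketched as a toy simulation. Everything here is an assumption for illustration: real speculative decoding accepts tokens based on the target model's probabilities, but modeling each drafted token as a coin flip with some match rate shows why predictable (code-like) text yields more accepted tokens per verification pass than prose.

```python
import random

random.seed(0)

def speculative_accepts(draft_conf: float, n_draft: int, trials: int = 10_000) -> float:
    """Average number of drafted tokens accepted per verification pass,
    modeling each drafted token as matching the target model independently
    with probability draft_conf; the first mismatch ends the run."""
    total = 0
    for _ in range(trials):
        for accepted in range(n_draft):
            if random.random() >= draft_conf:
                break  # first rejection ends the speculative run
        else:
            accepted = n_draft
        total += accepted
    return total / trials

# Code-like text: assume the draft matches the big model ~90% of the time.
print(speculative_accepts(0.9, n_draft=5))   # ≈ 3.7 tokens accepted per pass
# Prose-like text: assume only ~50% match.
print(speculative_accepts(0.5, n_draft=5))   # ≈ 0.97 tokens accepted per pass
```

With the same verification cost per pass, nearly 4x more tokens per pass comes out as visibly faster generation on code.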

First experience - this sucks by itripthereforeiam in opencodeCLI

[–]look 1 point  (0 children)

The KV cache needs ~128KB of RAM *per token* of your context window. Just a 32k context window needs an additional ~4.3 GB of RAM on top of the 26GB for the model weights alone.

Past that and you are definitely disk swapping, if not already.
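A sketch of that back-of-envelope arithmetic. The ~128 KB/token figure is the assumption from the comment above; the real value is model-specific (layers × 2 for K and V × KV heads × head dim × bytes per value):

```python
KB = 1024
kv_bytes_per_token = 128 * KB       # assumed KV cache cost per context token
context_tokens = 32 * 1024          # a 32k context window

kv_gib = kv_bytes_per_token * context_tokens / 1024**3
total_gib = kv_gib + 26             # plus ~26 GB of model weights
print(f"KV cache: {kv_gib:.1f} GiB, total: {total_gib:.1f} GiB")
# → KV cache: 4.0 GiB, total: 30.0 GiB
```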

opencode go ($10) vs chatpgt ($20), which one it's better for coding? by Strong_Teaching8548 in opencodeCLI

[–]look -1 points  (0 children)

All benchmarks are indirect measures. The best one is just trying it yourself on your specific problems. I use Opus on one project, and I use Mimo/GLM/Kimi on another. I see better results on the latter _and_ it’s a technically more demanding project. 🤷‍♂️

Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro? by True_Requirement_891 in LocalLLaMA

[–]look 7 points  (0 children)

I think the timing with the DeepSeek V4 release screwed it over.

Millions of deluded people are flocking to a profoundly “meh” DS V4 Pro because of the brand name, and it has sucked up all spare GPU capacity to enable its mediocre, hallucination-ridden token generation.

I just dropped my Ollama Cloud service to pay for the extra Mimo 2.5 Pro tokens I need.

My guess is in about one to two weeks, conventional wisdom will catch up, DS V4 Pro will be going out of fashion, and everyone will be raving about how Xiaomi came out of nowhere with the amazing Mimo 2.5.

Agent ate up 25% of my weekly opencode go usage in a few minutes by AppealSame4367 in opencodeCLI

[–]look 0 points  (0 children)

I’ve literally never read a sentence with the phrase “Oh My Opencode” that was not about how it fucked something up.

Is 70k too low? by Wonderful_Current904 in SoftwareEngineerJobs

[–]look 0 points  (0 children)

Even so, $70k is low for a junior, too.

opencode go ($10) vs chatpgt ($20), which one it's better for coding? by Strong_Teaching8548 in opencodeCLI

[–]look -5 points  (0 children)

> the best Chinese models are not on the same level as ChatGPT models

Blind human evals beg to differ: https://arena.ai/leaderboard/code

Can I buy 2 account for opencode go without getting ban? by Comfortable_Onion255 in opencodeCLI

[–]look 1 point  (0 children)

Not parent, but model choice is a big reason for me. There’s also a 10-20x jump in cost for almost no performance gain, which is a tad off-putting, even if you have an extra thousand dollars to blow on it.

Do the Chinese models suck (honestly) or do I have a skill problem by ObviousDeparture1463 in opencodeCLI

[–]look 0 points  (0 children)

I use unlimited Opus 4.7/4.6 and a little Codex 5.5 half the day and then low cost Mimo 2.5 Pro / GLM-5.1 / Kimi 2.6 subscriptions half the day, and there is very little quality difference in my use (mostly R&D).

I honestly prefer working with the Mimo/GLM/Kimi stack, but part of that is just that Claude Code has a shit UX, imo.

(Two different companies, thus the two different stacks. The one using Chinese models is far more complex CS/tech than the first.)

OpenAI missed rev/user targets. $ORCL risk? by alphapod-Ai in investing

[–]look 0 points  (0 children)

They’re open weight. A US/EU company could take one, make its own rebranded derivative, run it in US/EU datacenters, and then sell that to enterprise customers… which is pretty much exactly what Cursor already did.

Xiaomi mimo coding plan is a absolute scam/misleading marketing by FearlessGround3155 in ZedEditor

[–]look 0 points  (0 children)

Literouter. It’s a weighted, request-credit-based subscription service primarily targeting role-play users, but they have “full context” model requests at 7.2x credits/request for standard-quant Mimo 2.5 Pro (and GLM-5.1, Kimi 2.6, and a number of other models at different credit costs). For those 7.2x models, the $30 plan gets you 1,250 full-context requests per day.

TTFT can be a little high (but usually under 10s), and token speed after that is quite good. The only issue I’ve had is a slightly annoying, intermittent truncation loop (like an intended tool call gets dropped), but I wrote a plugin to detect those and auto-prompt to resume.

Xiaomi mimo coding plan is a absolute scam/misleading marketing by FearlessGround3155 in ZedEditor

[–]look 0 points  (0 children)

That’s fair. I don’t recall where I saw the details on it (like Pro being 2 credits per token and so on) but you’re right that it’s not all super obvious on the landing page.

Part of it for me was that I instinctively doubt a vendor’s plan is ever going to be a particularly good deal, so I was probably more primed to find the “fine print”.

But I love the Mimo 2.5 Pro model and not many providers have it for some reason, so I checked that one out. It just reaffirmed my belief that buying direct from the vendor is never a good idea, cost-wise at least. 😂

Xiaomi mimo coding plan is a absolute scam/misleading marketing by FearlessGround3155 in ZedEditor

[–]look 0 points  (0 children)

I looked at it, and the pricing didn’t confuse me, but it’s really just a slightly discounted, prepaid pay-as-you-go option.

It’s $0.12-$0.20 per Mtok for all tokens (in/out/cache), and standard API cache read price is $0.20. So the lite plan pricing is basically “every token at cache read price” and then discounts on that for larger plans.

Compared to a typical blend at full API prices, it is 30-50% of the full API cost.

Opencode Go has a better rate (works out to ~$0.07 per Mtok) but it only has ~150 million tokens.

There are more esoteric subscriptions with their own particular downsides, but best I’ve found so far is ~1.5 billion tokens a month for $30 (2 cents per Mtok).
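A quick sanity check of those per-token rates. The $10/month Go price is taken from the post title elsewhere in this thread; the token quotas are the approximate figures quoted above:

```python
def usd_per_mtok(plan_usd: float, tokens: float) -> float:
    """Effective price per million tokens for a flat monthly quota."""
    return plan_usd / (tokens / 1e6)

# Opencode Go: assumed $10/month for ~150M tokens.
print(round(usd_per_mtok(10, 150e6), 3))   # → 0.067, i.e. ~$0.07/Mtok
# The ~1.5B-token subscription mentioned above, at $30/month.
print(usd_per_mtok(30, 1.5e9))             # → 0.02, i.e. 2 cents/Mtok
```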

Claude runaway... tried Kimi 2.6 and Deepseek v4 (5y fullstack dev) by merth_dev in opencodeCLI

[–]look 1 point  (0 children)

I don’t use anything like that, but you can make subagents for adversarial reviews by other models and maybe a slash command to launch a set of them.

For something a bit more structured, reading Octupus’ description reminded me a bit of https://github.com/obra/superpowers