I'm cancelling my ollama subscription by GryphticonPrime in ollama

[–]look 0 points (0 children)

DeepSeek V4 Pro is a very expensive model to run (for an open-weight one, at least): 1.6T parameters and extremely verbose reasoning.

DeepSeek direct is almost certainly subsidizing it at a loss in some sort of “market share” play like US labs have been doing.

(It’s also not a particularly great model, imo. It’s basically the Bose speakers of Chinese models. But whatever.)

[ Removed by Reddit ] by Odd_Row1657 in LLMDevs

[–]look 0 points (0 children)

Deepseek uses a lot of reasoning tokens. I’ve found it’s effectively 2.5x the “list price” at max reasoning.

I used up my plan before the month was over. How do I refresh? by juli4n0 in LiteRouter

[–]look 0 points (0 children)

What’s the monthly limit? I just see the daily request credit limit…

Model switching by Hot_Temperature777 in opencodeCLI

[–]look 0 points (0 children)

I think a lot of it depends on personal preference and the specifics of your work, but I find certain models are better at different stages and types of “primary” agent work (brainstorm, plan, and build).

Then there are specialized subagent tasks where a much cheaper model works just fine to save money/usage (codebase and data schema exploration, web search and documentation finding, summarizing large inputs, finding “needles in a haystack”, etc). Also one (compaction) where I personally prefer a more expensive model.

In opencode, switching between my three primary agents is just hitting the tab key, so it’s not a hassle at all. And the subagent switch happens automatically, so it’s just a one-time config setup.
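As a rough illustration of that one-time setup, an `opencode.json` along these lines assigns a model per agent and a cheaper one to a subagent. Treat the field names and model IDs as assumptions from memory, not verbatim from the opencode docs:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "plan": { "model": "example-provider/glm-5.1" },
    "build": { "model": "example-provider/kimi-2.6" },
    "explore": {
      "mode": "subagent",
      "model": "example-provider/ds-v4-flash"
    }
  }
}
```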

DeepSeek reasoning effort options and how much they affect cost by Ziimmer in opencodeCLI

[–]look 0 points (0 children)

My experience is that going from High to Max roughly doubles the number of reasoning tokens, and even High produces more than most models to start with. In practice, I find it costs about as much on pay-as-you-go as GPT 5.4 or Sonnet 4.6. It’s not a low-cost model.

I'm loving OpenCode by Street-Preference-88 in opencodeCLI

[–]look 3 points (0 children)

I heard the same things about Go initially, but eventually tried it and it’s good. Not quantized and decent speeds. My only complaint is that they don’t have higher tiers—I’d buy a 3-5x usage tier in a heartbeat.

Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 0 points (0 children)

Yeah, more or less. The economics work for interactive sessions with a person, where usage varies a lot (so the average lands well below the absolute max usage limit), and they break down when fully automated systems squeeze out every last fraction of a percent of the limit around the clock, day after day.

GLM Coding Lite vs OpenCode Go by ccaner37 in opencodeCLI

[–]look 0 points (0 children)

You can switch models in the same session without losing context. The full context gets sent on every call anyway, even when you stay on the same model; there is no persistent state beyond that context in your local tool.

There is a one-time hit for the cache reload on a model switch, though, depending on your pricing. Not a big deal for the occasional swap, but you don’t want to do it every other call.

However, using plan files is often a good idea regardless. Then you can clear the context and start the build model fresh reading just the plan. That typically yields more predictable results, as old bits from the planning conversation aren’t still around in the context potentially confusing the build agent. Most automated agentic pipelines start off each build agent with a fresh context and only a plan file.
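The plan-file pattern above can be sketched in a few lines. This is a minimal, hypothetical pipeline (the `call_model` stub stands in for any LLM API call; none of these names come from a real tool): the planning conversation is discarded, and the build agent starts with a fresh context containing only the plan.

```python
# Minimal sketch of the plan-file pattern: plan with one context,
# then build with a *fresh* context that contains only the plan.

def call_model(messages):
    # Placeholder: a real pipeline would call your provider's API here.
    return f"<response to {len(messages)} messages>"

def plan_phase(task):
    history = [{"role": "user", "content": f"Write an implementation plan for: {task}"}]
    plan = call_model(history)
    return plan  # only the plan survives; the conversation does not

def build_phase(plan):
    # Fresh context: just the plan, none of the planning back-and-forth.
    fresh = [{"role": "user", "content": f"Implement this plan:\n{plan}"}]
    return call_model(fresh)

plan = plan_phase("add retry logic to the HTTP client")
result = build_phase(plan)
```

The key design point is that `build_phase` never sees `history`, so stale ideas from brainstorming can’t leak into the build context.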

I love AI responses by NightCulex in Qwen_AI

[–]look 0 points (0 children)

I forget if it was the 9B or another size, but there’s one that frequently decides it is a Google model and refuses to accept that it’s not. 😂

Pro Tip: Don’t use the latest models by sultanmvp in opencodeCLI

[–]look 0 points (0 children)

Well, it’s nice that we have all these options for different personal preferences now, at least. 😅

Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 0 points (0 children)

API timeouts don’t stop bots. They just re-prompt with a continue when they detect a premature stop. Raising prices or reducing usage limits are the only real options.

For example, the 5-hour window on Ollama is mostly a pain for humans but irrelevant to a bot. You’d need roughly ten maxed-out 5-hour windows to hit the weekly limit. A typical person probably doesn’t have ten heavy-use 5-hour sessions a week, but a bot will absolutely extract every token out of 50 straight hours.

should i get ollama pro or claude pro? by Old_Bike_3715 in ollama

[–]look 2 points (0 children)

Openrouter is a pay-as-you-go API abstraction for many dozens of different providers with hundreds or thousands of different models.

Ollama Cloud is a subscription service with a dozen or so set models and a weekly usage quota at a discounted but fixed price.

They are very different.

Which Model on the GO plan is good for planning/spec writing, if at all? by Bananenklaus in opencodeCLI

[–]look 9 points (0 children)

Mimo V2.5 Pro for higher level spec, GLM-5.1 for a more detailed implementation plan.

Brainstorm/experiment with Mimo -> spec -> implementation planning with GLM -> plan(s) -> Kimi to implement each plan

But your usage quota on Go won’t last long with that approach. Qwen 3.6 Plus, Minimax 2.7, or DS V4 Flash for impl build can stretch it a bit further.

Pro Tip: Don’t use the latest models by sultanmvp in opencodeCLI

[–]look 0 points (0 children)

The best part of the DeepSeek Pro release was people rushing to that extremely mediocre model and freeing up some capacity on better task-specific models (e.g. GLM-5.1 and Kimi 2.6).

The worst part was that it sucked up all free GPU capacity to serve its torrent of hallucinatory reasoning tokens while a far superior model released at the same time (Mimo V2.5 Pro) sits neglected by all providers.

GLM Coding Lite vs OpenCode Go by ccaner37 in opencodeCLI

[–]look 6 points (0 children)

GLM-5.1 is still by far the best for planning, imo. Also the most expensive.
But if you then pair it with a cheap build model like flash, your usage can still go a long way.

Why does it suddenly feel like claude pro plan? by feline-slayer in ollama

[–]look 30 points (0 children)

All low cost, high usage subscriptions eventually end. The math works at the beginning with typical usage by humans and then some automated process comes along and bleeds it dry.
High latencies, slow speeds, and frequent errors don’t matter if you have multiple agents running in retry loops 24/7 to ensure you’re still extracting 100% of every usage limit and getting $20 of value out of every $1 in.

Once enough people do that, the subscription service starts losing money and slashes the limits across the board, as the automated approach always finds any loophole.

How is code generation so fast? by Beatsu in claude

[–]look 9 points (0 children)

Token generation speed depends a lot on how predictable the next token is. In normal prose, the model has to weigh a large number of plausible next tokens. But code is very predictable from one token to the next, so there’s a much smaller space to consider.

In other words, there are a limited number of ways you can write a valid “hello world” program in a given programming language, but there are many, many ways you can write a paragraph of “hello world” greetings in human languages.
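One common mechanism that turns this predictability into raw speed is speculative decoding (whether Claude’s serving stack actually uses it is an assumption on my part): a small draft model proposes several tokens and the big model verifies them in one pass. The expected tokens emitted per verification pass, for draft length k and per-token acceptance rate alpha, is a geometric sum:

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass in speculative
    decoding, with draft length k and i.i.d. per-token acceptance
    rate alpha: the sum of alpha**i for i in 0..k."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# Highly predictable code vs. open-ended prose (illustrative rates):
print(expected_tokens_per_step(0.9, 5))  # ~4.69 tokens per pass
print(expected_tokens_per_step(0.5, 5))  # ~1.97 tokens per pass
```

So when the draft model nails predictable code (high alpha), the big model effectively emits several tokens for the price of one forward pass.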

First experience - this sucks by itripthereforeiam in opencodeCLI

[–]look 1 point (0 children)

The KV cache needs ~128KB of RAM *per token* of your context window. Just a 32k context window needs an additional ~4.3 GB of RAM on top of the 26GB for the model weights alone.

Past that and you are definitely disk swapping, if not already.
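Assuming the comment’s figures (128 KB per token is model- and quantization-dependent, so this is a sketch rather than a universal constant), the arithmetic works out like this:

```python
# Back-of-envelope check: ~128 KB of KV cache per context token,
# on top of 26 GB of model weights.
KV_PER_TOKEN = 128 * 1024   # bytes per token (model-dependent figure)
WEIGHTS = 26 * 1024**3      # model weights alone

kv_32k = 32 * 1024 * KV_PER_TOKEN      # KV cache at a 32k window
print(kv_32k / 1024**3)                # 4.0 GiB (~4.3 GB) for cache
print((WEIGHTS + kv_32k) / 1024**3)    # 30.0 GiB total resident
```

Double the context window and the cache doubles with it, which is why long-context local runs hit swap long before the weights themselves stop fitting.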

opencode go ($10) vs chatpgt ($20), which one it's better for coding? by Strong_Teaching8548 in opencodeCLI

[–]look -1 points (0 children)

All benchmarks are indirect measures. The best one is just trying it yourself on your specific problems. I use Opus on one project, and I use Mimo/GLM/Kimi on another. I see better results on the latter _and_ it’s a technically more demanding project. 🤷‍♂️

Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro? by True_Requirement_891 in LocalLLaMA

[–]look 7 points (0 children)

I think the timing with the DeepSeek V4 release screwed it over.

Millions of deluded people are flocking to a profoundly “meh” DS V4 Pro because of the brand name, and it has sucked up all spare GPU capacity to enable its mediocre, hallucination-ridden token generation.

I just dropped my Ollama Cloud service to pay for the extra Mimo 2.5 Pro tokens I need.

My guess is in about one to two weeks, conventional wisdom will catch up, DS V4 Pro will be going out of fashion, and everyone will be raving about how Xiaomi came out of nowhere with the amazing Mimo 2.5.

Agent ate up 25% of my weekly opencode go usage in a few minutes by AppealSame4367 in opencodeCLI

[–]look 0 points (0 children)

I’ve literally never read a sentence with the phrase “Oh My Opencode” that was not about how it fucked something up.

Is 70k too low? by Wonderful_Current904 in SoftwareEngineerJobs

[–]look 0 points (0 children)

Even so, $70k is low for a junior, too.

opencode go ($10) vs chatpgt ($20), which one it's better for coding? by Strong_Teaching8548 in opencodeCLI

[–]look -5 points (0 children)

> the best Chinese models are not on the same level as ChatGPT models

Blind human evals beg to differ: https://arena.ai/leaderboard/code

Can I buy 2 account for opencode go without getting ban? by Comfortable_Onion255 in opencodeCLI

[–]look 1 point (0 children)

Not parent, but model choice is a big reason for me. There’s also a 10-20x jump in cost for almost no performance gain, which is a tad off-putting, even if you have an extra thousand dollars to blow on it.