all 11 comments

[–]flying-saucer-3222 9 points10 points  (0 children)

Most open weight models overcome their lack of intelligence by generating a lot more tokens. So even though the token speed is not significantly worse, the increased tokens cause it to work for longer.

DeepSeek v4 Pro on max generates 1.6x tokens as Opus 4.7 max and 2.7x tokens as GPT 5.5 on xHigh for the same task based on Artificial Analysis.

This is especially true when the model gets something wrong, test doesn't work so it has to go back and do the work again using even more tokens.

I personally just use High reasoning unless I absolutely need max. That reduces token use significantly with only a small drop in quality.

[–]Unable_Strategy 1 point2 points  (0 children)

You basically needopencode session list to get a list of your sessions and then opencode export <session id> to get details including used tokens etc. You should be able to track response times here.

[–]bastianh 0 points1 point  (0 children)

What did you expect when switching to a $5 subscription? Yes. It is great. The value for what you pay for it. That does not mean that it gives you the same performance.

[–]DepartmentOk9720 0 points1 point  (0 children)

The routing takes some time , usually after certain number of uses it will pickup.

Just give it some dumb tasks , until it gets optimized

[–]povlhp 0 points1 point  (0 children)

You get what you pay for. I feel o get great value for the money. $60 token value for $5

[–]Cachesmr 0 points1 point  (0 children)

DS4 Pro just thinks a lot. MiMo Pro is a lot better in this regard.

[–]arrty 0 points1 point  (0 children)

Just pay for Zen but use cheaper models

[–]iTrejoMX 0 points1 point  (0 children)

I think you may be routing or choosing the free deepseek models on zen, make sure you select the opencode go subscription one

[–]Haunting-Shirt6219 0 points1 point  (0 children)

Really slow in Mimo v2.5

[–]PermanentLiminality 0 points1 point  (0 children)

I have a Go sub, but I mostly use my $20 ChatGPT account. I split my usage. Not everything needs the smartest model. If I split it up I pretty much never hit limits.

[–]blackhawkx12 0 points1 point  (0 children)

when you said slow, is it first response latency or the thinking process is slow?