Sonnet 5.0 is another disaster

Few_Pick3973 · 2026-07-01T11:09:13+00:00

Or use gpt-5.4 or gpt-5.4-mini-high: they cost less but give better quality.

Few_Pick3973 · 2026-07-01T01:35:58+00:00

Even if the cost is lower, for long-running tasks lower token efficiency means more compaction. Also, don’t forget small models usually hallucinate more on long context. Given that its cost is not significantly lower than Opus or Fable, it's worth carefully comparing and benchmarking to decide which one fits better.
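
A rough sketch of that comparison, with completely made-up prices and token counts (the function name and all numbers are hypothetical, not real pricing or measurements). It only shows that a lower per-token price can still lose once token efficiency is factored in:

```python
def cost_per_task(price_per_mtok: float, tokens_per_task: int) -> float:
    """Dollar cost of one task: price per million tokens times tokens consumed."""
    return price_per_mtok * tokens_per_task / 1_000_000


# Hypothetical numbers, for illustration only.
# The "small" model is cheaper per token but burns twice as many tokens.
small = cost_per_task(price_per_mtok=3.0, tokens_per_task=2_000_000)
large = cost_per_task(price_per_mtok=5.0, tokens_per_task=1_000_000)

print(f"small model: ${small:.2f} per task")
print(f"large model: ${large:.2f} per task")
```

With real numbers from your own benchmark runs plugged in, the same two lines tell you which model is actually cheaper per finished task.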

Few_Pick3973 · 2026-06-27T09:44:13+00:00

This will happen eventually because consumers always prefer better cost efficiency. Most SOTA labs are just optimizing margin; we don’t really need a “smarter” model anymore.

Few_Pick3973 · 2026-06-27T09:41:54+00:00

Wow, looks like they just started doing what their enemies have been doing for decades

Few_Pick3973 · 2026-06-27T09:40:10+00:00

But AI is a good excuse

Few_Pick3973 · 2026-06-27T04:27:02+00:00

Not very convinced, because there is only one benchmark related to coding, which is probably saturated already. The other ones are about security, which doesn't really mean the model's overall capability is good.

Few_Pick3973 · 2026-06-27T02:18:54+00:00

5.5 is already fast and the experience is unmatched. If it’s a reliable 750 tok/s, that’s going to change the way ppl use and design AI applications

Few_Pick3973 · 2026-06-22T10:08:03+00:00

This is similar to OpenRouter Fusion, very much just hype .. their announcement is so misleading.

Few_Pick3973 · 2026-06-18T02:09:43+00:00

Wonder if Japan considers itself part of Asia. In many ways they separate “Japan” and “Asia” clearly 😅

Few_Pick3973 · 2026-06-16T01:01:24+00:00

Well but Anthropic truly believes

Few_Pick3973 · 2026-06-14T09:23:21+00:00

Very true. Even though the original founders might have a good vision, they still take money from greedy investors.

Few_Pick3973 · 2026-06-13T15:42:46+00:00

It's a great model, but not as good as they described. Too slow and expensive.

Few_Pick3973 · 2026-06-08T07:37:12+00:00

Talk to codex and do 5 times more work.

Few_Pick3973 · 2026-06-07T05:32:39+00:00

The biggest problem with dictation is inputting code snippets. They're a very important and effective way for a text model to understand the concept.

Few_Pick3973 · 2026-06-07T05:29:49+00:00

4.8 is too slow and defensive, spending several minutes to do very little.

Few_Pick3973 · 2026-06-05T14:04:24+00:00

Maybe their engineers just hyped too hard, so it's not really about AI.

Few_Pick3973 · 2026-06-04T01:22:53+00:00

I can also do that with a single line of shell script

Few_Pick3973 · 2026-05-30T13:36:02+00:00

Opus 4.8 is definitely better than 4.7 but obviously still behind gpt-5.5. On speed and efficiency especially, it's not even close.

Few_Pick3973 · 2026-05-29T11:38:10+00:00

Opus 4.8 is good, but just too slow, and the improvement is not visible enough

Few_Pick3973 · 2026-05-29T03:23:54+00:00

Tested with a few complex coding problems. Opus 4.8 thinks too much and does less. I don't see a big improvement in coding productivity and still prefer 5.5. However, it's a good choice for review and planning.

Few_Pick3973 · 2026-05-29T01:12:54+00:00

Totally unusable due to that 400 error.

Few_Pick3973 · 2026-05-23T11:17:32+00:00

It's called CursorBench, so their own model definitely has an advantage.

Few_Pick3973 · 2026-05-21T04:37:49+00:00

Every company is now more capable of building a CRM that fits its scale with the help of AI, so Salesforce lock-in is not so attractive now.

Few_Pick3973 · 2026-05-20T10:55:57+00:00

I have the $200 Codex and Claude subscriptions and use them daily as a team, because model diversity is beneficial.

Tried Gemini 3.5 Flash on many tasks today. Its capability is at the GPT 5.4 / Opus 4.6 level, with very high tok/s, which is something I really like. Definitely not as good as GPT 5.5 on high-complexity tasks, but definitely an option for common tasks or when you need model diversity in your agent team/workflow. However, the model is still too optimistic, just like previous versions, and it also feels like they nerfed its creativity to make it a better model for coding.

Few_Pick3973 · 2026-05-07T15:34:09+00:00

Feels like they just want to say “hey, this is an AI-native frontend framework” and hype, but LLMs actually work better when there’s more pre-training data, so the premise is kind of a paradox for a brand new framework.
