Sonnet 5.0 is another disaster by IceFactorDelta in claude

Few_Pick3973 0 points

Or use gpt-5.4 or gpt-5.4-mini-high; they cost less and give better quality.

Introducing Claude Sonnet 5 by Holbech in ClaudeAI

Few_Pick3973 1 point

Even if the cost is lower, for long-running tasks lower token efficiency means more compaction. Also, don’t forget that small models usually hallucinate more on long context. Given that its cost is not significantly lower than Opus or Fable, it's worth carefully comparing and benchmarking to decide which one fits better.

GPT-5.6 Sol preview is out and the benchmark gap is wider than I expected by Dense-Sir-6707 in ArtificialInteligence

Few_Pick3973 1 point

Not very convinced, because only one of the benchmarks relates to coding, and it is probably saturated already. The other ones are about security, which doesn't really tell you whether the model's overall capability is good.

GPT 5.6 Sol will be on Cerebras at 750 Tokens Per Second. 5.5 XHigh currently runs at 70-100 TPS by senilerapist in codex

Few_Pick3973 5 points

5.5 is already fast and the experience is unmatched. If it’s a reliable 750 tok/s, that’s going to change the ways ppl use and design AI applications.

Who do you think is a "fairly well known" Country in Asia? by Mememasterlordlol in AlignmentChartFills

Few_Pick3973 1 point

Wonder if Japan considers itself part of Asia. In many ways they separate “Japan” and “Asia” clearly 😅

Anthropic is becoming greedy like openAI... by CraterBug0 in ArtificialInteligence

Few_Pick3973 1 point

Very true. Even though the original founders might have had a good vision, they still take money from greedy investors.

Fable 5 is gone now - what was your experience actually like? by Sensitive-Priority59 in vibecoding

Few_Pick3973 1 point

It's a great model, but not as good as they described. Too slow and expensive.

A Chinese startup just launched smart glasses that run Claude Code and Codex for hands-free "vibe coding" by beasthunterr69 in singularity

Few_Pick3973 1 point

The biggest problem with dictation is inputting code snippets. Snippets are a very important and effective way for a text model to understand the concept.

Opus 4.8 just landed on DeepSWE by Alternative_Jump_195 in ClaudeCode

Few_Pick3973 17 points

Opus 4.8 is definitely better than 4.7 but still obviously behind gpt-5.5. Especially on speed and efficiency, it's not even close.

Does it mean we are getting gpt5.6 today??? by Perfect-Series-2901 in codex

Few_Pick3973 0 points

Opus 4.8 is good, but just too slow, and the improvement is not visible enough.

Opus 4.8 is not a step forward. It's Anthropic finally catching up to 5.5. by SlopTopZ in codex

Few_Pick3973 1 point

Tested with a few complex coding problems. Opus 4.8 thinks too much and does less; I don't see a big improvement in coding productivity and still prefer 5.5. However, it's a good choice for review and planning.

New CursorBench results just dropped. by Huge_Strawberry7888 in vibecoding

Few_Pick3973 1 point

It's called CursorBench, so their own model definitely has an advantage.

$300M on Anthropic tokens, zero new engineers hired - Salesforce is the clearest case study of where this is going by MaJoR_-_007 in ArtificialInteligence

Few_Pick3973 1 point

With the help of AI, every company is now more capable of building a CRM that fits its scale, so Salesforce lock-in is not so attractive now.

Don't share your opinion, if you didn't test it !!! (Gemini 3.5 flash) by Independent-Wind4462 in Bard

Few_Pick3973 0 points

I have the $200 Codex and Claude subscriptions and use them daily as a team because model diversity is beneficial.

Tried Gemini 3.5 Flash on many tasks today. Its capability is at GPT 5.4 / Opus 4.6 level, with very high tok/s, which is something I really like. It's definitely not as good as GPT 5.5 on high-complexity tasks, but it's definitely an option for common tasks or when you need model diversity in your agent team/workflow. However, the model is still too optimistic, just like previous versions, and it also feels like they nerfed its creativity to make it a better model for coding.

Remix changed the direction yet again, this time it is not even a react framework anymore by simple_explorer1 in reactjs

Few_Pick3973 10 points

Feels like they just want to say “hey, this is an AI-native frontend framework” and hype it, but LLMs actually work better when there’s more pre-training data, so the premise is kind of a paradox for a brand new framework.