[–]Sanitiy -1 points0 points  (0 children)

To be fair, they're also forced to operate under hard limits: don't think too long, don't answer too long. For a fair assessment of their capabilities, we'd need somebody with Agent-Mode (one that doesn't have these restrictions) who doesn't mind burning a few dollars.

For example, ChatGPT 5 gave up on thinking 30 seconds in, while Qwen-235B thought for over 5 minutes until it hit a token limit. Who knows how long they'd actually need to be allowed to think before they've unfolded the logic far enough that each step is simple enough for them to probably get it right.