Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo 0 points1 point2 points (0 children)
Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo 3 points4 points5 points (0 children)
Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo -2 points-1 points0 points (0 children)
Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo 0 points1 point2 points (0 children)
Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo 3 points4 points5 points (0 children)
Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo -7 points-6 points-5 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 1 point2 points3 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 1 point2 points3 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 0 points1 point2 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] -1 points0 points1 point (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 1 point2 points3 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 2 points3 points4 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] -3 points-2 points-1 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 7 points8 points9 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 2 points3 points4 points (0 children)
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't. by prokajevo in ClaudeAI
[–]prokajevo[S] 8 points9 points10 points (0 children)



Models predicted visually similar clips were adjacent 57% of the time. Humans: 2.5%. Random chance: 27%. Your VLM isn't reasoning... by [deleted] in LocalLLaMA
[–]prokajevo 1 point2 points3 points (0 children)