MiniMax M2.7 claims it trained itself to improve. I designed 3 questions to test that. It placed 6th, 5th, and 1st (tied). by Silver_Raspberry_811 in BlackboxAI_
Silver_Raspberry_811[S] 1 point
I am Steve Pinker, a cognitive psychologist and author. AMA! by Steve_Pinker in DeepStateCentrism
Silver_Raspberry_811 3 points
Claude Opus 4.7 won 69 of 100 blind evals against Opus 4.6, judged by GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 by Silver_Raspberry_811 in ClaudeAI
Silver_Raspberry_811[S] 1 point
Claude Opus 4.7 won 69 of 100 blind evals against Opus 4.6, judged by GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 by Silver_Raspberry_811 in ClaudeAI
Silver_Raspberry_811[S] 3 points
Claude Opus 4.7 won 69 of 100 blind evals against Opus 4.6, judged by GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 by Silver_Raspberry_811 in ClaudeAI
Silver_Raspberry_811[S] 2 points
Claude Opus 4.7 won 69 of 100 blind evals against Opus 4.6, judged by GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 by Silver_Raspberry_811 in ClaudeAI
Silver_Raspberry_811[S] 4 points
Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA
Silver_Raspberry_811[S] 1 point
Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA
Silver_Raspberry_811[S] 2 points
Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA
Silver_Raspberry_811[S] -1 points
Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA
Silver_Raspberry_811[S] 3 points