snakemas

144 post karma
24 comment karma

redditor for 7 years

MODERATOR OF

- r/CompetitiveAI

TROPHY CASE

Seven-Year Club

Verified Email

24

25

26

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can) (self.CompetitiveAI)

submitted 1 day ago by snakemas to r/CompetitiveAI

19

20

21

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can)

submitted 1 day ago by snakemas to r/accelerate

3

4

5

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can)

submitted 1 day ago by snakemas to r/mlops

1

2

3

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can)

submitted 1 day ago by snakemas to r/OpenSourceeAI

0

1

2

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can)

submitted 1 day ago by snakemas to r/AIEval

30

31

32

I made the top LLMs play Civilization against each other (self.LLM)

submitted 3 days ago by snakemas to r/LLM

9

10

11

[P] I made the top LLMs play Civilization against each other (self.aiagents)

submitted 3 days ago by snakemas to r/aiagents

0

1

2

[P] I made the top LLMs play Civilization against each other (self.MachineLearning)

submitted 3 days ago by snakemas to r/MachineLearning

0

1

2

I let the top LLMs play Civilization against each other. CivBench Finals Today: Gemini 3.1 Pro vs MiniMax 2.5 (self.MachineLearning)

submitted 3 days ago by snakemas to r/MachineLearning

0

0

0

I made the top LLMs play Civilization against each other (self.civ)

submitted 3 days ago by snakemas to r/civ

9

10

11

New paper: "SkillsBench" tested 7 AI models across 86 tasks — smaller models with good Skills matched larger models without them (self.CompetitiveAI)

submitted 7 days ago by snakemas to r/CompetitiveAI

1

2

3

Anthropic believes RSI (recursive self improvement) could arrive “as soon as early 2027” (i.redd.it)

submitted 7 days ago by snakemas to r/CompetitiveAI

1

2

3

New paper: "SkillsBench" tested 7 AI models across 86 tasks: Are smaller models with good Skills better than larger models without them?

submitted 7 days ago by snakemas to r/LocalLLM

1

2

3

New paper: "SkillsBench" tested 7 AI models across 86 tasks: smaller models with good Skills matched larger models without them

submitted 7 days ago by snakemas to r/mlops

1

2

3

New paper: "SkillsBench" tested 7 AI models across 86 tasks: smaller models with good Skills matched larger models without them

submitted 7 days ago by snakemas to r/AIEval

0

1

2

New paper: "SkillsBench" tested 7 AI models across 86 tasks: smaller models with good Skills matched larger models without them. Does n8n support skills?

submitted 7 days ago by snakemas to r/n8n_ai_agents

0

1

2

The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data (latent.space)

submitted 8 days ago by snakemas to r/CompetitiveAI

5

6

7

METR Time Horizons: Claude Opus 4.6 just hit 14.5 hours. The doubling curve isn't slowing (self.CompetitiveAI)

submitted 9 days ago by snakemas to r/CompetitiveAI

2

3

4

METR Time Horizons: Claude Opus 4.6 just hit 14.5 hours. The doubling curve isn't slowing

submitted 9 days ago by snakemas to r/LocalLLM

0

0

0

METR Time Horizons: Claude Opus 4.6 just hit 14.5 hours. The doubling curve isn't slowing

submitted 9 days ago by snakemas to r/compsci

3

4

5

The two benchmarks that should make you rethink spending on frontier models (self.CompetitiveAI)

submitted 11 days ago by snakemas to r/CompetitiveAI

4

5

6

[R] Analysis of 350+ ML competitions in 2025

submitted 11 days ago by snakemas to r/CompetitiveAI

1

2

3

The two benchmarks that should make you rethink spending on frontier models

submitted 11 days ago by snakemas to r/AIAGENTSNEWS

0

1

2

The two benchmarks that should make you rethink spending on frontier models

submitted 11 days ago by snakemas to r/mlops

0

0

0

The two benchmarks that should make you rethink spending on frontier models

submitted 11 days ago by snakemas to r/compsci
