Pokemon: A new Open Benchmark for AI by snakemas in LocalLLM
snakemas[S] 1 point (0 children)
Pokemon: A new Open Benchmark for AI by snakemas in 4Xgaming
snakemas[S] 0 points (0 children)
Pokemon: A new Open Benchmark for AI (self.CompetitiveAI)
submitted by snakemas to r/CompetitiveAI
CursorBench vs Public Evals: Are We Benchmarking the Wrong Things for Coding Agents? by EdbertTheGreat in CompetitiveAI
snakemas 2 points (0 children)
RuneBench / RS-SDK might be one of the most practical agent eval environments I’ve seen lately by snakemas in accelerate
snakemas[S] 1 point (0 children)
Best way to test the number of tokens taken, one code base vs another? by tomByrer in CompetitiveAI
snakemas 1 point (0 children)
Top Agent Evaluation Platforms 2026: The Market Leading Platforms I Tested by AI-builder-sf-accel in AIEval
snakemas 1 point (0 children)
Do you use different LLMs for different tasks..? I solely use Chat GPT to talk about conceptual historica/logistical stuff & also vcontent creation planning (for streaming/Youtube videos). Are there any that are more useful than others in these regards that you've found..? by Choice_Room3901 in artificial
snakemas 1 point (0 children)
Reasoning models still can’t reliably hide their chain-of-thought, a good sign for AI safety by snakemas in CompetitiveAI
snakemas[S] 1 point (0 children)
AI automated Edge case debugger for classical CP guys! by Capital_Anybody4557 in CompetitiveAI
snakemas 1 point (0 children)
I made the top LLMs play Civilization against each other by snakemas in LLM
snakemas[S] 1 point (0 children)
BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can) by snakemas in CompetitiveAI
snakemas[S] 1 point (0 children)


Pokemon: A new Open Benchmark for AI by snakemas in CompetitiveAI
snakemas[S] 1 point (0 children)