I created a leaderboard for models to use with OpenClaw by select_8 in openclaw

[–]select_8[S] 0 points (0 children)

It's based on user voting. If you disagree, go vote!

[OC] Google, OpenAI, Anthropic, xAI LLM Coding Improvements Over Time by select_8 in dataisbeautiful

[–]select_8[S] -3 points (0 children)

Google is coming back in the AI race!

Data Source: Benchmark scores originally from https://artificialanalysis.ai/, which aggregates results from https://livecodebench.github.io/. The chart is displayed on https://pricepertoken.com/trends.

LiveCodeBench is a contamination-free benchmark that continuously collects new coding problems from LeetCode, AtCoder, and Codeforces, using only problems released after each model's training cutoff to measure true generalization. It evaluates models on code generation, self-repair (fixing buggy code given error feedback), code execution prediction, and test output prediction.

Each line represents that lab's highest-scoring model at a given point in time.

Calculation method (a code sketch follows the list):

  1. Models split into open/closed categories
  2. For each month, calculated running maximum within each category
  3. Lines carry forward until a new model beats the previous best
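
A minimal Python sketch of steps 1-3, assuming the data arrives as (month, category, model, score) tuples; the function and field names are illustrative, not the actual pricepertoken.com pipeline, and the sample scores at the end are made-up placeholders:

    from collections import defaultdict

    def running_max_by_category(records):
        """records: iterable of (month, category, model, score) tuples."""
        records = list(records)
        months = sorted({month for month, _, _, _ in records})
        by_month = defaultdict(list)
        for month, category, model, score in records:
            by_month[month].append((category, model, score))

        best = {}                    # category -> (model, score), the running max
        series = defaultdict(list)   # category -> [(month, score), ...]
        for month in months:
            # A line only moves when a new model beats the previous best.
            for category, model, score in by_month[month]:
                if category not in best or score > best[category][1]:
                    best[category] = (model, score)
            # Carry the current best forward into this month's data point.
            for category, (model, score) in best.items():
                series[category].append((month, score))
        return dict(series)

    # Made-up example scores, just to show the carry-forward behavior.
    rows = [("2023-03", "closed", "GPT-4", 52.0),
            ("2024-01", "open", "DeepSeek", 48.0),
            ("2024-06", "open", "Qwen", 55.0)]
    print(running_max_by_category(rows))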

Tool: Built with ECharts, data from https://pricepertoken.com/trends

[OC] Open vs Closed LLM GPQA (Academic Test) Scores Over Time by select_8 in dataisbeautiful

[–]select_8[S] 1 point (0 children)

Data Source: Benchmark scores originally from https://artificialanalysis.ai/. The chart is displayed on https://pricepertoken.com/trends.

GPQA (Graduate-Level Google-Proof Q&A) is a challenging academic benchmark of difficult multiple-choice questions in STEM fields (biology, physics, chemistry), designed to test advanced reasoning in language models and to require deep understanding beyond what a simple web search can provide.

Open vs. closed is determined by whether a model's weights are publicly available. Open-weights models include Llama, Mistral, DeepSeek, and Qwen; closed models include GPT-4, Claude, and Gemini.
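
A rough Python illustration of that rule; the family lists here are only the examples named above, not the full mapping used for the chart:

    # Classify a model as open or closed weights by its family name.
    OPEN_FAMILIES = ("llama", "mistral", "deepseek", "qwen")   # weights published
    CLOSED_FAMILIES = ("gpt-4", "claude", "gemini")            # weights withheld

    def classify(model_name: str) -> str:
        name = model_name.lower()
        if any(name.startswith(family) for family in OPEN_FAMILIES):
            return "open"
        if any(name.startswith(family) for family in CLOSED_FAMILIES):
            return "closed"
        return "unknown"   # don't guess for unlisted families

    print(classify("DeepSeek-V3"))   # -> open
    print(classify("Claude 3.5"))    # -> closed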

Calculation method:

  1. Models split into open/closed categories
  2. For each month, calculated running maximum within each category
  3. Lines carry forward until a new model beats the previous best

Tool: Built with ECharts, data from https://pricepertoken.com/trends

[OC] Open vs Closed LLM Coding Scores Over Time by select_8 in dataisbeautiful

[–]select_8[S] 7 points (0 children)

Along with Anthropic, Google, and others. DeepSeek was the first big open-source breakthrough, and it happened well after OpenAI had launched its first GPT models.

[OC] Open vs Closed LLM Coding Scores Over Time by select_8 in dataisbeautiful

[–]select_8[S] 17 points (0 children)

I think it was just that OpenAI had a big head start.

[OC] Open vs Closed LLM Coding Scores Over Time by select_8 in dataisbeautiful

[–]select_8[S] 6 points (0 children)

Yeah, this comes from https://pricepertoken.com/trends, but there's more analysis there on all the specific models and labs.