What does LMArena Elo actually measure?

It is an Elo rating computed from pairwise human preference votes on freeform chat prompts. A higher Elo means humans preferred the model's responses more often in head-to-head comparisons. It is a real-world signal, but it does not predict accuracy on structured workloads; see our LMArena Elo explainer for the 5 failure modes.
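As a rough illustration of how pairwise votes turn into a rating, the sketch below applies the textbook Elo update to a short stream of votes. It assumes the standard logistic curve (base 10, scale 400); the K-factor of 4, the starting rating of 1000, the model names, and the function names are all illustrative assumptions. LMArena's production pipeline may fit ratings differently, for example with a Bradley-Terry model over all votes at once.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(votes, k: float = 4.0, start: float = 1000.0) -> dict:
    """Apply sequential Elo updates.

    votes: iterable of (model_a, model_b, outcome), where outcome is
    1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    ratings = defaultdict(lambda: start)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (outcome - e_a)  # surprise-weighted adjustment
        ratings[a] += delta
        ratings[b] -= delta          # zero-sum: total rating is conserved
    return dict(ratings)


# Hypothetical votes: model-x wins twice, then model-y wins once.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-y", 1.0),
    ("model-y", "model-x", 1.0),
]
ratings = rate(votes)
print(ratings)
```

Because each update is zero-sum, the ratings only encode relative preference; the absolute numbers depend on the chosen starting point.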

How often is the LM leaderboard updated?

We refresh pricing every 24 hours from official provider pricing pages and benchmarks weekly from Artificial Analysis. LMArena Elo is pulled whenever a new snapshot is published.

Which LM has the best price-to-Elo ratio?

DeepSeek V4.5 (MIT license, 1471 Elo) at $0.50 / $1.10 per 1M input / output tokens, or DeepSeek V4 Pro (Apache 2.0, 1467 Elo) at about $0.44 / $0.87 on its launch promo; the table below lists V4 Pro at $1.74 / $3.48. Either is a fraction of closed-source frontier pricing for a comparable Elo band. For open-weight multilingual work, Meta Llama 5 leads.

Are open-weight LMs competitive in May 2026?

Yes. DeepSeek V4 Pro, Gemma 4, Qwen 3.6 Plus, and NVIDIA Nemotron 3 Nano Omni all sit within 35-40 Elo of frontier closed models. For most workloads outside the absolute top tier the gap is no longer the decisive factor.
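To put a 35-40 point gap in perspective, the sketch below converts an Elo difference into an expected head-to-head preference rate. It assumes the standard Elo logistic curve (base 10, scale 400); the function name is hypothetical.

```python
def win_probability(elo_gap: float) -> float:
    """Expected score of the higher-rated model for a given Elo gap."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


for gap in (35, 40):
    print(f"{gap} Elo gap -> {win_probability(gap):.1%} expected preference rate")
# 35 Elo gap -> 55.0% expected preference rate
# 40 Elo gap -> 55.7% expected preference rate
```

Under that assumption, the higher-rated model is expected to win only about 55 to 56 of every 100 comparisons, which is why the gap alone rarely settles a model choice.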

Updated Jul 20, 2026

LM Leaderboard, August 2026

Large language models ranked by LMArena Elo, MMLU, HumanEval, MATH, pricing, and inference speed. Refreshed regularly with live data from official provider pricing pages, Artificial Analysis, and the Arena.

What is the top LM on the Arena right now?

LMArena (formerly LMSYS Chatbot Arena) tracks pairwise human votes across hundreds of thousands of conversations. Our August 2026 snapshot below ranks 363 language models on Arena Elo plus the standard MMLU / HumanEval / MATH benchmark suite. The Arena re-ranks roughly weekly as votes accumulate; what you see is the most recent snapshot verified against the public Arena and Artificial Analysis.


#	Model	Quality	Arena Elo	Speed	Price (input / output per 1M tokens)	Context	Value	Released
1	Anthropic: Claude Fable 5 Anthropic · Frontier agentic coding & knowledge work	100	1525	58 t/s	$10 / $50	1M	3.3	Jun 2026
2	Anthropic: Claude Opus 4.8 Anthropic · Coding, agents & computer use	99	1512	72 t/s	$5 / $25	1M	6.6	May 2026
3	OpenAI: GPT-5.5 Pro OpenAI · Reasoning at any cost	98	1510	68 t/s	$30 / $180	1M	0.9	Apr 2026
4	OpenAI: GPT-5.6 [New] OpenAI · General frontier reasoning & tools	98	1514	96 t/s	$5 / $30	400K	5.6	Jul 2026
5	MoonshotAI: Kimi K3 [New] [OSS] Moonshot AI · Open-weight coding frontier	98	1500	55 t/s	$3 / $15	1M	10.9	Jul 2026
6	OpenAI: GPT-5.5 OpenAI · Frontier general purpose	97	1506	70 t/s	$5 / $30	1M	5.5	Apr 2026
7	OpenAI: GPT-5.4 Pro OpenAI · Complex analysis	97	—	—	$30 / $180	1M	0.9	Mar 2026
8	OpenAI: GPT-5.2 Pro OpenAI · Complex analysis	97	—	—	$21 / $168	400K	1.0	Dec 2025
9	Anthropic: Claude Opus 4.7 (Fast) Anthropic · Complex analysis	97	—	—	$30 / $150	1M	1.1	May 2026
10	Google: Gemini 3.2 Pro Google · Long-context & multimodal	97	1508	128 t/s	$2 / $12	2M	13.9	Jul 2026
11	Anthropic: Claude Opus 4.7 Anthropic · Coding & agentic workflows	96	1505	68 t/s	$5 / $25	1M	6.4	Apr 2026
12	OpenAI: o3 Deep Research OpenAI · Deep research	96	—	—	$10 / $40	200K	3.8	Oct 2025
13	OpenAI: o4 Mini Deep Research OpenAI · Deep research	96	—	—	$2 / $8	200K	19.2	Oct 2025
14	OpenAI: o3 Pro OpenAI · Hard reasoning	96	—	—	$20 / $80	200K	1.9	Jun 2025
15	Google: Gemini 3.1 Pro Preview Custom Tools Google · Speed & cost	96	1505	—	$2 / $12	1M	13.7	Feb 2026
16	Google: Gemini 3.1 Pro Preview Google · Science & long-context	96	1505	131 t/s	$2 / $12	1M	13.7	Apr 2026
17	Anthropic: Claude Opus 4.6 Anthropic · General purpose	95	1490	—	$5 / $25	1M	6.3	Feb 2026
18	Anthropic: Claude Opus 4.5 Anthropic · General purpose	95	—	—	$5 / $25	200K	6.3	Nov 2025
19	Anthropic: Claude Opus 4.6 (Fast) Anthropic · Complex analysis	95	—	—	$30 / $150	1M	1.1	Apr 2026
20	Google: Nano Banana Pro (Gemini 3 Pro Image Preview) Google · Image generation	94	—	—	$2 / $12	66K	13.4	Nov 2025
21	Anthropic: Claude Opus 4.1 Anthropic · Multimodal	94	—	—	$15 / $75	200K	2.1	Aug 2025
22	OpenAI: o3 OpenAI · Hard reasoning	94	1370	68 t/s	$10 / $40	200K	3.8	Apr 2025
23	Qwen: Qwen3.7 Max Alibaba Cloud · Long autonomous agentic runs	94	1488	90 t/s	$2.5 / $7.5	1M	18.8	May 2026
24	xAI: Grok 4.3 xAI · Agentic tasks & real-time info	93	1496	83 t/s	$1.25 / $2.5	1M	49.6	May 2026
25	OpenAI: GPT-5.4 OpenAI · General purpose	93	1495	—	$2.5 / $15	1M	10.6	Mar 2026
26	OpenAI: GPT-5.3 Chat OpenAI · General purpose	93	—	—	$1.75 / $14	128K	11.8	Mar 2026
27	OpenAI: GPT-5.3-Codex OpenAI · Code generation	93	—	—	$1.75 / $14	400K	11.8	Feb 2026
28	OpenAI: GPT-5.2-Codex OpenAI · Code generation	93	—	—	$1.75 / $14	400K	11.8	Jan 2026
29	OpenAI: GPT-5.2 Chat OpenAI · General purpose	93	—	—	$1.75 / $14	128K	11.8	Dec 2025
30	OpenAI: GPT-5.2 OpenAI · General purpose	93	—	—	$1.75 / $14	400K	11.8	Dec 2025
31	OpenAI: GPT-5.1-Codex-Max OpenAI · Code generation	93	—	—	$1.25 / $10	400K	16.5	Dec 2025
32	OpenAI: GPT-5.1 OpenAI · General purpose	93	—	—	$1.25 / $10	400K	16.5	Nov 2025
33	OpenAI: GPT-5.1 Chat OpenAI · General purpose	93	—	—	$1.25 / $10	128K	16.5	Nov 2025
34	OpenAI: GPT-5.1-Codex OpenAI · Code generation	93	—	—	$1.25 / $10	400K	16.5	Nov 2025
35	OpenAI: o1-pro OpenAI · Hard reasoning	93	—	—	$150 / $600	200K	0.2	Mar 2025
36	OpenAI: GPT-4 (older v0314) OpenAI · Complex analysis	93	—	—	$30 / $60	8K	2.1	May 2023
37	OpenAI: GPT-4 OpenAI · Multimodal	93	—	—	$30 / $60	8K	2.1	May 2023
38	xAI: Grok 4.20 xAI · General purpose	93	1496	—	$1.25 / $2.5	2M	49.6	Mar 2026
39	OpenAI: GPT-5.4 Image 2 OpenAI · Complex analysis	93	—	—	$8 / $15	272K	8.1	Apr 2026
40	Anthropic: Claude Sonnet 5 Anthropic · Balanced agents & coding value	93	1479	98 t/s	$3 / $15	1M	10.3	Jun 2026
41	MoonshotAI: Kimi K2.6 Moonshot AI · Frontier quality at low cost	92	1466	48 t/s	$0.73 / $3.49	256K	43.6	Apr 2026
42	Google: Gemini 2.5 Pro Google · Multimodal + value	92	1345	87 t/s	$1.25 / $10	1M	16.4	Mar 2025
43	DeepSeek: DeepSeek V4.5 [New] [OSS] DeepSeek · Open-weight reasoning value	92	1471	62 t/s	$0.5 / $1.1	256K	115.0	Jul 2026
44	Anthropic: Claude Opus 4 Anthropic · Complex analysis	91	1360	52 t/s	$15 / $75	200K	2.0	May 2025
45	TNG: DeepSeek R1T2 Chimera [OSS] · Hard reasoning	91	—	—	$0.3 / $1.1	164K	130.0	Jul 2025
46	Google: Gemini 2.5 Pro Preview 06-05 Google · Speed & cost	91	—	—	$1.25 / $10	1M	16.2	Jun 2025
47	DeepSeek: R1 0528 [OSS] DeepSeek · Hard reasoning	91	—	—	$0.5 / $2.15	164K	68.7	May 2025
48	Google: Gemini 2.5 Pro Preview 05-06 Google · Speed & cost	91	—	—	$1.25 / $10	1M	16.2	May 2025
49	DeepSeek: R1 Distill Qwen 32B [OSS] DeepSeek · Hard reasoning	91	—	—	$0.29 / $0.29	33K	313.8	Jan 2025
50	DeepSeek: R1 Distill Llama 70B [OSS] DeepSeek · Hard reasoning	91	—	—	$0.7 / $0.8	131K	121.3	Jan 2025
51	DeepSeek: R1 [OSS] DeepSeek · Hard reasoning	91	—	—	$0.7 / $2.5	64K	56.9	Jan 2025
52	MoonshotAI: Kimi K2.7 Code [OSS] Moonshot AI · Open-weight agentic coding	91	—	55 t/s	$0.73 / $3.49	256K	43.1	Jun 2026
53	Nex AGI: Nexus N2 Pro [OSS] · Open-weight reasoning & tool use	91	—	50 t/s	$0.2 / $0.8	262K	182.0	Jun 2026
54	Meta: Llama 5 [OSS] Meta · Open-weight general & multilingual	91	1466	76 t/s	$0.8 / $2.4	1M	56.9	Jun 2026
55	DeepSeek: DeepSeek V4 Pro [OSS] DeepSeek · Open-source value leader	90	1467	33 t/s	$1.74 / $3.48	1M	34.5	Apr 2026
56	Anthropic: Claude Sonnet 4.6 Anthropic · Coding & balance	90	1467	73 t/s	$3 / $15	1M	10.0	Feb 2026
57	OpenAI: GPT-5 OpenAI · General purpose	90	1455	—	$1.25 / $10	400K	16.0	Aug 2025
58	xAI: Grok 3 Beta xAI · General purpose	90	—	—	$3 / $15	131K	10.0	Apr 2025
59	Qwen: Qwen3.6 Max Preview [OSS] Alibaba Cloud · Open-source	90	—	—	$1.04 / $6.24	262K	24.7	Apr 2026
60	OpenAI: GPT-4.1 OpenAI · Long context	89	1310	120 t/s	$2 / $8	1M	17.8	Apr 2025
61	MoonshotAI: Kimi K2.5 Moonshot AI · Speed & cost	89	1452	—	$0.4 / $1.9	262K	77.4	Jan 2026
62	MiniMax: MiniMax M3 [OSS] MiniMax · Open-weight agentic coding	89	1455	80 t/s	$0.6 / $2.4	1M	59.3	Jun 2026
63	Z.ai: GLM 5.2 [OSS] · Open-weight agentic coding (provisional)	89	—	—	$0.98 / $3.08	200K	43.8	Jun 2026
64	Z.ai: GLM 5.1 [OSS] · Open-weight agentic & tool use	88	1467	48 t/s	$0.98 / $3.08	200K	43.3	Apr 2026
65	OpenAI: GPT-5 Image OpenAI · Multimodal	88	—	—	$10 / $10	400K	8.8	Oct 2025
66	OpenAI: GPT-5 Pro OpenAI · Complex analysis	88	—	—	$15 / $120	400K	1.3	Oct 2025
67	Anthropic: Claude Sonnet 4.5 Anthropic · General purpose	88	—	—	$3 / $15	1M	9.8	Sep 2025
68	OpenAI: GPT-4o Audio OpenAI · General purpose	88	—	—	$2.5 / $10	128K	14.1	Aug 2025
69	OpenAI: GPT-4o Search Preview OpenAI · Search + citations	88	—	—	$2.5 / $10	128K	14.1	Mar 2025
70	OpenAI: o1 OpenAI · Hard reasoning	88	—	—	$15 / $60	200K	2.3	Dec 2024
71	OpenAI: GPT-4o (2024-11-20) OpenAI · General purpose	88	—	—	$2.5 / $10	128K	14.1	Nov 2024
72	OpenAI: GPT-4o OpenAI · General purpose	88	—	—	$2.5 / $10	128K	14.1	May 2024
73	OpenAI: GPT-4o (extended) OpenAI · Multimodal	88	—	—	$6 / $18	128K	7.3	May 2024
74	OpenAI: GPT-4o (2024-05-13) OpenAI · General purpose	88	—	—	$5 / $15	128K	8.8	May 2024
75	OpenAI: GPT-4 Turbo OpenAI · Multimodal	88	—	—	$10 / $30	128K	4.4	Apr 2024
76	OpenAI: GPT-4 Turbo Preview OpenAI · Complex analysis	88	—	—	$10 / $30	128K	4.4	Jan 2024
77	OpenAI: GPT-4 Turbo (older v1106) OpenAI · Multimodal	88	—	—	$10 / $30	128K	4.4	Nov 2023
78	Z.ai: GLM 5 [OSS] · Open-source	88	1450	—	$0.6 / $1.92	80K	69.8	Feb 2026
79	Anthropic: Claude Sonnet 4 Anthropic · Coding & balance	88	1320	95 t/s	$3 / $15	200K	9.8	May 2025
80	OpenAI: o3 Mini OpenAI · Reasoning & math	88	1305	155 t/s	$1.1 / $4.4	200K	32.0	Jan 2025
81	xAI: Grok 3 xAI · Real-time info	87	1330	82 t/s	$3 / $15	131K	9.7	Feb 2025
82	DeepSeek: DeepSeek V3.2 [OSS] DeepSeek · Open-source	87	1455	—	$0.252 / $0.378	164K	276.2	Dec 2025
83	Nex AGI: DeepSeek V3.1 Nex N1 [OSS] · Open-source	86	—	—	$0.135 / $0.5	131K	270.9	Dec 2025
84	DeepSeek: DeepSeek V3.2 Speciale [OSS] DeepSeek · Open-source	86	—	—	$0.287 / $0.431	164K	239.6	Dec 2025
85	DeepSeek: DeepSeek V3.2 Exp [OSS] DeepSeek · Open-source	86	—	—	$0.27 / $0.41	164K	252.9	Sep 2025
86	DeepSeek: DeepSeek V3.1 Terminus [OSS] DeepSeek · Open-source	86	—	—	$0.27 / $0.95	164K	141.0	Sep 2025
87	DeepSeek: DeepSeek V3.1 [OSS] DeepSeek · Open-source	86	—	—	$0.21 / $0.79	33K	172.0	Aug 2025
88	DeepSeek: DeepSeek V3 0324 [OSS] DeepSeek · Open-source	86	—	—	$0.2 / $0.77	164K	177.3	Mar 2025
89	Anthropic: Claude 3.7 Sonnet Anthropic · General purpose	86	—	—	$3 / $15	200K	9.6	Feb 2025
90	Anthropic: Claude 3.7 Sonnet (thinking) Anthropic · Hard reasoning	86	—	—	$3 / $15	200K	9.6	Feb 2025
91	DeepSeek: DeepSeek V3 [OSS] DeepSeek · Best open-source value	86	1310	62 t/s	$0.27 / $1.1	128K	125.5	Mar 2025
92	Qwen: Qwen3.6 Plus Alibaba Cloud · Multilingual & APAC	86	1448	124 t/s	$1.4 / $5.6	256K	24.6	Apr 2026
93	OpenAI: GPT-4o (2024-08-06) OpenAI · General purpose	85	1285	109 t/s	$2.5 / $10	128K	13.6	May 2024
94	Mistral: Mistral Large 3 2512 [OSS] Mistral AI · Open-source	85	—	—	$0.5 / $1.5	262K	85.0	Dec 2025
95	Mistral Large 2407 [OSS] Mistral AI · Open-source	85	—	—	$2 / $6	131K	21.3	Nov 2024
96	Mistral Large [OSS] Mistral AI · Open-source	85	—	—	$2 / $6	128K	21.3	Feb 2024
97	Google: Gemini 3.5 Flash Google · Speed & cost	84	—	—	$1.5 / $9	1M	16.0	May 2026
98	Nex AGI: Nexus N2 mini [OSS] · Accessible open-weight agentics	84	—	110 t/s	$0.05 / $0.2	262K	672.0	Jun 2026
99	OpenAI: GPT-5.4 Mini OpenAI · Speed & cost	83	—	—	$0.75 / $4.5	400K	31.6	Mar 2026
100	OpenAI: GPT-5 Mini OpenAI · Speed & cost	83	—	—	$0.25 / $2	400K	73.8	Aug 2025

Showing models 1–100 of 363.

Quality = composite benchmark (MMLU, HumanEval, MATH)
Arena Elo = LMArena (formerly LMSYS Chatbot Arena) rating
Value = quality per dollar
Price = input / output per 1M tokens
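The legend does not spell out how "quality per dollar" is computed. The published Value figures are consistent with Quality divided by the simple average of the input and output price, so the sketch below uses that as a working assumption, not as the documented method. The function name is hypothetical; the sample rows are copied from the table above.

```python
def value_score(quality: float, price_in: float, price_out: float) -> float:
    """Quality per dollar, assuming a 50/50 blend of input and output price.

    Prices are in USD per 1M tokens, as in the Price column.
    """
    blended = (price_in + price_out) / 2.0
    return round(quality / blended, 1)


# (quality, input price, output price) taken from the table.
rows = {
    "Claude Fable 5": (100, 10.0, 50.0),  # table lists Value 3.3
    "Gemini 3.2 Pro": (97, 2.0, 12.0),    # table lists Value 13.9
    "DeepSeek V4.5": (92, 0.5, 1.1),      # table lists Value 115.0
}
for name, (quality, price_in, price_out) in rows.items():
    print(name, value_score(quality, price_in, price_out))
```

A 50/50 blend weights output tokens heavily for read-heavy workloads; if your traffic is mostly input tokens, recomputing with your own input/output ratio can reorder the Value column noticeably.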

How the LLM leaderboard works

We pull official provider pricing every 24 hours, Artificial Analysis benchmark snapshots weekly, and LMArena Elo whenever a new snapshot is published. The composite quality index is a 0-100 normalization over MMLU Pro, HumanEval, and MATH, weighted by recency and cross-validated against Arena Elo. We do not accept vendor-supplied numbers without an independent reference.
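One way to read "a 0-100 normalization" is a weighted average of benchmark scores rescaled so the best model lands on 100. The sketch below shows that reading only. The scores, the equal weights, and the rescale-to-best choice are illustrative assumptions, and it leaves out the recency weighting and the Arena cross-validation because the text above does not specify how they are applied.

```python
def composite_quality(scores: dict, weights: dict) -> dict:
    """Return a 0-100 index per model.

    scores:  {model: {benchmark: raw_score}}
    weights: {benchmark: weight}
    """
    total_w = sum(weights.values())
    raw = {
        model: sum(weights[b] * s[b] for b in weights) / total_w
        for model, s in scores.items()
    }
    best = max(raw.values())
    # Rescale so the strongest model scores exactly 100.
    return {model: round(100.0 * v / best, 1) for model, v in raw.items()}


# Hypothetical benchmark scores (percent correct); NOT taken from the leaderboard.
scores = {
    "model-a": {"mmlu_pro": 88.0, "humaneval": 95.0, "math": 90.0},
    "model-b": {"mmlu_pro": 80.0, "humaneval": 85.0, "math": 70.0},
}
weights = {"mmlu_pro": 1.0, "humaneval": 1.0, "math": 1.0}
index = composite_quality(scores, weights)
print(index)
```

With a rescale-to-best scheme, every model's index shifts whenever a new top model is added, so indices are only comparable within one snapshot.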

Where the leaderboard is wrong

No leaderboard predicts your production accuracy. LMArena rewards style and short-conversation polish; a top-Arena model can still underperform on your specific function-calling schema or long-context retrieval workload. Build an internal eval harness before you commit. See our LMArena Elo explainer and LLM routing writeups for a deeper dive.
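A minimal sketch of such a harness, assuming your workload can be expressed as prompt / expected-output pairs scored by exact match. `call_model` is a placeholder for whatever client you use, and the case format and toy stand-in models are hypothetical. Real harnesses usually need task-specific scoring, such as schema validation for function calls or retrieval hit rate for long-context work.

```python
from typing import Callable, Iterable, Tuple

Case = Tuple[str, str]  # (prompt, expected output)


def run_eval(call_model: Callable[[str], str], cases: Iterable[Case]) -> float:
    """Return the fraction of cases whose output matches exactly (whitespace-stripped)."""
    cases = list(cases)
    if not cases:
        raise ValueError("no eval cases supplied")
    passed = sum(
        1 for prompt, expected in cases
        if call_model(prompt).strip() == expected.strip()
    )
    return passed / len(cases)


# Toy stand-ins so the sketch runs without network access.
def echo_upper(prompt: str) -> str:
    return prompt.upper()


def echo(prompt: str) -> str:
    return prompt


cases = [("abc", "ABC"), ("Hello", "HELLO"), ("x1", "X1")]
candidates = {"upper": echo_upper, "echo": echo}
results = {name: run_eval(fn, cases) for name, fn in candidates.items()}
print(results)
```

Running the same case set against every candidate model gives a ranking grounded in your own workload, which you can then compare against the Arena Elo and Quality columns.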
