Updated Jun 9, 2026

LLM Leaderboard — June 2026

Large language models ranked by LMSys Arena Elo, MMLU, HumanEval, MATH, pricing, and inference speed. Refreshed regularly with live data from official provider pricing pages, Artificial Analysis, and the Arena.

What is "the best LLM" in June 2026?

The honest answer is "it depends on the workload." For chat and general reasoning, the LMSys text Arena leader rotates monthly; the June 2026 snapshot below shows the current top model and its Elo. For coding-specific work, the LMSys coding Arena has its own leader. For value (quality per dollar), open-weight models under permissive licenses still win by a wide margin. The race at the top is tighter than at any point since the original GPT-4 launch, and the buyer's biggest risk is now switching cost, not capability.
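To put "tighter at the top" in concrete terms, an Elo gap can be converted into an expected head-to-head win rate. The sketch below assumes the standard Elo logistic curve with a 400-point scale; the Arena's actual fitting procedure may differ, and the function name is illustrative only. The ratings are taken from the table on this page.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic curve.

    Assumption: a base-10 logistic with a 400-point scale, as in classic Elo.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Arena Elo of rank 1 (1525) against rank 2 (1512) from the table on this page.
top_two = expected_win_rate(1525, 1512)

# Rank 1 against the lowest Arena Elo among the 100 rows shown (1285).
top_vs_lowest = expected_win_rate(1525, 1285)

print(f"rank 1 vs rank 2: {top_two:.3f}")
print(f"rank 1 vs lowest listed: {top_vs_lowest:.3f}")
```

Under that assumption a 13-point gap is close to a coin flip, which is why small Elo differences between the top few models say little about any specific workload.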

353 models
| # | Model | Quality | Arena ELO | Speed | Price | Context | Value | Released |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Anthropic · Frontier agentic coding & knowledge work | 100 | 1525 | 58 t/s | $10 / $50 | 1M | 3.3 | Jun 2026 |
| 2 | Anthropic · Coding, agents & computer use | 99 | 1512 | 72 t/s | $5 / $25 | 1M | 6.6 | May 2026 |
| 3 | OpenAI · Reasoning at any cost | 98 | 1510 | 68 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | OpenAI · Frontier general purpose | 97 | 1506 | 70 t/s | $5 / $30 | 1M | 5.5 | Apr 2026 |
| 5 | OpenAI · Complex analysis | 97 | | | $30 / $180 | 1M | 0.9 | Mar 2026 |
| 6 | OpenAI · Complex analysis | 97 | | | $21 / $168 | 400K | 1.0 | Dec 2025 |
| 7 | Anthropic · Complex analysis | 97 | | | $30 / $150 | 1M | 1.1 | May 2026 |
| 8 | Anthropic · Coding & agentic workflows | 96 | 1505 | 68 t/s | $5 / $25 | 1M | 6.4 | Apr 2026 |
| 9 | OpenAI · Deep research | 96 | | | $10 / $40 | 200K | 3.8 | Oct 2025 |
| 10 | OpenAI · Deep research | 96 | | | $2 / $8 | 200K | 19.2 | Oct 2025 |
| 11 | OpenAI · Hard reasoning | 96 | | | $20 / $80 | 200K | 1.9 | Jun 2025 |
| 12 | Google · Speed & cost | 96 | 1505 | | $2 / $12 | 1M | 13.7 | Feb 2026 |
| 13 | Google · Science & long-context | 96 | 1505 | 131 t/s | $2 / $12 | 1M | 13.7 | Apr 2026 |
| 14 | Anthropic · General purpose | 95 | 1490 | | $5 / $25 | 1M | 6.3 | Feb 2026 |
| 15 | Anthropic · General purpose | 95 | | | $5 / $25 | 200K | 6.3 | Nov 2025 |
| 16 | Anthropic · Complex analysis | 95 | | | $30 / $150 | 1M | 1.1 | Apr 2026 |
| 17 | Google · Image generation | 94 | | | $2 / $12 | 66K | 13.4 | Nov 2025 |
| 18 | Anthropic · Multimodal | 94 | | | $15 / $75 | 200K | 2.1 | Aug 2025 |
| 19 | OpenAI · Hard reasoning | 94 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 20 | Alibaba Cloud · Long autonomous agentic runs | 94 | 1488 | 90 t/s | $2.5 / $7.5 | 1M | 18.8 | May 2026 |
| 21 | xAI · Agentic tasks & real-time info | 93 | 1496 | 83 t/s | $1.25 / $2.5 | 1M | 49.6 | May 2026 |
| 22 | OpenAI · General purpose | 93 | 1495 | | $2.5 / $15 | 1M | 10.6 | Mar 2026 |
| 23 | OpenAI · General purpose | 93 | | | $1.75 / $14 | 128K | 11.8 | Mar 2026 |
| 24 | OpenAI · Code generation | 93 | | | $1.75 / $14 | 400K | 11.8 | Feb 2026 |
| 25 | OpenAI · Code generation | 93 | | | $1.75 / $14 | 400K | 11.8 | Jan 2026 |
| 26 | OpenAI · General purpose | 93 | | | $1.75 / $14 | 128K | 11.8 | Dec 2025 |
| 27 | OpenAI · General purpose | 93 | | | $1.75 / $14 | 400K | 11.8 | Dec 2025 |
| 28 | OpenAI · Code generation | 93 | | | $1.25 / $10 | 400K | 16.5 | Dec 2025 |
| 29 | OpenAI · General purpose | 93 | | | $1.25 / $10 | 400K | 16.5 | Nov 2025 |
| 30 | OpenAI · General purpose | 93 | | | $1.25 / $10 | 128K | 16.5 | Nov 2025 |
| 31 | OpenAI · Code generation | 93 | | | $1.25 / $10 | 400K | 16.5 | Nov 2025 |
| 32 | OpenAI · Hard reasoning | 93 | | | $150 / $600 | 200K | 0.2 | Mar 2025 |
| 33 | OpenAI · Complex analysis | 93 | | | $30 / $60 | 8K | 2.1 | May 2023 |
| 34 | OpenAI · Multimodal | 93 | | | $30 / $60 | 8K | 2.1 | May 2023 |
| 35 | xAI · General purpose | 93 | 1496 | | $1.25 / $2.5 | 2M | 49.6 | Mar 2026 |
| 36 | OpenAI · Complex analysis | 93 | | | $8 / $15 | 272K | 8.1 | Apr 2026 |
| 37 | Moonshot AI · Frontier quality at low cost | 92 | 1466 | 48 t/s | $0.73 / $3.49 | 256K | 43.6 | Apr 2026 |
| 38 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 39 | Anthropic · Complex analysis | 91 | 1360 | 52 t/s | $15 / $75 | 200K | 2.0 | May 2025 |
| 40 | · Hard reasoning | 91 | | | $0.3 / $1.1 | 164K | 130.0 | Jul 2025 |
| 41 | Google · Speed & cost | 91 | | | $1.25 / $10 | 1M | 16.2 | Jun 2025 |
| 42 | DeepSeek · Hard reasoning | 91 | | | $0.5 / $2.15 | 164K | 68.7 | May 2025 |
| 43 | Google · Speed & cost | 91 | | | $1.25 / $10 | 1M | 16.2 | May 2025 |
| 44 | DeepSeek · Hard reasoning | 91 | | | $0.29 / $0.29 | 33K | 313.8 | Jan 2025 |
| 45 | DeepSeek · Hard reasoning | 91 | | | $0.7 / $0.8 | 131K | 121.3 | Jan 2025 |
| 46 | DeepSeek · Hard reasoning | 91 | | | $0.7 / $2.5 | 64K | 56.9 | Jan 2025 |
| 47 | DeepSeek · Open-source value leader | 90 | 1467 | 33 t/s | $1.74 / $3.48 | 1M | 34.5 | Apr 2026 |
| 48 | Anthropic · Coding & balance | 90 | 1467 | 73 t/s | $3 / $15 | 1M | 10.0 | Feb 2026 |
| 49 | OpenAI · General purpose | 90 | 1455 | | $1.25 / $10 | 400K | 16.0 | Aug 2025 |
| 50 | xAI · General purpose | 90 | | | $3 / $15 | 131K | 10.0 | Apr 2025 |
| 51 | Alibaba Cloud · Open-source | 90 | | | $1.04 / $6.24 | 262K | 24.7 | Apr 2026 |
| 52 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 53 | Moonshot AI · Speed & cost | 89 | 1452 | | $0.4 / $1.9 | 262K | 77.4 | Jan 2026 |
| 54 | · Open-weight agentic coding | 89 | 1455 | 80 t/s | $0.6 / $2.4 | 1M | 59.3 | Jun 2026 |
| 55 | · Open-weight agentic & tool use | 88 | 1467 | 48 t/s | $0.98 / $3.08 | 200K | 43.3 | Apr 2026 |
| 56 | OpenAI · Multimodal | 88 | | | $10 / $10 | 400K | 8.8 | Oct 2025 |
| 57 | OpenAI · Complex analysis | 88 | | | $15 / $120 | 400K | 1.3 | Oct 2025 |
| 58 | Anthropic · General purpose | 88 | | | $3 / $15 | 1M | 9.8 | Sep 2025 |
| 59 | OpenAI · General purpose | 88 | | | $2.5 / $10 | 128K | 14.1 | Aug 2025 |
| 60 | OpenAI · Search + citations | 88 | | | $2.5 / $10 | 128K | 14.1 | Mar 2025 |
| 61 | OpenAI · Hard reasoning | 88 | | | $15 / $60 | 200K | 2.3 | Dec 2024 |
| 62 | OpenAI · General purpose | 88 | | | $2.5 / $10 | 128K | 14.1 | Nov 2024 |
| 63 | OpenAI · General purpose | 88 | | | $2.5 / $10 | 128K | 14.1 | May 2024 |
| 64 | OpenAI · Multimodal | 88 | | | $6 / $18 | 128K | 7.3 | May 2024 |
| 65 | OpenAI · General purpose | 88 | | | $5 / $15 | 128K | 8.8 | May 2024 |
| 66 | OpenAI · Multimodal | 88 | | | $10 / $30 | 128K | 4.4 | Apr 2024 |
| 67 | OpenAI · Complex analysis | 88 | | | $10 / $30 | 128K | 4.4 | Jan 2024 |
| 68 | OpenAI · Multimodal | 88 | | | $10 / $30 | 128K | 4.4 | Nov 2023 |
| 69 | · Open-source | 88 | 1450 | | $0.6 / $1.92 | 80K | 69.8 | Feb 2026 |
| 70 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 71 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 72 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 73 | DeepSeek · Open-source | 87 | 1455 | | $0.252 / $0.378 | 164K | 276.2 | Dec 2025 |
| 74 | · Open-source | 86 | | | $0.135 / $0.5 | 131K | 270.9 | Dec 2025 |
| 75 | DeepSeek · Open-source | 86 | | | $0.287 / $0.431 | 164K | 239.6 | Dec 2025 |
| 76 | DeepSeek · Open-source | 86 | | | $0.27 / $0.41 | 164K | 252.9 | Sep 2025 |
| 77 | DeepSeek · Open-source | 86 | | | $0.27 / $0.95 | 164K | 141.0 | Sep 2025 |
| 78 | DeepSeek · Open-source | 86 | | | $0.21 / $0.79 | 33K | 172.0 | Aug 2025 |
| 79 | DeepSeek · Open-source | 86 | | | $0.2 / $0.77 | 164K | 177.3 | Mar 2025 |
| 80 | Anthropic · General purpose | 86 | | | $3 / $15 | 200K | 9.6 | Feb 2025 |
| 81 | Anthropic · Hard reasoning | 86 | | | $3 / $15 | 200K | 9.6 | Feb 2025 |
| 82 | DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 83 | Alibaba Cloud · Multilingual & APAC | 86 | 1448 | 124 t/s | $1.4 / $5.6 | 256K | 24.6 | Apr 2026 |
| 84 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 85 | Mistral AI · Open-source | 85 | | | $0.5 / $1.5 | 262K | 85.0 | Dec 2025 |
| 86 | Mistral AI · Open-source | 85 | | | $2 / $6 | 131K | 21.3 | Nov 2024 |
| 87 | Mistral AI · Open-source | 85 | | | $2 / $6 | 128K | 21.3 | Feb 2024 |
| 88 | Google · Speed & cost | 84 | | | $1.5 / $9 | 1M | 16.0 | May 2026 |
| 89 | OpenAI · Speed & cost | 83 | | | $0.75 / $4.5 | 400K | 31.6 | Mar 2026 |
| 90 | OpenAI · Speed & cost | 83 | | | $0.25 / $2 | 400K | 73.8 | Aug 2025 |
| 91 | Alibaba Cloud · Open-source | 82 | | | $0.04 / $0.15 | 256K | 863.2 | Mar 2026 |
| 92 | Alibaba Cloud · Open-source | 82 | | | $0.139 / $1 | 262K | 144.0 | Feb 2026 |
| 93 | Alibaba Cloud · Open-source | 82 | | | $0.195 / $1.56 | 262K | 93.4 | Feb 2026 |
| 94 | Alibaba Cloud · Open-source | 82 | | | $0.26 / $2.08 | 262K | 70.1 | Feb 2026 |
| 95 | Alibaba Cloud · Speed & cost | 82 | | | $0.065 / $0.26 | 1M | 504.6 | Feb 2026 |
| 96 | Alibaba Cloud · Open-source | 82 | | | $0.26 / $1.56 | 1M | 90.1 | Feb 2026 |
| 97 | Alibaba Cloud · Open-source | 82 | | | $0.39 / $2.34 | 262K | 60.1 | Feb 2026 |
| 98 | Alibaba Cloud · Hard reasoning | 82 | | | $0.78 / $3.9 | 262K | 35.0 | Feb 2026 |
| 99 | Alibaba Cloud · Code generation | 82 | | | $0.11 / $0.8 | 262K | 180.2 | Feb 2026 |
| 100 | Alibaba Cloud · Open-source | 82 | | | $0.104 / $0.416 | 131K | 315.4 | Oct 2025 |
Page 1 of 4 · 1-100 of 353
Quality = composite benchmark (MMLU, HumanEval, MATH)
Arena ELO = LMSYS Chatbot Arena rating
Value = quality per dollar
Price = input / output per 1M tokens
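The legend does not spell out how "quality per dollar" is computed. The published numbers are consistent with the quality score divided by the simple average of the input and output prices, so the sketch below uses that formula. It is an inference from the table, not a documented method, and `value_score` is a hypothetical name.

```python
def value_score(quality: float, input_price: float, output_price: float) -> float:
    """Quality divided by the mean of input and output price (USD per 1M tokens).

    Assumption: an equal-weight blend of input and output price, inferred
    from the published Value column rather than stated by the page.
    """
    blended_price = (input_price + output_price) / 2
    return round(quality / blended_price, 1)


# (quality, input price, output price, published value) for ranks 1, 2, 10 and 91.
rows = [
    (100, 10.0, 50.0, 3.3),
    (99, 5.0, 25.0, 6.6),
    (96, 2.0, 8.0, 19.2),
    (82, 0.04, 0.15, 863.2),
]

for quality, price_in, price_out, published in rows:
    computed = value_score(quality, price_in, price_out)
    print(f"quality={quality} price=${price_in}/${price_out} "
          f"computed={computed} published={published}")
```

If your traffic is heavily skewed toward input or output tokens, replacing the equal-weight blend with your own ratio gives a ranking that reflects your actual bill more closely.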

How the LLM leaderboard works

We pull official provider pricing every 24 hours, Artificial Analysis benchmark snapshots weekly, and LMSys Arena Elo as it publishes. The composite quality index is a 0-100 normalization over MMLU Pro, HumanEval, and MATH, weighted by recency and cross-validated against Arena Elo. We do not accept vendor-supplied numbers without an independent reference.
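As a rough illustration of a composite index of this kind, the sketch below takes a weighted mean of three benchmark scores and rescales so the best model maps to 100. The weights, the rescaling rule, the model names, and the scores are all hypothetical; the page states only that the index is a 0-100 normalization weighted by recency, and the cross-validation against Arena Elo is not modeled here.

```python
from typing import Dict

# Hypothetical weights. The page says only that benchmarks are weighted by recency.
WEIGHTS: Dict[str, float] = {"mmlu_pro": 0.4, "humaneval": 0.3, "math": 0.3}


def raw_composite(scores: Dict[str, float]) -> float:
    """Weighted mean of benchmark scores, each expressed as a percentage."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def quality_index(all_scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Rescale raw composites so the best model in the set scores 100.

    Assumption: normalization relative to the top model; the page does not
    say which normalization it uses.
    """
    raw = {model: raw_composite(scores) for model, scores in all_scores.items()}
    best = max(raw.values())
    return {model: round(100 * value / best) for model, value in raw.items()}


# Made-up scores for two placeholder models, for illustration only.
example = {
    "model-a": {"mmlu_pro": 80.0, "humaneval": 90.0, "math": 70.0},
    "model-b": {"mmlu_pro": 60.0, "humaneval": 72.0, "math": 54.0},
}

index = quality_index(example)
print(index)
```

One consequence of normalizing against the top model is that every score shifts when a new leader appears, so index values are only comparable within a single snapshot.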

Where the leaderboard is wrong

No leaderboard predicts your production accuracy. LMSys Arena rewards style and short-conversation polish; a top-Arena model can still under-perform on your specific function-calling schema or long-context retrieval workload. Build an internal eval harness before you commit. See our "LMArena Elo explained" and "LLM routing" writeups for the deep dive.
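An internal eval harness can start very small. The sketch below scores any callable that maps a prompt to a string against a list of expected answers using exact match. `EvalCase`, `run_eval`, and `stub_model` are hypothetical names, and the stub stands in for a real provider client, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    expected: str


def run_eval(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose stripped output equals the expected answer."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if model(case.prompt).strip() == case.expected)
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    # Placeholder standing in for a real model call; swap in your own client.
    return "4" if "2 + 2" in prompt else "unknown"


cases = [
    EvalCase("What is 2 + 2? Answer with a number only.", "4"),
    EvalCase("What is the capital of France? Answer in one word.", "Paris"),
]

accuracy = run_eval(stub_model, cases)
print(f"accuracy: {accuracy:.2f}")
```

Exact match is the simplest possible scorer. For function-calling or retrieval workloads you would likely replace the comparison with schema validation or a task-specific check, while keeping the same loop so that candidate models are compared on identical cases.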

Related rankings