Best LLMs for Writing in 2026

Aggregated benchmark data across EQ-Bench Creative Writing, LMArena Text, and Artificial Analysis — covering 31 models, updated weekly.

Last updated: June 15, 2026 · 31 models tracked · 3 tiers: Premium · Mid-Range · Budget

The short answer

Updated June 2026 To write well, you also need a voice.md

The best LLM for writing right now is Claude Fable 5 (Anthropic), which tops the EQ-Bench Creative Writing leaderboard ahead of Claude Opus 4.7 and GPT-5.5. For the best quality per dollar, Kimi K2.6 (Moonshot AI) delivers near-frontier writing at a fraction of the cost. Full ranking across 31 models below.

1 Claude Fable 5 Anthropic EQ Creative 2189
2 Claude Opus 4.7 Anthropic EQ Creative 2184
3 GPT-5.5 OpenAI EQ Creative 2028

How we rank models for writing

This ranking combines three independent data sources to give the most complete picture of writing quality across frontier LLMs. No single benchmark captures the full picture — so we aggregate:

Show benchmark details

EQ Creative EQ-Bench Creative Writing — specialist benchmark using trained raters to assess narrative quality, emotional depth, prose style, and character voice. Elo scale ~1300–2220. The most relevant signal for marketing copy, long-form content, and creative work.
Arena Text LMArena Text — crowd-sourced human preference leaderboard. Broad signal across all text tasks: a model that consistently wins votes is generally pleasant, clear, and useful to read. Elo scale ~1346–1510.
EQ General EQ-Bench General — measures emotional intelligence in roleplay scenarios. A proxy for character voice quality and tonal control — useful for brand voice work. Note: high EQ-General does not automatically mean strong creative writing; interpret alongside EQ Creative.
Speed Artificial Analysis — median output tokens per second across providers. Matters for iterative draft workflows where waiting costs time. ~75 tokens ≈ 55 words.

Prices are per 1M tokens (input / output) and reflect standard API pricing. A dash (—) means the model has not yet appeared on that leaderboard — never an estimated or interpolated value.

Model	EQ Creative ↓	Arena Text ↕	EQ General ↕	Speed ↕	Price / 1M ↕
Claude Fable 5 Premium Anthropic 📋 Consensus	2,189.3	1,510	2,069.4	—	$10.00 $50.00
Sources EQ-Bench leaderboard → LMArena Text →
Claude Opus 4.7 Premium Anthropic 📋 Consensus	2,183.9	1,502	1,911.5	—	$5.00 $25.00
Sources EQ-Bench leaderboard → LMArena Text →
GPT-5.5 Premium OpenAI 📋 Consensus	2,027.7	1,474	1,590.6	—	$5.00 $30.00
Sources EQ-Bench leaderboard → LMArena Text →
Claude Opus 4.8 Premium Anthropic 📋 Consensus	1,979.1	1,486	2,043.8	—	$5.00 $25.00
Sources EQ-Bench leaderboard → LMArena Text →
GPT-5.4 Premium OpenAI 📋 Consensus	1,965.2	1,468	1,586.7	—	—
Sources EQ-Bench leaderboard → LMArena Text →
Claude Sonnet 4.6 Premium Anthropic 📋 Consensus	1,937.6	1,471	1,736.6	50 t/s	$3.00 $15.00
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
Claude Opus 4.6 Premium Anthropic 📋 Consensus	1,918.3	1,504	1,744.2	45 t/s	$5.00 $25.00
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
Claude Sonnet 4.5 Premium Anthropic 📋 Consensus	1,764.4	1,456	1,515.7	—	$3.00 $15.00
Sources EQ-Bench leaderboard → LMArena Text →
GPT-5.3 Chat Mid-Range OpenAI 🧠 EQ-Bench	1,759.9	—	1,417.7	—	—
Sources EQ-Bench leaderboard →
Kimi K2.6 Mid-Range Moonshot AI 📋 Consensus	1,752.8	1,462	1,575.7	—	$0.60 $2.50
Sources EQ-Bench leaderboard → LMArena Text →
Claude Opus 4.5 Premium Anthropic 📋 Consensus	1,752.7	1,473	1,543.3	—	$5.00 $25.00
Sources EQ-Bench leaderboard → LMArena Text →
O3 Mid-Range OpenAI 📋 Consensus	1,739.5	1,431	1,500	—	$2.00 $8.00
Sources EQ-Bench leaderboard → LMArena Text →
Kimi K2 Mid-Range Moonshot AI 🧠 EQ-Bench	1,691.2	—	1,562.8	44 t/s	$0.55 $2.20
Sources EQ-Bench leaderboard → Artificial Analysis →
Horizon Alpha Mid-Range Unknown 🧠 EQ-Bench	1,663.9	—	1,552.4	—	—
Sources EQ-Bench leaderboard →
GLM-5 Budget Zhipu AI 📋 Consensus	1,657.2	1,457	1,532.7	80 t/s	$0.80 $2.50
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
GLM-5.1 Mid-Range Zhipu AI 📋 Consensus	1,642.3	1,475	1,560.7	—	—
Sources EQ-Bench leaderboard → LMArena Text →
GPT-5.2 Premium OpenAI 📋 Consensus	1,641.4	1,435	1,567.8	—	$1.25 $10.00
Sources EQ-Bench leaderboard → LMArena Text →
Claude Opus 4 Premium Anthropic 📋 Consensus	1,634.4	1,424	1,413.1	—	$15.00 $75.00
Sources EQ-Bench leaderboard → LMArena Text →
Kimi K2.5 Mid-Range Moonshot AI 📋 Consensus	1,593.3	1,450	1,545.1	—	—
Sources EQ-Bench leaderboard → LMArena Text →
DeepSeek V3.2 Budget DeepSeek 📋 Consensus	1,510.5	1,425	—	—	$0.28 $0.42
Sources EQ-Bench leaderboard → LMArena Text →
Gemini 3 Pro Mid-Range Google 📋 Consensus	1,497.6	1,486	1,556.5	80 t/s	$2.00 $12.00
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
Gemini 3.1 Pro Mid-Range Google 📋 Consensus	1,471.2	1,487	1,535.9	—	$2.11 $12.66
Sources EQ-Bench leaderboard → LMArena Text →
Mistral Medium 3 Budget Mistral AI 🧠 EQ-Bench	1,465.8	—	—	—	$0.40 $2.00
Sources EQ-Bench leaderboard →
Qwen3-235B Budget Alibaba 📋 Consensus	1,459	1,375	1,271.4	—	$0.18 $0.54
Sources EQ-Bench leaderboard → LMArena Text →
GPT-4o Premium OpenAI 📋 Consensus	1,443	1,346	1,393	185 t/s	$2.50 $10.00
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
GLM-4.7 Budget Zhipu AI 📋 Consensus	1,399.1	1,443	1,447.6	—	$0.38 $1.70
Sources EQ-Bench leaderboard → LMArena Text →
MiniMax M2.5 Budget MiniMax 📋 Consensus	1,327.1	1,390	—	395 t/s	$0.30 $1.20
Sources EQ-Bench leaderboard → LMArena Text → Artificial Analysis →
Gemini 3 Flash Mid-Range Google 🏟️ Arena	—	1,473	—	250 t/s	$0.50 $3.00
Sources LMArena Text → Artificial Analysis →
Gemini 3.1 Flash-Lite Budget Google 🏟️ Arena	—	1,433	—	—	$0.25 $1.50
Sources LMArena Text →
Grok 4.1 Mid-Range xAI 🏟️ Arena	—	1,466	—	163 t/s	$0.20 $0.50
Sources LMArena Text → Artificial Analysis →
MiMo-V2.5 Budget Xiaomi 🏟️ Arena	—	1,433	—	—	$0.40 $2.00
Sources LMArena Text →

The right model depends on the task

Benchmark leaderboards rank models globally — but the best model for a 2,000-word thought leadership article is not necessarily the best model for a 15-word social media headline. Here's how the leading models split across common writing tasks:

Narrative & long-form

Thought leadership, case studies, email newsletters, ghostwriting. Requires emotional depth, tonal consistency, and the ability to sustain voice across thousands of words.

Best picks: Claude Opus 4.7 · Claude Sonnet 4.6

Structured commercial copy

Product descriptions, landing pages, ad copy, LinkedIn posts. Requires clarity, persuasion structure, and format adherence more than creative flair.

Best picks: GPT-5.5 · Claude Sonnet 4.6

High-volume / fast drafts

Social media scheduling, meta descriptions, bulk content variation. Speed and cost matter more than peak quality; fast iteration wins here.

Best picks: Gemini 3 Flash · Grok 4.1 · Kimi K2

Brand voice & consistency

Any content where staying on-brand is non-negotiable. Requires strong instruction-following, tonal control, and memory of brand guidelines.

Best picks: Claude Opus 4.8 · Claude Sonnet 4.6

Managing this by hand means juggling several API keys, pricing tiers, and a decision tree for every task type. The table above shows where each model wins, so you can match the model to the job instead of forcing one model onto everything.

You picked the model.
That's only half the battle.

The model sets the ceiling on quality. What it can't decide is how the writing actually sounds: your voice, your angle, your point of view. That's why output from even the best LLMs comes out generic, gets flagged as AI, and quietly loses reach. A voice.md file is the other half. It captures how you write, so any model turns your ideas into content that's distinct, human, and recognizably yours.

Generate your voice.md free →

Frequently asked questions

Which LLM is best for creative writing in 2026?

Claude Opus 4.7 and the new Claude Fable 5 (Anthropic) are neck-and-neck at the top of the EQ-Bench Creative Writing leaderboard — Fable 5 at an Elo of 2189 and Opus 4.7 at 2184 as of June 2026 — followed by GPT-5.5 at 2028. Fable 5, Anthropic's most capable model, also leads the broader boards: LMArena Text at 1510 and EQ General at 2069, where Claude Opus 4.8 is a close second at 2044. These models excel at narrative quality, emotional depth, and character voice — the core skills that separate great writing from generic AI output.

What is EQ-Bench and why does it matter for writing?

EQ-Bench is an independent benchmark that evaluates large language models on emotional intelligence and narrative quality, using a panel of human raters. Its Creative Writing sub-leaderboard specifically measures story quality, emotional resonance, and prose style — making it the most relevant benchmark for marketing copy, long-form content, and creative work. Scores are on an Elo scale where higher is better, typically ranging from ~1300 to ~2220.

What is LMArena Text and how is it different from EQ-Bench?

LMArena Text (formerly LMSYS Chatbot Arena) measures human preference through head-to-head votes: two anonymous models answer the same prompt, and users pick the better response. It's a broad preference signal across all text tasks, not just writing. EQ-Bench Creative Writing is narrower and more specialist — it specifically evaluates narrative and emotional writing quality with trained raters rather than crowd votes.

Which LLM is the best value for writing tasks?

Kimi K2 by Moonshot AI offers the best performance-per-dollar for writing: an EQ-Bench Creative score of 1691 at just $0.55 input / $2.20 output per 1M tokens — roughly 9× cheaper than Claude Opus 4.7 with ~77% of its creative writing performance. Kimi K2.6 scores even higher at 1753 Creative. GLM-5 (Zhipu AI) is another strong value option at $0.80/$2.50 with scores of 1657 EQ Creative and 1533 EQ General.

How often is this ranking updated?

Scores are updated weekly via an automated scraper that fetches the latest data from EQ-Bench and LMArena. Prices are reviewed manually and updated when providers announce changes. The 'Updated weekly' badge in the table header shows the date of the last successful update.

What does 'tokens per second' mean for writing?

Tokens per second (t/s) measures how fast a model outputs text — roughly, 75 tokens equals about 55 words. For writing workflows, speed matters when you need rapid iteration on drafts or real-time dictation-to-copy conversion. MiniMax M2.5 is the fastest tracked model at 395 t/s; Gemini 3 Flash at 250 t/s offers the best speed-to-cost ratio among paid models.

Does the best LLM for writing change depending on the task?

Yes — significantly. Claude Opus 4.7 and Claude Fable 5 sit neck-and-neck atop EQ Creative (2184 and 2189), while Fable 5 also leads EQ General (2069) — making it especially strong for character voice and emotional tone, with Claude Opus 4.8 close behind at 2044. Claude Sonnet 4.6 remains excellent for structured commercial copy where format consistency matters at a lower price. GPT-5.5 is a strong contender for creative work. Faster models like Gemini 3 Flash or Grok 4.1 suit high-volume, lower-stakes content.

Does picking the best LLM guarantee good writing?

No. The model sets the ceiling on quality, but it does not decide how the writing sounds: your voice, your angle, your point of view. Output from even the top-ranked models often reads as generic and gets flagged as AI, which costs reach. The fix is to give the model your voice. A short voice.md file captures how you write, so any model produces content that is distinct and recognizably yours.