Anthropic releases Claude Sonnet 5, its most agentic mid-tier model yet

Anthropic released Claude Sonnet 5 on June 30, 2026 — the newest entry in its mid-tier Sonnet line, and the most capable one yet. The model is designed to close the capability gap with Opus 4.8 while remaining significantly cheaper, and it ships today as the default for Free and Pro plan users.

Internally codenamed Fennec, Sonnet 5 replaces Sonnet 4.6 (released February 2026) as the workhorse of the Claude product line. It is available immediately via the Claude API as claude-sonnet-5, on claude.ai, and across Max, Team, and Enterprise plans.

Co-launched the same day: Claude Science — a dedicated AI workbench for researchers that integrates scientific tools and packages, produces auditable artifacts, and provides flexible access to computing resources.

Benchmarks

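Sonnet 5 improves on Sonnet 4.6 on every evaluation where both scores are reported, with gains ranging from 2.7 points on OSWorld-Verified to 13.4 points on Terminal-Bench 2.1. It also comes within half a point of Opus 4.8 on Humanity's Last Exam with tools and narrowly beats it on GDPval-AA v2, the knowledge-work evaluation.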

Benchmark	Sonnet 5	Sonnet 4.6	Opus 4.8
SWE-bench Verified (coding)	82.1%	—	—
SWE-bench Pro (agentic coding)	63.2%	58.1%	69.2%
OSWorld-Verified (computer use)	81.2%	78.5%	—
Terminal-Bench 2.1	80.4%	67.0%	—
HLE with tools	57.4%	46.8%	57.9%
HLE without tools	43.2%	34.6%	—
GPQA Diamond (PhD-level science)	96.2%	—	—
GDPval-AA v2 (knowledge work)	1,618	—	1,615

The standout results: Sonnet 5 hits 82.1% on SWE-bench Verified, a real-world coding benchmark measuring autonomous bug resolution, and comes within half a point of Opus 4.8 on Humanity's Last Exam with tools (57.4% vs 57.9%). On knowledge work (GDPval-AA v2), it edges Opus 4.8 outright: 1,618 to 1,615.

The one area where Opus 4.8 still leads clearly is SWE-bench Pro, the harder agentic coding evaluation: 69.2% to Sonnet 5's 63.2%.
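
For readers who want to check the gaps themselves, here is a minimal sketch that recomputes the point differences from the table. The scores are copied from the rows where at least a Sonnet 4.6 figure is reported; the dictionary layout and the gap helper are our own illustration, not anything Anthropic publishes. GDPval-AA v2 is left out because it is a rating score rather than a percentage.

```python
# Scores copied from the benchmark table; None marks a score the table does not report.
SCORES = {
    "SWE-bench Pro": {"sonnet_5": 63.2, "sonnet_4_6": 58.1, "opus_4_8": 69.2},
    "OSWorld-Verified": {"sonnet_5": 81.2, "sonnet_4_6": 78.5, "opus_4_8": None},
    "Terminal-Bench 2.1": {"sonnet_5": 80.4, "sonnet_4_6": 67.0, "opus_4_8": None},
    "HLE with tools": {"sonnet_5": 57.4, "sonnet_4_6": 46.8, "opus_4_8": 57.9},
    "HLE without tools": {"sonnet_5": 43.2, "sonnet_4_6": 34.6, "opus_4_8": None},
}


def gap(benchmark, baseline):
    """Return Sonnet 5's lead over `baseline` in percentage points.

    A negative value means Sonnet 5 trails; None means the baseline
    score is not reported in the table.
    """
    row = SCORES[benchmark]
    if row[baseline] is None:
        return None
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(row["sonnet_5"] - row[baseline], 1)


for name in SCORES:
    print(f"{name}: vs Sonnet 4.6 {gap(name, 'sonnet_4_6')}, "
          f"vs Opus 4.8 {gap(name, 'opus_4_8')}")
```

On these figures the largest gain over Sonnet 4.6 is on Terminal-Bench 2.1 (13.4 points), and the only sizeable deficit against Opus 4.8 is on SWE-bench Pro (6.0 points).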

Agentic at its core

The design intent is explicit: Sonnet 5 is built to handle multi-step autonomous work that until recently required a larger, more expensive model. It can make plans, use tools including browsers and terminals, and execute complex workflows without human intervention between steps.

Anthropic highlights three specific improvements over Sonnet 4.6:

Better instruction following — fewer missed or misinterpreted constraints in long, complex prompts
Stronger tool selection — more accurate choices about which tool to invoke, and when
Self-correcting error recovery — when a step in an agentic workflow fails, Sonnet 5 is more likely to diagnose and recover rather than stall or compound the error

The model also ships with Dev Team multi-agent mode, which allows multiple Claude instances to collaborate on a shared filesystem as a coordinated team — delegating subtasks, running in parallel, and reporting back to a lead agent.

Safety improvements for agentic use

Agentic models carry different risks than chat models: they take real actions and can be hijacked mid-task by malicious content in their environment. Anthropic says Sonnet 5 reduces the rate of "undesirable behaviors" compared to Sonnet 4.6, and is specifically more resistant to prompt injection attacks — attempts by adversarial content in a browser page or document to redirect the model's actions.

It is also better at refusing requests designed to extract harmful outputs, and the refusals themselves are more useful: rather than declining bluntly, it explains what it won't do and why.

Specs and pricing

API model ID: claude-sonnet-5
Context window: 1M tokens
Max output: 128k tokens (up to 300k on Message Batches API with the output-300k-2026-03-24 beta header)
Introductory pricing (through August 31, 2026): $3/M input · $15/M output
Post-August pricing: same — $3/M input · $15/M output
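
To put the listed rates in context, here is a rough cost sketch using only the $3/M input and $15/M output prices above. The estimate_cost helper and the example workload are hypothetical, and the calculation ignores anything the specs do not mention, such as batch or caching discounts.

```python
# Per-million-token prices as listed in the specs (USD).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens, output_tokens, calls=1):
    """Estimate USD cost for `calls` requests with the given per-call token counts."""
    total_input = input_tokens * calls
    total_output = output_tokens * calls
    cost = (total_input * INPUT_PRICE_PER_M
            + total_output * OUTPUT_PRICE_PER_M) / 1_000_000
    return round(cost, 2)


# Hypothetical workload: 1,000,000 agent calls,
# each with 2,000 input tokens and 500 output tokens.
print(estimate_cost(2_000, 500, calls=1_000_000))
```

For that assumed workload the total comes to $13,500, with output tokens accounting for more than half of the bill despite being a quarter of the input volume.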

Sonnet 5 is the default model on Free and Pro plans from launch. Max, Team, and Enterprise plans include it immediately.

Claude Science

Launched alongside Sonnet 5, Claude Science is a purpose-built workbench for researchers and scientists. It differs from the standard claude.ai interface in three ways:

Integrates Python packages, statistical tools, and scientific libraries commonly used in research workflows
Produces auditable artifacts — outputs that include the reasoning chain and computation steps, traceable for peer review or replication
Provides flexible access to computing resources for longer-running scientific tasks

Claude Science appears targeted at life sciences, social science, and quantitative research teams that need more than a conversational assistant — specifically, outputs they can show colleagues and attach to papers.

Our take

The cost-to-capability story here is compelling. On the reported numbers, Sonnet 5 delivers near-Opus-4.8 performance on knowledge work and tool-assisted reasoning, and trails by six points on the harder agentic coding benchmark, for well under Opus pricing. For any team running agents at scale, where per-token cost compounds across millions of calls, that spread matters considerably.

The SWE-bench Verified score of 82.1% is also notable. It means the model resolved more than four in five of the benchmark's real-world GitHub issues without human help. That is a useful proxy for developer productivity gains, though not a direct measure of them.

The introductory pricing of $3/$15 per million tokens runs through August 31, and the listed post-August rate is the same, so there is no discount window to race. For high-volume agentic workloads, the incentive to test is the capability gain at an unchanged price.

Sources: Anthropic · TechCrunch · The New Stack · MarkTechPost · Cybernews