Anthropic released Claude Opus 4.7 to general availability on April 16, 2026, framing it as a step up from Opus 4.6 in software engineering, complex agentic work, and multimodal understanding.

What's new

  • Coding — measurable jump on real-world software engineering benchmarks (numbers below).
  • Vision — supports images up to 2,576 pixels on the long edge, with gains in instruction following on multimodal tasks.
  • Agents — Anthropic claims state-of-the-art performance on finance-agent and knowledge-work evaluations (GDPval-AA).
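The long-edge figure above is the one hard constraint a caller can act on. A minimal sketch of pre-scaling image dimensions to fit it, treating the 2,576-pixel number from this article as the target; fit_long_edge is a hypothetical helper, not part of any SDK:

```python
MAX_LONG_EDGE = 2576  # long-edge limit quoted in the article; verify against current docs

def fit_long_edge(width: int, height: int, limit: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Return (width, height) unchanged if the long edge fits,
    otherwise scaled down proportionally so the long edge equals limit."""
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 4:3 photo at 4000x3000 shrinks so its long edge is exactly 2576.
print(fit_long_edge(4000, 3000))
```

Downscaling client-side keeps you in control of the resampling filter instead of relying on whatever the API does to oversized inputs.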

The benchmark numbers

The headline result is software engineering. On SWE-bench Pro — the harder, contamination-resistant version of SWE-bench — Opus 4.7 leads every shipping competitor.

Coding

Benchmark             Opus 4.7   Opus 4.6   GPT-5.4   Gemini 3.1 Pro
SWE-bench Verified       87.6%      80.8%     80.6%        —
SWE-bench Pro            64.3%      53.4%     57.7%     54.2%
Terminal-Bench 2.0       69.4%      65.4%   75.1% ¹     68.5%

Agents & tool use

Benchmark                         Opus 4.7   Opus 4.6   GPT-5.4 Pro   Gemini 3.1 Pro
MCP-Atlas (tool orchestration)       77.3%      75.8%         68.1%            73.9%
Finance Agent v1.1                   64.4%      60.1%         61.5%            59.7%
OSWorld-Verified (computer use)      78.0%      72.7%         75.0%               —
BrowseComp (agentic search)          79.3%      83.7%         89.3%            85.9%

Reasoning & knowledge

Benchmark                           Opus 4.7   Opus 4.6   GPT-5.4 Pro   Gemini 3.1 Pro
GPQA Diamond                           94.2%      91.3%         94.4%            94.3%
Humanity's Last Exam (no tools)        46.9%      40.0%         42.7%            44.4%
Humanity's Last Exam (with tools)      54.7%      53.3%         58.7%            51.4%

Vision & multilingual

Benchmark                         Opus 4.7   Opus 4.6   Gemini 3.1 Pro
CharXiv Reasoning (no tools)         82.1%      69.1%                —
CharXiv Reasoning (with tools)       91.0%      84.7%                —
MMMLU (multilingual)                 91.5%      91.1%            92.6%

¹ GPT-5.4's Terminal-Bench 2.0 score was measured with a self-reported harness and is not directly comparable.

Other published results

  • CursorBench — 70% (Opus 4.6: 58%)
  • BigLaw Bench (Harvey) — 90.9% accuracy at high effort
  • OfficeQA Pro (Databricks) — 21% fewer errors than Opus 4.6
  • Rakuten-SWE-Bench — 3× more production tasks resolved than Opus 4.6
  • GDPval-AA — Anthropic claims state-of-the-art (no specific score published)

The pattern: Opus 4.7 owns real-world software engineering (SWE-bench Verified and Pro), wins most agentic and tool-use evals outside of search, ties or narrowly trails GPT-5.4 Pro on graduate-level reasoning, and posts the largest jump anywhere on chart and document understanding (CharXiv).

Pricing and availability

Pricing is unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens. Opus 4.7 is available across Claude.ai, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
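Those rates make per-request cost easy to estimate. A quick sketch using the article's listed prices; cost_usd is an illustrative helper, not Anthropic tooling, and the example token counts are made up:

```python
# Rates quoted in the article: $5 / $25 per million input / output tokens.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# e.g. a long-document review: 80k tokens in, 4k tokens out
print(f"${cost_usd(80_000, 4_000):.2f}")
```

At these rates a heavy long-context read with a short answer is dominated by input cost, while agentic loops that emit lots of tokens pay the 5x output premium.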

Why it matters for working professionals

For lawyers, researchers, and analysts already using Claude Projects (in Nowrap's tools directory), the upgrade is automatic on Pro and Team plans. Where it shows up most: longer documents, harder reasoning, and tighter agentic loops that previously needed manual steering.

Sources: anthropic.com/news/claude-opus-4-7 · benchmark breakdown via Vellum · competitor scoring via TheNextWeb