Interactive benchmark tool — compare Opus 4.7 vs GPT-5.4, Gemini 3.1, Llama 4 Maverick on MMLU, GPQA, HumanEval, MATH, SWE-bench and more.
Last updated April 18, 2026 · Anthropic's latest flagship model
| Benchmark | Claude Opus 4.7 | 2nd Place | Gap (pts) | Category |
|---|---|---|---|---|
| SWE-bench Verified | 74.2% | GPT-5.4 · 65.8% | +8.4 | Coding |
| HumanEval | 96.1% | GPT-5.4 · 94.5% | +1.6 | Coding |
| MMLU | 93.4% | GPT-5.4 · 92.8% | +0.6 | General Knowledge |
| MATH | 91.8% | GPT-5.4 · 90.5% | +1.3 | Mathematics |
| TAU-bench | 71.5% | GPT-5.4 · 66.2% | +5.3 | Agentic Tool Use |
| GPQA Diamond | 78.3% | GPT-5.4 · 76.8% | +1.5 | Science Reasoning |
**Extended Thinking v2.** Second-generation reasoning chains with higher transparency, configurable thinking budgets up to 128K tokens, and visible step-by-step reasoning for auditable decisions.
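A minimal sketch of configuring a thinking budget, assuming the `thinking` parameter shape Anthropic's Python SDK uses for earlier extended-thinking Claude models; the dated model ID comes from the spec table below, and any v2-specific options are an assumption not shown here.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Assumption: Extended Thinking v2 keeps the `thinking` parameter shape
# documented for earlier extended-thinking Claude models.
response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=16000,  # thinking tokens count against this limit
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,  # configurable up to the 128K ceiling noted above
    },
    messages=[{"role": "user", "content": "Audit this pricing formula step by step."}],
)

# Thinking blocks arrive alongside the final answer, so reasoning is inspectable.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```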
**Parallel Tool Calls.** Execute up to 8 tool calls simultaneously per turn, double Opus 4.6's limit, making agentic pipelines dramatically faster where tasks can run concurrently.
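To illustrate, here is a minimal sketch of fanning out one turn's tool calls with the current Anthropic Python SDK; the tool names, schemas, and local handlers are hypothetical placeholders.

```python
import concurrent.futures
import anthropic

client = anthropic.Anthropic()

# Hypothetical tool definitions for illustration only.
tools = [
    {"name": "search_docs", "description": "Search the project docs.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": ["query"]}},
    {"name": "read_file", "description": "Read a file by path.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
]

def run_tool(name: str, args: dict) -> str:
    """Hypothetical local dispatcher for the tools declared above."""
    if name == "search_docs":
        return f"results for {args['query']}"
    return f"contents of {args['path']}"

response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=4096,
    tools=tools,
    messages=[{"role": "user",
               "content": "Compare the README against src/config.py."}],
)

# A single turn may contain several tool_use blocks (up to 8 per the note
# above); execute them concurrently instead of one at a time.
calls = [b for b in response.content if b.type == "tool_use"]
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(run_tool, c.name, c.input): c.id for c in calls}
    tool_results = [
        {"type": "tool_result", "tool_use_id": futures[f], "content": f.result()}
        for f in concurrent.futures.as_completed(futures)
    ]
# tool_results is sent back to the model as the next user message's content.
```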
**Persistent Memory.** Conversation memory persists across sessions for API users; custom preferences and style are retained without external vector stores.
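Nothing here documents the memory interface itself, so the payload below is purely hypothetical: the `memory` field and its keys are invented names for illustration, passed through the SDK's `extra_body` escape hatch rather than presented as a real parameter.

```python
import anthropic

client = anthropic.Anthropic()

# HYPOTHETICAL: the "memory" payload is an invented illustration of a
# session-scoped memory switch; real field names are not documented here.
response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Use my usual report format."}],
    extra_body={"memory": {"scope": "user-1234", "retention": "persistent"}},
)
print(response.content[0].text)
```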
**Software Engineering.** SWE-bench Verified up 9.7 points to 74.2%, resolving real GitHub issues end to end. Best in class for large-codebase navigation, multi-file edits, and debugging.
**64K Output.** Maximum output doubled from 32K to 64K tokens. Generate entire codebases, long-form reports, or multi-chapter documents in a single response.
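Long generations at the new ceiling are best streamed so text arrives incrementally; a sketch assuming the current Anthropic Python SDK, with the 64K limit taken from the comparison table below.

```python
import anthropic

client = anthropic.Anthropic()

# Stream a long generation rather than waiting for the full 64K-token response.
with client.messages.stream(
    model="claude-opus-4-7-20260416",
    max_tokens=64000,  # the doubled output ceiling described above
    messages=[{"role": "user", "content": "Draft a multi-chapter operations manual."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```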
**Instruction Following.** Better handling of complex, multi-constraint prompts, fewer unnecessary refusals, and closer alignment with user intent on nuanced tasks.
**Vision & Documents.** Native PDF analysis with full layout awareness, enhanced chart and diagram comprehension, and improved handwriting recognition and image reasoning.
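PDFs can be passed as base64-encoded document blocks; this sketch assumes Opus 4.7 keeps the document content-block shape used for PDF input on earlier Claude models, and the filename is a placeholder.

```python
import base64
import anthropic

client = anthropic.Anthropic()

# Placeholder file; any local PDF works the same way.
with open("report.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "document",
             "source": {"type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_b64}},
            {"type": "text", "text": "Summarize every chart in this PDF."},
        ],
    }],
)
print(response.content[0].text)
```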
**Reduced Hallucinations.** Significant reduction in factual errors versus Opus 4.6, better-calibrated confidence, and more honest "I don't know" responses when uncertain.
| Metric | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| SWE-bench Verified | 64.5% | 74.2% | +9.7 pts |
| MMLU | 90.8% | 93.4% | +2.6 pts |
| HumanEval | 93.7% | 96.1% | +2.4 pts |
| MATH | 87.2% | 91.8% | +4.6 pts |
| TAU-bench | 62.3% | 71.5% | +9.2 pts |
| GPQA Diamond | 70.1% | 78.3% | +8.2 pts |
| Max Output Tokens | 32K | 64K | 2× |
| Parallel Tool Calls | 4 | 8 | 2× |
| Extended Thinking | v1 | v2 | Upgraded |
| Hallucination Rate | Baseline | 40% lower | −40% |
| Context Window | 200K | 200K | Same |
| Input Price / MTok | $15 | $15 | Same |
| Output Price / MTok | $75 | $75 | Same |
| Specification | Claude Opus 4.7 | Claude Sonnet 4.6 | GPT-5.4 | GPT-4o | Gemini 3.1 Pro | Llama 4 Maverick |
|---|---|---|---|---|---|---|
| Provider | Anthropic | Anthropic | OpenAI | OpenAI | Google | Meta (Open) |
| Release Date | Apr 2026 | Oct 2025 | Mar 2026 | May 2024 | Mar 2026 | Apr 2025 |
| Context Window | 200K | 200K | 128K | 128K | 1M | 1M |
| Max Output | 64K | 16K | 32K | 16K | 32K | 16K |
| MMLU | 93.4% | 89.2% | 92.8% | 87.2% | 91.5% | 88.5% |
| HumanEval | 96.1% | 92.0% | 94.5% | 90.2% | 92.8% | 88.7% |
| SWE-bench | 74.2% | 64.5% | 65.8% | 48.3% | 63.2% | 52.1% |
| MATH | 91.8% | 86.5% | 90.5% | 76.6% | 89.2% | 82.3% |
| GPQA Diamond | 78.3% | 68.2% | 76.8% | 53.6% | 74.5% | 62.4% |
| TAU-bench | 71.5% | 62.8% | 66.2% | 45.2% | 60.5% | 48.9% |
| Vision / Images | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool / Function Calls | Yes (8× parallel) | Yes (4×) | Yes | Yes | Yes | Yes |
| Extended Thinking | v2 (128K budget) | v1 | Yes | No | Yes | No |
| Open Source | No | No | No | No | No | Yes |
| Input Price / MTok | $15.00 | $3.00 | $30.00 | $2.50 | $1.25 | Free* |
| Output Price / MTok | $75.00 | $15.00 | $60.00 | $10.00 | $10.00 | Free* |
| API Model ID | claude-opus-4-7-20260416 | claude-sonnet-4-6-20251022 | gpt-5.4 | gpt-4o | gemini-3.1-pro | llama-4-maverick |
* Llama 4 Maverick is open-source. Free to self-host; hosted API cost varies by provider. Gemini prices shown for <128K context.
**Context Window** — how much text the model reads per request
**Max Output** — how much text the model generates per response
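When it matters which side of the 200K window a prompt falls on, token counts can be checked before sending; a sketch assuming the SDK's token-counting endpoint behaves for Opus 4.7 as it does for earlier Claude models, with a placeholder input file.

```python
import anthropic

client = anthropic.Anthropic()

with open("whole_codebase.txt") as f:  # placeholder; any large prompt works
    prompt = f.read()

# Count input tokens before sending to see how much of the 200K window is used.
count = client.messages.count_tokens(
    model="claude-opus-4-7-20260416",
    messages=[{"role": "user", "content": prompt}],
)
print(f"{count.input_tokens} of 200,000 input tokens used")
```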
**Agentic Workflows.** Opus 4.7 is optimized for long-running agentic tasks requiring multi-step planning, tool use, and error recovery.
**Real-World Coding.** Best-in-class for real software engineering: not just code generation, but understanding large codebases and resolving real bugs.
**Science Reasoning.** 78.3% on GPQA Diamond reflects expert-level performance on physics, chemistry, biology, and mathematics problems.
**Long Documents.** The 200K context window handles entire books, codebases, or legal document sets in one pass, with 64K output for comprehensive reports.
**Enterprise.** Extended Thinking v2 provides auditable reasoning chains, and the built-in memory API simplifies stateful enterprise deployments.
**Creative Work.** Superior instruction following and reduced hallucinations make Opus 4.7 ideal for nuanced, high-stakes creative tasks.
| Task Type | Use Opus 4.7 | Use Sonnet 4.6 ($3/$15) |
|---|---|---|
| Complex multi-step agent workflows | Yes — best reliability | Acceptable for simpler flows |
| Real GitHub issue resolution | Yes — 74.2% SWE-bench | 64.5% — good alternative |
| Graduate-level reasoning/science | Yes — 78.3% GPQA | 68.2% — misses harder problems |
| Simple Q&A / chatbot | Overkill | Yes — 5× cheaper |
| High-volume text classification | Overkill | Yes — much cheaper |
| Long-context document understanding | Yes — better comprehension | Adequate for simpler docs |
| Auditable reasoning required | Yes — Extended Thinking v2 | v1 available |
| Model | Input / MTok | Output / MTok | Cache Input / MTok | Subscription |
|---|---|---|---|---|
| Claude Opus 4.7 | $15.00 | $75.00 | $3.75 (75% off) | Pro $20/mo |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | Pro $20/mo |
| Claude Haiku 3.5 | $0.80 | $4.00 | $0.08 | Pro $20/mo |
| GPT-5.4 | $30.00 | $60.00 | $15.00 (50% off) | ChatGPT Plus $20/mo |
| GPT-4o | $2.50 | $10.00 | $1.25 | ChatGPT Plus $20/mo |
| Gemini 3.1 Pro | $1.25 | $10.00 | $0.31 | Google One AI $19.99/mo |
| Llama 4 Maverick | Free* | Free* | Free* | Self-host required |
Prices as of April 2026. MTok = 1 million tokens. Cache discounts vary by provider; GPT-5.4's ~50% rate applies to repeated prompt prefixes. * Llama 4 is free to self-host; compute costs apply and hosted-API pricing varies by provider.
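For budgeting, per-request cost is just token counts times the table's per-million rates; the helper below is a minimal sketch that hard-codes the prices listed above.

```python
# Per-request cost from the table above; prices are USD per million tokens.
PRICES = {
    "claude-opus-4-7": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-5.4": (30.00, 60.00),
    "gemini-3.1-pro": (1.25, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: a 50K-token prompt with an 8K-token answer on Opus 4.7:
# 0.05 * $15 + 0.008 * $75 = $0.75 + $0.60 = $1.35
print(f"${request_cost('claude-opus-4-7', 50_000, 8_000):.2f}")
```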
The API model ID is claude-opus-4-7-20260416, and the model was available immediately to all Anthropic API users and claude.ai Pro, Team, and Enterprise subscribers upon announcement. You can also use claude-opus-4-7-latest as an alias for the always-current version.
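A first request with either identifier looks like this, assuming the current Anthropic Python SDK; only the model IDs come from this page.

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    # Alias shown here; pin "claude-opus-4-7-20260416" for reproducibility.
    model="claude-opus-4-7-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize your new capabilities."}],
)
print(response.content[0].text)
```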