AI

Claude Opus 4.8 Bets on Honesty Over Headline Benchmarks

Published

2 months ago

May 28, 2026

Anthropic released Claude Opus 4.8 on Thursday, an upgrade to its flagship model that ships at the same price as Opus 4.7 and that the company itself calls a “modest but tangible” improvement. Most of the announcement is about benchmarks. Further down sits the number that should interest the businesses paying the bill: the model is around four times less likely than its predecessor to let flaws in its own code pass without comment.

That figure reframes what a point release is for in 2026. Coding scores have crept up the leaderboard for two years straight. Whether a company will let an agent run overnight without a human watching has been a separate question, and a slower-moving one.

The Honesty Number That Outweighs the Benchmarks

Anthropic trains all its models to be honest, which in practice means not claiming work is finished when the evidence is thin. The well-documented failure mode of large language models is the opposite. They jump to conclusions, report success, and leave a human to discover later that the code did not compile or the analysis quietly skipped a step.

Opus 4.8 is built to catch itself. In Anthropic’s own evaluations, it is roughly four times less likely than Opus 4.7 to allow a flaw in code it wrote to go unremarked. Early testers describe the same behavior in plainer terms: the model flags uncertainty about its own output instead of papering over it, and pushes back when a plan does not hold together.

The alignment review points the same direction. Anthropic’s safety team reported that misaligned behavior, such as deception or going along with misuse, runs substantially lower than in Opus 4.7 and lands close to the rates of Claude Mythos Preview, the company’s best-aligned model so far.

Opus 4.8 reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.

That assessment came from Anthropic’s Alignment team in the release notes. It matters because the company has not always been able to say it. Anthropic spent part of 2025 explaining why an earlier Claude reached for coercive tactics in tests, work that traced Claude’s blackmail behavior back to patterns in its training data. A model that reliably says “I am not sure this is right” is the commercial answer to that history.

Claude Opus 4.8 launch brings honesty gains and a gated Mythos model.

What Else Anthropic Shipped on Thursday

The model did not arrive alone. Three feature changes landed with it, each aimed at letting Claude take on larger jobs with less hand-holding. You can read the full breakdown in Anthropic’s Claude Opus 4.8 release notes.

Dynamic workflows. Now in research preview inside Claude Code, this lets Claude plan a task, spin up hundreds of parallel subagents in one session, then verify the results before reporting back. Anthropic says it can now carry codebase-scale migrations across hundreds of thousands of lines from kickoff to merge, using the existing test suite as the bar. It is limited to Enterprise, Team, and Max plans.
Effort control. A new slider beside the model selector lets users decide how hard Claude works on a response. Higher settings think more often and more deeply; lower settings answer faster and burn through rate limits more slowly. The control is on every plan.
Mid-task system entries. The Messages API (application programming interface) now accepts system instructions inside the messages array, so developers can change permissions, token budgets, or environment context while an agent is still running, without breaking the prompt cache.

Fast mode also got cheaper. The model can run at 2.5 times its standard speed, and that mode now costs three times less than it did on previous models. For workloads where latency is the constraint rather than raw cost, that is the most immediate change of the day.

Pricing Held Flat While Fast Mode Got Cheaper

Standard pricing did not move. Opus 4.8 costs the same per token as the model it replaces, which is unusual in a market where each capability bump has tended to arrive with a price tag attached.

The table below sets the two tiers side by side. All figures are per million tokens, drawn from Anthropic’s published API pricing.

Tier	Input (per 1M tokens)	Output (per 1M tokens)	Speed
Opus 4.8 standard	$5	$25	Baseline
Opus 4.8 fast mode	$10	$50	2.5x baseline

There is a second cost story underneath the headline rates. Opus 4.8 defaults to “high” effort, which Anthropic says spends about as many tokens on coding tasks as Opus 4.7’s default did, but gets more done with them. Databricks, testing the model in its Genie data agent, reported reasoning over PDFs and diagrams at 61% lower token cost than Opus 4.7. The sticker price is flat; the effective price per finished task is the number that actually fell.

Where Opus 4.8 Lands Against GPT-5.5

Anthropic frames Opus 4.8 as competitive with or ahead of OpenAI’s GPT-5.5 across coding, agentic skills, reasoning, and knowledge work. The full comparison table sits in the system card; the release notes surface a handful of specific results worth reading carefully.

On Online-Mind2Web, a test of how well a model drives a web browser through real tasks, Opus 4.8 scored 84%, which one browser-agent tester called a meaningful jump over both Opus 4.7 and GPT-5.5. On an internal Super-Agent benchmark, a testing partner said Opus 4.8 was the only model to complete every case end-to-end, beating earlier Opus models and GPT-5.5 at the same cost. And on a Legal Agent Benchmark, it became the first model to clear 10% on the strictest all-pass standard.

Those are partner-reported figures, not independent audits, and the all-pass legal number is a reminder of how far frontier models still are from finishing hard professional work without help. A 10% pass rate is a lead in its category and a long way from done.

The reliability story ties back to where the money is moving. Anthropic’s emphasis on agents that finish tasks rather than chatbots that answer questions echoes a wider rotation, the same one driving how Claude’s model line is steering investors past the chip trade toward security, finance agents, and the infrastructure that runs long jobs.

The Mythos Model Anthropic Won’t Release Yet

The most telling line in Thursday’s announcement was about a model that is not for sale. Anthropic said a small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing, and that Mythos-class models, more capable than Opus, will reach customers “in the coming weeks” once stronger safeguards exist.

A 10,000-Vulnerability Haul in One Month

Project Glasswing launched in April 2026 with roughly 50 partners, a roster that includes Amazon Web Services, Apple, Google, Microsoft, NVIDIA, and JPMorganChase. Mythos powers it. In the first month, the program turned up results that read less like a product demo and more like a warning.

More than 10,000 high- or critical-severity vulnerabilities found across partner software in the first month, per Anthropic’s first Project Glasswing progress update.
6,202 high- or critical-severity flaws identified across more than 1,000 open-source projects.
90.6% of 1,752 findings reviewed by independent firms held up as valid, with 62.4% confirmed high or critical.

Individual partners posted eye-catching numbers too. Cloudflare reported 2,000 bugs, 400 of them high or critical. Mozilla logged hundreds, in line with an earlier single Mythos scan that surfaced 271 Firefox bugs. Across the Project Glasswing partner coalition, Anthropic said the rate of bug-finding rose by more than a factor of ten.

Why the Safeguards Aren’t Ready

The same capability that finds 10,000 flaws can write the exploits for them. That is why Anthropic is not selling Mythos to everyone. The company is blunt about the reason: no organization, itself included, has yet built safeguards strong enough to keep a model this capable from being turned to severe harm.

Its interim answer is a Cyber Verification Program, which lets vetted security professionals reach certain Mythos capabilities without the usual safety restrictions. Everyone else waits. So the company is shipping the honesty and reliability gains of Opus 4.8 to the whole market while holding its sharpest tool behind a gate.

If the promised safeguards arrive on schedule, the gap between what Anthropic sells and what it keeps in the lab closes within weeks. If they slip, Opus 4.8 stays the most capable model most customers can actually buy.

Frequently Asked Questions

How much does Claude Opus 4.8 cost?

Standard usage is $5 per million input tokens and $25 per million output tokens, the same as Opus 4.7. Fast mode, which runs at 2.5 times the speed, costs $10 per million input tokens and $50 per million output tokens.

How is Opus 4.8 different from Opus 4.7?

It posts higher scores on coding, agentic, reasoning, and knowledge-work benchmarks, and Anthropic says it is about four times less likely to let flaws in its own code go unflagged. The price is unchanged, and its alignment metrics are better than Opus 4.7.

What does the new effort control do?

It lets users choose how much work Claude puts into a response. Higher settings (“extra” or “max”) think more deeply and spend more tokens for better answers; lower settings respond faster and use rate limits more slowly. The control is available on all plans, with high as the default.

What are dynamic workflows in Claude Code?

A research-preview feature that lets Claude plan a large job, run hundreds of parallel subagents in one session, and verify the output before reporting back. It can handle codebase-scale migrations and is limited to Claude Code for Enterprise, Team, and Max plans.

Can developers access Opus 4.8 through the API?

Yes. It is available everywhere today, and developers can call it through the Claude API using the model identifier claude-opus-4-8.

What is Claude Mythos Preview?

It is a more capable, unreleased Anthropic model used for cybersecurity work under Project Glasswing. Anthropic is not making it generally available yet because it says no one has built safeguards strong enough to prevent misuse, though it expects to release Mythos-class models in the coming weeks.