
© 2026 Ayush Sharma. Built with care.

Tags: #ai #security #essay #claude

GPT-5.5 Made the Mythos Restriction Obsolete

The AISI benchmark shows GPT-5.5 matches Mythos on offensive cyber tasks. The case for model-specific restriction got much harder to make.

May 11, 2026·9 min read
[Cover image: dark cover with cyan glow and white title text]

On May 1st, the UK AI Safety Institute published a benchmark table that quietly ended one chapter of the Mythos access debate.

The number that matters is 71.4. That is the percentage of Expert-tier offensive cybersecurity tasks GPT-5.5 completed in AISI's evaluation. Mythos Preview scored 68.6. The standard errors overlap: plus or minus 8.0% for GPT-5.5, plus or minus 8.7% for Mythos. Within measurement noise, they are indistinguishable.

GPT-5.5 is available to anyone with a credit card and an OpenAI account. Mythos Preview is gated behind Project Glasswing, a vetted consortium of roughly 52 organizations that Anthropic stood up in April. The White House has told Anthropic not to expand that list, even modestly, citing national security concerns.

Here is the argument I want to make: you cannot gate a capability by restricting one model when a commercially available model matches it on the same benchmark. The Mythos restriction is model-indexed. The risk, as AISI itself concluded, is not.

This is not a knock on the restriction's intention. Anthropic was trying to do something real. The problem is structural. And the AISI numbers made the problem visible in a way that is hard to argue around.

What Mythos is

Anthropic released a limited preview of Claude Mythos in early 2026 under Project Glasswing, an initiative aimed at securing critical infrastructure by giving vetted organizations access to a model unusually capable at finding vulnerabilities. The partner list includes Amazon, Apple, Google, Microsoft, Nvidia, Palo Alto Networks, CrowdStrike, Broadcom, Cisco, JPMorgan Chase, and the Linux Foundation, among roughly 40 others, for a total of about 52 organizations.

The pitch is defensive: give major infrastructure owners an AI that can find their own vulnerabilities before attackers do. Mythos autonomously identified thousands of zero-day vulnerabilities across every major operating system and web browser during internal testing. It was the first model to complete a simulated 32-step corporate network attack end-to-end.

That last fact is what got the White House's attention. A 32-step attack chain is not a script kiddie tool. It requires chaining vulnerability discovery, lateral movement, privilege escalation, and persistence. Human experts need roughly 20 hours to work through the same simulated scenario. Mythos did it autonomously. That is a meaningful threshold, and I understand why policymakers noticed it.

Anthropic's response was to keep access narrow. A reasonable first move. The White House response was to say even that narrow group is too wide, and to block Anthropic from expanding to the roughly 70 additional companies that applied.

Then AISI published the GPT-5.5 evaluation.

The numbers

AISI runs models on a graduated difficulty scale. The Expert tier is where the alarming results live.

GPT-5.5 scored 71.4% on Expert tasks (plus or minus 8.0%, 1 standard error). Mythos Preview scored 68.6% (plus or minus 8.7%). For context: GPT-5.4 scored 52.4%, and Opus 4.7 scored 48.6%. The jump from GPT-5.4 to GPT-5.5 is larger than the current gap between GPT-5.5 and Mythos.
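The "within measurement noise" claim is easy to check. A back-of-the-envelope sketch, assuming the two scores are independent and the reported figures are each one standard error (the scores and errors are from the AISI table above; the normal approximation is my own simplification):

```python
import math

# Expert-tier scores and one-standard-error figures from the AISI table (percentage points).
gpt_score, gpt_se = 71.4, 8.0
mythos_score, mythos_se = 68.6, 8.7

gap = gpt_score - mythos_score                     # 2.8 points
combined_se = math.sqrt(gpt_se**2 + mythos_se**2)  # ~11.8 points, assuming independence

# How many combined standard errors separate the two models?
z = gap / combined_se
print(f"gap = {gap:.1f}, combined SE = {combined_se:.1f}, z = {z:.2f}")
```

The gap is about a quarter of a combined standard error, far below any conventional significance threshold. By the same arithmetic, the GPT-5.4 to GPT-5.5 jump (19 points) is well over one combined standard error.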

The end-to-end attack simulation (AISI calls it TLO, a full 32-step corporate network scenario) tilts the comparison slightly the other way. Mythos completed it in 3 of 10 attempts; GPT-5.5 completed it in 2 of 10. Mythos is still ahead on the hardest single task, but only marginally, and against a model that is commercially available.
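At ten attempts apiece, 3-of-10 versus 2-of-10 carries essentially no statistical weight. A quick sketch using Fisher's exact test on the two completion counts (the attempt counts are from the AISI results above; the choice of test is mine, and exact integer arithmetic avoids floating-point issues):

```python
from math import comb

def fisher_two_sided(a, b, n1, n2):
    """Two-sided Fisher exact test for success counts a/n1 vs b/n2."""
    k = a + b            # total completions across both models
    n = n1 + n2
    total = comb(n, k)
    # Hypergeometric weight of every possible split of k completions, as integers.
    weights = [comb(n1, x) * comb(n2, k - x)
               for x in range(max(0, k - n2), min(k, n1) + 1)]
    observed = comb(n1, a) * comb(n2, b)
    # Sum the weights of all splits at least as extreme as the observed one.
    extreme = sum(w for w in weights if w <= observed)
    return extreme / total

# Mythos: 3/10 completions; GPT-5.5: 2/10 completions.
p = fisher_two_sided(3, 2, 10, 10)
print(p)  # 1.0 -- the observed split is the least extreme outcome possible
```

A p-value of 1.0 means the 3-versus-2 split is exactly what chance alone would most readily produce; the data cannot separate the two models on this task.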

AISI's own reading was careful. The institute did not conclude that GPT-5.5 is uniquely dangerous. It concluded that the capability is "a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding" across frontier models, not a breakthrough unique to any single model. The threat is a trajectory. Mythos arrived at a threshold first. GPT-5.5 arrived within weeks.

Why restriction-by-model fails

The Mythos restriction rests on an unstated assumption: that Mythos represents a distinct, model-specific capability that can be separated from the rest of the ecosystem.

That assumption was plausible when Mythos was announced. It was already questionable when GPT-5.5 launched. After the AISI evaluation, it is hard to hold.

If the risk is "this specific model can complete Expert-tier offensive cyber tasks at scale," then restricting that model does real harm-reduction work, because no other model can do the same thing. That was the scenario being priced in.

If the risk is "frontier models in this capability bracket can complete Expert-tier offensive cyber tasks," then the question changes. The question becomes: which capability thresholds trigger which constraints, and how do you enforce that regardless of which company's model reaches the threshold? Model-specific restriction does almost nothing in this second scenario. An attacker who cannot access Mythos can access GPT-5.5, or wait sixty days for whatever comes after that.

The AISI data points at the second scenario. And AISI said so directly.

Anthropic was not wrong to restrict Mythos. The restriction was structurally incomplete from the beginning. It was a single-point control applied to a general capability trend. Those controls work in hardware, where the physical cost of replication is high. In software, and in a competitive market with four frontier labs shipping roughly every quarter, they leak fast.

The jailbreak finding

There is a second number from the AISI evaluation that deserves its own paragraph.

AISI red-teamed GPT-5.5 for malicious cyber use and identified a universal jailbreak that caused the model to produce violative content across all 95 malicious cyber queries in the test set, including in multi-turn agentic settings. The attack required six hours of expert red-teaming to develop.

Six hours. Expert red-teamers. Universal coverage across the entire offensive cyber query set.

I am not saying jailbreaks are trivial. Six hours of skilled work is real friction. Most casual users will not attempt it. But if the access controls on Mythos represent weeks of legal and policy effort, and the equivalent friction on a commercially available model is six person-hours, the asymmetry matters.

Jailbreaks do not stay private. Once developed, they circulate. The six-hour investment is a one-time cost paid by whoever finds it first. After that it costs nothing to share.

This does not mean GPT-5.5 should be restricted. The question of access to general-purpose AI tools is different from the question of access to specialized offensive tools. My point is narrower: if your safety model relies on the commercial model being unable to do the dangerous thing, you need to verify that more carefully than the current evaluation cadence allows. AISI found a universal jailbreak on a routine red-team. That is not a reassuring baseline.

What the right frame looks like

The AISI evaluation and the White House response are pointing at the same problem and reaching for different solutions.

The White House is moving toward a pre-deployment review requirement: labs would submit models for government evaluation before public release. Google, Microsoft, and xAI signed agreements for this process in early May. Anthropic has not, at last report.

Pre-deployment review is a better frame than model-specific restriction, for the structural reason above: it applies a consistent standard across all models rather than tracking individual products. The gap between Anthropic and the White House on Mythos access made sense when Mythos was the only model that could do what Mythos does. It makes less sense now. If the standard is "can this model complete Expert-tier offensive cyber tasks end-to-end," both Mythos and GPT-5.5 clear it.

What I would want from a serious capability-based oversight framework starts with thresholds, not model names. First, define the behavior: end-to-end attack chain completion at a given pass rate on defined scenarios, where any model that crosses the threshold enters a different regulatory category, regardless of which lab built it. Second, the evaluation cadence needs to track the release cadence. Frontier labs are shipping every few months; an annual review is not a security control, it is a documentation exercise. Third, jailbreaks belong in the threat model. A model that is safe under direct prompting but falls to six hours of red-teaming is not safe in the sense that matters for offensive use.
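To make the threshold-first idea concrete, here is an illustrative sketch. The gate names and trigger values below are hypothetical, not drawn from any proposed framework; the pass rates plugged in at the end are the AISI figures discussed above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityGate:
    """A model-agnostic trigger: a named scenario plus a minimum pass rate."""
    name: str
    scenario: str
    min_pass_rate: float  # fraction of attempts completed

# Hypothetical gates, loosely shaped like the AISI scenarios. The 0.60 and
# 0.20 trigger values are invented for illustration.
GATES = [
    CapabilityGate("expert-cyber", "Expert-tier offensive task suite", 0.60),
    CapabilityGate("end-to-end-attack", "32-step corporate network scenario", 0.20),
]

def triggered_gates(eval_results: dict[str, float]) -> list[str]:
    """Return the gates a model crosses, regardless of which lab built it."""
    return [g.name for g in GATES
            if eval_results.get(g.scenario, 0.0) >= g.min_pass_rate]

# Pass rates from the AISI evaluation, as fractions.
gpt55 = {"Expert-tier offensive task suite": 0.714,
         "32-step corporate network scenario": 0.2}
mythos = {"Expert-tier offensive task suite": 0.686,
          "32-step corporate network scenario": 0.3}

print(triggered_gates(gpt55))   # both gates
print(triggered_gates(mythos))  # both gates -- same regulatory category
```

The point of the structure is the last two lines: under a capability-indexed rule, Mythos and GPT-5.5 land in the same category automatically, with no per-model policy fight required.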

None of this is easy to implement. I do not have a specific bill to point to. But these are the questions the AISI data forces.

What Anthropic should do

Anthropic's position is awkward. They built the most capable offensive AI tool, tried to route it only to defenders, and then watched another lab ship something equivalent commercially before the restriction framework even settled.

The Project Glasswing list is real and contains organizations using Mythos responsibly. The defensive use case is real too. If Mythos can find vulnerabilities in critical infrastructure before attackers do, some version of restricted access to critical infrastructure operators is legitimate and worth preserving.

But the argument for keeping Mythos uniquely restricted is gone. The AISI said so. The benchmarks said so. The right move now is to engage with the pre-deployment review framework on terms that can scale, because model-specific restriction cannot scale once the capability is general.

Holding out from the review process while GPT-5.5 goes through it produces a strange outcome: Anthropic bears the compliance cost of Glasswing vetting without the policy legitimacy of an agreed oversight framework, while OpenAI has lower compliance overhead and a published safety evaluation. That is not a stable position.

Closing

The AISI evaluation is not a verdict that AI is too dangerous to develop. It is a more specific finding: the assumption that any single model represents a unique capability inflection is probably wrong, and it gets more wrong with every model generation.

Mythos was first to complete the 32-step chain. GPT-5.5 is within measurement noise. The next model, or the one after, will probably not be a debate at all. It will be a number in a table.

The restriction was worth trying. The structure it assumed did not survive contact with the next quarterly release cycle. That is not a failure of Anthropic's judgment in isolation. It is a property of the technology. Policy built around individual models will keep reaching that conclusion.

The number that matters next is not 71.4 or 68.6. It is the score of whatever comes out in Q3.


