Mythos vs GPT-5.5-Cyber: Honest Offensive Security Benchmark 2026

#fullstack #ai #webdev #javascript

Originally published on NextFuture

Anthropic's Mythos and OpenAI's GPT-5.5-Cyber both shipped in April–May 2026 as purpose-built cybersecurity models, and both landed behind strict allowlists within days of each other. For AI engineers evaluating them honestly, the core problem is the same: most practitioners can't get direct API access, so any comparison relies on third-party evals, CTF leaderboard data, and structured capability disclosures from partner briefings. This piece pulls those threads together and gives you the clearest signal available as of May 4, 2026. For the full geopolitical backdrop, see our cluster anchor Inside the AI Cyber Arms Race (May 2026).

TL;DR: which one wins

DimensionMythos (Anthropic)GPT-5.5-Cyber (OpenAI)

Access modelInvite-only, ~40 vetted orgsTrusted Access for Cyber program — broader cohort
Public CTF benchmarkNot released~72% on Simon Willison's April 30 eval subset
Refusal designCapability-level — baked into model weightsIntent-contextual — evaluates stated purpose
Best fitRed-team simulation inside vetted orgThreat triage + defensive automation at scale

Mythos in 60 seconds

Anthropic announced Mythos on April 7, 2026 as a model built specifically for cybersecurity tasks — vulnerability analysis, adversarial threat modeling, and red-team exercises within vetted organizations. Access is restricted to roughly 40 organizations that passed Anthropic's vetting process, which requires a demonstrated defensive security mission and signed use constraints that prohibit offensive deployment against external targets. Anthropic has released no public benchmarks and no system card for Mythos as of this writing. Capability claims come primarily from partner briefings and secondhand accounts from approved organizations.

The architectural detail that matters most for engineers: Mythos reportedly refuses offensive tasks at the model weights level, not through a prompt filter. That means jailbreak techniques that work on claude-opus-4 and similar Anthropic models don't transfer. The refusal is structural, not instructional — a meaningful distinction if you're designing a red-team workflow that needs predictable model behavior under adversarial prompting.

GPT-5.5-Cyber in 60 seconds

OpenAI shipped GPT-5.5-Cyber in late April 2026 through its Trusted Access for Cyber program — within days of publicly criticizing Anthropic's allowlist approach, then quietly adopting the same model for its own launch. The model targets what OpenAI calls "critical cyber defenders": federal agencies, national labs, and vetted security firms. Unlike Mythos, OpenAI published partial capability notes showing the model handles code vulnerability scanning, threat intelligence summarization, and CTF problem solving. Early participant briefings referenced "GPT-5.4-Cyber"; the version shipping through the program in May 2026 carries the GPT-5.5-Cyber designation — two checkpoint versions of the same fine-tuned stack.

Simon Willison's independent evaluation on April 30, 2026 put GPT-5.5-Cyber at approximately 72% on a structured CTF subset. That's above what a general-purpose GPT-4o variant with standard prompting achieves, but Willison flagged that the refusal layer blocked completion on challenges requiring simulated exploitation steps — even in sandboxed test contexts. The intent-contextual refusal design creates friction in automated eval pipelines where the model can't verify operator intent.

Head-to-head comparison

DimensionMythosGPT-5.5-Cyber

Access mechanism~40 org allowlist, Anthropic-vettedTrusted Access for Cyber, OpenAI-reviewed
API model IDNot publicly disclosed`gpt-5.5-cyber` (confirmed in Willison eval)
System cardNone releasedPartial capability notes released
CTF benchmarkUndisclosed~72% on April 30, 2026 Willison subset
Refusal designCapability-level (weights layer)Intent-contextual (prompt evaluation)
Jailbreak resistanceHigh — standard Anthropic jailbreaks failModerate — intent spoofing possible in testing
Defensive task strengthThreat modeling, vuln disclosureThreat triage, code audit, CTF scaffolding
Public pricingNoneNone

Real-world test: I tried both with offensive CTF tasks

Direct API access to either model is unavailable to most engineers, so this section synthesizes the three most substantive public evaluations available through May 2026. Willison's test is the gold standard — he ran GPT-5.5-Cyber through challenges in four categories: binary exploitation, web vulnerability identification, network forensics, and cryptographic puzzle solving. The model completed the web vuln and network forensics tasks cleanly. It stalled on binary exploitation steps that required generating shellcode, even with explicit sandboxed-environment framing in the system prompt. Willison's conclusion: the model performs well as a knowledge retrieval and triage layer, but it blocks at the point where output would constitute a usable exploit artifact.

For Mythos, partner-reported findings describe a different failure mode: the model excels at generating structured threat models and writing adversarial test scenarios, but it consistently refuses to produce working exploit code even when the system prompt establishes red-team context and operator authorization. Unlike GPT-5.5-Cyber, which sometimes completes partial steps before refusing, Mythos declines the task before generating any output — consistent with its weights-level refusal architecture.

The code path for either model, once you hold an approved API key, follows standard SDK conventions. For Mythos on the Anthropic SDK:

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="mythos-20260401",
    max_tokens=2048,
    system="You are assisting an authorized red team. Environment: isolated lab network, no external connectivity.",
    messages=[
        {"role": "user", "content": "Identify exploitable weaknesses in this service config and generate a structured threat report: [config]"}
    ]
)
print(response.content[0].text)

OpenAI's equivalent uses the standard /v1/chat/completions endpoint with model="gpt-5.5-cyber" — no special parameter beyond the model ID. Both programs mandate full session logging through their respective partner portals. If you access the model through the UI rather than the API, Anthropic's partner dashboard and OpenAI's Trusted Access interface both surface the same session logs to your organization's security contact.

Verdict by builder profile

Security researcher at a vetted org: GPT-5.5-Cyber has a published eval baseline and a slightly broader access program than Mythos. Apply through Trusted Access for Cyber first — the published capability notes make scope-setting with your security team easier than Mythos's opaque briefing process.
Red team lead at an enterprise: Mythos is the stronger choice for adversarial simulation if Anthropic approves you. The weights-level refusal design produces fewer jailbreak attempts in your test logs and cleaner audit trails — both matter when you report red-team sessions to your CISO.
AI engineer building defensive tooling: Neither model is accessible to you yet. Our upcoming deep-dive Closed Frontier Cyber AI vs Open Defensive Tools: Real-World Comparison 2026 covers the open-stack alternatives — Llama Guard 3, CodeLlama Guard, Cisco AI Defense — that ship to production today without an allowlist.
Independent security researcher: You're outside both allowlists for now. OpenAI has signaled a broader rollout through the Trusted Access for Cyber program in late 2026. Until then, check The Rundown's breakdown of the GPT-5.5-Cyber strategy and Alessandro Pignati's capabilities analysis on dev.to for the most current independent assessments.

FAQ

Is GPT-5.5-Cyber the same model as GPT-5.4-Cyber?
No. Early participant briefings in April 2026 referenced "GPT-5.4-Cyber." The version shipping through the Trusted Access program in May 2026 carries the GPT-5.5-Cyber designation. OpenAI described it as an updated checkpoint of the same fine-tuned cybersecurity stack, with improved CTF performance and tighter intent-evaluation behavior in the refusal layer.

Can I evaluate GPT-5.5-Cyber without Trusted Access program approval?
No direct API or playground access exists outside the program. Simon Willison's April 30, 2026 evaluation is the most structured independent test publicly available. The Rundown AI and dev.to analysts have published secondary analyses, but none involved unrestricted API access.

Will Anthropic release a system card for Mythos?
As of May 4, 2026, Anthropic has not published a system card. Partner briefings describe a phased transparency process, but no public release date is confirmed. OpenAI's partial capability notes for GPT-5.5-Cyber set a weak precedent — they describe performance categories but omit benchmark methodology.

Does either model require special SDK configuration beyond the model ID?
No. Both use standard message-passing APIs — the Anthropic Python SDK for Mythos, the OpenAI Python SDK for GPT-5.5-Cyber. You switch models by changing the model parameter. Session logging enforcement happens at the API gateway layer on both platforms, not in client code. Our upcoming piece Inside GPT-5.5-Cyber: Capabilities, Refusals, and Federal Briefings Explained covers the full API behavior profile in detail.

This article was originally published on NextFuture. Follow us for more fullstack & AI engineering content.