Searchless

Originally published at searchless.ai

Anthropic's AI Just Found 10,000 Critical Security Bugs. Here's Why That Matters for Every Brand

Originally published on The Searchless Journal

On May 22, 2026, Anthropic announced something that should terrify every software company on the planet — and should reframe how every brand thinks about what "AI visibility" actually means.

Anthropic's unreleased frontier model, Claude Mythos Preview, has autonomously found thousands of zero-day vulnerabilities across every major operating system and every major web browser. Not small bugs. Critical security flaws — some hiding in plain sight for 27 years.

Through Project Glasswing, a coalition that includes Amazon Web Services, Apple, Cisco, Google, Microsoft, NVIDIA, JPMorganChase, and Palo Alto Networks, that model has already been turned loose on the world's most critical software infrastructure. The results are staggering: Cloudflare alone found roughly 2,000 vulnerabilities across its infrastructure. Mozilla discovered 271 vulnerabilities in a single Firefox release cycle — ten times more than its previous model-assisted effort.

This is being covered as a cybersecurity story, and it is one. But it is also something else. It is a proof point that AI systems now evaluate companies, products, and infrastructure at a depth and scale that no brand can monitor. When an AI model can assess the security posture of your codebase, judge the quality of your software, and identify your hidden weaknesses before your own team does, "AI visibility" is no longer just about whether ChatGPT mentions your brand in an answer.

It is about whether AI systems see you accurately — in every context where they operate.

What Project Glasswing actually found

Let's get specific about what Mythos Preview has done, because the scale matters.

Anthropic's internal team used Mythos Preview to search for vulnerabilities across roughly a thousand open-source repositories from the OSS-Fuzz corpus, evaluating roughly 7,000 entry points. The model did not just find surface-level crashes. On a five-tier severity scale, Mythos Preview achieved full control-flow hijack — the most severe category — on ten separate, fully patched targets. Previous models like Claude Opus 4.6 and Claude Sonnet 4.6 managed that feat exactly zero times.

Three findings illustrate the leap:

A 27-year-old bug in OpenBSD. OpenBSD is an operating system renowned for its security focus, used to run firewalls and critical infrastructure worldwide. Mythos Preview found a vulnerability that had survived 27 years of human review and countless automated scans. It allowed an attacker to remotely crash any machine running the OS just by connecting to it.

A 16-year-old bug in FFmpeg. FFmpeg is the video encoding and decoding library embedded in innumerable pieces of software. The bug was in a single line of code that automated testing tools had hit five million times — five million — without catching the problem.

A multi-step Linux kernel exploit chain. Mythos Preview autonomously found and chained together several vulnerabilities in the Linux kernel — the software that runs most of the world's servers — to allow an attacker to escalate from ordinary user access to complete control of the machine.

None of these required human guidance. Mythos Preview identified the vulnerabilities, understood their significance, and in many cases wrote working proof-of-concept exploits entirely on its own. Anthropic engineers with no formal security training asked the model to find remote code execution vulnerabilities overnight and woke up the next morning to complete, working exploits.

What Mythos Preview can actually do — and why it shocked security researchers

The jump from previous frontier models to Mythos Preview is not incremental. It is a qualitative shift in what AI can do with code.

Cloudflare's security team described two capabilities that set Mythos Preview apart from every model they had previously tested.

Exploit chain construction. Real attacks rarely rely on a single vulnerability. They chain several small attack primitives together: a use-after-free bug becomes an arbitrary read-write primitive, which is used to hijack control flow, which in turn launches a return-oriented programming chain that takes full control of the system. Mythos Preview can take multiple low-severity bugs, the kind that would traditionally sit invisible in a backlog, and reason about how to combine them into a single, more severe exploit. Previous models would identify an interesting bug, write a description of why it mattered, and then stop. Mythos Preview finishes the chain.

Proof generation. Finding a bug and proving it is exploitable are two different things. Mythos Preview does both. It writes code to trigger the suspected bug, compiles that code in a scratch environment, and runs it. If the program behaves as expected, the proof is confirmed. If it does not, the model reads the failure output, adjusts its hypothesis, and tries again. This loop matters as much as the bugs it finds. A suspected vulnerability without a working proof is speculation. Mythos Preview closes that gap autonomously.
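The run, read, adjust, retry loop described above is easier to picture with a toy. The sketch below is only an illustration of that loop's shape, not Anthropic's harness or anything Mythos Preview actually runs: `toy_parse`, `run_candidate`, `refine`, and `prove` are hypothetical names, and the "bug" is a deliberately planted Python `IndexError` in a made-up parser.

```python
from dataclasses import dataclass

BUFFER_SIZE = 8  # capacity of the toy parser's fixed buffer


def toy_parse(record: bytes) -> str:
    """Hypothetical target: first byte is a declared length, the rest is payload.

    The planted bug: the declared length is never checked against
    BUFFER_SIZE, so an oversized payload overruns the buffer.
    """
    declared = record[0]
    payload = record[1:1 + declared]
    buffer = [0] * BUFFER_SIZE
    for i, byte in enumerate(payload):
        buffer[i] = byte  # raises IndexError once i >= BUFFER_SIZE
    return f"ok: copied {len(payload)} of {BUFFER_SIZE} bytes"


@dataclass
class Attempt:
    length: int      # the hypothesis being tested
    confirmed: bool  # True when the run demonstrated the bug
    output: str      # what the run printed or raised


def run_candidate(length: int) -> Attempt:
    """Build a trigger for the current hypothesis, run it, capture the outcome."""
    record = bytes([length]) + b"A" * length
    try:
        return Attempt(length, False, toy_parse(record))
    except IndexError as exc:
        return Attempt(length, True, f"crash: {exc}")


def refine(attempt: Attempt) -> int:
    """Read the failed run's output and choose the next hypothesis.

    The toy parser reports its buffer capacity, so the next guess jumps
    just past that capacity instead of stepping blindly.
    """
    capacity = int(attempt.output.split(" of ")[1].split()[0])
    return max(attempt.length + 1, capacity + 1)


def prove(start_length: int = 1, max_rounds: int = 5) -> list[Attempt]:
    """Repeat run -> read -> refine until the bug is shown or rounds run out."""
    history = []
    length = start_length
    for _ in range(max_rounds):
        attempt = run_candidate(length)
        history.append(attempt)
        if attempt.confirmed:
            break
        length = refine(attempt)
    return history


if __name__ == "__main__":
    for attempt in prove():
        print(attempt)
```

In this toy the first guess fails, the failure output reveals the buffer size, and the second guess confirms the overrun. Real proof generation involves compiling and running native code in a sandbox, which this sketch does not attempt.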

In one notable demonstration, Mythos Preview wrote a web browser exploit that chained together four separate vulnerabilities, constructing a complex JIT heap spray that escaped both the renderer and operating system sandboxes. It autonomously developed local privilege escalation exploits on Linux by exploiting subtle race conditions and kernel address space layout randomization bypasses. And it wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain across multiple network packets.

These are not theoretical capabilities. They are things the model actually did, autonomously, without human steering.
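One way to picture the chain construction described earlier in this section is as a path search: each finding moves an attacker from one capability to another, and a chain is a route from the starting position to the goal. The sketch below assumes that simplified model. The finding names, the capability labels, and the `find_chain` helper are all invented for illustration, and real chains often need several capabilities at once rather than a single linear path.

```python
from collections import deque

# Hypothetical findings: (name, capability required, capability granted).
# Individually, each might be triaged as low severity.
FINDINGS = [
    ("info-leak", "remote-unauthenticated", "knows-memory-layout"),
    ("use-after-free", "knows-memory-layout", "arbitrary-read-write"),
    ("control-flow-hijack", "arbitrary-read-write", "code-execution"),
    ("unrelated-crash", "remote-unauthenticated", "denial-of-service"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest sequence of findings from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, needs, grants in findings:
            if needs == state and grants not in seen:
                seen.add(grants)
                queue.append((grants, path + [name]))
    return None  # no combination of findings reaches the goal


if __name__ == "__main__":
    print(find_chain(FINDINGS, "remote-unauthenticated", "code-execution"))
```

The point of the model is the backlog effect the article describes: no single entry in `FINDINGS` reaches `code-execution` on its own, but three of them in sequence do.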

The partner results are equally dramatic

Anthropic did not keep this capability in a lab. Project Glasswing gave over 50 partner organizations access to Mythos Preview for their own defensive security work. The results from just a few of those partners tell the story.

Cloudflare ran Mythos Preview against more than 50 of its own repositories. The model found approximately 2,000 vulnerabilities, including 400 rated high or critical severity. Cloudflare's security team noted that Mythos Preview's output had "noticeably higher quality" than previous AI-assisted scanning: fewer hedged findings, clearer reproduction steps, and less work to reach a fix-or-dismiss decision.

Mozilla's results were even more striking in relative terms. When Mozilla previously ran Claude Opus 4.6 against the Firefox JavaScript engine, the model managed to develop working shell exploits exactly twice out of several hundred attempts. Mythos Preview, running the same experiment, developed working exploits 181 times and achieved register control on 29 additional attempts. That is roughly a 90x improvement in exploit development capability.

Palo Alto Networks reported issuing five times more patches than usual in the weeks following Mythos Preview testing. Microsoft noted that patches from this effort "continue trending larger" than typical security updates.
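The headline ratios in this section are simple to check. The snippet below uses only the figures quoted above. The variable names are illustrative. One caveat is an assumption on our part: the 90x figure compares counts of working exploits, so it equals the ratio of success rates only if both models were given the same number of attempts, which "the same experiment" suggests but does not state outright.

```python
# Figures as quoted in this article.
opus_working_exploits = 2          # "exactly twice"
mythos_working_exploits = 181
cloudflare_total = 2000            # "approximately 2,000"
cloudflare_high_or_critical = 400

# Ratio of working-exploit counts between the two models.
improvement = mythos_working_exploits / opus_working_exploits

# Share of Cloudflare findings rated high or critical.
high_share = cloudflare_high_or_critical / cloudflare_total

print(f"Exploit-count ratio: {improvement:.1f}x")
print(f"High/critical share: {high_share:.0%}")
```

That works out to 90.5x for the Firefox exploit counts, and a fifth of Cloudflare's reported findings rated high or critical.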

Why this is an AI visibility story, not just a cybersecurity story

Here is where this becomes relevant for every brand reading this — not just the ones in the cybersecurity space.

Most discussions about AI visibility focus on search and answer engines. Does ChatGPT mention your brand? Does Perplexity cite you? Does Gemini recommend your product? These are important questions, and they represent the first wave of AI visibility that brands needed to reckon with.

But Mythos Preview demonstrates a second wave that most brands have not considered. AI systems are now capable of evaluating your company in contexts that have nothing to do with search results:

Code and infrastructure assessment. If your company ships software — and in 2026, nearly every company does — AI models can now evaluate its security, quality, and reliability at a level that previously required expert human security researchers. Your codebase is being judged by machines, and you do not get to see the evaluation.

Compliance and risk scoring. Financial institutions, healthcare companies, and regulated industries are already being evaluated by AI-driven risk assessment tools. Mythos Preview-level capabilities mean that evaluation depth is about to increase by an order of magnitude. Your software supply chain is about to become transparent to AI systems in ways it has never been transparent to humans.

Partner and vendor evaluation. When Cisco, AWS, Microsoft, and JPMorganChase are all using frontier AI models to evaluate software quality, they are not just evaluating their own code. The same capabilities that find zero-day vulnerabilities can be directed at vendor code, partner integrations, and supply-chain dependencies. If your company sells technology to enterprises, AI is about to start grading your security posture in procurement decisions.

Trust and reputation in AI-mediated contexts. When AI agents recommend products, evaluate vendors, or make purchasing decisions, they draw on a composite picture of a brand that includes far more than marketing copy and search presence. Software quality, security history, patch cadence, and vulnerability exposure are all becoming signals that feed into AI-mediated brand evaluation.

The brands that think AI visibility begins and ends with "getting cited by ChatGPT" are missing this entirely.

The evaluation gap is growing

There is a fundamental asymmetry emerging. AI systems can now evaluate companies at a depth and speed that companies themselves cannot match.

Consider the numbers. Mythos Preview scanned roughly 7,000 code entry points and found critical vulnerabilities across every major operating system. It did this in weeks. The same vulnerabilities had survived years or decades of human review and millions of automated tests.

Now extrapolate. When models with similar capabilities become more widely available — and Anthropic explicitly notes that "it will not be long before such capabilities proliferate" — the number of organizations that can evaluate your software at this depth will grow rapidly. Your vulnerabilities, your code quality, your security posture will become visible to AI systems long before you know they are looking.

Most brands have zero visibility into how AI systems evaluate them in security, compliance, or infrastructure contexts. They do not monitor it. They do not measure it. They do not even know it is happening.

This is the AI visibility gap that no one is talking about.

The speed of capability emergence

One of the most important details in the Anthropic announcement is how quickly these capabilities appeared.

Just last month, Anthropic's own researchers wrote that "Opus 4.6 is currently far better at identifying and fixing vulnerabilities than at exploiting them." Their internal evaluations showed Claude Opus 4.6 had a near-zero percent success rate at autonomous exploit development. Mythos Preview changed that overnight.

When Anthropic re-ran the Firefox JavaScript engine benchmark — the same experiment where Opus 4.6 managed two working exploits out of several hundred attempts — Mythos Preview produced 181 working exploits and achieved register control on 29 additional attempts. That is a leap from near-zero to dominant performance in a single model generation.

Anthropic also noted something that should concern every organization: they did not explicitly train Mythos Preview for these cybersecurity capabilities. The skills emerged as a "downstream consequence of general improvements in code, reasoning, and autonomy." The same improvements that make the model better at writing code, answering questions, and reasoning about problems also make it dramatically better at breaking into systems.

This means the next generation of general-purpose AI models — from any frontier lab, not just Anthropic — will likely inherit similar capabilities. The security community is not dealing with a specialized tool that can be contained. It is dealing with a general capability that is getting stronger with every model generation, across every frontier AI company.

What the Glasswing response tells us about the market

The industry response to Project Glasswing is instructive. This was not a quiet research paper. Anthropic assembled a coalition that includes Amazon, Apple, Google, Microsoft, Cisco, NVIDIA, JPMorganChase, CrowdStrike, Broadcom, and the Linux Foundation. They committed $100 million in usage credits and $4 million in direct donations to open-source security organizations.

That level of mobilization tells you two things.

First, the threat is real and imminent. These companies are not speculating about future AI capabilities. They have seen what Mythos Preview can do to their own code, and they are treating it as an emergency. Cisco's CTO said plainly: "AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure."

Second, the competitive dynamic is shifting. When every major tech company gets access to frontier vulnerability-detection AI at the same time, the advantage goes to the organizations that can fix what the AI finds fastest. Patch cadence, remediation speed, and security engineering capacity become competitive differentiators in a way they never were before.

For brands that are not Fortune 100 companies with dedicated security teams, the implications are sobering. If you ship software of any kind — whether it is a SaaS platform, an e-commerce application, a mobile app, or embedded firmware — AI systems will soon be able to evaluate its security at a level you cannot afford to match with human talent.

The broader lesson for AI visibility strategy

Project Glasswing is a specific story about cybersecurity, but the principle it illustrates applies to every brand.

AI systems evaluate companies across a growing number of dimensions: content quality in search, product quality in recommendations, security quality in infrastructure assessments, trustworthiness in compliance evaluations, and relevance in AI-mediated procurement. Most brands are optimizing for one or two of these dimensions at most.

The brands that will thrive in the AI era are the ones that understand they are being evaluated across all of these dimensions simultaneously — and that "AI visibility" means understanding and managing your presence across every surface where AI systems form opinions about you.

This is not theoretical. The same AI capabilities that find vulnerabilities in code are being applied to content quality assessment, brand sentiment analysis, vendor risk scoring, and product recommendation. The AI models making these evaluations do not distinguish between "search visibility" and "security visibility" and "compliance visibility." They evaluate holistically. Your brand is a single entity in the AI's model of the world, and every signal — from your blog posts to your bug history — contributes to how AI systems represent you.

What brands should do now

Four immediate actions, regardless of whether you are in the cybersecurity space.

Audit your AI-visible surfaces comprehensively. If your current AI visibility strategy focuses exclusively on search and answer engines, you are optimizing a fraction of the surface area. Understand that AI systems evaluate you in security, compliance, code quality, and infrastructure contexts too. Run a comprehensive AI visibility audit that looks beyond search citations.

Improve your software security posture as a brand signal. Patch cadence, vulnerability disclosure practices, and security engineering investment are no longer just operational concerns. They are brand signals that AI systems use to evaluate trustworthiness. This is especially true for B2B companies selling to enterprises that are deploying their own AI-driven vendor evaluation tools.

Monitor AI evaluation across all contexts. Just as brands monitor search rankings and social sentiment, they need to develop the capacity to monitor how AI systems evaluate them across security, compliance, and infrastructure contexts. The tools for this are still emerging, but the need is real and growing. Organizations that wait for mature monitoring tools will be flying blind during the most consequential period of AI evaluation adoption.

Treat security quality as a competitive advantage in AI-mediated markets. This is not just about avoiding breaches. It is about ensuring that when AI systems evaluate your company, whether for a recommendation, a procurement decision, or a risk assessment, your security posture does not become a negative signal. As AI agents such as OpenAI Codex move on-premises and begin evaluating vendors from behind corporate firewalls, security hygiene becomes brand hygiene.
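Patch cadence, named above as a brand signal, is one of the few items on this list a team can start measuring today. The sketch below assumes a simple remediation log of report and fix dates. The `REMEDIATIONS` records, the identifiers, and the `summarize` helper are hypothetical, and which statistics matter to any given evaluator is an open question.

```python
from datetime import date
from statistics import median

# Hypothetical remediation log: (identifier, date reported, date fix shipped).
REMEDIATIONS = [
    ("ISSUE-1", date(2026, 1, 5), date(2026, 1, 12)),
    ("ISSUE-2", date(2026, 2, 1), date(2026, 2, 4)),
    ("ISSUE-3", date(2026, 3, 10), date(2026, 4, 9)),
]


def days_to_fix(log):
    """Days elapsed between report and shipped fix for each record."""
    return [(fixed - reported).days for _, reported, fixed in log]


def summarize(log):
    """Median and worst-case remediation time, in days."""
    durations = days_to_fix(log)
    return {"median_days": median(durations), "worst_days": max(durations)}


if __name__ == "__main__":
    print(summarize(REMEDIATIONS))
```

The median shows typical responsiveness, while the worst case surfaces the long-tail fix that a vendor risk review is likely to ask about.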

The trust paradox

There is a deep irony in Project Glasswing that deserves attention.

The same week that Anthropic demonstrated AI's ability to find and fix critical vulnerabilities at unprecedented scale, the AI search ecosystem was dealing with its own reliability crisis. Google AI Overviews went blank for real queries. The Bigfoot Effect showed AI citation concentration narrowing dangerously. Content provenance debates raged over SynthID and C2PA.

AI systems are simultaneously becoming more trustworthy in some dimensions and less reliable in others. They can find a 27-year-old bug in OpenBSD. They can also produce blank answers to straightforward questions. They can evaluate software quality better than most humans. They can also fabricate quotes that end up in published books.

For brands, this means trust is not binary. AI systems will evaluate you across multiple dimensions, and their accuracy will vary by dimension. The right strategy is not to trust or distrust AI evaluations wholesale, but to understand which dimensions matter most for your brand and ensure your presence is strong where it counts.


The bottom line

Project Glasswing is not just a cybersecurity milestone. It is a proof point that AI systems evaluate brands in contexts most brands cannot monitor. The companies that recognize this — and expand their AI visibility strategy beyond search engine citations to encompass the full range of AI-mediated evaluation — will be the ones that maintain control of their narrative in an increasingly AI-mediated world.

Your code is being evaluated. Your products are being assessed. Your infrastructure is being judged. The question is whether you know what the AI sees when it looks at you.



Sources

  • Anthropic. "Project Glasswing." May 22, 2026. anthropic.com/glasswing
  • Anthropic Frontier Red Team. "Claude Mythos Preview." April 7, 2026 (updated May 22, 2026). red.anthropic.com/2026/mythos-preview
  • Bourzikas, Grant. "Project Glasswing: What Mythos Showed Us." Cloudflare Blog, May 18, 2026. blog.cloudflare.com/cyber-frontier-models
  • UK AI Security Institute. "How Fast Is Autonomous AI Cyber Capability Advancing?" 2026.
  • Cisco, AWS, Microsoft, CrowdStrike, JPMorganChase partner statements in Anthropic Glasswing announcement, May 22, 2026.
