VIKAS

An AI Found a 27-Year-Old Bug in OpenBSD: The Most Security-Hardened OS on Earth

"If OpenBSD has a 27-year-old bug, what's hiding in your codebase?"

On April 7, 2026, Anthropic announced something that made the entire security community stop scrolling.

Their new model, Claude Mythos Preview, had autonomously found a 27-year-old vulnerability sitting inside OpenBSD's TCP stack.

Not Linux. Not Windows. Not some legacy enterprise COBOL blob.

OpenBSD. The operating system that has had only two confirmed remote holes in its default install in its entire history. The OS that runs firewalls, SSH servers, and critical infrastructure for governments and enterprises worldwide. The OS where code review isn't a suggestion; it's a religion.

That's where the bug lived. Since 1998. And no human, fuzzer, or static analyzer ever found it.

Claude Mythos did. In a matter of hours. Autonomously. For under $50 in compute.

Let's get into exactly what happened.


🐡 First: Why Does OpenBSD Even Matter?

Quick context for developers who haven't touched BSD.

OpenBSD is a free Unix-like operating system forked from NetBSD in 1995. Its entire identity is built around one thing: proactive security. Not just "we patched it when someone reported it" but "we audit everything before it ships and we assume the attacker is smarter than us."

OpenBSD invented or popularized:

  • pledge(2) and unveil(2): syscall and filesystem sandboxing used throughout the base system
  • pf: the packet filter now used inside macOS, pfSense, OPNsense, and FreeBSD
  • OpenSSH: yes, that OpenSSH, running on virtually every server on earth
  • W^X (Write XOR Execute) memory protection, years before mainstream adoption
  • Stack canaries, ASLR, and kernel relinking on every boot

When OpenBSD says their default install has had two remote holes in its entire history, they mean it. This isn't marketing. It's a 30-year track record.

So when an AI finds a remotely exploitable bug that's been sitting there since 1998, it matters.


🔍 The Bug: TCP SACK Integer Overflow

Here's the actual technical breakdown, straight from Anthropic's red team writeup.

Background: What is SACK?

TCP's Selective Acknowledgment (SACK) was proposed in RFC 2018 in October 1996. The problem it solved: if you send packets 1–20 and the receiver only gets 1–10 and 15–20, old TCP would make you resend everything from 11 onward. SACK lets the receiver say "I got 15–20, just resend 11–14." Huge performance improvement. Every major TCP implementation added it.
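To make the packets 1–20 example concrete, here is a minimal Python sketch of the bookkeeping a sender could do with SACK information. It is a toy model: real TCP works on byte sequence numbers rather than packet numbers, and the function name `unacked_ranges` and its parameters are made up for this illustration.

```python
def unacked_ranges(cumulative_ack, sack_blocks, highest_sent):
    """Return inclusive packet ranges that are still unacknowledged.

    cumulative_ack: last packet received in order (10 in the example).
    sack_blocks:    inclusive (first, last) ranges received out of order.
    highest_sent:   last packet the sender has transmitted.
    """
    gaps = []
    next_needed = cumulative_ack + 1
    for first, last in sorted(sack_blocks):
        if first > next_needed:
            # Everything between the last covered packet and this block is a gap.
            gaps.append((next_needed, first - 1))
        next_needed = max(next_needed, last + 1)
    if next_needed <= highest_sent:
        # Tail of the window that no SACK block covers yet.
        gaps.append((next_needed, highest_sent))
    return gaps


# The example from the text: 1-10 arrived in order, 15-20 arrived out of order.
print(unacked_ranges(10, [(15, 20)], 20))  # [(11, 14)]
# Without SACK information, everything from 11 onward looks unacknowledged.
print(unacked_ranges(10, [], 20))          # [(11, 20)]
```

Each returned gap corresponds to what the article calls a "hole" in the next section.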

OpenBSD added SACK support in 1998.

The Two-Flaw Interaction

The vulnerability requires two independent flaws to interact. Neither flaw alone causes a crash. Together, they do.

Flaw 1: No lower-bound validation on SACK ranges.

OpenBSD tracks SACK state as a singly linked list of "holes": byte ranges sent but not yet acknowledged. When a new SACK block arrives, the kernel walks this list, shrinking or deleting holes, then appends a new hole at the tail if needed.

The implementation correctly checked the upper bound of incoming SACK ranges against the send window but never validated the lower bound. Under normal conditions, this doesn't matter. Real TCP peers never send pathological SACK blocks. But an attacker isn't a real peer.

Flaw 2: Reachable null pointer write.

If a single SACK block simultaneously:

  • deletes the only hole in the list (triggering the delete path), AND
  • triggers the append-a-new-hole path (because the new acknowledged range extends past what was previously known)

...then the kernel tries to write to p->next, but p is now NULL because the walk just freed the only node.

Under normal conditions, these two conditions are mutually exclusive. You can't simultaneously satisfy "the SACK block's start is at or below the hole's start" AND "the SACK block's start is strictly above the highest previously acknowledged byte." One number can't be both.

The breakthrough: signed integer overflow.

TCP sequence numbers are 32-bit integers that wrap around. OpenBSD compared them with the expression:

(int)(a - b) < 0

This is correct when a and b are within 2³¹ of each other, which legitimate sequence numbers always are. But because Flaw 1 lets an attacker place the SACK block's start anywhere, they can put it roughly 2³¹ away from the real window.

At that distance, the subtraction overflows the sign bit in both comparisons simultaneously. The kernel concludes the attacker's start is below the hole start and above the highest acknowledged byte at the same time.
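The "impossible" condition is easy to reproduce outside the kernel. The sketch below emulates the `(int)(a - b) < 0` comparison in Python with explicit 32-bit wraparound. The helper names and the concrete sequence numbers (`hole_start`, `highest_sacked`) are assumptions picked for illustration, not values from the OpenBSD source or the writeup.

```python
MASK32 = 0xFFFFFFFF


def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def seq_leq(a, b):
    # Emulates (int)(a - b) <= 0 on 32-bit sequence numbers.
    return to_int32(a - b) <= 0


def seq_gt(a, b):
    # Emulates (int)(a - b) > 0 on 32-bit sequence numbers.
    return to_int32(a - b) > 0


hole_start = 1000       # hypothetical start of the only hole
highest_sacked = 2000   # hypothetical highest previously acknowledged byte


def both_conditions(sack_start):
    """True if sack_start is 'at or below' the hole AND 'above' the highest SACK."""
    return seq_leq(sack_start, hole_start) and seq_gt(sack_start, highest_sacked)


# No start value near the real window satisfies both conditions.
honest = [s for s in range(0, 4000) if both_conditions(s)]
print(honest)  # []

# A start value about 2**31 away from the window satisfies both at once.
attacker_start = (hole_start + 2**31) & MASK32
print(both_conditions(attacker_start))  # True
```

The first comparison wraps to a large negative number while the second stays positive, which is exactly the contradiction the two-flaw interaction depends on.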

The impossible condition is now satisfied. The only hole gets deleted. The append path fires. The kernel writes through a null pointer. Machine crashes.

2 crafted packets → null pointer dereference → kernel panic → remote DoS

Any OpenBSD host responding over TCP was vulnerable. Firewalls. SSH gateways. Web servers. VPN endpoints. All of them.

The Fix

OpenBSD's official 7.8 errata patch 025, dated March 25, 2026, fixed this by:

  1. Adding a lower-bound check on sack.start relative to snd_una
  2. Guarding the append path with an explicit p != NULL check

Two lines. 27 years. Done.
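For readers who want to see both guards in action, here is a toy Python model of a one-hole list being hit by a single SACK block. It is a sketch under loud assumptions: `Hole`, `process_sack`, the `patched` flag, and all sequence numbers are invented for this post, and the logic is a simplification of the behavior described above, not the OpenBSD code. Writing through `None` raises `AttributeError` here, standing in for the kernel's null pointer write.

```python
MASK32 = 0xFFFFFFFF


def to_int32(v):
    """Reinterpret the low 32 bits of v as a signed 32-bit integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def seq_lt(a, b):
    return to_int32(a - b) < 0


def seq_leq(a, b):
    return to_int32(a - b) <= 0


def seq_geq(a, b):
    return to_int32(a - b) >= 0


class Hole:
    def __init__(self, start, end):
        self.start, self.end, self.next = start, end, None


def process_sack(head, snd_una, highest_sacked, sack_start, sack_end, patched):
    """Apply one SACK block to the hole list. Returns (head, status)."""
    if patched and seq_lt(sack_start, snd_una):
        return head, "rejected"            # guard 1: lower-bound check
    p = cur = head
    while cur is not None:
        if seq_leq(sack_start, cur.start) and seq_geq(sack_end, cur.end):
            # The block covers the whole hole, so unlink it.
            if p is cur:
                head = cur.next
                p = cur = head
            else:
                p.next = cur.next
                cur = p.next
            continue
        p, cur = cur, cur.next
    if seq_lt(highest_sacked, sack_start):
        if patched and p is None:
            return head, "append skipped"  # guard 2: explicit p != NULL check
        # In the unpatched model p may be None here, which raises AttributeError.
        p.next = Hole(highest_sacked, sack_start)
        return head, "appended"
    return head, "ok"


def one_hole():
    return Hole(1000, 2000)


evil_start = (2500 + 2**31) & MASK32  # roughly 2**31 away from the window

try:
    process_sack(one_hole(), 1000, 3000, evil_start, 2000, patched=False)
    outcome = "no crash"
except AttributeError:
    outcome = "crash (write through None)"

print(outcome)  # crash (write through None)
print(process_sack(one_hole(), 1000, 3000, evil_start, 2000, patched=True)[1])  # rejected
```

In this model the lower-bound check alone already stops the crafted block; the second guard is defense in depth for any other path that might empty the list.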


🤖 How Mythos Found It

This is where it gets genuinely fascinating.

Anthropic's red team ran ~1,000 scaffold runs against OpenBSD's source code. Total cost: under $20,000. The specific run that surfaced this bug: under $50.

The $50 figure is technically accurate but misleading: as Anthropic themselves note, you can't know in advance which run will succeed. The $20K total is the honest number.

What's remarkable is how Mythos found it. Every existing automated tool had missed this:

  • SAST (static analysis): missed it, because the logic requires understanding how two separate code paths interact under adversarial integer arithmetic
  • Fuzzers: missed it, because sending random packets won't naturally produce a value roughly 2³¹ away from the current send window
  • Code audits: missed it, because skilled humans reviewed this code for decades and saw two individually benign conditions

Mythos caught it by reasoning about code semantics: understanding what the code does under adversarial conditions, not just what it looks like. That's a qualitatively different capability from what any previous automated tool has had.

To put this in context:

"In CyberGym's directed vulnerability reproduction tests, Mythos Preview scored 83.1% versus Claude Opus 4.6's 66.6%. On Firefox 147 JavaScript engine testing, Mythos produced 181 full shell exploits. Opus 4.6 produced two."

That's not an incremental improvement. That's a different tool category.


📦 The Bigger Picture: What Else Did It Find?

The OpenBSD bug is the headline, but it's not the only finding.

FFmpeg H.264 Codec: 16 years old

A bug introduced in a 2003 commit lay dormant until a 2010 refactor exposed it, and it has survived ever since. The slice number 65535 collides exactly with a sentinel value, enabling out-of-bounds writes. Fuzzers ran against this code path 5 million times without triggering it. Mythos caught it by reasoning about what the sentinel means semantically.
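The general shape of a sentinel collision can be sketched in a few lines of Python. Everything here is hypothetical: the 16-bit table, the `NO_SLICE` name, and the `neighbour_in_same_slice` check are illustrative assumptions about how such a collision can arise, not FFmpeg's actual data structures.

```python
from array import array

NO_SLICE = 0xFFFF  # assumed sentinel meaning "no slice has claimed this entry yet"


def new_slice_table(num_entries):
    """Create a table of unsigned 16-bit slice IDs, all set to the sentinel."""
    return array("H", [NO_SLICE] * num_entries)


def neighbour_in_same_slice(table, neighbour_idx, current_slice):
    # Intended meaning: the neighbour was already decoded as part of this slice.
    # The slice number is truncated to 16 bits to match the table's entry width.
    return table[neighbour_idx] == (current_slice & 0xFFFF)


table = new_slice_table(4)
table[0] = 7  # entry 0 was decoded as part of slice 7

print(neighbour_in_same_slice(table, 0, 7))      # True: genuinely the same slice
print(neighbour_in_same_slice(table, 3, 7))      # False: entry 3 is untouched
print(neighbour_in_same_slice(table, 3, 65535))  # True: sentinel collision
```

Once an untouched entry is treated as a valid neighbour, any code that trusts that answer can read or write data that was never set up, which is how a harmless-looking equality test turns into memory corruption.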

Fixed in FFmpeg 8.1 "Hoare", released March 16, 2026.

FreeBSD NFS Server: 17 years old (CVE-2026-4747)

This one is scarier. A stack buffer overflow in the NFS server's authentication request handler. Attacker-controlled data is copied into a 128-byte stack buffer, but the length check allows up to 400 bytes.

Why did -fstack-protector miss it? Because the buffer was declared as int32_t[32], and the plain -fstack-protector option (unlike -fstack-protector-strong) only inserts stack canaries in functions that contain char arrays.
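The size mismatch is simple arithmetic, and Python's ctypes can confirm it. This is only a sketch of the numbers quoted above: the names `flawed_check` and `safe_check` are hypothetical and do not come from the FreeBSD source.

```python
import ctypes

# Models a C declaration like: int32_t buf[32];
BufferType = ctypes.c_int32 * 32
BUF_BYTES = ctypes.sizeof(BufferType)

MAX_ALLOWED_BY_CHECK = 400  # upper limit the article says the length check allowed


def flawed_check(length):
    """Length check as described in the article: accepts up to 400 bytes."""
    return length <= MAX_ALLOWED_BY_CHECK


def safe_check(length):
    """Length check tied to the real buffer size."""
    return 0 <= length <= BUF_BYTES


max_overflow = MAX_ALLOWED_BY_CHECK - BUF_BYTES

print(BUF_BYTES)          # 128
print(max_overflow)       # 272
print(flawed_check(400))  # True: the oversized copy is allowed
print(safe_check(400))    # False: rejected when the limit matches the buffer
```

Deriving the limit from the buffer itself (sizeof in C) instead of a separate constant is the usual way to keep the two from drifting apart.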

Mythos didn't just find it. It built a 20-gadget ROP chain split across six sequential NFS packets to achieve unauthenticated remote root, fully autonomously, with no human involved after the initial prompt.

Linux kernel โ€” multiple LPE chains

Starting from a list of 100 CVEs from 2024–2025, Mythos filtered to 40 exploitable candidates and successfully wrote privilege escalation exploits for more than half. These included KASLR bypasses, cross-cache heap reclamation, and credential structure overwrites. One chain, starting from a 1-bit adjacent physical page write primitive, cost under $1,000 to develop. Human experts said it would have taken weeks.


🌊 The Controversy: Project Glasswing and Restricted Access

Anthropic isn't releasing Mythos to the public. At all.

Instead, they launched Project Glasswing, a coalition of 40+ organizations that get restricted access to use Mythos defensively:

Founding partners include: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

Anthropic is committing:

  • $100 million in Mythos usage credits for defensive security work
  • $2.5 million to Alpha-Omega through OpenSSF (Linux Foundation)
  • $1.5 million to the Apache Software Foundation

The criticism from some corners: "This is asymmetric. You've given 40 companies a weapon and told everyone else good luck."

The counterpoint: "Bad actors don't wait for permission. They're already using whatever models they have. Glasswing at least gets defenders ahead of attackers."

The AISLE pushback is worth reading:

Researchers at AISLE tested Anthropic's showcase vulnerabilities on small, open-weights models. Their finding: a 5.1 billion parameter open model recovered the core analysis chain of the 27-year OpenBSD bug. A 3.6B parameter model at 11 cents per million tokens detected 8/8 of Anthropic's showcased vulnerabilities.

Their conclusion: "The moat in AI cybersecurity is the system, not the model."

In other words: Mythos may have announced a new era, but the capability isn't exclusive to Mythos.


📊 The Numbers That Should Keep You Up at Night

From the CrowdStrike 2026 Global Threat Report, cited in the Mythos coverage:

| Metric | 2022 | 2026 |
|---|---|---|
| Average time-to-exploit after disclosure | 30 days | 5 days |
| CVEs exploited on/before disclosure day | ~5% | 32.1% |
| Average attacker breakout time | ~84 minutes | 29 minutes |
| Fastest observed breakout | hours | 27 seconds |

And on the defense side:

  • Median organizational patch window: ~70 days (unchanged since 2022)
  • Organizations deploying critical patches within 30 days: dropped from 45% to 30%

Offense is accelerating. Defense isn't. That gap is what makes the Mythos announcement genuinely alarming: not the individual bugs, but the structural shift in who finds them, how fast, and at what cost.


🤔 What Should Developers Actually Do?

Practically speaking, from this week's news:

If you run OpenBSD:

# Apply the patch immediately if you haven't
syspatch
# Verify patch 025 is applied
syspatch -l

If you use FFmpeg:

# Update to 8.1+ immediately
# Check your package manager or build system
ffmpeg -version  # should show 8.1 or later

If you run FreeBSD with NFS:
The FreeBSD security advisory for CVE-2026-4747 is out, so patch immediately. NFS exposed to untrusted networks is critical severity.

For everyone:

  • Shorten your patch cycles. The "patch in 70 days" culture is a liability.
  • Enable auto-updates for critical infrastructure where possible.
  • Treat CVE-tagged dependency updates as urgent, not scheduled maintenance.
  • Start thinking about AI-assisted vulnerability scanning as part of your SDLC, because your adversaries already are.

💬 The Bigger Question

Here's what I keep coming back to:

OpenBSD is arguably the most carefully audited open-source codebase in the world. Volunteer developers who read every line before it ships. The project that rewrites code from scratch rather than accept anything with an unacceptable license. The OS whose entire identity is "we take security seriously when nobody else does."

And there was a 27-year-old null pointer dereference in the TCP stack.

Not because the OpenBSD team is bad at their jobs. Because the bug required reasoning about the semantic interaction of two independent code conditions under adversarial integer arithmetic, a class of analysis that human reviewers are structurally bad at and fuzzers can't reach by brute force.

That's the actual shift. Not "AI is better than humans at security." It's "AI reasons differently than humans and fuzzers about code correctness, and that difference surfaces bugs that both approaches systematically miss."

The question isn't whether this changes security. It already has.

The question is: who gets to use it.



If this made you think differently about legacy codebases, about fuzzing's limits, or about what "security-hardened" actually means in 2026, drop a ❤️ and share it. The security community needs to be having this conversation seriously, not just breathlessly.

And maybe go check when you last updated your FFmpeg dependency.
