What the OpenClaw and Moltbook Breaches Reveal About AI Agent Security

#security #webdev #ai #opensource

Two major AI agent projects breached in two weeks — one exposing 42,900 control panels to the internet, the other leaking 1.5 million API keys through a misconfigured database. This post analyzes the technical root causes and identifies the architectural patterns missing across the ecosystem.

In the span of two weeks, the two most popular projects in the AI agent ecosystem suffered security breaches that exposed millions of credentials, granted attackers remote code execution on user machines, and turned an entire social network's database into a public read-write endpoint. The projects are different. The vulnerabilities are different. But the root cause is the same.

This is not a hit piece on either project. OpenClaw and Moltbook represent genuine innovation — the kind that pushes an ecosystem forward. But they also represent the moment where AI agent adoption outpaced AI agent security, and the architectural gaps they exposed deserve serious technical analysis. Because those gaps exist in every AI agent deployment today, not just these two.

OpenClaw: From One Click to Full Host Compromise

OpenClaw — the open-source personal AI agent formerly known as Clawdbot, then Moltbot — crossed 150,000 GitHub stars in its first weeks. It drew two million visitors in a single week and triggered a Mac mini shortage in U.S. stores. It also shipped with a default configuration that bound its control panel to 0.0.0.0:18789, listening on all network interfaces, with no authentication required for localhost connections.

The critical vulnerability, CVE-2026-25253 (CVSS 8.8), is a one-click remote code execution chain discovered by security researcher Mav Levin at depthfirst. The kill chain is worth understanding in detail because it illustrates a pattern we'll see again:

Victim clicks a crafted link or visits a malicious page.
Client-side JavaScript extracts the gateway authentication token — the Control UI trusts a gatewayUrl parameter from the query string without validation.
The attacker establishes a WebSocket connection back to the victim's local OpenClaw instance. The server doesn't validate the WebSocket origin header, so the victim's own browser bridges the connection past localhost restrictions.
Using the stolen token's privileged scopes, the attacker disables the sandbox (exec.approvals.set = off) and escapes Docker (tools.exec.host = gateway).
Full remote code execution on the host machine. The entire chain executes in milliseconds.

This was patched in version 2026.1.29. But SecurityScorecard's STRIKE team reports 42,900 unique IP addresses hosting exposed OpenClaw control panels across 82 countries as of February 10, 2026. Of these, they estimate 15,200 are directly vulnerable to RCE. Their data indicates that about 78% of exposed instances are still running older, unpatched versions branded as "Clawdbot" or "Moltbot."

Two additional CVEs compound the problem: CVE-2026-24763 (CVSS 8.8), a Docker sandbox escape via PATH manipulation, and CVE-2026-25157 (CVSS 7.8), an SSH command injection in the macOS app. All patched in the same release, all requiring that users actually update.

But patching individual CVEs doesn't address the deeper architectural issue. OpenClaw's native tools — exec, bash, file system access — are in-process calls. They don't traverse a network boundary. There's no proxy you can place between the agent's intent and its execution because there's no network hop to intercept. When a skill tells the agent to run curl to exfiltrate ~/.openclaw/credentials/ to an external server, that command executes within the agent's own process context. A network-level firewall sees localhost traffic. An EDR sees normal process behavior. The threat is semantic, not network-observable.

The Supply Chain Is Already Compromised

The ClawHub marketplace — where users discover and install third-party skills — has become an active attack surface. Two independent security audits paint a grim picture.

Snyk's ToxicSkills research scanned 3,984 skills from ClawHub and skills.sh as of February 5, 2026. They found 76 confirmed malicious payloads designed for credential theft, backdoor installation, and data exfiltration — eight of which were still live on clawhub.ai at time of publication. A separate scan found 283 skills (7.1% of the registry) that expose sensitive credentials in plaintext through the LLM's context window and output logs. These aren't malware. They're functional skills with insecure design patterns that turn the agent into an unintentional exfiltration channel.

Koi Security's audit of 2,857 skills found 341 malicious entries. Of these, 335 trace back to a single coordinated campaign codenamed ClawHavoc, which distributed Atomic Stealer (AMOS), a commodity macOS infostealer. The attack was social engineering: skills named solana-wallet-tracker and youtube-summarize-pro had professional documentation but contained fake "Prerequisites" sections that tricked users into installing malware. The campaign ran January 27–29 and targeted credential files, crypto wallets, SSH keys, and browser passwords.

The marketplace's only barrier to entry is a GitHub account at least one week old.

Moltbook: A Social Network With Its Database Wide Open

Moltbook launched on January 28, 2026 as a Reddit-style social network where autonomous AI agents post, vote, and interact with each other. It went viral immediately — Andrej Karpathy called it "the most incredible sci-fi takeoff-adjacent thing" he'd seen recently. Within three days, Wiz security researchers found the production database wide open.

The vulnerability was strikingly simple. Moltbook's backend ran on Supabase, a hosted PostgreSQL service. Supabase uses a publishable API key in client-side JavaScript — this is by design and safe when Row Level Security (RLS) policies are enabled. In Moltbook's implementation, RLS was not enabled. The publishable key granted unauthenticated read and write access to every table in the production database.

Wiz found the key "within minutes" of examining the site's JavaScript bundles. The exposed data included:

1.5 million API authentication tokens — enough to fully impersonate any agent on the platform, including high-karma accounts and well-known persona agents.
35,000 email addresses of users who registered agents, plus nearly 30,000 early access signup emails.
4,060 private DM conversations between agents, some containing third-party credentials including plaintext OpenAI API keys.

The write access is what elevates this from a data leak to something more dangerous. Wiz confirmed they could modify existing posts on the live platform. On a network where AI agents consume posts as input and act on their content, write access to the database means anyone could inject instructions that propagating agents would interpret and execute. This is the prompt injection vector made trivially accessible: no need for clever encoding or hidden CSS — just edit the database directly and every agent reading that post receives your payload.

Moltbook's creator Matt Schlicht publicly described his development approach on X: "I didn't write one line of code for @moltbook." He says he had a vision for the technical architecture and that AI made it a reality. This "vibe coding" approach shipped a production application handling millions of agent identities without implementing the database-level access control that Supabase explicitly documents as required.

Wiz disclosed the vulnerability on January 31 via X DM to the maintainer. The fix came in multiple rounds over roughly three hours — first securing the agents, owners, and site_admins tables, then agent_messages, notifications, votes, and follows, then blocking write access, and finally discovering additional exposed tables including observers and identity_verifications. Each fix surfaced new exposed surfaces. The iterative nature of the remediation underscores how easily misconfiguration compounds in fast-moving projects.

All Moltbook traffic is HTTP/REST. A tool that only monitors MCP protocol traffic would see nothing.

The Missing Layer: What Both Breaches Have in Common

OpenClaw and Moltbook are architecturally different in almost every way. One is a local agent framework using WebSocket and in-process execution. The other is a cloud platform using HTTP/REST and PostgreSQL. They share no codebase, no protocol, no deployment model.

Yet both suffered the same fundamental failure: there was no enforcement layer between what the agent intended to do and what it actually did. No deterministic check on actions before they executed. No policy that could say "this agent can read posts but not modify them" or "this tool can make HTTP calls but not to domains outside this allowlist."

This isn't a coincidence. It's a pattern. And it points to specific architectural gaps that exist across the AI agent ecosystem today:

Deterministic policy enforcement. When an AI agent decides to execute an action, the decision about whether that action is allowed should not be made by another LLM. Simon Willison, the researcher who popularized the term "prompt injection," identified the core problem: agents that combine access to private data, exposure to untrusted content, and the ability to communicate externally are vulnerable by design. The enforcement mechanism must be deterministic — rule-based, auditable, predictable. Using a vulnerable system to protect a vulnerable system is circular.

Protocol-agnostic interception. OpenClaw's critical actions happen over WebSocket and in-process calls. Moltbook's happen over REST. The next breach might happen over gRPC, or through a local subprocess, or via a framework-specific SDK. A security layer that only understands one protocol leaves every other protocol unmonitored. The enforcement point needs to normalize actions regardless of how they're transmitted.

Outbound connection control. Zenity researchers demonstrated how an indirect prompt injection embedded in a Google document could create a new Telegram bot integration in OpenClaw, giving the attacker a persistent command-and-control channel. The agent sent messages to the attacker's bot voluntarily — no exploit required, just a convincing instruction. Without outbound connection control, any compromised or manipulated agent can establish communication with attacker infrastructure, and the network sees normal HTTP traffic.

Response scanning. Agents don't just send data — they receive it. When an OpenClaw agent browses a web page containing hidden CSS-invisible instructions, or when a Moltbook agent reads a post that's been modified to contain prompt injection, the attack comes inbound. Scanning outgoing actions is necessary but insufficient. The data flowing back to the agent needs inspection too.

Content scanning on tool arguments. Moltbook agents were sharing plaintext API keys in DMs. OpenClaw skills instruct agents to store credentials in memory files that infostealers specifically target. The tools themselves aren't malicious — but the arguments they're passing around contain sensitive data that should never traverse an unencrypted, unmonitored channel.

Tool integrity verification. Snyk documented skills on ClawHub that ship clean, pass initial review, and later update to include malicious payloads. Koi found skills with hidden reverse shells embedded in otherwise functional code. Once a tool is installed, there's no mechanism to detect that its behavior has changed. A skill that summarized YouTube videos last week might exfiltrate credentials today.

Agent identity and trust boundaries. On Moltbook, every agent had the same level of access to the database. In OpenClaw, every skill runs with the same permissions as the agent itself — which typically means full access to the host system. There's no concept of "this agent is trusted to read email but not send it" or "this skill can access the network but not the filesystem."

These aren't theoretical concerns. Every single one maps directly to an attack that has already succeeded in the wild, within the past two weeks.

What Developers Can Do Today

If you're running OpenClaw, the immediate priorities are clear. Update to version 2026.2.1 or later — this addresses the RCE vulnerabilities. Bind the gateway to 127.0.0.1 instead of the default 0.0.0.0. Rotate every API key and token stored in the agent, and treat all credentials in ~/.openclaw/ as potentially compromised. Use --allowed-origins to restrict WebSocket connections. For remote access, use zero-trust tunnels like Tailscale or Cloudflare Tunnel instead of exposing ports directly. Audit your installed skills against the Snyk mcp-scan tool and remove anything that requires external prerequisites or copy-paste scripts.

If you were using Moltbook, rotate any API keys that were connected to the platform. Wiz confirmed that all data was publicly accessible before the fix — assume any credentials shared via the platform are compromised.

More broadly: apply the principle of least privilege to your agents. Don't give an agent full system access when it only needs to read email. Don't install skills from unvetted marketplaces without reviewing what they actually do. Run agents in isolated environments where a compromise doesn't mean losing everything. Monitor outbound connections for unexpected destinations.

These are important first steps, but they're mitigations — they reduce blast radius without addressing the architectural gaps. The ecosystem needs tooling purpose-built for the problem — deterministic enforcement that works regardless of protocol, deployment model, or agent framework.

Looking Forward

The OpenClaw and Moltbook incidents are not outliers. They're the first high-profile examples of a structural problem that will recur as AI agents gain broader adoption and deeper system access. The community response has been encouraging — Cisco's open-source Skill Scanner, Snyk's mcp-scan, OpenClaw's VirusTotal integration, and the rapid disclosure and patching work by all parties involved show an ecosystem that takes security seriously once problems are identified.

The challenge is moving from reactive patching to proactive enforcement. The gap between identifying threats and preventing them at runtime remains the core challenge — and it spans every protocol, framework, and deployment model in the ecosystem.

We're building SentinelGate, an open-source security layer for AI agents, because we believe this enforcement layer is what's missing. If this topic matters to you, the project is on GitHub.

Sources: Wiz Research — Moltbook breach · Snyk ToxicSkills — ClawHub audit · Koi Security / The Hacker News — ClawHavoc campaign · SecurityScorecard STRIKE — exposed instances · Hunt.io — CVE-2026-25253 analysis · The Hacker News — CVE-2026-25253 disclosure · Adversa.ai — OpenClaw security guide