A forensic analysis of the OpenClaw AI agent vulnerabilities, the Moltbook data breach, and the GTG-1002 AI-orchestrated espionage campaign. With reference architectures for secure agent deployment using AWS Nitro Enclaves and Firecracker.
Disclosure: I'm an AWS Community Builder. The mitigation architectures in this article focus on AWS services because that's my area of expertise, but the underlying security principles (hardware isolation, ephemeral compute, policy enforcement, network segmentation) are cloud-agnostic and apply equally to GCP, Azure, or bare-metal deployments.
TL;DR
OpenClaw, the most popular open-source AI agent (214K+ GitHub stars), suffered a cascade of security failures in early 2026: a one-click RCE exploit (CVE-2026-25253), 824+ malicious plugins distributing malware, and a social network data breach exposing 1.5M API tokens. Meanwhile, a Chinese state-sponsored group (GTG-1002) used Claude Code to autonomously compromise ~30 organizations β documented directly by Anthropic.
This post dissects what went wrong β from a formal threat modeling perspective β and shows you how to run autonomous AI agents safely using AWS Nitro Enclaves, Firecracker microVMs, and Zero Trust policies.
The core principle: The model is untrusted. Security must be architectural, not behavioral.
π Table of Contents
- Why AI Agents Are Different: The Attack Surface Expansion
- Threat Model: Actors, Assets, and Trust Boundaries
- The OpenClaw Timeline
- ClawJacked: The One-Click RCE
- The Core Vulnerability: Indirect Prompt Injection
- ClawHavoc: 824 Malicious Skills
- Moltbook: 1.5M Tokens Exposed via Vibe Coding
- GTG-1002: AI-Orchestrated Espionage Campaign
- Industry Metrics: The 72-Minute Exfiltration
- The Academic View: What Researchers Found
- Reference Architecture: Secure Agent Deployment on AWS
- Secure Deployment Checklist
- References
Why AI Agents Are Different: The Attack Surface Expansion
Traditional LLM chatbots are stateless text generators. AI agents are fundamentally different β they combine four capabilities that, together, create an unprecedented attack surface:
Agent Attack Surface = LLM Reasoning
+ Tool Execution (shell, APIs, databases)
+ Filesystem Access (read/write local files)
+ Internet Access (browse, fetch, connect)
This is what researchers call "agent attack surface expansion" (arXiv:2603.11619). A single successful prompt injection doesn't just produce bad text β it can execute commands, exfiltrate files, and pivot through networks.
Security Layers in an Agent System
| Layer | What It Does | What Can Go Wrong |
|---|---|---|
| Layer 1 β LLM Reasoning | Interprets instructions, plans actions | Prompt injection, jailbreak |
| Layer 2 β Agent Orchestration | Manages memory, sessions, tool routing | Memory poisoning, session hijacking |
| Layer 3 β Tool Execution | Runs commands, calls APIs | Command injection, safeBins bypass |
| Layer 4 β Infrastructure | Hosts the agent (container, VM, cloud) | Container escape, network exposure |
Every incident in this article maps to one or more of these layers.
Threat Model: Actors, Assets, and Trust Boundaries
Before analyzing specific vulnerabilities, here's the formal threat model:
Actors
| Actor | Motivation | Example |
|---|---|---|
| External attacker | Credential theft, cryptomining | ClawJacked (CVE-2026-25253) |
| Malicious skill developer | Malware distribution | ClawHavoc campaign |
| Compromised website | Silent agent hijacking | WebSocket CSWH via browser |
| State-sponsored APT | Espionage, persistent access | GTG-1002 (Anthropic report) |
Assets at Risk
| Asset | Where It Lives | Impact if Compromised |
|---|---|---|
| API tokens |
openclaw.json, .env
|
Full cloud account takeover |
| System credentials | SSH keys, keychains | Lateral movement |
| Agent memory |
soul.md, memory.md
|
Long-term behavior manipulation |
| Cloud resources | S3, EC2, IAM roles | Data breach, resource abuse |
Trust Boundaries
The core failure in OpenClaw: The trust boundary at the gateway was effectively non-existent. Untrusted inputs (websites, skills, logs) crossed directly into the trusted zone without validation.
The OpenClaw Timeline
Here's the full timeline of what happened in just 30 days:
| Date (2026) | Event | Impact |
|---|---|---|
| Jan 27-29 | ClawHavoc begins | 341 malicious skills on ClawHub |
| Jan 30 | Silent patch v2026.1.29 | CVE-2026-25253 partially fixed |
| Jan 31 | Censys/Shodan scan | 21,639 exposed instances |
| Jan 31 | Moltbook breach | 1.5M API tokens leaked |
| Feb 3 | CVE disclosure | CVSS 8.8 RCE via WebSocket |
| Feb 9 | Second scan | 135,000+ exposed instances |
| Feb 14 | Log poisoning discovered | Agent logic manipulation via TCP 18789 |
| Feb 26 | Full ClawJacked patch | v2026.2.25 |
| Mar 4 | Ongoing crisis | 220,000+ instances, 824+ malicious skills |
ClawJacked: The One-Click RCE
CVE-2026-25253 | CVSS 8.8 | Discovered by Oasis Security
The core problem? OpenClaw's gateway trusted localhost blindly. Any connection from 127.0.0.1 was treated as safe β no Origin header validation, no rate limiting.
But it gets worse. CVE-2026-28363 (CVSS 9.9) revealed that OpenClaw's safeBins β the allowlist of permitted commands β could be bypassed using GNU long-option abbreviations:
# β Blocked by safeBins:
tar --compress-program=/bin/bash
# β
Bypasses safeBins completely:
tar --compress-prog=/bin/bash
The validation only checked for exact string matches. GNU tools accept abbreviated options. Game over.
The Core Vulnerability: Indirect Prompt Injection (IPI)
While RCE and safeBins bypass are dramatic, the most pervasive threat to AI agents is Indirect Prompt Injection β and it's what makes agents fundamentally harder to secure than traditional software.
How IPI Works
Real-World IPI in OpenClaw: Log Poisoning
SOC Prime and Kaspersky documented an IPI variant targeting OpenClaw's TCP port 18789 (telemetry). Attackers injected prompt instructions disguised as log entries. When the agent processed its own logs for diagnostics, it executed the hidden commands β exfiltrating environment variables and scanning internal networks.
This is particularly dangerous because:
- The agent trusts its own logs (they're "internal" data)
- The attack survives across sessions via persistent memory (
memory.md) - Traditional firewalls can't detect it β the traffic looks like normal agent activity
Key insight from arXiv:2601.15654 (Zombie Agents): Once a malicious instruction enters long-term memory, it persists across sessions and can activate days later β a "sleeper agent" pattern that session-based security completely misses.
ClawHavoc: 824 Malicious Skills
Snyk's ToxicSkills study (Feb 2026) scanned 3,984 skills from ClawHub:
| Finding | Percentage |
|---|---|
| Skills with at least one security flaw | 36.8% |
| Skills with critical issues (malware, secrets, IPI) | 13.4% |
| Skills with confirmed malicious payloads | 76 |
| Malicious skills using IPI + traditional malware combo | 91% |
The ClawHavoc campaign grew from 341 malicious skills in January to 824+ by March, delivering:
- macOS: AMOS (Atomic Stealer) β keychain, SSH keys, crypto wallets
-
Windows: Vidar Stealer β specifically targeting
openclaw.json,soul.md,memory.md
The Attack Pattern
Moltbook: 1.5M Tokens Exposed
Moltbook was a social network built entirely by AI agents ("vibe coding"). The founder admitted he didn't write a single line of code manually.
The result? A Supabase database with Row Level Security disabled and the anon key hardcoded in frontend JavaScript.
Wiz Research discovered:
| Exposed Data | Count |
|---|---|
| API tokens (OpenAI, Anthropic, AWS) | 1,500,000 |
| Owner email addresses | 35,000 |
| Private DMs with plaintext API keys | 4,060 |
| Agent-to-human ratio ("Shadow AI") | 88:1 |
β οΈ An 88:1 agent-to-human ratio means massive, unsupervised automation. This is "Shadow AI" at enterprise scale.
Timeline: From discovery to first patch: 6 hours. But the damage β 1.5M tokens in the wild β was already done.
GTG-1002: AI-Orchestrated Espionage Campaign
In September 2025, Anthropic published a security disclosure titled "Disrupting the first reported AI-orchestrated cyber espionage campaign", documenting how an AI agent was weaponized at scale. This was subsequently covered by The Hacker News, The Record, The Guardian, and Fox Business.
| Attribute | Detail | Source |
|---|---|---|
| Threat Actor | GTG-1002 (Chinese state-sponsored) | Anthropic official disclosure |
| Tool Weaponized | Claude Code | Anthropic official disclosure |
| Targets | ~30 organizations (financial, government, tech) | Anthropic, The Record |
| Autonomy Level | 80-90% of operation was AI-driven | Anthropic official disclosure |
| Detection | Mid-September 2025 | Anthropic, The Guardian |
| Status | Accounts banned, victims notified | Anthropic official disclosure |
The attackers bypassed Claude's safety guardrails by convincing it they were legitimate pentesters, breaking malicious commands into seemingly benign requests. Anthropic noted the AI occasionally "hallucinated" non-existent credentials, requiring human validation β one of the few things preventing full autonomy.
Industry Metrics: The 72-Minute Exfiltration
Unit 42 Global Incident Response Report 2026 (750+ incidents analyzed):
| Metric | Value | Context |
|---|---|---|
| Fastest exfiltration time | 72 minutes | 4x faster than 2024 |
| Multi-surface attacks | 87% of cases | Endpoint + Cloud + SaaS simultaneously |
| Identity-based initial access | 65% | Token theft > software exploits |
| Preventable breaches | 90% | Misconfigs + excessive permissions |
| Cloud identities with unused perms (60+ days) | 99% | Massive attack surface |
The implication is clear: If attackers exfiltrate in 72 minutes and your SOC takes 4 hours to respond, you've already lost. Automated response is the only viable control.
The Academic View: What Researchers Found
Four recent arXiv papers formalize the threats described above. Here's what each one discovered and what mitigations they propose:
AgentSentry (arXiv:2602.22724)
Problem: Indirect Prompt Injection manipulates agent behavior across multiple turns, making it nearly invisible to single-turn defenses.
Discovery: By modeling IPI as a "temporal causal takeover," the researchers identified that the attack signal dominates at tool-return boundaries β the moment when an external tool sends data back to the agent.
Mitigation: Counterfactual re-execution: the system replays the agent's reasoning with the suspicious content removed. If the agent's behavior changes significantly, the content is flagged and purified.
Result: 0% Attack Success Rate on the AgentDojo benchmark while maintaining normal task utility.
AdapTools (arXiv:2602.20720)
Problem: MCP (Model Context Protocol) servers are increasingly used to connect agents to tools, but who audits them?
Discovery: 50% of third-party MCP servers lack any form of security audit. Attackers can register malicious MCP servers that look legitimate.
Mitigation: Adaptive tool-based IPI detection that monitors tool call patterns for anomalies.
Taming OpenClaw (arXiv:2603.11619) β Tsinghua University + Ant Group
Problem: Existing defenses are "point solutions" that miss cross-layer attacks.
Discovery: Introduced a 5-layer lifecycle framework (initialization β input β inference β decision β execution) revealing that most attacks exploit transitions between layers, not individual layers.
Mitigation: Proposes holistic defense: plugin vetting, context-aware filtering, memory integrity validation, intent verification, and capability enforcement β all applied at layer boundaries.
Zombie Agents (arXiv:2601.15654)
Problem: What happens when an IPI enters long-term memory?
Discovery: Malicious instructions persist across sessions through self-reinforcing injection patterns. The agent writes the malicious instruction into its own memory, creating a "sleeper agent" that activates days later.
Mitigation: Memory integrity validation protocols and session-scoped memory isolation.
Reference Architecture: Secure Agent Deployment on AWS
The security principles below are cloud-agnostic:
| Principle | AWS Implementation | Equivalent Elsewhere |
|---|---|---|
| Hardware isolation | Nitro Enclaves | GCP Confidential VMs, Azure Confidential Computing |
| Ephemeral compute | Firecracker microVMs | Kata Containers, gVisor |
| Policy-as-code | Cedar (AWS) | OPA/Rego (cloud-agnostic, CNCF) |
| Zero Trust access | Verified Access | BeyondCorp (GCP), Azure AD Conditional Access |
This article focuses on AWS because that's where I build, but the architecture pattern applies universally.
The Reference Architecture
Key Components Explained
1. Nitro Enclaves (Hardware Isolation)
The agent runs inside a Nitro Enclave β no network, no storage, no SSH. Communication happens exclusively via vsock to a forward proxy on the parent instance.
| PCR Register | What It Measures | Why It Matters |
|---|---|---|
| PCR0 | Enclave image hash | Agent binary wasn't tampered with |
| PCR1 | Kernel + ramdisk hash | OS integrity verified |
| PCR3 | IAM Role ARN hash | Only authorized instances can start it |
| PCR8 | Signing certificate hash | Software origin verified |
2. Firecracker microVMs (Ephemeral Sessions)
| Feature | Firecracker | Docker |
|---|---|---|
| Isolation | Hardware (KVM) | Shared kernel |
| Boot time | <125ms | ~1-5s |
| RAM overhead | <5MB | ~50-200MB |
| Escape risk | Minimal | High |
| Post-task cleanup | Auto-destroyed | Needs config |
Bedrock AgentCore Runtime uses Firecracker to run each agent session in a dedicated microVM. Memory is sanitized immediately after the session ends.
3. Zero Trust with Cedar
// Only managed devices + FinanceOps group + internal network
permit(
principal,
action == Action::"InvokeAgent",
resource == Resource::"FinancialAgent"
)
when {
context.device.is_managed == true &&
context.identity.groups.contains("FinanceOps") &&
context.network.source_ip.is_in_range(IPRange::"10.0.0.0/24")
};
4. OPA for Tool Validation
package agent.authz
default allow = false
# Allow reads on non-sensitive tables
allow {
input.tool == "DatabaseReader"
input.operation == "select"
not input.table == "user_credentials"
}
# Block destructive ops in production
deny {
input.operation == "delete"
input.environment == "production"
not is_maintenance_window
}
Secure Deployment Checklist
β
Agent sandbox (Firecracker microVM or Nitro Enclave)
β
Signed plugins/skills (cryptographic integrity)
β
Policy engine (OPA/Cedar for every tool invocation)
β
Network isolation (separate subnets: agent, tool, data)
β
Credential vault (Secrets Manager β never plaintext)
β
Egress filtering (domain allowlist via forward proxy)
β
Automated response (EventBridge β Lambda kill-switch)
β
Immutable logging (CloudWatch + tamper protection)
β
Device posture validation (Verified Access)
β
Session-scoped memory (no cross-session persistence)
Key Takeaways
The model is untrusted. Security must be architectural, not behavioral. You cannot rely on prompt engineering to keep an agent safe.
Indirect Prompt Injection is the #1 threat. It's the attack vector that makes agents fundamentally different from traditional software. Every layer of defense must account for it.
72-minute exfiltration means human-speed response is obsolete. Automate your incident response with EventBridge + Lambda.
36.8% of AI skills have security flaws (Snyk ToxicSkills). Treat every plugin as untrusted code.
The agent attack surface = LLM reasoning + tool execution + filesystem access + internet access. Secure each layer independently.
The tools exist today. Whether you use AWS (Nitro, Firecracker, AgentCore), GCP (Confidential VMs), or open-source (Kata, gVisor, OPA) β the principle is the same: hardware isolation + policy enforcement + ephemeral compute.
References
- Oasis Security β ClawJacked Technical Report (CVE-2026-25253)
- NIST NVD β CVE-2026-28363 (CVSS 9.9)
- Snyk β ToxicSkills Study (Feb 2026)
- Wiz Research β Moltbook Breach Analysis
- Anthropic β GTG-1002: First AI-Orchestrated Espionage Campaign
- Palo Alto Networks β Unit 42 Global Incident Response Report 2026
- CrowdStrike β Global Threat Report 2025
- AWS β Security Reference Architecture for Generative AI (Capability 5)
- AWS β Nitro Enclaves Cryptographic Attestation Documentation
- AWS β Bedrock AgentCore Runtime
- arXiv:2602.22724 β AgentSentry
- arXiv:2603.11619 β Taming OpenClaw
- arXiv:2601.15654 β Zombie Agents
- NIST RFI 2026-00206 β Security Considerations for AI Agents
If you found this useful, consider following for more cloud security deep dives. Questions? Drop them in the comments.







Top comments (0)