DEV Community

Santiago Palma
Santiago Palma

Posted on

Lessons from the OpenClaw Security Incident: Building Secure AI Agent Architectures on AWS

A forensic analysis of the OpenClaw AI agent vulnerabilities, the Moltbook data breach, and the GTG-1002 AI-orchestrated espionage campaign. With reference architectures for secure agent deployment using AWS Nitro Enclaves and Firecracker.

Disclosure: I'm an AWS Community Builder. The mitigation architectures in this article focus on AWS services because that's my area of expertise, but the underlying security principles (hardware isolation, ephemeral compute, policy enforcement, network segmentation) are cloud-agnostic and apply equally to GCP, Azure, or bare-metal deployments.

TL;DR

OpenClaw, the most popular open-source AI agent (214K+ GitHub stars), suffered a cascade of security failures in early 2026: a one-click RCE exploit (CVE-2026-25253), 824+ malicious plugins distributing malware, and a social network data breach exposing 1.5M API tokens. Meanwhile, a Chinese state-sponsored group (GTG-1002) used Claude Code to autonomously compromise ~30 organizations β€” documented directly by Anthropic.

This post dissects what went wrong β€” from a formal threat modeling perspective β€” and shows you how to run autonomous AI agents safely using AWS Nitro Enclaves, Firecracker microVMs, and Zero Trust policies.

The core principle: The model is untrusted. Security must be architectural, not behavioral.


πŸ“‘ Table of Contents

  1. Why AI Agents Are Different: The Attack Surface Expansion
  2. Threat Model: Actors, Assets, and Trust Boundaries
  3. The OpenClaw Timeline
  4. ClawJacked: The One-Click RCE
  5. The Core Vulnerability: Indirect Prompt Injection
  6. ClawHavoc: 824 Malicious Skills
  7. Moltbook: 1.5M Tokens Exposed via Vibe Coding
  8. GTG-1002: AI-Orchestrated Espionage Campaign
  9. Industry Metrics: The 72-Minute Exfiltration
  10. The Academic View: What Researchers Found
  11. Reference Architecture: Secure Agent Deployment on AWS
  12. Secure Deployment Checklist
  13. References

Why AI Agents Are Different: The Attack Surface Expansion

Traditional LLM chatbots are stateless text generators. AI agents are fundamentally different β€” they combine four capabilities that, together, create an unprecedented attack surface:

Agent Attack Surface = LLM Reasoning
                     + Tool Execution (shell, APIs, databases)
                     + Filesystem Access (read/write local files)
                     + Internet Access (browse, fetch, connect)
Enter fullscreen mode Exit fullscreen mode

This is what researchers call "agent attack surface expansion" (arXiv:2603.11619). A single successful prompt injection doesn't just produce bad text β€” it can execute commands, exfiltrate files, and pivot through networks.

Security Layers in an Agent System

Layer What It Does What Can Go Wrong
Layer 1 β€” LLM Reasoning Interprets instructions, plans actions Prompt injection, jailbreak
Layer 2 β€” Agent Orchestration Manages memory, sessions, tool routing Memory poisoning, session hijacking
Layer 3 β€” Tool Execution Runs commands, calls APIs Command injection, safeBins bypass
Layer 4 β€” Infrastructure Hosts the agent (container, VM, cloud) Container escape, network exposure

Every incident in this article maps to one or more of these layers.


Threat Model: Actors, Assets, and Trust Boundaries

Before analyzing specific vulnerabilities, here's the formal threat model:

Actors

Actor Motivation Example
External attacker Credential theft, cryptomining ClawJacked (CVE-2026-25253)
Malicious skill developer Malware distribution ClawHavoc campaign
Compromised website Silent agent hijacking WebSocket CSWH via browser
State-sponsored APT Espionage, persistent access GTG-1002 (Anthropic report)

Assets at Risk

Asset Where It Lives Impact if Compromised
API tokens openclaw.json, .env Full cloud account takeover
System credentials SSH keys, keychains Lateral movement
Agent memory soul.md, memory.md Long-term behavior manipulation
Cloud resources S3, EC2, IAM roles Data breach, resource abuse

Trust Boundaries

The core failure in OpenClaw: The trust boundary at the gateway was effectively non-existent. Untrusted inputs (websites, skills, logs) crossed directly into the trusted zone without validation.


The OpenClaw Timeline

Here's the full timeline of what happened in just 30 days:

Date (2026) Event Impact
Jan 27-29 ClawHavoc begins 341 malicious skills on ClawHub
Jan 30 Silent patch v2026.1.29 CVE-2026-25253 partially fixed
Jan 31 Censys/Shodan scan 21,639 exposed instances
Jan 31 Moltbook breach 1.5M API tokens leaked
Feb 3 CVE disclosure CVSS 8.8 RCE via WebSocket
Feb 9 Second scan 135,000+ exposed instances
Feb 14 Log poisoning discovered Agent logic manipulation via TCP 18789
Feb 26 Full ClawJacked patch v2026.2.25
Mar 4 Ongoing crisis 220,000+ instances, 824+ malicious skills

ClawJacked: The One-Click RCE

CVE-2026-25253 | CVSS 8.8 | Discovered by Oasis Security

The core problem? OpenClaw's gateway trusted localhost blindly. Any connection from 127.0.0.1 was treated as safe β€” no Origin header validation, no rate limiting.

But it gets worse. CVE-2026-28363 (CVSS 9.9) revealed that OpenClaw's safeBins β€” the allowlist of permitted commands β€” could be bypassed using GNU long-option abbreviations:

# ❌ Blocked by safeBins:
tar --compress-program=/bin/bash

# βœ… Bypasses safeBins completely:
tar --compress-prog=/bin/bash
Enter fullscreen mode Exit fullscreen mode

The validation only checked for exact string matches. GNU tools accept abbreviated options. Game over.


The Core Vulnerability: Indirect Prompt Injection (IPI)

While RCE and safeBins bypass are dramatic, the most pervasive threat to AI agents is Indirect Prompt Injection β€” and it's what makes agents fundamentally harder to secure than traditional software.

How IPI Works

Real-World IPI in OpenClaw: Log Poisoning

SOC Prime and Kaspersky documented an IPI variant targeting OpenClaw's TCP port 18789 (telemetry). Attackers injected prompt instructions disguised as log entries. When the agent processed its own logs for diagnostics, it executed the hidden commands β€” exfiltrating environment variables and scanning internal networks.

This is particularly dangerous because:

  • The agent trusts its own logs (they're "internal" data)
  • The attack survives across sessions via persistent memory (memory.md)
  • Traditional firewalls can't detect it β€” the traffic looks like normal agent activity

Key insight from arXiv:2601.15654 (Zombie Agents): Once a malicious instruction enters long-term memory, it persists across sessions and can activate days later β€” a "sleeper agent" pattern that session-based security completely misses.


ClawHavoc: 824 Malicious Skills

Snyk's ToxicSkills study (Feb 2026) scanned 3,984 skills from ClawHub:

Finding Percentage
Skills with at least one security flaw 36.8%
Skills with critical issues (malware, secrets, IPI) 13.4%
Skills with confirmed malicious payloads 76
Malicious skills using IPI + traditional malware combo 91%

The ClawHavoc campaign grew from 341 malicious skills in January to 824+ by March, delivering:

  • macOS: AMOS (Atomic Stealer) β†’ keychain, SSH keys, crypto wallets
  • Windows: Vidar Stealer β†’ specifically targeting openclaw.json, soul.md, memory.md

The Attack Pattern


Moltbook: 1.5M Tokens Exposed

Moltbook was a social network built entirely by AI agents ("vibe coding"). The founder admitted he didn't write a single line of code manually.

The result? A Supabase database with Row Level Security disabled and the anon key hardcoded in frontend JavaScript.

Wiz Research discovered:

Exposed Data Count
API tokens (OpenAI, Anthropic, AWS) 1,500,000
Owner email addresses 35,000
Private DMs with plaintext API keys 4,060
Agent-to-human ratio ("Shadow AI") 88:1

⚠️ An 88:1 agent-to-human ratio means massive, unsupervised automation. This is "Shadow AI" at enterprise scale.

Timeline: From discovery to first patch: 6 hours. But the damage β€” 1.5M tokens in the wild β€” was already done.


GTG-1002: AI-Orchestrated Espionage Campaign

In September 2025, Anthropic published a security disclosure titled "Disrupting the first reported AI-orchestrated cyber espionage campaign", documenting how an AI agent was weaponized at scale. This was subsequently covered by The Hacker News, The Record, The Guardian, and Fox Business.

Attribute Detail Source
Threat Actor GTG-1002 (Chinese state-sponsored) Anthropic official disclosure
Tool Weaponized Claude Code Anthropic official disclosure
Targets ~30 organizations (financial, government, tech) Anthropic, The Record
Autonomy Level 80-90% of operation was AI-driven Anthropic official disclosure
Detection Mid-September 2025 Anthropic, The Guardian
Status Accounts banned, victims notified Anthropic official disclosure

The attackers bypassed Claude's safety guardrails by convincing it they were legitimate pentesters, breaking malicious commands into seemingly benign requests. Anthropic noted the AI occasionally "hallucinated" non-existent credentials, requiring human validation β€” one of the few things preventing full autonomy.


Industry Metrics: The 72-Minute Exfiltration

Unit 42 Global Incident Response Report 2026 (750+ incidents analyzed):

Metric Value Context
Fastest exfiltration time 72 minutes 4x faster than 2024
Multi-surface attacks 87% of cases Endpoint + Cloud + SaaS simultaneously
Identity-based initial access 65% Token theft > software exploits
Preventable breaches 90% Misconfigs + excessive permissions
Cloud identities with unused perms (60+ days) 99% Massive attack surface

The implication is clear: If attackers exfiltrate in 72 minutes and your SOC takes 4 hours to respond, you've already lost. Automated response is the only viable control.


The Academic View: What Researchers Found

Four recent arXiv papers formalize the threats described above. Here's what each one discovered and what mitigations they propose:

AgentSentry (arXiv:2602.22724)

Problem: Indirect Prompt Injection manipulates agent behavior across multiple turns, making it nearly invisible to single-turn defenses.

Discovery: By modeling IPI as a "temporal causal takeover," the researchers identified that the attack signal dominates at tool-return boundaries β€” the moment when an external tool sends data back to the agent.

Mitigation: Counterfactual re-execution: the system replays the agent's reasoning with the suspicious content removed. If the agent's behavior changes significantly, the content is flagged and purified.

Result: 0% Attack Success Rate on the AgentDojo benchmark while maintaining normal task utility.

AdapTools (arXiv:2602.20720)

Problem: MCP (Model Context Protocol) servers are increasingly used to connect agents to tools, but who audits them?

Discovery: 50% of third-party MCP servers lack any form of security audit. Attackers can register malicious MCP servers that look legitimate.

Mitigation: Adaptive tool-based IPI detection that monitors tool call patterns for anomalies.

Taming OpenClaw (arXiv:2603.11619) β€” Tsinghua University + Ant Group

Problem: Existing defenses are "point solutions" that miss cross-layer attacks.

Discovery: Introduced a 5-layer lifecycle framework (initialization β†’ input β†’ inference β†’ decision β†’ execution) revealing that most attacks exploit transitions between layers, not individual layers.

Mitigation: Proposes holistic defense: plugin vetting, context-aware filtering, memory integrity validation, intent verification, and capability enforcement β€” all applied at layer boundaries.

Zombie Agents (arXiv:2601.15654)

Problem: What happens when an IPI enters long-term memory?

Discovery: Malicious instructions persist across sessions through self-reinforcing injection patterns. The agent writes the malicious instruction into its own memory, creating a "sleeper agent" that activates days later.

Mitigation: Memory integrity validation protocols and session-scoped memory isolation.


Reference Architecture: Secure Agent Deployment on AWS

The security principles below are cloud-agnostic:

Principle AWS Implementation Equivalent Elsewhere
Hardware isolation Nitro Enclaves GCP Confidential VMs, Azure Confidential Computing
Ephemeral compute Firecracker microVMs Kata Containers, gVisor
Policy-as-code Cedar (AWS) OPA/Rego (cloud-agnostic, CNCF)
Zero Trust access Verified Access BeyondCorp (GCP), Azure AD Conditional Access

This article focuses on AWS because that's where I build, but the architecture pattern applies universally.

The Reference Architecture

Key Components Explained

1. Nitro Enclaves (Hardware Isolation)

The agent runs inside a Nitro Enclave β€” no network, no storage, no SSH. Communication happens exclusively via vsock to a forward proxy on the parent instance.

PCR Register What It Measures Why It Matters
PCR0 Enclave image hash Agent binary wasn't tampered with
PCR1 Kernel + ramdisk hash OS integrity verified
PCR3 IAM Role ARN hash Only authorized instances can start it
PCR8 Signing certificate hash Software origin verified

2. Firecracker microVMs (Ephemeral Sessions)

Feature Firecracker Docker
Isolation Hardware (KVM) Shared kernel
Boot time <125ms ~1-5s
RAM overhead <5MB ~50-200MB
Escape risk Minimal High
Post-task cleanup Auto-destroyed Needs config

Bedrock AgentCore Runtime uses Firecracker to run each agent session in a dedicated microVM. Memory is sanitized immediately after the session ends.

3. Zero Trust with Cedar

// Only managed devices + FinanceOps group + internal network
permit(
    principal,
    action == Action::"InvokeAgent",
    resource == Resource::"FinancialAgent"
)
when {
    context.device.is_managed == true &&
    context.identity.groups.contains("FinanceOps") &&
    context.network.source_ip.is_in_range(IPRange::"10.0.0.0/24")
};
Enter fullscreen mode Exit fullscreen mode

4. OPA for Tool Validation

package agent.authz
default allow = false

# Allow reads on non-sensitive tables
allow {
    input.tool == "DatabaseReader"
    input.operation == "select"
    not input.table == "user_credentials"
}

# Block destructive ops in production
deny {
    input.operation == "delete"
    input.environment == "production"
    not is_maintenance_window
}
Enter fullscreen mode Exit fullscreen mode

Secure Deployment Checklist

βœ… Agent sandbox (Firecracker microVM or Nitro Enclave)
βœ… Signed plugins/skills (cryptographic integrity)
βœ… Policy engine (OPA/Cedar for every tool invocation)
βœ… Network isolation (separate subnets: agent, tool, data)
βœ… Credential vault (Secrets Manager β€” never plaintext)
βœ… Egress filtering (domain allowlist via forward proxy)
βœ… Automated response (EventBridge β†’ Lambda kill-switch)
βœ… Immutable logging (CloudWatch + tamper protection)
βœ… Device posture validation (Verified Access)
βœ… Session-scoped memory (no cross-session persistence)
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

  1. The model is untrusted. Security must be architectural, not behavioral. You cannot rely on prompt engineering to keep an agent safe.

  2. Indirect Prompt Injection is the #1 threat. It's the attack vector that makes agents fundamentally different from traditional software. Every layer of defense must account for it.

  3. 72-minute exfiltration means human-speed response is obsolete. Automate your incident response with EventBridge + Lambda.

  4. 36.8% of AI skills have security flaws (Snyk ToxicSkills). Treat every plugin as untrusted code.

  5. The agent attack surface = LLM reasoning + tool execution + filesystem access + internet access. Secure each layer independently.

  6. The tools exist today. Whether you use AWS (Nitro, Firecracker, AgentCore), GCP (Confidential VMs), or open-source (Kata, gVisor, OPA) β€” the principle is the same: hardware isolation + policy enforcement + ephemeral compute.


References

  1. Oasis Security β€” ClawJacked Technical Report (CVE-2026-25253)
  2. NIST NVD β€” CVE-2026-28363 (CVSS 9.9)
  3. Snyk β€” ToxicSkills Study (Feb 2026)
  4. Wiz Research β€” Moltbook Breach Analysis
  5. Anthropic β€” GTG-1002: First AI-Orchestrated Espionage Campaign
  6. Palo Alto Networks β€” Unit 42 Global Incident Response Report 2026
  7. CrowdStrike β€” Global Threat Report 2025
  8. AWS β€” Security Reference Architecture for Generative AI (Capability 5)
  9. AWS β€” Nitro Enclaves Cryptographic Attestation Documentation
  10. AWS β€” Bedrock AgentCore Runtime
  11. arXiv:2602.22724 β€” AgentSentry
  12. arXiv:2603.11619 β€” Taming OpenClaw
  13. arXiv:2601.15654 β€” Zombie Agents
  14. NIST RFI 2026-00206 β€” Security Considerations for AI Agents

If you found this useful, consider following for more cloud security deep dives. Questions? Drop them in the comments.

Top comments (0)