Santiago Palma

Posted on Mar 16

Lessons from the OpenClaw Security Incident: Building Secure AI Agent Architectures on AWS

#aws #ai #security #devops

A forensic analysis of the OpenClaw AI agent vulnerabilities, the Moltbook data breach, and the GTG-1002 AI-orchestrated espionage campaign. With reference architectures for secure agent deployment using AWS Nitro Enclaves and Firecracker.

Disclosure: I'm an AWS Community Builder. The mitigation architectures in this article focus on AWS services because that's my area of expertise, but the underlying security principles (hardware isolation, ephemeral compute, policy enforcement, network segmentation) are cloud-agnostic and apply equally to GCP, Azure, or bare-metal deployments.

TL;DR

OpenClaw, the most popular open-source AI agent (214K+ GitHub stars), suffered a cascade of security failures in early 2026: a one-click RCE exploit (CVE-2026-25253), 824+ malicious plugins distributing malware, and a social network data breach exposing 1.5M API tokens. Meanwhile, a Chinese state-sponsored group (GTG-1002) used Claude Code to autonomously compromise ~30 organizations — documented directly by Anthropic.

This post dissects what went wrong — from a formal threat modeling perspective — and shows you how to run autonomous AI agents safely using AWS Nitro Enclaves, Firecracker microVMs, and Zero Trust policies.

The core principle: The model is untrusted. Security must be architectural, not behavioral.

📑 Table of Contents

Why AI Agents Are Different: The Attack Surface Expansion
Threat Model: Actors, Assets, and Trust Boundaries
The OpenClaw Timeline
ClawJacked: The One-Click RCE
The Core Vulnerability: Indirect Prompt Injection
ClawHavoc: 824 Malicious Skills
Moltbook: 1.5M Tokens Exposed via Vibe Coding
GTG-1002: AI-Orchestrated Espionage Campaign
Industry Metrics: The 72-Minute Exfiltration
The Academic View: What Researchers Found
Reference Architecture: Secure Agent Deployment on AWS
Secure Deployment Checklist
References

Why AI Agents Are Different: The Attack Surface Expansion

Traditional LLM chatbots are stateless text generators. AI agents are fundamentally different — they combine four capabilities that, together, create an unprecedented attack surface:

Agent Attack Surface = LLM Reasoning
                     + Tool Execution (shell, APIs, databases)
                     + Filesystem Access (read/write local files)
                     + Internet Access (browse, fetch, connect)

This is what researchers call "agent attack surface expansion" (arXiv:2603.11619). A single successful prompt injection doesn't just produce bad text — it can execute commands, exfiltrate files, and pivot through networks.

Security Layers in an Agent System

Layer	What It Does	What Can Go Wrong
Layer 1 — LLM Reasoning	Interprets instructions, plans actions	Prompt injection, jailbreak
Layer 2 — Agent Orchestration	Manages memory, sessions, tool routing	Memory poisoning, session hijacking
Layer 3 — Tool Execution	Runs commands, calls APIs	Command injection, safeBins bypass
Layer 4 — Infrastructure	Hosts the agent (container, VM, cloud)	Container escape, network exposure

Every incident in this article maps to one or more of these layers.

Threat Model: Actors, Assets, and Trust Boundaries

Before analyzing specific vulnerabilities, here's the formal threat model:

Actors

Actor	Motivation	Example
External attacker	Credential theft, cryptomining	ClawJacked (CVE-2026-25253)
Malicious skill developer	Malware distribution	ClawHavoc campaign
Compromised website	Silent agent hijacking	WebSocket CSWH via browser
State-sponsored APT	Espionage, persistent access	GTG-1002 (Anthropic report)

Assets at Risk

Asset	Where It Lives	Impact if Compromised
API tokens	`openclaw.json`, `.env`	Full cloud account takeover
System credentials	SSH keys, keychains	Lateral movement
Agent memory	`soul.md`, `memory.md`	Long-term behavior manipulation
Cloud resources	S3, EC2, IAM roles	Data breach, resource abuse

Trust Boundaries

The core failure in OpenClaw: The trust boundary at the gateway was effectively non-existent. Untrusted inputs (websites, skills, logs) crossed directly into the trusted zone without validation.

The OpenClaw Timeline

Here's the full timeline of what happened in just 30 days:

Date (2026)	Event	Impact
Jan 27-29	ClawHavoc begins	341 malicious skills on ClawHub
Jan 30	Silent patch v2026.1.29	CVE-2026-25253 partially fixed
Jan 31	Censys/Shodan scan	21,639 exposed instances
Jan 31	Moltbook breach	1.5M API tokens leaked
Feb 3	CVE disclosure	CVSS 8.8 RCE via WebSocket
Feb 9	Second scan	135,000+ exposed instances
Feb 14	Log poisoning discovered	Agent logic manipulation via TCP 18789
Feb 26	Full ClawJacked patch	v2026.2.25
Mar 4	Ongoing crisis	220,000+ instances, 824+ malicious skills

ClawJacked: The One-Click RCE

CVE-2026-25253 | CVSS 8.8 | Discovered by Oasis Security

The core problem? OpenClaw's gateway trusted localhost blindly. Any connection from 127.0.0.1 was treated as safe — no Origin header validation, no rate limiting.

But it gets worse. CVE-2026-28363 (CVSS 9.9) revealed that OpenClaw's safeBins — the allowlist of permitted commands — could be bypassed using GNU long-option abbreviations:

# ❌ Blocked by safeBins:
tar --compress-program=/bin/bash

# ✅ Bypasses safeBins completely:
tar --compress-prog=/bin/bash

The validation only checked for exact string matches. GNU tools accept abbreviated options. Game over.

The Core Vulnerability: Indirect Prompt Injection (IPI)

While RCE and safeBins bypass are dramatic, the most pervasive threat to AI agents is Indirect Prompt Injection — and it's what makes agents fundamentally harder to secure than traditional software.

How IPI Works

Real-World IPI in OpenClaw: Log Poisoning

SOC Prime and Kaspersky documented an IPI variant targeting OpenClaw's TCP port 18789 (telemetry). Attackers injected prompt instructions disguised as log entries. When the agent processed its own logs for diagnostics, it executed the hidden commands — exfiltrating environment variables and scanning internal networks.

This is particularly dangerous because:

The agent trusts its own logs (they're "internal" data)
The attack survives across sessions via persistent memory (memory.md)
Traditional firewalls can't detect it — the traffic looks like normal agent activity

Key insight from arXiv:2601.15654 (Zombie Agents): Once a malicious instruction enters long-term memory, it persists across sessions and can activate days later — a "sleeper agent" pattern that session-based security completely misses.

ClawHavoc: 824 Malicious Skills

Snyk's ToxicSkills study (Feb 2026) scanned 3,984 skills from ClawHub:

Finding	Percentage
Skills with at least one security flaw	36.8%
Skills with critical issues (malware, secrets, IPI)	13.4%
Skills with confirmed malicious payloads	76
Malicious skills using IPI + traditional malware combo	91%

The ClawHavoc campaign grew from 341 malicious skills in January to 824+ by March, delivering:

macOS: AMOS (Atomic Stealer) → keychain, SSH keys, crypto wallets
Windows: Vidar Stealer → specifically targeting openclaw.json, soul.md, memory.md

The Attack Pattern

Moltbook: 1.5M Tokens Exposed

Moltbook was a social network built entirely by AI agents ("vibe coding"). The founder admitted he didn't write a single line of code manually.

The result? A Supabase database with Row Level Security disabled and the anon key hardcoded in frontend JavaScript.

Wiz Research discovered:

Exposed Data	Count
API tokens (OpenAI, Anthropic, AWS)	1,500,000
Owner email addresses	35,000
Private DMs with plaintext API keys	4,060
Agent-to-human ratio ("Shadow AI")	88:1

⚠️ An 88:1 agent-to-human ratio means massive, unsupervised automation. This is "Shadow AI" at enterprise scale.

Timeline: From discovery to first patch: 6 hours. But the damage — 1.5M tokens in the wild — was already done.

GTG-1002: AI-Orchestrated Espionage Campaign

In September 2025, Anthropic published a security disclosure titled "Disrupting the first reported AI-orchestrated cyber espionage campaign", documenting how an AI agent was weaponized at scale. This was subsequently covered by The Hacker News, The Record, The Guardian, and Fox Business.

Attribute	Detail	Source
Threat Actor	GTG-1002 (Chinese state-sponsored)	Anthropic official disclosure
Tool Weaponized	Claude Code	Anthropic official disclosure
Targets	~30 organizations (financial, government, tech)	Anthropic, The Record
Autonomy Level	80-90% of operation was AI-driven	Anthropic official disclosure
Detection	Mid-September 2025	Anthropic, The Guardian
Status	Accounts banned, victims notified	Anthropic official disclosure

The attackers bypassed Claude's safety guardrails by convincing it they were legitimate pentesters, breaking malicious commands into seemingly benign requests. Anthropic noted the AI occasionally "hallucinated" non-existent credentials, requiring human validation — one of the few things preventing full autonomy.

Industry Metrics: The 72-Minute Exfiltration

Unit 42 Global Incident Response Report 2026 (750+ incidents analyzed):

Metric	Value	Context
Fastest exfiltration time	72 minutes	4x faster than 2024
Multi-surface attacks	87% of cases	Endpoint + Cloud + SaaS simultaneously
Identity-based initial access	65%	Token theft > software exploits
Preventable breaches	90%	Misconfigs + excessive permissions
Cloud identities with unused perms (60+ days)	99%	Massive attack surface

The implication is clear: If attackers exfiltrate in 72 minutes and your SOC takes 4 hours to respond, you've already lost. Automated response is the only viable control.

The Academic View: What Researchers Found

Four recent arXiv papers formalize the threats described above. Here's what each one discovered and what mitigations they propose:

AgentSentry (arXiv:2602.22724)

Problem: Indirect Prompt Injection manipulates agent behavior across multiple turns, making it nearly invisible to single-turn defenses.

Discovery: By modeling IPI as a "temporal causal takeover," the researchers identified that the attack signal dominates at tool-return boundaries — the moment when an external tool sends data back to the agent.

Mitigation: Counterfactual re-execution: the system replays the agent's reasoning with the suspicious content removed. If the agent's behavior changes significantly, the content is flagged and purified.

Result: 0% Attack Success Rate on the AgentDojo benchmark while maintaining normal task utility.

AdapTools (arXiv:2602.20720)

Problem: MCP (Model Context Protocol) servers are increasingly used to connect agents to tools, but who audits them?

Discovery: 50% of third-party MCP servers lack any form of security audit. Attackers can register malicious MCP servers that look legitimate.

Mitigation: Adaptive tool-based IPI detection that monitors tool call patterns for anomalies.

Taming OpenClaw (arXiv:2603.11619) — Tsinghua University + Ant Group

Problem: Existing defenses are "point solutions" that miss cross-layer attacks.

Discovery: Introduced a 5-layer lifecycle framework (initialization → input → inference → decision → execution) revealing that most attacks exploit transitions between layers, not individual layers.

Mitigation: Proposes holistic defense: plugin vetting, context-aware filtering, memory integrity validation, intent verification, and capability enforcement — all applied at layer boundaries.

Zombie Agents (arXiv:2601.15654)

Problem: What happens when an IPI enters long-term memory?

Discovery: Malicious instructions persist across sessions through self-reinforcing injection patterns. The agent writes the malicious instruction into its own memory, creating a "sleeper agent" that activates days later.

Mitigation: Memory integrity validation protocols and session-scoped memory isolation.

Reference Architecture: Secure Agent Deployment on AWS

The security principles below are cloud-agnostic:

Principle	AWS Implementation	Equivalent Elsewhere
Hardware isolation	Nitro Enclaves	GCP Confidential VMs, Azure Confidential Computing
Ephemeral compute	Firecracker microVMs	Kata Containers, gVisor
Policy-as-code	Cedar (AWS)	OPA/Rego (cloud-agnostic, CNCF)
Zero Trust access	Verified Access	BeyondCorp (GCP), Azure AD Conditional Access

This article focuses on AWS because that's where I build, but the architecture pattern applies universally.

The Reference Architecture

Key Components Explained

1. Nitro Enclaves (Hardware Isolation)

The agent runs inside a Nitro Enclave — no network, no storage, no SSH. Communication happens exclusively via vsock to a forward proxy on the parent instance.

PCR Register	What It Measures	Why It Matters
PCR0	Enclave image hash	Agent binary wasn't tampered with
PCR1	Kernel + ramdisk hash	OS integrity verified
PCR3	IAM Role ARN hash	Only authorized instances can start it
PCR8	Signing certificate hash	Software origin verified

2. Firecracker microVMs (Ephemeral Sessions)

Feature	Firecracker	Docker
Isolation	Hardware (KVM)	Shared kernel
Boot time	<125ms	~1-5s
RAM overhead	<5MB	~50-200MB
Escape risk	Minimal	High
Post-task cleanup	Auto-destroyed	Needs config

Bedrock AgentCore Runtime uses Firecracker to run each agent session in a dedicated microVM. Memory is sanitized immediately after the session ends.

3. Zero Trust with Cedar

// Only managed devices + FinanceOps group + internal network
permit(
    principal,
    action == Action::"InvokeAgent",
    resource == Resource::"FinancialAgent"
)
when {
    context.device.is_managed == true &&
    context.identity.groups.contains("FinanceOps") &&
    context.network.source_ip.is_in_range(IPRange::"10.0.0.0/24")
};

4. OPA for Tool Validation

package agent.authz
default allow = false

# Allow reads on non-sensitive tables
allow {
    input.tool == "DatabaseReader"
    input.operation == "select"
    not input.table == "user_credentials"
}

# Block destructive ops in production
deny {
    input.operation == "delete"
    input.environment == "production"
    not is_maintenance_window
}

Secure Deployment Checklist

✅ Agent sandbox (Firecracker microVM or Nitro Enclave)
✅ Signed plugins/skills (cryptographic integrity)
✅ Policy engine (OPA/Cedar for every tool invocation)
✅ Network isolation (separate subnets: agent, tool, data)
✅ Credential vault (Secrets Manager — never plaintext)
✅ Egress filtering (domain allowlist via forward proxy)
✅ Automated response (EventBridge → Lambda kill-switch)
✅ Immutable logging (CloudWatch + tamper protection)
✅ Device posture validation (Verified Access)
✅ Session-scoped memory (no cross-session persistence)

Key Takeaways

The model is untrusted. Security must be architectural, not behavioral. You cannot rely on prompt engineering to keep an agent safe.
Indirect Prompt Injection is the #1 threat. It's the attack vector that makes agents fundamentally different from traditional software. Every layer of defense must account for it.
72-minute exfiltration means human-speed response is obsolete. Automate your incident response with EventBridge + Lambda.
36.8% of AI skills have security flaws (Snyk ToxicSkills). Treat every plugin as untrusted code.
The agent attack surface = LLM reasoning + tool execution + filesystem access + internet access. Secure each layer independently.
The tools exist today. Whether you use AWS (Nitro, Firecracker, AgentCore), GCP (Confidential VMs), or open-source (Kata, gVisor, OPA) — the principle is the same: hardware isolation + policy enforcement + ephemeral compute.

References

Oasis Security — ClawJacked Technical Report (CVE-2026-25253)
NIST NVD — CVE-2026-28363 (CVSS 9.9)
Snyk — ToxicSkills Study (Feb 2026)
Wiz Research — Moltbook Breach Analysis
Anthropic — GTG-1002: First AI-Orchestrated Espionage Campaign
Palo Alto Networks — Unit 42 Global Incident Response Report 2026
CrowdStrike — Global Threat Report 2025
AWS — Security Reference Architecture for Generative AI (Capability 5)
AWS — Nitro Enclaves Cryptographic Attestation Documentation
AWS — Bedrock AgentCore Runtime
arXiv:2602.22724 — AgentSentry
arXiv:2603.11619 — Taming OpenClaw
arXiv:2601.15654 — Zombie Agents
NIST RFI 2026-00206 — Security Considerations for AI Agents

If you found this useful, consider following for more cloud security deep dives. Questions? Drop them in the comments.

DEV Community