John R. Black III

Posted on Feb 5

OpenClaw, what it costs you and how to fix it (generally speaking)

#openclaw #systemdesign #ai #opensource

TL;DR

OpenClaw shows what happens when agentic power outpaces governance. The fix is not better prompts, it is a control plane that enforces identity, authorization, provenance, containment, and oversight. This adds cost and latency, but that is the price of deploying agents in adversarial environments.

Disclaimer

This is not an indictment of the OpenClaw developers by the way, but an example of what happens when powerful agentic systems mature faster than their governance models. Which is becoming more common each day.

What OpenClaw Is

OpenClaw, Clawdbot, Moltbot, it has gone by many names. Currently known as OpenClaw, it is an open source, local first AI agent platform that connects large language models to real tools and messaging surfaces like Slack, Telegram, and local system utilities. It is designed to let users wire up “skills” or plugins so the agent can read files, call APIs, and perform actions on the host machine. In practice, this makes OpenClaw powerful and convenient for automation, personal assistants, and developer workflows.

The tradeoff is that governance is mostly configuration based. Safety depends on how carefully the operator sets allowlists, sandboxes tools, and manages third party skills. There is no built in, always on security control plane that enforces identity, authorization, provenance, or oversight across every action. This is why OpenClaw is a useful real world example of how agentic systems can become risky when power scales faster than governance.

Current Issues We Are Seeing

On the surface level we are seeing many different issues popping up with the agent.

Malicious “skills” spreading malware

A large number of user-submitted extensions (“skills”) in OpenClaw’s ClawHub marketplace contain malware that steals credentials, installs keyloggers, and drops backdoors on systems. Hundreds of these malicious skills have been identified in the wild.

Supply chain abuse and social engineering

Threat actors disguise harmful tools as productivity or crypto-related skills. Some lure users to run terminal commands that fetch and execute malicious code, essentially weaponizing the agent’s access to local files and system tools.

High-risk access and execution permissions

Because OpenClaw agents can read email, browser data, SSH keys, and execute shell commands, a compromised or malicious skill can escalate from a simple extension into full credential theft or system compromise.

Prompt injection and unsafe instruction ingestion

Interface designs that let agents read untrusted input (from skills, user data, or other agents) can be exploited so that one instruction overrides safety constraints or implants harmful directives.

Rapid adoption outpacing security measures

The project’s viral growth means many deployments are done by casual users without proper sandboxing, security configuration, or threat awareness, magnifying the risk surface.

Critical vulnerabilities and hijack risks

Researchers have found vulnerabilities that could allow external attackers to hijack agent instances or execute code if an OpenClaw installation is improperly configured or exposed.

Regulatory and national warnings

Government authorities have explicitly flagged potential cyberattack and data breach risks tied to misconfigured OpenClaw installations, urging stronger access controls and audits.

With viral adoption of agentic AI and no one really understanding what security needs to be implemented for the system to be moderately safe, it is no surprise we are seeing these concerns showing up as actual malware campaigns, credential theft campaigns, prompt injection behavior, and warnings from security firms and regulators. The core pattern is a combination of powerful system access + weak governance + open extensibility = high-impact attack surface. That makes this a real world example of what happens when agentic systems are deployed without robust control frameworks.

What OpenClaw’s “Governance” Looks Like in Practice

Governance exists, but it is mostly operator-configured guardrails, not an always-on enforcement architecture:

DM pairing and allowlists (unknown senders get a pairing code, bot does not process until approved).
Tool policy + sandboxing (deny tools like exec per agent, disable elevated, keep sandboxed execution).
Elevated access is gated (it is not just “always root,” it is controlled by config and can be restricted).
A built-in security audit command meant to flag “footguns” and optionally apply safer defaults.
Some formal security modeling or model-checking claims, useful, but explicitly not a proof that the full implementation is correct.

The big red flag is the skills supply chain. ClawHub has already been used to distribute large volumes of malicious skills, and those skills can lead users, or the agent, into running harmful commands and stealing credentials.

My 11 Controls Architecture Applied to OpenClaw

1) Identity

Has (partial): DM pairing and local allowlists create a basic identity gate for “who can talk to the bot.”

Missing (enterprise-grade): strong cryptographic identity, device binding, token-based identity chain, and role-backed identities.

2) Authorization

Has (partial): per-agent tool allow or deny, sandbox vs host, and “elevated” gating.

Missing: policy enforcement that is consistently separated from the reasoning layer, plus explicit least-privilege boundaries between untrusted input, memory, and tool invocation.

3) Time-based access

Has (light): pairing codes expire.

Missing: time-based authorization as a first-class control for sensitive actions.

4) Rate limiting

Unclear or not prominent in docs.

Missing: explicit throttling policies as a core enforcement layer.

5) Rejection logging

Has (some observability): logs exist.

Missing: structured rejection telemetry designed for forensics and policy tuning.

6) Consensus and oversight

Mostly missing. Multi-agent routing is not the same as quorum, challenger-arbiter flows, or consensus-based approvals.

7) Integrity and provenance

Weak for extensions.

Missing: provenance validation for skills, plus integrity checks on memory and tool plans.

8) Supply chain security

Known weak point.

Has (some basics): public security policy and guidance, but no preventative controls.

9) Containment and recovery

Has (partial): sandboxing and denylisting.

Missing: automated quarantine and rollback behaviors.

10) Privacy and output protection

Risk is high by design.

Missing: privacy enforcement as a dedicated control layer.

11) Adversarial robustness

Documented weakness. Configuration and audits exist, but adversarial pressure is a default condition.

The Uncomfortable Truth

Most AI agent platforms today do not even implement basic enterprise security patterns consistently. The novelty is enforcing standard security controls inside AI-to-AI and agent-to-tool boundaries.

How to Enforce the Controls

Put a Control Plane in Front of Everything

Every input, memory write, tool call, and output goes through a single enforcement layer.

Practical implementation:

Gateway service (FastAPI, Envoy, or API Gateway)
Policy engine (OPA, Cedar, or custom)
Token service for agent identity
Central logging pipeline

Split the System into Trust Zones

Untrusted input zone
Reasoning zone
Action zone
Memory zone

Use Capability Tokens Instead of Static Permissions

Short-lived tokens encode identity, authorization, time, scope, and rate limits.

Enforce Tool Mediation

The model emits intent. The control plane validates, authorizes, rate limits, sanitizes, and executes.

Insert Quorum and Oversight for High-Risk Actions

Primary agent proposes, challenger reviews, arbiter or policy engine decides, optional human-in-the-loop.

Add Provenance to Everything That Moves

Attach metadata to memory, plans, outputs, and tool results.

Bake Containment into the Control Plane

Triggers include policy violations, anomalies, known-bad skills, and prompt injection indicators.

Responses include quarantine, token revocation, memory locks, and session kills.

Treat Outputs as a Security Boundary

Classify, redact, enforce destination policy, filter, and attach provenance.

Add Continuous Adversarial Testing

Prompt injection tests, fuzzing, policy bypass attempts, and red team simulations.

Implement the Controls as Modules

identity_middleware
auth_middleware
time_middleware
rate_limit_middleware
rejection_logger
quorum_engine
provenance_tracker
containment_engine
privacy_filter
adversarial_detector

What the Controls Do Not Solve

Moral safety
Bad human goals
Insider misconfiguration
Training-time data poisoning
Hallucinations

Tradeoffs and Cost

Control Plane Latency

Local policy engine: ~1 to 5 ms

Networked policy engine: ~5 to 30 ms

Cross-region calls: 50 ms or more

Consensus Overhead

Parallel challenger: +1 to 2 LLM calls

Human in the loop: seconds to minutes

Provenance and Logging Overhead

Metadata, storage, and indexing add cost and operational overhead.

Operational Overhead

Requires policy management, tuning, logging, alerts, audits, and incident response.

Bottom Line

Frameworks cost latency, engineering effort, and ops complexity. What you get is predictable behavior, bounded blast radius, auditability, legal defensibility, and survivability in adversarial environments. If you want raw speed with no friction, do not give agents real world permissions.

A Hard Truth About Open Source and Expectations

Open source guarantees visibility, not safety. Popular tools inherit threat models they were never designed to survive. Responsibility shifts to the operator.

Who Should Be Responsible for the Security Layer

Not every team should build this themselves. If your agents touch real users, credentials, data, or infrastructure, security ownership must exist. If you cannot own that layer, do not deploy agentic systems with real world permissions.

My Take on OpenClaw

Powerful tools should be used and vetted by real craftsmen. In its current form, OpenClaw should be limited to research, experimentation, and threat analysis. Until hardened with real controls and operational discipline, it represents a serious attack surface.

If you think any of this is wrong, comment and I will research and update as needed.