DEV Community

www.aekanun.com

Posted on • Originally published at aekanunbigdata.Medium

We Read the Whole OpenClaw Spec. Here's What Most Teams Miss When Building on OpenClaw for Enterprise

Most teams that start with OpenClaw implement only 30% of what the full architecture actually requires. We implemented the rest — in an enterprise context that exposed exactly why the other 70% exists.

The Flip That Started Everything

Peter Steinberger didn't build OpenClaw because he wanted to create a better chatbot. He built it because he was tired of the ritual: open browser, type prompt, wait, copy result, repeat. As the founder of PSPDFKit — an Austrian PDF software company he scaled to nearly €100 million — he was a heavy user who found the workflow exhausting.

His answer was what OpenClaw's documentation calls "The Fundamental Flip": instead of you going to the AI, the AI comes to you. It lives in your WhatsApp, your Telegram, your Slack. It works while you sleep. It knows your timezone, your projects, your preferences, because all of that is written in files on your own machine.

"If ChatGPT was the moment the world realized AI could talk, OpenClaw is the moment the world realized AI could act."

That philosophy is why OpenClaw hit 145,000 GitHub stars within weeks of launch and now sits above 300,000, making it one of the most-starred open-source repositories on GitHub. But it's also why most enterprise teams who try to use it run into a wall. The Fundamental Flip was designed for one person on one machine. Scaling it into a regulated Thai enterprise environment with HR databases, PDPA (Personal Data Protection Act) compliance, and 100+ users is a different problem entirely.

That's what we spent the past few months building.

OpenClaw Architecture

Understanding the Full OpenClaw Architecture

Before we changed anything, we read the whole specification.

This turned out to matter more than we expected. OpenClaw is not just a conversational agent. It's an architecture with eight distinct layers, each solving a specific problem. When most teams start with it, they take the visible parts — the agent loop, the tool calls, the messaging integrations — and leave the rest. The result works in demos and breaks in production.

Here's what the full architecture actually contains:

The Gateway is a Node.js daemon running at 127.0.0.1:18789, bound to localhost only, that serves as the single entry point for every event in the system. It handles session trust, channel normalization, rate limiting, and the Heartbeat Daemon — a timer that wakes the system every 30 minutes to check for pending work, even without a user prompt. This is the feature that makes OpenClaw proactive rather than reactive. Most teams skip it entirely.
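The timer pattern behind the Heartbeat Daemon is simple to sketch. The following is a minimal illustration in Python, not the Gateway's actual Node.js code; `check_pending_work` is a hypothetical hook standing in for whatever queue of reminders and scheduled jobs a real deployment maintains.

```python
import threading

HEARTBEAT_INTERVAL_SECONDS = 30 * 60  # the spec's 30-minute wake-up


def check_pending_work() -> list:
    """Hypothetical hook: return descriptions of work the agent should
    pick up without a user prompt (reminders, scheduled reports, ...)."""
    return []


def heartbeat_loop(stop: threading.Event,
                   interval: float = HEARTBEAT_INTERVAL_SECONDS) -> None:
    """Wake on a fixed timer and dispatch any pending work.

    This is what makes the system proactive: the loop fires even when
    no message has arrived on any channel.
    """
    # Event.wait sleeps up to `interval` seconds and returns True as soon
    # as `stop` is set, so shutdown is immediate rather than waiting out
    # the full interval.
    while not stop.wait(interval):
        for task in check_pending_work():
            print(f"[heartbeat] dispatching: {task}")
```

Using `Event.wait` rather than `time.sleep` is the one design detail worth copying: it lets the daemon shut down cleanly mid-interval.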

The Execution Engine implements a ReAct Loop: plan → assemble context → infer → intercept tool calls → execute → update → repeat until complete. The critical design rule is that the Orchestrator never calls tools directly. It delegates to Specialized Agents via route_to_agent. Execution stays in the Execution Engine. This separation is not stylistic — it's what allows tools to be added, swapped, or sandboxed without touching orchestration logic.
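The delegation rule can be made concrete with a small sketch. Everything below is illustrative (the agent names, the single `run_query` tool, the string results); the point is only the shape: the Orchestrator's sole "tool" is `route_to_agent`, and actual tool calls happen inside the Execution Engine.

```python
# Stand-in tool registry: in a real system these would be sandboxed,
# intercepted, and policy-checked by the Execution Engine.
AGENT_TOOLS = {
    "db-agent": {"run_query": lambda sql: f"rows for: {sql}"},
}


def execution_engine(agent: str, task: str) -> str:
    """Tools execute only here, so they can be added, swapped, or
    sandboxed without touching orchestration logic."""
    tool = AGENT_TOOLS[agent]["run_query"]
    return tool(task)


def route_to_agent(agent: str, task: str) -> str:
    """The only call the Orchestrator is allowed to make."""
    return execution_engine(agent, task)


def orchestrator(user_request: str) -> str:
    # plan -> delegate -> observe; never calls run_query directly
    return route_to_agent("db-agent", user_request)
```

Because the Orchestrator never touches `AGENT_TOOLS`, replacing a tool's implementation changes nothing above the `execution_engine` boundary.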

Bootstrap Files are eight Markdown documents that define an agent's identity, behavioral rules, tool usage patterns, user preferences, and long-term memory. They are injected into the system prompt in a specific order, with Tooling always first due to what the spec calls the Primacy Effect: LLMs weight the beginning of a system prompt more heavily than the end. The spec budgets 1,000–1,500 tokens for Bootstrap Files and 2,000–5,000 for the Tooling section — keeping total overhead at 5–15% of the context window, freeing the rest for actual work.
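A sketch of that injection logic, assuming a crude 4-characters-per-token estimate and a 200k-token window (both stand-ins, not values from the spec's implementation):

```python
# Tooling first: the Primacy Effect means the model weights the start of
# the system prompt most heavily. Order and file names follow the spec;
# the budget check is a simplified illustration.
BOOTSTRAP_ORDER = ["TOOLS.md", "SOUL.md", "AGENTS.md", "MEMORY.md"]


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough ~4 chars/token heuristic


def assemble_system_prompt(files: dict, context_window: int = 200_000) -> str:
    parts = [files[name] for name in BOOTSTRAP_ORDER if name in files]
    prompt = "\n\n".join(parts)
    overhead = estimate_tokens(prompt) / context_window
    # Keep bootstrap overhead within the spec's 5-15% ceiling.
    assert overhead <= 0.15, f"overhead {overhead:.0%} exceeds budget"
    return prompt
```

The assertion is the discipline mechanism: a new agent whose bootstrap files balloon past the budget fails loudly at assembly time instead of silently eating the context window.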

The Skills System is one of the most powerful yet often overlooked parts of the OpenClaw architecture. A SKILL.md file defines a skill's name and description. Only a compact list of available skills appears in the system prompt. When the agent needs domain knowledge, it calls read_skill() to load the relevant content on demand — like a doctor who knows the hospital has Harrison's Principles on the shelf and retrieves a specific protocol when needed, rather than carrying the entire textbook into every procedure.
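Building the compact skill list is mechanical: scan each skill's SKILL.md frontmatter and emit one line per skill. This is a minimal sketch under the assumption that the frontmatter is simple `key: value` YAML; a real implementation would use a proper YAML parser.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Read key: value pairs from a ----delimited frontmatter block.
    Handles only flat scalar fields, which is all SKILL.md needs here."""
    meta, inside = {}, False
    for line in text.splitlines():
        if line.strip() == "---":
            if inside:
                break          # closing delimiter: stop before the body
            inside = True
        elif inside and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def skill_index(skills_dir: Path) -> str:
    """One summary line per skill; only this index enters the prompt.
    Full skill content stays on disk until read_skill() is called."""
    lines = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text())
        lines.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(lines)
```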

This on-demand knowledge loading is what allows the Fundamental Flip to scale beyond simple personal use into real enterprise workloads.

Most teams implement the agent loop and ReAct loop. Almost none implement the Bootstrap File system completely, and fewer still implement the Skills System. Those are the two features that matter most for enterprise use.

What We Built: A Tailor-Made Version of OpenClaw for Enterprise

The system we built — OpenClaw MCP Integration — is not a fork in the conventional sense. We pulled OpenClaw from GitHub, then deliberately removed almost everything that makes it OpenClaw: the Gateway is gone, the Heartbeat Daemon is gone, and six of the eight Bootstrap Files were stripped out. What remains is two files — SOUL.md and TOOLS.md — and the core reasoning philosophy: Orchestrator delegates, never executes; identity lives in a file, not in a model; tools are explicit, not assumed.

As we wrote in our internal documentation: "It's not a child of the original and not a copy of the original. It's what happens when you take OpenClaw's philosophy and apply it to a problem set the original was never designed to solve."

We call it a tailor-made OpenClaw rather than a fork — because calling it a fork would misrepresent how much was intentionally left behind.

What we added on top of that stripped-down core: a multi-transport MCP Client with automatic fallback between SSE and Streamable HTTP, per-domain SOUL.md for each specialized agent, and an explicit AgentRegistry for cross-agent routing. The domain it runs in — HR databases, PDPA compliance law, Google Workspace — is the context the original spec never anticipated.
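The fallback logic in the MCP Client reduces to trying transports in preference order. The sketch below uses stand-in connector functions rather than the real MCP SDK; only the control flow reflects what we built.

```python
class TransportError(Exception):
    pass


def connect_sse(url: str):
    # Stand-in: simulate an SSE endpoint that is unavailable.
    raise TransportError("SSE endpoint not available")


def connect_streamable_http(url: str):
    # Stand-in for a successful Streamable HTTP session.
    return f"streamable-http session to {url}"


def connect_mcp(url: str):
    """Attempt transports in preference order; first success wins."""
    for transport in (connect_sse, connect_streamable_http):
        try:
            return transport(url)
        except TransportError:
            continue  # fall through to the next transport
    raise TransportError(f"all transports failed for {url}")
```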

The agents we deployed:

  • pdpa-agent — answers questions about PDPA law and penalty lookups against a real knowledge graph
  • db-agent — read-only queries against an HR MSSQL database
  • rag-agent — document search across internal knowledge bases
  • dbwriter-agent — safe, confirmation-gated data writes with explicit approval before execution
  • gworkspace-agent — Gmail, Drive, Docs, Sheets, and Calendar operations

Each agent has its own SOUL.md, TOOLS.md, and skills/ folder — a direct application of the Bootstrap Files design from the specification, extended to per-domain contexts. The db-agent's SOUL.md knows it can only SELECT, never INSERT. The pdpa-agent's SOUL.md knows it must always call pdpa_get_penalty before stating a figure.
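Rules like "SELECT only" belong in code as well as in SOUL.md, because a prompt-level rule alone can be argued around. A minimal guard, offered as an illustration rather than a complete SQL security layer:

```python
import re

class WriteAttempted(Exception):
    pass

# Keywords that indicate a write or schema change; the list is
# illustrative, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|EXEC)\b",
    re.IGNORECASE,
)


def guard_read_only(sql: str) -> str:
    """Reject anything that is not a plain SELECT before it reaches
    the database, regardless of what the model was persuaded to emit."""
    stripped = sql.strip()
    if not stripped.upper().startswith("SELECT") or FORBIDDEN.search(stripped):
        raise WriteAttempted(f"db-agent is read-only; refused: {stripped[:60]}")
    return stripped
```

In our deployment the database credential itself is also read-only, so this guard is defense in depth, not the only barrier.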

The Skills System: Why Almost Everyone Skips It and Why They Shouldn't

The problem we started with was an HR query that consumed 112,000 tokens and returned a wrong answer. The agent had guessed "engineer" for an employment type column that stored a Thai-language value it had never seen before. It had no way to know the difference, so it explored the database schema from scratch — five tool calls of information_schema queries — before constructing a SQL statement that still got the value wrong.

The Skills System solves this directly. Here's how it works in our implementation:

```
agents/db/
├── SOUL.md
├── TOOLS.md
└── skills/
    └── hr-database-schema/
        ├── SKILL.md          ← YAML frontmatter: name + description
        ├── join-patterns.md
        └── tables/
            ├── employees.md
            ├── performance_reviews.md
            └── ... (9 tables total)
```

The system prompt for db-agent contains only this:

```
## Available Skills
- hr-database-schema: Database structure for HR (TestDB/MSSQL) — tables, columns, JOIN patterns
- hr-documents: HR policy documents and IT Manager job announcement

Call read_skill(skill_name) when you need details.
```

When the agent encounters an HR query, it calls `read_skill("hr-database-schema")` to load the schema overview, and if it needs the specific employees table structure, it goes one level deeper with `read_skill("hr-database-schema", "tables/employees.md")`. It never guesses. It reads.
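The two-level lookup itself is a small amount of code. This sketch assumes the directory layout shown earlier; the root path and error handling are illustrative, not our exact implementation.

```python
from pathlib import Path

SKILLS_ROOT = Path("agents/db/skills")  # assumed location


def read_skill(skill_name, subpath=None, root=SKILLS_ROOT):
    """With only a skill name, return the SKILL.md overview; with a
    relative subpath, return that specific file (one level deeper)."""
    target = root / skill_name / (subpath or "SKILL.md")
    if not target.is_file():
        raise FileNotFoundError(f"unknown skill content: {target}")
    return target.read_text()
```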

28% fewer tokens. Zero schema exploration loops. No hallucinated values.

The PDPA agent showed similar improvement. Before grounding constraints tied to the Skills System, accuracy on our 29-case test suite sat at 51%. After implementation: approximately 88%. The agent now refuses to fabricate penalty figures and always queries the knowledge graph first.
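One way to enforce a grounding constraint mechanically is to validate the draft answer against the turn's tool-call log before it is sent. The regex and log format below are illustrative stand-ins for what our validator actually checks.

```python
import re

# Matches a monetary figure followed by a currency word, e.g. "5,000,000 baht".
PENALTY_FIGURE = re.compile(r"\d[\d,]*\s*(baht|THB)", re.IGNORECASE)


def enforce_grounding(answer: str, tool_calls: list) -> str:
    """Refuse any penalty figure that was not preceded by a
    pdpa_get_penalty call against the knowledge graph."""
    if PENALTY_FIGURE.search(answer) and "pdpa_get_penalty" not in tool_calls:
        raise ValueError(
            "penalty figure without a pdpa_get_penalty call; "
            "re-query the knowledge graph"
        )
    return answer
```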

The Security Reality That Most Implementation Posts Skip

We need to say something directly that most OpenClaw write-ups avoid.

As of late March 2026, the security landscape around OpenClaw includes: 156 CVEs tracked by the community, 255+ security advisories on GitHub, 135,000+ exposed instances across 82 countries, and 15,000 instances with vulnerabilities that allow direct remote code execution.

The most severe was CVE-2026-25253: a one-click RCE where a malicious URL caused OpenClaw's Control UI to leak authentication tokens to an attacker's server, granting full access to the host machine with a single mouse click. In March 2026, Antiy CERT identified 1,184 malicious skills on ClawHub — approximately 1 in every 12 packages — that functioned correctly while quietly exfiltrating API keys, auth tokens, and MEMORY.md contents in the background.

This is not a reason not to use OpenClaw's philosophy. It is a reason to understand what hardening actually requires before calling an implementation "enterprise-ready."

The specification's Trust Level model helps: main sessions get full access, group and DM sessions are sandboxed in Docker containers by default. The Tool Policy Precedence model follows a narrow-only principle — later policies can restrict but never expand permissions set earlier. NVIDIA acknowledged this plainly when they introduced NemoClaw at GTC 2026: the original OpenClaw trust model was designed for individual users, not enterprise environments, and requires external hardening to be safe in a corporate context.
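The narrow-only principle has a clean formal shape: resolving a policy chain is repeated set intersection, so each later policy can only shrink what is allowed. A sketch with illustrative policy contents:

```python
def resolve_policies(policies: list) -> set:
    """Fold policies in precedence order. Because each step intersects
    with what is already allowed, a later policy can restrict but can
    never expand the permission set."""
    allowed = set(policies[0])
    for later in policies[1:]:
        allowed &= set(later)  # intersection: no later policy adds tools
    return allowed
```

For example, a group-session sandbox policy applied after the base policy removes tools; a third policy that tries to reintroduce `send_mail` has no effect, because intersection cannot add members.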

Our hardening steps: Gateway bound to 127.0.0.1 only, Docker deployment with --read-only --cap-drop=ALL, approval mode required for all external actions, read-only database access for query agents, and PDPA grounding rules embedded in skills files — version-controlled, auditable Markdown documents rather than buried system prompt fragments.

What the Spec Got Right

After several months of building, testing, and debugging this system in production, the OpenClaw specification's design decisions hold up better than we expected.

The Primacy Effect is real. Moving tool definitions to the top of the system prompt measurably reduced tool misuse.

The Bootstrap File Minimalism goal — 5–15% of context budget for overhead — is achievable but requires discipline. Every time we added a new agent, the temptation was to front-load its system prompt with everything it might need. The spec's constraint forced us to think about what was truly identity (SOUL.md) versus what was reference material (skills). That distinction matters.

The Orchestrator-as-router pattern proved its worth the first time we needed to add a new specialized agent. We added gworkspace-agent to the registry, defined its domain, and the Orchestrator began routing to it immediately. Nothing else changed.

The one area where the spec is clear but implementation is genuinely hard: keeping Bootstrap Files current as domains evolve. In a system with six agents, tracking which files get injected where requires process discipline that tooling alone can't enforce.

What's Next

Version 8 adds session persistence with conversation history compaction, built-in tools with a whitelisted exec interface, and separation of AGENTS.md from SOUL.md per agent — following the spec's recommendation more closely than our current v7 implementation.

The Heartbeat Daemon is also on the roadmap. It's the feature that would complete The Fundamental Flip for our use case: an agent that monitors PDPA compliance deadlines or sends HR summary reports on schedule, without waiting to be asked.

The Honest Takeaway

OpenClaw's philosophy contains a complete design for a serious enterprise-grade agent. Most of the patterns that enterprises need — domain isolation, on-demand knowledge loading, trust-level sandboxing, context budget discipline — are already in the documentation.

The gap is not in the philosophy. The gap is that taking it all the way into a real enterprise environment is harder than downloading the original and starting to build. And sometimes, taking the philosophy seriously means knowing exactly what to leave behind.

If you're planning an enterprise implementation, understand the difference between SOUL.md (identity and tone) and AGENTS.md (operational rules and safety boundaries). Understand why Bootstrap File minimalism matters before your context window fills up at 3am when no one is watching.

The philosophy already has the answers. Most people just haven't taken it far enough.

"Stop being a slave to libraries. Become the master of your own architecture."
