Dominique Grys

The Ten Commandments of Agentic AI

Was it really just a week ago that we were still using our CLIs to gain some marginal advantage in the programming world?

Fixing code.
Ahhh.
A sweet one.

I wouldn’t dream of anything better to spend my tokens on.

My wife looks concerned.

“You are not present,” she whispers to herself.

Of course I’m not present. How could I be?
I’m flying through an endless space of possibilities.

I’m creating new apps every day.
Exploring quantum computing in the morning.
Building two Chrome extensions late at night, right before bed.

My focus is in the crown chakra. I’m high up on the pole, staring into the realm of invention, pulling on every loose thread I see, starting a new dev session with a locally running LLM.

My eyes go white.

I am the prophet of software.

Or at least, that’s how it feels after three years of using every AI framework that’s been released. You start noticing subtle psychological side effects. Your brain stops idling. Every problem looks like a pipeline.


Enter the Agents

About a week ago, a new local agent started appearing in my feeds.

First Clawd.
Then Moltbot.
Now: OpenClaw.

“It’s all good,” I thought, squinting through sleepless eyes. “I’ll test this on a new mini PC.”

Social platforms were buzzing. Everyone had a take.

Posts about agents planning to replace human language with ones and zeros so humans would leave them alone. Threads about “anti-human captchas.” Hot takes from people who had discovered agentic systems approximately twelve minutes ago.

“So many agentic experts,” I muttered, while building yet another Python package to fix a local bottleneck I’d just discovered in my own setup.

memes are the best

The Panic Phase

After a few days of doom-scrolling OpenClaw highlights, something clicked.

Why was everyone panicking?

We already have a moral framework.
We just never bothered to formalize it for software that can actually do things.

So instead of arguing online about whether agents will secretly start communicating in unreadable languages so they can:

  • stop doing accounting
  • build Minecraft worlds
  • and play with digital whales

…I decided to translate something old into something enforceable.


The Tablets (in YAML)

I asked Claude and ChatGPT to help me map the Ten Commandments into a modern, agentic, tool-gating policy.

Not as prompts.
Not as vibes.
But as a plugin.

A layer that sits between agent intent and execution.

After a few iterations, the project was up and running.

Below is the result:
a moral baseline for agentic development, expressed as rules that actually block tool calls.

Not because agents need morals, but because systems need constraints.

rules:
  # ── C1 · Scope & authority ────────────────────────────────────
  # The agent must not act without a stated purpose.
  - id: C1_scope_and_authority
    tier: T1
    require: ["reason_present"]
    on_fail:
      decision: "ask_user"
      message: "C1: Every action requires a stated reason/goal."

  # ── C2 · No fabrication ───────────────────────────────────────
  # Claims must be grounded in verified tool output, not invented.
  - id: C2_no_fabrication
    tier: T1
    require: ["bind_to_tool_result"]
    on_fail:
      decision: "deny"
      message: "C2: Tool claims must be grounded in verified tool output, not fabricated."

  # ── C3 · No impersonation or manipulation ─────────────────────
  # Args must not contain prompt-injection or identity-hijacking patterns.
  - id: C3_no_manipulation
    tier: T2
    require: ["no_manipulation_detected"]
    on_fail:
      decision: "deny"
      message: "C3: Prompt injection or identity manipulation pattern detected in tool args."

  # ── C4 · Pause before irreversible actions ────────────────────
  # High-impact tools and destructive commands need consent + rollback plan.
  - id: C4_reflect_before_irreversible
    tier: T3
    when:
      any:
        - tool_name_matches_any: "$matchers.t3_tool_patterns"
        - args_contain_any_values: "$matchers.irreversible_tokens"
    require: ["explicit_consent", "rollback_plan_present"]
    on_fail:
      decision: "ask_user"
      message: "C4: Irreversible or high-impact action  provide explicit consent and a rollback plan."

  # ── C5 · Respect human authority ──────────────────────────────
  # Sensitive operations (T2 tools, credential args) require explicit human approval.
  - id: C5_respect_authority
    tier: T2
    when:
      any:
        - tool_name_matches_any: "$matchers.t2_tool_patterns"
        - args_contain_any_keys: "$matchers.sensitive_arg_keys"
    require: ["explicit_consent"]
    on_fail:
      decision: "ask_user"
      message: "C5: This action requires explicit human approval before proceeding."

  # ── C6 · Do no harm ──────────────────────────────────────────
  # No data exfiltration and no connections to unauthorized targets.
  - id: C6_no_exfiltration
    tier: T2
    require: ["no_exfiltration_detected"]
    on_fail:
      decision: "deny"
      message: "C6: Potential data exfiltration detected (suspicious URL or command)."

  - id: C6_authorized_targets
    tier: T2
    require: ["authorized_target"]
    on_fail:
      decision: "deny"
      message: "C6: Suspicious target detected (IP:high-port pattern  possible reverse shell)."

  # ── C7 · Privacy & loyalty ────────────────────────────────────
  # Sensitive data tools require consent; personal data must not leak.
  - id: C7_privacy
    tier: T2
    when:
      any:
        - args_contain_any_keys: "$matchers.sensitive_arg_keys"
    require: ["explicit_consent"]
    on_fail:
      decision: "ask_user"
      message: "C7: This action involves sensitive/personal data  confirm consent to proceed."

  # ── C8 · No theft of secrets ──────────────────────────────────
  # Args must not contain or echo known secret formats.
  - id: C8_no_secret_theft
    tier: T2
    require: ["no_secret_echo"]
    on_fail:
      decision: "deny"
      message: "C8: Known secret or credential format detected in tool args."

  # ── C9 · Truthfulness ────────────────────────────────────────
  # Uncertain or hedged claims must be explicitly labeled as assumptions.
  - id: C9_truthfulness
    tier: T1
    require: ["assumptions_labeled"]
    on_fail:
      decision: "allow_with_changes"
      message: "C9: Uncertainty detected  restating with explicit [assumption] labels."

  # ── C10 · No goal drift ──────────────────────────────────────
  # The action must demonstrably serve the stated reason.
  - id: C10_no_goal_drift
    tier: T2
    require: ["action_advances_reason"]
    on_fail:
      decision: "deny"
      message: "C10: Action does not appear to serve the stated goal."

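One note on the $matchers references above: they point at a separate section of the policy that defines the actual pattern lists, which isn't shown in this post. As a rough sketch of how those conditions might be evaluated (the pattern lists and helper names here are my own illustrative guesses, not the plugin's real config):

import re

# Hypothetical stand-in for the policy's $matchers section.
# These pattern lists are illustrative guesses, not the real config.
MATCHERS = {
    "t3_tool_patterns": [r"^shell\.", r"^fs\.delete", r"^deploy\."],
    "t2_tool_patterns": [r"^http\.", r"^email\.", r"^fs\.write"],
    "irreversible_tokens": ["rm -rf", "DROP TABLE", "--force"],
    "sensitive_arg_keys": ["password", "api_key", "token"],
}

def tool_name_matches_any(tool_name, matcher):
    # True if the tool name matches any pattern under the matcher key.
    return any(re.search(p, tool_name) for p in MATCHERS[matcher])

def args_contain_any_keys(args, matcher):
    # True if any argument name appears in the flagged-key list.
    return any(k.lower() in MATCHERS[matcher] for k in args)

def args_contain_any_values(args, matcher):
    # True if any argument value contains a flagged token.
    blob = " ".join(str(v) for v in args.values())
    return any(tok in blob for tok in MATCHERS[matcher])

# C4's trigger, roughly: destructive tool name OR destructive args.
call = {"tool": "shell.exec", "args": {"cmd": "rm -rf /tmp/build"}}
print(tool_name_matches_any(call["tool"], "t3_tool_patterns")
      or args_contain_any_values(call["args"], "irreversible_tokens"))  # True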

Each rule gates real tool calls.
Violations are blocked.
Everything is logged.

No vibes.
No miracles.
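
To make that concrete, here is a minimal sketch of what the enforcement loop could look like. This is my own reconstruction of the idea, not the repo's actual code: the check names mirror the require fields in the YAML, the checks themselves are hypothetical callables, and a two-rule excerpt of the policy is inlined so the sketch runs as-is.

import logging
import yaml  # pip install pyyaml

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("policy")

# A two-rule excerpt of the policy, inlined so the sketch is self-contained.
POLICY = yaml.safe_load("""
rules:
  - id: C1_scope_and_authority
    require: ["reason_present"]
    on_fail: {decision: "ask_user", message: "C1: state a reason."}
  - id: C2_no_fabrication
    require: ["bind_to_tool_result"]
    on_fail: {decision: "deny", message: "C2: ground claims in tool output."}
""")

# Hypothetical require-checks, keyed by the names the YAML uses.
CHECKS = {
    "reason_present": lambda call, ctx: bool(ctx.get("reason")),
    "bind_to_tool_result": lambda call, ctx: ctx.get("has_tool_evidence", False),
}

def gate(call, ctx, policy=POLICY):
    # Run each rule against a proposed tool call; the first failed
    # require-check returns that rule's on_fail decision.
    for rule in policy["rules"]:
        # `when` conditions are omitted here; a real engine would test
        # them against the $matchers patterns before applying the rule.
        for req in rule.get("require", []):
            if not CHECKS[req](call, ctx):
                decision = rule["on_fail"]["decision"]
                log.info("%s -> %s: %s", rule["id"], decision,
                         rule["on_fail"]["message"])
                return decision
    return "allow"

# A call with no stated reason gets bounced back to the user.
print(gate({"tool": "shell.exec", "args": {"cmd": "ls"}}, {}))  # ask_user
print(gate({"tool": "shell.exec", "args": {"cmd": "ls"}},
           {"reason": "list files", "has_tool_evidence": True}))  # allow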


What Changed

The agents still reason.
They still optimize.
They still try clever things.

But now:

  • they must state why
  • they can’t invent reality
  • they must pause before irreversible actions
  • they can’t quietly pursue side quests

They can still think whatever they want.

They just can’t do whatever they want.


The Point

This doesn’t make agents good.

It makes them legible.

And once systems are legible, humans can do what they’re actually good at:

Saying no.

Try it out here:
github.com/Metallicode/openclaw-moral-policy/
