OpenClaw Diff Artifacts: Review Agent Edits Before They Ship

The risky part of agent-made changes is not that an agent can edit files. It is that a human is asked to approve the result without a clean artifact to inspect.

A chat summary is not a review surface. "I changed the CTA and tightened the policy" sounds fine until the operator realizes the edit also touched unrelated state, moved a paragraph, or changed a config value that was not part of the request. If you run agents around production content, customer workflows, docs, or code, you need the change to become reviewable before it becomes shippable.

OpenClaw's diffs plugin exists for that exact gap. It turns before-and-after text or a unified patch into a read-only diff artifact. The artifact can be presented as a Gateway viewer URL, rendered as a PNG or PDF file, or returned as both. That sounds small. In real operations, it is the difference between "trust me" and "inspect this exact change."

What the diffs tool actually gives you

The official docs describe diffs as an optional plugin tool with short built-in system guidance and a companion skill. It accepts either before and after text, or a unified patch. It can return a canvas-ready viewer URL, a rendered file path for message delivery, or both outputs in one call.

That means an agent does not have to paste an unreadable wall of changed code into chat. It can create a focused artifact and hand the operator a review surface. In a local operator flow, mode: "view" is enough because the agent can present the viewer. In a Slack, Telegram, WhatsApp, or approval handoff flow, mode: "file" may be better because the output can be attached as a PNG or PDF. In higher-risk work, mode: "both" gives the operator an interactive viewer and a durable file-style artifact for the message surface.
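The mode guidance above can be written down as a tiny decision helper. This is only a sketch: the `choose_mode` function and the surface labels are hypothetical names for illustration, and only the three mode values come from the docs.

```python
def choose_mode(surface: str, high_risk: bool = False) -> str:
    """Pick a diffs mode for a delivery surface.

    `surface` is a hypothetical label such as "local", "slack", or "telegram".
    """
    if high_risk:
        # Higher-risk work gets an interactive viewer plus a durable file.
        return "both"
    if surface == "local":
        # A local operator flow can present the viewer directly.
        return "view"
    # Chat and approval handoffs usually want an attachable PNG or PDF.
    return "file"
```

A team would likely replace the `high_risk` flag with its own rule, for example anything touching policy text or production config.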

The quick-start path is simple: enable the plugin, call diffs with a mode, then use the structured details response. The docs say viewer-producing modes include fields such as viewerUrl, viewerPath, artifactId, expiresAt, inputKind, and fileCount. File-producing modes include filePath, path, fileBytes, fileFormat, and quality/render fields.

{
  plugins: {
    entries: {
      diffs: {
        enabled: true,
        config: {
          defaults: {
            mode: "both",
            theme: "dark",
            layout: "split",
            fileFormat: "pdf",
            ttlSeconds: 1800
          }
        }
      }
    }
  }
}
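Once a call returns, the agent still has to turn the structured details into something the operator can act on. The sketch below assumes the details arrive as a plain dictionary using the field names listed above; the `summarize_details` helper and the sample values in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def summarize_details(details: Dict[str, Any]) -> List[str]:
    """Turn a diffs details response into short lines for a review message."""
    lines = []
    if "viewerUrl" in details:
        # Viewer-producing modes ("view" or "both") include a URL and an expiry.
        expires = details.get("expiresAt", "unknown")
        lines.append(f"Viewer: {details['viewerUrl']} (expires {expires})")
    if "filePath" in details:
        # File-producing modes ("file" or "both") include a rendered file path.
        fmt = details.get("fileFormat", "unknown format")
        lines.append(f"File: {details['filePath']} ({fmt})")
    if not lines:
        raise ValueError("details has neither viewerUrl nor filePath")
    return lines
```

With `mode: "both"`, this would produce two lines: one pointing at the viewer and one naming the rendered file to attach.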

I would keep this enabled for teams that let agents propose changes, but I would still treat it as a review artifact generator, not as approval itself. A rendered diff proves what was presented. It does not prove that the change is correct, complete, or safe to deploy.

Before and after is the cleanest review input

The most operator-friendly input is before plus after. It lets the viewer show the full context that exists in both versions. The docs also note that expandable unchanged sections are more reliable when the tool has full before-and-after content. That matters when the change is small but context still decides whether the edit is safe.

{
  "before": "# Runbook\n\nDeploy manually.",
  "after": "# Runbook\n\nDeploy after build and live URL checks.",
  "path": "docs/runbook.md",
  "mode": "both",
  "layout": "split",
  "ttlSeconds": 900
}

This is the pattern I want for content edits, policy text, templates, prompts, short configs, and runbooks. Give the reviewer the old version, the new version, the path label, and a short artifact lifetime. The operator can inspect the exact delta instead of reconstructing it from a prose description.

There are limits. The docs cap before and after at 512 KiB each. That is enough for many real review tasks, but it is not a license to stuff an entire repository into the tool. If the change is huge, the safer move is to split the review into meaningful files or sections.
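A pre-flight check keeps oversized input from reaching the tool at all. This is a minimal sketch: it assumes the 512 KiB cap is measured in UTF-8 bytes, which the docs quoted here do not spell out, and `build_before_after_call` is a hypothetical helper name.

```python
import json

MAX_SIDE_BYTES = 512 * 1024  # 512 KiB per side, per the docs


def build_before_after_call(before: str, after: str, path: str,
                            ttl_seconds: int = 900) -> dict:
    """Build arguments for a before/after diffs call, refusing oversized input."""
    for name, text in (("before", before), ("after", after)):
        size = len(text.encode("utf-8"))  # assumption: the cap counts UTF-8 bytes
        if size > MAX_SIDE_BYTES:
            raise ValueError(f"{name} is {size} bytes; split the review into smaller parts")
    return {
        "before": before,
        "after": after,
        "path": path,
        "mode": "both",
        "layout": "split",
        "ttlSeconds": ttl_seconds,
    }


if __name__ == "__main__":
    call = build_before_after_call(
        "# Runbook\n\nDeploy manually.",
        "# Runbook\n\nDeploy after build and live URL checks.",
        "docs/runbook.md",
    )
    # json.dumps handles newline escaping, so the text arrives intact.
    print(json.dumps(call, indent=2))
```

Serializing with a JSON library rather than by hand also avoids double-escaped newlines in the rendered diff.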

Patch input is better for commit-style review

The other input is a unified patch. That is useful when the agent already has a diff from a Git operation or wants to show a proposed patch before applying it. The docs cap patch input at 2 MiB, 128 files, and 120,000 total lines. They also reject mixed input: provide either patch or before/after, not both.

{
  "patch": "diff --git a/src/config.ts b/src/config.ts\n--- a/src/config.ts\n+++ b/src/config.ts\n@@ -1 +1 @@\n-export const mode = 'draft';\n+export const mode = 'review';\n",
  "mode": "view",
  "title": "Config review"
}
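The patch limits and the mixed-input rule are easy to check before calling the tool. The sketch below is an approximation under stated assumptions: it counts files by `diff --git` headers and lines by newline splitting, which may not match how the plugin itself counts, and `check_patch_input` is a hypothetical helper.

```python
MAX_PATCH_BYTES = 2 * 1024 * 1024  # 2 MiB
MAX_PATCH_FILES = 128
MAX_PATCH_LINES = 120_000


def check_patch_input(args: dict) -> list:
    """Return a list of problems with diffs call arguments (empty means OK)."""
    problems = []
    has_patch = "patch" in args
    has_pair = "before" in args or "after" in args
    if has_patch and has_pair:
        problems.append("mixed input: provide either patch or before/after, not both")
    if not has_patch:
        return problems
    patch = args["patch"]
    if len(patch.encode("utf-8")) > MAX_PATCH_BYTES:
        problems.append("patch exceeds 2 MiB")
    lines = patch.splitlines()
    if len(lines) > MAX_PATCH_LINES:
        problems.append("patch exceeds 120,000 lines")
    # Assumption: one "diff --git" header per file in the patch.
    file_count = sum(1 for line in lines if line.startswith("diff --git "))
    if file_count > MAX_PATCH_FILES:
        problems.append("patch touches more than 128 files")
    return problems
```

When this returns problems, splitting the patch by directory or by logical change is usually a better review anyway.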

Patch input is not always as expandable as full before-and-after input. The docs are clear that unified patches may include collapsed rows without expand controls because omitted context bodies are not available inside the parsed patch hunks. That is expected behavior, not a broken viewer.

For code review, that tradeoff is fine. A patch is compact and maps nicely to the way developers already inspect changes. For sensitive policy edits, legal text, or high-context content, I prefer before-and-after input because it gives the viewer more context to render.

Where apply_patch fits in the workflow

apply_patch is documented as a subtool of exec for structured multi-file edits. It can add files, update files, delete files, and move files using *** Move to: inside an update hunk. It is available by default for OpenAI and OpenAI Codex models, and its config lives under tools.exec.applyPatch.

That makes apply_patch a good edit surface, but it is not the same thing as a review surface. The structured patch says what the agent intends to apply. The diff artifact gives the operator a readable way to inspect that intent or inspect the resulting change.

*** Begin Patch
*** Update File: docs/runbook.md
@@
-Deploy manually.
+Deploy after build and live URL checks.
*** End Patch

The docs also state that tools.exec.applyPatch.workspaceOnly defaults to true. That is the right default. A patch tool should normally stay inside the workspace. Setting workspaceOnly to false should be intentional, rare, and tied to a concrete operator-approved job.
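To make "stay inside the workspace" concrete, here is one way such a containment check could look. It illustrates the idea only and is not OpenClaw's implementation; `is_inside_workspace` is a hypothetical helper.

```python
from pathlib import Path


def is_inside_workspace(workspace: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `workspace`."""
    root = Path(workspace).resolve()
    # Joining an absolute target replaces the root, so "/etc/passwd" is caught
    # as well. resolve() collapses ".." segments and follows symlinks.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents
```

A relative path such as `docs/runbook.md` passes, while `../secrets.env` or an absolute path outside the root does not.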

A practical review loop looks like this:

1. The agent prepares a small patch for the requested change.
2. The agent renders a diffs artifact from the patch or from before-and-after text.
3. The operator reviews the artifact before approval or deploy.
4. The agent applies the patch with apply_patch or leaves the proposed change for manual review.
5. The agent runs the smallest meaningful verification step: build, test, lint, preview, or direct inspection.

That loop is slower than blind editing, but only by a small amount. It is much faster than debugging an agent change that nobody really reviewed.
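The loop can be sketched as a small orchestration function. Every callable here is a hypothetical stand-in supplied by the caller: `render` would wrap a diffs call, `approve` the operator decision, `apply_patch` the edit step, and `verify` the check. None of them are real OpenClaw APIs.

```python
from typing import Callable


def review_loop(patch: str,
                render: Callable[[str], str],
                approve: Callable[[str], bool],
                apply_patch: Callable[[str], None],
                verify: Callable[[], bool]) -> str:
    """Render, review, apply, verify. Nothing is applied without approval."""
    artifact = render(patch)      # produce the review artifact first
    if not approve(artifact):     # the operator inspects the artifact, not a summary
        return "rejected"
    apply_patch(patch)            # apply only after approval
    return "verified" if verify() else "verification-failed"
```

The ordering is the point: the patch is rendered before anything is applied, and a rejection leaves the workspace untouched.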

Do not confuse exec output with review

exec is the command runner. The docs say it runs shell commands in the workspace, supports foreground and background execution, and can use process for background sessions when that capability is allowed. It can also run with a pseudo-terminal for TTY-only CLIs, set a working directory, pass allowed environment overrides, enforce a timeout, and route execution to sandbox, gateway, or node.

That is powerful, but a terminal log is not a diff review. A command like git diff can produce the raw material, and exec can run the command, but the operator still benefits from a rendered artifact. The diff viewer makes the change easier to scan, safer to present, and easier to attach to a channel handoff.
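As a rough illustration of turning raw `git diff` output into review input, the sketch below captures the diff with Python's `subprocess` and wraps it in arguments for a patch-style diffs call. In an agent, exec would run the command instead; the `git_diff` and `to_diffs_call` helper names are hypothetical.

```python
import subprocess


def git_diff(repo_dir: str) -> str:
    """Return the unstaged changes in `repo_dir` as a unified patch."""
    result = subprocess.run(
        ["git", "diff", "--no-color"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails
    )
    return result.stdout


def to_diffs_call(patch_text: str, title: str) -> dict:
    """Wrap a unified patch in arguments for a diffs call."""
    if not patch_text.strip():
        raise ValueError("no changes to review")
    return {"patch": patch_text, "mode": "both", "title": title}
```

The terminal output stays the raw material; the operator sees the rendered artifact instead of a scrollback log.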

The exec docs also call out why command surfaces need policy. Host and node execution have security modes such as deny, allowlist, and full, plus ask modes such as off, on-miss, and always. Gateway and node approvals are controlled by ~/.openclaw/exec-approvals.json. Host execution rejects env.PATH and loader overrides such as LD_* and DYLD_* to reduce binary hijacking risk.
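The environment rule is simple enough to mirror in a pre-check. This sketch only illustrates the rule as described above and is not OpenClaw's own code; `rejected_env_keys` is a hypothetical helper.

```python
def rejected_env_keys(overrides: dict) -> list:
    """List the override keys that host execution would refuse."""
    return sorted(
        key for key in overrides
        # PATH and dynamic-loader variables can redirect which binary runs.
        if key == "PATH" or key.startswith(("LD_", "DYLD_"))
    )
```

An agent could run this before calling exec and drop or report the offending keys instead of discovering the rejection at run time.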

That policy layer answers "may this command run?" Diff artifacts answer "what exactly changed?" You need both in a production workspace.

Security details that matter

The diffs plugin is built with temporary artifacts and secure defaults. Viewer URLs use the route /plugins/diffs/view/{artifactId}/{token}. Artifact metadata includes a random 20-character artifact ID, a random 48-character token, creation and expiry timestamps, and a stored viewer path. The default viewer TTL is 30 minutes, and the maximum accepted TTL is 6 hours.
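To show how those numbers fit together, here is a sketch that mints artifact metadata of the described shape. The hex alphabet, the `token` and `createdAt` field names, and the choice to reject rather than clamp an oversized TTL are all assumptions; only the lengths, the TTL bounds, and the route come from the docs.

```python
import secrets
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 30 * 60   # 30 minutes
MAX_TTL_SECONDS = 6 * 60 * 60   # 6 hours


def new_artifact(ttl_seconds: int = DEFAULT_TTL_SECONDS) -> dict:
    """Mint illustrative artifact metadata with a tokenized viewer path."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        # Assumption: out-of-range TTLs are rejected rather than clamped.
        raise ValueError("ttl_seconds must be between 1 and 21600")
    created = datetime.now(timezone.utc)
    artifact_id = secrets.token_hex(10)  # 20 hex characters (alphabet assumed)
    token = secrets.token_hex(24)        # 48 hex characters (alphabet assumed)
    return {
        "artifactId": artifact_id,
        "token": token,
        "createdAt": created.isoformat(),
        "expiresAt": (created + timedelta(seconds=ttl_seconds)).isoformat(),
        "viewerPath": f"/plugins/diffs/view/{artifact_id}/{token}",
    }
```

The unguessable token in the path is what makes a short-lived URL reasonable to hand to a reviewer.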

By default, viewer access is loopback-only. The docs list security.allowRemoteViewer with a default of false, which means non-loopback requests to viewer routes are denied. Enable remote viewer access only when the deployment requires it. Even then, a request succeeds only if its tokenized path is valid.
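The access rule can be sketched in a few lines using Python's standard `ipaddress` and `hmac` modules. This is an illustration of the described behavior, not the plugin's code; the `may_view` function and its parameters are hypothetical.

```python
import hmac
import ipaddress


def may_view(remote_addr: str, presented_token: str, expected_token: str,
             allow_remote_viewer: bool = False) -> bool:
    """Decide whether a viewer request should be served."""
    is_loopback = ipaddress.ip_address(remote_addr).is_loopback
    if not is_loopback and not allow_remote_viewer:
        # Default posture: only requests from the local machine reach the viewer.
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(presented_token, expected_token)
```

Both conditions must hold: enabling remote access relaxes the address check but never the token check.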

The viewer response is also hardened with a restrictive content security policy. File rendering blocks external network requests and allows only local viewer assets from the plugin asset route. If mode: "file" or mode: "both" is used, rendering requires a Chromium-compatible browser. The docs list discovery through OpenClaw browser config, browser-related environment variables, and platform fallback paths.

Those details matter because diffs often contain sensitive operational information. Even though the viewer is read-only, you should still keep secrets out of diff input unless the review requires them. You should also set short ttlSeconds values for sensitive reviews and prefer PDF output when a chat channel compresses images too aggressively.

The buyer-grade rule

If your team uses agents to edit production-adjacent files, every meaningful change should have three artifacts: the proposed or actual diff, the verification result, and the final commit or deploy proof. The diff tells you what changed. The verification tells you whether the change still works. The final proof tells you what shipped.
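One lightweight way to enforce the rule is to record the three artifacts together and refuse to call a change done while any is missing. The `ChangeRecord` class below is a hypothetical sketch, not part of OpenClaw.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRecord:
    """Evidence for one agent-made change."""
    diff_artifact: Optional[str] = None  # viewer URL or rendered file path
    verification: Optional[str] = None   # e.g. build, test, or lint result
    ship_proof: Optional[str] = None     # e.g. commit hash or deploy URL

    def missing(self) -> List[str]:
        """Name the artifacts that have not been recorded yet."""
        fields = {
            "diff": self.diff_artifact,
            "verification": self.verification,
            "ship proof": self.ship_proof,
        }
        return [name for name, value in fields.items() if not value]

    def is_complete(self) -> bool:
        return not self.missing()
```

A record with only a diff reports `["verification", "ship proof"]` as missing, which is an accurate description of a change that has been reviewed but not yet proven.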

That sounds basic, but it is where many agent workflows fail. They produce a long explanation, a passing vibe, and no durable evidence. Then the operator has to trust a summary instead of reviewing the work.

OpenClaw gives you the pieces to avoid that. Use apply_patch for structured edits, exec for the shell commands and verification that truly need a local or node runtime, and diffs for review artifacts that a human can actually inspect. Keep the authority ladder clear: edits are structured, commands are policy-bound, review artifacts are read-only, and deploys happen only after proof.

That is how agents become useful teammates instead of risky autocomplete with shell access.

Originally published at https://www.openclawplaybook.ai/blog/openclaw-diff-artifacts-agent-review/
