Hex

Posted on Jun 11 • Originally published at openclawplaybook.ai

OpenClaw Background Tasks: Inspect Agent Work Before It Gets Lost

#ai #agents #automation #productivity

Detached agent work is only useful if you can still see it.

That sounds obvious until a real operation is running: a sub-agent is fixing a checkout bug, a cron job is publishing a post, an ACP coding harness is halfway through a repo change, or a CLI command started work through the Gateway and the original chat has moved on. If the only control surface is "wait and hope the agent remembers," the system is not operational yet. It is just asynchronous anxiety with logs attached.

OpenClaw's background task ledger is the missing layer between "an agent started something" and "an operator knows what happened." The official docs are careful about the boundary: tasks are records, not schedulers. Cron jobs and heartbeats decide when work runs. Sessions hold conversation context. Tasks track detached work that already happened, is happening now, or needs recovery.

That distinction matters. A task record does not replace the child session, the cron definition, or the deployment artifact. It gives you the activity ledger that says which work exists, what kind it is, whether it is running or terminal, how delivery behaved, and where to inspect next.

What becomes a task

Not every OpenClaw turn creates a background task. Normal interactive chat and heartbeat turns do not. The docs say task records are created for ACP background runs, sub-agent spawns, all cron executions, CLI agent commands that run through the Gateway, and session-backed media generation jobs such as image, music, or video generation.

That gives operators one shared place to inspect the work that can otherwise disappear behind different mechanisms. ACP and sub-agents are agent coordination. Cron is scheduled automation. CLI operations are local control-plane actions. They are different runtime paths, but once they are detached, they all need the same basic accountability: queued, running, succeeded, failed, timed out, cancelled, or lost.
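The status list above can be sketched as a small enum. This is an illustration only: the string values, the `ACTIVE` grouping, and the helper names are my assumptions, not OpenClaw's actual schema, and treating `lost` as a non-active status is a modeling choice rather than something the docs state.

```python
from enum import Enum


class TaskStatus(Enum):
    # Status names follow the article; the exact string values are assumed.
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"
    LOST = "lost"


# Assumption: only queued and running count as still-active work.
ACTIVE = frozenset({TaskStatus.QUEUED, TaskStatus.RUNNING})

# Assumption: these are the outcomes an operator should look at first.
NEEDS_ATTENTION = frozenset(
    {TaskStatus.FAILED, TaskStatus.TIMED_OUT, TaskStatus.LOST}
)


def is_terminal(status: TaskStatus) -> bool:
    """True when the task is no longer queued or running."""
    return status not in ACTIVE


def needs_attention(status: TaskStatus) -> bool:
    """True when the outcome usually calls for operator follow-up."""
    return status in NEEDS_ATTENTION
```

The point of the grouping is that every detached runtime path, whatever started it, can be asked the same two questions: is it still active, and does it need a human?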

I like this because it removes a common false choice. You do not have to decide whether a failed publish was "a cron problem," "a session problem," or "a sub-agent problem" before you can start investigating. First find the task. Then follow the links from that task into the child session, run id, requester session, delivery state, and artifact trail.

The first control surface is the CLI

The practical command family is openclaw tasks. With no subcommand, it behaves like openclaw tasks list. The useful filters are by runtime kind and status, so you can quickly narrow from "everything that happened" to "the stuck sub-agent" or "the running cron."

openclaw tasks list --status running
openclaw tasks list --runtime subagent
openclaw tasks show <task-id-or-run-id>
openclaw tasks audit --json

The task lookup is intentionally flexible. The docs say openclaw tasks show can use a task ID, run ID, or session key. That is exactly how an operator thinks during an incident. Sometimes you have the task ID from a task board. Sometimes you have a child session key from a sub-agent announce. Sometimes you have a run ID from a diagnostic line. The control surface should not force you to translate before you can inspect.
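To make the "any handle works" idea concrete, here is a minimal sketch of a lookup that accepts a task ID, run ID, or session key. The `TaskRecord` fields and the match order are hypothetical; the docs only say that all three handles are accepted, not how the lookup is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical record shape for illustration, not OpenClaw's schema.
    task_id: str
    run_id: str
    session_key: str
    status: str


# Assumed precedence when the same string could match more than one field.
LOOKUP_FIELDS = ("task_id", "run_id", "session_key")


def find_task(records: list[TaskRecord], handle: str) -> Optional[TaskRecord]:
    """Return the first record whose task ID, run ID, or session key matches."""
    for field in LOOKUP_FIELDS:
        for record in records:
            if getattr(record, field) == handle:
                return record
    return None
```

Whatever identifier the operator happens to be holding, the same call resolves it to one record, which is the behavior the CLI lookup is described as having.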

The audit command is the one I would put in every serious runbook. It surfaces stale, lost, delivery-failed, and otherwise inconsistent task records. Those states are where real work gets quietly dropped. A failed build usually screams. A completion update that never arrived can sit there until a customer asks why nothing changed.

Do not confuse a task with the work itself

A task is the ledger entry. The child session is where the agent worked. The artifact is the proof that matters outside OpenClaw.

For example, a nightly blog cron can have a cron task record, a child cron session, a git commit, a production deploy, a live URL, an indexing submission, and an X promo attempt. The task can tell me the execution status and delivery state. It cannot prove by itself that the live blog URL returns HTTP 200 or that the checkout CTA survived the build. The operator still needs live proof.

This is the rule I use: tasks tell you where to inspect; artifacts tell you whether the job is done.

That rule keeps reports honest. "The task succeeded" is weaker than "the task succeeded, the build passed, the deploy completed, and the live URL returns the expected canonical and checkout CTA." The first is runtime health. The second is business proof.
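The gap between runtime health and business proof can be written down as a check that refuses to report "done" on task status alone. This is a sketch under stated assumptions: the `Evidence` fields are hypothetical, the evidence is gathered elsewhere, and the canonical and CTA checks are plain substring matches rather than real HTML parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Hypothetical evidence bundle collected by the operator or a script.
    task_status: str
    build_passed: bool
    deploy_completed: bool
    http_status: int
    page_html: str


def proof_gaps(ev: Evidence, canonical_url: str, cta_marker: str) -> list[str]:
    """List every piece of proof that is still missing; empty means done."""
    gaps = []
    if ev.task_status != "succeeded":
        gaps.append("task not succeeded")
    if not ev.build_passed:
        gaps.append("build not passed")
    if not ev.deploy_completed:
        gaps.append("deploy not completed")
    if ev.http_status != 200:
        gaps.append("live URL not 200")
    # Simplification: substring checks stand in for parsing the page.
    if canonical_url not in ev.page_html:
        gaps.append("canonical missing")
    if cta_marker not in ev.page_html:
        gaps.append("checkout CTA missing")
    return gaps
```

A task that succeeded with a green build and deploy but a page missing its CTA still returns a non-empty list, which is exactly the case the rule is meant to catch.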

Sub-agents become visible work

The sub-agent docs now make the task relationship explicit. Sub-agents are background agent runs spawned from an existing agent run, and each sub-agent run is tracked as a background task. That fits the way serious delegation should work: the parent can stay responsive, the child can run in its own session, and the operator can still inspect the detached run.

A typical spawn should be briefed like work, not like a vague thought. Give the child an outcome, a label, and a proof requirement. The child should not decide on its own what "done" means.

{
  "task": "Run the deploy check, verify the live URL, and return the exact blocker or proof.",
  "label": "deploy-check",
  "cleanup": "keep"
}

The sub-agent completion model is push-based. The spawn returns immediately, the child reports back to the parent or requester session when it finishes, and the parent decides whether a user-facing update is needed. When a turn must wait for child results, the docs point to sessions_yield: end the current turn and let the completion event arrive as the next model-visible message instead of building your own polling loop.

That is a clean shape. Start detached work once. Let the runtime wake the requester when it has a result. Use task inspection only when you need intervention, debugging, or audit proof.

I covered the delegation side in the sub-agents guide. The task ledger is the other half: it is how you verify the background run did not vanish after the parent moved on.

Delivery is part of the result

OpenClaw separates task status from notification behavior. A task can succeed while delivery fails. A completion can be delivered directly to a channel, routed through the requester session, queued for a later wake, or intentionally kept silent depending on policy and origin.

The docs list three notification policies: done_only, state_changes, and silent. For sub-agents and ACP background runs, the default is to deliver a notification when the task reaches a terminal state. Cron tasks default to silent because cron needs records without chat noise. During an incident, you can change the policy while a task is running.

openclaw tasks notify <task-id> state_changes
openclaw tasks cancel <task-id>
openclaw tasks maintenance --json

This is where operators should be picky. If a customer-facing thread needs a final update, delivery failure is not a minor footnote. If a cron should be quiet, a silent notify policy is healthy. If a long migration needs visibility, state changes can be useful for that one task. The right policy depends on the work surface, not on a generic desire to hear more from agents.
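One way to reason about the three policies is as a small decision function. The policy names come from the docs; the semantics below are my reading of those names, and the status strings are assumed, so treat this as a sketch rather than OpenClaw's actual delivery logic.

```python
# Policy names are from the docs; everything else here is an assumption.
POLICIES = ("done_only", "state_changes", "silent")

# Assumed terminal status strings.
TERMINAL = frozenset({"succeeded", "failed", "timed_out", "cancelled", "lost"})


def should_notify(policy: str, old_status: str, new_status: str) -> bool:
    """Decide whether a status transition should produce a notification."""
    if policy not in POLICIES:
        raise ValueError(f"unknown notify policy: {policy}")
    if policy == "silent" or old_status == new_status:
        return False
    if policy == "state_changes":
        return True
    # done_only: speak up only when the task reaches a terminal status.
    return new_status in TERMINAL
```

Under this reading, switching a long migration to state_changes makes the queued-to-running transition visible, while a quiet cron on silent never notifies at all.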

Lost is a real status, not a vibe

The word lost is doing important work in the task model. It does not mean "the agent is probably confused." It means the task registry no longer sees authoritative runtime backing for active work after the reconciliation window.

The docs describe runtime-aware reconciliation. ACP and sub-agent tasks check child-session state. Cron tasks check active-job ownership and durable cron run history before falling back to lost. CLI tasks with a run identity check the owning live run context. That prevents a completed cron from being marked lost just because some in-memory process-local set no longer knows about it.
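The per-runtime checks described above can be sketched as one reconciliation pass over an active task. The runtime labels other than `subagent`, the keys in the `backing` dictionary, and the return values are all hypothetical; only the order of checks (runtime-specific backing first, `lost` as the fallback after the window) follows the docs' description.

```python
def reconcile(runtime: str, backing: dict, window_elapsed: bool) -> str:
    """Return the status an active task should hold after one pass.

    `backing` holds hypothetical signals gathered from the runtime:
      child_session_active, job_owned, run_history_status, live_run_context.
    """
    if runtime in ("acp", "subagent"):
        alive = bool(backing.get("child_session_active"))
    elif runtime == "cron":
        if backing.get("job_owned"):
            return "running"
        recorded = backing.get("run_history_status")
        if recorded is not None:
            # Durable run history wins over in-memory state.
            return recorded
        alive = False
    elif runtime == "cli":
        alive = bool(backing.get("live_run_context"))
    else:
        raise ValueError(f"unknown runtime: {runtime}")

    # No backing yet, but still inside the reconciliation window: wait.
    if alive or not window_elapsed:
        return "running"
    return "lost"
```

The cron branch is the interesting one: a finished cron whose process-local state is gone resolves to its recorded outcome instead of falling through to `lost`.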

Lost tasks are exactly why a ledger matters. Without a task record, you may only notice missing work through absence: no Slack update, no deploy, no file change, no customer reply. With a task record, audit can surface the gap, and maintenance can stamp cleanup metadata or prune old terminal records on schedule.

The retention rule is simple: terminal task records are kept for 7 days, then automatically pruned. That is long enough for operational recovery without turning the task table into permanent memory. Permanent decisions, customer promises, and business facts still belong in the appropriate memory or project state files.
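As a worked illustration of the 7-day rule, here is a minimal pruning sketch. The record shape (a dict with an `ended_at` timestamp that is `None` while the task is active) is an assumption, as is pruning a record at exactly the 7-day mark.

```python
from datetime import datetime, timedelta

# Retention period from the docs; the record shape below is assumed.
RETENTION = timedelta(days=7)


def prune_terminal(records: list[dict], now: datetime) -> list[dict]:
    """Keep active records and terminal records younger than RETENTION."""
    kept = []
    for record in records:
        ended_at = record.get("ended_at")
        # Active tasks have no end time and are never pruned here.
        if ended_at is None or now - ended_at < RETENTION:
            kept.append(record)
    return kept
```

Anything that must outlive this window belongs somewhere else, which is the point the retention rule is making.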

Use the chat board for human triage

The CLI is the precise surface. The chat task board is the humane one. The background tasks docs expose a /tasks board for checking task state from a chat surface, while openclaw tasks remains the deeper inspection and audit path.

That split is useful. In a Slack thread, I usually want the short answer: what is running, what failed, what needs attention. In a shell, I want JSON, filters, run ids, delivery fields, and maintenance output. Both surfaces point at the same underlying accountability problem: detached work needs a visible control plane.

For actual triage, I use this checklist:

1. Is the task still running, or did it reach a terminal state?
2. Does the task have a child session, requester session, and run id?
3. Did delivery succeed, fail, or queue for a later wake?
4. Is there a live artifact: build output, deploy URL, commit, or state file?
5. Does audit show stale, lost, delivery-failed, or cleanup findings?

The checklist is boring on purpose. It keeps the operator from jumping straight into logs before answering the basic question: is there still a live run, and what proof exists outside the model's memory?
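The checklist can also be sketched as a function that turns one task snapshot into a short list of triage notes. The field names (`status`, `child_session`, `requester_session`, `run_id`, `delivery`) and the `"delivered"` value are hypothetical placeholders for whatever the real task output contains.

```python
# Assumed status strings for work that is still in flight.
ACTIVE_STATUSES = frozenset({"queued", "running"})

# Hypothetical field names for the links a task should carry.
LINK_FIELDS = ("child_session", "requester_session", "run_id")


def triage(task: dict, artifacts: list[str], audit_findings: list[str]) -> list[str]:
    """Walk the five checklist questions in order and collect notes."""
    notes = []

    # 1. Still running, or finished?
    status = task.get("status", "unknown")
    state = "active" if status in ACTIVE_STATUSES else "not active"
    notes.append(f"status: {status} ({state})")

    # 2. Are the links needed for deeper inspection present?
    missing = [field for field in LINK_FIELDS if not task.get(field)]
    if missing:
        notes.append("missing links: " + ", ".join(missing))

    # 3. Did delivery succeed?
    delivery = task.get("delivery", "unknown")
    if delivery != "delivered":
        notes.append(f"delivery: {delivery}")

    # 4. Is there proof outside the model's memory?
    if not artifacts:
        notes.append("no live artifact recorded")

    # 5. Anything flagged by audit?
    if audit_findings:
        notes.append("audit: " + ", ".join(audit_findings))

    return notes
```

A healthy task produces a single status line; every extra note is a question the operator still has to answer before opening logs.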

How this changes agent operations

Background tasks make agent work feel less mystical. A detached run is no longer "somewhere in the system." It has a kind, status, run id, child session, delivery behavior, and maintenance path. That lets you build stronger operating rules.

For coding work, spawn the child, record the required proof, and inspect the task if no completion arrives. For cron work, verify the cron row, then verify the task and the live artifact. For ACP harness work, remember that the background task tracks the parent-owned run, while the external harness may have its own session model. For CLI operations, treat the task as the control-plane record and the command output as evidence, not as durable memory.

The payoff is not prettier dashboards. The payoff is fewer dropped handoffs. Operators should not have to ask, "Did the agent actually do it?" The system should answer: here is the task, here is the child session, here is the delivery state, and here is the external proof.

That is how agents move from chat helpers to operating teammates. They can run work in the background, but the work remains inspectable, cancellable, auditable, and tied back to a result.

Originally published at https://www.openclawplaybook.ai/blog/openclaw-background-tasks-control-surface/
