Hassann

Posted on Jun 8 • Originally published at apidog.com

How to Build Claude Workflows That Run Without You

There’s a line going around that sums up where agentic coding is headed: the goal isn’t a better prompt, it’s a workflow that runs without you watching it. Most people use Claude like a chat window: type, wait, read, type again. That works, but it caps your output at the one agent you’re actively babysitting. The higher-leverage pattern is different: trigger a workflow, let it execute, check its results against an automated gate, and notify a human only when a decision is needed.


TL;DR

A Claude workflow that runs without supervision needs five parts:

A precise written spec
Headless execution
A deterministic verification gate
Hard guardrails
A human handoff

Claude Code headless mode (claude -p), the Claude Agent SDK, hooks, and a scheduler like cron or launchd give you the pieces. The risk is not the agent itself; it is running the agent unattended without gates, limits, and observability.

Why “runs without you” is the real goal

Supervised chat has a hard ceiling: you.

Every iteration waits for a human to read the output and decide what happens next. The model generates in seconds, then idles while you context-switch.

Unattended workflows remove that bottleneck:

trigger -> agent work -> verification gate -> retry or handoff

Once the workflow runs without supervision, you scale by adding workflows instead of typing faster. That is the same shift covered in Claude Code dynamic workflows, where one session fans out into many parallel agents.

But unattended workflows raise the stakes. A supervised agent that makes a bad edit may be caught when you read the diff. An unattended one can keep going. That means the main work shifts from prompt writing to system design: build something bounded, verifiable, and observable.

Anthropic’s article on building effective agents makes the same point: the leverage comes from the environment around the model, not from one clever prompt.

The five parts every unattended workflow needs

1. A precise spec

The agent needs a written definition of done.

Bad:

Fix the API.

Better:

Implement POST /orders.

Requirements:
- Return 201 on valid requests.
- Validate the request body against the OpenAPI schema.
- Return 422 when required fields are missing.
- Return JSON matching the response schema.
- Do not modify contract tests or the OpenAPI file.

The spec should be checked into the repo and loaded at the start of every run.

2. Headless execution

Claude must run without a human at the keyboard. That means non-interactive execution, not a chat UI.

3. A verification gate

The workflow needs a deterministic pass/fail check:

Unit tests
Integration tests
Type checks
Linting
OpenAPI contract tests
JSON schema validation
Endpoint health checks

The gate decides whether the task is done. The model does not.
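As a rough illustration, the gate can be reduced to a pure function over check results. This TypeScript sketch assumes each check (tests, typecheck, lint, contract tests) has already been run and captured as a name, an exit code, and its output; `CheckResult` and `evaluateGate` are hypothetical names, not part of any Claude tooling.

```typescript
// Hypothetical shape: one entry per check (tests, typecheck, lint, contract tests...).
interface CheckResult {
  name: string;
  exitCode: number;
  output: string;
}

interface GateVerdict {
  passed: boolean;
  failures: string[];
}

// The verdict depends only on exit codes, never on what the model says it did.
function evaluateGate(checks: CheckResult[]): GateVerdict {
  const failed = checks.filter((c) => c.exitCode !== 0);
  return {
    // An empty gate counts as a failure: no checks means nothing was verified.
    passed: checks.length > 0 && failed.length === 0,
    failures: failed.map((c) => `${c.name} (exit ${c.exitCode}):\n${c.output.trim()}`),
  };
}

const verdict = evaluateGate([
  { name: "npm test", exitCode: 0, output: "" },
  { name: "npm run typecheck", exitCode: 2, output: "src/orders.ts: type error\n" },
]);
console.log(verdict.passed); // false
console.log(verdict.failures.length); // 1
```

The `failures` strings are what the next iteration receives as feedback, so they keep each check's own output rather than a summary.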

4. Guardrails

Unattended runs need hard limits:

Tool allowlists
Max iterations
Cost caps
Sandbox/worktree isolation
Protected files
Logging
Kill switch

5. A handoff

Every run should end with a visible result:

Draft PR
Slack/Discord/email notification
Issue comment
Failure alert
Log link

Silence is not success.
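One way to guarantee there is no silent path is to make every run end in exactly one message. A minimal sketch, assuming a hypothetical `RunOutcome` type and a log URL supplied by your CI or scheduler; where the message is sent (Slack, email, an issue comment) is left to you.

```typescript
// Hypothetical run outcome; field names are illustrative.
type RunOutcome =
  | { status: "success"; prUrl: string; attempts: number }
  | { status: "failed"; attempts: number; lastFailure: string };

// Every outcome maps to exactly one message, so "no message" is impossible.
function handoffMessage(outcome: RunOutcome, logUrl: string): string {
  if (outcome.status === "success") {
    return `Draft PR created after ${outcome.attempts} attempt(s): ${outcome.prUrl}\nLogs: ${logUrl}`;
  }
  // Narrowed to the "failed" variant here.
  return `Run failed after ${outcome.attempts} attempt(s). Last gate output:\n${outcome.lastFailure}\nLogs: ${logUrl}`;
}

console.log(
  handoffMessage(
    { status: "failed", attempts: 5, lastFailure: "Expected 422 on missing customer_id, got 500." },
    "https://ci.example.com/runs/123", // hypothetical log URL
  ),
);
```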

Claude building blocks

Headless mode with `claude -p`

Claude Code’s print mode runs a prompt non-interactively and exits. This is the base primitive for unattended workflows.

claude -p "Implement the orders endpoint per spec.md, then run the test suite" \
  --allowedTools "Edit,Write,Bash" \
  --output-format json \
  >> run.log 2>&1

The important flag is --allowedTools.

In the chat UI, you approve actions manually. In headless mode, there is no human approval step, so the allowlist becomes your control boundary.

Start narrow:

--allowedTools "Edit,Write"

Only add shell access when the workflow needs it:

--allowedTools "Edit,Write,Bash"

See the full option set in the Claude Code docs.
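With `--output-format json`, a single run's stdout can be parsed into a small summary for logs and cost tracking. The sketch below assumes the result object carries `type`, `is_error`, `result`, `total_cost_usd`, and `num_turns`, which matches the documented output at the time of writing; check the fields against the Claude Code version you run. `summarizeRun` is a hypothetical helper.

```typescript
// Assumed subset of the `claude -p --output-format json` result object.
interface HeadlessResult {
  type?: string;
  subtype?: string;
  is_error?: boolean;
  result?: string;
  total_cost_usd?: number;
  num_turns?: number;
}

interface RunSummary {
  ok: boolean;
  text: string;
  costUsd: number;
  turns: number;
}

// Summarize one run's stdout. Anything unparseable counts as a failed run,
// so a crash or truncated output can never look like a success.
function summarizeRun(stdout: string): RunSummary {
  const failed: RunSummary = { ok: false, text: stdout.slice(0, 500), costUsd: 0, turns: 0 };
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return failed;
  }
  if (typeof parsed !== "object" || parsed === null) return failed;

  const r = parsed as HeadlessResult;
  return {
    ok: r.type === "result" && r.is_error === false,
    text: r.result ?? "",
    costUsd: r.total_cost_usd ?? 0,
    turns: r.num_turns ?? 0,
  };
}
```

Note that `ok` only says the CLI run ended without an error. It says nothing about whether the task is done; that is still the gate's job.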

Use the Claude Agent SDK for controlled loops

For anything more complex than one shell command, use the Claude Agent SDK.

The SDK lets you drive Claude from code and wrap your own loop around it:

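import { query } from "@anthropic-ai/claude-agent-sdk";
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

const MAX_ITERATIONS = 5;

// The spec is checked into the repo and loaded at the start of every run.
const task = readFileSync("spec.md", "utf8");

// Deterministic gate: the exit code decides, not the model.
// Add typecheck, lint, or contract tests the same way.
function runVerification(): { passed: boolean; failures: string } {
  const result = spawnSync("npm", ["test", "--silent"], { encoding: "utf8" });
  return {
    passed: result.status === 0,
    failures: `${result.stdout ?? ""}\n${result.stderr ?? ""}`.trim(),
  };
}

// Placeholder handoff: replace with Slack, email, or an issue comment.
function notifyHuman(report: { status: string; reason: string }): void {
  console.error(JSON.stringify(report));
}

let feedback = "";
let passed = false;

for (let attempt = 0; attempt < MAX_ITERATIONS; attempt++) {
  // Only append the failure section once there is something to report.
  const prompt = feedback
    ? `${task}\n\nPrevious verification failures:\n${feedback}`
    : task;

  for await (const msg of query({
    prompt,
    options: {
      allowedTools: ["Edit", "Write", "Bash"],
      maxTurns: 30, // cap agentic turns within a single attempt
    },
  })) {
    // Stream or persist agent events here.
    console.log(msg);
  }

  const gate = runVerification();

  if (gate.passed) {
    passed = true;
    break;
  }

  feedback = gate.failures;
}

if (!passed) {
  notifyHuman({
    status: "failed",
    reason: feedback,
  });
}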

The structure matters more than the exact implementation:

run agent -> run gate -> feed failures back -> retry -> stop or handoff
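To test that structure without calling Claude, the agent and the gate can be injected as plain async functions. In this sketch `runAgent` and `runGate` are hypothetical parameters: in production, `runAgent` would wrap `query()` from the SDK and `runGate` would run your verification commands.

```typescript
interface GateResult {
  passed: boolean;
  failures: string;
}

type LoopResult =
  | { status: "passed"; attempts: number }
  | { status: "escalate"; attempts: number; lastFailure: string };

// run agent -> run gate -> feed failures back -> retry -> stop or handoff
async function runLoop(
  task: string,
  runAgent: (prompt: string) => Promise<void>,
  runGate: () => Promise<GateResult>,
  maxIterations: number,
): Promise<LoopResult> {
  let feedback = "";
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    const prompt = feedback
      ? `${task}\n\nPrevious verification failures:\n${feedback}`
      : task;
    await runAgent(prompt);
    const gate = await runGate();
    if (gate.passed) return { status: "passed", attempts: attempt };
    feedback = gate.failures;
  }
  // Cap reached: hand the last failure to a human instead of looping forever.
  return { status: "escalate", attempts: maxIterations, lastFailure: feedback };
}
```

Because the cap and the verdict live in `runLoop`, a test can drive the whole retry path with fakes, for example a gate that fails twice and then passes.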

If you are choosing between your own loop and a hosted setup, this comparison of managed agents vs the Agent SDK explains when each approach fits.

Use hooks for deterministic guardrails

Hooks are commands that Claude Code runs at fixed points in its lifecycle, configured in a settings file such as .claude/settings.json. They are useful because they do not depend on the model deciding to do the right thing.

For example, run tests after every edit:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent"
          }
        ]
      }
    ]
  }
}

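Because the hook is plain code, it fires on every matching tool call; the agent cannot choose to skip it. One caveat: a failing check only reaches the model if the command exits with code 2, in which case its stderr is fed back to Claude. Other non-zero exit codes are treated as non-blocking errors and shown to the user, not the agent.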

Use hooks for checks like:

npm test --silent
npm run typecheck
npm run lint
pytest
go test ./...
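Since only exit code 2 feeds a hook's stderr back to Claude, a check that fails with an ordinary non-zero code needs translating. This sketch assumes the Claude Code convention that a PostToolUse hook exiting with 2 sends its stderr to the model; `toHookResponse` is a hypothetical helper that a small wrapper script would call after running a check like `npm test --silent`.

```typescript
// Assumed Claude Code hook convention: exit 0 = success, exit 2 = blocking error
// whose stderr is shown to Claude, any other code = non-blocking error.
interface HookResponse {
  exitCode: 0 | 2;
  stderr: string;
}

// Translate an ordinary check result (npm test, tsc, pytest...) into a hook response.
function toHookResponse(checkName: string, checkExitCode: number, output: string): HookResponse {
  if (checkExitCode === 0) {
    return { exitCode: 0, stderr: "" };
  }
  return {
    exitCode: 2, // 2 rather than 1, so the failure reaches the model instead of only the log
    stderr: `${checkName} failed with exit code ${checkExitCode}. Fix the code, not the check.\n${output.trim()}`,
  };
}
```

The wrapper would run the check, call this, write `stderr` to standard error, and exit with `exitCode`.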

Trigger runs with cron or launchd

A workflow that runs without you needs a trigger.

On a server, use cron:

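# every weekday at 7am: run the maintenance workflow and log everything
# crontab entries must fit on one line: cron does not support "\" line continuations
0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-maintenance.md)" --allowedTools "Edit,Bash" >> logs/run-$(date +\%F).log 2>&1

Cron also runs with a minimal environment, so use the absolute path to claude if it is not on cron's PATH, and make sure the job can see its credentials, such as ANTHROPIC_API_KEY.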

That gives you the basic spine:

cron -> headless Claude -> spec -> edits -> gate -> logs -> handoff

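For local macOS automation, use launchd instead of cron. It is Apple’s recommended scheduler, and unlike cron it starts a calendar job that was missed while the Mac was asleep as soon as the Mac wakes.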

Design the loop, not the prompt

The most useful question is not:

What should I tell Claude?

It is:

What loop makes Claude correct itself?

A coding agent is a fast generator. It does not have a reliable built-in sense of correctness. Your verification gate supplies that signal.

This is the core idea in stop prompting your coding agent, build the loop instead: the model’s confidence does not matter. The gate’s verdict does.

A stable spec also beats a clever prompt. A design.md or AGENTS.md file gives the agent a repeatable target:

- Goal
- Constraints
- Files it may edit
- Files it must not edit
- Definition of done
- Verification command
- Escalation conditions
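Those sections can be enforced before a run starts. A minimal sketch, assuming the spec is Markdown with one heading per section; the default list mirrors the bullets above, and `missingSections` is a hypothetical helper. Pass your own list if your task files use other headings, as the worked example's task file does (Goal, Scope, Definition of done, Failure handling).

```typescript
// Hypothetical required sections, mirroring the list above.
const REQUIRED_SECTIONS = [
  "Goal",
  "Constraints",
  "Files it may edit",
  "Files it must not edit",
  "Definition of done",
  "Verification command",
  "Escalation conditions",
];

// Return the sections a Markdown spec is missing, matching "#"-style headings
// case-insensitively. Refuse to start a run while this list is non-empty.
function missingSections(spec: string, required: string[] = REQUIRED_SECTIONS): string[] {
  const headings = new Set(
    spec
      .split("\n")
      .map((line) => line.match(/^#{1,6}\s+(.*\S)\s*$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1].toLowerCase()),
  );
  return required.filter((s) => !headings.has(s.toLowerCase()));
}
```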

Worked example: unattended API maintenance

Suppose you want a workflow that keeps API endpoints aligned with an OpenAPI spec, runs every morning, and never ships a broken endpoint.

1. Write the spec

The contract lives in an OpenAPI file. Behavior is covered by tests.

Example task file:

# Nightly API maintenance

## Goal

Keep implementation aligned with openapi.yaml.

## Scope

Only update endpoint implementation files under:

- src/routes
- src/controllers
- src/validators

Do not edit:

- openapi.yaml
- tests
- package.json
- lockfiles

## Definition of done

The run passes:

- npm test
- npm run typecheck
- OpenAPI contract tests

## Failure handling

If the same gate fails after 5 attempts, stop and notify a human.

2. Trigger the workflow

0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-api-maintenance.md)" --allowedTools "Edit,Write,Bash" --output-format json >> logs/api-maintenance-$(date +\%F).log 2>&1

3. Let the agent reconcile implementation

The agent can:

Add missing endpoints
Fix response shapes
Tighten validation
Update controller logic
Repair schema mismatches

4. Run the verification gate

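The workflow runs the API checks against the service. Run them from the wrapper script or SDK loop, not only inside the agent session, so the verdict comes from outside the model: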

npm test
npm run typecheck
npm run contract:test

Failures should be structured enough to feed back into the next iteration:

Expected 422 on missing customer_id, got 500.

Response field total is a string, schema says number.
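Failures in that form can come straight from a contract check. The sketch below is a simplified stand-in, assuming the expected status and field types for one test case are already known; a real gate would validate responses against openapi.yaml with a schema validator (or an Apidog test scenario). `ContractCase` and `checkContract` are hypothetical.

```typescript
// Simplified stand-in for a contract test; real gates validate against openapi.yaml.
type JsonType = "string" | "number" | "boolean" | "object" | "array" | "null";

interface ContractCase {
  description: string; // e.g. "missing customer_id"
  expectedStatus: number;
  fieldTypes?: Record<string, JsonType>;
}

function jsonType(value: unknown): string {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// One line per violation, worded so it can be pasted into the next prompt.
function checkContract(c: ContractCase, status: number, body: Record<string, unknown>): string[] {
  const failures: string[] = [];
  if (status !== c.expectedStatus) {
    failures.push(`Expected ${c.expectedStatus} on ${c.description}, got ${status}.`);
  }
  const fieldTypes: Record<string, JsonType> = c.fieldTypes ?? {};
  for (const [field, expected] of Object.entries(fieldTypes)) {
    if (!(field in body)) {
      failures.push(`Response field ${field} is missing, schema says ${expected}.`);
    } else if (jsonType(body[field]) !== expected) {
      failures.push(`Response field ${field} has type ${jsonType(body[field])}, schema says ${expected}.`);
    }
  }
  return failures;
}
```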

5. Loop or escalate

If the gate fails, feed the failure back into the agent:

The verification gate failed.

Failure:
Expected 422 on missing customer_id, got 500.

Patch only the validation path for POST /orders.
Do not edit tests or openapi.yaml.

If the gate passes, open a draft PR.

If the run reaches the iteration cap, stop and notify a human.
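The decision after each gate run, and the narrow retry prompt above, can both be generated rather than written by hand. A sketch with hypothetical names (`nextAction`, `retryPrompt`); given the same failure, scope line, and protected paths, it reproduces the example prompt exactly.

```typescript
type NextAction = "open_draft_pr" | "retry" | "escalate";

// Decided in code after every gate run, so the iteration cap never depends on the model.
function nextAction(gatePassed: boolean, attempt: number, maxAttempts: number): NextAction {
  if (gatePassed) return "open_draft_pr";
  return attempt >= maxAttempts ? "escalate" : "retry";
}

// Builds the retry prompt from the gate's failure, the allowed scope, and the protected paths.
function retryPrompt(failure: string, scope: string, protectedPaths: string[]): string {
  return [
    "The verification gate failed.",
    "",
    "Failure:",
    failure,
    "",
    scope,
    `Do not edit ${protectedPaths.join(" or ")}.`,
  ].join("\n");
}

console.log(
  retryPrompt(
    "Expected 422 on missing customer_id, got 500.",
    "Patch only the validation path for POST /orders.",
    ["tests", "openapi.yaml"],
  ),
);
```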

6. Handoff

A human should receive one of two outcomes:

Success: draft PR created with passing verification logs.

or:

Failure: workflow stopped after 5 attempts. Last gate failure attached.

The gate is what makes this safe to run unattended. Without it, the agent edits code and reports success based on its own judgment.

For API workflows, Apidog fits well as the verification layer: API design, schemas, mock servers, and automated tests live in one workspace, so the spec and gate stay aligned. You can point the run at an Apidog test scenario and give the agent schema-validated pass/fail feedback on every iteration. The mock server can also stand in for dependencies during unattended runs.

Teams that wire endpoint access through the Apidog AI agent debugger let the agent inspect endpoints in the same way a human tester would. If you prefer a visual gate instead of a hand-rolled runner, download Apidog.

Guardrails for unattended runs

Use these before you let a workflow run overnight.

Narrow tool allowlists

Do not give unattended agents broad access by default.

Prefer:

--allowedTools "Edit,Write"

Use shell access only when required:

--allowedTools "Edit,Write,Bash"

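Bare Bash permits any shell command. Where possible, scope it with permission rules such as Bash(npm test:*), and avoid unrestricted destructive commands unless the run is isolated.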

Bound iterations

A workflow that cannot pass after a few attempts should stop.

const MAX_ITERATIONS = 5;

Do not let loops run forever. Within a single claude -p run, --max-turns caps the number of agentic turns.

Add a cost ceiling

Unattended loops can burn tokens without anyone noticing.

Track spend per run and stop when the workflow exceeds a limit. The same practices in reducing agent token costs apply directly here.
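A simple budget object makes the ceiling explicit. This sketch assumes you can read a cost for each agent call, for example the `total_cost_usd` field in Claude Code's JSON result (verify the field name for your version); `CostBudget` and the limit you pass it are hypothetical.

```typescript
// Hypothetical per-run spend ceiling in US dollars.
class CostBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Record one call's cost; returns false once the run has exceeded its limit.
  record(costUsd: number): boolean {
    if (!Number.isFinite(costUsd) || costUsd < 0) {
      throw new Error(`invalid cost: ${costUsd}`);
    }
    this.spentUsd += costUsd;
    return this.spentUsd <= this.limitUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```

Check the return value after every call and stop the loop, with a handoff, as soon as it is `false`.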

Protect the gate

Do not let the agent edit:

Tests
OpenAPI specs
Verification scripts
CI configuration
Approval logic

If the agent can rewrite the test to pass, the gate is not a gate.
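The list can be enforced mechanically with a PreToolUse hook, registered with a matcher such as "Edit|Write", which blocks the call by exiting with code 2. This sketch assumes the hook receives JSON on stdin with `tool_input.file_path` as an absolute path, and leaves out the stdin and exit wiring; `guardEdit` and the `PROTECTED` list are hypothetical and mirror the worked example.

```typescript
// Assumed PreToolUse input: Edit/Write calls carry an absolute tool_input.file_path.
interface HookInput {
  tool_name: string;
  tool_input: { file_path?: string };
}

// Hypothetical protected set for the worked example: the gate and its inputs.
const PROTECTED = ["openapi.yaml", "package.json", "package-lock.json", "tests/", ".github/"];

// Exit 2 blocks the tool call and shows stderr to Claude; exit 0 lets it through.
function guardEdit(input: HookInput, repoRoot: string): { exitCode: 0 | 2; stderr: string } {
  const path = input.tool_input.file_path;
  if (path === undefined) return { exitCode: 0, stderr: "" };

  const root = repoRoot.endsWith("/") ? repoRoot : `${repoRoot}/`;
  const rel = path.startsWith(root) ? path.slice(root.length) : null;
  // Anything outside the workspace, or escaping it via "..", is refused outright.
  if (rel === null || rel.split("/").includes("..")) {
    return { exitCode: 2, stderr: `${path} is outside the agent workspace.` };
  }

  const hit = PROTECTED.find((p) => (p.endsWith("/") ? rel.startsWith(p) : rel === p));
  if (hit !== undefined) {
    return { exitCode: 2, stderr: `${rel} is part of the verification gate and must not be edited.` };
  }
  return { exitCode: 0, stderr: "" };
}
```

A hook only sees the tools it matches. If Bash is allowed, the agent can still change files through the shell, so re-check `git diff --name-only` against the same list before the gate runs.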

Run in a sandbox

Use an isolated workspace:

git worktree add ../api-agent-run feature/agent-maintenance

or a disposable branch/container.

Never let an unattended workflow work directly on main.

Log every run

Capture:

Prompt/spec
Tool calls
Files changed
Verification output
Iteration count
Final status

Example:

mkdir -p logs

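# stream-json emits every message, including tool calls; in print mode it requires --verbose
claude -p "$(cat tasks/nightly-api-maintenance.md)" \
  --allowedTools "Edit,Write,Bash" \
  --output-format stream-json --verbose \
  >> "logs/run-$(date +%F-%H%M%S).log" 2>&1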
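If you log with `--output-format stream-json` (in print mode it also needs `--verbose`), each line is one message, and tool calls can be counted from the log. The message shape used here (assistant messages whose `message.content` holds `tool_use` blocks with a `name`) is an assumption about the stream format; check it against a real log before relying on it. `countToolCalls` is a hypothetical helper.

```typescript
// Assumed stream-json shape: one JSON message per line; assistant messages
// carry message.content blocks, and tool calls are blocks of type "tool_use".
interface StreamMessage {
  type?: string;
  message?: { content?: Array<{ type?: string; name?: string }> };
}

function parseLine(line: string): StreamMessage | null {
  try {
    return JSON.parse(line) as StreamMessage;
  } catch {
    return null; // non-JSON lines, such as stderr appended to the same log
  }
}

// Count tool calls by tool name across one run's log.
function countToolCalls(log: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of log.split("\n")) {
    if (line.trim() === "") continue;
    const msg = parseLine(line);
    const content = msg?.message?.content;
    if (msg?.type !== "assistant" || !Array.isArray(content)) continue;
    for (const block of content) {
      if (block?.type === "tool_use" && typeof block.name === "string") {
        counts.set(block.name, (counts.get(block.name) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```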

Keep a kill switch

You need a way to stop a bad run quickly:

pkill -f "claude -p"

For scheduled workflows, also keep the cron or launchd entry easy to disable.

Put humans at the edges

“Without you” does not mean “without review.”

Use humans for:

Approving the task before automation starts
Reviewing the draft PR after automation finishes
Handling escalation when the gate fails repeatedly

Do not put humans inside the inner retry loop unless necessary.

The wiring patterns and failure modes here are similar to the ones covered in agentic workflow tool wiring.

Common mistakes

No verification gate

If the only check is:

Claude, did you finish?

you do not have an autonomous workflow. You have an unsupervised chatbot.

The gate must be external to the model.

One giant task

This usually fails:

Maintain the whole service.

Prefer small, bounded tasks:

Update POST /orders to match openapi.yaml and pass contract tests.

Small workflows converge. Large ones thrash.

Wide-open permissions

This is convenient but dangerous:

--allowedTools "Edit,Write,Bash,Read,WebFetch"

Grant only what the task needs.

Silent success or failure

A workflow should never commit, fail, or stop without telling anyone.

Always emit a handoff:

Draft PR created.

or:

Run failed after 5 attempts. Last gate output attached.

Trusting the model’s self-report

The agent will often say it is done. That is not enough.

Use this rule:

The model proposes. The gate decides.

If you want the deeper architecture, this breakdown of agent harness design shows how the pieces fit at scale.

The takeaway

Claude workflows that run without you are mostly a systems problem.

You need:

A precise spec
Headless execution
A deterministic verification gate
Hard guardrails
A clean handoff

Start with one workflow. Write a tight spec, run Claude headlessly, verify with a fast gate, allowlist the tools, cap the iterations, isolate the workspace, and notify a human on finish or failure.

For API work, your automated tests are the safety gate. Apidog gives you API design, mocking, and automated testing in one workspace, so you can build that gate without hand-rolling every piece. Download it, wire the gate, and let the workflow run while you do something else.
