DEV Community

Hassann

Originally published at apidog.com

How to Build Claude Workflows That Run Without You

There’s a line going around that sums up where agentic coding is headed: the goal isn’t a better prompt, it’s a workflow that runs without you watching it. Most people use Claude like a chat window: type, wait, read, type again. That works, but it caps your output at one agent you’re actively babysitting. The higher-leverage pattern is different: trigger a workflow, let it execute, check its results against a deterministic gate, and notify a human only when a decision is needed.


TL;DR

A Claude workflow that runs without supervision needs five parts:

  1. A precise written spec
  2. Headless execution
  3. A deterministic verification gate
  4. Hard guardrails
  5. A human handoff

Claude Code headless mode (claude -p), the Claude Agent SDK, hooks, and a scheduler like cron or launchd give you the pieces. The risk is not the agent itself; it is running the agent unattended without gates, limits, and observability.

Why “runs without you” is the real goal

Supervised chat has a hard ceiling: you.

Every iteration waits for a human to read the output and decide what happens next. The model generates in seconds, then idles while you context-switch.

Unattended workflows remove that bottleneck:

trigger -> agent work -> verification gate -> retry or handoff

Once the workflow runs without supervision, you scale by adding workflows instead of typing faster. That is the same shift covered in Claude Code dynamic workflows, where one session fans out into many parallel agents.

But unattended workflows raise the stakes. A supervised agent that makes a bad edit may be caught when you read the diff. An unattended one can keep going. That means the main work shifts from prompt writing to system design: build something bounded, verifiable, and observable.

Anthropic’s article on building effective agents makes a related point: much of the leverage comes from the environment around the model (its tools, feedback, and checks), not from one clever prompt.

The five parts every unattended workflow needs

1. A precise spec

The agent needs a written definition of done.

Bad:

Fix the API.

Better:

Implement POST /orders.

Requirements:
- Return 201 on valid requests.
- Validate the request body against the OpenAPI schema.
- Return 422 when required fields are missing.
- Return JSON matching the response schema.
- Do not modify contract tests or the OpenAPI file.

The spec should be checked into the repo and loaded at the start of every run.
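A minimal sketch of loading the spec at the start of each run. loadSpec and buildPrompt are hypothetical helpers. readFileSync throws if the file is missing, and the explicit empty check catches a spec that exists but says nothing. Either way, a misconfigured run stops before the agent starts.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: read the checked-in spec and refuse to run without one.
// readFileSync throws on a missing file; the check below handles an empty one.
export function loadSpec(path: string): string {
  const spec = readFileSync(path, "utf8").trim();
  if (spec.length === 0) {
    throw new Error(`Spec at ${path} is empty; refusing to start an unattended run.`);
  }
  return spec;
}

// Hypothetical helper: every run gets the same spec; gate feedback is appended only on retries.
export function buildPrompt(spec: string, feedback = ""): string {
  if (feedback.trim() === "") {
    return spec;
  }
  return `${spec}\n\nPrevious verification failures:\n${feedback.trim()}`;
}
```

Keeping the spec text identical across runs means any change in behavior comes from the code or the gate feedback, not from a reworded prompt.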

2. Headless execution

Claude must run without a human at the keyboard. That means non-interactive execution, not a chat UI.

3. A verification gate

The workflow needs a deterministic pass/fail check:

  • Unit tests
  • Integration tests
  • Type checks
  • Linting
  • OpenAPI contract tests
  • JSON schema validation
  • Endpoint health checks

The gate decides whether the task is done. The model does not.
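One way to keep the gate deterministic is to run each check as a subprocess and treat its exit code as the verdict. A minimal sketch using Node's child_process.spawnSync. The Check shape and the runGate name are hypothetical, and the commands are whatever your project already runs.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical shape for one gate check: a label plus the command to run.
export interface Check {
  name: string;
  command: string;
  args: string[];
}

export interface GateResult {
  passed: boolean;
  failures: string; // Combined output of failing checks, to feed back to the agent.
}

// Run every check. The exit code decides pass/fail, not the model's opinion.
export function runGate(checks: Check[]): GateResult {
  const failures: string[] = [];
  for (const check of checks) {
    const result = spawnSync(check.command, check.args, { encoding: "utf8" });
    if (result.status !== 0) {
      // status is null if the command could not be started; that also counts as a failure.
      const output = `${result.stdout ?? ""}${result.stderr ?? ""}${result.error?.message ?? ""}`.trim();
      failures.push(`[${check.name}] exit ${result.status}\n${output}`);
    }
  }
  return { passed: failures.length === 0, failures: failures.join("\n\n") };
}
```

Usage might look like runGate([{ name: "unit", command: "npm", args: ["test", "--silent"] }, { name: "types", command: "npm", args: ["run", "typecheck"] }]). Labeling each failure keeps the feedback specific enough for the next iteration to act on.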

4. Guardrails

Unattended runs need hard limits:

  • Tool allowlists
  • Max iterations
  • Cost caps
  • Sandbox/worktree isolation
  • Protected files
  • Logging
  • Kill switch

5. A handoff

Every run should end with a visible result:

  • Draft PR
  • Slack/Discord/email notification
  • Issue comment
  • Failure alert
  • Log link

Silence is not success.
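A minimal sketch of a handoff that fires on both outcomes. It assumes a Slack incoming webhook, which accepts a JSON body with a text field. The RunOutcome shape and the formatHandoff and sendHandoff helpers are hypothetical. sendHandoff relies on the global fetch available in Node 18 and later.

```typescript
// Hypothetical outcome record for one unattended run.
export interface RunOutcome {
  status: "success" | "failed";
  workflow: string;
  attempts: number;
  detail: string; // PR URL on success, last gate failure on failure.
  logPath: string;
}

// Build one message per run, so a human always sees how the run ended.
export function formatHandoff(o: RunOutcome): string {
  const head =
    o.status === "success"
      ? `SUCCESS ${o.workflow}: passed after ${o.attempts} attempt(s).`
      : `FAILED ${o.workflow}: stopped after ${o.attempts} attempt(s).`;
  return `${head}\n${o.detail}\nLog: ${o.logPath}`;
}

// Post the message to a Slack incoming webhook. Throws on a non-2xx response,
// so a broken notification path does not fail silently either.
export async function sendHandoff(webhookUrl: string, text: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    throw new Error(`Handoff failed with HTTP ${res.status}`);
  }
}
```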

Claude building blocks

Headless mode with claude -p

Claude Code’s print mode runs a prompt non-interactively and exits. This is the base primitive for unattended workflows.

claude -p "Implement the orders endpoint per spec.md, then run the test suite" \
  --allowedTools "Edit,Write,Bash" \
  --output-format json \
  >> run.log 2>&1

The important flag is --allowedTools.

In an interactive session, you approve tool calls as they come up. In headless mode, nobody is there to approve them, so the allowlist becomes your control boundary.

Start narrow:

--allowedTools "Edit,Write"

Only add shell access when the workflow needs it:

--allowedTools "Edit,Write,Bash"

See the full option set in the Claude Code docs.
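With --output-format json, claude -p prints one JSON result object when the run finishes. The following is a minimal sketch for turning that object into a run summary. It assumes the fields is_error, result, num_turns, and total_cost_usd that current Claude Code versions emit; verify them against your installed version. It also assumes stdout is captured on its own. The 2>&1 redirect above mixes stderr into the same log, which can leave a file that is no longer valid JSON. summarizeHeadlessRun is a hypothetical helper.

```typescript
// Fields assumed from the `claude -p --output-format json` result object; verify for your version.
interface HeadlessResult {
  is_error?: boolean;
  result?: string;
  num_turns?: number;
  total_cost_usd?: number;
}

export interface RunSummary {
  ok: boolean;
  turns: number;
  costUsd: number;
  finalMessage: string;
}

// Parse captured stdout. Unparseable output counts as a failed run, never a silent success.
export function summarizeHeadlessRun(stdout: string): RunSummary {
  let parsed: HeadlessResult | null;
  try {
    parsed = JSON.parse(stdout) as HeadlessResult | null;
  } catch {
    parsed = null;
  }
  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, turns: 0, costUsd: 0, finalMessage: "unparseable output" };
  }
  return {
    ok: parsed.is_error === false,
    turns: parsed.num_turns ?? 0,
    costUsd: parsed.total_cost_usd ?? 0,
    finalMessage: parsed.result ?? "",
  };
}
```

Here ok only means the agent run itself did not error. Whether the task is actually done is still the gate’s call.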

Use the Claude Agent SDK for controlled loops

For anything more complex than one shell command, use the Claude Agent SDK.

The SDK lets you drive Claude from code and wrap your own loop around it:

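import { readFileSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { query } from "@anthropic-ai/claude-agent-sdk";

const MAX_ITERATIONS = 8;

// The task spec is checked into the repo and loaded at the start of every run.
const task = readFileSync("spec.md", "utf8");

// Placeholder gate: swap in your own test, typecheck, and contract commands.
function runVerification(): { passed: boolean; failures: string } {
  const result = spawnSync("npm", ["test", "--silent"], { encoding: "utf8" });
  return {
    passed: result.status === 0,
    failures: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}

// Placeholder handoff: swap in Slack, email, or an issue comment.
function notifyHuman(report: { status: string; reason: string }): void {
  console.error(JSON.stringify(report));
}

let feedback = "";
let passed = false;

for (let attempt = 0; attempt < MAX_ITERATIONS; attempt++) {
  for await (const msg of query({
    prompt: `
${task}

Previous verification failures:
${feedback || "none"}
`,
    options: {
      allowedTools: ["Edit", "Write", "Bash"],
    },
  })) {
    // Stream or persist agent events here.
    console.log(msg);
  }

  // The gate, not the agent's own report, decides whether this attempt passed.
  const gate = runVerification();

  if (gate.passed) {
    passed = true;
    break;
  }

  // Feed the concrete failures into the next attempt.
  feedback = gate.failures;
}

if (!passed) {
  notifyHuman({
    status: "failed",
    reason: feedback,
  });
}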

The structure matters more than the exact implementation:

run agent -> run gate -> feed failures back -> retry -> stop or handoff

If you are choosing between your own loop and a hosted setup, this comparison of managed agents vs the Agent SDK explains when each approach fits.

Use hooks for deterministic guardrails

Hooks run your commands at fixed points in Claude’s lifecycle. They are useful because they do not depend on the model deciding to do the right thing.

For example, run tests after every edit by adding a PostToolUse hook to .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent"
          }
        ]
      }
    ]
  }
}

Because the hook is plain code, it always fires. The agent cannot choose to skip it.

Use hooks for checks like:

npm test --silent
npm run typecheck
npm run lint
pytest
go test ./...
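One caveat: Claude Code’s hooks reference treats exit code 2 as the blocking case. For PostToolUse, the hook’s stderr is then shown to Claude. Other non-zero exit codes are non-blocking errors that Claude does not see. If you want a failing check to reach the agent, a small wrapper can translate failure into exit code 2. The following is a minimal sketch. The runCheckForHook name is hypothetical, and so is the invocation, for example a hook command like npx tsx hooks/gate.ts npm test --silent.

```typescript
import { spawnSync } from "node:child_process";

// Run one check. On failure, return its output plus exit code 2,
// which Claude Code hooks treat as "show stderr to Claude".
export function runCheckForHook(command: string, args: string[]): { code: number; stderr: string } {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.status === 0) {
    return { code: 0, stderr: "" };
  }
  const output = `${result.stdout ?? ""}${result.stderr ?? ""}`.trim();
  return { code: 2, stderr: `Check failed: ${command} ${args.join(" ")}\n${output}` };
}

// When invoked with a command, e.g. `npx tsx hooks/gate.ts npm test --silent`,
// run it and exit with the translated code. With no arguments, do nothing.
const [cmd, ...cmdArgs] = process.argv.slice(2);
if (cmd !== undefined) {
  const { code, stderr } = runCheckForHook(cmd, cmdArgs);
  if (stderr) process.stderr.write(stderr + "\n");
  process.exit(code);
}
```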

Trigger runs with cron or launchd

A workflow that runs without you needs a trigger.

On a server, use cron:

# every weekday at 7am: run the maintenance workflow and log everything
# crontab has no line continuation, so each entry must stay on one line
# cron uses a minimal PATH; use the full path to claude if it is not found
0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-maintenance.md)" --allowedTools "Edit,Bash" >> logs/run-$(date +\%F).log 2>&1

That gives you the basic spine:

cron -> headless Claude -> spec -> edits -> gate -> logs -> handoff

For local macOS automation, use launchd instead of cron.

Design the loop, not the prompt

The most useful question is not:

What should I tell Claude?

It is:

What loop makes Claude correct itself?

A coding agent is a fast generator. It does not have a reliable built-in sense of correctness. Your verification gate supplies that signal.

This is the core idea in “Stop prompting your coding agent, build the loop instead”: the model’s confidence does not matter. The gate’s verdict does.

A stable spec also beats a clever prompt. A design.md or AGENTS.md file gives the agent a repeatable target:

- Goal
- Constraints
- Files it may edit
- Files it must not edit
- Definition of done
- Verification command
- Escalation conditions
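Because the spec is the stable target, it is worth checking its shape before a run starts. The following is a minimal sketch. It assumes the spec uses "## " markdown headings named after the fields above. The heading names and the missingSections helper are hypothetical.

```typescript
// Sections a task spec must define before an unattended run may start (hypothetical names).
export const REQUIRED_SECTIONS = [
  "Goal",
  "Constraints",
  "Files it may edit",
  "Files it must not edit",
  "Definition of done",
  "Verification command",
  "Escalation conditions",
];

// Return the required "## " headings the spec is missing, compared case-insensitively.
export function missingSections(spec: string, required: string[] = REQUIRED_SECTIONS): string[] {
  const headings = new Set(
    spec
      .split("\n")
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase()),
  );
  return required.filter((name) => !headings.has(name.toLowerCase()));
}
```

A scheduler wrapper could refuse to start the run whenever missingSections returns a non-empty list, which turns a vague spec into a visible failure instead of a wandering agent.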

Worked example: unattended API maintenance

Suppose you want a workflow that keeps API endpoints aligned with an OpenAPI spec, runs every morning, and never ships a broken endpoint.

1. Write the spec

The contract lives in an OpenAPI file. Behavior is covered by tests.

Example task file:

# Nightly API maintenance

## Goal

Keep implementation aligned with openapi.yaml.

## Scope

Only update endpoint implementation files under:

- src/routes
- src/controllers
- src/validators

Do not edit:

- openapi.yaml
- tests
- package.json
- lockfiles

## Definition of done

The run passes:

- npm test
- npm run typecheck
- OpenAPI contract tests

## Failure handling

If the same gate fails after 5 attempts, stop and notify a human.

2. Trigger the workflow

0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-api-maintenance.md)" --allowedTools "Edit,Write,Bash" --output-format json >> logs/api-maintenance-$(date +\%F).log 2>&1

3. Let the agent reconcile implementation

The agent can:

  • Add missing endpoints
  • Fix response shapes
  • Tighten validation
  • Update controller logic
  • Repair schema mismatches

4. Run the verification gate

The workflow runs API tests against the service:

npm test
npm run typecheck
npm run contract:test

Failures should be structured enough to feed back into the next iteration:

Expected 422 on missing customer_id, got 500.
Response field total is a string, schema says number.

5. Loop or escalate

If the gate fails, feed the failure back into the agent:

The verification gate failed.

Failure:
Expected 422 on missing customer_id, got 500.

Patch only the validation path for POST /orders.
Do not edit tests or openapi.yaml.

If the gate passes, open a draft PR.

If the run reaches the iteration cap, stop and notify a human.
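The loop-or-escalate rule is small enough to write down explicitly. The following is a minimal sketch. It assumes the gate returns a list of failure strings and a cap of 5 attempts, as in the spec. nextAction and retryPrompt are hypothetical helpers, and the prompt layout follows the feedback example above.

```typescript
export type Action = "open_draft_pr" | "retry" | "escalate";

// Decide what happens after each gate run. `attempt` is 1-based.
export function nextAction(gatePassed: boolean, attempt: number, maxAttempts = 5): Action {
  if (gatePassed) return "open_draft_pr";
  return attempt < maxAttempts ? "retry" : "escalate";
}

// Build the next iteration's prompt from structured gate failures and a narrow scope.
export function retryPrompt(failures: string[], scope: string, protectedFiles: string[]): string {
  return [
    "The verification gate failed.",
    "",
    "Failure:",
    ...failures,
    "",
    scope,
    `Do not edit ${protectedFiles.join(" or ")}.`,
  ].join("\n");
}
```

Keeping the decision in plain code means the stopping rule is the same on every run and does not depend on how confident the agent sounds.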

6. Handoff

A human should receive one of two outcomes:

Success: draft PR created with passing verification logs.

or:

Failure: workflow stopped after 5 attempts. Last gate failure attached.
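For the success path, the GitHub CLI can open the draft PR. The following is a minimal sketch. It assumes gh is installed and authenticated on the machine running the workflow, and that the current branch is already pushed. draftPrArgs and openDraftPr are hypothetical helpers; --draft, --base, --title, and --body are standard gh pr create flags.

```typescript
import { spawnSync } from "node:child_process";

// Arguments for `gh pr create`. The body carries the passing gate output for the reviewer.
export function draftPrArgs(title: string, gateLog: string, base = "main"): string[] {
  const body = `Opened by an unattended workflow.\n\nVerification output:\n\n${gateLog}`;
  return ["pr", "create", "--draft", "--base", base, "--title", title, "--body", body];
}

// Run gh and return the PR URL it prints, or throw so the failure path can notify a human.
export function openDraftPr(title: string, gateLog: string): string {
  const result = spawnSync("gh", draftPrArgs(title, gateLog), { encoding: "utf8" });
  if (result.status !== 0) {
    throw new Error(`gh pr create failed: ${result.stderr ?? result.error?.message ?? ""}`);
  }
  return result.stdout.trim();
}
```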

The gate is what makes this safe to run unattended. Without it, the agent edits code and reports success based on its own judgment.

For API workflows, Apidog fits well as the verification layer: API design, schemas, mock servers, and automated tests live in one workspace, so the spec and gate stay aligned. You can point the run at an Apidog test scenario and give the agent schema-validated pass/fail feedback on every iteration. The mock server can also stand in for dependencies during unattended runs.

Wiring endpoint access through the Apidog AI agent debugger lets the agent inspect endpoints the way a human tester would. If you prefer a visual gate instead of a hand-rolled runner, download Apidog.


Guardrails for unattended runs

Use these before you let a workflow run overnight.

Narrow tool allowlists

Do not give unattended agents broad access by default.

Prefer:

--allowedTools "Edit,Write"

Use shell access only when required:

--allowedTools "Edit,Write,Bash"

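Avoid unrestricted destructive commands unless the run is isolated. Claude Code’s permission rules can also limit Bash to specific commands instead of the whole shell; see the permissions section of the Claude Code docs.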

Bound iterations

A workflow that cannot pass after a few attempts should stop.

const MAX_ITERATIONS = 5;

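Do not let loops run forever. Inside a single headless run, claude -p --max-turns (or the Agent SDK’s maxTurns option) also caps how many agentic turns Claude takes.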

Add a cost ceiling

Unattended loops can burn tokens without anyone noticing.

Track spend per run and stop when the workflow exceeds a limit. The same practices in reducing agent token costs apply directly here.
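The following is a minimal sketch of a per-run spend ceiling. It assumes you can read a cost figure after each agent call. In current versions, the Agent SDK’s final result message and the claude -p JSON output both report a total_cost_usd field, but treat that field name as an assumption to verify. The CostCeiling class is hypothetical.

```typescript
// Accumulates reported spend across loop iterations and says when to stop.
export class CostCeiling {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Record the cost of one agent call (for example, a result's total_cost_usd).
  add(costUsd: number): void {
    if (!Number.isFinite(costUsd) || costUsd < 0) {
      throw new Error(`Invalid cost value: ${costUsd}`);
    }
    this.spentUsd += costUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // True once the next iteration should not start.
  exceeded(): boolean {
    return this.spentUsd >= this.limitUsd;
  }
}
```

Check exceeded() before each iteration and route a hit through the same handoff as a failed gate, so a budget stop is never silent.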

Protect the gate

Do not let the agent edit:

  • Tests
  • OpenAPI specs
  • Verification scripts
  • CI configuration
  • Approval logic

If the agent can rewrite the test to pass, the gate is not a gate.
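Prompt instructions alone do not protect the gate. Check the diff after each iteration and stop the run if a protected path changed. The following is a minimal sketch. It assumes a git checkout and simple path-prefix matching. The protected list follows the worked example, and the helper names are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Follows the worked example's "Do not edit" list; a trailing "/" protects a whole directory.
export const PROTECTED = ["openapi.yaml", "tests/", "package.json", "package-lock.json"];

// Return the changed files that hit a protected path.
export function protectedEdits(changed: string[], protectedPaths: string[] = PROTECTED): string[] {
  return changed.filter((file) =>
    protectedPaths.some((p) => (p.endsWith("/") ? file.startsWith(p) : file === p)),
  );
}

// Run git and fail closed: if git errors, throw instead of reporting "no changes".
function git(cwd: string, args: string[]): string {
  const result = spawnSync("git", args, { cwd, encoding: "utf8" });
  if (result.status !== 0) {
    throw new Error(`git ${args.join(" ")} failed: ${result.stderr ?? result.error?.message ?? ""}`);
  }
  return result.stdout;
}

// Modified tracked files plus new untracked files. Run from the repo root so both
// commands report paths relative to the same directory.
export function changedFiles(repoRoot: string): string[] {
  const out =
    git(repoRoot, ["diff", "--name-only", "HEAD"]) +
    git(repoRoot, ["ls-files", "--others", "--exclude-standard"]);
  return out.split("\n").map((line) => line.trim()).filter((line) => line.length > 0);
}
```

If protectedEdits(changedFiles(root)) is non-empty, escalate instead of running the gate. A passing result after a protected edit proves nothing.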

Run in a sandbox

Use an isolated workspace:

git worktree add ../api-agent-run feature/agent-maintenance

or a disposable branch/container.

Never let an unattended workflow work directly on main.
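A cheap extra guard is to have the workflow refuse to start unless it is on a dedicated branch. The following is a minimal sketch. It assumes the protected branch names are main and master; adjust them for your repo. git rev-parse --abbrev-ref HEAD prints the current branch name. The helper names are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Branches an unattended run must never work on directly (adjust for your repo).
const PROTECTED_BRANCHES = new Set(["main", "master"]);

// Pure check, easy to test.
export function isProtectedBranch(branch: string): boolean {
  return PROTECTED_BRANCHES.has(branch.trim());
}

// Throw before any agent work starts if the checkout in `cwd` is on a protected branch.
export function assertSafeBranch(cwd: string): string {
  const result = spawnSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], { cwd, encoding: "utf8" });
  if (result.status !== 0) {
    throw new Error(`Not a git checkout: ${cwd}`);
  }
  const branch = result.stdout.trim();
  if (isProtectedBranch(branch)) {
    throw new Error(`Refusing to run unattended on ${branch}; use a worktree or feature branch.`);
  }
  return branch;
}
```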

Log every run

Capture:

  • Prompt/spec
  • Tool calls
  • Files changed
  • Verification output
  • Iteration count
  • Final status

Example:

mkdir -p logs

claude -p "$(cat tasks/nightly-api-maintenance.md)" \
  --allowedTools "Edit,Write,Bash" \
  --output-format json \
  >> "logs/run-$(date +%F-%H%M%S).log" 2>&1
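The shell redirect captures raw output. A structured record per run makes the fields listed above easy to query later. The following is a minimal sketch that appends one JSON line per run to a runs.jsonl file. The RunRecord fields map onto the list above. The file name and the helper names are hypothetical.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// One record per run, mirroring the fields worth capturing.
export interface RunRecord {
  startedAt: string;
  spec: string;
  toolCalls: number;
  filesChanged: string[];
  verificationOutput: string;
  iterations: number;
  status: "success" | "failed" | "stopped";
}

// Append the record as a single JSON line so runs can be grepped or loaded later.
export function logRun(dir: string, record: RunRecord): string {
  mkdirSync(dir, { recursive: true });
  const path = join(dir, "runs.jsonl");
  appendFileSync(path, JSON.stringify(record) + "\n", "utf8");
  return path;
}

// Read all records back, oldest first.
export function readRuns(dir: string): RunRecord[] {
  const path = join(dir, "runs.jsonl");
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunRecord);
}
```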

Keep a kill switch

You need a way to stop a bad run quickly:

pkill -f "claude -p"

For scheduled workflows, also keep the cron or launchd entry easy to disable.
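pkill stops a run that is already misbehaving. A stop file complements it by halting a loop cleanly between iterations, without hunting for processes. The following is a minimal sketch. It assumes a hypothetical STOP file path that the loop checks before each attempt. The helper names are also hypothetical.

```typescript
import { existsSync } from "node:fs";

// Hypothetical convention: `touch /srv/api/STOP` halts the workflow before its next iteration.
export function stopRequested(stopFile: string): boolean {
  return existsSync(stopFile);
}

// Run up to maxIterations steps, checking the stop file before each one.
// `step` returns true when the gate passed. The return value says why the loop ended.
export function runWithKillSwitch(
  stopFile: string,
  maxIterations: number,
  step: (attempt: number) => boolean,
): "passed" | "stopped" | "exhausted" {
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    if (stopRequested(stopFile)) return "stopped";
    if (step(attempt)) return "passed";
  }
  return "exhausted";
}
```

A "stopped" result should still go through the handoff, so whoever created the file gets confirmation that the run ended.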

Put humans at the edges

“Without you” does not mean “without review.”

Use humans for:

  • Approving the task before automation starts
  • Reviewing the draft PR after automation finishes
  • Handling escalation when the gate fails repeatedly

Do not put humans inside the inner retry loop unless necessary.

The wiring patterns and failure modes here are similar to the ones covered in agentic workflow tool wiring.

Common mistakes

No verification gate

If the only check is:

Claude, did you finish?

you do not have an autonomous workflow. You have an unsupervised chatbot.

The gate must be external to the model.

One giant task

This usually fails:

Maintain the whole service.

Prefer small, bounded tasks:

Update POST /orders to match openapi.yaml and pass contract tests.

Small workflows converge. Large ones thrash.

Wide-open permissions

This is convenient but dangerous:

--allowedTools "Edit,Write,Bash,Read,WebFetch"

Grant only what the task needs.

Silent success or failure

A workflow should never commit, fail, or stop without telling anyone.

Always emit a handoff:

Draft PR created.

or:

Run failed after 5 attempts. Last gate output attached.

Trusting the model’s self-report

The agent will often say it is done. That is not enough.

Use this rule:

The model proposes. The gate decides.

If you want the deeper architecture, this breakdown of agent harness design shows how the pieces fit at scale.

The takeaway

Claude workflows that run without you are mostly a systems problem.

You need:

  1. A precise spec
  2. Headless execution
  3. A deterministic verification gate
  4. Hard guardrails
  5. A clean handoff

Start with one workflow. Write a tight spec, run Claude headlessly, verify with a fast gate, allowlist the tools, cap the iterations, isolate the workspace, and notify a human on finish or failure.

For API work, your automated tests are the safety gate. Apidog gives you API design, mocking, and automated testing in one workspace, so you can build that gate without hand-rolling every piece. Download it, wire the gate, and let the workflow run while you do something else.
