There’s a line going around that sums up where agentic coding is headed: the goal isn’t a better prompt, it’s a workflow that runs without you watching it. Most people use Claude like a chat window: type, wait, read, type again. That works, but it caps your output at one agent you’re actively babysitting. The higher-leverage pattern is different: trigger a workflow, let it execute, verify its own results, and notify a human only when a decision is needed.
TL;DR
A Claude workflow that runs without supervision needs five parts:
- A precise written spec
- Headless execution
- A deterministic verification gate
- Hard guardrails
- A human handoff
Claude Code headless mode (claude -p), the Claude Agent SDK, hooks, and a scheduler like cron or launchd give you the pieces. The risky part is not the agent itself. The risk is running it unattended without gates, limits, and observability.
Why “runs without you” is the real goal
Supervised chat has a hard ceiling: you.
Every iteration waits for a human to read the output and decide what happens next. The model generates in seconds, then idles while you context-switch.
Unattended workflows remove that bottleneck:
trigger -> agent work -> verification gate -> retry or handoff
Once the workflow runs without supervision, you scale by adding workflows instead of typing faster. That is the same shift covered in Claude Code dynamic workflows, where one session fans out into many parallel agents.
But unattended workflows raise the stakes. A supervised agent that makes a bad edit may be caught when you read the diff. An unattended one can keep going. That means the main work shifts from prompt writing to system design: build something bounded, verifiable, and observable.
Anthropic’s article on building effective agents makes the same point: the leverage comes from the environment around the model, not from one clever prompt.
The five parts every unattended workflow needs
1. A precise spec
The agent needs a written definition of done.
Bad:
Fix the API.
Better:
Implement POST /orders.
Requirements:
- Return 201 on valid requests.
- Validate the request body against the OpenAPI schema.
- Return 422 when required fields are missing.
- Return JSON matching the response schema.
- Do not modify contract tests or the OpenAPI file.
The spec should be checked into the repo and loaded at the start of every run.
2. Headless execution
Claude must run without a human at the keyboard. That means non-interactive execution, not a chat UI.
3. A verification gate
The workflow needs a deterministic pass/fail check:
- Unit tests
- Integration tests
- Type checks
- Linting
- OpenAPI contract tests
- JSON schema validation
- Endpoint health checks
The gate decides whether the task is done. The model does not.
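One way to make that concrete is a gate that runs each check and passes only when every check exits cleanly. The sketch below is illustrative, not a Claude API: `runGate`, `CheckResult`, `GateVerdict`, and the injected `Exec` function are hypothetical names, and `Exec` stands in for whatever actually shells out to your test runner.

```typescript
// Result of one check, for example "npm test" or "npm run typecheck".
interface CheckResult {
  name: string;
  exitCode: number;
  output: string;
}

interface GateVerdict {
  passed: boolean;
  failures: string[];
}

// Hypothetical executor: a real workflow would run the command in a shell
// and capture its exit code and output. It is injected to keep the gate testable.
type Exec = (command: string) => CheckResult;

// Run every check; the gate passes only if all of them exit with code 0.
function runGate(commands: string[], exec: Exec): GateVerdict {
  const failures: string[] = [];
  for (const command of commands) {
    const result = exec(command);
    if (result.exitCode !== 0) {
      failures.push(`${result.name}: ${result.output.trim()}`);
    }
  }
  return { passed: failures.length === 0, failures };
}
```

Running every check instead of stopping at the first failure gives the next iteration the full list of problems in one pass.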
4. Guardrails
Unattended runs need hard limits:
- Tool allowlists
- Max iterations
- Cost caps
- Sandbox/worktree isolation
- Protected files
- Logging
- Kill switch
5. A handoff
Every run should end with a visible result:
- Draft PR
- Slack/Discord/email notification
- Issue comment
- Failure alert
- Log link
Silence is not success.
Claude building blocks
Headless mode with claude -p
Claude Code’s print mode runs a prompt non-interactively and exits. This is the base primitive for unattended workflows.
claude -p "Implement the orders endpoint per spec.md, then run the test suite" \
  --allowedTools "Edit,Write,Bash" \
  --output-format json \
  >> run.log 2>&1
The important flag is --allowedTools.
In an interactive session, you approve actions manually. In headless mode, there is no human approval step, so the allowlist becomes your control boundary.
Start narrow:
--allowedTools "Edit,Write"
Only add shell access when the workflow needs it:
--allowedTools "Edit,Write,Bash"
See the full option set in the Claude Code docs.
Use the Claude Agent SDK for controlled loops
For anything more complex than one shell command, use the Claude Agent SDK.
The SDK lets you drive Claude from code and wrap your own loop around it:
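import { query } from "@anthropic-ai/claude-agent-sdk";

// `task`, `runVerification`, and `notifyHuman` are placeholders for your own
// code. They are not SDK exports.
const MAX_ITERATIONS = 8;
let feedback = "";
let passed = false;

for (let attempt = 0; attempt < MAX_ITERATIONS; attempt++) {
  // One agent pass: the prompt carries the task plus the last gate failures.
  for await (const msg of query({
    prompt: `
${task}

Previous verification failures:
${feedback}
`,
    options: {
      allowedTools: ["Edit", "Write", "Bash"],
    },
  })) {
    // Stream or persist agent events here.
    console.log(msg);
  }

  // The gate, not the model, decides whether the task is done.
  const gate = runVerification();
  if (gate.passed) {
    passed = true;
    break;
  }
  feedback = gate.failures;
}

// On success, hand off as well (for example, open a draft PR).
if (!passed) {
  notifyHuman({
    status: "failed",
    reason: feedback,
  });
}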
The structure matters more than the exact implementation:
run agent -> run gate -> feed failures back -> retry -> stop or handoff
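That structure can be sketched independently of any SDK. The version below is a minimal, model-agnostic loop under stated assumptions: `runAgent` and `runGate` are hypothetical callbacks you would supply (one agent pass and one verification run), and `runUntilGreen` is an illustrative name, not a library function.

```typescript
interface GateVerdict {
  passed: boolean;
  failures: string[];
}

interface LoopOutcome {
  status: "passed" | "escalated";
  attempts: number;
  lastFailures: string[];
}

// Hypothetical callbacks: `runAgent` performs one agent pass given the last
// failures, `runGate` runs the deterministic checks.
type RunAgent = (feedback: string[]) => Promise<void>;
type RunGate = () => Promise<GateVerdict>;

// run agent -> run gate -> feed failures back -> retry -> stop or handoff
async function runUntilGreen(
  runAgent: RunAgent,
  runGate: RunGate,
  maxIterations: number
): Promise<LoopOutcome> {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    await runAgent(feedback);
    const verdict = await runGate();
    if (verdict.passed) {
      return { status: "passed", attempts: attempt, lastFailures: [] };
    }
    // Only the gate's output is carried into the next attempt.
    feedback = verdict.failures;
  }
  // Iteration cap reached: stop and let the caller notify a human.
  return { status: "escalated", attempts: maxIterations, lastFailures: feedback };
}
```

Returning an outcome instead of notifying inside the loop keeps the handoff step separate, so the same loop can feed a draft PR, a chat alert, or a log entry.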
If you are choosing between your own loop and a hosted setup, this comparison of managed agents vs the Agent SDK explains when each approach fits.
Use hooks for deterministic guardrails
Hooks run your commands at fixed points in Claude’s lifecycle. They are useful because they do not depend on the model deciding to do the right thing.
For example, run tests after every edit:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent"
          }
        ]
      }
    ]
  }
}
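Because the harness runs the hook, not the model, it fires on every matching tool call. The agent cannot choose to skip it. One caveat from the Claude Code hooks documentation: only exit code 2 feeds the hook's stderr back to the model, so wrap the test command if you want failures to reach the agent and not just the log.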
Use hooks for checks like:
npm test --silent
npm run typecheck
npm run lint
pytest
go test ./...
Trigger runs with cron or launchd
A workflow that runs without you needs a trigger.
On a server, use cron:
# every weekday at 07:00: run the maintenance workflow and log everything
# (a crontab entry must fit on one line; cron does not honor backslash continuations)
0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-maintenance.md)" --allowedTools "Edit,Bash" >> logs/run-$(date +\%F).log 2>&1
That gives you the basic spine:
cron -> headless Claude -> spec -> edits -> gate -> logs -> handoff
For local macOS automation, use launchd instead of cron.
Design the loop, not the prompt
The most useful question is not:
What should I tell Claude?
It is:
What loop makes Claude correct itself?
A coding agent is a fast generator. It does not have a reliable built-in sense of correctness. Your verification gate supplies that signal.
This is the core idea in stop prompting your coding agent, build the loop instead: the model’s confidence does not matter. The gate’s verdict does.
A stable spec also beats a clever prompt. A design.md or AGENTS.md file gives the agent a repeatable target:
- Goal
- Constraints
- Files it may edit
- Files it must not edit
- Definition of done
- Verification command
- Escalation conditions
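A cheap pre-flight check can refuse to start a run when the spec is missing one of those sections. The sketch below assumes the spec is a markdown file whose headings match the bullet names above; the heading text, `REQUIRED_SECTIONS`, and `missingSections` are illustrative assumptions, not a convention Claude enforces.

```typescript
// Section headings this sketch expects in a spec file (an assumption that
// mirrors the bullet list; rename to match your own template).
const REQUIRED_SECTIONS: string[] = [
  "Goal",
  "Constraints",
  "Files it may edit",
  "Files it must not edit",
  "Definition of done",
  "Verification command",
  "Escalation conditions",
];

// Return the required sections that have no matching markdown heading.
// Matching is case-insensitive and ignores the heading level.
function missingSections(
  specMarkdown: string,
  required: string[] = REQUIRED_SECTIONS
): string[] {
  const headings = specMarkdown
    .split("\n")
    .filter((line) => /^#{1,6}\s+/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase());
  return required.filter(
    (name) => headings.indexOf(name.toLowerCase()) === -1
  );
}
```

If the returned list is non-empty, the workflow can stop before spending any tokens and report which sections need to be written.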
Worked example: unattended API maintenance
Suppose you want a workflow that keeps API endpoints aligned with an OpenAPI spec, runs every weekday morning, and never ships a broken endpoint.
1. Write the spec
The contract lives in an OpenAPI file. Behavior is covered by tests.
Example task file:
# Nightly API maintenance
## Goal
Keep implementation aligned with openapi.yaml.
## Scope
Only update endpoint implementation files under:
- src/routes
- src/controllers
- src/validators
Do not edit:
- openapi.yaml
- tests
- package.json
- lockfiles
## Definition of done
The run passes:
- npm test
- npm run typecheck
- OpenAPI contract tests
## Failure handling
If the gate still fails after 5 attempts, stop and notify a human.
2. Trigger the workflow
# crontab entries must be a single line; cron does not honor backslash continuations
0 7 * * 1-5 cd /srv/api && claude -p "$(cat tasks/nightly-api-maintenance.md)" --allowedTools "Edit,Write,Bash" --output-format json >> logs/api-maintenance-$(date +\%F).log 2>&1
3. Let the agent reconcile implementation
The agent can:
- Add missing endpoints
- Fix response shapes
- Tighten validation
- Update controller logic
- Repair schema mismatches
4. Run the verification gate
The workflow runs the checks named in the definition of done:
npm test
npm run typecheck
npm run contract:test
Failures should be structured enough to feed back into the next iteration:
Expected 422 on missing customer_id, got 500.
Response field total is a string, schema says number.
5. Loop or escalate
If the gate fails, feed the failure back into the agent:
The verification gate failed.
Failure:
Expected 422 on missing customer_id, got 500.
Patch only the validation path for POST /orders.
Do not edit tests or openapi.yaml.
If the gate passes, open a draft PR.
If the run reaches the iteration cap, stop and notify a human.
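The feedback message in this step can be assembled from structured gate output instead of being written by hand. A minimal sketch, assuming the gate reports failures as objects with a check name and a message; `GateFailure` and `buildFeedbackPrompt` are hypothetical names.

```typescript
// One failed assertion from the gate, for example a contract test.
interface GateFailure {
  check: string;
  message: string;
}

// Turn gate failures plus standing constraints into the next prompt.
function buildFeedbackPrompt(
  failures: GateFailure[],
  constraints: string[]
): string {
  const lines: string[] = ["The verification gate failed.", "Failures:"];
  for (const failure of failures) {
    lines.push(`- [${failure.check}] ${failure.message}`);
  }
  if (constraints.length > 0) {
    lines.push("Constraints:");
    for (const constraint of constraints) {
      lines.push(`- ${constraint}`);
    }
  }
  return lines.join("\n");
}
```

Repeating the constraints on every retry is a deliberate choice here: it keeps the protected files in front of the agent even after several rounds of failure output.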
6. Handoff
A human should receive one of two outcomes:
Success: draft PR created with passing verification logs.
or:
Failure: workflow stopped after 5 attempts. Last gate failure attached.
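Both outcomes can come from one function so that no code path ends silently. This is a sketch under assumptions: the `RunResult` shape and `handoffMessage` are hypothetical, and how the message is delivered (PR comment, chat webhook, email) is left to your setup.

```typescript
// Every run ends in exactly one of these two shapes.
type RunResult =
  | { status: "passed"; attempts: number; prUrl: string }
  | { status: "failed"; attempts: number; lastFailure: string };

// Build the human-facing handoff text for either outcome.
function handoffMessage(result: RunResult): string {
  if (result.status === "passed") {
    return `Success: draft PR created after ${result.attempts} attempt(s): ${result.prUrl}`;
  }
  return `Failure: workflow stopped after ${result.attempts} attempts. Last gate failure: ${result.lastFailure}`;
}
```

Modeling the result as a two-case union means the compiler rejects a run that reports neither a PR link nor a failure reason.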
The gate is what makes this safe to run unattended. Without it, the agent edits code and reports success based on its own judgment.
For API workflows, Apidog fits well as the verification layer: API design, schemas, mock servers, and automated tests live in one workspace, so the spec and gate stay aligned. You can point the run at an Apidog test scenario and give the agent schema-validated pass/fail feedback on every iteration. The mock server can also stand in for dependencies during unattended runs.
Teams that wire endpoint access through the Apidog AI agent debugger let the agent inspect endpoints in the same way a human tester would. If you prefer a visual gate instead of a hand-rolled runner, download Apidog.
Guardrails for unattended runs
Use these before you let a workflow run overnight.
Narrow tool allowlists
Do not give unattended agents broad access by default.
Prefer:
--allowedTools "Edit,Write"
Use shell access only when required:
--allowedTools "Edit,Write,Bash"
Unscoped Bash can run anything the process user can, including destructive commands, so grant it only when the run is isolated.
Bound iterations
A workflow that cannot pass after a few attempts should stop.
const MAX_ITERATIONS = 5;
Do not let loops run forever.
Add a cost ceiling
Unattended loops can burn tokens without anyone noticing.
Track spend per run and stop when the workflow exceeds a limit. The same practices in reducing agent token costs apply directly here.
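A spend limit can be enforced with a small accumulator checked between iterations. The sketch below assumes each agent pass reports a cost in USD; how you obtain that number depends on your setup and is not shown, and `RunBudget` is a hypothetical helper.

```typescript
// Tracks cumulative spend for one run against a fixed ceiling.
class RunBudget {
  private spentUsd: number = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Add the cost of one agent pass (assumed to be reported in USD).
  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  // True once cumulative spend is over the ceiling.
  isExceeded(): boolean {
    return this.spentUsd > this.limitUsd;
  }

  // Budget left, never negative.
  remainingUsd(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }
}
```

In a retry loop, call `record` after each pass and stop with a failure handoff as soon as `isExceeded()` returns true, the same way the iteration cap stops the loop.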
Protect the gate
Do not let the agent edit:
- Tests
- OpenAPI specs
- Verification scripts
- CI configuration
- Approval logic
If the agent can rewrite the test to pass, the gate is not a gate.
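One way to enforce this outside the prompt is a path check that runs before or after each edit, for example from a hook or as a diff check in the gate itself. The sketch below is an assumption-heavy illustration: the `PROTECTED` entries are hypothetical, paths are expected to be relative to the repository root, and it does not resolve `..` segments or symlinks.

```typescript
// Hypothetical protected entries: a trailing slash means "this directory
// and everything under it", otherwise the entry is an exact file path.
const PROTECTED: string[] = ["tests/", "openapi.yaml", ".github/", "scripts/verify/"];

// Normalize separators and strip a leading "./" so comparisons are stable.
function normalizePath(filePath: string): string {
  return filePath.replace(/\\/g, "/").replace(/^\.\//, "");
}

// True when a repo-relative path falls under a protected entry.
function isProtected(
  filePath: string,
  protectedPaths: string[] = PROTECTED
): boolean {
  const path = normalizePath(filePath);
  return protectedPaths.some((entry) =>
    entry.slice(-1) === "/" ? path.indexOf(entry) === 0 : path === entry
  );
}
```

A stricter variant would compare the run's final diff against the same list, which also catches changes made through shell commands instead of edit tools.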
Run in a sandbox
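Use an isolated workspace. The command below checks out an existing branch into a separate directory; add -b before the branch name to create the branch at the same time:
git worktree add ../api-agent-run feature/agent-maintenance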
or a disposable branch/container.
Never let an unattended workflow work directly on main.
Log every run
Capture:
- Prompt/spec
- Tool calls
- Files changed
- Verification output
- Iteration count
- Final status
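Those fields fit naturally into one structured record per run. A minimal sketch, assuming you collect the values yourself during the run; the `RunRecord` shape, `toLogLine`, and the truncation limit are hypothetical choices.

```typescript
// One record per run, mirroring the capture list above.
interface RunRecord {
  startedAt: string; // ISO 8601 timestamp
  specPath: string;
  toolCalls: string[];
  filesChanged: string[];
  gateOutput: string;
  iterations: number;
  finalStatus: "passed" | "failed";
}

// Serialize a run as a single JSON line, truncating long gate output so
// one noisy failure cannot bloat the log.
function toLogLine(record: RunRecord, maxOutputChars: number = 2000): string {
  const gateOutput =
    record.gateOutput.length > maxOutputChars
      ? record.gateOutput.slice(0, maxOutputChars) + "...[truncated]"
      : record.gateOutput;
  return JSON.stringify({ ...record, gateOutput });
}
```

One JSON object per line keeps the log easy to filter later, for example to list every run whose final status was failed.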
Example:
mkdir -p logs
claude -p "$(cat tasks/nightly-api-maintenance.md)" \
  --allowedTools "Edit,Write,Bash" \
  --output-format json \
  >> "logs/run-$(date +%F-%H%M%S).log" 2>&1
Keep a kill switch
You need a way to stop a bad run quickly. This pattern match kills every headless run on the machine, not just one:
pkill -f "claude -p"
For scheduled workflows, also keep the cron or launchd entry easy to disable.
Put humans at the edges
“Without you” does not mean “without review.”
Use humans for:
- Approving the task before automation starts
- Reviewing the draft PR after automation finishes
- Handling escalation when the gate fails repeatedly
Do not put humans inside the inner retry loop unless necessary.
The wiring patterns and failure modes here are similar to the ones covered in agentic workflow tool wiring.
Common mistakes
No verification gate
If the only check is:
Claude, did you finish?
you do not have an autonomous workflow. You have an unsupervised chatbot.
The gate must be external to the model.
One giant task
This usually fails:
Maintain the whole service.
Prefer small, bounded tasks:
Update POST /orders to match openapi.yaml and pass contract tests.
Small workflows converge. Large ones thrash.
Wide-open permissions
This is convenient but dangerous:
--allowedTools "Edit,Write,Bash,Read,WebFetch"
Grant only what the task needs.
Silent success or failure
A workflow should never commit, fail, or stop without telling anyone.
Always emit a handoff:
Draft PR created.
or:
Run failed after 5 attempts. Last gate output attached.
Trusting the model’s self-report
The agent will often say it is done. That is not enough.
Use this rule:
The model proposes. The gate decides.
If you want the deeper architecture, this breakdown of agent harness design shows how the pieces fit at scale.
The takeaway
Claude workflows that run without you are mostly a systems problem.
You need:
- A precise spec
- Headless execution
- A deterministic verification gate
- Hard guardrails
- A clean handoff
Start with one workflow. Write a tight spec, run Claude headlessly, verify with a fast gate, allowlist the tools, cap the iterations, isolate the workspace, and notify a human on finish or failure.
For API work, your automated tests are the safety gate. Apidog gives you API design, mocking, and automated testing in one workspace, so you can build that gate without hand-rolling every piece. Download it, wire the gate, and let the workflow run while you do something else.
