Hassann

Posted on • Originally published at apidog.com

The /goal Command: How to Run Codex and Claude Code as 24/7 Autonomous Agents

Every major AI lab recently shipped the same primitive: /goal. Anthropic added it to Claude Code, OpenAI shipped it in Codex CLI and the Codex desktop app, and Nous Research wired it into Hermes. The naming is deliberate: the industry is converging on a shared interface for agents that run in a closed loop until a measurable end state is reached, without asking for approval at every step.


If you have been doing the manual “approve, send prompt, tell the agent to continue, repeat” workflow, /goal is the slash command that removes the babysitting. You give the agent a target, it works against that target, and it returns when the target is reached or when it hits a constraint.

This guide is for developers and API builders. You will learn what /goal does under the hood, how to set it up in Codex and Claude Code, how to write goal prompts that actually converge, and how to connect the workflow to API development with Apidog.

You can download Apidog for free if you want to follow along with the API examples.

What /goal actually does

In one line:

/goal lets an AI agent loop on a task until a stop condition is met.

Instead of stopping after one response, the agent repeatedly:

  1. Plans the next step.
  2. Executes the step.
  3. Checks whether the goal has been met.
  4. Continues or stops based on that check.

The important part is the validator. A smaller, faster model evaluates the agent’s work after each step and answers:

Has the goal been met?

If the answer is no, the main model continues. If yes, the loop closes and the agent reports back.

This is the same pattern popularized by early “Ralph loop” workflows, but now it is built into official tools as a first-class command.

The difference from normal agent usage is simple:

  • Without /goal: you are the loop. You read output, decide whether it is correct, prompt the next step, approve tool calls, and repeat.
  • With /goal: the agent owns the loop. It plans, executes, validates, and only surfaces when it finishes, hits a constraint, or runs out of budget.

Example:

/goal create a landing page until it builds successfully and renders without console errors

That goal can trigger scaffolding, styling, debugging, browser verification, and a final summary in one continuous run.

Why /goal is showing up everywhere

Long-running agent tasks usually fail in two ways:

  1. Drift: the model starts with the right task, then gradually wanders away from the original target.

  2. Babysitting: the model can do the work, but the user still has to supervise every iteration.

A validator model helps with both. It gives the loop a stop condition and continuously checks the work against the original target.

The implementation is relatively simple:

main model -> action -> validator -> continue or stop

That pattern is cheap enough to run because the validator can be a smaller model with a narrow prompt. Once vendors saw the pattern working, they shipped it under the same command name.
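The loop itself fits in a few lines. Below is an illustrative sketch only: `planAndExecute`, `validatorSaysDone`, and the stop condition are hypothetical stand-ins, not vendor APIs.

```typescript
type StepResult = { output: string };

// Stand-in for the main model taking one step toward the goal.
function planAndExecute(goal: string, history: StepResult[]): StepResult {
  return { output: `attempt ${history.length + 1} at: ${goal}` };
}

// Stand-in for the smaller validator model. A real validator answers
// a narrow prompt: "Has the goal been met? Yes or no."
function validatorSaysDone(goal: string, result: StepResult): boolean {
  return result.output.includes("attempt 3"); // stubbed stop condition
}

// The closed loop: plan, execute, validate, continue or stop.
function runGoal(goal: string, maxIterations = 10): StepResult[] {
  const history: StepResult[] = [];
  while (history.length < maxIterations) {
    const result = planAndExecute(goal, history);
    history.push(result);
    if (validatorSaysDone(goal, result)) break; // loop closes
  }
  return history;
}

const steps = runGoal("fix failing tests");
console.log(steps.length); // 3: the stub validator accepts the third attempt
```

The `maxIterations` cap matters in practice: it is the budget that keeps a non-converging loop from running forever.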

Setting up /goal in Codex

The Codex CLI gives you the most control.

1. Enable goals in Codex desktop

Open Codex desktop and go to:

Settings → Configuration

Set:

goals = true

The CLI inherits this configuration.

2. Launch Codex in full-auto mode

Use full-auto mode if you want to avoid approval prompts during the loop:

codex --approval-mode full-auto

3. Set a goal

Inside the Codex session, run:

/goal [your goal here]

Example:

/goal fix every failing test until npm test exits 0 without modifying files outside the /auth directory

Codex will confirm that the goal is registered, then start running.

[Screenshot: Codex /goal setup]

If you do not want to work in the terminal, use the Codex desktop app. The functionality is similar, but the UI makes it easier to pause, clear goals, and monitor token usage.

Setting up /goal in Claude Code

Claude Code works similarly.

Launch the CLI and run:

/goal [task description]

Example:

/goal update the checkout API tests until every payment scenario passes without changing production credentials

The official docs are available on the Claude Code documentation site.

[Screenshot: Claude Code /goal]

If Claude Code fails during setup, this guide to fixing the invalid custom3p enterprise config error covers a common failure mode.

For multi-agent workflows alongside /goal, see this breakdown of Ruflo, a multi-agent layer on top of Claude Code.

One practical tip: Claude Code shows a live token count and progress bar while a goal is running. Watch the token count. If the agent burns tokens without visible progress, the loop is probably not converging on the stop condition. In that case, use:

/pause

or:

/goal clear

Then rewrite the goal with clearer success criteria.

The prompt structure that works

The syntax is easy. The hard part is writing a goal that can be validated.

A useful /goal prompt has three parts:

  1. Work: what should be done.
  2. Measurable end state: what “done” means.
  3. Constraints: rules the agent must not violate.

Use this structure:

/goal [do the work] until [measurable end state] without [constraints]

Example:

/goal fix every failing test until npm test exits 0 without modifying any file outside the /auth directory

Why this works:

  • npm test exits 0 is measurable.
  • without modifying any file outside /auth is enforceable.
  • The agent cannot claim success unless the validator can verify it.
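"npm test exits 0" works as a criterion precisely because it is machine-checkable. A minimal sketch of that kind of check, using `node --version` as a harmless stand-in for the real test command:

```typescript
import { spawnSync } from "node:child_process";

// Returns true when the command exits 0 -- the deterministic signal
// a /goal validator can rely on. A real loop would run `npm test`.
function commandSucceeds(cmd: string, args: string[]): boolean {
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  return result.status === 0;
}

console.log(commandSucceeds("node", ["--version"])); // true wherever Node is installed
```

An exit code is binary and reproducible, which is exactly what a vague criterion like "the code is good" is not.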

Avoid vague goals like:

/goal make this UI feel modern

That has no reliable stop condition.

Rewrite it as:

/goal improve the dashboard UI until Lighthouse accessibility score is 90+ and all existing Playwright visual checks pass

That gives the validator something concrete to check.

A stronger format for long-running goals

For larger tasks, use a block format:

/goal
Objective: [one-line goal]

Success criteria:
  - [measurable criterion 1]
  - [measurable criterion 2]
  - [measurable criterion 3]

Constraints:
  - [boundary 1]
  - [boundary 2]

Context:
  - [files, repos, commands, API specs, or keys the agent should know about]

Example:

/goal
Objective: Implement the password reset API flow.

Success criteria:
  - POST /auth/password-reset returns 200 for a valid registered email.
  - Invalid emails return the documented error schema.
  - npm test exits 0.
  - The OpenAPI spec remains valid.

Constraints:
  - Do not modify files outside /auth and /tests/auth.
  - Do not log tokens or reset links.
  - Do not call production email services.

Context:
  - OpenAPI spec: ./openapi.yaml
  - Test command: npm test -- auth
  - Mock email service is available in ./tests/mocks/email.ts

This format works because each loop iteration can be checked against explicit criteria.
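One way to picture the block format: the success criteria become a checklist the validator walks on every iteration. A hedged sketch, where the `Criterion` type and the stubbed checks are illustrative only (real checks would shell out to `npm test`, a schema validator, and so on):

```typescript
// Success criteria as an explicit, machine-walkable checklist.
type Criterion = { description: string; check: () => boolean };

const criteria: Criterion[] = [
  { description: "POST /auth/password-reset returns 200", check: () => true },
  { description: "npm test exits 0", check: () => true },
  { description: "OpenAPI spec remains valid", check: () => false }, // still failing
];

// The loop continues while any criterion is unmet.
const unmet = criteria.filter((c) => !c.check()).map((c) => c.description);
const goalMet = unmet.length === 0;
console.log(goalMet, unmet); // goalMet stays false while one criterion fails
```

Listing the unmet criteria, rather than returning a bare yes/no, also gives the main model a concrete target for its next iteration.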

/goal examples you can reuse

Research

/goal collect every public benchmark for Claude Opus 4.7 published since April 2026, save sources, and produce a markdown table sorted by date until the table covers at least 10 distinct benchmarks

Repo maintenance

/goal find dead code, unused dependencies, and stale files in this repo, then propose a PR description listing safe removals until every item has a justification

Documentation

/goal rewrite README.md so a new contributor can install, run, test, and understand the project until each of those four steps has a working command and an expected output

Feature work

/goal add a dark/light theme toggle, persist the choice in localStorage, update styles for both themes, and verify in the browser until the toggle works without a page reload and survives a refresh

The pattern is the same in every example: define a verifiable end state.

Pairing /goal with API development

API work is a strong fit for /goal because “done” is usually testable.

For an endpoint, completion can mean:

  • The request returns the expected status code.
  • The response schema matches the contract.
  • Required error cases are covered.
  • The OpenAPI document is valid.
  • The test suite passes.

That gives the validator a concrete signal.
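As an illustration, a contract check for a single endpoint can be as small as a status code plus a shape test. The `UserResponse` shape below is hypothetical; in practice the shape comes from the OpenAPI file you exported:

```typescript
// Hypothetical contract for POST /users -- in a real project this
// shape is derived from ./openapi.yaml, not hand-written.
type UserResponse = { id: number; email: string };

function matchesContract(status: number, body: unknown): boolean {
  if (status !== 200) return false; // expected status code
  const b = body as Partial<UserResponse>;
  // minimal schema check: required fields with the right types
  return typeof b?.id === "number" && typeof b?.email === "string";
}

console.log(matchesContract(200, { id: 1, email: "a@b.co" })); // true
console.log(matchesContract(500, { id: 1, email: "a@b.co" })); // false
console.log(matchesContract(200, { id: "1" }));                // false
```

A real validator would run the full Apidog test cases instead of a hand-rolled check, but the principle is the same: pass/fail signals the agent cannot argue with.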

A practical workflow looks like this:

1. Design the contract first

Define the endpoint, request schema, response schema, and example payloads in Apidog.

This contract becomes the source of truth.

2. Export the API spec

Export the OpenAPI 3.x spec from Apidog and add it to your repo, for example:

./openapi.yaml

Give this file to Codex or Claude Code as context.

3. Run /goal

Use a goal that points at the contract and tests:

/goal implement POST /users from ./openapi.yaml until every Apidog test case passes and npm test exits 0 without changing the OpenAPI contract

4. Let the validator run the test command

Each iteration should run the API test suite against the local service. The agent should only finish when the tests are green.

This is better than letting the agent invent tests after the fact. The contract already exists, so the agent has to implement against it instead of redefining success.

If you are new to this workflow, start with the design-first API workflow guide. For test structure, see the API testing tool overview for QA engineers.

If you are working with MCP servers, the same idea applies. See MCP server testing with Apidog for a setup that lets /goal agents safely run against a local MCP server.

Production tips for /goal

After using /goal on real work, a few practices help.

Use one goal at a time

Codex and Claude Code are designed around a single active goal.

Before starting a new one, clear the current goal:

/goal clear

Plan before running

Use a planning step first:

/plan implement the password reset API based on ./openapi.yaml

Review the plan, then turn it into a goal:

/goal implement the reviewed password reset plan until all auth tests pass without modifying files outside /auth

This reduces wasted iterations because the agent is less likely to redesign the solution mid-loop.

Use a scratchpad file

Ask the agent to maintain a progress file:

Maintain progress.md with:
- completed steps
- current blocker
- next action
- test results

This gives you an audit trail and gives the agent persistent context across iterations.

Let the model write the goal

If your initial idea is rough, ask the model to convert it into a good /goal prompt:

Turn this task into a /goal command with measurable success criteria and strict constraints:
[task]

This often produces better stop conditions than writing the goal manually.

Watch the validator signal

If the loop does not close, do not keep retrying the same goal. Tighten the success criteria.

Bad:

until the code is good

Better:

until npm test exits 0, npm run lint exits 0, and the response matches ./openapi.yaml

Use /goal for long-horizon work

For a one-line refactor, a normal prompt is usually faster. /goal has overhead because it runs a loop and validation steps.

Use it when the task needs multiple iterations.

When /goal will let you down

/goal is useful, but it is not magic.

Cost

Long-running loops burn tokens. The validator may be cheaper than the main model, but the main model is still doing the work.

Set a budget, monitor usage, and pause when needed.

Tasks with weak validation

Some tasks do not have clean stop conditions:

  • UX polish
  • visual taste
  • prose tone
  • “make it better”
  • subjective design feedback

For these, use normal prompts or define a measurable proxy.

External side effects

Be careful with goals that can trigger real-world actions:

  • sending email
  • charging cards
  • calling production APIs
  • deleting data
  • modifying customer records

Add hard constraints:

without calling production APIs
without sending real emails
without using live payment credentials
without deleting user data

If you are designing access control around AI agents calling APIs, this GitHub Copilot usage and billing API for teams writeup shows how major vendors approach usage and control.

Stale context

Long-running goals can drift if the repo changes while the agent is working. If the codebase changes mid-loop, pause and restart with fresh context.

What this changes for AI-assisted development

/goal shifts the developer workflow from:

write every instruction manually

to:

define success criteria, constraints, and validation commands

The command itself is small, but the workflow change is large.

Teams that benefit most already have:

  • clear specs
  • strong CI
  • executable tests
  • API contracts
  • deterministic validation commands

If your API has an OpenAPI document and a test suite, a /goal agent can work against a real target. If the API only exists in someone’s head, the agent has no reliable validator.

That is why API platforms become important in AI workflows. Apidog supports design-first API development, which gives agents a contract to implement against and tests to run.

If you want to try the workflow, download Apidog and set up a contract-first API project before running /goal.

FAQ

Does /goal work in the Codex web app?

Yes. It works in the Codex CLI, the Codex desktop and web apps, and the Claude Code CLI. Hermes also supports the same command.

How is /goal different from a regular prompt?

A regular prompt runs once and stops.

/goal runs in a loop. A validator model checks the stop condition after every step, and the agent decides whether to continue or finish.

Can the agent break out of the constraints I set?

The validator checks constraints on each iteration, but constraint quality matters.

Use explicit constraints:

without modifying any file outside /auth

Avoid vague constraints:

without breaking anything

The first is enforceable. The second is not.

Will /goal cost more than a normal Claude or Codex session?

Usually, yes. The loop runs multiple iterations, and the main model continues working until the goal is met or stopped.

Use budgets, token monitoring, and /pause to control spend.

What if I want to test the agent’s output against a real API?

Use a tool like Apidog to define the API contract and run real test cases against the implementation.

The validator can call the test runner or Apidog CLI, which gives /goal a measurable end state.

If you are bootstrapping a Claude-powered service on a limited budget, see the free Claude API guide.
