Owen

Posted on May 28 • Originally published at ofox.ai

Codex Goal Mode & Remote Computer Use: How OpenAI's Agent Can Code for Days

#ai #openai #codex #agents

TL;DR

On May 21, 2026, OpenAI moved two Codex features to general availability: Goal Mode (a persistent /goal directive that survives session breaks and budget resets) and Locked Computer Use (the desktop agent continues driving Mac apps after screen lock). Combined with gpt-5.3-codex and verifiable success criteria, these features let engineers delegate real objectives like "ship the v2 checkout endpoint with the benchmark green" and walk away. The breakthrough is not longer prompts. It is that a coding agent can now treat time as a budgetable resource rather than something that requires constant supervision.

Both features shipped in Codex CLI 0.133.0 and the matching IDE and desktop builds. A week of running Goal Mode against production repositories suggests that the gap between the demos and practical utility depends on how the goal is structured, not on how patient you are.

What Goal Mode Actually Changes About Your Prompt

Goal Mode replaces per-turn instructions with a persistent objective that Codex re-evaluates each cycle. The command interface is minimal:

# Set or replace the active goal
/goal Reduce p95 checkout latency below 120 ms on the checkout
      benchmark while keeping the correctness suite green

/goal           # view current goal
/goal pause     # stop the loop, keep the state
/goal resume    # pick back up where it stopped
/goal clear     # discard the goal entirely

Goal structure matters more than wording. The OpenAI cookbook recommends the form <desired end state> verified by <specific evidence> while preserving <constraints>: three mandatory slots, in that order.
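As a minimal sketch of that three-slot template, the helper below assembles a goal string and refuses to render one with an empty slot. The GoalSpec class and its field names are hypothetical conveniences for drafting goals outside the TUI; they are not part of Codex or the cookbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalSpec:
    """Hypothetical container for the three goal slots (not a Codex API)."""

    end_state: str
    evidence: str
    constraints: str

    def render(self) -> str:
        # Reject empty slots so a vague goal fails before it reaches the agent.
        for name in ("end_state", "evidence", "constraints"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing goal slot: {name}")
        # Slots are emitted in the recommended order:
        # end state, then evidence, then constraints.
        return (
            f"/goal {self.end_state.strip()}, "
            f"verified by {self.evidence.strip()}, "
            f"while preserving {self.constraints.strip()}"
        )
```

Calling render() on a spec with all three slots filled yields a single line that can be pasted into the TUI. A spec with a blank evidence slot raises ValueError instead of producing a goal that cannot be verified.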

What Fails vs. What Works

Ineffective:

/goal Make the code more elegant

Effective:

/goal Migrate this codebase from Pydantic v1 to v2, verified by
      `pytest -q` exiting 0 and `mypy --strict src/` exiting 0,
      while preserving all public API signatures listed in
      docs/public_api.md

The second version gives Codex measurable targets. The agent writes code, runs the suite, reads the differences between expected and actual results, revises, and stops when both commands exit zero, or surfaces the blockers it cannot overcome.

Stopping conditions are explicit: success, /goal pause, /goal clear, user interruption, a repeated unresolvable blocker, or usage limit exhaustion. Nothing else terminates the loop, which makes verifiable success criteria more critical than before. Without them, the success condition can never fire, and an unattended loop typically stops only when it runs out of budget.
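To make "verified by commands exiting 0" concrete, here is a small sketch of an evidence check you could run yourself before trusting a reported success. It is an illustration of the idea, not how Codex evaluates goals internally; the function name evidence_holds is an assumption.

```python
import subprocess
from typing import Sequence


def evidence_holds(commands: Sequence[Sequence[str]], cwd: str = ".") -> bool:
    """Return True only if every verification command exits with status 0."""
    if not commands:
        # No evidence is not the same as success, so refuse to answer.
        raise ValueError("at least one verification command is required")
    for cmd in commands:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop at the first failing command; the goal is not met yet.
            return False
    return True
```

For the Pydantic goal above, the call would be evidence_holds([["pytest", "-q"], ["mypy", "--strict", "src/"]]), run from the repository root.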

"Code for Days" Means Something Specific

The phrase "code for days" doesn't mean one continuous uninterrupted session. Goal Mode persists objectives across:

Session breaks: Close the terminal, return tomorrow, run /goal resume, and the agent continues from the last verified state
Token budget resets: When rolling budgets roll over (daily for most plans), the active goal survives and work continues
Interruptions: Ctrl-C, app crashes, Mac restarts—the goal is journaled to disk; Codex 0.133+ rehydrates it on next launch

This creates a multi-session objective layer. A migration that would otherwise consume three afternoons of one-shot prompts now runs as one coherent thread. The cost model is unchanged: every reasoning turn is billed at the same per-token rate for gpt-5.3-codex. What drops to nearly zero is the coordination cost, and that is where most of the wall-clock savings come from.
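Codex's actual journal format is not described here, so the following is purely an illustrative sketch of the general technique: persisting an objective to disk atomically so that a crash or restart never leaves a half-written file. The function names, the JSON layout, and the file path are all assumptions.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_goal(path: Path, state: dict) -> None:
    """Atomically write goal state: write a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes reach the disk first
    os.replace(tmp, path)  # the rename replaces the old journal in one step


def load_goal(path: Path) -> Optional[dict]:
    """Return the last saved state, or None if no goal was journaled."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

The write-then-rename pattern is what lets a reader see either the old state or the new one, never a partial write, which is the property a resumable goal needs after an interruption.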

Real-World Testing

Testing against a production repo migration (Pydantic v1 → v2 on a 14k-line internal service) showed:

Total wall time: approximately 31 hours across four sessions
Total Codex token spend at gpt-5.3-codex rates: roughly $44
Estimated effort if hand-prompted: two full days of focused supervision
Actual engagement: three check-ins

Locked Computer Use: The Controversial Half

Computer Use shipped earlier in 2026—Codex could operate GUI apps when the Mac was unlocked and monitored. The May 21 update added:

Continued operation after screen lock: Goal Mode loops driving desktop apps don't stall when the screensaver activates
Mobile triggering: Hand the agent a task from your phone and it drives the Mac you left at your desk

Safety Model

Enabling Locked Use installs an Apple authorization plugin that participates in the macOS unlock flow:

The Mac unlocks temporarily, but the display stays covered: the lock screen remains visible while Codex operates in the background
Authorization windows are short-lived and scoped to the current unlock attempt; no standing grants exist
Any keyboard, trackpad, or mouse contact immediately relocks the Mac and disables auto-unlock until you unlock manually
Codex asks before operating each new app; you can mark frequently used apps "Always allow"
Codex cannot drive terminal apps, Codex itself, or system admin prompts; these hard-coded exclusions prevent privilege escalation through GUI automation

Launch Availability & Restrictions

The feature is unavailable in the EEA, UK, and Switzerland at launch. Apple's automation policy blocks several app categories regardless of user settings.

If regular Computer Use isn't enabled, grant Screen Recording and Accessibility permissions to Codex through System Settings first. The plugin install adds only the locked-screen layer.

A Real Goal Mode Loop, End to End

Starting in your project root:

$ cd ~/work/orders-service
$ codex
# Inside the TUI:
> /goal Migrate this codebase from Pydantic v1 to v2, verified by
        `pytest -q` exiting 0 and `mypy --strict src/` exiting 0,
        while preserving all public API signatures in docs/public_api.md

Codex acknowledges the goal, runs initial scans, and proposes a plan. From here you can:

Walk away—the loop runs until success, blocker, or budget exhaustion
Hand off to Locked Computer Use for GUI steps (migration wizards, CI dashboard screenshots, etc.) and lock your Mac
Trigger status checks from Codex Mobile while away from the laptop

Returning later, /goal shows current state: what's verified, what's pending, last blockers. /goal pause lets you intervene without losing context.

Recommended Starter Configuration

Add to ~/.codex/config.toml:

model = "gpt-5.3-codex"
model_provider = "ofox"      # or "openai" if going direct

[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "responses"

Goal Mode exposes no per-session token or iteration caps in config.toml. The documented stopping levers are the slash commands (/goal pause, /goal clear), detection of repeated blockers, and your plan's usage limit. In practice, the control is the usage cap on whichever provider you select. At gpt-5.3-codex rates of $1.75 input / $14 output per million tokens, a single multi-hour session dominated by output tokens can easily run $30-80, so your account cap becomes the actual budget guardrail.
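The arithmetic behind that range can be sketched as follows. The two rates are the ones quoted above; the function name and the token counts in the usage note are made-up inputs for illustration, not measurements.

```python
# Rates quoted in this article for gpt-5.3-codex, in USD per million tokens.
INPUT_USD_PER_MTOK = 1.75
OUTPUT_USD_PER_MTOK = 14.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate session spend from raw token counts at the quoted rates."""
    total = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total / 1_000_000
```

For example, a hypothetical session with 4 million input tokens and 3 million output tokens comes to 4 × $1.75 + 3 × $14 = $49, with $42 of that from output tokens. Comparing an estimate like this against your account cap shows roughly how many such sessions fit before the guardrail trips.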

Why Route Codex Through ofox.ai

Goal Mode hammers the model—multi-day objectives routinely make hundreds of reasoning turns with bills dominated by gpt-5.3-codex output tokens at $14/M. Three reasons to pipe requests through a unified gateway instead of directly to OpenAI:

Single key for side models: Goal loops typically delegate cheap sub-tasks (summarization, classification, regex generation) to smaller models. One ofox.ai key routes the hot path to gpt-5.3-codex and cold path to gpt-5.4-mini or deepseek-v4-flash without juggling credentials
Per-goal spend visibility: Tag sessions with custom headers; the dashboard shows per-goal cost, not per-day. Useful when determining whether a Pydantic migration justified its expense
Failover on outages: Long-horizon goals get burned by brief provider blips. ofox falls back automatically; direct OpenAI keys error out and force /goal pause until recovery

When NOT to Use Goal Mode

Three disqualifiers:

Cannot write verification commands: If success means "feels right" or "more elegant," Goal Mode either declares premature victory or churns indefinitely. Use one-shot prompts instead
Work needs frequent human judgment: Goals target autonomy. If every change needs approval, you pay for unused context. Run one-shot sessions instead—cheaper, faster
Destructive work at scale: Database migrations, git push --force, anything that touches production. Goal Mode excels at unattended convergence but lacks judgment about when not to act. Sandbox agents in worktrees, set approval_policy so that shell commands require approval, and prefer goals with dry-run verification over live mutations

The Shape of the Next Year

Goal Mode plus Locked Computer Use represents the first credible "set a goal, lock your laptop, check tomorrow" coding loop for production use. The agent isn't smarter than last month—friction simply vanished, changing which engineering tasks merit delegating to models. A coding agent surviving screen locks, budget resets, and dinner breaks differs fundamentally from one requiring constant supervision.

The important caveat: a few hours of attended Goal Mode work are reliable today, but fully unattended multi-day work still depends on how verifiable the goal is. The discipline of writing goals backed by real, checkable evidence is now the critical skill, superseding single-turn prompt craft.

Sources & Further Reading

Codex Changelog — May 2026 — official release notes for Goal Mode GA and Locked Computer Use
Using Goals in Codex — cookbook with goal syntax and worked examples
Computer Use — Codex App — official safety model and platform constraints
MacRumors: Codex Can Use Your Mac When Locked — independent writeup of the unlock flow
GPT-5.3-Codex on OpenRouter — pricing and context window reference

Originally published on ofox.ai/blog.
