Codex Goal Mode & Remote Computer Use: How OpenAI's Agent Can Code for Days
TL;DR
On May 21, 2026, OpenAI moved two Codex features to general availability: Goal Mode (a persistent /goal directive that survives session breaks and budget resets) and Locked Computer Use (the desktop agent continues driving Mac apps after screen lock). Combined with gpt-5.3-codex and verifiable success criteria, engineers can delegate real objectives like "ship the v2 checkout endpoint with the benchmark green" and walk away. The breakthrough isn't longer prompts—a coding agent now treats time as a budgetable resource rather than something requiring constant supervision.
Both features shipped in Codex CLI 0.133.0 and the matching IDE and desktop builds. After a week of running Goal Mode against production repositories, one thing is clear: whether it delivers practical value beyond the demo depends on how the goal is structured, not on how patient you are.
What Goal Mode Actually Changes About Your Prompt
Goal Mode replaces per-turn instructions with a persistent objective that Codex re-evaluates each cycle. The command interface is minimal:
# Set or replace the active goal
/goal Reduce p95 checkout latency below 120 ms on the checkout
benchmark while keeping the correctness suite green
/goal # view current goal
/goal pause # stop the loop, keep the state
/goal resume # pick back up where it stopped
/goal clear # discard the goal entirely
Goal structure matters more than wording. The OpenAI cookbook recommends: <desired end state> verified by <specific evidence> while preserving <constraints>—three mandatory slots in that order.
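The three-slot template is easy to enforce mechanically before a goal ever reaches Codex. The sketch below is only an illustration: `build_goal` is a hypothetical helper, not part of the Codex CLI, and the ", verified by" / ", while preserving" joiners simply mirror the template wording above.

```python
def build_goal(end_state: str, evidence: str, constraints: str) -> str:
    """Assemble a /goal directive from the three template slots, in order.

    Hypothetical helper for illustration; Codex itself only sees the
    resulting text, not this function.
    """
    slots = {
        "end_state": end_state,
        "evidence": evidence,
        "constraints": constraints,
    }
    # Refuse to build a goal with a missing slot: all three are mandatory.
    for name, value in slots.items():
        if not value.strip():
            raise ValueError(f"goal slot {name!r} must not be empty")
    return (
        f"/goal {end_state.strip()}, "
        f"verified by {evidence.strip()}, "
        f"while preserving {constraints.strip()}"
    )


# Example: prints a single-line goal ready to paste into the TUI.
print(
    build_goal(
        "Migrate this codebase from Pydantic v1 to v2",
        "`pytest -q` exiting 0 and `mypy --strict src/` exiting 0",
        "all public API signatures listed in docs/public_api.md",
    )
)
```

Forcing every goal through a function like this makes an empty evidence slot a hard error instead of a silent omission.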
What Fails vs. What Works
Ineffective:
/goal Make the code more elegant
Effective:
/goal Migrate this codebase from Pydantic v1 to v2, verified by
`pytest -q` exiting 0 and `mypy --strict src/` exiting 0,
while preserving all public API signatures listed in
docs/public_api.md
The second version gives Codex measurable targets. The agent writes, runs the suite, reads diffs between expected and actual, revises, and stops when both commands exit zero—or surfaces blockers it cannot overcome.
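The "both commands exit zero" criterion can be checked by hand with a few lines of Python, which is useful for confirming that a goal is actually verifiable before handing it to the agent. This is a minimal sketch, not how Codex runs its checks internally; `run_checks` and `goal_met` are hypothetical names, and recording exit code 127 for a missing executable is a convention borrowed from shells.

```python
import subprocess
from typing import Dict, Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Run each verification command and map its command line to its exit code."""
    results: Dict[str, int] = {}
    for argv in commands:
        try:
            completed = subprocess.run(list(argv), capture_output=True, text=True)
            code = completed.returncode
        except FileNotFoundError:
            # The tool is not installed or not on PATH; treat as a failure.
            code = 127
        results[" ".join(argv)] = code
    return results


def goal_met(results: Dict[str, int]) -> bool:
    """The goal is met only if every verification command exited 0."""
    return all(code == 0 for code in results.values())


# The two commands named in the example goal above.
PYDANTIC_GOAL_CHECKS = [
    ["pytest", "-q"],
    ["mypy", "--strict", "src/"],
]
```

Calling `goal_met(run_checks(PYDANTIC_GOAL_CHECKS))` from the project root returns `True` only when both tools exit 0. If you cannot write a list like `PYDANTIC_GOAL_CHECKS` for your task, the goal is probably not ready for Goal Mode.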
Stopping conditions are explicit: success, /goal pause, /goal clear, user interruption, a repeated unresolvable blocker, or usage limit exhaustion. Nothing else terminates the loop, making verifiable success criteria more critical than before—without them, the loop only stops on cost constraints.
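A toy model makes the consequence of that list concrete. The code below is not Codex's implementation; it is a sketch of the stopping rule as described here, with hypothetical names (`StopReason`, `CycleState`, `stop_reason`) and an assumed threshold of three repeats before a blocker counts as unresolvable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StopReason(Enum):
    SUCCESS = "success"
    PAUSED = "/goal pause"
    CLEARED = "/goal clear"
    INTERRUPTED = "user interruption"
    BLOCKED = "repeated unresolvable blocker"
    OUT_OF_BUDGET = "usage limit exhausted"


@dataclass
class CycleState:
    checks_pass: bool = False      # did the verification commands succeed?
    paused: bool = False
    cleared: bool = False
    interrupted: bool = False
    same_blocker_count: int = 0    # consecutive cycles hitting the same blocker
    budget_left: bool = True


def stop_reason(state: CycleState, blocker_limit: int = 3) -> Optional[StopReason]:
    """Return why the loop stops after this cycle, or None to keep going."""
    if state.checks_pass:
        return StopReason.SUCCESS
    if state.cleared:
        return StopReason.CLEARED
    if state.paused:
        return StopReason.PAUSED
    if state.interrupted:
        return StopReason.INTERRUPTED
    if state.same_blocker_count >= blocker_limit:  # assumed threshold
        return StopReason.BLOCKED
    if not state.budget_left:
        return StopReason.OUT_OF_BUDGET
    return None
```

With an unverifiable goal, `checks_pass` never becomes true, so in an unattended run with no recurring blocker the only branch that can fire is `OUT_OF_BUDGET`.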
"Code for Days" Means Something Specific
The phrase "code for days" doesn't mean one continuous uninterrupted session. Goal Mode persists objectives across:
- Session breaks: Close the terminal, return tomorrow, run /goal resume, and the agent continues from the last verified state
- Token budget resets: When rolling budgets roll over (daily for most plans), the active goal survives and work continues
- Interruptions: Ctrl-C, app crashes, Mac restarts—the goal is journaled to disk; Codex 0.133+ rehydrates it on next launch
This creates a multi-session objective layer. A migration that would otherwise consume three afternoons of one-shot prompts now runs as one coherent thread. The cost model is unchanged: every reasoning turn is billed at the same per-token rate for gpt-5.3-codex. What drops to nearly zero is the coordination cost, and that is where most of the wall-clock savings come from.
Real-World Testing
Testing against a production repo migration (Pydantic v1 → v2 on a 14k-line internal service) showed:
- Total wall time: approximately 31 hours across four sessions
- Total Codex token spend at gpt-5.3-codex rates: roughly $44
- Hand-prompting the same task would have required two full focused days of supervision
- Actual engagement: three check-ins
Locked Computer Use: The Controversial Half
Computer Use shipped earlier in 2026—Codex could operate GUI apps when the Mac was unlocked and monitored. The May 21 update added:
- Continued operation after screen lock: Goal Mode loops that drive desktop apps don't stall when the screensaver activates
- Mobile triggering: Hand the agent tasks from your phone and it drives the Mac you left at your desk
Safety Model
Enabling Locked Computer Use installs an Apple authorization plugin that participates in the macOS unlock flow:
- The Mac unlocks temporarily, but the display stays covered: the lock screen remains visible while Codex operates in the background
- Authorization windows are short-lived and scoped to the current unlock attempt; no standing grants exist
- Keyboard, trackpad, or mouse contact immediately relocks the Mac and disables auto-unlock until manual unlock
- Codex asks before operating each new app; mark frequently used apps "Always allow"
- Codex cannot drive terminal apps, Codex itself, or system admin prompts; these hard-coded exclusions prevent privilege escalation through GUI automation
Launch Availability & Restrictions
The feature is unavailable in the EEA, UK, and Switzerland at launch. Apple's automation policy blocks several app categories regardless of user settings.
If regular Computer Use isn't enabled, grant Screen Recording and Accessibility permissions to Codex through System Settings first. The plugin install adds only the locked-screen layer.
A Real Goal Mode Loop, End to End
Starting in your project root:
$ cd ~/work/orders-service
$ codex
# Inside the TUI:
> /goal Migrate this codebase from Pydantic v1 to v2, verified by
`pytest -q` exiting 0 and `mypy --strict src/` exiting 0,
while preserving all public API signatures in docs/public_api.md
Codex acknowledges the goal, runs initial scans, and proposes a plan. From here you can:
- Walk away—the loop runs until success, blocker, or budget exhaustion
- Hand off to Locked Computer Use for GUI steps (migration wizards, CI dashboard screenshots, etc.) and lock your Mac
- Trigger status checks from Codex Mobile while away from the laptop
When you return, /goal shows the current state: what's verified, what's pending, and the last blockers. /goal pause lets you intervene without losing context.
Recommended Starter Configuration
Add to ~/.codex/config.toml:
model = "gpt-5.3-codex"
model_provider = "ofox" # or "openai" if going direct
[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "responses"
Goal Mode exposes no per-session token or iteration caps in config.toml. The documented stopping levers are the slash commands (/goal pause, /goal clear), detection of repeated blockers, and your plan's usage limit. In practice, the control you have is the usage cap on whichever provider you select. At gpt-5.3-codex rates of $1.75 input / $14 output per million tokens, a single multi-hour session dominated by output tokens can easily run $30-80, so your account cap becomes the actual budget guardrail.
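To see how quickly output tokens dominate the bill, the per-million rates quoted above can be turned into a small estimator. This is a back-of-the-envelope sketch: the token counts in the example are illustrative assumptions, not measurements, and `session_cost_usd` is a hypothetical helper.

```python
# Rates quoted in this article for gpt-5.3-codex, in USD per million tokens.
INPUT_USD_PER_MTOK = 1.75
OUTPUT_USD_PER_MTOK = 14.00


def session_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a session from its input and output token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Assumed workload: 4M input tokens and 3M output tokens over a long session.
# Input contributes $7.00 and output contributes $42.00.
print(session_cost_usd(4_000_000, 3_000_000))  # 49.0
```

In this assumed workload the output side accounts for $42 of the $49 total, which is why a provider-level spending cap is the guardrail that matters.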
Why Route Codex Through ofox.ai
Goal Mode hammers the model—multi-day objectives routinely make hundreds of reasoning turns with bills dominated by gpt-5.3-codex output tokens at $14/M. Three reasons to pipe requests through a unified gateway instead of directly to OpenAI:
- Single key for side models: Goal loops typically delegate cheap sub-tasks (summarization, classification, regex generation) to smaller models. One ofox.ai key routes the hot path to gpt-5.3-codex and the cold path to gpt-5.4-mini or deepseek-v4-flash without juggling credentials
- Per-goal spend visibility: Tag sessions with custom headers; the dashboard shows per-goal cost, not per-day. Useful when determining whether a Pydantic migration justified its expense
- Failover on outages: Long-horizon goals get burned by brief provider blips. ofox falls back automatically; direct OpenAI keys error out and force /goal pause until recovery
When NOT to Use Goal Mode
Three disqualifiers:
- You cannot write verification commands: If success means "feels right" or "more elegant," Goal Mode either declares premature victory or churns indefinitely. Use one-shot prompts instead
- The work needs frequent human judgment: Goals target autonomy. If every change needs approval, you pay for unused context. Run one-shot sessions instead; they are cheaper and faster
- Destructive work at scale: Database migrations, git push --force, anything that touches production. Goal Mode excels at unattended convergence but lacks judgment about when not to act. Sandbox agents to worktrees, set approval_policy to require approval for shell commands, and prefer goals with dry-run verification over live mutations
The Shape of the Next Year
Goal Mode plus Locked Computer Use represents the first credible "set a goal, lock your laptop, check tomorrow" coding loop for production use. The agent isn't smarter than last month—friction simply vanished, changing which engineering tasks merit delegating to models. A coding agent surviving screen locks, budget resets, and dinner breaks differs fundamentally from one requiring constant supervision.
The important caveat: hours of attended Goal Mode work are reliable today, but fully unattended multi-day work still depends on how verifiable the goal is. Writing goals with real evidence surfaces is now the critical skill, superseding single-turn prompt craft.
Sources & Further Reading
- Codex Changelog — May 2026 — official release notes for Goal Mode GA and Locked Computer Use
- Using Goals in Codex — cookbook with goal syntax and worked examples
- Computer Use — Codex App — official safety model and platform constraints
- MacRumors: Codex Can Use Your Mac When Locked — independent writeup of the unlock flow
- GPT-5.3-Codex on OpenRouter — pricing and context window reference
Originally published on ofox.ai/blog.