Anup Karanjkar

Posted on Jul 2 • Originally published at wowhow.cloud

OpenAI Codex Goal Mode Is Now GA — Multi-Hour Autonomous Coding Sessions

#openaicodex #codexgoal #codexrole

OpenAI Codex Goal Mode lets you describe a software objective at 9am and come back at noon to a working implementation. That is the pitch, and after a 6-week closed beta, it is now live for all ChatGPT Plus, Pro, and Team subscribers as of May 21, 2026. Five million weekly active Codex users means this is not a niche feature — it is becoming the default way a significant portion of developers interact with AI coding tools.

The critical distinction between Goal Mode and standard Codex sessions: persistence. A standard Codex session is reactive — you ask, it responds, you approve each step. Goal Mode is proactive — you define a terminal objective, Codex builds a plan, executes it step by step, runs tests after each change, handles failures autonomously, and keeps going until the goal is achieved or it hits a blocker requiring human judgment.

How Goal Mode Works Technically

Goal Mode operates on a four-phase loop that runs without user input until completion or intervention:

1. Plan phase: Codex reads the repository structure, identifies relevant files, and builds a step-by-step execution plan. This plan is shown to you before execution begins; you can edit it, approve it, or abort. Average plan generation time: 45–90 seconds for a medium-sized repository.
2. Act phase: Codex executes plan steps sequentially, writing code changes to a sandboxed working copy of your repository. It does not touch your main branch until the goal is complete.
3. Test phase: After each significant change, Codex runs your existing test suite (pytest, Jest, cargo test, or whatever is configured in your repo). Failures trigger the Review phase.
4. Review phase: On test failure, Codex reads the failure output, forms a hypothesis, and either tries an alternative approach or surfaces a blocker message if it determines the failure requires human context to resolve.

The loop runs asynchronously. You do not need to stay in the browser. OpenAI sends a notification (email or push) when the goal completes, hits a blocker, or reaches a configurable timeout (default: 4 hours, max: 12 hours).
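The control flow described above can be sketched in a few lines of Python. This is only an illustration of a plan, act, test, review loop with a timeout, not OpenAI's implementation: `run_goal`, `StepResult`, `GoalOutcome`, the `max_retries` budget, and the callback signatures are all hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    tests_passed: bool
    failure_output: str = ""


@dataclass
class GoalOutcome:
    status: str                    # "completed", "blocked", or "timeout"
    steps_done: int = 0
    blocker: Optional[str] = None


def run_goal(
    plan: List[str],
    act_and_test: Callable[[str], StepResult],
    review: Callable[[str, str], Optional[str]],
    timeout_s: float = 4 * 3600,   # the article's default timeout
    max_retries: int = 2,          # hypothetical retry budget per step
    clock: Callable[[], float] = time.monotonic,
) -> GoalOutcome:
    """Illustrative plan -> act -> test -> review loop (assumed design)."""
    start = clock()
    done = 0
    for step in plan:
        attempt = step
        for _ in range(max_retries + 1):
            if clock() - start > timeout_s:
                return GoalOutcome("timeout", done)
            result = act_and_test(attempt)          # act, then run the tests
            if result.tests_passed:
                done += 1
                break
            # Review: return a revised step to retry, or None to escalate.
            revised = review(attempt, result.failure_output)
            if revised is None:
                return GoalOutcome("blocked", done, blocker=result.failure_output)
            attempt = revised
        else:
            # Every retry failed without the reviewer giving up explicitly.
            return GoalOutcome("blocked", done, blocker=f"retries exhausted: {step}")
    return GoalOutcome("completed", done)
```

The three terminal states mirror the three notification triggers: completion, a blocker, or the timeout. Passing `clock` as a parameter is just a convenience so the timeout path can be exercised without waiting.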

The 6 New Role Plugins

Goal Mode launched GA with six role plugins that configure Codex's behavior for specific engineering tasks:

| Plugin | Optimizes for | Typical goal type |
| --- | --- | --- |
| **RefactorBot** | Safe large-scale refactoring | "Migrate all uses of deprecated API X to the new API Y" |
| **SecurityAuditor** | Vulnerability identification + patching | "Find and fix all SQL injection vectors in the query layer" |
| **TestWriter** | Test coverage expansion | "Bring authentication module coverage from 40% to 85%" |
| **DocBot** | Documentation generation | "Generate JSDoc for all exported functions in src/api/" |
| **MigrationGuide** | Dependency upgrades | "Upgrade from React 18 to React 19, fixing all breaking changes" |
| **PerformanceProfiler** | Code optimization | "Reduce P99 latency in the order processing pipeline by 30%" |

Each plugin changes the underlying system prompt Codex uses, adjusting how cautious it is with changes, how aggressively it tests, and what kinds of interventions it surfaces to you. SecurityAuditor, for example, surfaces every potential issue as a blocker rather than auto-patching — because patching a security vulnerability without developer review is a trust boundary violation even in an agentic context.
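One way to picture a per-role policy is as a small configuration record that decides what happens to each finding. The sketch below is an assumption-heavy illustration: `RoleProfile`, `auto_patch`, and `next_action` are hypothetical names, and only the SecurityAuditor entry reflects behavior described here. The RefactorBot value is assumed purely for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleProfile:
    name: str
    auto_patch: bool   # may the agent apply a fix without asking first?


# Only the SecurityAuditor entry reflects behavior described in the article.
# The RefactorBot entry is an assumption added for contrast.
PROFILES = {
    "SecurityAuditor": RoleProfile("SecurityAuditor", auto_patch=False),
    "RefactorBot": RoleProfile("RefactorBot", auto_patch=True),  # assumed
}


def next_action(role: str, finding: str) -> str:
    """Route a finding to an automatic patch or to a human-facing blocker."""
    profile = PROFILES[role]
    if profile.auto_patch:
        return f"patch-and-test: {finding}"
    return f"blocker: {finding}"


print(next_action("SecurityAuditor", "SQL injection in query layer"))
```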

Pricing: Where Goal Mode Gets Expensive

Goal Mode is included with ChatGPT Plus ($20/month), Pro ($200/month), and Team ($25/user/month) subscriptions — but with rate limits that differ substantially by tier:

| Plan | Goal Mode sessions per month | Max session duration | Concurrent sessions |
| --- | --- | --- | --- |
| ChatGPT Plus ($20/mo) | 10 | 4 hours | 1 |
| ChatGPT Pro ($200/mo) | Unlimited | 12 hours | 3 |
| ChatGPT Team ($25/user/mo) | 25 per user | 4 hours | 1 |
| API (pay-per-token) | Unlimited | No limit | Depends on rate limit tier |

The Plus plan's cap of 10 sessions per month is the primary constraint for heavy users. A 4-hour session that covers a significant feature counts as one session, so anyone running more than 10 meaningful autonomous tasks per month on Plus will hit the ceiling regularly. Pro removes the session cap, but at $200/month it is a steep jump.
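To check which subscription tier a given workload fits, the limits from the table can be encoded directly. This is a rough sketch: the `Tier` record and `cheapest_tier` function are illustrative names, Team is treated as a single seat (an assumption, since it is priced per user), and the pay-per-token API row is left out because its cost depends on usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_usd: float                  # per user
    sessions_per_month: Optional[int]   # None means unlimited
    max_hours: float
    concurrent: int


# Values copied from the pricing table; Team is modeled as one seat.
TIERS = [
    Tier("Plus", 20, 10, 4, 1),
    Tier("Team", 25, 25, 4, 1),
    Tier("Pro", 200, None, 12, 3),
]


def cheapest_tier(sessions: int, longest_hours: float, concurrent: int = 1) -> Optional[Tier]:
    """Return the lowest-priced tier whose limits cover the workload, if any."""
    fits = [
        t for t in TIERS
        if (t.sessions_per_month is None or sessions <= t.sessions_per_month)
        and longest_hours <= t.max_hours
        and concurrent <= t.concurrent
    ]
    return min(fits, key=lambda t: t.monthly_usd) if fits else None


for workload in [(8, 4), (12, 4), (12, 7.2)]:
    tier = cheapest_tier(*workload)
    print(workload, "->", tier.name if tier else "no subscription tier fits")
```

Under these assumptions, any session longer than 4 hours or any need for parallel sessions pushes the workload to Pro regardless of how few sessions are run.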

The API route exists for teams that need programmatic access: Goal Mode is available via the codex API endpoint with the goal_mode: true parameter. Pricing follows o4-mini token rates: $0.0011 input / $0.0044 output per 1K tokens. A full 4-hour autonomous session on a medium codebase consumes approximately 2–4 million tokens, costing roughly $10–$20 per session via API. That estimate assumes most tokens are billed at the output rate ($8.80–$17.60 for 2–4 million output tokens); a session dominated by input tokens would cost less.
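The per-session figure depends on how the tokens split between input and output, which is not stated here. A minimal sketch of the arithmetic, using the quoted per-1K rates and an assumed `output_share` parameter (a hypothetical knob, not an API field):

```python
def session_cost_usd(
    total_tokens: int,
    output_share: float,
    input_per_1k: float = 0.0011,    # quoted input rate, USD per 1K tokens
    output_per_1k: float = 0.0044,   # quoted output rate, USD per 1K tokens
) -> float:
    """Estimate the cost of one session for a given input/output token split."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    return (input_tokens * input_per_1k + output_tokens * output_per_1k) / 1000


# Bounds for the 2-4 million token range, plus an even split for comparison.
for tokens in (2_000_000, 4_000_000):
    for share in (0.0, 0.5, 1.0):
        print(f"{tokens:>9,} tokens, {share:.0%} output: ${session_cost_usd(tokens, share):.2f}")
```

At the quoted rates the range runs from about $2.20 (2 million tokens, all input) to about $17.60 (4 million tokens, all output), so the split matters as much as the total token count.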

Goal Mode vs. Claude Code Agent Mode

The honest comparison requires being specific about the use case, because these tools have genuinely different strengths:

Goal Mode advantages:

- True asynchronous operation: set it and close the browser. Claude Code agent mode requires an active terminal session.
- Built-in sandboxed execution: changes go to a working copy, not your local files, until you approve. Claude Code writes to your actual working directory by default.
- Role plugins provide task-specific behavior tuning without manual system prompt engineering.
- Notification-based completion fits into a "delegate and review" workflow better than interactive terminal sessions.

Claude Code agent mode advantages:

- 88.6% SWE-bench score (Opus 4.8) versus o4-mini's 68.4%. For complex engineering problems that require genuine reasoning, the capability gap is real.
- 1M token context window. Goal Mode effectively operates on the files it identifies as relevant, so large monorepos with cross-cutting dependencies are harder for it to handle holistically.
- MCP ecosystem. Claude Code can invoke external tools (databases, APIs, documentation) mid-session. Codex Goal Mode operates in a more sandboxed environment.
- Flat-rate interactive pricing. For heavy users, Claude Code Max at $100/month flat is cheaper than ChatGPT Pro at $200/month.

The honest split: Goal Mode is better for well-defined, bounded tasks where you want true async operation and do not need the highest-capability model. RefactorBot running a migration, TestWriter pushing coverage, or DocBot generating docs are legitimate use cases where o4-mini's lower capability does not matter because the task is mechanical. Claude Code agent mode is better for complex feature work, architecture-level reasoning, and tasks that require integrating context across a large codebase.

Real Examples from the Beta

Three representative cases from the closed beta reports (sourced from OpenAI's blog post and beta participant threads):

Case 1: Django to FastAPI migration (7.2 hours). A developer set a goal to migrate a 15,000-line Django REST API to FastAPI. Goal Mode used the MigrationGuide plugin, ran for 7.2 hours across two sessions, and completed 94% of the migration, surfacing 8 blockers where business logic required human decisions about async patterns. The developer reviewed and resolved the blockers in 45 minutes total. Estimated manual equivalent: 3 days.

Case 2: Generating a test suite for a payment module (3.1 hours). Using the TestWriter plugin on a payment module with zero existing tests, Goal Mode reached 87% coverage in 3.1 hours. The developer noted that 12% of the generated tests had incorrect assertions about edge cases and required manual correction. The coverage was real; the assertion quality still needed review.

Case 3: Security audit on a Node.js API (4 hours, timed out). SecurityAuditor ran for 4 hours (the Plus plan limit) and surfaced 23 potential vulnerabilities before the timeout. The developer reported that 19 of the 23 were legitimate issues and 3 were false positives, and separately noted 1 valid issue that the audit missed. Not a clean pass, but 19 real vulnerabilities found autonomously is a strong result for a single session.

How to Start a Goal Mode Session

The workflow from the ChatGPT interface:

1. Open ChatGPT and select Codex from the left sidebar
2. Connect your repository via GitHub OAuth (first time only)
3. Click New Goal
4. Select a role plugin or choose "Custom" for free-form objectives
5. Write your goal statement, being specific about the target state, not the steps
6. Review the generated plan and approve or edit
7. Click Start Goal and optionally set a notification preference

Goal statements that work well: "Add Redis caching to all database queries in the user service, with TTLs appropriate to each query's data volatility, and add integration tests." Goal statements that fail: "Make the app faster." The more specific the target state, the better the plan quality.
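One way to keep goal statements specific is to assemble them from explicit parts: a target state, a scope, constraints, and a verification step. The sketch below is a hypothetical helper, not part of Codex. `GoalSpec`, its field names, and the `missing_parts` check are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalSpec:
    target_state: str                       # what should be true when done
    scope: str = ""                         # where in the codebase it applies
    constraints: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """List the parts that are still empty (an illustrative heuristic)."""
        checks = {
            "scope": self.scope,
            "constraints": self.constraints,
            "verification": self.verification,
        }
        return [name for name, value in checks.items() if not value]

    def render(self) -> str:
        """Join the parts into a single goal statement."""
        text = self.target_state
        if self.scope:
            text += f" in {self.scope}"
        if self.constraints:
            text += ", with " + " and ".join(self.constraints)
        if self.verification:
            text += ", and " + " and ".join(self.verification)
        return text + "."


good = GoalSpec(
    target_state="Add Redis caching to all database queries",
    scope="the user service",
    constraints=["TTLs appropriate to each query's data volatility"],
    verification=["add integration tests"],
)
vague = GoalSpec(target_state="Make the app faster")

print(good.render())
print(vague.missing_parts())
```

The first spec renders to the well-formed example above, while the vague one reports every optional part as missing, which is a quick signal that the plan will have little to anchor on.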

Templates for writing effective Goal Mode prompts for different engineering tasks are available in the developer toolkit at wowhow.cloud.

At session end, Codex produces: a git diff of all changes, a test run summary, a log of every step taken, and a list of any unresolved blockers. Changes are held in a working copy until you review and approve the merge. You never get silent commits to your main branch.
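The four session outputs map naturally onto a small record that a review script could gate on. This is a sketch under assumptions: `SessionReport`, its field names, and the `ready_to_merge` rule are hypothetical, and the actual export format of a Goal Mode session is not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionReport:
    diff: str                                   # git diff of all changes
    tests_passed: int                           # from the test run summary
    tests_failed: int
    step_log: List[str] = field(default_factory=list)
    unresolved_blockers: List[str] = field(default_factory=list)


def ready_to_merge(report: SessionReport) -> bool:
    """Assumed review gate: a non-empty diff, no failing tests, no open blockers."""
    return (
        bool(report.diff.strip())
        and report.tests_failed == 0
        and not report.unresolved_blockers
    )


clean = SessionReport(diff="diff --git a/app.py b/app.py", tests_passed=42, tests_failed=0)
stuck = SessionReport(
    diff="diff --git a/app.py b/app.py",
    tests_passed=40,
    tests_failed=0,
    unresolved_blockers=["choose async pattern for billing job"],
)

print(ready_to_merge(clean), ready_to_merge(stuck))
```

A gate like this only decides whether the report is worth opening for merge review; it does not replace reading the diff.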
