
Richard Gibbons

Originally published at digitalapplied.com

GPT-5.2-Codex: OpenAI's Agentic Coding Model for Enterprise

OpenAI has released GPT-5.2-Codex, their most advanced agentic coding model for professional software engineering. With state-of-the-art benchmark scores, native context compaction for multi-hour coding sessions, and a real-world track record of discovering critical vulnerabilities, GPT-5.2-Codex represents OpenAI's response to the intensifying AI coding race.

Key Stats

| Metric | Value |
| --- | --- |
| SWE-Bench Pro | 56.4% |
| Terminal-Bench 2.0 | 64.0% |
| Context Window | 400K |
| API Input Cost | $1.75/1M |

Key Takeaways

  • State-of-the-Art Agentic Coding: GPT-5.2-Codex achieves 56.4% on SWE-Bench Pro and 64.0% on Terminal-Bench 2.0, making it OpenAI's most capable coding model for complex, multi-hour engineering tasks.
  • Context Compaction Breakthrough: Native context compaction allows the model to work coherently over millions of tokens in a single task—enabling project-scale refactors and deep debugging sessions that weren't previously possible.
  • Real-World Cybersecurity Proof: A security researcher used the predecessor model to discover multiple React vulnerabilities (CVE-2025-55182 and related), demonstrating practical value for defensive security workflows.
  • Competitive Positioning: While Claude Opus 4.5 leads on SWE-bench Verified (80.9%) and Gemini 3 Pro excels at algorithmic challenges, GPT-5.2-Codex differentiates through agentic endurance, Windows support, and cybersecurity capabilities.
  • Governance Innovation: OpenAI's invite-only trusted access pilot for vetted security professionals signals a maturing approach to dual-use AI capabilities—balancing accessibility with safety as models approach 'High' capability thresholds.

Introduction

OpenAI released GPT-5.2-Codex on December 18, 2025, positioning it as "the most advanced agentic coding model yet for complex, real-world software engineering." The release came amid intense competition—reportedly following an internal "code red" response to Google's Gemini 3 launch. For developers and enterprises evaluating AI coding tools, GPT-5.2-Codex offers a compelling combination of agentic endurance, cybersecurity capabilities, and deep ecosystem integration.

The headline capabilities are substantial: native context compaction enables working coherently over millions of tokens in a single task, the model achieves 56.4% on SWE-Bench Pro (state-of-the-art), and a real-world proof point demonstrates AI-assisted discovery of critical React vulnerabilities. The Codex platform ecosystem—CLI, IDE extension, cloud, and GitHub code review—now operates as a unified experience with 90% faster container caching.

Available Now: GPT-5.2-Codex is immediately available to all paid ChatGPT users (Plus, Pro, Business, Edu, Enterprise) across Codex CLI, IDE extensions, web, mobile, and GitHub code reviews. API access is expected in the coming weeks.

GPT-5.2-Codex Technical Specifications

Key specs for developers and engineering teams:

| Specification | Value | Notes |
| --- | --- | --- |
| Model ID | gpt-5-2-codex | API identifier (coming soon) |
| SWE-Bench Pro | 56.4% | State-of-the-art |
| Terminal-Bench 2.0 | 64.0% | Agentic terminal tasks |
| Context Window | 400K input / 128K output | With native compaction |
| API Pricing | $1.75 / $14.00 | Input / output per 1M tokens |
| Knowledge Cutoff | August 31, 2025 | Significant upgrade |

Features: Context Compaction, Windows Support, Cybersecurity, Vision Capable, MCP Connections, Trusted Access Pilot

What is GPT-5.2-Codex

GPT-5.2-Codex is OpenAI's latest agentic coding model, built on GPT-5.2 and further optimized for the Codex platform. The official tagline is "the most advanced agentic coding model for professional software engineering and defensive cybersecurity." It represents the third major capability jump in the Codex family, following GPT-5-Codex and GPT-5.1-Codex-Max.

An important distinction: "GPT-5.2-Codex" refers to the AI model itself, while "Codex" also refers to the product ecosystem (CLI, IDE extension, cloud, GitHub review). The model powers all Codex surfaces, now unified into a single product experience connected by your ChatGPT account.

Product Surfaces Where GPT-5.2-Codex Runs

  • Codex CLI: Open-source terminal interface with image attachment, to-do tracking, web search, and MCP connections
  • Codex IDE Extension: VS Code and Cursor integration with seamless cloud-to-local context transfer
  • Codex Cloud: Isolated container execution with 90% faster completion time via container caching
  • GitHub Code Review: Auto-reviews PRs when enabled, catches hundreds of issues daily at OpenAI
  • ChatGPT (Web/iOS): Full access through standard ChatGPT interface

Key Capabilities & Improvements

GPT-5.2-Codex introduces several significant improvements over previous Codex models. The core enhancements focus on enabling longer, more complex coding sessions with better performance across diverse environments.

Context Compaction

Work coherently over millions of tokens in a single task.

  • Automatic session compaction at context limits
  • Preserves task-relevant information
  • New /responses/compact API endpoint

Long-Horizon Performance

Sustained multi-step coding tasks over hours.

  • 7+ hour independent work sessions
  • Maintains continuity in large projects
  • Avoids repetition and state loss

Windows Environment

First Codex model with native Windows training.

  • Improved Windows environment compatibility
  • Native PowerShell understanding
  • Windows-specific tooling support

Vision Capabilities

Interpret screenshots, diagrams, and UI surfaces.

  • Design mockups to functional prototypes
  • Technical diagram interpretation
  • UI bug analysis from screenshots

Benchmark Performance

GPT-5.2-Codex achieves state-of-the-art results on benchmarks that measure real-world agentic coding capability. The SWE-Bench Pro and Terminal-Bench 2.0 benchmarks specifically test AI agents on complex, multi-step software engineering tasks.

| Benchmark | GPT-5.2-Codex | GPT-5.2 | GPT-5.1 |
| --- | --- | --- | --- |
| SWE-Bench Pro | 56.4% | 55.6% | 50.8% |
| Terminal-Bench 2.0 | 64.0% | 62.2% | |
| SWE-Bench Verified (Python) | ~80% | | |
| AIME 2025 (Math) | 100% | 100% | |

What the Benchmarks Measure

SWE-Bench Pro: Given a code repository, the model must generate a patch to solve realistic software engineering tasks. Tests real-world bug fixing and code completion. GPT-5.2-Codex holds state-of-the-art as of December 18, 2025.

Terminal-Bench 2.0: Tests AI agents in real terminal environments: compiling code, training models, setting up servers, and running scripts. Measures tool-driven coding capability.

Benchmark Attribution Note: OpenAI's launch post states GPT-5.2-Codex achieves "state-of-the-art" but does not include explicit numeric scores in the post text. The specific percentages (56.4%, 64.0%) are reported by secondary sources—attribute carefully when citing.

Context Compaction Explained

Context compaction is arguably the most significant technical innovation in GPT-5.2-Codex. It enables the model to work coherently across millions of tokens in a single task—unlocking capabilities that weren't possible with fixed context windows.

How It Works

  1. Model approaches context window limits during work
  2. Automatic compaction preserves task-relevant information
  3. Dramatically reduces token footprint
  4. Continues working with full context awareness
  5. New /responses/compact API for developer control
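The loop above can be sketched as a toy simulation. The real compaction mechanism is server-side and not public, so the message shapes, the `compact_session` helper, and the summarization step here are all illustrative assumptions:

```python
# Toy sketch of automatic context compaction (illustrative only:
# the real server-side mechanism is not public).

CONTEXT_LIMIT = 8  # messages, standing in for a token budget

def compact_session(messages, keep_recent=3):
    """Summarize old turns; keep the task goal and recent turns verbatim."""
    goal = [m for m in messages if m["role"] == "goal"]
    rest = [m for m in messages if m["role"] != "goal"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {"role": "summary",
               "content": f"{len(old)} earlier steps condensed"}
    return goal + [summary] + recent

session = [{"role": "goal", "content": "refactor auth module"}]
for step in range(10):
    session.append({"role": "tool", "content": f"step {step} output"})
    if len(session) > CONTEXT_LIMIT:
        session = compact_session(session)

print(len(session))  # stays bounded instead of growing without limit
```

The point of the sketch is the shape of the mechanism: the session's token footprint stays bounded while the task goal and recent working state survive each compaction.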

What It Enables

  • Project-scale refactors
  • Deep debugging sessions over hours
  • Multi-hour agent loops
  • Dependency upgrades across entire projects

Token Efficiency Pattern

OpenAI's internal statistics reveal a striking efficiency pattern:

| Task Difficulty | Efficiency |
| --- | --- |
| Bottom 10% (easy tasks) | 93.7% fewer tokens than GPT-5 |
| Top 10% (hard tasks) | 2x more time reasoning, editing, testing, iterating |

"Why it feels fast, until it decides it should grind."

Cybersecurity Capabilities

GPT-5.2-Codex represents the third major capability jump in cybersecurity for the Codex family. OpenAI positions it as "significantly stronger than any previous model" for defensive security workflows—with real-world proof to back the claim.

The React2Shell Vulnerability Story (CVE-2025-55182)

OpenAI's flagship example of AI-assisted vulnerability discovery.

The Discovery: Andrew MacPherson (Principal Security Engineer at Privy, a Stripe company) used GPT-5.1-Codex-Max with Codex CLI to study the React2Shell vulnerability. While analyzing that one vulnerability, the AI-assisted workflow discovered three additional vulnerabilities.

| CVE | Severity | Type |
| --- | --- | --- |
| CVE-2025-55182 | CVSS 10.0 (Critical) | RCE in React Server Components |
| CVE-2025-55183 | CVSS 5.3 (Medium) | Source code exposure |
| CVE-2025-55184 | CVSS 7.5 (High) | Denial of service |

Real-World Impact: Within hours of the December 3, 2025 disclosure, China state-nexus threat groups (Earth Lamia, Jackpot Panda) began exploitation. Microsoft identified several hundred compromised machines. Attackers deployed coin miners, Cobalt Strike, and established persistence.

Preparedness Framework Status

| Domain | Risk Level | Notes |
| --- | --- | --- |
| Biological & Chemical | High | Treated as high-risk with additional mitigations |
| Cyber | Medium | Does not reach the "High" threshold |
| AI Self-Improvement | Medium | Does not reach the "High" threshold |

Trusted Access Pilot: OpenAI offers an invite-only program for vetted cybersecurity professionals. Access is based on disclosure history and professional credentials, providing "more permissive models" for defensive security work.

Codex Platform Ecosystem

The December 2025 release includes major upgrades across all Codex surfaces—CLI, IDE extension, cloud, and code review. The platform now operates as a unified experience with significant performance improvements.

Codex CLI

Open-source, rebuilt for agentic workflows.

  • Attach images (screenshots, wireframes, diagrams)
  • To-do list tracking for complex work
  • Built-in web search capability
  • MCP connections support
  • Three approval modes: read-only, auto, full access

Codex Cloud

Isolated container execution with major performance gains.

Container Caching: 90% reduction in median completion time.

  • Auto-scans for setup scripts
  • Configurable internet access (allowlist/denylist)
  • Network access disabled by default

Code Review Automation

Auto-reviews PRs when enabled for a repository.

OpenAI uses Codex code review internally, reporting that it reviews "the vast majority" of their PRs and catches "hundreds of issues every day."

  • Enable per-repository via GitHub integration
  • Can be invoked directly in PR threads
  • Catches logic bugs that faster models overlook

Pricing & Access

GPT-5.2-Codex is available immediately to all paid ChatGPT users, with API access coming soon. The base pricing represents a 1.4x increase over GPT-5.1—a rare price increase reflecting the model's enhanced capabilities.

| Token Type | Cost per 1M | Notes |
| --- | --- | --- |
| Input Tokens | $1.75 | 1.4x increase from GPT-5.1 |
| Output Tokens | $14.00 | Premium pricing for advanced capabilities |
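At these list prices, a rough per-session cost estimate is straightforward. The token counts below are made-up examples, not measured figures:

```python
# Rough cost estimate at GPT-5.2 list prices (dollars per 1M tokens).
INPUT_PRICE = 1.75
OUTPUT_PRICE = 14.00

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a session at list prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)

# Hypothetical long agentic session: 2M tokens read, 150K written.
cost = session_cost(2_000_000, 150_000)
print(f"${cost:.2f}")  # $5.60
```

Note that the output rate dominates: at 8x the input rate, even modest amounts of generated code move the bill.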

Access Tiers

| Tier | Availability | Notes |
| --- | --- | --- |
| ChatGPT Plus/Pro/Business | Available now | All Codex surfaces included |
| API Access | Coming soon | "In the coming weeks" |
| Trusted Access Pilot | Invite-only | Vetted security professionals |

GPT-5.2-Codex vs Claude vs Gemini

December 2025 represents the peak of the AI coding wars, with three major models competing for developer mindshare. Each has distinct strengths—the optimal choice depends on your specific requirements.

| Aspect | GPT-5.2-Codex | Claude Opus 4.5 | Gemini 3 Flash |
| --- | --- | --- | --- |
| Release Date | Dec 18, 2025 | Nov 24, 2025 | Dec 17, 2025 |
| SWE-Bench Pro | 56.4% | ~55-56% | |
| SWE-Bench Verified | ~80% | 80.9% | 78% |
| Context Window | 400K | 200K | 1M |
| Input Pricing | $1.75/1M | $15/1M | $0.50/1M |
| Key Strength | Agentic endurance, cybersecurity | Code quality, complex analysis | Speed, cost, multimodal |

Choose GPT-5.2-Codex

  • Long-horizon agentic tasks (7+ hours)
  • Cybersecurity workflows
  • Windows environment support
  • GitHub/VS Code ecosystem

Choose Claude Opus 4.5

  • Maximum code quality
  • Complex analysis and refactoring
  • Nuanced instruction following
  • Anthropic ecosystem

Choose Gemini 3 Flash

  • Cost-sensitive development
  • Massive context needs (1M tokens)
  • Multimodal (video, audio)
  • Google Cloud integration

Multi-Model Strategy: Many practitioners advocate using different models for different tasks—Claude for quality, Codex for endurance, GPT for versatility, Gemini for speed. This division of labor simplifies model selection based on task requirements.
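A minimal sketch of such a routing layer follows. The task attributes, thresholds, and model identifiers are illustrative assumptions, not an official mapping:

```python
# Toy multi-model router: pick a model by task profile.
# Routing rules and thresholds are illustrative assumptions.

def pick_model(task: dict) -> str:
    """Route a task description to a model name."""
    if task.get("security") or task.get("hours", 0) >= 4:
        return "gpt-5-2-codex"      # agentic endurance, security work
    if task.get("context_tokens", 0) > 400_000:
        return "gemini-3-flash"     # 1M-token context window
    if task.get("quality_critical"):
        return "claude-opus-4-5"    # maximum code quality
    return "gemini-3-flash"         # cheap default for quick jobs

print(pick_model({"hours": 7}))                 # gpt-5-2-codex
print(pick_model({"context_tokens": 800_000}))  # gemini-3-flash
print(pick_model({"quality_critical": True}))   # claude-opus-4-5
```

In practice a router like this lives behind a single client interface, so switching models per task is a one-line change for callers.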

When NOT to Use GPT-5.2-Codex

Despite its impressive capabilities, GPT-5.2-Codex isn't the optimal choice for every use case. Understanding its limitations helps teams deploy it effectively and avoid scenarios where alternatives perform better.

Avoid GPT-5.2-Codex For

  • Quick one-off snippets: Overkill—use faster, cheaper models
  • Cost-sensitive high-volume: $1.75/1M input is 3.5x Gemini's price
  • Massive context requirements: 400K vs Gemini's 1M token window
  • Pure algorithmic challenges: Gemini 3 may outperform on math/algorithms

Use GPT-5.2-Codex For

  • Repo-wide refactors: Context compaction enables project-scale work
  • Multi-step bug fixes: Hours-long debugging sessions with context
  • Design-to-code workflows: Vision capabilities for mockups and diagrams
  • Defensive security work: Fuzzing, vulnerability analysis, code review

Common Mistakes to Avoid

Teams adopting GPT-5.2-Codex often make predictable mistakes that reduce value or increase costs. Avoiding these patterns helps maximize the model's practical benefits.

Using GPT-5.2-Codex for Simple Tasks

Mistake: Deploying the most expensive model for trivial code generation that cheaper models handle fine.

Fix: Use GPT-5.2-Codex for complex, multi-step tasks where context compaction and agentic capabilities matter. Use faster/cheaper models for quick snippets.

Ignoring the Trusted Access Pilot

Mistake: Security teams struggle with model restrictions when enhanced capabilities are available through the pilot program.

Fix: If you're a vetted security professional with disclosure history, apply for the trusted access pilot for unrestricted defensive security capabilities.

Not Using Context Compaction API

Mistake: Letting sessions fail at context limits instead of leveraging the new compaction endpoint.

Fix: Use the /responses/compact API endpoint for loss-aware compression in long-running sessions. The model can also automatically compact when approaching limits.
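A hedged sketch of what a call to that endpoint might look like. The endpoint path comes from OpenAI's announcement, but the request body fields (`response_id`, `strategy`) are guesses until the API documentation ships:

```python
import json

# Hypothetical request to the /responses/compact endpoint.
# The path is from OpenAI's announcement; the body shape is an assumption.
payload = {
    "response_id": "resp_abc123",   # made-up ID for illustration
    "strategy": "loss_aware",       # assumed parameter name
}
request = {
    "method": "POST",
    "url": "https://api.openai.com/v1/responses/compact",
    "body": json.dumps(payload),
}
print(request["method"], request["url"])
```

The design intent, per the announcement, is loss-aware compression: the server condenses the stored conversation so a long-running session can continue under the context limit.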

Expecting Immediate API Access

Mistake: Planning production integrations that depend on API access before it's available.

Fix: API access is "coming in the coming weeks." Use Codex CLI and IDE integration for immediate access. Plan API integrations for early 2026.

Ignoring Reasoning Level Configuration

Mistake: Using default "high" reasoning for all tasks without considering the new xhigh level or optimization opportunities.

Fix: GPT-5.2 offers reasoning levels: none, low, medium, high, and the new xhigh. Use xhigh for the most complex tasks. The model uses 93.7% fewer tokens on easy tasks—let it optimize.
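One way to act on this is to pick the reasoning level programmatically per task. The five levels are from OpenAI's announcement; the difficulty-scoring scheme and the `reasoning.effort` request field below are assumptions for illustration:

```python
# Map a rough task-difficulty score to a GPT-5.2 reasoning level.
# Levels are from OpenAI's announcement; the request field name
# ("reasoning.effort") is an assumption here.
LEVELS = ["none", "low", "medium", "high", "xhigh"]

def reasoning_for(task_difficulty: float) -> str:
    """Map a difficulty score in [0, 1] to a reasoning level."""
    index = min(int(task_difficulty * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]

request_body = {
    "model": "gpt-5-2-codex",
    "reasoning": {"effort": reasoning_for(0.95)},  # hard task
}
print(request_body["reasoning"]["effort"])  # xhigh
```

Reserving `xhigh` for the genuinely hard tail mirrors the efficiency pattern above: let easy tasks run cheap and spend reasoning budget only where it pays.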

Frequently Asked Questions

What is GPT-5.2-Codex and when was it released?

GPT-5.2-Codex is OpenAI's most advanced agentic coding model, released on December 18, 2025. Built on GPT-5.2 and optimized specifically for agentic coding workflows, it's designed for professional software engineering and defensive cybersecurity tasks. The model is available to all paid ChatGPT users (Plus, Pro, Business, Edu, Enterprise) across Codex CLI, IDE extensions, web, mobile, and GitHub code reviews. API access is expected in the coming weeks.

How does context compaction work in GPT-5.2-Codex?

Context compaction is a breakthrough capability that allows GPT-5.2-Codex to work coherently across millions of tokens in a single task. When approaching context window limits, the model automatically compacts its session while preserving task-relevant information. This dramatically reduces token footprint and enables project-scale refactors, deep debugging sessions, and multi-hour agent loops that weren't possible before. A new server-side API endpoint (/responses/compact) provides loss-aware compression for developers.

What are the benchmark scores for GPT-5.2-Codex vs competitors?

GPT-5.2-Codex achieves 56.4% on SWE-Bench Pro (state-of-the-art) and 64.0% on Terminal-Bench 2.0. For comparison: Claude Opus 4.5 scores ~80.9% on SWE-bench Verified (a different, Python-only benchmark), and Gemini 3 Pro scores 76.2% on SWE-bench Verified. Note that different benchmarks measure different capabilities—SWE-Bench Pro and Terminal-Bench focus specifically on agentic, multi-step coding tasks where GPT-5.2-Codex excels.

How much does GPT-5.2-Codex cost?

GPT-5.2 base pricing is $1.75 per million input tokens and $14 per million output tokens—a 1.4x increase from GPT-5.1. All paid ChatGPT plans (Plus, Pro, Business, Edu, Enterprise) include access to Codex surfaces. API access is coming soon. For the trusted access pilot program, which provides enhanced cybersecurity capabilities, access is invite-only for vetted security professionals.

What is the trusted access pilot program?

The trusted access pilot is an invite-only program for vetted cybersecurity professionals and organizations. It provides access to 'more permissive models' with unrestricted capabilities for defensive cybersecurity work—including red-teaming, vulnerability research, and malware analysis. Vetting is based on disclosure history and professional credentials. This approach balances accessibility with safety for dual-use cybersecurity capabilities.

How does GPT-5.2-Codex compare to Claude Opus 4.5?

Both are leading AI coding models with different strengths. GPT-5.2-Codex excels at: long-horizon agentic tasks (7+ hour sessions), Windows environment support, cybersecurity workflows, and ecosystem integration (VS Code, GitHub Copilot). Claude Opus 4.5 leads on: SWE-bench Verified (80.9%), production code quality, and refined instruction following. Choose GPT-5.2-Codex for agentic endurance and security work; choose Claude for maximum coding quality and complex analysis.

What is the React2Shell vulnerability discovery story?

A security researcher (Andrew MacPherson at Privy/Stripe) used GPT-5.1-Codex-Max with Codex CLI to study the React2Shell vulnerability (CVE-2025-55182). While analyzing that one vulnerability, the AI-assisted workflow discovered three additional vulnerabilities. The original CVE-2025-55182 was a critical RCE (CVSS 10.0) in React Server Components, affecting the React 19 ecosystem and Next.js. Within hours of disclosure on December 3, 2025, China state-nexus threat groups began exploitation. This demonstrates the model family's practical value for defensive security.

When will API access be available for GPT-5.2-Codex?

OpenAI announced that API access for GPT-5.2-Codex is 'coming in the coming weeks' from the December 18, 2025 release date. Currently, the model is available to all paid ChatGPT users across Codex CLI, IDE extensions (VS Code, Cursor), web, mobile, and GitHub code review workflows. The model ID for API use will be 'gpt-5-2-codex' when available.

What are the limitations of GPT-5.2-Codex?

Key limitations include: (1) 400K context window is smaller than Gemini's 1M tokens. (2) Price increase of 1.4x over GPT-5.1 makes it more expensive for high-volume use. (3) May be overkill for simple one-off code snippets. (4) API access not yet available. (5) Cybersecurity capabilities are restricted unless you qualify for the trusted access pilot. (6) Does not reach 'High' cyber capability threshold per OpenAI's Preparedness Framework, meaning more capable models are coming.

Should I use GPT-5.2-Codex or Gemini 3 Flash for coding tasks?

It depends on your use case. Use GPT-5.2-Codex for: long-horizon agentic tasks (multi-hour sessions), repo-wide refactors, Windows environments, cybersecurity work, and when ecosystem integration (Codex CLI, GitHub) matters. Use Gemini 3 Flash for: rapid prototyping, cost-sensitive development ($0.50/1M input vs $1.75/1M), massive context needs (1M tokens), and pure algorithmic challenges. For production workloads, consider a multi-model strategy using each model's strengths.
