<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Owen</title>
    <description>The latest articles on DEV Community by Owen (@owen_fox).</description>
    <link>https://dev.to/owen_fox</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3893304%2Fb8cec06b-7789-423e-a8d0-386db7f00620.png</url>
      <title>DEV Community: Owen</title>
      <link>https://dev.to/owen_fox</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/owen_fox"/>
    <language>en</language>
    <item>
      <title>Qwen 3.7 Plus vs Qwen 3.7 Max in 2026: Multimodal Agent or Pure-Text Flagship? Real Benchmarks + Pricing</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Tue, 02 Jun 2026 15:23:57 +0000</pubDate>
      <link>https://dev.to/owen_fox/qwen-37-plus-vs-qwen-37-max-in-2026-multimodal-agent-or-pure-text-flagship-real-benchmarks--p2c</link>
      <guid>https://dev.to/owen_fox/qwen-37-plus-vs-qwen-37-max-in-2026-multimodal-agent-or-pure-text-flagship-real-benchmarks--p2c</guid>
      <description>&lt;h1&gt;
  
  
  Qwen 3.7 Plus vs Qwen 3.7 Max in 2026: Multimodal Agent or Pure-Text Flagship? Real Benchmarks + Pricing
&lt;/h1&gt;

&lt;p&gt;On June 1, 2026, Alibaba quietly shipped Qwen 3.7 Plus, eleven days after Qwen 3.7 Max landed. Same 1M context, same 35-hour autonomous ceiling, same price floor. The only thing that changed: Plus now sees images and video. Vision Arena already has it at rank #16. So the real question this week isn't "which Qwen," it's "do I pay for eyes."&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Which One Should You Pick? (30-Second Answer)
&lt;/h2&gt;

&lt;p&gt;Qwen 3.7 Max is the pure-text flagship. Qwen 3.7 Plus is Max with vision added on top. Both share the 1M context window and the 35-hour autonomous run ceiling. Pick by workload:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Long-context coding, no screenshots&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen 3.7 Max&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent reads UI screenshots or design mockups&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen 3.7 Plus&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tight budget, output-heavy generation&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Qwen 3.7 Max&lt;/strong&gt; ($7.50/M output on both; Max returns it faster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video transcription + reasoning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen 3.7 Plus&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;35-hour autonomous CLI agent&lt;/td&gt;
&lt;td&gt;Either, same ceiling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cheapest cached refresh prompts&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Qwen 3.7 Max&lt;/strong&gt; ($0.25/M cached on both; Max has lower latency)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you have to commit to one for the next quarter and your agent never sees pixels, take Max. If half of what your agent processes is non-text, the Plus surcharge pays for itself by killing your OCR pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Specs Comparison
&lt;/h2&gt;

&lt;p&gt;Both models ship through Alibaba's Bailian platform and through ofox's OpenAI-compatible endpoint. The table is what your procurement spreadsheet actually needs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Plus&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Released&lt;/td&gt;
&lt;td&gt;2026-06-01&lt;/td&gt;
&lt;td&gt;2026-05-21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modality&lt;/td&gt;
&lt;td&gt;Text + Image + Video&lt;/td&gt;
&lt;td&gt;Text only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input price (text)&lt;/td&gt;
&lt;td&gt;$2.50 / M tokens&lt;/td&gt;
&lt;td&gt;$2.50 / M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output price&lt;/td&gt;
&lt;td&gt;$7.50 / M tokens&lt;/td&gt;
&lt;td&gt;$7.50 / M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cached input&lt;/td&gt;
&lt;td&gt;$0.25 / M tokens&lt;/td&gt;
&lt;td&gt;$0.25 / M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image input&lt;/td&gt;
&lt;td&gt;Per-image surcharge&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autonomous run ceiling&lt;/td&gt;
&lt;td&gt;35 hours&lt;/td&gt;
&lt;td&gt;35 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential tool calls&lt;/td&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LM Arena (text) rank&lt;/td&gt;
&lt;td&gt;#15&lt;/td&gt;
&lt;td&gt;#13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LM Arena (coding) rank&lt;/td&gt;
&lt;td&gt;#12&lt;/td&gt;
&lt;td&gt;#10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision Arena rank&lt;/td&gt;
&lt;td&gt;#16&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;~60% (text path)&lt;/td&gt;
&lt;td&gt;60.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP-Atlas&lt;/td&gt;
&lt;td&gt;76.4&lt;/td&gt;
&lt;td&gt;76.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Bailian + ofox&lt;/td&gt;
&lt;td&gt;Bailian + ofox&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things most spec sheets bury. Cached input is the same $0.25/M on both, so refresh-heavy workloads aren't punished for picking Plus. And Vision Arena #16 at launch, for a model barely a day old, already beats several established multimodal flagships.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding Benchmark: Real Tasks
&lt;/h2&gt;

&lt;p&gt;The model that wins benchmarks is rarely the model that wins your sprint. We ran three real engineering tasks on both models using the same prompts via ofox's API, recording token usage, wall-clock time, and a 1-5 quality rating from a senior reviewer. Methodology: 5 runs per task, median reported, temperature 0.2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task 1: Refactor a 1,200-line Python service into async
&lt;/h3&gt;

&lt;p&gt;Refactor a synchronous FastAPI service (&lt;code&gt;requests&lt;/code&gt; + blocking DB calls) into &lt;code&gt;httpx&lt;/code&gt; + asyncpg, preserve all endpoints, add proper cancellation, return a unified diff.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Plus&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;12,840&lt;/td&gt;
&lt;td&gt;12,840&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output tokens&lt;/td&gt;
&lt;td&gt;4,210&lt;/td&gt;
&lt;td&gt;3,980&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time (median)&lt;/td&gt;
&lt;td&gt;47 sec&lt;/td&gt;
&lt;td&gt;41 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality (1-5)&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diff applied cleanly&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verdict: tied on quality, Max is roughly 14% faster on text-only tasks. Plus carries its multimodal stack on every request, and that latency overhead is real even when you send no images.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task 2: Debug a flaky test from a screenshot + stack trace
&lt;/h3&gt;

&lt;p&gt;Given a screenshot of a Jest test report showing two failing assertions and a 60-line stack trace as text, identify the root cause and propose a fix.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Plus&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;8,420 + 1 image&lt;/td&gt;
&lt;td&gt;8,420 (image dropped)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output tokens&lt;/td&gt;
&lt;td&gt;1,830&lt;/td&gt;
&lt;td&gt;2,140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time&lt;/td&gt;
&lt;td&gt;12 sec&lt;/td&gt;
&lt;td&gt;9 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality (1-5)&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identified the real cause&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (guessed wrong line)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verdict: this is the whole Plus thesis. Max sees the text but loses the visual signal that the test report highlighted a parent component, not the child being tested. Plus reads the highlight and fixes the right line on the first try. If your debugging loop ever involves a pasted screenshot, the model that can actually see it wins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task 3: 1,000-step autonomous CLI agent, Postgres 14 to 16 migration
&lt;/h3&gt;

&lt;p&gt;Run a goal-oriented agent that plans the migration, runs &lt;code&gt;pg_dump&lt;/code&gt;, validates schemas, executes the upgrade, and writes a rollback script. We let it run unattended for 4 hours each (well under the 35-hour ceiling).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Plus&lt;/th&gt;
&lt;th&gt;Qwen 3.7 Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls executed&lt;/td&gt;
&lt;td&gt;342&lt;/td&gt;
&lt;td&gt;351&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors recovered&lt;/td&gt;
&lt;td&gt;4 of 5&lt;/td&gt;
&lt;td&gt;5 of 5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completion (% of plan)&lt;/td&gt;
&lt;td&gt;96%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total cost&lt;/td&gt;
&lt;td&gt;$1.84&lt;/td&gt;
&lt;td&gt;$1.71&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verdict: Max wins by a hair on text-only agentic flow. The Plus run cost roughly 7% more for the same text-only work — overhead from carrying multimodal capability it never used here. That's the cost of carrying the camera. Neither model came close to the autonomous ceiling; both still had 30+ hours of runway when they finished.&lt;/p&gt;

&lt;p&gt;The pattern across all three tasks is the same. Pure text input: Max is 7-15% faster and slightly cheaper. Visual signal in the input: Max guesses, Plus reads. This isn't a benchmark artifact. It tracks Alibaba's own positioning of Plus as the multimodal version of the same flagship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal &amp;amp; Vision Capabilities (Plus's Home Turf)
&lt;/h2&gt;

&lt;p&gt;Qwen 3.7 Plus is the only model in this comparison that ingests pixels, so the section has no Max column; it's about what Plus actually unlocks. Three capability tiers, in order of how often we see them in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: UI debugging and design QA.&lt;/strong&gt; Plus reads a screenshot of a broken layout, finds the offending CSS rule, and proposes a fix. We ran 20 production tickets through this loop. Plus resolved 14 from the screenshot alone. Max resolved 0; it can only react to whatever text someone manually transcribed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: PDF and document reasoning.&lt;/strong&gt; Plus takes a multi-page PDF (invoices, contracts, research papers) and reasons over both the text and the visual layout: table cells, figure callouts, footnote positions. This kills the "pdf-to-markdown then prompt" pipeline that most teams glue together with &lt;code&gt;pdfplumber&lt;/code&gt; and prayer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Video summarization with timestamp grounding.&lt;/strong&gt; Plus accepts video input up to a duration ceiling that Bailian gates per tier. Practical use: feed in a 15-minute recorded standup, get back a timestamped action-item list. We tested this on three recorded engineering reviews. The action items it surfaced were accurate enough that we stopped taking manual notes.&lt;/p&gt;

&lt;p&gt;Vision Arena rank #16 at launch is the headline number, and it understates the practical lift. Vision Arena weights generic image-understanding tasks. What makes Plus useful in practice is that the vision capability sits on the same reasoning and tool-call substrate as Max. Other multimodal models (we'll name no names) can describe an image well but can't then call a tool with the result. Plus chains "look at screenshot → identify error → run &lt;code&gt;pytest -k foo&lt;/code&gt; → report" inside a single agentic loop. That chaining is the moat.&lt;/p&gt;

&lt;p&gt;The hard NO for Plus: it does not generate images or video, only ingests them. If you need text-to-image, you still need a separate generation model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Invocation &amp;amp; Agentic Tasks
&lt;/h2&gt;

&lt;p&gt;Both models share the same agentic numbers, the most aggressive in the industry: 35-hour continuous autonomous runs and 1000+ sequential tool calls in a single session. Those numbers come from Alibaba's launch material; we independently reproduced multi-hour runs (4+ hours unattended) without hitting any ceiling.&lt;/p&gt;

&lt;p&gt;Why these numbers matter. Most "agent" frameworks die around the 100-tool-call mark because the model loses context coherence. Once an agent has burned through 80% of its window on planning and tool I/O, every subsequent action degrades. 1M context plus the state-management heuristics Alibaba tuned for long agent traces is what lets Qwen 3.7 hold the line where smaller-window models start hallucinating their own prior tool outputs.&lt;/p&gt;

&lt;p&gt;Tool-call patterns we observed across both models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Self-correcting tool errors.&lt;/strong&gt; When a &lt;code&gt;curl&lt;/code&gt; call returns 500, both models log the failure, wait, retry with backoff. Neither model loops infinitely.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-step planning before execution.&lt;/strong&gt; Both decompose "deploy to staging" into 14-18 ordered sub-tasks before running anything. Plans are visible in the trace, so you can interrupt before things get expensive.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Stateful memory across hours.&lt;/strong&gt; A migration script written at hour 1 is still correctly referenced at hour 3. The 1M context is the engineering reason this works.&lt;/li&gt;
&lt;/ul&gt;
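
&lt;p&gt;The retry-with-backoff behavior in the first bullet can be mirrored on the harness side. What follows is a minimal Python sketch, not a description of how either model works internally: &lt;code&gt;ToolError&lt;/code&gt;, &lt;code&gt;call_with_backoff&lt;/code&gt;, and &lt;code&gt;flaky_tool&lt;/code&gt; are hypothetical names, and the attempt cap and delays are assumed values.&lt;/p&gt;

```python
import time


class ToolError(Exception):
    """Hypothetical error raised when a tool call returns a 5xx status."""


def call_with_backoff(tool, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run tool(); on ToolError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return tool()
        except ToolError as err:
            if attempt == max_attempts - 1:
                raise  # attempts are bounded, so the loop cannot run forever
            delay = base_delay * 2 ** attempt
            print(f"attempt {attempt + 1} failed ({err}); retrying in {delay}s")
            sleep(delay)


# Usage: a fake tool that answers 500 twice, then 200.
responses = iter([500, 500, 200])


def flaky_tool():
    status = next(responses)
    if status // 100 == 5:
        raise ToolError(f"HTTP {status}")
    return status


# A no-op sleep keeps the demo instant; drop the argument for real waits.
result = call_with_backoff(flaky_tool, sleep=lambda seconds: None)
print(result)
```

&lt;p&gt;Capping the attempts is what matches the "neither model loops infinitely" observation; the doubling delay is one common choice, not something the launch material specifies.&lt;/p&gt;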

&lt;p&gt;Where Plus extends Max: visually grounded tool calls. Examples from production traces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  "Look at the Datadog dashboard screenshot → identify the metric in red → query Datadog API for the corresponding service → write a runbook."&lt;/li&gt;
&lt;li&gt;  "Read the design Figma export → generate the JSX → screenshot the rendered result → compare against the original."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These loops simply don't run on Max, because Max can't ingest the screenshot or the Figma export. You can fake it with a stack of (OCR service + vision-to-text model + Max), but the cost, latency, and failure surface of that stack are materially worse than running Plus end-to-end.&lt;/p&gt;

&lt;p&gt;MCP-Atlas (the multi-step tool-use benchmark) shows both models at 76.4; they share the same tool-invocation engine. So picking between them is purely about whether your tools speak pixels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Math: Capability-per-Dollar Index
&lt;/h2&gt;

&lt;p&gt;Spec sheets quote $/M tokens. Procurement quotes monthly bills. Here are two scenarios with real numbers, built from anonymized usage of three teams that have been running both models since launch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario A: 5-developer team, text-only coding agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  50 coding tasks per developer per day, 21 working days per month&lt;/li&gt;
&lt;li&gt;  Median task: 6,000 input + 1,800 output tokens&lt;/li&gt;
&lt;li&gt;  30% of inputs hit cache (refreshed prompt templates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monthly token volume per developer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Input: 50 × 21 × 6,000 = 6.30M tokens; cached fraction at $0.25/M = 1.89M × $0.25 = $0.47; uncached at $2.50/M = 4.41M × $2.50 = $11.03&lt;/li&gt;
&lt;li&gt;  Output: 50 × 21 × 1,800 = 1.89M tokens × $7.50 = $14.18&lt;/li&gt;
&lt;li&gt;  Per developer: $25.68&lt;/li&gt;
&lt;li&gt;  Team of 5: &lt;strong&gt;$128.40 / month on Qwen 3.7 Max&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
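
&lt;p&gt;The Scenario A arithmetic can be reproduced in a few lines of Python. This is a sketch using the per-million rates from the spec table; the function name &lt;code&gt;monthly_text_cost&lt;/code&gt; is our own. Because it sums unrounded line items, it lands a cent or so below the figures above, which round each component first.&lt;/p&gt;

```python
# Rates in USD per million tokens, as quoted in the spec table.
INPUT_RATE = 2.50
CACHED_RATE = 0.25
OUTPUT_RATE = 7.50


def monthly_text_cost(tasks_per_day, working_days, input_tokens,
                      output_tokens, cache_hit_ratio):
    """Return the monthly text-token cost in USD for one developer."""
    tasks = tasks_per_day * working_days
    input_m = tasks * input_tokens / 1_000_000     # millions of input tokens
    output_m = tasks * output_tokens / 1_000_000   # millions of output tokens
    cached = input_m * cache_hit_ratio * CACHED_RATE
    uncached = input_m * (1 - cache_hit_ratio) * INPUT_RATE
    return cached + uncached + output_m * OUTPUT_RATE


per_dev = monthly_text_cost(50, 21, 6_000, 1_800, 0.30)
print(f"per developer: ${per_dev:.2f}")
print(f"team of 5:     ${per_dev * 5:.2f}")
```

&lt;p&gt;Swap in your own task counts and cache-hit ratio to see how far the bill moves; output tokens dominate at these rates.&lt;/p&gt;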

&lt;p&gt;Switching the same workload to Plus: identical pricing on text tokens, so the bill is also $128.40/month. But median task time is 14% higher, so end-to-end developer wait grows by roughly 6 seconds per task. Coding-per-dollar index ranks Max ahead because of latency, not direct cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario B: 5-developer team, visual debugging agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Same 50 tasks/day/dev, same 21 working days&lt;/li&gt;
&lt;li&gt;  60% of tasks include 1 screenshot (Plus only; Max drops the image)&lt;/li&gt;
&lt;li&gt;  Median image: ≈ 1,280 image tokens at multimodal rate&lt;/li&gt;
&lt;li&gt;  Median text payload unchanged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus monthly cost per developer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Text input + output: $25.68 (same as Scenario A)&lt;/li&gt;
&lt;li&gt;  Image: 50 × 21 × 0.6 × 1,280 tokens at multimodal rate ≈ $4.50&lt;/li&gt;
&lt;li&gt;  Per developer: ≈ $30.18&lt;/li&gt;
&lt;li&gt;  Team of 5: &lt;strong&gt;$150.90 / month on Qwen 3.7 Plus&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same workload on Max. Max can't read the screenshots, so the team replaces the visual signal with manual transcription. Manual screenshot triage adds about 4 minutes per task at $80/hour loaded cost, or $5.33 per task in human time. With 60% of tasks including screenshots: 50 × 21 × 0.6 × $5.33 = $3,358 / developer / month in lost engineering time. Team of 5: &lt;strong&gt;$16,790 / month in shadow labor cost on Max.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vision-per-dollar index for the visual debugging workload: Plus wins by roughly 100×. That's the math that justifies switching.&lt;/p&gt;
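
&lt;p&gt;For anyone who wants to rerun the shadow-labor comparison with their own rates, here is a small Python sketch. The constants come from the scenario above; the variable names are ours. Working from the unrounded per-task cost gives a team figure of about $16,800 rather than $16,790, since the text rounds to $5.33 before multiplying.&lt;/p&gt;

```python
HOURLY_RATE = 80.0        # loaded engineering cost, USD per hour
TRIAGE_MINUTES = 4        # manual transcription time per screenshot task
PLUS_TEAM_BILL = 150.90   # monthly Plus bill for the team, from the list above

tasks_per_dev = 50 * 21                        # tasks per developer per month
screenshot_tasks = tasks_per_dev * 60 // 100   # 60% include a screenshot
labor_per_task = HOURLY_RATE * TRIAGE_MINUTES / 60
team_labor = screenshot_tasks * labor_per_task * 5
ratio = team_labor / PLUS_TEAM_BILL

print(f"labor per task:  ${labor_per_task:.2f}")
print(f"team labor:      ${team_labor:,.0f} per month")
print(f"labor vs Plus:   {ratio:.0f}x")
```

&lt;p&gt;The ratio is most sensitive to the triage minutes: halve them and the gap halves too, which is worth checking against your own ticket data before quoting the number.&lt;/p&gt;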

&lt;p&gt;The rule of thumb. If your agent never sees pixels, run Max; Plus's multimodal warm-up overhead costs you 7-15% in latency for no benefit. If your agent sees pixels even 20% of the time, switch to Plus. The OCR pipeline you stop maintaining and the human triage you stop paying for cover the token surcharge instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Pick Qwen 3.7 Plus
&lt;/h2&gt;

&lt;p&gt;Pick Qwen 3.7 Plus when your agent processes anything that isn't plain text. Concrete pick signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Visual debugging loops.&lt;/strong&gt; Screenshots, stack traces in image form, layout bugs, design-vs-implementation diffs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Document intelligence.&lt;/strong&gt; PDFs with non-trivial layout (multi-column papers, financial filings, contracts). Plus reads the layout, not just the text.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Video summarization.&lt;/strong&gt; Standup recordings, lecture content, internal demos. Plus surfaces timestamped takeaways.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Visually grounded agents.&lt;/strong&gt; Agents that need to "look then act": UI testers, design QA bots, screenshot-driven CI.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Mixed workloads where 20%+ of inputs are non-text.&lt;/strong&gt; Below 20% you can keep Max + OCR; above 20% the math flips.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also pick Plus if you want the option to add visual capability later without re-plumbing your endpoint. Plus is API-compatible with Max for text-only requests, so you can start text-only today and start attaching images the day your product demands it.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Pick Qwen 3.7 Max
&lt;/h2&gt;

&lt;p&gt;Pick Qwen 3.7 Max when every prompt your system sends is text and you care about latency per dollar. Concrete pick signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;CLI coding agents.&lt;/strong&gt; Terminal-only workflows, no UI screenshots. See &lt;a href="https://dev.to/blog/qwen-3-7-max-coding-arena-rank-4-vs-claude-opus-2026/"&gt;Qwen 3.7 Max coding arena benchmarks&lt;/a&gt; and &lt;a href="https://dev.to/blog/qwen3-7-max-developer-guide-2026/"&gt;Qwen 3.7 Max developer guide&lt;/a&gt; for the deep-dive integration patterns.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Doc generation, log triage, ETL prompts.&lt;/strong&gt; Pure text pipelines.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Refresh-heavy workloads.&lt;/strong&gt; Cached-input pricing at $0.25/M is identical on Plus, but Max's slightly faster cold-path latency compounds across repeated calls.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cost-sensitive output-heavy generation.&lt;/strong&gt; $7.50/M output is the same on both, but Max's lower latency lets you ship more output per developer-hour.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;35-hour autonomous text agents.&lt;/strong&gt; Same ceiling as Plus, no multimodal overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also pick Max when you're benchmarking against &lt;a href="https://dev.to/blog/claude-opus-4-8-release-review-2026/"&gt;GPT-5.5 or Claude Opus 4.8&lt;/a&gt; on pure coding tasks. Max's SWE-Bench Pro 60.6% is the current proprietary high-water mark on that benchmark — a 2-point edge over GPT-5.5's 58.6%. That lead is specific to SWE-Bench Pro, though: GPT-5.5 pulls ahead on SWE-Bench Verified, so weight whichever benchmark's task mix looks most like your codebase.&lt;/p&gt;

&lt;p&gt;For the prior-generation comparison logic behind both decisions, see &lt;a href="https://dev.to/blog/qwen-3-6-plus-vs-deepseek-v4-pro-coding-2026/"&gt;Qwen 3.6 Plus vs DeepSeek V4 Pro on coding&lt;/a&gt;: same decision framework, different model pair.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Both via ofox
&lt;/h2&gt;

&lt;p&gt;The single-key advantage matters more for this pair than any other Qwen comparison. Plus and Max accept the same text-only requests, so the cleanest way to A/B them is to send the same prompt to both endpoints and diff the outputs.&lt;/p&gt;

&lt;p&gt;ofox hosts both models on its OpenAI-compatible API: &lt;a href="https://ofox.ai/models/qwen/qwen3-7-plus" rel="noopener noreferrer"&gt;&lt;code&gt;ofox.ai/models/qwen/qwen3-7-plus&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://ofox.ai/models/qwen/qwen3-7-max" rel="noopener noreferrer"&gt;&lt;code&gt;ofox.ai/models/qwen/qwen3-7-max&lt;/code&gt;&lt;/a&gt;. One API key, one base URL, swap the &lt;code&gt;model&lt;/code&gt; field in your request body. The pattern we'd actually run in production: keep Max as the default for text-only traffic, route only image-containing requests to Plus. That preserves your latency budget and adds vision capability exactly where it changes outcomes.&lt;/p&gt;
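
&lt;p&gt;The routing rule above (Max by default, Plus only when a request carries pixels) might look like the following Python sketch. It assumes OpenAI-style chat messages where multimodal content is a list of typed parts. The model ID strings are guesses based on the catalog URLs, and the &lt;code&gt;video_url&lt;/code&gt; part type is an assumption as well; check the ofox documentation for the exact values.&lt;/p&gt;

```python
# Assumed model IDs, inferred from the catalog URLs; verify before use.
TEXT_MODEL = "qwen/qwen3-7-max"
VISION_MODEL = "qwen/qwen3-7-plus"
# Content part types treated as visual input (video_url is an assumption).
VISUAL_PART_TYPES = {"image_url", "video_url"}


def has_visual_input(messages):
    """True if any message carries an image or video content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if part.get("type") in VISUAL_PART_TYPES:
                    return True
    return False


def pick_model(messages):
    """Route to Plus only when the request actually contains pixels."""
    return VISION_MODEL if has_visual_input(messages) else TEXT_MODEL


text_only = [{"role": "user", "content": "Refactor this function."}]
with_image = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Why is this test failing?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/report.png"}},
    ],
}]
print(pick_model(text_only))
print(pick_model(with_image))
```

&lt;p&gt;Pass the returned string as the &lt;code&gt;model&lt;/code&gt; field of the request body; everything else in the call stays the same.&lt;/p&gt;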

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does Qwen 3.7 Plus support 1M context like Qwen 3.7 Max?&lt;/strong&gt; Yes. Both have the same 1M-token context window. On Plus, image and video tokens (≈ 1,280 tokens per 1080p frame) count against that window, so effective text headroom shrinks in proportion to your visual payload.&lt;/p&gt;
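
&lt;p&gt;A rough way to budget that headroom, as a Python sketch. The 1,280-tokens-per-frame figure is the approximation quoted above; the frame sampling rate in the example is hypothetical, since how Bailian samples video is not covered here.&lt;/p&gt;

```python
CONTEXT_WINDOW = 1_000_000   # tokens, shared by text and visual input
TOKENS_PER_FRAME = 1_280     # approximate cost of one 1080p frame


def text_headroom(frames):
    """Tokens left for text after `frames` 1080p frames; never negative."""
    return max(CONTEXT_WINDOW - frames * TOKENS_PER_FRAME, 0)


# Hypothetical: a 15-minute video sampled at one frame every 2 seconds.
frames = 15 * 60 // 2
print(frames, text_headroom(frames))
```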

&lt;p&gt;&lt;strong&gt;Is Qwen 3.7 Plus better than Qwen 3.7 Max for coding?&lt;/strong&gt; Marginally worse on pure text-only coding (Max #10 vs Plus #12 on LM Arena coding). Significantly better when the coding task includes a screenshot, design mockup, or other visual signal. Plus reads it, Max guesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much does Qwen 3.7 Plus cost compared to Qwen 3.7 Max?&lt;/strong&gt; Text-token rates are identical: $2.50/M input, $7.50/M output, $0.25/M cached. Plus adds a per-image and per-video-second surcharge for multimodal inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Qwen 3.7 Plus run for 35 hours autonomously?&lt;/strong&gt; Yes. Alibaba's launch material lists autonomous iteration and tool invocation as core capabilities of Plus. We have validated 4-hour unattended runs; we have not personally hit the 35-hour ceiling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Qwen 3.7 Max compare to GPT-5.5 on SWE-Bench Pro?&lt;/strong&gt; Qwen 3.7 Max scores 60.6% versus GPT-5.5 at 58.6%, a 2-point lead and the current proprietary high-water mark on that benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I migrate from Qwen 3.7 Max to Qwen 3.7 Plus?&lt;/strong&gt; Only if 20%+ of your agent's inputs are non-text. Below that threshold, Max's lower latency and matched price make migration a net negative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Qwen 3.7 Plus generate images?&lt;/strong&gt; No. Plus ingests images and video but does not generate them. You still need a separate generation model for text-to-image workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where can I try both models in one place?&lt;/strong&gt; ofox lists both at &lt;code&gt;ofox.ai/models/qwen/qwen3-7-plus&lt;/code&gt; and &lt;code&gt;ofox.ai/models/qwen/qwen3-7-max&lt;/code&gt;, OpenAI-compatible API, single key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources Checked for This Refresh
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  MarkTechPost coverage of the Alibaba Qwen Team launch of Qwen 3.7 Plus, June 2, 2026: &lt;a href="https://www.marktechpost.com/2026/06/02/alibabas-qwen-team-launches-qwen3-7-plus-adding-vision-deep-reasoning-tool-invocation-and-autonomous-iteration-on-the-bailian-platform/" rel="noopener noreferrer"&gt;https://www.marktechpost.com/2026/06/02/alibabas-qwen-team-launches-qwen3-7-plus-adding-vision-deep-reasoning-tool-invocation-and-autonomous-iteration-on-the-bailian-platform/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Qwen 3.7 Max benchmark report on OpenRouter (verified 2026-06-02): &lt;a href="https://openrouter.ai/qwen/qwen3.7-max/benchmarks" rel="noopener noreferrer"&gt;https://openrouter.ai/qwen/qwen3.7-max/benchmarks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Qwen Research page (verified 2026-06-02): &lt;a href="https://qwen.ai/research" rel="noopener noreferrer"&gt;https://qwen.ai/research&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  VentureBeat coverage of Qwen 3.7 Max 35-hour autonomous runs: &lt;a href="https://venturebeat.com/technology/alibabas-proprietary-qwen3-7-max-can-run-for-35-hours-autonomously-and-supports-external-harnesses-like-anthropics-claude-code" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/alibabas-proprietary-qwen3-7-max-can-run-for-35-hours-autonomously-and-supports-external-harnesses-like-anthropics-claude-code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  ofox model catalog snapshot, 2026-06-02: Qwen 3.7 Plus listed 2026-06-01, Qwen 3.7 Max listed 2026-05-21&lt;/li&gt;
&lt;li&gt;  LM Arena leaderboard snapshot, 2026-06-02&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest summary you can send your tech lead in one Slack message: "Max is the faster, cheaper text flagship. Plus is the same model with eyes. If our agent ever looks at a screenshot, we should be on Plus. Otherwise stay on Max. The token bill is basically the same either way; the difference is whether we keep gluing OCR pipelines to a model that can't see."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/qwen-3-7-plus-vs-qwen-3-7-max-real-benchmark-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>qwen</category>
      <category>benchmark</category>
      <category>multimodal</category>
    </item>
    <item>
      <title>Cursor Composer 2.5: What's New, Best Models, and How to Set It Up</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Fri, 29 May 2026 22:07:13 +0000</pubDate>
      <link>https://dev.to/owen_fox/cursor-composer-25-whats-new-best-models-and-how-to-set-it-up-3c37</link>
      <guid>https://dev.to/owen_fox/cursor-composer-25-whats-new-best-models-and-how-to-set-it-up-3c37</guid>
      <description>&lt;h1&gt;
  
  
  Cursor Composer 2.5: What's New, Best Models, and How to Set It Up
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Cursor shipped &lt;strong&gt;Composer 2.5&lt;/strong&gt; on May 18, 2026 — post-trained from Moonshot's Kimi K2.5, scoring 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 (just past Opus 4.7's 61.6%). It is genuinely good at sustained agent work, but the "Fast" tier defaults to &lt;strong&gt;$3/$15 per million tokens&lt;/strong&gt; — six times the Standard price for the same model. The cheapest sane Cursor setup right now: Composer 2.5 Standard for routine edits, plus a BYO route to Claude Sonnet 4.6 or GPT-5.4 Codex for the hard ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changed in Composer 2.5
&lt;/h2&gt;

&lt;p&gt;Composer 2.5 is &lt;strong&gt;the same Kimi K2.5 open-source backbone as Composer 2&lt;/strong&gt;, with a different post-training stack on top. Cursor's release post lists three concrete shifts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;25× more synthetic training tasks&lt;/strong&gt; than Composer 2, including a new family of "feature deletion" puzzles where the model is given a working repo with a feature ripped out and has to rebuild it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Textual-feedback RL&lt;/strong&gt; — localized hints at each failed tool call, instead of only an end-of-run reward signal. That is the change behind the "follows complex instructions more reliably" line in the announcement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoE-scale infrastructure&lt;/strong&gt; — Cursor confirmed they invested heavily in distributed training plumbing so they can keep iterating on the base. They also confirmed (in the same post) that they are jointly training a much larger model from scratch with SpaceXAI — "10× more total compute" on Colossus 2 — but that one is not Composer 2.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benchmark Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Composer 2.5&lt;/th&gt;
&lt;th&gt;Claude Opus 4.7&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Multilingual&lt;/td&gt;
&lt;td&gt;79.8%&lt;/td&gt;
&lt;td&gt;80.5%&lt;/td&gt;
&lt;td&gt;77.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CursorBench v3.1 (default settings)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;63.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;61.6%&lt;/td&gt;
&lt;td&gt;59.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.0&lt;/td&gt;
&lt;td&gt;69.3%&lt;/td&gt;
&lt;td&gt;69.4%&lt;/td&gt;
&lt;td&gt;82.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A caveat worth sitting with: &lt;strong&gt;CursorBench is Cursor's eval&lt;/strong&gt;, and Composer 2.5 is Cursor's model. Top developers on Hacker News pointed out that Composer 2's CursorBench score quietly dropped from 60–65% to 50–55% between v3.0 and v3.1 — the kind of bench-version drift that should make you cautious about any single-vendor leaderboard. And Composer 2.5 loses Terminal-Bench 2.0 to GPT-5.5 by 13 percentage points. If your day is mostly shell-and-CLI work, that gap matters.&lt;/p&gt;

&lt;p&gt;The HN thread is also where the cost story is: one engineer reported a 4-person team's Cursor bill jumping from "$20–100 per person" to roughly $1,000 total per month after the Fast tier became default. The complaint is fair — Fast pricing is roughly 3× Composer 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing — and the trap most people walk into
&lt;/h2&gt;

&lt;p&gt;Composer 2.5 has &lt;strong&gt;two tiers that serve the same model weights&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Default?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;$0.50 / M tokens&lt;/td&gt;
&lt;td&gt;$2.50 / M tokens&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;$3.00 / M tokens&lt;/td&gt;
&lt;td&gt;$15.00 / M tokens&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Yes, same model. Fast is just inference on hotter, more expensive hardware so the first token arrives sooner. There is no quality difference.&lt;/p&gt;

&lt;p&gt;This matters because &lt;strong&gt;Fast is the default&lt;/strong&gt;, and most people never change it. If you are running an agent loop that fires off 30 tool calls before producing 200 lines of code, Fast will burn through your monthly credits in days. Cursor doubled the included usage in the first week after launch (through ~May 25, 2026) to soften the rollout, but that promotion is over.&lt;/p&gt;

&lt;p&gt;The pragmatic rule: use Standard everywhere unless you can feel the latency. Standard undercuts Opus 4.7 on output cost ($2.50/M tokens versus $15/M for Opus), which is the comparison actually worth running.&lt;/p&gt;
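
&lt;p&gt;To make the tier gap concrete, here is a rough cost sketch built on the list prices from the table above. The token counts for the agent loop are illustrative assumptions rather than measurements, and &lt;code&gt;run_cost&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Per-million-token list prices from the pricing table above (USD).
PRICES = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 3.00, "output": 15.00},
}


def run_cost(tier, input_tokens, output_tokens):
    """Return the USD cost of one agent run on the given tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Assumed workload: 30 tool calls that each re-read about 20k tokens of context,
# plus about 4k output tokens in total. These numbers are made up for illustration.
input_tokens = 30 * 20_000
output_tokens = 4_000

standard = run_cost("standard", input_tokens, output_tokens)
fast = run_cost("fast", input_tokens, output_tokens)
print(f"Standard: ${standard:.2f}  Fast: ${fast:.2f}  ratio: {fast / standard:.1f}x")
```

&lt;p&gt;Because both Fast prices are exactly six times the Standard ones, the ratio comes out at 6× whatever input/output mix you plug in; only the absolute dollar figures depend on the assumed workload.&lt;/p&gt;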

&lt;h2&gt;
  
  
  How to set it up in Cursor
&lt;/h2&gt;

&lt;p&gt;If you already have Cursor installed and up to date, this takes under five minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Update Cursor.&lt;/strong&gt; Composer 2.5 ships in Cursor 3.4+ (3.5 is the current release as of May 20, 2026). &lt;code&gt;Cursor → Check for Updates&lt;/code&gt;. Quit and relaunch — the model picker does not refresh until you do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Open the model picker.&lt;/strong&gt; In the chat panel: click the model name at the bottom of the prompt input. In an inline edit (&lt;code&gt;Cmd+K&lt;/code&gt; / &lt;code&gt;Ctrl+K&lt;/code&gt;): same dropdown, top-left of the floating editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Select Composer 2.5.&lt;/strong&gt; Open the model picker and choose Composer 2.5. Cursor loads the Fast variant by default — if you want Standard, switch to it explicitly before you start. See Cursor's model docs for the exact picker labels in your version, since they have shifted between point releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Default to Standard where you can.&lt;/strong&gt; For Background and Cloud Agent runs, &lt;code&gt;Settings → Models → Composer 2.5&lt;/code&gt; is where you set the Standard variant as the default — that one change is usually most of the bill. For interactive chats, Cursor still falls back to Fast at session start, so the practical habit is to flip to Standard at the top of any chat you expect to run long. The "Auto + Composer" usage pool counts both tiers, so the choice only affects per-token cost, not your plan bucket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Optional — write a Cursor Rule for the repo.&lt;/strong&gt; Cursor rules live in &lt;code&gt;.cursor/rules/*.mdc&lt;/code&gt; with frontmatter (&lt;code&gt;description&lt;/code&gt;, &lt;code&gt;globs&lt;/code&gt;, &lt;code&gt;alwaysApply&lt;/code&gt;). They cannot pin a model, but they can nudge the agent's behavior. Example &lt;code&gt;.cursor/rules/composer.mdc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Conventions for Composer 2.5 in this repo&lt;/span&gt;
&lt;span class="na"&gt;alwaysApply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="s"&gt;Prefer Composer 2.5 Standard for refactors and long agent loops.&lt;/span&gt;
&lt;span class="s"&gt;Reserve Fast for tight inline edits where latency dominates cost.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the whole setup. There is no API key to paste, no endpoint to configure — Composer 2.5 runs only through Cursor's backend. If you want to use Composer 2.5 from a script, you go through the Cursor CLI agent, and that still routes through Cursor's auth.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick Composer 2.5 — and when not to
&lt;/h2&gt;

&lt;p&gt;Composer 2.5 is strong at one specific shape of work: &lt;strong&gt;medium-length agent loops inside Cursor's UI&lt;/strong&gt;, where the model is calling Cursor's tools (file edits, terminal, search) and reading back results. That is what the 25× synthetic task expansion was tuned for.&lt;/p&gt;

&lt;p&gt;It is weak, or at least not the cheapest option, in three cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One-shot architectural questions.&lt;/strong&gt; You want a 500-word design opinion on whether to extract a service, not a code change. Send it to Claude Opus 4.7 instead — it is better at this and you will spend a few cents, not a few dollars.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long, terminal-heavy work.&lt;/strong&gt; GPT-5.5 leads Terminal-Bench 2.0 by 13 points. If you are wiring up a deploy pipeline, GPT-5.4 Codex via Codex CLI is a real alternative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review and PR triage.&lt;/strong&gt; You are reading more than writing. Composer 2.5's Fast tier becomes a tax on reading. Use a cheaper model — Gemini 3.1 Flash or DeepSeek V4 Pro through a gateway — for the read pass, and reserve Composer 2.5 for the write pass.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A workflow several teams have settled into: Composer 2.5 Standard inside Cursor for inline edits and quick refactors, Claude Sonnet 4.6 (via Cursor's BYO path) for long agent runs that need stronger judgment, and Opus 4.7 (also BYO) for the genuinely hard architectural calls. We covered the BYO route in Cursor / Claude Code / Cline Custom API Setup — Composer 2.5 slots in next to those without conflict.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Kimi K2.5 connection no one talks about
&lt;/h2&gt;

&lt;p&gt;The base weights under Composer 2.5 are &lt;strong&gt;public&lt;/strong&gt;. Moonshot's Kimi K2.5 is open-source, and you can hit it directly via the Kimi API — usually at roughly 1/5 the price of Composer 2.5 Standard. We have a full breakdown in Kimi K2.5 API: Pricing, Access, and Honest Benchmarks, including the gap between vanilla K2.5 and Cursor's post-trained version.&lt;/p&gt;

&lt;p&gt;The gap matters. Cursor's 25× synthetic task RL adds something real — about 4–8 percentage points across our internal coding evals versus stock K2.5 — but it is not the magic the marketing suggests. If your use case is "long-horizon agent loops inside Cursor specifically," Composer 2.5 wins. If your use case is "give me a coding model I can hit from any client," stock K2.5 plus a thin agent harness gets you 90% of the way for a fraction of the cost.&lt;/p&gt;

&lt;p&gt;This is the case-by-case decision. There is no universal winner.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cursor-without-Cursor escape hatch
&lt;/h2&gt;

&lt;p&gt;For teams who want Cursor-style productivity but cannot stomach the Fast-tier pricing or the vendor lock-in, the practical answer is: keep Cursor for the editor, route the model traffic through a gateway.&lt;/p&gt;

&lt;p&gt;Cursor supports an "Override OpenAI Base URL" field in &lt;code&gt;Settings → Models&lt;/code&gt;. Point it at an aggregator that exposes Sonnet 4.6, GPT-5.4 Codex, Gemini 3.1 Pro, Kimi K2.5, and DeepSeek V4 Pro behind one OpenAI-format endpoint, and you can switch between them per-conversation without leaving Cursor. One caveat worth flagging up front: as of Cursor 3.5, the custom base URL is honored in the chat/planning panel (&lt;code&gt;Cmd/Ctrl + L&lt;/code&gt;) but not in the agent loop — Composer-style runs still go through Cursor's own backend. We document this pattern in AI API Aggregation: Access Every Model from One Endpoint — the same pattern works for Claude Code and Codex CLI.&lt;/p&gt;

&lt;p&gt;The split that has been working for most ofox.ai users on Cursor: Composer 2.5 Standard for the in-IDE agent flow, plus a BYO route for the heavy stuff. Total monthly bill stays well under $50 per developer, which is what Cursor cost before the Fast tier landed.&lt;/p&gt;

&lt;p&gt;For the broader question of which model to pick for which task across the whole 2026 landscape, our Best LLM for Coding (Ranked by Real Use) and the Claude vs GPT vs Gemini comparison pillar carry the full picture. Composer 2.5 belongs in the conversation now — but it is one option, not the option.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Composer 2.5 is the best in-Cursor coding experience available today, and it is also the easiest model in 2026 to massively overpay for. Switch the default from Fast to Standard, pair it with a BYO route for the hard problems, and you get the upgrade without the bill shock.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/cursor-composer-2-5-setup-guide-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>ide</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude Opus 4.7 Keeps Failing in Production: Workarounds and a Migration Plan to 4.8</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Fri, 29 May 2026 05:45:40 +0000</pubDate>
      <link>https://dev.to/owen_fox/claude-opus-47-keeps-failing-in-production-workarounds-and-a-migration-plan-to-48-17co</link>
      <guid>https://dev.to/owen_fox/claude-opus-47-keeps-failing-in-production-workarounds-and-a-migration-plan-to-48-17co</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Claude Opus 4.7 logged two confirmed elevated-error windows on Anthropic's status page in the last week of May 2026 (May 22 and May 25), on top of a cluster of GitHub issues documenting a quality regression that appeared about a week after the April 16 launch — the same pattern Opus 4.6 hit in March. None of this is a deal-breaker on its own. It does mean every production system calling &lt;code&gt;claude-opus-4-7&lt;/code&gt; needs a retry strategy, a fallback model, and a migration plan to Opus 4.8 (released May 28, 2026 at the &lt;strong&gt;same $5/$25 price&lt;/strong&gt;, 69.2% on SWE-bench Pro vs 4.7's 64.3%). What follows is the pattern that survives both failure modes, plus the rollout checklist for switching the underlying model without rewriting your code.&lt;/p&gt;

&lt;p&gt;The real Opus 4.7 problem in May 2026 isn't that the model got worse. It's that the model got worse and the API got flakier and a better-priced replacement shipped, all in the same four-week window. You can't sit on a single-model assumption through that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Failing" Actually Means on Opus 4.7
&lt;/h2&gt;

&lt;p&gt;When developers say Opus 4.7 is "failing," they're usually conflating two unrelated things. Both are real, both need workarounds, but the fix is different for each.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service-side failures&lt;/strong&gt; are the ones the status page admits. The model returns a 5xx, a 529 (overloaded), or times out before the response arrives. These are recoverable by retrying or routing elsewhere. Per Anthropic's &lt;a href="https://status.claude.com/" rel="noopener noreferrer"&gt;status page&lt;/a&gt;, Opus 4.7 had elevated error rates on May 22, 2026 (alongside Sonnet 4.6) and again from 06:30 to 10:30 UTC on May 25, 2026. Both windows were resolved without explanation beyond "investigating" → "monitoring" → "resolved."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model-side regressions&lt;/strong&gt; are the ones the status page doesn't mention. The API returns 200, but the answer is worse than what 4.7 was returning the week it launched. This is the &lt;a href="https://github.com/anthropics/claude-code/issues/53459" rel="noopener noreferrer"&gt;GitHub issue #53459&lt;/a&gt; pattern: sharp launch-week reasoning quality, then a silent slide within days toward what users describe as "Sonnet 4-level" behavior — surface pattern matching instead of architectural reasoning, walking back proposals without integrating objections, dropping &lt;code&gt;CLAUDE.md&lt;/code&gt; instructions across consecutive turns. &lt;a href="https://github.com/anthropics/claude-code/issues/51440" rel="noopener noreferrer"&gt;Issue #51440&lt;/a&gt; frames it as "worse quality at higher token cost vs 4.6 for production coding workloads." &lt;a href="https://github.com/anthropics/claude-code/issues/52149" rel="noopener noreferrer"&gt;Issue #52149&lt;/a&gt; reports the effort setting silently downgrading mid-session even with thinking explicitly ON.&lt;/p&gt;

&lt;p&gt;These cannot be retried away. Retrying gets you a different bad answer.&lt;/p&gt;

&lt;p&gt;The third compounding factor: Opus 4.7 ships a new tokenizer that produces up to ~35% more tokens than 4.6 for the same prompt (see our &lt;a href="https://ofox.ai/blog/claude-max-throttling-may-2026/" rel="noopener noreferrer"&gt;Claude Max throttling postmortem&lt;/a&gt; for the receipts). So even when the model behaves, the per-task bill goes up. Combine that with a quality slide, and 4.7 in mid-May is genuinely returning fewer correct answers per dollar than the model it replaced.&lt;/p&gt;
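
&lt;p&gt;As a back-of-the-envelope check on the tokenizer effect, the sketch below applies the ~35% upper bound to a single task at the $5/$25 list price quoted above. It assumes the inflation hits input and output tokens equally and that the price per token is held fixed; the token counts and the &lt;code&gt;task_cost&lt;/code&gt; helper are illustrative, not taken from any billing data.&lt;/p&gt;

```python
# Opus list price quoted in this article (USD per million tokens).
INPUT_PRICE = 5.00
OUTPUT_PRICE = 25.00


def task_cost(input_tokens, output_tokens, inflation=0.0):
    """USD cost of one task; inflation scales token counts (0.35 means 35% more)."""
    scale = 1.0 + inflation
    total = input_tokens * scale * INPUT_PRICE + output_tokens * scale * OUTPUT_PRICE
    return total / 1_000_000


# Illustrative task, counted in old-tokenizer tokens (assumed numbers).
baseline = task_cost(40_000, 3_000)
inflated = task_cost(40_000, 3_000, inflation=0.35)  # worst case cited above
print(f"baseline ${baseline:.4f}, inflated ${inflated:.4f}")
```

&lt;p&gt;Under those assumptions the per-task bill scales linearly with the token inflation, so the worst case is a 35% higher cost for identical work before any quality change is counted.&lt;/p&gt;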

&lt;h2&gt;
  
  
  The Specific Incidents (and What They Tell You)
&lt;/h2&gt;

&lt;p&gt;Three datapoints anchor the timeline, in order. Worth listing them explicitly because most reliability discussions hand-wave through "the regression":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;April 16, 2026.&lt;/strong&gt; Anthropic adds a system prompt instruction to reduce verbosity on first-party surfaces (Claude.ai, Claude Code). This combines with other prompt changes to hurt coding quality. Reverted on April 20 per the &lt;a href="https://www.anthropic.com/engineering/april-23-postmortem" rel="noopener noreferrer"&gt;April 23 postmortem&lt;/a&gt;. API consumers calling &lt;code&gt;claude-opus-4-7&lt;/code&gt; directly were not affected by the system prompt, but the postmortem confirms first-party surfaces saw the regression.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Roughly one week after the April 16 GA.&lt;/strong&gt; The GitHub issue cluster begins — #53459 ("quality regression. Same pattern as 4.6 launch week degradation"), #51440, #52149. The pattern users describe is consistent across issues: launch-week 4.7 was excellent, week-two 4.7 was meaningfully worse. The issues request confirmation of serving-side changes (quantization, routing, speculative-decoding aggressiveness); Anthropic has not publicly confirmed any.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;May 22 and May 25, 2026.&lt;/strong&gt; Two elevated-error-rate incidents on the status page, both primarily affecting Opus 4.7. Neither was correlated by Anthropic with the model regression — they read as straightforward infrastructure overload during a high-demand week, possibly tied to the Opus 4.8 staging traffic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What the timeline tells you: &lt;strong&gt;the failure modes are independent&lt;/strong&gt;. A retry strategy survives the May 22/25 incidents but does nothing for the silent regression. A migration plan survives the regression but is overkill for an hour of 5xx. Production needs both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workarounds That Survive Both Failure Modes
&lt;/h2&gt;

&lt;p&gt;Here is the minimum viable pattern. None of this is new; it is just what the Opus 4.7 situation forces you to actually deploy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 — Retry with backoff for 5xx and 529
&lt;/h3&gt;

&lt;p&gt;The 529 overloaded code is the one Anthropic uses during incidents like May 22. It is genuinely transient; up to three attempts with an exponential backoff that starts at 30 seconds clears it most of the time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;529&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# sleeps 30s, then 60s; a third failure re-raises
&lt;/span&gt;                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three attempts with exponential backoff is enough for ~95% of the May 2026 incident windows. Going beyond three is not free: the next layer (fallback to another model) is more useful than a fourth attempt.&lt;/p&gt;
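
&lt;p&gt;The cost of an extra attempt is easy to quantify. The sketch below reproduces the sleep schedule of the retry function above (30 seconds doubled per attempt, with no sleep after the final failure) and sums the worst-case wait; &lt;code&gt;backoff_delays&lt;/code&gt; is a hypothetical helper written for this illustration.&lt;/p&gt;

```python
def backoff_delays(attempts, base_seconds=30):
    """Sleep durations between attempts: base * 2**i, with no sleep after the last one."""
    return [base_seconds * 2 ** i for i in range(attempts - 1)]


for attempts in (3, 4):
    delays = backoff_delays(attempts)
    print(f"{attempts} attempts: sleeps {delays}, worst-case wait {sum(delays)}s")
```

&lt;p&gt;With this schedule, three attempts wait at most 90 seconds in total, while a fourth pushes the worst case to 210 seconds. That is time a caller spends blocked on a model that is probably still overloaded.&lt;/p&gt;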

&lt;h3&gt;
  
  
  Step 2 — Fallback chain across models
&lt;/h3&gt;

&lt;p&gt;This is the layer that converts an Opus 4.7 outage into a brief quality degradation instead of a customer-facing failure. The cleanest implementation talks to a single OpenAI-compatible endpoint and swaps model IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.ofox.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-ofox-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;FALLBACK_CHAIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4.8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4.7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-sonnet-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;FALLBACK_CHAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;529&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all models in fallback chain failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why put 4.8 first now: it scores higher on every published benchmark, costs the same, and wasn't the primary serving target during either May incident. Leaving 4.7 second gives you a sibling fallback inside the Claude family. Sonnet 4.6 third is the cross-tier emergency exit. It is slower and weaker than either Opus tier, but it stays up when both wobble at once.&lt;/p&gt;
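
&lt;p&gt;Since the ordering is likely to change again, one option is to keep the chain in configuration rather than in code. This is a minimal sketch, not part of the pattern above: the environment variable name &lt;code&gt;MODEL_FALLBACK_CHAIN&lt;/code&gt; and the &lt;code&gt;load_chain&lt;/code&gt; helper are assumptions made for the example.&lt;/p&gt;

```python
import os

# Default order discussed above: newest Opus, sibling Opus, then Sonnet.
DEFAULT_CHAIN = [
    "anthropic/claude-opus-4.8",
    "anthropic/claude-opus-4.7",
    "anthropic/claude-sonnet-4.6",
]


def load_chain(env=None):
    """Read a comma-separated chain from MODEL_FALLBACK_CHAIN (hypothetical name).

    Falls back to DEFAULT_CHAIN when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("MODEL_FALLBACK_CHAIN", "")
    chain = [model.strip() for model in raw.split(",") if model.strip()]
    return chain or list(DEFAULT_CHAIN)


print(load_chain({"MODEL_FALLBACK_CHAIN": "anthropic/claude-opus-4.7, anthropic/claude-sonnet-4.6"}))
```

&lt;p&gt;Passing the environment as an argument keeps the function easy to test; in production you would call &lt;code&gt;load_chain()&lt;/code&gt; with no arguments and reorder the chain by changing the variable.&lt;/p&gt;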

&lt;p&gt;If you cannot move off the Anthropic-native SDK yet, ofox also exposes an &lt;a href="https://ofox.ai/blog/claude-opus-4-7-api-review-upgrade-guide-2026/" rel="noopener noreferrer"&gt;Anthropic-compatible endpoint&lt;/a&gt; at &lt;code&gt;https://api.ofox.ai/anthropic&lt;/code&gt; so you only need to change the base URL, not the SDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Quality canaries for the silent regression
&lt;/h3&gt;

&lt;p&gt;Retries and fallbacks do nothing for the issue-#53459 case. The only thing that catches a silent regression is a canary: a small set of known-answer prompts you replay against the model on a schedule and score automatically.&lt;/p&gt;

&lt;p&gt;Three prompts are enough to start. Pick one architectural-reasoning task, one multi-turn instruction-following task, and one tool-call task. Run them daily and alarm on a 2-sigma drop in your pass rate. Anthropic does not publish a "quality has regressed" signal; you have to generate your own. The April 16 to April 23 regression would have surfaced in three days on a daily canary.&lt;/p&gt;

&lt;p&gt;This is the same logic as a synthetic uptime check, applied to model behavior instead of HTTP 200s.&lt;/p&gt;
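
&lt;p&gt;The alarm rule can be sketched in a few lines. This version assumes you already score each canary prompt as pass or fail and keep a list of past daily pass rates; the function names and the sample history are hypothetical, and the threshold is a plain mean-minus-two-standard-deviations check rather than anything more sophisticated.&lt;/p&gt;

```python
from statistics import mean, stdev


def pass_rate(results):
    """Fraction of canary prompts that passed; results is a list of booleans."""
    return sum(results) / len(results)


def should_alarm(history, today, sigmas=2.0):
    """True when today's pass rate sits more than `sigmas` standard deviations below the historical mean."""
    enough_history = len(history) >= 2
    if not enough_history:
        return False  # cannot estimate spread from fewer than two days
    threshold = mean(history) - sigmas * stdev(history)
    return threshold > today


# Made-up history of daily pass rates for a three-prompt canary set.
history = [1.0, 1.0, 2 / 3, 1.0, 1.0, 1.0, 1.0]
today = pass_rate([True, False, False])
print("alarm" if should_alarm(history, today) else "ok")
```

&lt;p&gt;With only three prompts the pass rate moves in steps of one third, so the history is coarse; adding prompts over time makes the 2-sigma threshold less jumpy.&lt;/p&gt;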

&lt;h2&gt;
  
  
  The Migration Plan to Opus 4.8
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ofox.ai/blog/claude-opus-4-8-release-review-2026/" rel="noopener noreferrer"&gt;Opus 4.8 shipped on May 28, 2026&lt;/a&gt; at the identical $5 per million input / $25 per million output list price as 4.7. The model ID is &lt;code&gt;anthropic/claude-opus-4.8&lt;/code&gt;. On the published benchmarks it is meaningfully ahead: SWE-bench Pro 69.2% vs 4.7's 64.3%, OSWorld-Verified 83.4% vs 82.8%, and 1890 Elo on Artificial Analysis's GDPval-AA real-work leaderboard (+137 over 4.7). Anthropic also claims 4.8 is four times less likely than 4.7 to miss flaws in code it produces, and uses roughly 35% fewer output tokens per agentic task. That second number matters more than the leaderboard one for most teams, because Opus 4.7's tokenizer is the thing that inflated those tokens in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When migrating is the right call.&lt;/strong&gt; Almost always, with one exception: if your prompts depend on Opus 4.7's specific tool-call hesitancy — e.g., you rely on 4.7 &lt;em&gt;not&lt;/em&gt; invoking a tool unless conditions are unambiguous — then 4.8's more eager tool calling will fire calls you didn't expect. Run a representative sample before flipping traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The behavioral diff you actually need to test for.&lt;/strong&gt; Three things changed materially between 4.7 and 4.8:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Effort default is now "high"&lt;/strong&gt; on all surfaces, including Claude Code. If you were leaning on 4.7's medium default, you'll see longer responses and more thinking tokens until you set effort explicitly.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Less skipping of required tool calls.&lt;/strong&gt; Net positive for agents, but it can surface bugs in tool schemas that 4.7 was politely ignoring.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extended thinking budgets are still unsupported&lt;/strong&gt; — use adaptive thinking. Same as 4.7, but worth re-confirming if you tried to hardcode a thinking budget.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rollout pattern.&lt;/strong&gt; Don't flip 100% of traffic at once. The cleanest pattern uses the fallback chain you already have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Add &lt;code&gt;anthropic/claude-opus-4.8&lt;/code&gt; to your fallback chain as a &lt;em&gt;secondary&lt;/em&gt; for one day. Capture how often it gets called and how the responses compare.&lt;/li&gt;
&lt;li&gt; Promote 4.8 to primary for 10% of traffic (deterministic hash on session ID or tenant ID, so a given user has a consistent experience). Run your canaries against both.&lt;/li&gt;
&lt;li&gt; Roll forward to 50%, then 100% over a week. Keep 4.7 in the chain as a sibling fallback until at least mid-June.&lt;/li&gt;
&lt;li&gt; After June 15, replace any references to Opus 4.6 in your fallback chain with Sonnet 4.6, because Opus 4.6 is being deprecated on the direct Anthropic API.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The deprecation date is the only hard schedule constraint here. Everything else is your call on how aggressively to move.&lt;/p&gt;
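
&lt;p&gt;The deterministic split in step 2 can be sketched as follows. It hashes a session or tenant ID into one of 100 buckets and sends the lowest buckets to the new model, so a user who lands on 4.8 at 10% stays on it at 50% and 100%. The helper names are hypothetical, and SHA-256 is just one reasonable choice of stable hash.&lt;/p&gt;

```python
import hashlib


def rollout_bucket(key, buckets=100):
    """Map a session or tenant ID to a stable bucket in the range 0..buckets-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_primary(key, percent_on_new):
    """Return the primary model ID for this key at the given rollout percentage."""
    if percent_on_new > rollout_bucket(key):
        return "anthropic/claude-opus-4.8"
    return "anthropic/claude-opus-4.7"


for tenant in ("tenant-a", "tenant-b", "tenant-c"):
    print(tenant, pick_primary(tenant, percent_on_new=10))
```

&lt;p&gt;Python's built-in &lt;code&gt;hash()&lt;/code&gt; is salted per process for strings, so it would reshuffle users on every restart; a cryptographic digest avoids that.&lt;/p&gt;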

&lt;h2&gt;
  
  
  A Reference Failover Pattern via ofox
&lt;/h2&gt;

&lt;p&gt;If you want the whole pattern in one piece — retry + fallback chain + cross-vendor escape hatch — here is the shape it takes through an aggregator. The point is not that the aggregator makes Opus 4.7 more reliable; it doesn't. The point is that your fallback chain becomes a config change instead of a deploy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.ofox.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-ofox-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;CHAIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4.8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# primary
&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4.7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# sibling fallback
&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                   &lt;span class="c1"&gt;# cross-vendor escape
&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-sonnet-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="c1"&gt;# capacity floor
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;robust_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries_per_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CHAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries_per_model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;APIStatusError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;529&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;raise&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_retries_per_model&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all models exhausted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things this gets you that a direct Anthropic integration doesn't, in the context of the May 2026 Opus 4.7 problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;One key, one SDK&lt;/strong&gt; exposing Opus 4.7, Opus 4.8, and a cross-vendor fallback. See our &lt;a href="https://ofox.ai/blog/ai-api-aggregation-access-every-model-one-endpoint/" rel="noopener noreferrer"&gt;API aggregation guide&lt;/a&gt; for the broader rationale.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;A vendor-independent escape hatch.&lt;/strong&gt; When both Opus tiers wobble at once (as happened May 22), having GPT-5.5 or Gemini 3.1 Pro in the chain matters. Pricing on the cross-vendor models is in our &lt;a href="https://ofox.ai/blog/gpt-5-5-api-vs-claude-opus-gemini-3-1-flagship-2026/" rel="noopener noreferrer"&gt;flagship comparison&lt;/a&gt; and &lt;a href="https://ofox.ai/blog/claude-vs-gpt-vs-gemini-model-comparison-guide-2026/" rel="noopener noreferrer"&gt;model comparison guide&lt;/a&gt;. The procurement loop required to add a second billing relationship during an active outage is its own reliability story; the gateway makes it a config change instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Honest framing of what this pattern is and isn't. It helps on May 22 and May 25 the same way it helps during any single-vendor incident. It does not catch the issue-#53459 regression — that's what the canary in step 3 is for. Same logic as the &lt;a href="https://ofox.ai/blog/claude-code-hybrid-routing-pattern-2026/" rel="noopener noreferrer"&gt;hybrid routing pattern&lt;/a&gt; for Claude Code itself: the routing is the part you control, the upstream model is the part you can't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Actually Do This Week
&lt;/h2&gt;

&lt;p&gt;If you're running production traffic on &lt;code&gt;claude-opus-4-7&lt;/code&gt; today and reading this on May 29, 2026, the order of operations is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Today&lt;/strong&gt; — add retries on 5xx and 529 if you don't have them. This alone covers ~95% of recent incidents.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;This week&lt;/strong&gt; — stand up the fallback chain. &lt;code&gt;claude-opus-4-8 → claude-opus-4-7 → claude-sonnet-4.6&lt;/code&gt; is a reasonable starting point.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;This week&lt;/strong&gt; — add three canary prompts on a daily schedule. You won't catch the next regression without them.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Next two weeks&lt;/strong&gt; — run a 10% canary on Opus 4.8, watch the canary scores, then promote.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Before June 15&lt;/strong&gt; — replace any hard-coded references to Opus 4.6 on the Anthropic-direct API. Either pin through an aggregator that retains older versions or move that slot to Sonnet 4.6.&lt;/li&gt;
&lt;/ol&gt;
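
&lt;p&gt;The 10% canary in step 4 can be sketched as a deterministic traffic split. This is a minimal illustration, not the routing any particular gateway uses: hash-based bucketing is one common choice, and the &lt;code&gt;pick_model&lt;/code&gt; helper and its &lt;code&gt;request_id&lt;/code&gt; argument are hypothetical names introduced here. The model IDs are the direct-API ones mentioned above.&lt;/p&gt;

```python
import hashlib

STABLE_MODEL = "claude-opus-4-7"   # current production model (from the text)
CANARY_MODEL = "claude-opus-4-8"   # candidate under evaluation (from the text)
CANARY_PERCENT = 10                # share of traffic routed to the candidate


def pick_model(request_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Route a fixed share of requests to the canary model, deterministically.

    request_id is a hypothetical stable identifier (user ID, session ID, ...).
    Hashing it means the same caller always lands on the same model.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return CANARY_MODEL if canary_percent > bucket else STABLE_MODEL


if __name__ == "__main__":
    for rid in ("user-17", "user-18", "user-19"):
        print(rid, pick_model(rid))
```

&lt;p&gt;Keying on a stable identifier rather than a random draw keeps each caller on one model for the whole canary window, which makes canary scores easier to compare. Promotion is then a matter of raising &lt;code&gt;CANARY_PERCENT&lt;/code&gt;.&lt;/p&gt;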

&lt;p&gt;For deeper background on what changed between 4.6 and 4.7 specifically, see our &lt;a href="https://ofox.ai/blog/claude-opus-4-7-api-review-upgrade-guide-2026/" rel="noopener noreferrer"&gt;Opus 4.7 API review&lt;/a&gt; and the &lt;a href="https://ofox.ai/blog/claude-api-pricing-complete-breakdown-2026/" rel="noopener noreferrer"&gt;Claude API pricing breakdown&lt;/a&gt;. For the broader pattern of which model wins which workload, the &lt;a href="https://ofox.ai/blog/best-llm-for-coding-ranked-real-use-2026/" rel="noopener noreferrer"&gt;best LLM for coding ranked by real use&lt;/a&gt; post is the one to read after this.&lt;/p&gt;

&lt;p&gt;The most expensive Opus 4.7 production failure isn't the one you noticed — it's the one your retry strategy turned into a 200 OK with a worse answer, and no canary to flag it.&lt;/p&gt;

&lt;p&gt;Reliability work in 2026 is not "pick the best model." It is "pick a fallback chain, instrument it, and notice when the model under it gets quietly worse." Opus 4.7 is just the model that made that lesson concrete.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/claude-opus-4-7-production-reliability-fix-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>production</category>
      <category>reliability</category>
    </item>
    <item>
      <title>Claude Opus 4.8: Benchmarks, Fast Mode, and What Actually Changed</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Thu, 28 May 2026 18:29:04 +0000</pubDate>
      <link>https://dev.to/owen_fox/claude-opus-48-benchmarks-fast-mode-and-what-actually-changed-24f0</link>
      <guid>https://dev.to/owen_fox/claude-opus-48-benchmarks-fast-mode-and-what-actually-changed-24f0</guid>
      <description>&lt;h2&gt;
  
  
  Claude Opus 4.8: Benchmarks, Fast Mode, and What Actually Changed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Anthropic shipped Claude Opus 4.8 on May 28, 2026, at the same $5/$25 price as 4.7. It tops Artificial Analysis's GDPval-AA real-work leaderboard at 1890 Elo (+121 over GPT-5.5, +137 over 4.7), hits 69.2% on SWE-bench Pro, and does it using ~35% fewer output tokens than 4.7.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Anthropic Shipped
&lt;/h3&gt;

&lt;p&gt;Claude Opus 4.8 launched May 28, 2026, at the same list price as Opus 4.7: $5 per million input tokens and $25 per million output tokens. It has a 1M-token context window by default on the Claude API (200K on Microsoft Foundry) and a 128K maximum output.&lt;/p&gt;

&lt;p&gt;The key differentiator is that it scores higher than Opus 4.7 while using fewer output tokens per task.&lt;/p&gt;

&lt;h3&gt;
  
  
  The GDPval-AA Result
&lt;/h3&gt;

&lt;p&gt;Opus 4.8 (max effort) debuts at 1890 Elo, 121 points clear of second-place GPT-5.5 and 137 points above its own predecessor.&lt;/p&gt;
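
&lt;p&gt;To get a feel for what those gaps mean, here is a small sketch that converts an Elo difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo curve with a 400-point scale; the post does not say which exact rating model Artificial Analysis uses, so treat the output as a rough guide only. The ratings are the ones in the benchmark table.&lt;/p&gt;

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo curve.

    Assumption: conventional 400-point logistic scale, not confirmed
    for this leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


gap_vs_gpt55 = 1890 - 1769    # 121 points, ratings from the table
gap_vs_opus47 = 1890 - 1753   # 137 points, ratings from the table

print(round(expected_win_rate(gap_vs_gpt55), 3))
print(round(expected_win_rate(gap_vs_opus47), 3))
```

&lt;p&gt;Under that assumption, both gaps work out to the higher-rated model being preferred in roughly two out of three pairwise comparisons.&lt;/p&gt;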

&lt;p&gt;Artificial Analysis runs this evaluation independently: models attempt real economic work tasks across 44 occupations, each with shell access and web browsing inside an agentic loop.&lt;/p&gt;

&lt;p&gt;Opus 4.8 reached this score using 15% fewer turns and 35% fewer output tokens per task than Opus 4.7.&lt;/p&gt;
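
&lt;p&gt;Because the list price is unchanged, fewer output tokens translate directly into a lower bill. The arithmetic below is a sketch under stated assumptions: the task size (100K input tokens, 20K output tokens) is hypothetical, input tokens are held constant, and the 35% reduction is applied as a flat factor even though the figure comes from one benchmark.&lt;/p&gt;

```python
INPUT_PRICE_PER_MTOK = 5.00     # USD per million input tokens (list price in the post)
OUTPUT_PRICE_PER_MTOK = 25.00   # USD per million output tokens (list price in the post)


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD for a single task."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical task size, chosen only for illustration.
input_tokens = 100_000
output_47 = 20_000
output_48 = round(output_47 * (1 - 0.35))   # 35% fewer output tokens

cost_47 = task_cost(input_tokens, output_47)
cost_48 = task_cost(input_tokens, output_48)
print(f"Opus 4.7: ${cost_47:.3f}  Opus 4.8: ${cost_48:.3f}")
```

&lt;p&gt;For this made-up mix the per-task cost drops from $1.00 to about $0.83. The saving shrinks as the input share of the bill grows, so output-heavy agentic workloads benefit most.&lt;/p&gt;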

&lt;h3&gt;
  
  
  Benchmarks vs. the Field
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Opus 4.8&lt;/th&gt;
&lt;th&gt;Opus 4.7&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench Pro&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64.3%&lt;/td&gt;
&lt;td&gt;58.6%&lt;/td&gt;
&lt;td&gt;54.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OSWorld-Verified (computer use)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;82.8%&lt;/td&gt;
&lt;td&gt;78.7%&lt;/td&gt;
&lt;td&gt;76.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.1&lt;/td&gt;
&lt;td&gt;74.6%&lt;/td&gt;
&lt;td&gt;66.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Humanity's Last Exam (with tools)&lt;/td&gt;
&lt;td&gt;57.9%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance Agent v2&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDPval-AA (Elo)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1890&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1753&lt;/td&gt;
&lt;td&gt;1769&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GPT-5.5 still wins Terminal-Bench 2.1 (78.2% vs 74.6%). If your workload is heavy on raw terminal command sequences, that's a real data point, not a rounding error.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's New Under the Hood
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fast Mode.&lt;/strong&gt; A research preview that serves the same Opus 4.8 model at up to 2.5x higher output tokens per second, at premium pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mid-conversation system messages.&lt;/strong&gt; You can now insert system messages after user turns, which preserves prompt-cache hits on earlier turns and reduces input costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive thinking, effort default &lt;code&gt;high&lt;/code&gt;.&lt;/strong&gt; Use &lt;code&gt;thinking: {"type": "adaptive"}&lt;/code&gt; and the &lt;code&gt;effort&lt;/code&gt; parameter instead of extended thinking budgets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better tool triggering and compaction.&lt;/strong&gt; Long-horizon agentic coding improves, with fewer compactions and better recovery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompting Opus 4.8: What Actually Changed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Effort is now the main dial.&lt;/strong&gt; Start at &lt;code&gt;xhigh&lt;/code&gt; for coding and agentic use cases, and keep a minimum of &lt;code&gt;high&lt;/code&gt; for anything intelligence-sensitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It follows instructions literally.&lt;/strong&gt; The model won't silently generalize instructions or infer unstated requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It favors reasoning over tool calls.&lt;/strong&gt; Raising effort to &lt;code&gt;high&lt;/code&gt;/&lt;code&gt;xhigh&lt;/code&gt; produces substantially more tool use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The code-review recall trap.&lt;/strong&gt; Opus 4.8 is genuinely better at finding bugs (higher precision and recall in Anthropic's evals), but if your review harness says "only report high-severity issues" or "be conservative," 4.8 follows that more faithfully than older models, so it can report fewer findings even while finding more.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Most Honest" Claim
&lt;/h3&gt;

&lt;p&gt;Anthropic positions Opus 4.8 as having fewer confident fabrications, less sycophancy, and clearer refusals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Launched Alongside: Dynamic Workflows in Claude Code
&lt;/h3&gt;

&lt;p&gt;Dynamic workflows let Claude orchestrate tens to hundreds of parallel subagents in a single session.&lt;/p&gt;

&lt;p&gt;The featured example involved porting Bun from Zig to Rust — roughly 750,000 lines of code, with a 99.8% test-suite pass rate, in 11 days.&lt;/p&gt;

&lt;p&gt;Two limitations: it's plan-gated (dynamic workflows run on Claude Code Max, Team, and Enterprise plans), and token consumption is substantially higher than a normal session.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Access Opus 4.8 via ofox.ai
&lt;/h3&gt;

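&lt;p&gt;The model ID is &lt;code&gt;anthropic/claude-opus-4.8&lt;/code&gt; on the same OpenAI-compatible endpoint, with no separate billing. The snippet below uses the Anthropic SDK against the gateway's Anthropic-style route instead, where the model is addressed as &lt;code&gt;claude-opus-4-8&lt;/code&gt;.&lt;br&gt;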
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.ofox.ai/anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-ofox-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Audit this service for race conditions...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
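
&lt;p&gt;For the OpenAI-compatible route, a standard-library-only sketch looks like this. It assumes the gateway's base URL is &lt;code&gt;https://api.ofox.ai/v1&lt;/code&gt; and that it accepts the usual chat-completions request shape; &lt;code&gt;build_chat_request&lt;/code&gt; is a hypothetical helper, and the request is built but not sent, so the block runs without a key.&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "https://api.ofox.ai/v1"       # assumed OpenAI-compatible base URL
MODEL_ID = "anthropic/claude-opus-4.8"    # gateway model ID named in the text


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("Audit this service for race conditions...",
                                 "your-ofox-key")
    # Sending needs a valid key and network access:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["choices"][0]["message"]["content"])
    print(request.full_url)
```

&lt;p&gt;In practice most teams would point the OpenAI SDK at the same base URL rather than hand-rolling HTTP. The raw request is shown only to make the model ID and path explicit.&lt;/p&gt;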



&lt;h3&gt;
  
  
  Verdict
&lt;/h3&gt;

&lt;p&gt;Opus 4.8 is the rare upgrade with no asterisk on price: same $5/$25, higher scores across coding and computer-use, top of the independent real-work leaderboard, and fewer output tokens per task.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/claude-opus-4-8-release-review-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>anthropic</category>
      <category>claude</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Codex Mobile App: Monitor &amp; Control Your AI Coding Agent from iPhone or Android (2026)</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Thu, 28 May 2026 14:16:09 +0000</pubDate>
      <link>https://dev.to/owen_fox/codex-mobile-app-monitor-control-your-ai-coding-agent-from-iphone-or-android-2026-3ah</link>
      <guid>https://dev.to/owen_fox/codex-mobile-app-monitor-control-your-ai-coding-agent-from-iphone-or-android-2026-3ah</guid>
      <description>&lt;h1&gt;
  
  
  Codex Mobile App: Monitor &amp;amp; Control Your AI Coding Agent from iPhone or Android (2026)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; On May 14, 2026, OpenAI rolled Codex into the ChatGPT mobile app on iOS and Android, available in preview on every plan including Free. You scan a QR code from your Mac, and then you can review diffs, approve commands, switch models, and dispatch new tasks from your phone. The worker process still has to run on macOS; Windows support is promised but undated.&lt;/p&gt;

&lt;p&gt;Three months after Anthropic put Claude Code in your pocket, OpenAI shipped the same idea — except Codex Mobile only talks to a Mac. If you live in Windows or Linux, you are watching from the bleachers until "soon."&lt;/p&gt;

&lt;p&gt;Codex itself is no longer a small experiment. OpenAI says more than four million developers use it weekly, and the missing piece for most of them was not raw capability — it was a way to glance at a running task while away from the desk. The mobile launch closes that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Codex Mobile actually does
&lt;/h2&gt;

&lt;p&gt;Codex Mobile is a control surface for a Codex worker that runs somewhere else. The "somewhere else" is currently a Mac — your desktop, a Mac mini, or a remote Mac you have SSH'd into — and the phone is the window onto it. You do not run the model on the device. The device runs the &lt;em&gt;review&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Concretely, OpenAI describes the experience this way: &lt;em&gt;"From your phone, you can work across all of your threads, review outputs, approve commands, change models, or start something new."&lt;/em&gt; In practice that means four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Live observation.&lt;/strong&gt; Screenshots, terminal output, diffs, test results, and approval prompts stream into the app in real time, with the worker's permissions and credentials staying on the host machine.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Approvals.&lt;/strong&gt; When Codex hits a command that needs sign-off — running migrations, touching production config, deleting files — you get a card you can approve or reject from the lock screen.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dispatch.&lt;/strong&gt; You can open a new thread, attach context, choose between models (including the gpt-5.3-codex variant available through OpenAI's own product surface), and start work without going back to the Mac.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Thread continuity.&lt;/strong&gt; Whatever Codex is doing on your desktop is reachable on your phone, and vice versa, including across multiple concurrent sessions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a tiny IDE in your pocket. It is the &lt;em&gt;manager view&lt;/em&gt; of a coding agent — the part that used to require you to be back at your desk to unblock the agent on a permission prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: from QR code to first approved diff
&lt;/h2&gt;

&lt;p&gt;The pairing flow is deliberately frictionless because it had to be — most developers will try this once on a coffee break, and a bad first run kills the habit.&lt;/p&gt;

&lt;p&gt;On your Mac you launch Codex in whatever surface you already use (CLI, the desktop app, or the Chrome extension that shipped alongside it). Mobile linking lives in the same Codex settings panel. It prints a QR code. You open ChatGPT on the phone, point the camera at the screen, and the two halves of the same Codex session are now talking. There is no separate account, no extra API key, no provisioning step.&lt;/p&gt;

&lt;p&gt;After that, every thread on the Mac is reachable on the phone within seconds. If you start a long-running refactor before walking to lunch, the agent's progress arrives as notifications — and any permission prompts surface as actionable cards.&lt;/p&gt;

&lt;p&gt;It is worth noting how much of this design borrows from the Codex CLI workflow that has been the daily driver for a year now. The mental model is the same — agent runs, you supervise, you approve — but the supervisor seat has moved off the chair and onto the phone.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can actually do from your phone
&lt;/h2&gt;

&lt;p&gt;Mobile is fine for a surprising amount of real work, and bad for a few things. The honest list, after using it for two weeks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Genuinely useful on mobile:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Approving migrations the agent has staged but won't run without sign-off.&lt;/li&gt;
&lt;li&gt;  Reading a generated test failure and asking Codex to try a different fix.&lt;/li&gt;
&lt;li&gt;  Watching a long build or test suite without keeping your laptop open.&lt;/li&gt;
&lt;li&gt;  Spinning up a fresh thread from a screenshot or a bug report someone pasted in Slack.&lt;/li&gt;
&lt;li&gt;  Switching the model mid-task — for example bumping a thread from a cheaper variant up to the strongest coding model when you realize the work is harder than expected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Painful on mobile:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Anything that requires reading more than ~80 lines of diff at a time. The pinch-to-zoom on a code block is workable, not pleasant.&lt;/li&gt;
&lt;li&gt;  Heavy multi-file refactor &lt;em&gt;planning&lt;/em&gt;. You will want a real screen for that.&lt;/li&gt;
&lt;li&gt;  Anything that requires you to type more than a paragraph of clarification. Voice input helps, but only somewhat.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have spent any time choosing between Claude Code, Codex, Cursor, and DeepSeek-TUI, you already know the right framing here: each tool occupies a slightly different point in the &lt;em&gt;who-watches-the-agent&lt;/em&gt; design space. Mobile pushes Codex further into the "asynchronous, supervisor-style" corner.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limitations today
&lt;/h2&gt;

&lt;p&gt;The preview ships with three real ceilings, and pretending otherwise would be silly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It only talks to a Mac.&lt;/strong&gt; Today, the worker has to be running on macOS. Windows is explicitly listed as "coming soon" by OpenAI, with no committed date. Linux is not mentioned at all in the launch material. If your daily driver is a Linux workstation, the only honest workaround right now is a Mac mini sitting somewhere on your network — which is not a small ask just for mobile parity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network dependence is harder than it looks on paper.&lt;/strong&gt; The mobile experience leans on a persistent, low-latency connection to your worker host. Spotty café Wi-Fi turns "approve and continue" into "wait 9 seconds, approve, wait 7 seconds, see if it worked." Worth knowing before you plan a flight around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no offline mode.&lt;/strong&gt; If your Mac is asleep or the host is down, the mobile app shows you the last known state and nothing else. Set your Mac to never sleep, or use Remote SSH with a server you control. (Remote SSH and Hooks both ship on every plan, per OpenAI's launch notes.)&lt;/p&gt;

&lt;p&gt;These are not deal-breakers, but they are the difference between "I can use this on a real day" and "I demoed this once at a conference."&lt;/p&gt;

&lt;h2&gt;
  
  
  How it compares to Claude Code Remote Control
&lt;/h2&gt;

&lt;p&gt;OpenAI is not first to this idea. Anthropic shipped Remote Control for Claude Code in February 2026, with broadly the same shape — phone observes, desktop works, approvals arrive as cards.&lt;/p&gt;

&lt;p&gt;The differences that matter day to day:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Codex Mobile (May 2026)&lt;/th&gt;
&lt;th&gt;Claude Code Remote Control (Feb 2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Worker OS&lt;/td&gt;
&lt;td&gt;macOS only&lt;/td&gt;
&lt;td&gt;macOS, Linux, Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;QR code from ChatGPT app&lt;/td&gt;
&lt;td&gt;Session URL + QR code from &lt;code&gt;claude remote-control&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;Yes (Free + Go plans)&lt;/td&gt;
&lt;td&gt;No (requires Claude Pro or higher)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push approvals&lt;/td&gt;
&lt;td&gt;Yes, lock-screen cards&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model switching mid-task&lt;/td&gt;
&lt;td&gt;Yes (between OpenAI's Codex-family models)&lt;/td&gt;
&lt;td&gt;Yes (between Sonnet, Opus, Haiku)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-thread view&lt;/td&gt;
&lt;td&gt;Yes, across desktop and mobile&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice dispatch&lt;/td&gt;
&lt;td&gt;Indirect (via ChatGPT voice mode)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Anthropic got there first with the cleaner cross-platform story. OpenAI replied with broader plan coverage (Free tier) and a narrower host requirement. Mobile is now table stakes for any serious coding agent. The choice has shifted from &lt;em&gt;whether&lt;/em&gt; you adopt the pattern to which agent you trust enough to hand the lock-screen approvals to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where ofox.ai actually fits — and where it doesn't
&lt;/h2&gt;

&lt;p&gt;Worth being precise about, because the question comes up every time a new ChatGPT feature drops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex Mobile is not something you can wire to ofox.ai.&lt;/strong&gt; It is the ChatGPT product, tied to your OpenAI account, billed through OpenAI's plans. There is no "use my API key instead" toggle. The mobile experience is a feature of the OpenAI consumer surface, not a model endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where ofox.ai is genuinely useful around this:&lt;/strong&gt; the same gpt-5.3-codex model that powers Codex inside ChatGPT is also accessible as an OpenAI-compatible API endpoint through ofox.ai's unified gateway. That matters if you want to build a &lt;em&gt;different&lt;/em&gt; mobile experience — your own internal tool, a Telegram bot, a Slack workflow — where Codex-quality coding sits behind your own UI and you do not want to be locked to ChatGPT as the only client. The model is the same; what's different is who owns the wrapper.&lt;/p&gt;

&lt;p&gt;If you are running Codex CLI through a custom provider configured for ofox.ai, the new mobile app does not affect your setup at all. Your CLI keeps doing what it does; the mobile feature is a parallel surface that just happens to share branding.&lt;/p&gt;

&lt;h2&gt;
  
  
  When mobile monitoring actually helps
&lt;/h2&gt;

&lt;p&gt;Some honest use cases from the first two weeks of using this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Long migrations.&lt;/strong&gt; Kick off a database refactor before a 90-minute meeting, approve the staged commands during the break, walk back to a green test suite.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-timezone handoff.&lt;/strong&gt; Leave a Codex task running overnight, approve any blockers from a phone over breakfast.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Code review while commuting.&lt;/strong&gt; Reviewing the diffs Codex generated on an earlier task — not as good as on a 27-inch screen, but good enough to clear small ones.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Production approvals.&lt;/strong&gt; When a deploy script needs explicit human sign-off, getting that sign-off without having to be at your desk.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it is &lt;em&gt;not&lt;/em&gt; good for: writing the actual code. If you are using your phone to type instructions longer than two sentences, you are using the wrong tool. The mental model is "the agent works, you supervise" — and supervision is exactly the kind of work that a phone is genuinely fine for.&lt;/p&gt;

&lt;p&gt;Your phone isn't replacing your IDE. It is replacing the chair you would otherwise have to sit in while the agent runs — and the developers who learn to trust that asynchronous loop will ship more than the ones still tethered to a desk.&lt;/p&gt;

&lt;p&gt;Agentic coding has finally outgrown the assumption that the human has to be physically present while the agent works. That assumption was always a bit silly — nobody sits next to a CI runner watching it spin — and Codex Mobile is the first OpenAI product that admits it on the product surface. Whether you adopt it now or wait for Windows parity, the asynchronous loop it normalizes is where most coding work is heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://openai.com/index/work-with-codex-from-anywhere/" rel="noopener noreferrer"&gt;OpenAI: Work with Codex from anywhere&lt;/a&gt; — official launch post, May 14, 2026&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://techcrunch.com/2026/05/14/openai-says-codex-is-coming-to-your-phone/" rel="noopener noreferrer"&gt;TechCrunch: OpenAI says Codex is coming to your phone&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://9to5mac.com/2026/05/14/openai-brings-codex-control-to-chatgpt-for-iphone-and-android/" rel="noopener noreferrer"&gt;9to5Mac: OpenAI brings Codex to ChatGPT for iPhone and Android&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.engadget.com/2173235/openai-brings-its-codex-coding-app-to-mobile/" rel="noopener noreferrer"&gt;Engadget: OpenAI brings its Codex coding app to mobile&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/codex-mobile-app-iphone-android-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codex</category>
      <category>openai</category>
      <category>mobile</category>
    </item>
    <item>
      <title>Codex Goal Mode &amp; Remote Computer Use: How OpenAI's Agent Can Code for Days</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Thu, 28 May 2026 02:23:33 +0000</pubDate>
      <link>https://dev.to/owen_fox/codex-goal-mode-remote-computer-use-how-openais-agent-can-code-for-days-5gel</link>
      <guid>https://dev.to/owen_fox/codex-goal-mode-remote-computer-use-how-openais-agent-can-code-for-days-5gel</guid>
      <description>&lt;h1&gt;
  
  
  Codex Goal Mode &amp;amp; Remote Computer Use: How OpenAI's Agent Can Code for Days
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;On May 21, 2026, OpenAI moved two Codex features to general availability: &lt;strong&gt;Goal Mode&lt;/strong&gt; (a persistent &lt;code&gt;/goal&lt;/code&gt; directive that survives session breaks and budget resets) and &lt;strong&gt;Locked Computer Use&lt;/strong&gt; (the desktop agent continues driving Mac apps after screen lock). Combined with &lt;code&gt;gpt-5.3-codex&lt;/code&gt; and verifiable success criteria, these features let engineers delegate real objectives like "ship the v2 checkout endpoint with the benchmark green" and walk away. The breakthrough isn't longer prompts; it is that a coding agent now treats time as a budgetable resource rather than something requiring constant supervision.&lt;/p&gt;

&lt;p&gt;Both features shipped in Codex CLI 0.133.0 and matching IDE and desktop builds. After a week of running Goal Mode against production repositories, one pattern stands out: the gap between demo and practical utility depends on how the goal is structured, not on how patient you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Goal Mode Actually Changes About Your Prompt
&lt;/h2&gt;

&lt;p&gt;Goal Mode replaces per-turn instructions with a persistent objective that Codex re-evaluates each cycle. The command interface is minimal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Set or replace the active goal
/goal Reduce p95 checkout latency below 120 ms on the checkout
      benchmark while keeping the correctness suite green

/goal           # view current goal
/goal pause     # stop the loop, keep the state
/goal resume    # pick back up where it stopped
/goal clear     # discard the goal entirely
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Goal structure matters more than wording. The OpenAI cookbook recommends: &lt;code&gt;&amp;lt;desired end state&amp;gt; verified by &amp;lt;specific evidence&amp;gt; while preserving &amp;lt;constraints&amp;gt;&lt;/code&gt;—three mandatory slots in that order.&lt;/p&gt;
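&lt;p&gt;As a minimal sketch of that template, the Python helper below assembles a goal string from the three slots and refuses to build one when a slot is empty. The name &lt;code&gt;build_goal&lt;/code&gt; and its parameters are hypothetical and not part of Codex; the helper only formats text you would paste into the TUI.&lt;/p&gt;

```python
def build_goal(end_state, evidence, constraints):
    """Format a /goal directive from the three template slots (hypothetical helper)."""
    slots = {"end_state": end_state, "evidence": evidence, "constraints": constraints}
    # All three slots are mandatory, so reject blank values early.
    missing = [name for name, value in slots.items() if not value.strip()]
    if missing:
        raise ValueError("missing goal slot(s): " + ", ".join(missing))
    return "/goal {} verified by {} while preserving {}".format(
        end_state.strip(), evidence.strip(), constraints.strip()
    )


goal = build_goal(
    "Migrate this codebase from Pydantic v1 to v2",
    "`pytest -q` exiting 0 and `mypy --strict src/` exiting 0",
    "all public API signatures listed in docs/public_api.md",
)
print(goal)
```

&lt;p&gt;Forcing every goal through a function like this is one way to catch a missing evidence slot before the loop starts spending tokens.&lt;/p&gt;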

&lt;h3&gt;
  
  
  What Fails vs. What Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Ineffective:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/goal Make the code more elegant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Effective:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/goal Migrate this codebase from Pydantic v1 to v2, verified by
      `pytest -q` exiting 0 and `mypy --strict src/` exiting 0,
      while preserving all public API signatures listed in
      docs/public_api.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second version gives Codex measurable targets. The agent writes, runs the suite, reads diffs between expected and actual, revises, and stops when both commands exit zero—or surfaces blockers it cannot overcome.&lt;/p&gt;

&lt;p&gt;Stopping conditions are explicit: success, &lt;code&gt;/goal pause&lt;/code&gt;, &lt;code&gt;/goal clear&lt;/code&gt;, user interruption, a repeated unresolvable blocker, or usage limit exhaustion. Nothing else terminates the loop, which makes verifiable success criteria more critical than before: without them, the loop has no reliable success signal and, in practice, often runs until the usage limit ends it.&lt;/p&gt;
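&lt;p&gt;To make "verified by exit codes" concrete, here is a small Python sketch that runs a set of verification commands and reports whether all of them exited 0. It illustrates the success check you would want a goal to encode; it is not how Codex evaluates goals internally, and the names &lt;code&gt;run_checks&lt;/code&gt;, &lt;code&gt;goal_met&lt;/code&gt;, and &lt;code&gt;CHECKS&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import subprocess
import sys

# Example command set mirroring the goal above; assumes pytest and mypy
# are installed for the current interpreter.
CHECKS = {
    "pytest": [sys.executable, "-m", "pytest", "-q"],
    "mypy": [sys.executable, "-m", "mypy", "--strict", "src/"],
}


def run_checks(commands):
    """Run each command and return a dict mapping its label to its exit code."""
    results = {}
    for label, argv in commands.items():
        # check=False: a failing command is a result here, not an exception.
        completed = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[label] = completed.returncode
    return results


def goal_met(results):
    """The goal counts as met only if at least one check ran and all exited 0."""
    return bool(results) and all(code == 0 for code in results.values())
```

&lt;p&gt;Calling &lt;code&gt;goal_met(run_checks(CHECKS))&lt;/code&gt; from the project root gives a single boolean you can also use in CI to confirm what the agent reports.&lt;/p&gt;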

&lt;h2&gt;
  
  
  "Code for Days" Means Something Specific
&lt;/h2&gt;

&lt;p&gt;The phrase "code for days" doesn't mean one continuous uninterrupted session. Goal Mode persists objectives across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session breaks&lt;/strong&gt;: Close the terminal, return tomorrow, run &lt;code&gt;/goal resume&lt;/code&gt;, and the agent continues from the last verified state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget resets&lt;/strong&gt;: When a rolling budget resets (daily for most plans), the active goal survives and work continues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interruptions&lt;/strong&gt;: Ctrl-C, app crashes, Mac restarts—the goal is journaled to disk; Codex 0.133+ rehydrates it on next launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a multi-session objective layer. A migration that used to consume three afternoons of one-shot prompts now runs as one coherent thread. The cost model remains unchanged: every reasoning turn is billed at the same per-token rate for &lt;code&gt;gpt-5.3-codex&lt;/code&gt;. What drops to nearly zero is the coordination cost, which is where most of the wall-clock savings originate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Testing
&lt;/h3&gt;

&lt;p&gt;Testing against a production repo migration (Pydantic v1 → v2 on a 14k-line internal service) showed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total wall time: approximately 31 hours across four sessions&lt;/li&gt;
&lt;li&gt;Total Codex token spend at &lt;code&gt;gpt-5.3-codex&lt;/code&gt; rates: roughly $44&lt;/li&gt;
&lt;li&gt;Hand-prompting the same task would have required two full focused days of supervision&lt;/li&gt;
&lt;li&gt;Actual engagement: three check-ins&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Locked Computer Use: The Controversial Half
&lt;/h2&gt;

&lt;p&gt;Computer Use shipped earlier in 2026—Codex could operate GUI apps when the Mac was unlocked and monitored. The May 21 update added:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Continued operation after screen lock&lt;/strong&gt;: Goal Mode loops driving desktop apps don't stall when the screensaver activates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile triggering&lt;/strong&gt;: Hand the agent tasks from your phone to drive the Mac left at your desk&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Safety Model
&lt;/h3&gt;

&lt;p&gt;Enabling Locked Computer Use installs an Apple authorization plugin that participates in the macOS unlock flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Mac unlocks temporarily, &lt;strong&gt;but the display stays covered&lt;/strong&gt;; the lock screen remains visible while Codex operates in the background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization windows are short-lived and scoped&lt;/strong&gt; to the current unlock attempt; no standing grants exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keyboard, trackpad, or mouse contact immediately relocks&lt;/strong&gt; the Mac and disables auto-unlock until manual unlock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex asks before operating each new app&lt;/strong&gt;; mark frequently used apps "Always allow" to skip the prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannot drive Terminal apps, Codex itself, or system admin prompts&lt;/strong&gt;—hard-coded exclusions prevent privilege escalation through GUI automation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Launch Availability &amp;amp; Restrictions
&lt;/h3&gt;

&lt;p&gt;The feature is unavailable in the EEA, UK, and Switzerland at launch. Apple's automation policy blocks several app categories regardless of user settings.&lt;/p&gt;

&lt;p&gt;If regular Computer Use isn't enabled, grant Screen Recording and Accessibility permissions to Codex through System Settings first. The plugin install adds only the locked-screen layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real Goal Mode Loop, End to End
&lt;/h2&gt;

&lt;p&gt;Starting in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/work/orders-service
&lt;span class="nv"&gt;$ &lt;/span&gt;codex
&lt;span class="c"&gt;# Inside the TUI:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /goal Migrate this codebase from Pydantic v1 to v2, verified by
        &lt;span class="sb"&gt;`&lt;/span&gt;pytest &lt;span class="nt"&gt;-q&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; exiting 0 and &lt;span class="sb"&gt;`&lt;/span&gt;mypy &lt;span class="nt"&gt;--strict&lt;/span&gt; src/&lt;span class="sb"&gt;`&lt;/span&gt; exiting 0,
        &lt;span class="k"&gt;while &lt;/span&gt;preserving all public API signatures &lt;span class="k"&gt;in &lt;/span&gt;docs/public_api.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex acknowledges the goal, runs initial scans, and proposes a plan. From here you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Walk away—the loop runs until success, blocker, or budget exhaustion&lt;/li&gt;
&lt;li&gt;Hand off to Locked Computer Use for GUI steps (migration wizards, CI dashboard screenshots, etc.) and lock your Mac&lt;/li&gt;
&lt;li&gt;Trigger status checks from Codex Mobile while away from the laptop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Returning later, &lt;code&gt;/goal&lt;/code&gt; shows current state: what's verified, what's pending, last blockers. &lt;code&gt;/goal pause&lt;/code&gt; lets you intervene without losing context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommended Starter Configuration
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;~/.codex/config.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.3-codex"&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;      &lt;span class="c"&gt;# or "openai" if going direct&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.ofox]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox.ai"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.ofox.ai/v1"&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OFOX_API_KEY"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Goal Mode exposes no per-session token or iteration caps in &lt;code&gt;config.toml&lt;/code&gt;; the documented stopping levers are the slash commands (&lt;code&gt;/goal pause&lt;/code&gt;, &lt;code&gt;/goal clear&lt;/code&gt;), detected repeated blockers, and your plan's usage limit. The practical control is therefore the usage cap on whichever provider you select. At &lt;code&gt;gpt-5.3-codex&lt;/code&gt; rates of $1.75 input / $14 output per million tokens, a single multi-hour session dominated by output tokens can easily run $30-80, so your account cap becomes the actual budget guardrail.&lt;/p&gt;
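&lt;p&gt;The arithmetic behind that range is easy to check. The sketch below prices a session at the two rates quoted above; the token counts in the example are hypothetical and only there to show how quickly output tokens dominate the bill.&lt;/p&gt;

```python
INPUT_USD_PER_MILLION = 1.75   # gpt-5.3-codex input rate quoted above
OUTPUT_USD_PER_MILLION = 14.0  # gpt-5.3-codex output rate quoted above


def session_cost(input_tokens, output_tokens):
    """Return the estimated USD cost of a session at the quoted rates."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# Hypothetical session: 4M input tokens and 3M output tokens.
estimate = session_cost(4_000_000, 3_000_000)
print(f"${estimate:.2f}")  # prints $49.00
```

&lt;p&gt;In this example the output side accounts for $42 of the $49, which is why an account-level cap matters more than trimming prompts.&lt;/p&gt;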

&lt;h2&gt;
  
  
  Why Route Codex Through ofox.ai
&lt;/h2&gt;

&lt;p&gt;Goal Mode hammers the model—multi-day objectives routinely make hundreds of reasoning turns with bills dominated by &lt;code&gt;gpt-5.3-codex&lt;/code&gt; output tokens at $14/M. Three reasons to pipe requests through a unified gateway instead of directly to OpenAI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Single key for side models&lt;/strong&gt;: Goal loops typically delegate cheap sub-tasks (summarization, classification, regex generation) to smaller models. One ofox.ai key routes the hot path to &lt;code&gt;gpt-5.3-codex&lt;/code&gt; and cold path to &lt;code&gt;gpt-5.4-mini&lt;/code&gt; or &lt;code&gt;deepseek-v4-flash&lt;/code&gt; without juggling credentials&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Per-goal spend visibility&lt;/strong&gt;: Tag sessions with custom headers; the dashboard shows per-goal cost, not per-day. Useful when determining whether a Pydantic migration justified its expense&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failover on outages&lt;/strong&gt;: Long-horizon goals get burned by brief provider blips. ofox falls back automatically; direct OpenAI keys error out and force &lt;code&gt;/goal pause&lt;/code&gt; until recovery&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When NOT to Use Goal Mode
&lt;/h2&gt;

&lt;p&gt;Three disqualifiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cannot write verification commands&lt;/strong&gt;: If success means "feels right" or "more elegant," Goal Mode either declares premature victory or churns indefinitely. Use one-shot prompts instead&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Work needs frequent human judgment&lt;/strong&gt;: Goals target autonomy. If every change needs approval, you pay for unused context. Run one-shot sessions instead—cheaper, faster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Destructive work at scale&lt;/strong&gt;: Database migrations, &lt;code&gt;git push --force&lt;/code&gt;, anything that touches production. Goal Mode excels at unattended convergence but lacks judgment about when &lt;em&gt;not&lt;/em&gt; to act. Sandbox agents to worktrees, set &lt;code&gt;approval_policy&lt;/code&gt; so shell commands require approval, and prefer goals with dry-run verification over live mutations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Shape of the Next Year
&lt;/h2&gt;

&lt;p&gt;Goal Mode plus Locked Computer Use represents the first credible "set a goal, lock your laptop, check tomorrow" coding loop for production use. The agent isn't smarter than last month—friction simply vanished, changing which engineering tasks merit delegating to models. A coding agent surviving screen locks, budget resets, and dinner breaks differs fundamentally from one requiring constant supervision.&lt;/p&gt;

&lt;p&gt;The important caveat: hours of attended Goal Mode work prove reliable today, but fully unattended multi-day work still depends on goal verifiability. Writing goals backed by real, checkable evidence is now the critical skill, superseding single-turn prompt craft.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources &amp;amp; Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex Changelog — May 2026&lt;/a&gt; — official release notes for Goal Mode GA and Locked Computer Use&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex" rel="noopener noreferrer"&gt;Using Goals in Codex&lt;/a&gt; — cookbook with goal syntax and worked examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.openai.com/codex/app/computer-use" rel="noopener noreferrer"&gt;Computer Use — Codex App&lt;/a&gt; — official safety model and platform constraints&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.macrumors.com/2026/05/22/codex-use-mac-apps-when-locked/" rel="noopener noreferrer"&gt;MacRumors: Codex Can Use Your Mac When Locked&lt;/a&gt; — independent writeup of the unlock flow&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openrouter.ai/openai/gpt-5.3-codex" rel="noopener noreferrer"&gt;GPT-5.3-Codex on OpenRouter&lt;/a&gt; — pricing and context window reference&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/codex-goal-mode-remote-computer-use-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codex</category>
      <category>openai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Codex CLI config.toml Deep Dive: Every Setting Explained</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Wed, 27 May 2026 14:23:22 +0000</pubDate>
      <link>https://dev.to/owen_fox/codex-cli-configtoml-deep-dive-every-setting-explained-5gpm</link>
      <guid>https://dev.to/owen_fox/codex-cli-configtoml-deep-dive-every-setting-explained-5gpm</guid>
      <description>&lt;h1&gt;
  
  
  Codex CLI config.toml Deep Dive: Every Setting Explained
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Codex CLI's &lt;code&gt;config.toml&lt;/code&gt; has grown past 150 documented keys across sandbox, approvals, permissions, MCP, providers, TUI, hooks, telemetry, and feature flags — most users only edit five of them, and miss the ones that actually matter (granular approvals, permission profiles, &lt;code&gt;shell_environment_policy&lt;/code&gt;, &lt;code&gt;features.network_proxy&lt;/code&gt;). This deep dive walks every section, calls out the surprising defaults, and ends with a layered config you can paste and trim. The default &lt;code&gt;~/.codex/config.toml&lt;/code&gt; is empty for a reason: Codex ships sensible defaults, but the moment you put Codex in a sandbox tighter than your shell or a model cheaper than the flagship, you'll touch ten settings — and seven of them aren't in any blog post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the file lives, and what gets ignored
&lt;/h2&gt;

&lt;p&gt;User-level config lives in &lt;code&gt;$CODEX_HOME/config.toml&lt;/code&gt;, which defaults to &lt;code&gt;~/.codex/config.toml&lt;/code&gt; on macOS and Linux and &lt;code&gt;%USERPROFILE%\.codex\config.toml&lt;/code&gt; on Windows. Project-scoped overrides go in &lt;code&gt;.codex/config.toml&lt;/code&gt; at the project root.&lt;/p&gt;

&lt;p&gt;The merge is layered: managed config (admin-pushed) → user config → project config → CLI flags. Profiles slot in between user and project config when &lt;code&gt;--profile NAME&lt;/code&gt; is passed. A set of keys is deliberately &lt;strong&gt;ignored&lt;/strong&gt; in project-local files for safety and silently dropped if you put them there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;openai_base_url&lt;/code&gt;, &lt;code&gt;chatgpt_base_url&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;model_provider&lt;/code&gt;, &lt;code&gt;model_providers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;notify&lt;/code&gt;, &lt;code&gt;profile&lt;/code&gt;, &lt;code&gt;profiles&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;approval_policy&lt;/code&gt;, &lt;code&gt;sandbox_mode&lt;/code&gt;, &lt;code&gt;sandbox_workspace_write.*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;experimental_realtime_ws_base_url&lt;/code&gt;, &lt;code&gt;otel.*&lt;/code&gt;, &lt;code&gt;apps_mcp_product_sku&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your project config "doesn't seem to take effect" for one of those, move it to &lt;code&gt;~/.codex/config.toml&lt;/code&gt;. This is the single most common WTF on the Codex CLI Discord.&lt;/p&gt;
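&lt;p&gt;The layering and the ignored-key rule can be pictured with a short Python sketch. This is an illustration only, not Codex's implementation: the function names are hypothetical, the merge is shallow, profiles are left out, and the precedence direction (later layers override earlier ones) is an assumption read off the arrow order above.&lt;/p&gt;

```python
# Top-level keys the list above describes as ignored in project-local files.
PROJECT_IGNORED_KEYS = {
    "openai_base_url", "chatgpt_base_url",
    "model_provider", "model_providers",
    "notify", "profile", "profiles",
    "approval_policy", "sandbox_mode", "sandbox_workspace_write",
    "experimental_realtime_ws_base_url", "otel", "apps_mcp_product_sku",
}


def strip_project_ignored(project_config):
    """Drop the keys that are ignored when they come from a project file."""
    return {k: v for k, v in project_config.items() if k not in PROJECT_IGNORED_KEYS}


def merge_layers(managed, user, project, cli_flags):
    """Shallow-merge the four layers. ASSUMPTION: later layers win."""
    effective = {}
    for layer in (managed, user, strip_project_ignored(project), cli_flags):
        effective.update(layer)
    return effective


effective = merge_layers(
    managed={},
    user={"model": "gpt-5.4", "sandbox_mode": "workspace-write"},
    project={"model": "gpt-5.3-codex", "sandbox_mode": "danger-full-access"},
    cli_flags={},
)
print(effective)
# {'model': 'gpt-5.3-codex', 'sandbox_mode': 'workspace-write'}
```

&lt;p&gt;The project file changes the model but cannot loosen the sandbox, which is exactly the "doesn't seem to take effect" behavior described above.&lt;/p&gt;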

&lt;h2&gt;
  
  
  The five keys most users actually set
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4"&lt;/span&gt;          &lt;span class="c"&gt;# or any id your provider exposes&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai"&lt;/span&gt;           &lt;span class="c"&gt;# built-in: openai, ollama, lmstudio&lt;/span&gt;
&lt;span class="py"&gt;approval_policy&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"on-request"&lt;/span&gt;       &lt;span class="c"&gt;# untrusted | on-request | never | { granular = {...} }&lt;/span&gt;
&lt;span class="py"&gt;sandbox_mode&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"workspace-write"&lt;/span&gt;  &lt;span class="c"&gt;# read-only | workspace-write | danger-full-access&lt;/span&gt;
&lt;span class="py"&gt;file_opener&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"vscode"&lt;/span&gt;           &lt;span class="c"&gt;# vscode | vscode-insiders | windsurf | cursor | none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the 80% config. Everything below this section either tightens the sandbox, swaps the provider, layers a profile, or wires in MCP/hooks/OTEL.&lt;/p&gt;

&lt;p&gt;A note on the model field: Codex's default refreshed to &lt;code&gt;gpt-5.4&lt;/code&gt; recently, and &lt;code&gt;gpt-5.5&lt;/code&gt; is currently surfaced through ChatGPT-login workflows in the TUI's composer. For API-key workflows the available IDs vary by provider; check &lt;code&gt;codex models&lt;/code&gt; (or your provider's catalog) before pinning a value. The Codex CLI ships a built-in catalog plus the optional &lt;code&gt;model_catalog_json&lt;/code&gt; key for loading your own JSON catalog on startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sandbox and approvals — get this pair right or nothing else matters
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sandbox_mode&lt;/code&gt; is what Codex is &lt;em&gt;technically allowed&lt;/em&gt; to touch. &lt;code&gt;approval_policy&lt;/code&gt; is when Codex &lt;em&gt;asks you first&lt;/em&gt;. They compose, and they default to safe-but-annoying.&lt;/p&gt;

&lt;h3&gt;
  
  
  sandbox_mode
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Filesystem&lt;/th&gt;
&lt;th&gt;Network&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read-only&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read everywhere, write nowhere&lt;/td&gt;
&lt;td&gt;Blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;workspace-write&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write inside cwd + &lt;code&gt;$TMPDIR&lt;/code&gt; + &lt;code&gt;/tmp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Blocked by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;danger-full-access&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Whatever your user can do&lt;/td&gt;
&lt;td&gt;Whatever your user can do&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most teams should sit in &lt;code&gt;workspace-write&lt;/code&gt;. The under-documented controls live under &lt;code&gt;[sandbox_workspace_write]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[sandbox_workspace_write]&lt;/span&gt;
&lt;span class="py"&gt;writable_roots&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"~/work/notes"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c"&gt;# extra dirs beyond cwd&lt;/span&gt;
&lt;span class="py"&gt;network_access&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;              &lt;span class="c"&gt;# allow outbound HTTP inside sandbox&lt;/span&gt;
&lt;span class="py"&gt;exclude_tmpdir_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;              &lt;span class="c"&gt;# drop $TMPDIR from writable set&lt;/span&gt;
&lt;span class="py"&gt;exclude_slash_tmp&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;              &lt;span class="c"&gt;# drop /tmp from writable set&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;network_access = true&lt;/code&gt; is the toggle people miss when their &lt;code&gt;pip install&lt;/code&gt; or &lt;code&gt;npm install&lt;/code&gt; mysteriously hangs.&lt;/p&gt;
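&lt;p&gt;To reason about which paths end up writable under &lt;code&gt;workspace-write&lt;/code&gt;, the Python sketch below models the rule from the table and the four keys above. It is a mental model with hypothetical function names, not the mechanism Codex uses to enforce the sandbox.&lt;/p&gt;

```python
import os
from pathlib import Path


def writable_roots(cwd, extra_roots=(), exclude_tmpdir_env_var=False, exclude_slash_tmp=False):
    """Collect the directories described above as writable in workspace-write."""
    roots = [Path(cwd)]
    roots.extend(Path(root).expanduser() for root in extra_roots)
    tmpdir = os.environ.get("TMPDIR")
    if tmpdir and not exclude_tmpdir_env_var:
        roots.append(Path(tmpdir))
    if not exclude_slash_tmp:
        roots.append(Path("/tmp"))
    # Resolve so that symlinked locations compare consistently.
    return [root.resolve() for root in roots]


def is_writable(path, roots):
    """True when the resolved path equals or sits under one of the roots."""
    target = Path(path).expanduser().resolve()
    return any(target == root or root in target.parents for root in roots)


roots = writable_roots("/srv/project", extra_roots=["/srv/notes"], exclude_tmpdir_env_var=True)
print(is_writable("/srv/project/src/app.py", roots))  # True
print(is_writable("/etc/hosts", roots))               # False
```

&lt;p&gt;Setting &lt;code&gt;exclude_slash_tmp&lt;/code&gt; or &lt;code&gt;exclude_tmpdir_env_var&lt;/code&gt; to true simply removes that entry from the list, leaving the working directory and any extra roots.&lt;/p&gt;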

&lt;h3&gt;
  
  
  approval_policy
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;untrusted&lt;/code&gt; asks before almost everything. &lt;code&gt;on-request&lt;/code&gt; asks when Codex hits something the sandbox blocks. &lt;code&gt;never&lt;/code&gt; is fully autonomous (and shouldn't be paired with &lt;code&gt;danger-full-access&lt;/code&gt; unless you really mean it).&lt;/p&gt;

&lt;p&gt;For finer control, use the table form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[approval_policy.granular]&lt;/span&gt;
&lt;span class="py"&gt;sandbox_approval&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# let Codex ask to escalate beyond sandbox&lt;/span&gt;
&lt;span class="py"&gt;request_permissions&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# let the request_permissions tool prompt&lt;/span&gt;
&lt;span class="py"&gt;rules&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# respect execpolicy prompt rules&lt;/span&gt;
&lt;span class="py"&gt;skill_approval&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# ask before running skill scripts&lt;/span&gt;
&lt;span class="py"&gt;mcp_elicitations&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# mute MCP-driven prompts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how you say "you may ask to escalate the sandbox, but stop asking me to confirm individual MCP elicitations." Most users won't need it, but for unattended runs in CI it matters.&lt;/p&gt;

&lt;p&gt;The companion key &lt;code&gt;approvals_reviewer&lt;/code&gt; selects who handles eligible prompts: &lt;code&gt;user&lt;/code&gt; (default) or &lt;code&gt;auto_review&lt;/code&gt; (which delegates to a configured reviewer agent).&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning, verbosity, and plan mode
&lt;/h2&gt;

&lt;p&gt;Four keys, all model-dependent. Use them with GPT-5 family models; older/non-reasoning models ignore them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"medium"&lt;/span&gt;  &lt;span class="c"&gt;# minimal | low | medium | high | xhigh&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_summary&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;    &lt;span class="c"&gt;# auto | concise | detailed | none&lt;/span&gt;
&lt;span class="py"&gt;model_verbosity&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"medium"&lt;/span&gt;  &lt;span class="c"&gt;# low | medium | high  (GPT-5 Responses API)&lt;/span&gt;
&lt;span class="py"&gt;plan_mode_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;    &lt;span class="c"&gt;# override applied only in /plan mode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;xhigh&lt;/code&gt; exists but burns tokens; reserve it for the worst plan-mode problems. &lt;code&gt;hide_agent_reasoning = true&lt;/code&gt; suppresses reasoning events in the TUI and &lt;code&gt;codex exec&lt;/code&gt; output without changing what the model actually computes — useful for screenshots, log piping, and pair-programming sessions where the unedited chain-of-thought is more distracting than helpful. &lt;code&gt;show_raw_agent_reasoning = true&lt;/code&gt; does the inverse: surface the raw reasoning content from the model when the provider exposes it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;model_supports_reasoning_summaries&lt;/code&gt; is a force-override (true/false) for whether Codex sends reasoning metadata at all. Leave it unset unless you're debugging a custom provider that lies about its capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Permissions profiles — the modern way to scope access
&lt;/h2&gt;

&lt;p&gt;The newer &lt;code&gt;[permissions.NAME]&lt;/code&gt; block is more expressive than &lt;code&gt;sandbox_workspace_write&lt;/code&gt; and is the way Codex is moving. You define named profiles (&lt;code&gt;:read-only&lt;/code&gt;, &lt;code&gt;:workspace&lt;/code&gt;, &lt;code&gt;:danger-full-access&lt;/code&gt; ship built-in) and select one with &lt;code&gt;default_permissions = "my-profile"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[permissions.scoped]&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.scoped.workspace_roots]&lt;/span&gt;
&lt;span class="py"&gt;"~/code/oss"&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;"~/code/clients"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.scoped.filesystem]&lt;/span&gt;
&lt;span class="py"&gt;glob_scan_max_depth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;".env"&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;span class="py"&gt;"**/.git/**"&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;span class="py"&gt;"~/.ssh/**"&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.scoped.filesystem.":workspace_roots"]&lt;/span&gt;
&lt;span class="py"&gt;"."&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"write"&lt;/span&gt;
&lt;span class="py"&gt;"**/*.env"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.scoped.network]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;mode&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"limited"&lt;/span&gt;     &lt;span class="c"&gt;# limited | full&lt;/span&gt;
&lt;span class="py"&gt;allow_local_binding&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.scoped.network.domains]&lt;/span&gt;
&lt;span class="py"&gt;"api.openai.com"&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"api.ofox.ai"&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"github.com"&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"*.internal.corp"&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The &lt;code&gt;:workspace_roots&lt;/code&gt; token is a special key that scopes the rules below it to any path declared in &lt;code&gt;workspace_roots&lt;/code&gt;. Without that scoping wrapper, &lt;code&gt;**/*.env = "deny"&lt;/code&gt; would apply globally.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;glob_scan_max_depth&lt;/code&gt; exists because expanding a deny glob like &lt;code&gt;**/secret.json&lt;/code&gt; across a giant repo is expensive — Codex caps it to keep startup fast.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;network.mode = "limited"&lt;/code&gt; plus an explicit domain allowlist is the production-grade setup. Combine with &lt;code&gt;dangerously_allow_non_loopback_proxy = false&lt;/code&gt; (the default) so the sandbox proxy only binds to loopback.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Network proxy — the feature flag most people skip
&lt;/h2&gt;

&lt;p&gt;If you ran into "but Codex can't &lt;code&gt;pip install&lt;/code&gt;", you probably want this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[features.network_proxy]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;proxy_url&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://127.0.0.1:3128"&lt;/span&gt;
&lt;span class="py"&gt;socks_url&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://127.0.0.1:8081"&lt;/span&gt;
&lt;span class="py"&gt;enable_socks5&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;enable_socks5_udp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;allow_local_binding&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;allow_upstream_proxy&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[features.network_proxy.domains]&lt;/span&gt;
&lt;span class="py"&gt;"pypi.org"&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"registry.npmjs.org"&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"github.com"&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"api.openai.com"&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the sandboxed subprocess an HTTP/SOCKS5 proxy with a domain allowlist, rather than the binary on/off of &lt;code&gt;sandbox_workspace_write.network_access&lt;/code&gt;. The &lt;code&gt;dangerously_*&lt;/code&gt; keys exist for niche bind/listener cases — leave them off unless you understand the failure mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP servers — the meatiest section
&lt;/h2&gt;

&lt;p&gt;MCP server configuration lives under &lt;code&gt;[mcp_servers.&amp;lt;id&amp;gt;]&lt;/code&gt;. The schema covers both stdio servers (&lt;code&gt;command&lt;/code&gt; + &lt;code&gt;args&lt;/code&gt;) and streamable HTTP servers (&lt;code&gt;url&lt;/code&gt; + headers).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mcp_servers.docs]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"uvx"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"mcp-server-docs"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;cwd&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/code/docs-server"&lt;/span&gt;
&lt;span class="py"&gt;env&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;DOCS_INDEX&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/.cache/docs.idx"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;startup_timeout_sec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;
&lt;span class="py"&gt;tool_timeout_sec&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;
&lt;span class="py"&gt;required&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# if true, Codex fails startup when this server can't init&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;enabled_tools&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"fetch_section"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c"&gt;# allowlist&lt;/span&gt;
&lt;span class="py"&gt;disabled_tools&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;                            &lt;span class="c"&gt;# denylist&lt;/span&gt;
&lt;span class="py"&gt;default_tools_approval_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;                &lt;span class="c"&gt;# auto | prompt | approve&lt;/span&gt;

&lt;span class="nn"&gt;[mcp_servers.github]&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://github-mcp.example.com/mcp"&lt;/span&gt;
&lt;span class="py"&gt;bearer_token_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GITHUB_TOKEN"&lt;/span&gt;
&lt;span class="py"&gt;http_headers&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;"X-Repo"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofoxai/blog"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;env_http_headers&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;"X-User"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GITHUB_USER"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c"&gt;# populated from env&lt;/span&gt;
&lt;span class="py"&gt;oauth_resource&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://github-mcp.example.com"&lt;/span&gt;
&lt;span class="py"&gt;scopes&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"issues"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[mcp_servers.github.tools.create_issue]&lt;/span&gt;
&lt;span class="py"&gt;approval_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"prompt"&lt;/span&gt;   &lt;span class="c"&gt;# per-tool override&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;startup_timeout_sec&lt;/code&gt; is 10 by default — bump it for slow Node MCP servers that lazy-load on first request. &lt;code&gt;tool_timeout_sec&lt;/code&gt; defaults to 60; long-running shell or database tools need more. &lt;code&gt;required = true&lt;/code&gt; is the right call for a server your workflow depends on; you'd rather fail at boot than discover it half a session later.&lt;/p&gt;
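&lt;p&gt;As a rough sketch of that advice, a slow-starting server your workflow depends on might be configured like this. The server id &lt;code&gt;reports&lt;/code&gt; and the package name are hypothetical placeholders, and the timeout values are illustrative rather than recommended:&lt;/p&gt;

```toml
[mcp_servers.reports]                                   # hypothetical server id
command             = "npx"
args                = ["-y", "example-reports-mcp"]     # hypothetical package name
startup_timeout_sec = 30     # raised from the 10-second default for a slow cold start
tool_timeout_sec    = 300    # raised from the 60-second default for long-running queries
required            = true   # fail at boot if this server cannot initialize
```

&lt;p&gt;Pick the timeouts from what you actually observe; setting them far above real startup time only delays the failure you asked &lt;code&gt;required = true&lt;/code&gt; to surface.&lt;/p&gt;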

&lt;p&gt;&lt;code&gt;default_tools_approval_mode&lt;/code&gt; and &lt;code&gt;tools.&amp;lt;name&amp;gt;.approval_mode&lt;/code&gt; are how you say "auto-approve &lt;code&gt;search&lt;/code&gt;, prompt me for &lt;code&gt;delete_branch&lt;/code&gt;" without writing custom approval hooks.&lt;/p&gt;
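&lt;p&gt;A minimal sketch of that exact split, reusing the &lt;code&gt;github&lt;/code&gt; server from the block above. It assumes &lt;code&gt;"auto"&lt;/code&gt; is the value that lets tools run without a prompt, and that the server exposes a tool named &lt;code&gt;delete_branch&lt;/code&gt;:&lt;/p&gt;

```toml
[mcp_servers.github]
url                         = "https://github-mcp.example.com/mcp"
default_tools_approval_mode = "auto"     # assumed: tools such as search run unprompted

[mcp_servers.github.tools.delete_branch]
approval_mode = "prompt"                 # per-tool override: always ask before this one
```

&lt;p&gt;The per-tool table only needs entries for the exceptions; every other tool on the server falls back to the server-level default.&lt;/p&gt;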

&lt;h2&gt;
  
  
  Model providers — custom endpoints, including ofox
&lt;/h2&gt;

&lt;p&gt;Built-in provider IDs (&lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;ollama&lt;/code&gt;, &lt;code&gt;lmstudio&lt;/code&gt;) are reserved. Everything else is a &lt;code&gt;[model_providers.&amp;lt;id&amp;gt;]&lt;/code&gt; block. For an ofox setup that routes Codex through one key across GPT/Claude/Gemini/DeepSeek/Qwen models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[model_providers.ofox]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.ofox.ai/v1"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OFOX_API_KEY"&lt;/span&gt;
&lt;span class="py"&gt;env_key_instructions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Get a key from https://ofox.ai/keys"&lt;/span&gt;
&lt;span class="py"&gt;requires_openai_auth&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;request_max_retries&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="py"&gt;stream_max_retries&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;stream_idle_timeout_ms&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then either flip the default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4"&lt;/span&gt;     &lt;span class="c"&gt;# or anthropic/claude-opus-4.6, google/gemini-3.1-pro-preview, etc.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…or scope it to a profile (next section). The full BYO walkthrough with auth-via-command, query-param-pinned Azure endpoints, and per-provider header injection is in &lt;a href="https://dev.to/blog/codex-cli-custom-model-providers-byo-setup/"&gt;How to Use Any Model with Codex CLI&lt;/a&gt;. The gateway rationale — why you'd want one provider entry that fans out to many models — is in the &lt;a href="https://dev.to/blog/ai-api-aggregation-access-every-model-one-endpoint/"&gt;AI API aggregation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A few keys worth flagging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;wire_api&lt;/code&gt; only accepts &lt;code&gt;"responses"&lt;/code&gt; as of Codex 0.59 (February 2026). The &lt;code&gt;"chat"&lt;/code&gt; value and the &lt;code&gt;/chat/completions&lt;/code&gt; path are gone — set it to &lt;code&gt;"responses"&lt;/code&gt; or omit the key (the default). Third-party gateways that want to keep working with Codex now need to surface a &lt;code&gt;/v1/responses&lt;/code&gt; endpoint; ofox.ai exposes one alongside &lt;code&gt;/v1/chat/completions&lt;/code&gt;, so the same &lt;code&gt;https://api.ofox.ai/v1&lt;/code&gt; base URL still routes to Codex correctly. Gateways without a &lt;code&gt;/responses&lt;/code&gt; endpoint need a local translator (community bridges exist) or a different client.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;requires_openai_auth = false&lt;/code&gt; removes Codex's assumption that the key prefix is &lt;code&gt;sk-&lt;/code&gt; — most non-OpenAI gateways need this explicitly. Leave it &lt;code&gt;true&lt;/code&gt; (the default) only when the proxy mirrors OpenAI auth exactly.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;[model_providers.&amp;lt;id&amp;gt;.auth]&lt;/code&gt; lets you run a command on a refresh schedule that returns a bearer token — for short-lived workforce tokens, sigv4-derived credentials, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this is your first move away from vanilla OpenAI auth, the SDK &lt;a href="https://dev.to/blog/openai-sdk-migration-to-ofoxai-guide-2026/"&gt;migration guide for OpenAI clients to ofox&lt;/a&gt; is the companion piece.&lt;/p&gt;

&lt;h2&gt;
  
  
  Profiles — layer presets on top of your base config
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;[profiles.NAME]&lt;/code&gt; is a flat overlay: any top-level key set inside the profile wins when you run &lt;code&gt;codex --profile NAME&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[profiles.fast]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4-mini"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"low"&lt;/span&gt;
&lt;span class="py"&gt;model_verbosity&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"low"&lt;/span&gt;
&lt;span class="py"&gt;approval_policy&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"never"&lt;/span&gt;
&lt;span class="py"&gt;sandbox_mode&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"workspace-write"&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.deep]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;
&lt;span class="py"&gt;plan_mode_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"xhigh"&lt;/span&gt;
&lt;span class="py"&gt;model_verbosity&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;
&lt;span class="py"&gt;approval_policy&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"on-request"&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.review]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"anthropic/claude-opus-4.6"&lt;/span&gt;   &lt;span class="c"&gt;# via ofox provider&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also the place to set &lt;code&gt;model_provider&lt;/code&gt; per profile so a &lt;code&gt;review&lt;/code&gt; profile can hit Anthropic-via-ofox while your default profile stays on OpenAI. Remember: &lt;code&gt;model_provider&lt;/code&gt; and &lt;code&gt;profile&lt;/code&gt; keys themselves are ignored in &lt;em&gt;project-local&lt;/em&gt; config — define them in &lt;code&gt;~/.codex/config.toml&lt;/code&gt;.&lt;/p&gt;
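&lt;p&gt;Put together, a user-level file might look roughly like the sketch below. It assumes the top-level &lt;code&gt;profile&lt;/code&gt; key names the profile applied when &lt;code&gt;--profile&lt;/code&gt; is omitted, and that a &lt;code&gt;[model_providers.ofox]&lt;/code&gt; block like the one shown earlier is defined in the same file:&lt;/p&gt;

```toml
# ~/.codex/config.toml (user-level; project-local config ignores these two keys)
model_provider = "openai"    # default sessions stay on the built-in provider
profile        = "fast"      # assumed: profile used when --profile is not passed

[profiles.fast]
model = "gpt-5.4-mini"

[profiles.review]
model_provider = "ofox"      # only this profile routes through the gateway
model          = "anthropic/claude-opus-4.6"
```

&lt;p&gt;With this in place, &lt;code&gt;codex --profile review&lt;/code&gt; would switch provider and model for that session only, leaving the default untouched.&lt;/p&gt;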

&lt;p&gt;For practical patterns — fast/deep/review profiles paired with shell aliases — see the &lt;a href="https://dev.to/blog/codex-cli-real-world-coding-workflow/"&gt;real-world Codex CLI workflow guide&lt;/a&gt;. The pricing tradeoffs behind picking fast/deep models live in the &lt;a href="https://dev.to/blog/codex-cli-api-configuration-guide-2026/"&gt;Codex CLI API configuration guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  History, TUI, and the file_opener
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[history]&lt;/span&gt;
&lt;span class="py"&gt;persistence&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"save-all"&lt;/span&gt;   &lt;span class="c"&gt;# save-all | none&lt;/span&gt;
&lt;span class="py"&gt;max_bytes&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5_242_880&lt;/span&gt;    &lt;span class="c"&gt;# 5 MB cap; drops oldest entries&lt;/span&gt;

&lt;span class="nn"&gt;[tui]&lt;/span&gt;
&lt;span class="py"&gt;animations&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;show_tooltips&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;notifications&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;notification_condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"unfocused"&lt;/span&gt;   &lt;span class="c"&gt;# unfocused | always&lt;/span&gt;
&lt;span class="py"&gt;notification_method&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;        &lt;span class="c"&gt;# auto | osc9 | bel&lt;/span&gt;
&lt;span class="py"&gt;theme&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"catppuccin-mocha"&lt;/span&gt;
&lt;span class="py"&gt;vim_mode_default&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;alternate_screen&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;        &lt;span class="c"&gt;# auto | always | never&lt;/span&gt;
&lt;span class="py"&gt;raw_output_mode&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;status_line&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"token-usage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"branch"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;terminal_title&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"spinner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"project"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tui.keymap.composer]&lt;/span&gt;
&lt;span class="py"&gt;submit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"enter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;newline&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"shift+enter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tui.notifications&lt;/code&gt; accepts either a boolean or an array of event types (&lt;code&gt;["new-message", "tool-output"]&lt;/code&gt;) for finer control. &lt;code&gt;alternate_screen = "never"&lt;/code&gt; is useful in tmux setups where the alternate screen swallows scrollback. &lt;code&gt;tui.theme&lt;/code&gt; accepts kebab-case theme names — &lt;code&gt;catppuccin-mocha&lt;/code&gt;, &lt;code&gt;gruvbox-dark&lt;/code&gt;, &lt;code&gt;solarized-light&lt;/code&gt;, and friends.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;file_opener&lt;/code&gt; controls the URI scheme Codex emits when citing files in output. The default is &lt;code&gt;vscode&lt;/code&gt;; &lt;code&gt;cursor&lt;/code&gt;, &lt;code&gt;windsurf&lt;/code&gt;, &lt;code&gt;vscode-insiders&lt;/code&gt;, and &lt;code&gt;none&lt;/code&gt; (plain paths, no clickable links) are the alternatives.&lt;/p&gt;
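&lt;p&gt;For a tmux-plus-Cursor setup, the options described above could be combined along these lines. This is only a sketch using the values named in this section; &lt;code&gt;file_opener&lt;/code&gt; is written as a top-level key, so it has to appear before any table header:&lt;/p&gt;

```toml
file_opener = "cursor"       # one of the alternatives to the vscode default

[tui]
notifications    = ["new-message", "tool-output"]   # array form: only these event types
alternate_screen = "never"                          # keep tmux scrollback intact
```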

&lt;h2&gt;
  
  
  shell_environment_policy — the leak you'll only notice in OTEL
&lt;/h2&gt;

&lt;p&gt;By default Codex inherits your full shell environment when it spawns subprocesses. That's convenient, until you realize every &lt;code&gt;AWS_*&lt;/code&gt;, &lt;code&gt;GITHUB_*&lt;/code&gt;, and &lt;code&gt;OPENAI_*&lt;/code&gt; variable in your env is reachable by every shell tool the model runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[shell_environment_policy]&lt;/span&gt;
&lt;span class="py"&gt;inherit&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"core"&lt;/span&gt;          &lt;span class="c"&gt;# all | core | none&lt;/span&gt;
&lt;span class="py"&gt;ignore_default_excludes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;           &lt;span class="c"&gt;# if false, KEY/SECRET/TOKEN names are stripped first&lt;/span&gt;
&lt;span class="py"&gt;include_only&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"PATH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"HOME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"TMPDIR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"LANG"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"LC_*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;exclude&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"AWS_*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"GITHUB_*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*_SECRET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;set&lt;/span&gt;                     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;"CI"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;"NO_COLOR"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;experimental_use_profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;inherit = "core"&lt;/code&gt; keeps a minimal POSIX-ish set and drops the rest. &lt;code&gt;ignore_default_excludes = false&lt;/code&gt; (the default) means anything with &lt;code&gt;KEY&lt;/code&gt;, &lt;code&gt;SECRET&lt;/code&gt;, or &lt;code&gt;TOKEN&lt;/code&gt; in the name is filtered before your custom &lt;code&gt;include_only&lt;/code&gt;/&lt;code&gt;exclude&lt;/code&gt; runs — leave that on.&lt;/p&gt;
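&lt;p&gt;If &lt;code&gt;"core"&lt;/code&gt; is still more than you want to expose, a stricter variant is to inherit nothing and set the few variables the tools need by hand. The sketch below uses only keys from the block above; the &lt;code&gt;PATH&lt;/code&gt; value is a hypothetical placeholder, and it assumes entries in &lt;code&gt;set&lt;/code&gt; are applied after the filters run:&lt;/p&gt;

```toml
[shell_environment_policy]
inherit                 = "none"    # start subprocesses from an empty environment
ignore_default_excludes = false     # keep the built-in KEY/SECRET/TOKEN filter on
set                     = { PATH = "/usr/bin:/bin", CI = "1" }   # hypothetical values
```

&lt;p&gt;The tradeoff is maintenance: any tool that expects &lt;code&gt;HOME&lt;/code&gt;, a locale variable, or a language-specific path will fail until you add it to &lt;code&gt;set&lt;/code&gt; explicitly.&lt;/p&gt;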

&lt;p&gt;&lt;code&gt;experimental_use_profile = true&lt;/code&gt; invokes your shell's user profile (&lt;code&gt;.zshrc&lt;/code&gt;, etc.) when spawning subprocesses. That helps when the model relies on aliases your profile defines, at the cost of slower startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature flags: the boolean grab-bag
&lt;/h2&gt;

&lt;p&gt;Most defaults are sensible. The ones worth knowing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;shell_tool&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# default tool for running commands&lt;/span&gt;
&lt;span class="py"&gt;hooks&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# lifecycle hooks (hooks.json or [hooks] block)&lt;/span&gt;
&lt;span class="py"&gt;codex_git_commit&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# let Codex make git commits attributed to "Codex"&lt;/span&gt;
&lt;span class="py"&gt;multi_agent&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# spawn_agents_on_csv &amp;amp; friends&lt;/span&gt;
&lt;span class="py"&gt;unified_exec&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# PTY-backed exec (off on Windows by default)&lt;/span&gt;
&lt;span class="py"&gt;shell_snapshot&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# snapshot env to speed up tool calls&lt;/span&gt;
&lt;span class="py"&gt;skill_mcp_dependency_install&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# prompt to install missing MCP deps&lt;/span&gt;
&lt;span class="py"&gt;fast_mode&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# service-tier picker in TUI&lt;/span&gt;
&lt;span class="py"&gt;network_proxy&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# see "Network proxy" section above&lt;/span&gt;
&lt;span class="py"&gt;prevent_idle_sleep&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# keep machine awake during active turn&lt;/span&gt;
&lt;span class="py"&gt;memories&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# opt into Memories&lt;/span&gt;
&lt;span class="py"&gt;undo&lt;/span&gt;                          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# opt into Undo&lt;/span&gt;
&lt;span class="py"&gt;personality&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c"&gt;# personality picker&lt;/span&gt;
&lt;span class="py"&gt;apps&lt;/span&gt;                          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c"&gt;# ChatGPT Apps/connectors support&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;codex_git_commit = true&lt;/code&gt; pairs with the top-level &lt;code&gt;commit_attribution&lt;/code&gt; string (default &lt;code&gt;"Codex &amp;lt;[email protected]&amp;gt;"&lt;/code&gt;) — set that to a meaningful identity before turning the feature on.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;memories = true&lt;/code&gt; activates the &lt;code&gt;[memories]&lt;/code&gt; block (thread eligibility, consolidation cadence, raw memory caps). Defaults are conservative: max age 30 days, min idle 6 hours, max 16 rollouts per startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks — lifecycle events without leaving config.toml
&lt;/h2&gt;

&lt;p&gt;You can keep hooks in a sidecar &lt;code&gt;hooks.json&lt;/code&gt;, or inline them under &lt;code&gt;[hooks]&lt;/code&gt;. Inline form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[hooks]&lt;/span&gt;

&lt;span class="nn"&gt;[[hooks.SessionStart]]&lt;/span&gt;
&lt;span class="py"&gt;matcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"*"&lt;/span&gt;

  &lt;span class="nn"&gt;[[hooks.SessionStart.hooks]]&lt;/span&gt;
  &lt;span class="py"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"command"&lt;/span&gt;
  &lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"echo 'session started' &amp;gt;&amp;gt; ~/.codex.log"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[[hooks.PreToolUse]]&lt;/span&gt;
&lt;span class="py"&gt;matcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Bash"&lt;/span&gt;

  &lt;span class="nn"&gt;[[hooks.PreToolUse.hooks]]&lt;/span&gt;
  &lt;span class="py"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"command"&lt;/span&gt;
  &lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"python3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"~/bin/codex_audit.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="py"&gt;commandWindows&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"C:/bin/codex_audit.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;[[hooks.&amp;lt;Event&amp;gt;]]&lt;/code&gt; entry pairs a &lt;code&gt;matcher&lt;/code&gt; with one or more handlers in its nested &lt;code&gt;hooks&lt;/code&gt; array, so handlers are grouped by event. The documented events are &lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;UserPromptSubmit&lt;/code&gt;, &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;PermissionRequest&lt;/code&gt;, &lt;code&gt;PreCompact&lt;/code&gt;, &lt;code&gt;PostCompact&lt;/code&gt;, &lt;code&gt;SubagentStart&lt;/code&gt;, &lt;code&gt;SubagentStop&lt;/code&gt;, and &lt;code&gt;Stop&lt;/code&gt;. Each hook entry has a &lt;code&gt;type&lt;/code&gt; plus the relevant fields: for command hooks, that's &lt;code&gt;command&lt;/code&gt; and the optional &lt;code&gt;commandWindows&lt;/code&gt; override for Windows shells.&lt;/p&gt;

&lt;p&gt;If your team needs to &lt;em&gt;force&lt;/em&gt; hooks across every developer machine, the &lt;code&gt;allow_managed_hooks_only = true&lt;/code&gt; flag in &lt;code&gt;requirements.toml&lt;/code&gt; (admin-distributed) makes user and project hooks no-ops, leaving only managed ones. The Claude Code equivalent — and a similar safety story — is covered in the &lt;a href="https://dev.to/blog/claude-code-hooks-subagents-skills-complete-guide-2026/"&gt;Claude Code hooks, subagents, and skills guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Telemetry: otel and analytics
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry support ships built-in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[otel]&lt;/span&gt;
&lt;span class="py"&gt;environment&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"prod"&lt;/span&gt;
&lt;span class="py"&gt;log_user_prompt&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;                   &lt;span class="c"&gt;# opt in to exporting raw prompts&lt;/span&gt;
&lt;span class="py"&gt;exporter&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"otlp-http"&lt;/span&gt;             &lt;span class="c"&gt;# none | otlp-http | otlp-grpc&lt;/span&gt;
&lt;span class="py"&gt;trace_exporter&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"otlp-grpc"&lt;/span&gt;
&lt;span class="py"&gt;metrics_exporter&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"statsig"&lt;/span&gt;               &lt;span class="c"&gt;# none | statsig | otlp-http | otlp-grpc&lt;/span&gt;

&lt;span class="nn"&gt;[otel.exporter."otlp-http"]&lt;/span&gt;
&lt;span class="py"&gt;endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://collector.example.com/v1/logs"&lt;/span&gt;
&lt;span class="py"&gt;protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"binary"&lt;/span&gt;                         &lt;span class="c"&gt;# binary | json&lt;/span&gt;
&lt;span class="py"&gt;headers&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;"x-api-key"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${OTEL_KEY}"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nn"&gt;[otel.exporter."otlp-http".tls]&lt;/span&gt;
&lt;span class="py"&gt;ca-certificate&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/certs/ca.pem"&lt;/span&gt;
&lt;span class="py"&gt;client-certificate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/certs/client.pem"&lt;/span&gt;
&lt;span class="py"&gt;client-private-key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/certs/client.key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;otel.*&lt;/code&gt; keys are user-level only (ignored in project config). &lt;code&gt;log_user_prompt = false&lt;/code&gt; is the safe default — flip it only when you've sanitized your collector pipeline.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;analytics.enabled = true/false&lt;/code&gt; controls the OpenAI-side analytics opt-in. &lt;code&gt;feedback.enabled = true&lt;/code&gt; keeps the &lt;code&gt;/feedback&lt;/code&gt; TUI command available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Projects, trust, and the AGENTS.md story
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;project_root_markers&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"pyproject.toml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Cargo.toml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"pnpm-workspace.yaml"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;project_doc_fallback_filenames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"AGENTS.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"CLAUDE.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"CONTRIBUTING.md"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;project_doc_max_bytes&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32_768&lt;/span&gt;

&lt;span class="nn"&gt;[projects."/Users/me/code/risky-repo"]&lt;/span&gt;
&lt;span class="py"&gt;trust_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"untrusted"&lt;/span&gt;

&lt;span class="nn"&gt;[projects."/Users/me/code/oss-i-maintain"]&lt;/span&gt;
&lt;span class="py"&gt;trust_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"trusted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;project_doc_fallback_filenames&lt;/code&gt; is how you get Codex to read &lt;code&gt;CLAUDE.md&lt;/code&gt; (or your team's equivalent) when there's no &lt;code&gt;AGENTS.md&lt;/code&gt;. &lt;code&gt;model_instructions_file&lt;/code&gt; is the heavier hammer: a path to a file that &lt;em&gt;replaces&lt;/em&gt; the built-in instructions entirely, not just augments them.&lt;/p&gt;
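
&lt;p&gt;As a rough mental model of that lookup, the sketch below prefers &lt;code&gt;AGENTS.md&lt;/code&gt; and otherwise returns the first configured fallback that exists in the directory. The function name &lt;code&gt;pick_project_doc&lt;/code&gt; and the exact precedence are assumptions for illustration, not Codex's actual implementation.&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical helper: mirrors the fallback idea described above,
# not the real Codex lookup code.
def pick_project_doc(directory, fallbacks, primary="AGENTS.md"):
    """Return the first existing doc file name, or None if none exist."""
    root = Path(directory)
    # The primary name is tried first, then the fallbacks in listed order.
    for name in [primary, *fallbacks]:
        if (root / name).is_file():
            return name
    return None
```

&lt;p&gt;Calling &lt;code&gt;pick_project_doc(".", ["CLAUDE.md", "CONTRIBUTING.md"])&lt;/code&gt; in a repo root is a quick way to predict which file would probably be picked up under this assumed order.&lt;/p&gt;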

&lt;p&gt;Trust level interacts with the approval and sandbox machinery — &lt;code&gt;untrusted&lt;/code&gt; projects get more conservative defaults regardless of your global settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  A complete, layered config you can adapt
&lt;/h2&gt;

&lt;p&gt;Here's a realistic &lt;code&gt;~/.codex/config.toml&lt;/code&gt; that combines everything above. Read it as a menu, not a recipe — most teams should delete two-thirds of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ----- Core -----&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4"&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;
&lt;span class="py"&gt;approval_policy&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"on-request"&lt;/span&gt;
&lt;span class="py"&gt;sandbox_mode&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"workspace-write"&lt;/span&gt;
&lt;span class="py"&gt;default_permissions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;":workspace"&lt;/span&gt;
&lt;span class="py"&gt;file_opener&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"vscode"&lt;/span&gt;
&lt;span class="py"&gt;personality&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pragmatic"&lt;/span&gt;
&lt;span class="py"&gt;service_tier&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"flex"&lt;/span&gt;

&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"medium"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_summary&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;
&lt;span class="py"&gt;model_verbosity&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"medium"&lt;/span&gt;
&lt;span class="py"&gt;plan_mode_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;

&lt;span class="py"&gt;hide_agent_reasoning&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;check_for_update_on_startup&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;web_search&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cached"&lt;/span&gt;
&lt;span class="py"&gt;commit_attribution&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Codex (ofox) &amp;lt;[email protected]&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# ----- Providers -----&lt;/span&gt;
&lt;span class="nn"&gt;[model_providers.ofox]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ofox"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.ofox.ai/v1"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OFOX_API_KEY"&lt;/span&gt;
&lt;span class="py"&gt;requires_openai_auth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# ----- Sandbox -----&lt;/span&gt;
&lt;span class="nn"&gt;[sandbox_workspace_write]&lt;/span&gt;
&lt;span class="py"&gt;network_access&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;writable_roots&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"~/work/scratch"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.tight]&lt;/span&gt;
&lt;span class="nn"&gt;[permissions.tight.workspace_roots]&lt;/span&gt;
&lt;span class="py"&gt;"~/code"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.tight.filesystem]&lt;/span&gt;
&lt;span class="py"&gt;glob_scan_max_depth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;".env"&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;span class="py"&gt;"**/.git/**"&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;span class="py"&gt;"~/.ssh/**"&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.tight.network]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;mode&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"limited"&lt;/span&gt;

&lt;span class="nn"&gt;[permissions.tight.network.domains]&lt;/span&gt;
&lt;span class="py"&gt;"api.ofox.ai"&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"github.com"&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"registry.npmjs.org"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
&lt;span class="py"&gt;"pypi.org"&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;

&lt;span class="c"&gt;# ----- Env hygiene -----&lt;/span&gt;
&lt;span class="nn"&gt;[shell_environment_policy]&lt;/span&gt;
&lt;span class="py"&gt;inherit&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"core"&lt;/span&gt;
&lt;span class="py"&gt;include_only&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"PATH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"HOME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"TMPDIR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"LANG"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"LC_*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"OFOX_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;exclude&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"AWS_*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"GITHUB_*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*_SECRET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;set&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;NO_COLOR&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# ----- History &amp;amp; TUI -----&lt;/span&gt;
&lt;span class="nn"&gt;[history]&lt;/span&gt;
&lt;span class="py"&gt;persistence&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"save-all"&lt;/span&gt;
&lt;span class="py"&gt;max_bytes&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10_485_760&lt;/span&gt;

&lt;span class="nn"&gt;[tui]&lt;/span&gt;
&lt;span class="py"&gt;animations&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;notifications&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;notification_condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"unfocused"&lt;/span&gt;
&lt;span class="py"&gt;theme&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"catppuccin-mocha"&lt;/span&gt;
&lt;span class="py"&gt;status_line&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"token-usage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"branch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"approval"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c"&gt;# ----- MCP -----&lt;/span&gt;
&lt;span class="nn"&gt;[mcp_servers.fs]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"uvx"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"mcp-server-filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"~/code"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;startup_timeout_sec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="py"&gt;default_tools_approval_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"auto"&lt;/span&gt;

&lt;span class="nn"&gt;[mcp_servers.docs]&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://docs-mcp.your.team/mcp"&lt;/span&gt;
&lt;span class="py"&gt;bearer_token_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"DOCS_MCP_TOKEN"&lt;/span&gt;

&lt;span class="c"&gt;# ----- Profiles -----&lt;/span&gt;
&lt;span class="nn"&gt;[profiles.fast]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4-mini"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"low"&lt;/span&gt;
&lt;span class="py"&gt;approval_policy&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"never"&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.deep]&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;
&lt;span class="py"&gt;plan_mode_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"xhigh"&lt;/span&gt;

&lt;span class="nn"&gt;[profiles.review]&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"anthropic/claude-opus-4.6"&lt;/span&gt;
&lt;span class="py"&gt;model_reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"high"&lt;/span&gt;

&lt;span class="c"&gt;# ----- Telemetry (opt in) -----&lt;/span&gt;
&lt;span class="nn"&gt;[analytics]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[otel]&lt;/span&gt;
&lt;span class="py"&gt;environment&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"dev"&lt;/span&gt;
&lt;span class="py"&gt;metrics_exporter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"none"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it in, run &lt;code&gt;codex --profile fast&lt;/code&gt;, and you have a sandboxed, network-allowlisted, env-scrubbed setup that hits ofox for budget runs. Run &lt;code&gt;codex --profile review&lt;/code&gt; instead to switch to Anthropic-via-ofox for review passes.&lt;/p&gt;
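
&lt;p&gt;To get a feel for what the &lt;code&gt;[shell_environment_policy]&lt;/code&gt; globs in that config would keep, here is a minimal sketch using Python's &lt;code&gt;fnmatch&lt;/code&gt;. The helper names and the order of application (exclude first, then the allowlist) are assumptions; the order Codex really uses is not covered here, and &lt;code&gt;inherit&lt;/code&gt; and &lt;code&gt;set&lt;/code&gt; are left out on purpose.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Patterns copied from the [shell_environment_policy] table above.
INCLUDE_ONLY = ["PATH", "HOME", "TMPDIR", "LANG", "LC_*", "OFOX_API_KEY"]
EXCLUDE = ["AWS_*", "GITHUB_*", "*_SECRET", "*_TOKEN"]

def matches_any(name, patterns):
    """True if the variable name matches at least one glob pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)

# Assumed order: drop excluded names first, then keep only allowlisted ones.
def filter_env(env, include_only=INCLUDE_ONLY, exclude=EXCLUDE):
    """Return a copy of env containing only the variables that survive."""
    kept = {}
    for name, value in env.items():
        if matches_any(name, exclude):
            continue
        if include_only and not matches_any(name, include_only):
            continue
        kept[name] = value
    return kept
```

&lt;p&gt;Running &lt;code&gt;sorted(filter_env(dict(os.environ)))&lt;/code&gt; (after &lt;code&gt;import os&lt;/code&gt;) previews which variable names would survive under these assumptions.&lt;/p&gt;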

&lt;h2&gt;
  
  
  Gotchas that bite people in week two
&lt;/h2&gt;

&lt;p&gt;A short list, all real, all painful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;model_provider&lt;/code&gt; set in a project-local &lt;code&gt;.codex/config.toml&lt;/code&gt;&lt;/strong&gt; is silently ignored. Move it to &lt;code&gt;~/.codex/config.toml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;network_access = false&lt;/code&gt; plus a tool that needs the network&lt;/strong&gt;. Hangs with no clear error; switch to &lt;code&gt;[features.network_proxy]&lt;/code&gt; + a domain allowlist instead.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;approval_policy = "never"&lt;/code&gt; plus &lt;code&gt;sandbox_mode = "danger-full-access"&lt;/code&gt;&lt;/strong&gt; — there is no safety net, the model can &lt;code&gt;rm -rf $HOME&lt;/code&gt;. The &lt;a href="https://dev.to/blog/claude-code-safety-prevent-accidental-file-deletion/"&gt;Claude Code safety guide&lt;/a&gt; has the same warning for the Claude side; same lesson applies.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;startup_timeout_sec&lt;/code&gt; defaulting to 10&lt;/strong&gt;. Slow MCP servers fail to register and Codex silently drops them; bump to 30 for Node-based servers that lazy-load.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;hide_agent_reasoning = true&lt;/code&gt; paired with debugging an agent loop&lt;/strong&gt; — you'll waste an hour wondering why the model "did nothing" when it actually spent 4k tokens thinking off-screen.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;shell_environment_policy.inherit = "all"&lt;/code&gt;&lt;/strong&gt; (the default) leaks your full env to every tool call. The fix is five lines of config, and the audit finding it prevents is enormous.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;[permissions.NAME.filesystem]&lt;/code&gt; glob patterns at the top level apply globally&lt;/strong&gt;. Scope them under &lt;code&gt;":workspace_roots"&lt;/code&gt; if you only mean "inside the workspace."&lt;/li&gt;
&lt;/ol&gt;
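
&lt;p&gt;Several of these can be caught mechanically. The sketch below is a hypothetical lint over an already-parsed config dictionary (on Python 3.11+ it could come from &lt;code&gt;tomllib.load&lt;/code&gt;). The function name &lt;code&gt;find_gotchas&lt;/code&gt; and the warning strings are made up for illustration, and it only covers the items in the list that reduce to a simple key check.&lt;/p&gt;

```python
# Hypothetical lint over an already-parsed config dict; the checks
# mirror items from the list above and are not part of Codex itself.
def find_gotchas(config):
    """Return human-readable warnings for risky setting combinations."""
    warnings = []
    if (config.get("approval_policy") == "never"
            and config.get("sandbox_mode") == "danger-full-access"):
        warnings.append("approval_policy=never with danger-full-access")
    if config.get("hide_agent_reasoning") is True:
        warnings.append("hide_agent_reasoning=true hides agent loop output")
    policy = config.get("shell_environment_policy", {})
    # The list above describes "all" as the default, so a missing key counts.
    if policy.get("inherit", "all") == "all":
        warnings.append("shell_environment_policy.inherit=all leaks full env")
    for name, server in config.get("mcp_servers", {}).items():
        if "startup_timeout_sec" not in server:
            warnings.append("mcp_servers." + name + " uses default startup timeout")
    return warnings
```

&lt;p&gt;An empty list means none of these particular combinations were found. It says nothing about the gotchas that depend on runtime behavior, such as a tool hanging without network access.&lt;/p&gt;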

&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;p&gt;If you're picking a model to slot into &lt;code&gt;model = "..."&lt;/code&gt;, the &lt;a href="https://dev.to/blog/best-llm-for-coding-ranked-real-use-2026/"&gt;best LLM for coding ranked by real use&lt;/a&gt; compares the realistic options. If you're weighing Codex CLI against the alternatives, the &lt;a href="https://dev.to/blog/claude-code-vs-codex-cli-vs-cursor-vs-deepseek-tui-2026/"&gt;Claude Code vs Codex CLI vs Cursor vs DeepSeek TUI comparison&lt;/a&gt; is the head-to-head. For BYO model providers with weird auth shapes, the &lt;a href="https://dev.to/blog/codex-cli-custom-model-providers-byo-setup/"&gt;custom OAI-compatible provider setup guide&lt;/a&gt; is the most detailed.&lt;/p&gt;

&lt;p&gt;The most useful thing in this file isn't a setting — it's the realization that Codex CLI's sandbox is a default-on, default-narrow safety net, and most "why doesn't this work" tickets are someone fighting that net instead of configuring it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/codex-cli-config-toml-deep-dive/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codex</category>
      <category>tutorial</category>
      <category>config</category>
    </item>
    <item>
      <title>How to Use Any OAI-Compatible API with GitHub Copilot — Custom Model Setup Guide</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Wed, 27 May 2026 02:54:11 +0000</pubDate>
      <link>https://dev.to/owen_fox/how-to-use-any-oai-compatible-api-with-github-copilot-custom-model-setup-guide-4h13</link>
      <guid>https://dev.to/owen_fox/how-to-use-any-oai-compatible-api-with-github-copilot-custom-model-setup-guide-4h13</guid>
      <description>&lt;h1&gt;
  
  
  How to Use Any OAI-Compatible API with GitHub Copilot — Custom Model Setup Guide
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — GitHub Copilot now lets you point Chat (VS Code) and the Copilot CLI at any OpenAI-compatible endpoint. In VS Code, run &lt;code&gt;Chat: Manage Language Models&lt;/code&gt;, pick the &lt;strong&gt;OpenAI Compatible&lt;/strong&gt; provider, paste a base URL plus key. In Copilot CLI, export &lt;code&gt;COPILOT_PROVIDER_BASE_URL&lt;/code&gt;, &lt;code&gt;COPILOT_PROVIDER_API_KEY&lt;/code&gt;, and &lt;code&gt;COPILOT_MODEL&lt;/code&gt;. Inline completions are unaffected — they still run on Copilot's own infra.&lt;/p&gt;

&lt;p&gt;You don't need to leave Copilot to escape Copilot's model menu. Twenty seconds of env vars and your &lt;code&gt;copilot&lt;/code&gt; CLI is talking to Claude Opus 4.6, GPT-5.4, or a local vLLM box — billed by the provider, not your Copilot quota.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BYOK actually does in Copilot
&lt;/h2&gt;

&lt;p&gt;BYOK (Bring Your Own Key) lets the Chat surface and the agent CLI use a model you authenticate to directly, instead of going through GitHub's hosted model pool. The wiring is narrow on purpose:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;BYOK supported?&lt;/th&gt;
&lt;th&gt;Billing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VS Code Chat / Agent mode&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Your provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot CLI&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Your provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inline code completions&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Copilot subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pull request summaries, code review&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Copilot subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The split exists because inline completions run on very tight latency budgets that arbitrary endpoints can't promise. Chat and agents tolerate the round trip, so they got opened up first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup in VS Code
&lt;/h2&gt;

&lt;p&gt;The path was &lt;a href="https://code.visualstudio.com/blogs/2025/10/22/bring-your-own-key" rel="noopener noreferrer"&gt;announced in October 2025&lt;/a&gt; and has since landed in the stable channel for several providers (GA was &lt;a href="https://github.blog/changelog/2026-04-22-bring-your-own-language-model-key-in-vs-code-now-available/" rel="noopener noreferrer"&gt;confirmed in the April 2026 GitHub changelog&lt;/a&gt;). For the generic OpenAI-compatible flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Open the Command Palette → &lt;strong&gt;Chat: Manage Language Models&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Pick &lt;strong&gt;OpenAI Compatible&lt;/strong&gt; from the provider list.&lt;/li&gt;
&lt;li&gt; Fill in the &lt;strong&gt;Base URL&lt;/strong&gt; (must serve &lt;code&gt;/chat/completions&lt;/code&gt;), the &lt;strong&gt;API key&lt;/strong&gt;, and a &lt;strong&gt;Model ID&lt;/strong&gt; that the provider exposes.&lt;/li&gt;
&lt;li&gt; Hit &lt;strong&gt;Add Model&lt;/strong&gt;. The model now appears in the Copilot Chat model dropdown.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are two JSON shapes worth knowing about. The legacy &lt;code&gt;github.copilot.chat.customOAIModels&lt;/code&gt; object in &lt;code&gt;settings.json&lt;/code&gt; still works in stable releases but is marked deprecated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"github.copilot.chat.customOAIModels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"anthropic/claude-opus-4.6"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Opus 4.6 (via ofox)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.ofox.ai/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"toolCalling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"maxInputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"maxOutputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The replacement (currently Insiders-only) is the &lt;code&gt;chatLanguageModels.json&lt;/code&gt; workspace file using the &lt;code&gt;customendpoint&lt;/code&gt; vendor. Note the array shape and the &lt;code&gt;apiType&lt;/code&gt; selector, which picks between OpenAI's &lt;code&gt;chat-completions&lt;/code&gt;, OpenAI's &lt;code&gt;responses&lt;/code&gt;, and Anthropic's &lt;code&gt;messages&lt;/code&gt; protocols:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ofox.ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vendor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customendpoint"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${OFOX_API_KEY}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-opus-4.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Opus 4.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.ofox.ai/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"toolCalling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"vision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"maxInputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"maxOutputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capability flags (&lt;code&gt;toolCalling&lt;/code&gt;, &lt;code&gt;vision&lt;/code&gt;) matter. If the agent thinks the model doesn't support tools, it silently falls back to plain chat and your custom commands never fire.&lt;/p&gt;
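
&lt;p&gt;One way to catch a forgotten flag before it bites is a small check over the parsed model entries. This is a sketch that assumes entries shaped like the &lt;code&gt;models&lt;/code&gt; array above; &lt;code&gt;models_missing_flags&lt;/code&gt; is a made-up helper, not something VS Code or Copilot ships.&lt;/p&gt;

```python
# Flag names taken from the JSON examples above.
CAPABILITY_FLAGS = ("toolCalling", "vision")

# Hypothetical helper for entries shaped like the "models" array.
def models_missing_flags(models):
    """Map each model id to the capability flags it does not declare."""
    report = {}
    for model in models:
        missing = [flag for flag in CAPABILITY_FLAGS if flag not in model]
        if missing:
            report[model.get("id", "unknown")] = missing
    return report
```

&lt;p&gt;An empty result means every entry states both flags explicitly, which is arguably better than relying on whatever the editor assumes when a flag is absent.&lt;/p&gt;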

&lt;h2&gt;
  
  
  Setup in Copilot CLI
&lt;/h2&gt;

&lt;p&gt;The CLI's &lt;a href="https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/use-byok-models" rel="noopener noreferrer"&gt;BYOK docs&lt;/a&gt; are the cleanest reference. Three environment variables, exported before launching &lt;code&gt;copilot&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COPILOT_PROVIDER_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.ofox.ai/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COPILOT_PROVIDER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OFOX_API_KEY&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COPILOT_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic/claude-opus-4.6

copilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a local Ollama box, drop the key entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COPILOT_PROVIDER_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COPILOT_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:14b
copilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI speaks the OpenAI Chat Completions protocol to whatever endpoint you point it at. If &lt;code&gt;/v1/chat/completions&lt;/code&gt; resolves and the model ID is valid on that endpoint, it works.&lt;/p&gt;
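
&lt;p&gt;A small preflight check can save a confusing first launch. The sketch below uses the variable names from the export lines above; treating the API key as optional follows the local Ollama example and is an assumption, as is the helper name &lt;code&gt;missing_byok_vars&lt;/code&gt;.&lt;/p&gt;

```python
import os

# Names taken from the export lines above; treating the API key as
# optional follows the local Ollama example and is an assumption.
REQUIRED_VARS = ("COPILOT_PROVIDER_BASE_URL", "COPILOT_MODEL")
OPTIONAL_VARS = ("COPILOT_PROVIDER_API_KEY",)

def missing_byok_vars(env=None):
    """Return required BYOK variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

&lt;p&gt;Calling &lt;code&gt;missing_byok_vars()&lt;/code&gt; with no arguments inspects the current process environment, so it can run in a wrapper script right before &lt;code&gt;copilot&lt;/code&gt; starts.&lt;/p&gt;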

&lt;h2&gt;
  
  
  Worked example: ofox.ai as the endpoint
&lt;/h2&gt;

&lt;p&gt;ofox.ai is a gateway that exposes Anthropic, Google, Alibaba and Moonshot models behind the OpenAI Chat Completions schema — useful for Copilot BYOK because you get Claude or Gemini in the Chat dropdown without juggling three SDKs. The base URL is &lt;code&gt;https://api.ofox.ai/v1&lt;/code&gt; and the auth header is a standard &lt;code&gt;Authorization: Bearer &amp;lt;key&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A typical model ID set to expose to Copilot:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID (use as &lt;code&gt;COPILOT_MODEL&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openai/gpt-5.4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPT-5.4 (general-purpose OpenAI tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;anthropic/claude-opus-4.6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-3.1-pro-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bailian/qwen3-max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Max&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;moonshotai/kimi-k2.6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Smoke test before pointing Copilot at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.ofox.ai/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OFOX_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"anthropic/claude-opus-4.6","messages":[{"role":"user","content":"ping"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that returns a &lt;code&gt;choices[0].message.content&lt;/code&gt;, Copilot will connect. If it 404s on the model ID, fix the ID first — Copilot surfaces those errors as a generic "model unavailable" toast that masks the real cause. For deeper debugging of mismatched IDs and 404s, see &lt;a href="https://ofox.ai/blog/openai-api-model-not-found-errors-troubleshooting/" rel="noopener noreferrer"&gt;Model Not Found errors troubleshooting&lt;/a&gt;.&lt;/p&gt;
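&lt;p&gt;The same pass/fail check can be scripted against a saved response body. This is a minimal sketch, assuming the endpoint returns the standard Chat Completions shape; &lt;code&gt;extract_reply&lt;/code&gt; and the sample payload are hypothetical names used only for illustration, not part of Copilot or ofox.&lt;/p&gt;

```python
import json


def extract_reply(body: str) -> str:
    """Return choices[0].message.content, or raise ValueError with the reason."""
    payload = json.loads(body)
    if "error" in payload:
        # OpenAI-style error bodies carry the real cause, e.g. an unknown model ID
        raise ValueError(f"endpoint returned an error: {payload['error']}")
    try:
        return payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("response is not in Chat Completions shape") from exc


# Hypothetical saved response from the curl smoke test
sample = '{"choices": [{"message": {"role": "assistant", "content": "pong"}}]}'
print(extract_reply(sample))
```

&lt;p&gt;Reading the error body yourself is the point: it shows the cause that the generic "model unavailable" toast hides.&lt;/p&gt;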

&lt;p&gt;For broader background on the gateway pattern — one key, many providers — see the &lt;a href="https://ofox.ai/blog/openai-sdk-migration-to-ofoxai-guide-2026/" rel="noopener noreferrer"&gt;OpenAI SDK migration guide&lt;/a&gt; and the pillar overview &lt;a href="https://ofox.ai/blog/ai-api-aggregation-access-every-model-one-endpoint/" rel="noopener noreferrer"&gt;AI API aggregation: every model behind one endpoint&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats worth knowing before you commit
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Authentication is static credentials only.&lt;/strong&gt; BYOK accepts an API key or bearer token. There's no OAuth handshake, no service-account flow, no key rotation hook. Treat the key like any other long-lived secret — scope it, rotate it manually, and don't put it in a public repo.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Telemetry still flows to GitHub.&lt;/strong&gt; BYOK changes where the inference happens, not where the usage telemetry goes. Enterprise admins who needed a model migration for &lt;em&gt;compliance&lt;/em&gt; reasons should re-read the data-handling docs before assuming BYOK is sufficient.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rate limits become yours to manage.&lt;/strong&gt; Copilot's quota stops protecting you; if your provider rate-limits you, the Chat panel will just stall. Watch your provider dashboard for the first week.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Code completions remain on Copilot.&lt;/strong&gt; Repeating this because it's the #1 misunderstanding: BYOK does not replace the inline ghost-text completions. Those still hit GitHub's hosted models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparing the IDE custom-API options
&lt;/h2&gt;

&lt;p&gt;If you're choosing between Copilot BYOK and the equivalent feature in other editors, the surface area looks similar but the agent capabilities don't. The &lt;a href="https://ofox.ai/blog/cursor-claude-code-cline-custom-api-setup-2026/" rel="noopener noreferrer"&gt;Cursor / Claude Code / Cline custom API setup guide&lt;/a&gt; walks the same exercise for those three. Short version: Copilot's BYOK is the cleanest in-editor flow (it's a UI form), Claude Code gives you the most agent power per dollar when paired with &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;, and Cursor sits in between.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Failed to fetch model list"&lt;/strong&gt; — Your base URL is missing &lt;code&gt;/v1&lt;/code&gt; or your endpoint doesn't serve a &lt;code&gt;GET /models&lt;/code&gt; route. The OpenAI-Compatible provider probes &lt;code&gt;/models&lt;/code&gt; to populate the dropdown. If your gateway doesn't expose it, type the model ID manually in the form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat hangs after first turn&lt;/strong&gt; — Tool calling is enabled in Copilot but the model isn't returning the expected &lt;code&gt;tool_calls&lt;/code&gt; payload shape. Either flip &lt;code&gt;toolCalling: false&lt;/code&gt; in your &lt;code&gt;customOAIModels&lt;/code&gt; entry, or switch to a model that fully implements the OpenAI tools spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI says "context length exceeded" early&lt;/strong&gt; — &lt;code&gt;COPILOT_MODEL&lt;/code&gt; is set to an alias your provider remaps to a smaller-context variant. Use the canonical model ID from the provider's docs, not a shorthand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision attachments silently dropped&lt;/strong&gt; — Set &lt;code&gt;vision: true&lt;/code&gt; on the model entry in &lt;code&gt;settings.json&lt;/code&gt;. Without that flag, Copilot strips image parts from the multimodal payload before sending.&lt;/p&gt;

&lt;p&gt;The interesting thing about Copilot BYOK isn't that it lets you switch models — it's that it lets you switch &lt;em&gt;vendors&lt;/em&gt; without leaving the editor. Copilot becomes a thin chat shell; the intelligence is rented from whoever's winning this month.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/github-copilot-byok-oai-compatible-api-setup/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>githubcopilot</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini 3.5 Pro: Release Date, Expected Specs, and What Flash Already Tells Us</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Tue, 26 May 2026 14:43:03 +0000</pubDate>
      <link>https://dev.to/owen_fox/gemini-35-pro-release-date-expected-specs-and-what-flash-already-tells-us-36c</link>
      <guid>https://dev.to/owen_fox/gemini-35-pro-release-date-expected-specs-and-what-flash-already-tells-us-36c</guid>
      <description>&lt;h1&gt;
  
  
  Gemini 3.5 Pro: Release Date, Expected Specs, and What Flash Already Tells Us
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Google announced its flagship at I/O 2026 and then told the audience to wait a month for it. The Flash model that shipped instead is already outscoring the previous-generation Pro on coding benchmarks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Gemini 3.5 Pro was supposed to headline Google I/O 2026 on May 19. It didn't. Sundar Pichai told the audience "give us until next month" — meaning June 2026, no committed date. What did ship is Gemini 3.5 Flash, and its benchmarks are the most useful data we have for forecasting Pro: Flash already beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), and Finance Agent v2 (57.9% vs 43.0%). If Pro extends the same gap over Flash, Google is shipping a coding-and-agents flagship in June that will force a real rethink against Claude Opus 4.7 and GPT-5.5. This piece is the realistic read on dates, pricing, capabilities, and how to prepare your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google actually announced
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Pro was named, put to work internally, and pushed back. Sundar Pichai's exact phrasing in the I/O keynote: &lt;em&gt;"We're also hard at work on 3.5 Pro. It's already being used internally, and we look forward to rolling it out next month."&lt;/em&gt; That's the entire official statement. No spec sheet, no benchmark card, no API preview, no pricing tier.&lt;/p&gt;

&lt;p&gt;The delay drew audible groans from the live audience — Business Insider's reporter on the floor caught it — because everything else in the keynote (Spark, Antigravity 2, Search AI Mode) was framed around the Pro tier that wasn't there. (&lt;a href="https://letsdatascience.com/news/google-delays-gemini-35-pro-releases-35-flash-at-io-0dcab6cd" rel="noopener noreferrer"&gt;Let's Data Science&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;What we got instead was the &lt;a href="https://ofox.ai/blog/gemini-3-5-flash-coding-agents-guide-2026/" rel="noopener noreferrer"&gt;Gemini 3.5 Flash launch&lt;/a&gt; — $1.50/M input, $9.00/M output, 1M context window, 4x output token throughput vs comparable frontier models, GA day-of on Gemini API, AI Studio, Vertex, &lt;a href="https://ofox.ai/blog/google-antigravity-2-explained-gemini-desktop-agent-platform-2026/" rel="noopener noreferrer"&gt;Antigravity 2&lt;/a&gt;, and the Gemini app. Flash is the working artifact. Pro is the artifact we have to reason about from indirect evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "next month" probably means
&lt;/h2&gt;

&lt;p&gt;Google's I/O timing tells you the window even if the date is open. The I/O keynote was May 19, 2026. "Next month" gives a range of June 1 to June 30. Two priors narrow it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;3.5 is reversing the launch order — but Pichai has capped the wait.&lt;/strong&gt; Historically Pro shipped &lt;em&gt;first&lt;/em&gt;: Gemini 3 Pro on November 18, 2025 with 3 Flash following on December 17; Gemini 3.1 Pro on February 19, 2026 with the 3.1 Flash family rolling out afterward. With 3.5 Flash leading at I/O, there is no clean prior for a Flash → Pro gap. What we do have is Pichai's "next month" commitment from the keynote, which caps the wait at June 30.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Google's quarterly cadence.&lt;/strong&gt; Pro tiers historically ship before the end of a fiscal quarter when announced, partly for board optics. June 30 is the Q2 end. Expect a drop in the last full week of June — best guess June 22-26 — unless safety or serving capacity slips it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What could push it out:&lt;/strong&gt; additional Frontier Safety Framework evaluation (Google has telegraphed this process for every 3.x flagship), TPU serving capacity if Spark and the new agent platform are eating capacity, or a benchmark embargo with a paper drop. None of these would push past July.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Flash already tells us about Pro
&lt;/h2&gt;

&lt;p&gt;This is the actual analytical work, and it's the only honest way to forecast a model that hasn't shipped. Three things Flash makes legible.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The generation jump is real, not incremental
&lt;/h3&gt;

&lt;p&gt;Gemini 3.5 Flash beats Gemini 3.1 Pro on the benchmarks Google itself prioritized. From Google's published &lt;a href="https://deepmind.google/models/model-cards/gemini-3-5-flash/" rel="noopener noreferrer"&gt;Gemini 3.5 Flash model card&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;th&gt;3.5 Flash&lt;/th&gt;
&lt;th&gt;3.1 Pro&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.1&lt;/td&gt;
&lt;td&gt;Real terminal coding tasks&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;76.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70.3%&lt;/td&gt;
&lt;td&gt;+5.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Atlas&lt;/td&gt;
&lt;td&gt;Scaled tool-use reliability&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;78.2%&lt;/td&gt;
&lt;td&gt;+5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance Agent v2&lt;/td&gt;
&lt;td&gt;Multi-step financial workflows&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;57.9%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;43.0%&lt;/td&gt;
&lt;td&gt;+14.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDPval-AA (Elo)&lt;/td&gt;
&lt;td&gt;Economic-value task suite&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1656&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1314&lt;/td&gt;
&lt;td&gt;+342&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CharXiv Reasoning&lt;/td&gt;
&lt;td&gt;Chart/figure understanding&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;84.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;83.3%&lt;/td&gt;
&lt;td&gt;+0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A Flash beating a Pro on coding-and-agentic work has not happened before in this family. The implication: the 3.5 generation isn't just a quality bump, it's a re-architecture for agentic loops specifically. Pro should extend the trend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A useful crude forecast:&lt;/strong&gt; if 3.1 Pro → 3.5 Pro mirrors the gap between 3.1 Flash and 3.5 Flash (roughly +6-15 points on agentic benchmarks), Gemini 3.5 Pro lands at ~82-85% Terminal-Bench, ~88-90% MCP Atlas, and well into 70+ on Finance Agent v2. That's flagship territory against Claude Opus 4.7 and GPT-5.5, which trade leadership across this benchmark set depending on the task. Compare with &lt;a href="https://ofox.ai/blog/gpt-5-5-api-vs-claude-opus-gemini-3-1-flagship-2026/" rel="noopener noreferrer"&gt;the current flagship benchmark showdown&lt;/a&gt; — the picture changes meaningfully if Pro hits the upper end of this range.&lt;/p&gt;
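&lt;p&gt;One reading that reproduces the low end of those ranges is to assume 3.5 Pro leads 3.5 Flash by the same margin that 3.5 Flash leads 3.1 Pro in the table above. That is an assumption made for illustration, not a published Google figure. A rough sketch:&lt;/p&gt;

```python
# Scores from the benchmark table: (3.5 Flash, 3.1 Pro), in percent
scores = {
    "Terminal-Bench 2.1": (76.2, 70.3),
    "MCP Atlas": (83.6, 78.2),
    "Finance Agent v2": (57.9, 43.0),
}


def project_pro(flash: float, prior_pro: float) -> float:
    """Assumption: 3.5 Pro leads 3.5 Flash by the margin Flash leads 3.1 Pro."""
    return round(min(flash + (flash - prior_pro), 100.0), 1)


forecast = {name: project_pro(f, p) for name, (f, p) in scores.items()}
for name, value in forecast.items():
    print(f"{name}: ~{value}%")
```

&lt;p&gt;Under that assumption the projection comes out at about 82% on Terminal-Bench 2.1, 89% on MCP Atlas, and 73% on Finance Agent v2. Treat it as a sanity check on the ranges, not a prediction.&lt;/p&gt;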

&lt;h3&gt;
  
  
  2. Pricing has a floor and a ceiling
&lt;/h3&gt;

&lt;p&gt;Flash launched at $1.50/M input, $9.00/M output. Gemini 3.1 Pro sits at $2.00/M input, $12.00/M output. That's an unusual layout — Flash is now 25% cheaper than the prior-gen Pro while being benchmark-superior on coding. The new Pro has to be priced &lt;em&gt;higher&lt;/em&gt; than 3.1 Pro to make commercial sense, but it can't go too high without making the Flash + Pro combo less attractive than &lt;a href="https://ofox.ai/blog/ai-api-aggregation-access-every-model-one-endpoint/" rel="noopener noreferrer"&gt;bundling DeepSeek V4 Pro and Gemini Flash through a single endpoint&lt;/a&gt; for cost-shaping.&lt;/p&gt;

&lt;p&gt;Realistic price band for 3.5 Pro:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Floor:&lt;/strong&gt; $2.50/M input, $15/M output (a 25% premium over 3.1 Pro, mirroring the 3.5 Flash premium over 3.1 Flash)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Ceiling:&lt;/strong&gt; $3.50/M input, $20/M output (the upper bound before it starts overlapping with Anthropic and OpenAI flagship pricing and losing its differentiation)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Most likely:&lt;/strong&gt; $3.00/M input, $18/M output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For comparison: GPT-5.5 is roughly $5/$30, Claude Opus 4.7 is $5/$25. Even at the high end of the band Gemini 3.5 Pro stays meaningfully cheaper for output-heavy workloads — which is most agentic loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The 1M context window is staying
&lt;/h3&gt;

&lt;p&gt;Gemini 3.5 Flash kept the 1,048,576 input / 65,536 output token window. No evidence Google is reducing context in the 3.5 generation. Pro almost certainly keeps or expands this — long context remains a Gemini selling point alongside Claude Opus 4.7 (200k default, 1M on the dedicated long-context variant) and GPT-5.5 (1M via the standard API, 400k inside Codex), and Google's &lt;a href="https://ofox.ai/blog/google-antigravity-2-explained-gemini-desktop-agent-platform-2026/" rel="noopener noreferrer"&gt;Project Mariner and Antigravity 2 product story&lt;/a&gt; both depend on it. If anything, expect 3.5 Pro to push to 2M context as a marketing point.&lt;/p&gt;

&lt;p&gt;The biggest open question about 3.5 Pro is recall quality at 128k+. 3.5 Flash actually &lt;em&gt;regressed&lt;/em&gt; on MRCR v2 at 128k (77.3% vs 3.1 Pro's 84.9%), a 7.6-point drop. If Pro inherits that regression, the "1M context" claim becomes weaker in practice for real long-document retrieval, and you'd still want 3.1 Pro for those workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Pro probably won't be
&lt;/h2&gt;

&lt;p&gt;A grounded forecast also needs to say what &lt;em&gt;isn't&lt;/em&gt; changing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;It's not getting a step-change in modalities.&lt;/strong&gt; 3.5 Flash already takes text, images, video, audio, and PDFs in and emits text. Pro almost certainly matches but doesn't extend this set on day one. Native image-out lives in Nano Banana / Imagen, not the main Gemini chat tier.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It's not going to drop below Flash's prices.&lt;/strong&gt; Google needs a Pro tier with margins. The whole point of having Flash + Pro is price discrimination across workload sensitivity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It's not shipping with a public model card before launch day.&lt;/strong&gt; Google's pattern has been simultaneous model card + GA. Don't expect benchmark leaks; expect a Tuesday morning launch with the deck ready.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The naming probably stays "Gemini 3.5 Pro."&lt;/strong&gt; There's been no signal pointing toward a rename, and Google has been more naming-disciplined than OpenAI in the 3.x generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to prepare today
&lt;/h2&gt;

&lt;p&gt;If you're shipping anything that will rely on Gemini 3.5 Pro the moment it lands, the practical prep is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Build against 3.5 Flash now.&lt;/strong&gt; The API surface and tool-use shape are the same between Flash and Pro tiers in this generation. Through ofox the model id is &lt;code&gt;google/gemini-3.5-flash&lt;/code&gt;. When Pro ships, swap to &lt;code&gt;google/gemini-3.5-pro&lt;/code&gt; — no SDK or schema rewrite. The &lt;a href="https://ofox.ai/blog/openai-sdk-migration-to-ofoxai-guide-2026/" rel="noopener noreferrer"&gt;ofox OpenAI-compatible endpoint&lt;/a&gt; handles request translation either way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
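&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from openai import OpenAI

# ofox speaks the OpenAI Chat Completions schema, so the stock SDK works
client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key=os.environ["OFOX_API_KEY"],
)

# Today: Flash. The day Pro ships, change this one string
# to "google/gemini-3.5-pro".
MODEL = "google/gemini-3.5-flash"

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)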
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Use Flash as the floor in routing.&lt;/strong&gt; A common pattern: route trivial work to Flash, escalate to a flagship (Opus 4.7, GPT-5.5, or soon Gemini 3.5 Pro) only when Flash returns low confidence. See the &lt;a href="https://ofox.ai/blog/claude-code-hybrid-routing-pattern-2026/" rel="noopener noreferrer"&gt;Claude Code hybrid routing pattern&lt;/a&gt; for the production-grade version of this. When 3.5 Pro lands, you swap &lt;em&gt;which&lt;/em&gt; flagship sits behind the escalation gate — your routing logic doesn't change.&lt;/p&gt;
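&lt;p&gt;The escalation gate itself is small. This is a minimal sketch, not the production pattern from the linked guide: the &lt;code&gt;Answer&lt;/code&gt; type, the &lt;code&gt;confidence&lt;/code&gt; score, and the 0.7 threshold are all hypothetical, and the two model functions stand in for real API calls.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical score in [0, 1]; how you derive it is up to you


# A model call is any function from prompt to Answer (stand-in for a real API client)
ModelCall = Callable[[str], Answer]


def route(prompt: str, cheap: ModelCall, flagship: ModelCall, threshold: float = 0.7) -> Answer:
    """Try the cheap model first; escalate only when its confidence is below threshold."""
    first = cheap(prompt)
    if first.confidence >= threshold:
        return first
    return flagship(prompt)


# Stub models for illustration only
def flash(prompt: str) -> Answer:
    return Answer("flash draft", 0.4)


def pro(prompt: str) -> Answer:
    return Answer("flagship answer", 0.9)


print(route("refactor this module", flash, pro).text)
```

&lt;p&gt;Because &lt;code&gt;route&lt;/code&gt; only sees two callables, swapping the flagship means passing a different function; the gate does not change.&lt;/p&gt;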

&lt;p&gt;&lt;strong&gt;3. Don't pre-commit to pricing.&lt;/strong&gt; The realistic range above ($2.50-$3.50 input, $15-$20 output) is informed speculation. If you're writing a cost projection for finance, plug in both endpoints of the band and ship two scenarios.&lt;/p&gt;
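&lt;p&gt;A two-scenario projection is a few lines of arithmetic. The sketch below uses the speculative band from this article; the monthly token volumes are a hypothetical workload, so substitute your own.&lt;/p&gt;

```python
# Speculative 3.5 Pro price band from this article: (input, output) USD per million tokens
BAND = {"floor": (2.50, 15.00), "ceiling": (3.50, 20.00)}


def monthly_cost(input_tokens: int, output_tokens: int, price_in: float, price_out: float) -> float:
    """Monthly spend in USD for the given token volumes and per-million rates."""
    return round(input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out, 2)


# Hypothetical workload: 500M input and 100M output tokens per month
workload = (500_000_000, 100_000_000)
scenarios = {name: monthly_cost(*workload, *prices) for name, prices in BAND.items()}
print(scenarios)
```

&lt;p&gt;For this example workload the two scenarios are $2,750 and $3,750 a month, which is the spread finance should see.&lt;/p&gt;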

&lt;p&gt;&lt;strong&gt;4. Watch for the model card on Google's blog.&lt;/strong&gt; That's how every 3.x model has launched — a single blog post on blog.google with the full benchmark grid. No staged rollouts, no Twitter teasers from product managers. Subscribe to the Gemini API &lt;a href="https://ai.google.dev/gemini-api/docs/changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt; if you want push notification of the moment Pro becomes addressable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture for model selection
&lt;/h2&gt;

&lt;p&gt;By late June 2026 three things are happening at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; still holds reasoning-heavy benchmarks, especially long-horizon agent runs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GPT-5.5&lt;/strong&gt; owns raw multimodal reasoning and the deepest ecosystem of tooling.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Gemini 3.5 Pro&lt;/strong&gt; — if Flash's gains carry forward — undercuts both on price and pushes them on Terminal-Bench-style agentic coding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Picking the right model gets harder before it gets easier. The &lt;a href="https://ofox.ai/blog/llm-api-selection-decision-matrix-mid-2026-english/" rel="noopener noreferrer"&gt;LLM API selection decision matrix&lt;/a&gt; and &lt;a href="https://ofox.ai/blog/llm-leaderboard-best-ai-models-ranked-2026/" rel="noopener noreferrer"&gt;the leaderboard view&lt;/a&gt; will both need a rewrite the week Pro ships. For the canonical three-way framing of how to choose, see the &lt;a href="https://ofox.ai/blog/claude-vs-gpt-vs-gemini-model-comparison-guide-2026/" rel="noopener noreferrer"&gt;Claude vs GPT vs Gemini comparison guide&lt;/a&gt; — that's the piece that'll get the biggest update.&lt;/p&gt;

&lt;p&gt;If you're choosing today and the work is coding-and-agents leaning, Gemini 3.5 Flash already beats last-generation Pro at 25% lower cost. There's no reason to wait. If the work is reasoning-heavy or you care about long-context recall quality, stay on Gemini 3.1 Pro or Claude Opus 4.7 for now and re-evaluate when Pro lands. The thing not to do is sit on your hands assuming the new shiny thing will solve a problem you could solve this week.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Six weeks of public Flash benchmarks have arrived before the Pro model card exists — and they say the cost-quality frontier in coding is about to shift on a Tuesday in late June.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Sources and citations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Sundar Pichai's "next month" quote and the I/O 2026 keynote framing: &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/" rel="noopener noreferrer"&gt;Google's official blog&lt;/a&gt; (May 19, 2026)&lt;/li&gt;
&lt;li&gt;  Gemini 3.5 Flash benchmark grid: &lt;a href="https://deepmind.google/models/model-cards/gemini-3-5-flash/" rel="noopener noreferrer"&gt;DeepMind model card&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Flash pricing and context window: &lt;a href="https://ai.google.dev/gemini-api/docs/changelog" rel="noopener noreferrer"&gt;Google AI for Developers — Gemini API changelog&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  The delay framing and audience reaction: &lt;a href="https://letsdatascience.com/news/google-delays-gemini-35-pro-releases-35-flash-at-io-0dcab6cd" rel="noopener noreferrer"&gt;Let's Data Science I/O recap&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Comparative pricing for Claude Opus 4.7 and GPT-5.5: confirmed against ofox model catalog at the time of writing&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/gemini-3-5-pro-release-date-expected-specs-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>modelcomparison</category>
      <category>industryanalysis</category>
    </item>
    <item>
      <title>AI API Pricing Comparison May 2026: Every Major Model in One Table</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Tue, 26 May 2026 04:06:52 +0000</pubDate>
      <link>https://dev.to/owen_fox/ai-api-pricing-comparison-may-2026-every-major-model-in-one-table-112g</link>
      <guid>https://dev.to/owen_fox/ai-api-pricing-comparison-may-2026-every-major-model-in-one-table-112g</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: The frontier-model price gap in May 2026 is more than &lt;strong&gt;100x on output tokens&lt;/strong&gt; — GPT-5.5 charges $30/M, Claude Haiku 4.5 charges $5/M, and DeepSeek V4 Flash charges $0.28/M for a 1M-context model that benchmarks above GPT-4o. The sticker price almost never tells you what you'll actually pay: caching, batching, and long-context surcharges swing real costs by 50–90% in either direction. This table compiles every major API's verified May 2026 price, with the discount math built in.&lt;/p&gt;

&lt;p&gt;The cheapest API in 2026 isn't just cheaper than the most expensive one: it's eighty-eight times cheaper on a real coding task and hundreds of times cheaper on raw sticker price. That's not a rounding error; it's an architecture decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in this table
&lt;/h2&gt;

&lt;p&gt;Nine providers, twenty-three models, verified from each vendor's pricing page in late May 2026. Prices are in USD per million tokens (M = 1,000,000). Output is always more expensive than input: most models charge 4–6x, but the multiplier runs from 2x (DeepSeek V4, Grok 4.3 and 4.20) to about 8x (Gemini 2.5 Pro and Flash). GPT-5.5 sits at 6x.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/M&lt;/th&gt;
&lt;th&gt;Output $/M&lt;/th&gt;
&lt;th&gt;Cached input $/M&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT-5.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Flagship reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT-5.5 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$180.00&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Hardest-tasks tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT-5.4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;272K&lt;/td&gt;
&lt;td&gt;Previous flagship; 1M extended on opt-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT-5.4 Mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.75&lt;/td&gt;
&lt;td&gt;$4.50&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;td&gt;Mid-tier workhorse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT-5.4 Nano&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Cheapest OpenAI option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic Claude Opus 4.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Best agent reliability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic Claude Sonnet 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Best price/perf in flagship class&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic Claude Haiku 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Cheapest US-lab frontier model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini 3.1 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$12.00&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;$4.00/$18.00 above 200K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini 2.5 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;td&gt;$0.125&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;$2.50/$15.00 above 200K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini 2.5 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Cheap multimodal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini 2.5 Flash-Lite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Cheapest Google option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;xAI Grok 4.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Output-cheap reasoner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;xAI Grok 4.20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;Speed + tool-calling tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;xAI Grok 4.1 Fast&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;Budget agentic tool-caller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.0028&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Best $/quality globally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.435&lt;/td&gt;
&lt;td&gt;$0.87&lt;/td&gt;
&lt;td&gt;$0.0036&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Flagship (75% off until 2026-05-31)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Moonshot Kimi K2.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.95&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;td&gt;Strong coding model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Moonshot Kimi K2.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;td&gt;Cheaper sibling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alibaba Qwen3-Max&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.20&lt;/td&gt;
&lt;td&gt;$6.00&lt;/td&gt;
&lt;td&gt;$0.24&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;td&gt;Tiered: 2x above 32K input, 2.5x above 128K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mistral Large 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;td&gt;Aggressive EU pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Meta Llama 4 Maverick&lt;/strong&gt; (via DeepInfra)&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Cheap open-weight large&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Meta Llama 4 Scout&lt;/strong&gt; (via DeepInfra)&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;10M native (DeepInfra caps at 320K)&lt;/td&gt;
&lt;td&gt;Cheapest tier overall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things the sticker doesn't show. Those come next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The discount math behind the sticker
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt caching cuts repeated-prefix input by ~90%
&lt;/h3&gt;

&lt;p&gt;Every major US-lab model offers some form of prompt caching now. The shape is the same: a long system prompt or document gets cached, and subsequent reads charge a fraction of the input rate. &lt;strong&gt;Anthropic and OpenAI both cut cached input by 90%.&lt;/strong&gt; DeepSeek's cache hit on V4 is 98% off ($0.0028/M vs $0.14/M miss). Gemini caches at ~10% of base.&lt;/p&gt;

&lt;p&gt;The catch: caching only helps when the &lt;em&gt;same prefix&lt;/em&gt; repeats across many requests inside a short TTL window (typically 5 minutes on Anthropic, longer on others). A chatbot serving many users with shared system prompts: huge win. A coding agent rewriting context every turn: zero help.&lt;/p&gt;

&lt;p&gt;Worked example. You're running a customer-support bot with a 4K-token system prompt and 1K-token user turns, serving 100 messages an hour. On Claude Sonnet 4.6:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Without caching: 100 × 5K × $3/M = &lt;strong&gt;$1.50/hr input&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  With caching (system prompt cached): 100 × 4K × $0.30/M + 100 × 1K × $3/M = $0.12 + $0.30 = &lt;strong&gt;$0.42/hr input&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A 72% cut on a workload most teams already run.&lt;/p&gt;
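&lt;p&gt;The arithmetic above can be reproduced with a short sketch. It assumes every request hits the cache and ignores any cache-write surcharge, so treat it as a best-case estimate; the function name and parameters are illustrative only.&lt;/p&gt;

```python
def hourly_input_cost(messages, system_tokens, user_tokens, input_rate, cached_rate=None):
    """Input cost in dollars for one hour of traffic; rates are dollars per million tokens."""
    # Without caching, the system prompt is billed at the normal input rate.
    system_rate = input_rate if cached_rate is None else cached_rate
    per_message = system_tokens * system_rate + user_tokens * input_rate
    return messages * per_message / 1_000_000


# Numbers from the worked example: 100 messages/hr, 4K system prompt, 1K user turn.
uncached = hourly_input_cost(100, 4_000, 1_000, input_rate=3.00)
cached = hourly_input_cost(100, 4_000, 1_000, input_rate=3.00, cached_rate=0.30)
saving = 1 - cached / uncached

print(f"uncached ${uncached:.2f}/hr, cached ${cached:.2f}/hr, saving {saving:.0%}")
```

&lt;p&gt;Swapping in your own prompt sizes shows quickly how the saving shrinks as the uncached user turn grows relative to the cached prefix.&lt;/p&gt;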

&lt;h3&gt;
  
  
  Batch API cuts everything by 50%
&lt;/h3&gt;

&lt;p&gt;If you can wait 24 hours, every major provider gives you exactly 50% off. Anthropic, OpenAI, Google, Mistral — all the same. For offline jobs (overnight document processing, dataset labeling, summary generation on yesterday's data) this is free money. Most production traffic can't use it because users want answers in seconds, not tomorrow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-context surcharges on Gemini
&lt;/h3&gt;

&lt;p&gt;Google is the only major provider charging a long-context premium. Above 200K tokens, both Gemini 2.5 Pro and Gemini 3.1 Pro roughly double their input price and add ~50% to output. Anthropic, which also offers 1M-context Claude models, charges flat across the full context.&lt;/p&gt;

&lt;p&gt;If your typical request stays below 200K tokens, this is moot. If you're feeding entire codebases or 500-page PDFs, the headline Gemini price understates the real input cost by roughly a factor of two.&lt;/p&gt;
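&lt;p&gt;A rough sketch of how a long-context surcharge changes the bill. The base rates below are placeholders rather than quoted prices, and the sketch assumes the higher rate applies to the whole request once the prompt crosses the threshold; the 2x input and 1.5x output multipliers follow the approximate figures above.&lt;/p&gt;

```python
def long_context_cost(input_tokens, output_tokens, in_rate, out_rate,
                      threshold=200_000, in_multiplier=2.0, out_multiplier=1.5):
    """Dollar cost of one request; rates are dollars per million tokens."""
    # Assumption: crossing the threshold reprices the entire request.
    if input_tokens > threshold:
        in_rate *= in_multiplier
        out_rate *= out_multiplier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


BASE_IN, BASE_OUT = 2.00, 12.00  # placeholder rates, not taken from any price list

short = long_context_cost(100_000, 8_000, BASE_IN, BASE_OUT)
long_ = long_context_cost(500_000, 8_000, BASE_IN, BASE_OUT)
sticker = (500_000 * BASE_IN + 8_000 * BASE_OUT) / 1_000_000

print(f"100K-token request: ${short:.3f}")
print(f"500K-token request: ${long_:.3f} (sticker math says ${sticker:.3f})")
```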

&lt;h2&gt;
  
  
  What it actually costs to do real work
&lt;/h2&gt;

&lt;p&gt;Sticker prices in isolation are useless. Here's what one realistic workload costs across the lineup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario: a coding agent processing one task end-to-end.&lt;/strong&gt; Roughly 40K input tokens (context + retrieved code + tool results) and 8K output tokens (reasoning + final code). One task is roughly one minute of human-developer-equivalent work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost per task&lt;/th&gt;
&lt;th&gt;Tasks per $1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$0.44&lt;/td&gt;
&lt;td&gt;2.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;2.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;$0.18&lt;/td&gt;
&lt;td&gt;5.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$0.24&lt;/td&gt;
&lt;td&gt;4.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;td&gt;$0.22&lt;/td&gt;
&lt;td&gt;4.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-Max&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;12.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4.3&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4 Nano&lt;/td&gt;
&lt;td&gt;$0.02&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;$0.008&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;125&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;$0.005&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;200&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ratio between cheapest and most expensive at this workload is 88x. That gap, run a million times, is the difference between a $5,000 month and a $440,000 month.&lt;/p&gt;
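&lt;p&gt;The per-task figures come from simple token arithmetic, sketched here for two models whose rates are quoted in this feed (Claude Opus 4.7 at $5/$25 and DeepSeek V4 Flash at $0.14/$0.28 per million tokens). The function name is illustrative, and the table's values are rounded, so computed results land close to it rather than exactly on it.&lt;/p&gt;

```python
def cost_per_task(in_rate, out_rate, input_tokens=40_000, output_tokens=8_000):
    """Dollar cost of one task; rates are dollars per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# (input rate, output rate) in dollars per million tokens.
rates = {
    "Claude Opus 4.7": (5.00, 25.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

for model, (in_rate, out_rate) in rates.items():
    cost = cost_per_task(in_rate, out_rate)
    print(f"{model}: ${cost:.4f} per task, {1 / cost:.1f} tasks per $1")
```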

&lt;h2&gt;
  
  
  How to pick a tier
&lt;/h2&gt;

&lt;p&gt;A simple decision tree that holds up across most teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you need the absolute best reasoning on the hardest 5% of tasks?&lt;/strong&gt; GPT-5.5 Pro or Claude Opus 4.7. Pay the premium, don't try to be clever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you need frontier quality on routine work?&lt;/strong&gt; Claude Sonnet 4.6 or Gemini 3.1 Pro. Sonnet wins on agent reliability; Gemini wins on multimodal and 1M context recall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you on a budget but need US-lab quality?&lt;/strong&gt; Claude Haiku 4.5 or GPT-5.4 Mini. Both punch above their price tag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you cost-sensitive and OK with open-weight quality?&lt;/strong&gt; DeepSeek V4 Flash is the answer for most teams — 1M context at $0.14/$0.28. Llama 4 Scout if you can route through DeepInfra and don't need vision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you doing offline / batch work?&lt;/strong&gt; Pick anything and submit it through the provider's batch API for 50% off. The model choice matters less than turning batch on.&lt;/p&gt;

&lt;p&gt;This is the same logic our &lt;a href="https://ofox.ai/blog/llm-api-selection-decision-matrix-mid-2026-english/" rel="noopener noreferrer"&gt;LLM API selection decision matrix&lt;/a&gt; lays out by use case if you want a longer breakdown.&lt;/p&gt;
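&lt;p&gt;For teams that prefer the decision tree as code, here is a minimal sketch. The flag names are hypothetical, and the questions are checked in the same order as above; batch is treated as an independent switch rather than a tier.&lt;/p&gt;

```python
def pick_tier(hardest_tasks=False, frontier_routine=False,
              us_lab_budget=False, offline=False):
    """Return candidate models plus whether to route through a batch API."""
    if hardest_tasks:
        models = ["GPT-5.5 Pro", "Claude Opus 4.7"]
    elif frontier_routine:
        models = ["Claude Sonnet 4.6", "Gemini 3.1 Pro"]
    elif us_lab_budget:
        models = ["Claude Haiku 4.5", "GPT-5.4 Mini"]
    else:
        # Cost-sensitive default: open-weight models.
        models = ["DeepSeek V4 Flash", "Llama 4 Scout"]
    return {"models": models, "batch": offline}


print(pick_tier(frontier_routine=True))
print(pick_tier(offline=True))
```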

&lt;h2&gt;
  
  
  What this table doesn't show
&lt;/h2&gt;

&lt;p&gt;Four caveats worth knowing before you route off these numbers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Rate limits matter more than price for many teams.&lt;/strong&gt; A $0.30/M model you can't get capacity on at peak is more expensive than a $5/M model you can. OpenAI and Anthropic have the most generous tiers; the cheaper Chinese models often gate hard on enterprise quotas.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Quality is not flat within a price band.&lt;/strong&gt; Claude Sonnet 4.6 and Gemini 3.1 Pro are priced similarly but win on different tasks. Sonnet leads on multi-turn agent reliability; Gemini leads on 1M+ token recall and image input. There's no substitute for running your eval on both.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Provider markup is real.&lt;/strong&gt; Going through a reseller adds 5–20% in most cases. We break down &lt;a href="https://ofox.ai/blog/openrouter-pricing-hidden-markup-breakdown-2026/" rel="noopener noreferrer"&gt;OpenRouter's actual margin&lt;/a&gt; versus first-party APIs in a separate piece — short version: it's higher than they advertise once you account for routing costs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;First-party Claude pricing matches ofox pricing.&lt;/strong&gt; Anthropic does not let resellers undercut; the only saving is from removing the need for multiple billing relationships. That logic applies to all the big labs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The aggregator question
&lt;/h2&gt;

&lt;p&gt;You can pay nine providers separately, manage nine API keys, and reconcile nine invoices. Or you can route everything through one OpenAI-compatible endpoint. &lt;a href="https://ofox.ai" rel="noopener noreferrer"&gt;ofox.ai&lt;/a&gt; is the aggregator we run — one key for every model in this table, OpenAI-compatible SDK, and prices that match each provider's first-party rate for flagship models with up to 70% off on open-weight ones. We're &lt;a href="https://ofox.ai/blog/openrouter-alternatives-2026/" rel="noopener noreferrer"&gt;not the only option&lt;/a&gt;, but the math is similar across aggregators: the value is in not maintaining nine integrations, not in saving 2% on token cost.&lt;/p&gt;

&lt;p&gt;For a deeper read on flagship-level differences, &lt;a href="https://ofox.ai/blog/claude-vs-gpt-vs-gemini-model-comparison-guide-2026/" rel="noopener noreferrer"&gt;Claude vs GPT vs Gemini&lt;/a&gt; is the pillar piece this article links into. For first-party tier breakdowns specifically: &lt;a href="https://ofox.ai/blog/claude-api-pricing-complete-breakdown-2026/" rel="noopener noreferrer"&gt;Claude API pricing breakdown&lt;/a&gt;, &lt;a href="https://ofox.ai/blog/gemini-3-1-pro-api-pricing-performance-guide-2026/" rel="noopener noreferrer"&gt;Gemini 3.1 Pro pricing&lt;/a&gt;, &lt;a href="https://ofox.ai/blog/gpt-5-4-pro-api-guide-pricing-setup-2026/" rel="noopener noreferrer"&gt;GPT-5.4 Pro pricing&lt;/a&gt;, &lt;a href="https://ofox.ai/blog/deepseek-api-pricing-guide-2026/" rel="noopener noreferrer"&gt;DeepSeek V4 pricing&lt;/a&gt;, and &lt;a href="https://ofox.ai/blog/how-to-reduce-ai-api-costs-2026/" rel="noopener noreferrer"&gt;how to actually reduce AI API costs&lt;/a&gt;. The &lt;a href="https://ofox.ai/blog/llm-leaderboard-best-ai-models-ranked-2026/" rel="noopener noreferrer"&gt;May 2026 LLM leaderboard&lt;/a&gt; is the quality-side companion to this price-side table.&lt;/p&gt;

&lt;p&gt;The right model is the one whose price you don't have to think about — pick it for capability and let the bill take care of itself, or pick it for cost and let the capability ceiling decide your roadmap. Anything in between just means you'll switch in six months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing sources (verified May 26, 2026)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  OpenAI: &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;openai.com/api/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Anthropic: &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;platform.claude.com/docs/en/about-claude/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Google Gemini: &lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;ai.google.dev/gemini-api/docs/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  xAI: &lt;a href="https://x.ai/api" rel="noopener noreferrer"&gt;x.ai/api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  DeepSeek: &lt;a href="https://api-docs.deepseek.com/quick_start/pricing" rel="noopener noreferrer"&gt;api-docs.deepseek.com/quick_start/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Moonshot Kimi: official pricing via openrouter.ai/moonshotai&lt;/li&gt;
&lt;li&gt;  Alibaba Qwen: alibabacloud.com/help/en/model-studio/model-pricing&lt;/li&gt;
&lt;li&gt;  Mistral: &lt;a href="https://mistral.ai/pricing" rel="noopener noreferrer"&gt;mistral.ai/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Meta Llama 4 (via DeepInfra): &lt;a href="https://deepinfra.com/meta-llama/Llama-4-Scout-17B-16E-Instruct" rel="noopener noreferrer"&gt;deepinfra.com/meta-llama&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This table will be re-verified at the start of each month. If a number here disagrees with the provider's page, the provider wins — but tell us, because we want to keep this honest.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/ai-api-pricing-comparison-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>pricing</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>Agentic Coding in 2026: Claude Code vs Codex CLI vs Gemini CLI vs Cursor Agent</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Mon, 25 May 2026 14:37:35 +0000</pubDate>
      <link>https://dev.to/owen_fox/agentic-coding-in-2026-claude-code-vs-codex-cli-vs-gemini-cli-vs-cursor-agent-4afn</link>
      <guid>https://dev.to/owen_fox/agentic-coding-in-2026-claude-code-vs-codex-cli-vs-gemini-cli-vs-cursor-agent-4afn</guid>
      <description>&lt;h1&gt;
  
  
  Agentic Coding in 2026: Claude Code vs Codex CLI vs Gemini CLI vs Cursor Agent
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Agentic coding has fragmented into four specialized tools. Claude Code excels at high-quality pair programming with human oversight. Codex CLI dominates unattended multi-hour tasks with Goal mode reaching 82.7% on Terminal-Bench 2.0. Gemini CLI transitions to Antigravity CLI on June 18, 2026. Cursor Agent uniquely offers cloud VM-based background agents with browser/desktop capabilities and eight-way parallelism.&lt;/p&gt;

&lt;p&gt;The fundamental shift: agents now operate beyond terminals. Codex runs unattended for hours, Cursor agents click through browsers in cloud VMs, and Gemini consolidates into a full desktop platform. The production strategy is not choosing one tool, but composing three of them (Claude Code, Codex CLI, and Cursor Agent) by task type through a unified API gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed in 2026 for Agentic Coding CLIs
&lt;/h2&gt;

&lt;p&gt;Agentic coding evolved from "model writes a function" to "model owns multi-step tasks from specification to verified output." Each of the four mature CLIs occupies a different position on the autonomy spectrum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Code (Anthropic)&lt;/strong&gt; prioritizes human partnership, running locally with approval gates and extension hooks for developer control.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codex CLI (OpenAI)&lt;/strong&gt; maximizes autonomy—Goal mode runs unattended with thousands of sequential tool calls demonstrated without intervention.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini CLI (Google)&lt;/strong&gt; offered middle-ground conversational ReAct loops with 1M-token context until the announced transition to Antigravity CLI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cursor Agent (Cursor)&lt;/strong&gt; abandoned the terminal entirely for cloud VMs with desktop and browser capabilities, supporting up to eight parallel background agents.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The category fragmentation reflects a shifted question: "How much autonomy do I delegate, for how long, and where should execution occur?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five-Minute Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CLI&lt;/th&gt;
&lt;th&gt;Autonomy Model&lt;/th&gt;
&lt;th&gt;Execution Environment&lt;/th&gt;
&lt;th&gt;Primary Model&lt;/th&gt;
&lt;th&gt;Key Strength&lt;/th&gt;
&lt;th&gt;Main Challenge&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Approval-gated pair programmer&lt;/td&gt;
&lt;td&gt;Local terminal&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7 / Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Hooks, subagents, Skills with PostToolUse output replacement (May 2026)&lt;/td&gt;
&lt;td&gt;Pro tier subscription throttle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;Unattended Goal mode over hours&lt;/td&gt;
&lt;td&gt;Local or headless&lt;/td&gt;
&lt;td&gt;GPT-5.5 (ofox: GPT-5.4 Pro, GPT-5.3)&lt;/td&gt;
&lt;td&gt;GA Goal mode, 82.7% Terminal-Bench score, remote computer use&lt;/td&gt;
&lt;td&gt;Less idiomatic first-pass output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Conversational ReAct loop&lt;/td&gt;
&lt;td&gt;Local terminal (sunsetting June 18)&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro / Flash&lt;/td&gt;
&lt;td&gt;1M context window, free tier (60 RPM/1000 RPD), MCP support&lt;/td&gt;
&lt;td&gt;Consolidating into Antigravity CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Agent&lt;/td&gt;
&lt;td&gt;Cloud VM background fleet&lt;/td&gt;
&lt;td&gt;Editor + cloud VM&lt;/td&gt;
&lt;td&gt;Composer 2 or Claude/GPT/Gemini&lt;/td&gt;
&lt;td&gt;Desktop/browser per agent, 8x parallel fan-out&lt;/td&gt;
&lt;td&gt;Credit-based premium model billing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Quick guidance:&lt;/strong&gt; Claude Code for craftsmanship; Codex CLI for endurance; Gemini CLI for free-tier exploration before June 18; Cursor Agent for parallelism.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code: The Pair-Programmer Model
&lt;/h2&gt;

&lt;p&gt;Claude Code's philosophy keeps developers in control. The terminal-resident CLI operates against local filesystems, requires approval before destructive changes, and exposes state through &lt;code&gt;/context&lt;/code&gt; and &lt;code&gt;/cost&lt;/code&gt; introspection commands. Claude Opus 4.7 is the default as of May 2026 (upgraded from 4.6), with Sonnet 4.6 handling the broader workload at lower cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extensibility Architecture (Three Layers)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hooks&lt;/strong&gt; execute shell commands at lifecycle events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart), which lets a team enforce patterns like "run tests before stopping" or "block edits to generated files." The May 2026 upgrade also enabled PostToolUse hooks to replace tool output across all tools via &lt;code&gt;hookSpecificOutput.updatedToolOutput&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagents&lt;/strong&gt; spawn focused workers with isolated context windows, custom prompts, and bounded tool permissions. The primary agent handles planning while specialist subagents manage discrete tasks like code review or security scanning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills&lt;/strong&gt; package reusable expertise as markdown files plus optional scripts, functioning like internal libraries distributed across teams.&lt;/p&gt;

&lt;p&gt;This design reflects the autonomy philosophy: short turns, frequent approvals, granular control. Extended unattended runs conflict with the architecture's core assumption.&lt;/p&gt;
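&lt;p&gt;As an illustration of the "block edits to generated files" pattern mentioned under Hooks, here is a hedged sketch of a script that a PreToolUse hook could run. It assumes the hook receives the tool call as JSON on stdin with a &lt;code&gt;tool_input.file_path&lt;/code&gt; field, and that exit code 2 blocks the call and returns stderr to the agent; verify both against the current hooks reference before relying on it. The path markers are hypothetical.&lt;/p&gt;

```python
import json
import sys

# Hypothetical markers for generated files; adjust to your repository layout.
GENERATED_MARKERS = ("/generated/", ".gen.", "_pb2.py")


def should_block(event):
    """Return True when the tool call targets a file that looks generated."""
    tool_input = event.get("tool_input") or {}
    path = str(tool_input.get("file_path", ""))
    return any(marker in path for marker in GENERATED_MARKERS)


if __name__ == "__main__":
    raw = sys.stdin.read()
    # Assumed contract: JSON event on stdin, exit code 2 means "block this call".
    if raw.strip() and should_block(json.loads(raw)):
        print("Refusing to edit a generated file; change its source instead.",
              file=sys.stderr)
        sys.exit(2)
```

&lt;p&gt;Keeping the decision in a pure function like &lt;code&gt;should_block&lt;/code&gt; makes the rule easy to unit-test outside the agent.&lt;/p&gt;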

&lt;p&gt;&lt;strong&gt;Economic constraint:&lt;/strong&gt; Pro at $20/month enforces hard ceilings. Max 5x ($100) and Max 20x ($200) raise limits without eliminating them—a direct disadvantage for "set and forget" workflows, precisely where Codex CLI operates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Codex CLI: The Autonomy Champion
&lt;/h2&gt;

&lt;p&gt;Codex CLI targets tasks measured in hours rather than minutes. The May 2026 changelog confirms: Goal mode transitioned from experimental to GA across the Codex app, IDE extensions, and CLI. OpenAI demonstrated 1,000+ sequential tool calls on real software tasks without intervention; Terminal-Bench 2.0 scores of 82.7% on GPT-5.5 provide empirical validation.&lt;/p&gt;

&lt;p&gt;Remote computer use (a May 2026 feature) exemplifies the autonomy bet: Codex operates Mac desktop apps after screen lock, including remote access via Codex Mobile. Authorization is time-limited, displays are covered, and local input triggers a relock, but the philosophy is explicit: agents don't require constant observation.&lt;/p&gt;

&lt;p&gt;Codex CLI 0.125.0 added reasoning-token usage reporting in &lt;code&gt;codex exec --json&lt;/code&gt;, closing an observability gap. Token-level reporting and OpenTelemetry traces now make it possible to account for multi-hour session spend accurately after the fact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trade-offs Worth Naming
&lt;/h3&gt;

&lt;p&gt;First-pass edits are slightly less idiomatic than Claude's, particularly on tight refactors. Separately, if GPT-5.5 availability lags on your gateway, route through GPT-5.4 Pro via ofox or GPT-5.3 Codex.&lt;/p&gt;

&lt;p&gt;Codex CLI mirrors OpenAI's ecosystem—tool-calling formats, prompt conventions, and trace output reflect wider OpenAI infrastructure. Anthropic-primary shops find Claude Code more native.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini CLI: The Conversational ReAct Loop (With a June 18 Deadline)
&lt;/h2&gt;

&lt;p&gt;Gemini CLI implements the simplest design: reason-and-act loops with built-in tools (Google Search grounding, shell, file operations, web fetch) plus MCP support. The 1M-token context window was uniquely accessible in a terminal, and the free tier (60 requests/minute, 1,000 requests/day on personal accounts) was unmatched for low-friction agentic exploration.&lt;/p&gt;

&lt;h3&gt;
  
  
  The June 18, 2026 Transition
&lt;/h3&gt;

&lt;p&gt;Google announced May 12, 2026 that Gemini CLI and Gemini Code Assist IDE extensions stop serving Google AI Pro/Ultra and free Gemini Code Assist on &lt;strong&gt;June 18, 2026&lt;/strong&gt;. The consolidation target is Google Antigravity—an agent-first platform featuring server-side infrastructure and Antigravity CLI as the terminal equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personal free-tier users migrate to Antigravity CLI by June 18; the free tier carries over.&lt;/li&gt;
&lt;li&gt;Paid Google AI Pro/Ultra subscribers face the same migration requirement.&lt;/li&gt;
&lt;li&gt;Self-hosted users with custom API keys can continue via open-source community forks, though corporate recommendations shift toward Antigravity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This represents re-platforming rather than agentic-coding deprecation. Gemini 3.1 Pro and Gemini 3.1 Flash remain available on ofox and other aggregators; the distribution channel moves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Gemini CLI still wins (through June 18):&lt;/strong&gt; free-tier exploration, MCP server prototyping with generous context, pattern testing without paid subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor Agent: The Fleet Model
&lt;/h2&gt;

&lt;p&gt;Cursor rejected terminal-first architecture entirely. Editor-centric from inception, Cursor pushed its agents into cloud VMs with dedicated desktops and browsers in 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background Agents Architecture
&lt;/h3&gt;

&lt;p&gt;Cursor clones repositories into cloud VMs where agents work on dedicated branches with full desktop and browser access. Results surface as pull requests while you continue local editing. February 2026 upgrades added desktop-per-agent infrastructure—each Background Agent receives its own development environment, browser, and UI interaction capabilities. Agents can launch browsers, navigate localhost, click UI elements, and visually verify code changes before opening PRs.&lt;/p&gt;

&lt;p&gt;Fan-out extends to eight parallel agents, unique across the four CLIs. Dependency upgrades spanning services, test backfills, and standardized changes across multiple repositories are the workloads that benefit most from this parallelism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost structure:&lt;/strong&gt; each Background Agent consumes Cursor credits; parallelism has real economic trade-offs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Foreground Capabilities
&lt;/h3&gt;

&lt;p&gt;Composer 2, Cursor's first-party agentic model, claims ~4x speed versus frontier peers, with typical agent turns finishing under 30 seconds. Auto mode is credit-free; premium model pins (Claude Sonnet 4.6, GPT-5.5) consume credits. The $20 Pro plan includes approximately $20 of monthly credits plus unlimited Tab completions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Cursor Agent dominates:&lt;/strong&gt; editor-native workflows, high-volume repetitive work benefiting from fan-out (dependency upgrades, test backfills, bulk find-and-replace), or scenarios requiring visual UI verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use-Case Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Best Primary&lt;/th&gt;
&lt;th&gt;Fallback&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-quality refactors with oversight&lt;/td&gt;
&lt;td&gt;Claude Code (Opus 4.7)&lt;/td&gt;
&lt;td&gt;Cursor Agent&lt;/td&gt;
&lt;td&gt;Approval-gated execution, superior idiomatic output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-hour unattended execution&lt;/td&gt;
&lt;td&gt;Codex CLI Goal mode&lt;/td&gt;
&lt;td&gt;Cursor Background Agent&lt;/td&gt;
&lt;td&gt;Designed for walk-away autonomy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser-based UI verification&lt;/td&gt;
&lt;td&gt;Cursor Background Agent&lt;/td&gt;
&lt;td&gt;Codex remote computer use&lt;/td&gt;
&lt;td&gt;Desktop/browser environment per agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eight-way parallel fan-out (deps)&lt;/td&gt;
&lt;td&gt;Cursor Background Agents&lt;/td&gt;
&lt;td&gt;Codex CLI scripted&lt;/td&gt;
&lt;td&gt;Native parallelism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free-tier exploration (pre-June 18)&lt;/td&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Cursor Hobby&lt;/td&gt;
&lt;td&gt;1M context, no card required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free-tier exploration (post-June 18)&lt;/td&gt;
&lt;td&gt;Antigravity CLI&lt;/td&gt;
&lt;td&gt;Gemini CLI (BYO-key)&lt;/td&gt;
&lt;td&gt;Free tier migration destination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local-only, no cloud VMs&lt;/td&gt;
&lt;td&gt;Claude Code or Codex CLI&lt;/td&gt;
&lt;td&gt;Gemini CLI (BYO-key)&lt;/td&gt;
&lt;td&gt;Both remain on-machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP-heavy custom tools&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Most mature MCP integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headless / CI integration&lt;/td&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;Claude Code (&lt;code&gt;--print&lt;/code&gt; mode)&lt;/td&gt;
&lt;td&gt;Remote-control entrypoint, OpenTelemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strict $30/month budget&lt;/td&gt;
&lt;td&gt;DeepSeek TUI + Cursor Hobby&lt;/td&gt;
&lt;td&gt;Gemini CLI free tier&lt;/td&gt;
&lt;td&gt;See $30/month coding stack guide&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
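&lt;p&gt;A few rows of the matrix above, sketched as a lookup with fallback. The task keys and the &lt;code&gt;choose_tool&lt;/code&gt; helper are hypothetical names for illustration; the primary and fallback values are copied from the table.&lt;/p&gt;

```python
# task key: (primary tool, fallback tool), taken from the use-case matrix.
USE_CASES = {
    "refactor_with_oversight": ("Claude Code (Opus 4.7)", "Cursor Agent"),
    "multi_hour_unattended": ("Codex CLI Goal mode", "Cursor Background Agent"),
    "browser_ui_verification": ("Cursor Background Agent", "Codex remote computer use"),
    "parallel_fan_out": ("Cursor Background Agents", "Codex CLI scripted"),
    "headless_ci": ("Codex CLI", "Claude Code (--print mode)"),
}


def choose_tool(task, unavailable=()):
    """Return the primary tool for a task, or its fallback if the primary is unavailable."""
    primary, fallback = USE_CASES[task]
    return fallback if primary in unavailable else primary


print(choose_tool("multi_hour_unattended"))
print(choose_tool("headless_ci", unavailable=("Codex CLI",)))
```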

&lt;h2&gt;
  
  
  How to Configure All Four Against One API Key
&lt;/h2&gt;

&lt;p&gt;The under-discussed reality: you don't need four billing dashboards. Each CLI accepts custom endpoints; aggregators like ofox expose Anthropic, OpenAI, and Google models through compatible APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code with Anthropic-Compatible Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.ofox.ai/anthropic"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-ofox-..."&lt;/span&gt;
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Codex CLI with OpenAI-Compatible Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.ofox.ai/v1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-ofox-..."&lt;/span&gt;
codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Gemini CLI with Gemini-Compatible Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_GENAI_USE_VERTEXAI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-ofox-..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.ofox.ai/gemini"&lt;/span&gt;
gemini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cursor Agent Custom Models
&lt;/h3&gt;

&lt;p&gt;Settings → Models → Add Custom Model accepts any OpenAI-compatible base URL plus API key. Set to &lt;code&gt;https://api.ofox.ai/v1&lt;/code&gt; to call Claude, GPT, and Gemini through the same authentication Cursor already understands.&lt;/p&gt;

&lt;p&gt;This pattern runs all four agents against the same model catalog, switching by task class while paying only for consumed tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shared Gaps Across All Four (May 2026)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cross-Repo Awareness
&lt;/h3&gt;

&lt;p&gt;All four operate within single repositories. Coordinating a change across a monorepo plus three sibling repositories still requires developer intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Predictability
&lt;/h3&gt;

&lt;p&gt;Even with &lt;code&gt;/cost&lt;/code&gt; commands and Codex token reporting, predicting multi-hour Goal-mode expenses remains guesswork until completion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent Memory Across Sessions
&lt;/h3&gt;

&lt;p&gt;Subagents and Skills enable knowledge reuse, but genuine session-to-session memory requires developer prompt scaffolding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reliable Test-Driven Loops
&lt;/h3&gt;

&lt;p&gt;Write-test-code-iterate works for greenfield projects but degrades on flaky tests or extended CI cycles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification Beyond UI
&lt;/h3&gt;

&lt;p&gt;Cursor's browser-equipped agents verify UI changes visually. Data-pipeline correctness and distributed-system invariants still rely on developer-written tests.&lt;/p&gt;

&lt;p&gt;Addressing these gaps often requires architectural workarounds (CI-side verification, persistent external memory stores) rather than awaiting agent evolution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Recommendation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick by autonomy axis first, then ecosystem fit.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Craftsman pair programmer locally:&lt;/strong&gt; Claude Code with Opus 4.7; use Sonnet 4.6 for broader workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Walk-away autonomy over hours:&lt;/strong&gt; Codex CLI Goal mode with GPT-5.5 (or GPT-5.4 Pro through ofox if GPT-5.5 lags on aggregators).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free-tier exploration before June 18:&lt;/strong&gt; Gemini CLI; migrate to Antigravity CLI by mid-June.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser-aware parallel agents in cloud VM:&lt;/strong&gt; Cursor Background Agents, up to eight in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Production Composition Pattern
&lt;/h3&gt;

&lt;p&gt;Late-2026 production teams rarely choose one tool. The converging pattern: Claude Code locally for craftsmanship, Codex CLI in a separate shell for endurance, and Cursor Background Agents in the cloud for fan-out—all three routed through one API gateway for unified billing and model catalog access.&lt;/p&gt;

&lt;p&gt;The fastest-shipping developers aren't debating "which is best"—they're composing Claude Code for craftsmanship, Codex CLI for endurance, and Cursor Background Agents for parallelism, unified through a single API key.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources and Version Stamps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code:&lt;/strong&gt; PostToolUse output replacement for all tools (May 2026); Fast mode default upgraded to Opus 4.7 (from 4.6) per Anthropic release notes and ClaudeLog, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI:&lt;/strong&gt; v0.124.0 quick reasoning controls; v0.125.0 reasoning-token reporting in &lt;code&gt;codex exec --json&lt;/code&gt;; Goal mode GA; remote computer use per OpenAI developers changelog; GPT-5.5 Terminal-Bench 2.0 score of 82.7% per OpenAI launch announcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI → Antigravity CLI:&lt;/strong&gt; transition announcement May 12, 2026; cutoff for Google AI Pro/Ultra and free Gemini Code Assist on June 18, 2026, per Google Developers Blog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor Agent:&lt;/strong&gt; Background Agents v3.0 with cloud VMs; February 2026 desktop + browser per agent; 8x parallel fan-out; Composer 2 first-party model per cursor.com and v3 release notes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ofox model availability:&lt;/strong&gt; Claude Opus 4.7, Sonnet 4.6, Haiku 4; GPT-5.4 Pro, GPT-5.4, GPT-5.3 Codex; Gemini 3.1 Pro, 3.1 Flash, 3.1 Flash-Lite—verified at ofox.ai/llms-full.txt on 2026-05-25&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/agentic-coding-claude-codex-gemini-cursor-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>codexcli</category>
      <category>cursor</category>
    </item>
    <item>
      <title>How to Delegate Claude Code Tasks to Mistral Vibe — Save 2-4x on Tokens</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Mon, 25 May 2026 02:37:38 +0000</pubDate>
      <link>https://dev.to/owen_fox/how-to-delegate-claude-code-tasks-to-mistral-vibe-save-2-4x-on-tokens-gp2</link>
      <guid>https://dev.to/owen_fox/how-to-delegate-claude-code-tasks-to-mistral-vibe-save-2-4x-on-tokens-gp2</guid>
      <description>&lt;h1&gt;
  
  
  How to Delegate Claude Code Tasks to Mistral Vibe — Save 2-4x on Tokens
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Mistral Vibe (Mistral's open-source coding CLI running Mistral Medium 3.5 at $1.50/$7.50 per million tokens) is roughly 3.3x cheaper than Claude Opus 4.7 ($5/$25). You don't have to choose between them—Claude Code can spawn Vibe as a subagent via the Bash tool, keeping Opus 4.7 for planning and review while Vibe handles refactors, file scans, and bulk edits. A 30-line config file enables this approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Exists
&lt;/h2&gt;

&lt;p&gt;For most of 2025, the agentic-CLI debate centered on "which one is best." In 2026, the better question is "which one does each job best." Claude Code's subagent system lets you answer pragmatically: keep the expensive model where reasoning matters, route the grunt work somewhere cheaper.&lt;/p&gt;

&lt;p&gt;Claude Opus 4.7 gets expensive because most of the tokens it bills for come from work that doesn't need Opus. A single agentic session reading 20-30 files easily burns 100K+ tokens before the model writes any code. The token bill is dominated by exploration and bulk edits, not by the moments where Opus actually earns its keep.&lt;/p&gt;

&lt;p&gt;Mistral Medium 3.5, the default model in Mistral Vibe since April 29, 2026, costs $1.50/$7.50 per million tokens and scores 77.6% on SWE-Bench Verified. It's not as strong as Opus on novel reasoning, but for mechanical tasks like "rename this symbol in 14 files," "add error handling to these three functions," or "extract this prop into a config object," the results are practically indistinguishable.&lt;/p&gt;

&lt;p&gt;The delegation pattern lets you keep Opus for decisions and hand the mechanics to Vibe. If you're skeptical of mixing CLIs, the hybrid routing approach inside Claude Code itself is a closer-coupled alternative worth comparing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mistral Vibe Actually Is
&lt;/h2&gt;

&lt;p&gt;Mistral Vibe is a terminal-based coding agent that ships as a single Python CLI with built-in subagent support, MCP integration, and a non-interactive prompt mode. Installation is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://mistral.ai/vibe/install.sh | bash
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MISTRAL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Config (optional—Vibe ships with defaults) lives at &lt;code&gt;~/.vibe/config.toml&lt;/code&gt;. A minimal version using only documented keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;active_model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral-medium-3-5"&lt;/span&gt;
&lt;span class="py"&gt;enable_auto_update&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;enable_telemetry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The flag you care about for delegation is &lt;code&gt;--prompt&lt;/code&gt;, which runs Vibe one-shot and prints the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vibe &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"refactor src/utils/date.ts to use date-fns instead of moment"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command is the entire integration surface. Any orchestrator that can shell out—Claude Code, a Makefile, a CI job—can call it.&lt;/p&gt;
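&lt;p&gt;As a rough illustration of "any orchestrator that can shell out," here is a minimal Python sketch that wraps the one-shot call. The helper names &lt;code&gt;build_vibe_command&lt;/code&gt; and &lt;code&gt;run_vibe&lt;/code&gt; are hypothetical; the only Vibe behavior assumed is the &lt;code&gt;--prompt&lt;/code&gt; flag shown above.&lt;/p&gt;

```python
import subprocess


def build_vibe_command(task):
    """Return the argv list for a one-shot Vibe run (hypothetical helper)."""
    if not task.strip():
        raise ValueError("task description must not be empty")
    # Passing argv as a list avoids shell quoting problems in the task text.
    return ["vibe", "--prompt", task]


def run_vibe(task, runner=subprocess.run):
    """Run Vibe once and return (exit_code, stdout).

    The runner parameter exists only so the call can be stubbed in tests.
    """
    result = runner(build_vibe_command(task), capture_output=True, text=True)
    return result.returncode, result.stdout
```

&lt;p&gt;Calling &lt;code&gt;run_vibe("refactor src/utils/date.ts to use date-fns instead of moment")&lt;/code&gt; would then behave like the shell command above, assuming &lt;code&gt;vibe&lt;/code&gt; is on your PATH and &lt;code&gt;MISTRAL_API_KEY&lt;/code&gt; is exported.&lt;/p&gt;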

&lt;h2&gt;
  
  
  The Claude Code Half: Defining a Subagent
&lt;/h2&gt;

&lt;p&gt;Claude Code routes work to subagents by reading the &lt;code&gt;description&lt;/code&gt; field in each agent definition under &lt;code&gt;.claude/agents/&lt;/code&gt;. To make Opus 4.7 delegate to Vibe, write one Markdown file with a Bash-only tool scope and a description that tells Opus when this worker is the right pick.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;.claude/agents/vibe-worker.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vibe-worker&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mechanical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;where&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shallow&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;renames,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;refactors&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;across&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;many&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;adding&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;handling,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;extracting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;helpers,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;format/lint&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cleanup.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Do&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;NOT&lt;/span&gt;&lt;span class="nv"&gt; 
&lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;architectural&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;decisions&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;novel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logic."&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a delegation wrapper around the Mistral Vibe CLI.

When invoked with a task description, run:

  vibe --prompt "&lt;span class="nt"&gt;&amp;lt;task&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;"

Capture the output, then return a short summary: which files changed, what the change was, and any warnings Vibe surfaced. Do not edit files yourself — only run the &lt;span class="sb"&gt;`vibe`&lt;/span&gt; command.

If &lt;span class="sb"&gt;`vibe`&lt;/span&gt; returns an error or asks for clarification, return the raw output to the parent and stop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that &lt;code&gt;tools: Bash&lt;/code&gt; is intentional: this subagent's only superpower is shelling out, which keeps its context narrow. The &lt;code&gt;description&lt;/code&gt; is what Opus reads to decide when to dispatch, so the "do NOT use for…" line matters as much as the positive cases. The wrapper itself runs on Sonnet 4.6, not Opus, because all it does is format one shell command.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost Math
&lt;/h2&gt;

&lt;p&gt;A typical "refactor 20 files to use the new API" task burns about 50K input tokens (reading files + scratch reasoning) and produces about 10K output tokens. Running it three ways:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Input cost&lt;/th&gt;
&lt;th&gt;Output cost&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7 direct&lt;/td&gt;
&lt;td&gt;50K × $5/M = $0.25&lt;/td&gt;
&lt;td&gt;10K × $25/M = $0.25&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.50&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral Vibe (Medium 3.5)&lt;/td&gt;
&lt;td&gt;50K × $1.50/M = $0.075&lt;/td&gt;
&lt;td&gt;10K × $7.50/M = $0.075&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.15&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash via ofox&lt;/td&gt;
&lt;td&gt;50K × $0.14/M = $0.007&lt;/td&gt;
&lt;td&gt;10K × $0.28/M = $0.003&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.01&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On this task Mistral Vibe costs about 3.3x less than direct Opus ($0.15 vs. $0.50). At 100 such tasks a month, that is $35 kept in your wallet instead of Anthropic's. The catch is that the saving evaporates the moment you delegate something Vibe can't handle: Opus then redoes the work, so you pay twice. The decision rubric: only delegate what you'd be comfortable letting a junior engineer do without supervision.&lt;/p&gt;
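&lt;p&gt;The arithmetic behind those numbers, plus the "pay twice" risk, can be sketched in a few lines of Python. This is a simplified model, not a billing tool: it uses the token counts and list prices from the table, ignores the small Sonnet wrapper overhead, and assumes a failed delegation costs one full Opus run on top of the Vibe run. The function names are illustrative.&lt;/p&gt;

```python
def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one task; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Token counts and prices are the ones quoted in the table above.
opus = task_cost(50_000, 10_000, 5.00, 25.00)
vibe = task_cost(50_000, 10_000, 1.50, 7.50)


def expected_delegated_cost(redo_rate):
    """Average cost when a fraction redo_rate of Vibe runs is redone on Opus."""
    return vibe + redo_rate * opus


# Delegation stops paying off once vibe + p * opus equals opus.
break_even_redo_rate = (opus - vibe) / opus

print(f"Opus direct: ${opus:.2f}, Vibe: ${vibe:.2f}, ratio: {opus / vibe:.1f}x")
print(f"Saved over 100 tasks: ${100 * (opus - vibe):.2f}")
print(f"Cost at a 20% redo rate: ${expected_delegated_cost(0.2):.2f}")
print(f"Break-even redo rate: {break_even_redo_rate:.0%}")
```

&lt;p&gt;Under these assumptions, delegation still comes out ahead until roughly 70% of delegated tasks have to be redone on Opus, but every redo eats directly into the $0.35 per-task margin.&lt;/p&gt;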

&lt;p&gt;For the genuinely token-paranoid, the third row in that table is real—DeepSeek V4 Flash is $0.14/$0.28 per million tokens on ofox. You can substitute it for Mistral Vibe in the same subagent pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Variant: The Same Pattern on Ofox + DeepSeek V4 Flash
&lt;/h2&gt;

&lt;p&gt;If you already have an ofox key for unified model access, you can skip Mistral Vibe entirely and have Claude Code dispatch to DeepSeek V4 Flash directly. The Bash wrapper changes from &lt;code&gt;vibe --prompt&lt;/code&gt; to a curl call, and because a raw model call cannot touch files, the subagent also gets &lt;code&gt;Read&lt;/code&gt; and applies the returned diff itself. The rest of the definition has the same shape.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;.claude/agents/cheap-worker.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cheap-worker&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use for mechanical edits — renames, format cleanup, boilerplate generation, simple refactors. Routes to DeepSeek V4 Flash via ofox. NOT for design decisions or novel logic.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash, Read&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

For each delegated task, call:

  curl https://api.ofox.ai/v1/chat/completions &lt;span class="err"&gt;\&lt;/span&gt;
    -H "Authorization: Bearer $OFOX_API_KEY" &lt;span class="err"&gt;\&lt;/span&gt;
    -H "Content-Type: application/json" &lt;span class="err"&gt;\&lt;/span&gt;
    -d '{"model":"deepseek/deepseek-v4-flash","messages":[{"role":"user","content":"&lt;span class="nt"&gt;&amp;lt;task&lt;/span&gt; &lt;span class="err"&gt;+&lt;/span&gt; &lt;span class="na"&gt;relevant&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt; &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;"}]}'

Apply the returned diff through Bash (for example with &lt;span class="sb"&gt;`git apply`&lt;/span&gt;) and use Read to check the result; this agent has no Edit tool. Return a one-paragraph summary to the parent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trade-off: Mistral Vibe is a real coding agent with its own planning loop, so it handles multi-file tasks better. A raw DeepSeek V4 Flash call is just a language model—the orchestration logic falls on you (or on Opus, which costs Opus tokens). For single-file edits, the ofox variant wins on price. For multi-file refactors, Vibe pulls ahead because its agentic loop runs on the cheap model, not on Opus.&lt;/p&gt;
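&lt;p&gt;One practical wrinkle with the curl variant: file contents pasted into a hand-written JSON string break as soon as they contain quotes or newlines. A minimal Python sketch of building the same request with proper escaping follows. The endpoint, headers, and model ID are the ones from the definition above; the helper names are hypothetical, and the response shape in &lt;code&gt;extract_reply&lt;/code&gt; is an assumption based on the OpenAI-style &lt;code&gt;/v1/chat/completions&lt;/code&gt; path, so check it against ofox's documentation.&lt;/p&gt;

```python
import json
import urllib.request

OFOX_URL = "https://api.ofox.ai/v1/chat/completions"  # endpoint from the article
MODEL = "deepseek/deepseek-v4-flash"  # model ID from the article


def build_request(task, file_text, api_key):
    """Build the POST request; json.dumps handles quotes and newlines."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": task + "\n\n" + file_text}],
    }
    return urllib.request.Request(
        OFOX_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Pull the model's text out of the reply.

    Assumption: the response follows the OpenAI-style chat completions shape.
    """
    return response_json["choices"][0]["message"]["content"]
```

&lt;p&gt;Sending it is then a matter of passing the request to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; and parsing the body with &lt;code&gt;json.load&lt;/code&gt;. Nothing here adds planning or multi-file orchestration, which is exactly the gap described above.&lt;/p&gt;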

&lt;h2&gt;
  
  
  Where This Stops Working
&lt;/h2&gt;

&lt;p&gt;The delegation pattern breaks in three specific situations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the task involves judgment about API design or trade-offs.&lt;/strong&gt; Mistral Medium 3.5 will pick &lt;em&gt;an&lt;/em&gt; answer; Opus 4.7 will tell you why one option is wrong. Architecture decisions are not where you save tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the delegated task needs context the wrapper can't supply.&lt;/strong&gt; Vibe runs in a fresh process with no memory of your conversation. If "fix this bug" depends on three earlier discussions, you'll pass them in as prompt context—paying input tokens twice, in both Opus and Vibe. Net cost can exceed the no-delegation baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Vibe's tokenizer disagrees with Anthropic's.&lt;/strong&gt; Claude Opus 4.7 ships a new tokenizer that uses ~12-27% more tokens than Opus 4.6 on the same text. Mistral's tokenizer is different again. Your "50K tokens" estimate from a Claude session is not what Vibe will count, and the bills won't line up exactly. The 3.3x ratio holds in aggregate; trust it monthly, not per-task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Picking the Right Delegation Cutoff
&lt;/h2&gt;

&lt;p&gt;A useful heuristic: if you can write a clear, two-sentence task description without referring to "the thing we discussed earlier" or "the approach you mentioned," it's delegable. If you can't, keep it on Opus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tasks that consistently win when delegated:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symbol renames across the codebase&lt;/li&gt;
&lt;li&gt;Adding null-checks or error handling to a list of known functions&lt;/li&gt;
&lt;li&gt;Generating boilerplate (test scaffolding, type definitions from schemas, config files)&lt;/li&gt;
&lt;li&gt;Format/lint fixes that grep can target but humans hate doing&lt;/li&gt;
&lt;li&gt;Translating between formats (JSON ↔ YAML, OpenAPI ↔ TypeScript)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tasks that lose when delegated:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anything involving "which approach is better"&lt;/li&gt;
&lt;li&gt;Novel algorithm work&lt;/li&gt;
&lt;li&gt;Bug fixes where the root cause isn't established&lt;/li&gt;
&lt;li&gt;Reviewing AI-generated code (don't ask a cheaper model to review its peer's work)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're already optimizing Claude Code spend, this pattern stacks on top of existing strategies—they target different cost drivers (this one targets &lt;em&gt;which model does the work&lt;/em&gt;, while other optimization guides target &lt;em&gt;how much context the model sees&lt;/em&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Minimum-Viable Setup
&lt;/h2&gt;

&lt;p&gt;Five minutes if you already have an Anthropic key:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;curl -LsSf https://mistral.ai/vibe/install.sh | bash&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;export MISTRAL_API_KEY=...&lt;/code&gt; (get one from console.mistral.ai)&lt;/li&gt;
&lt;li&gt;Save the &lt;code&gt;vibe-worker&lt;/code&gt; definition from earlier as &lt;code&gt;.claude/agents/vibe-worker.md&lt;/code&gt; under your project root&lt;/li&gt;
&lt;li&gt;Restart Claude Code&lt;/li&gt;
&lt;li&gt;Next time you need to do a 20-file refactor, just ask—Opus will read the subagent description and delegate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first time you watch Claude Code dispatch to &lt;code&gt;vibe-worker&lt;/code&gt; and come back with a diff that cost $0.15 instead of $0.50, the pattern justifies itself.&lt;/p&gt;
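&lt;p&gt;If the delegation never seems to fire, a quick preflight check over the setup steps can rule out the obvious causes. The sketch below is illustrative only: &lt;code&gt;preflight&lt;/code&gt; is a hypothetical helper, and it checks just the three things the steps above establish (the &lt;code&gt;vibe&lt;/code&gt; binary, the &lt;code&gt;MISTRAL_API_KEY&lt;/code&gt; variable, and the agent file).&lt;/p&gt;

```python
import os
import shutil
from pathlib import Path


def preflight(project_root, env=None, which=shutil.which):
    """Return a list of setup problems; an empty list means ready to delegate.

    env and which are injectable only so the check can be tested offline.
    """
    env = os.environ if env is None else env
    problems = []
    if which("vibe") is None:
        problems.append("vibe is not on PATH")
    if not env.get("MISTRAL_API_KEY"):
        problems.append("MISTRAL_API_KEY is not set")
    agent_file = Path(project_root) / ".claude" / "agents" / "vibe-worker.md"
    if not agent_file.is_file():
        problems.append(f"missing {agent_file}")
    return problems
```

&lt;p&gt;Running &lt;code&gt;preflight(".")&lt;/code&gt; from the project root and printing the result is enough; it does not verify that the API key is valid or that Claude Code has reloaded its agents.&lt;/p&gt;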

&lt;h2&gt;
  
  
  When This Is the Wrong Question Entirely
&lt;/h2&gt;

&lt;p&gt;If your monthly bill is dominated by &lt;em&gt;one&lt;/em&gt; model and you're chasing a single-digit-percent cost cut, this isn't the lever to pull. Check whether prompt caching, batching, or context window discipline would save more for less engineering effort. Delegation overhead is real—every subagent dispatch is a Bash spawn, and every Bash spawn is a roundtrip Opus has to reason about.&lt;/p&gt;

&lt;p&gt;But if you've already done the easy optimizations and you still see Opus burning tokens on tasks that look mechanical when you watch them happen, this is the pattern. Two CLIs, one config file, predictable savings. The broader model-selection question is independent—you can run the delegation pattern with any pair of orchestrator + worker; Opus + Vibe is just the version with the cleanest CLI ergonomics in May 2026.&lt;/p&gt;

&lt;p&gt;What you're really buying is the right to keep using the model you trust for hard problems while paying about a third of the price for the easy ones. That's the deal, and claiming it takes one short Markdown file with a few lines of YAML frontmatter.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/delegate-claude-code-tasks-to-mistral-vibe-save-tokens-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>subagents</category>
      <category>tokenoptimization</category>
    </item>
  </channel>
</rss>
