OpenAI Agents SDK 0.14 Deep Dive — Sandbox Agents, Model-Native Harness, Subagents, and Codex-Style Filesystem Tools Redefining the 2026 Agent Infrastructure Standard
On April 15, 2026, OpenAI shipped Agents SDK 0.14. It's a minor release on paper, but it changes the default shape of agent runtimes. The release lands on three pillars: (1) a Model-Native Harness where the model directly drives files, shells, and patches; (2) native Sandbox Agents that execute model-generated code in an isolated compute layer; and (3) the Subagent pattern that lets one parent run many children in parallel. Stacked on top are Codex-style filesystem primitives (apply_patch, shell), the Skills primitive for progressive context disclosure, and AGENTS.md as a repo-pinned agent persona. Seven external sandbox adapters ship at launch — Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel — while locally you can run a fast Docker-free loop with UnixLocalSandboxClient. Model Context Protocol (MCP) is no longer an optional integration; it's now first-class built-in tooling. And long-horizon work is finally durable: the SDK persists state via checkpoint, snapshot, and rehydration, so a dying or expired container is no longer a session killer. This article maps the 0.14 surface area across six axes, then shares the 4-week migration checklist ManoIT validated while moving internal RAG and DevSecOps automations onto the new runtime.
1. Why 0.14 Is the Inflection Point — From "Prompt Bundle" to "Runtime"
The original Agents SDK from spring 2025 was essentially a function-calling loop wrapped in handoffs and guardrails. You declared an Agent, attached tools, and let the model pick functions. Two things broke in production. First, when the model produced code, there was no safe place to actually run it. Second, anything longer than an hour was exposed to container restarts and session timeouts. 0.14 pulls both fixes inside the SDK. It's no longer a "prompt + tools" bundle — it's the infrastructure surface for executing agents.
The table below collapses the 0.13.x → 0.14.0 operating-surface delta into a single page.
| Axis | 0.13.x and earlier | 0.14.0 (2026-04-15) | Operational signal |
|---|---|---|---|
| Execution model | Function-call loop (Agent + tools) | Model-Native Harness (memory + filesystem + shell) | Model directly drives the OS surface |
| Code execution | Operator-run external containers | Native Sandbox Agents (new in v0.14.0) | SDK standardizes the compute layer |
| Credentials | Co-located with the model env | Control harness ↔ compute separation | Blocks lateral movement from injection |
| Filesystem tools | Custom functions per project | apply_patch + shell + AGENTS.md (Codex-style) | Patch-level edits, unified shell |
| Long-horizon tasks | Restart from scratch on expiry | Checkpoint, snapshot, rehydration | Resume from last point |
| Parallelism | One serial primary agent | Subagents routed across sandboxes | Concurrent per-container execution |
| External sandboxes | None (DIY integration) | Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel | BYO Sandbox abstraction |
| MCP | Optional dependency | First-class built-in tooling | Tool graph fully externalized |
| Local loop | Docker + external runner required | UnixLocalSandboxClient (no Docker) | Faster local dev |
| Package | pip install openai-agents | extras: [docker], [voice], [redis] | Modular install surface |
| Languages | Python + TypeScript | 0.14 new pieces: Python first, TS later | Sandbox/harness Python-led |
The two rows that carry the most operational weight are "Credentials" and "External sandboxes." As more agents run longer than an hour, the old shape — the model's generated code sharing a process with your API keys — stops being acceptable. 0.14 severs that link at the SDK level. The control harness (control plane) holds credentials; model-generated code only runs in a separate sandbox (compute plane). Even if prompt injection succeeds, lateral movement to the rest of the corporate network is cut off.
2. The 0.14 Runtime Architecture — Six Axes
The 0.14 runtime decomposes cleanly into six axes. Swapping external adapters does not change what the axes mean.
| Axis | Owner | Manifestation | Boundary |
|---|---|---|---|
| ① Harness | OpenAI SDK | memory + filesystem orchestrator | Holds credentials |
| ② Model | OpenAI / 3rd-party | Decision-maker for tool calls | No sandbox access |
| ③ Sandbox Client | SDK adapter | UnixLocal / BYO / 7 external | Delegates code execution |
| ④ Compute | Sandbox provider | files + shell + packages | Network/scope controls |
| ⑤ MCP Servers | Standard external tools | tools as a service | OAuth/auth delegated |
| ⑥ Persistence | SDK + Provider | checkpoint/snapshot | Guarantees resume |
The most counter-intuitive separation is between ② and ④. "The model decides; the code runs elsewhere." Through 0.13 it was common for shell commands returned by the model to be executed in the same process. In 0.14, the model has no direct access to compute. The model asks the harness to "apply this patch and run this shell"; the harness routes that through the sandbox client to the compute layer.
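The control/compute split can be sketched in plain Python, independent of the SDK. Everything below is illustrative (the class names are hypothetical, not the SDK's API); the point is that credentials live only in the harness object, while the sandbox client sees nothing but the command it is asked to run:

```python
from dataclasses import dataclass, field

@dataclass
class FakeSandboxClient:
    # Compute plane: runs commands, never sees credentials.
    log: list = field(default_factory=list)

    def run_shell(self, cmd: str) -> str:
        self.log.append(cmd)
        return f"ran: {cmd}"

@dataclass
class Harness:
    # Control plane: holds credentials and routes model requests to compute.
    api_key: str               # stays here; never forwarded to the sandbox
    sandbox: FakeSandboxClient

    def handle(self, request: dict) -> str:
        # The model only emits *requests*; the harness decides and delegates.
        if request["tool"] == "shell":
            return self.sandbox.run_shell(request["args"]["cmd"])
        raise ValueError(f"unsupported tool: {request['tool']}")

harness = Harness(api_key="sk-secret", sandbox=FakeSandboxClient())
out = harness.handle({"tool": "shell", "args": {"cmd": "ls"}})
print(out)  # ran: ls
```

Note what the sandbox object never receives: the API key. A successful injection can make the model request shell commands, but those commands execute in a scope with no credentials to steal.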
2.1 The Smallest Local Loop — UnixLocalSandboxClient
The first component to reach for when 0.14 lands is UnixLocalSandboxClient. It uses a host temp directory as the sandbox workspace, requires no Docker, and is therefore the fastest local dev mode.
```bash
# venv + install (Python 3.10+)
python -m venv .venv && source .venv/bin/activate

# Base install
pip install openai-agents

# Docker-backed sandbox
pip install "openai-agents[docker]"

# Voice / Redis session
pip install "openai-agents[voice]"
pip install "openai-agents[redis]"
```
```python
# examples/sandbox/unix_local_runner.py (0.14.0 idiomatic form)
from agents import Runner
from agents.sandbox import Manifest, SandboxAgent, UnixLocalSandboxClient

# 1) Manifest: declare initial workspace state
manifest = Manifest.from_dict({
    "files": {
        "README.md": "# ManoIT Auto Blog\nLocal sandbox demo.\n",
    },
})

# 2) Sandbox agent: the model drives shell/patch/skills directly
agent = SandboxAgent(
    name="repo-inspector",
    instructions=(
        "Inspect the code/docs in this workspace. "
        "Make any changes via apply_patch only."
    ),
    default_manifest=manifest,
)

# 3) Sync run with the local Unix client
result = Runner.run_sync(
    agent,
    "Append the line 'Review complete' to README.md.",
    sandbox=UnixLocalSandboxClient(),
)
print(result.final_output)
```
Two operational nuances. First, the Manifest declaratively pins what state the sandbox starts in. What a CI container's Dockerfile used to do is now the SDK's responsibility. Second, Runner.run_sync owns the container's lifetime — workspaces are torn down post-run, and the next call starts in a fresh one. ManoIT calls this loop devloop-fast and policy-mandates that all pre-CI agent debugging happens here.
2.2 External Sandboxes — BYO Sandbox × 7
Production moves to external adapters. The same sandbox= interface accepts seven providers at launch.
| Provider | Execution shape | Strength | Operational signal |
|---|---|---|---|
| Blaxel | Micro VMs | First-class SDK tutorials | VPC controls early |
| Cloudflare | Workers Sandbox | Edge-adjacent, ms cold start | Auto regional routing |
| Daytona | SDK standard workspaces | Snapshot-friendly | Recommended for long-horizon |
| E2B | Micro VMs | Code execution focused | Strong observability/logs |
| Modal | Serverless containers | GPU routing | Pairs with model post-processing |
| Runloop | Runner pool | Benchmark-friendly | Automated scorecards |
| Vercel | Vercel Sandbox | Natural Next.js integration | Instant web demos |
"Which provider is right?" is the wrong question. The real operational splits are two: snapshot cost for long-horizon work and outbound network policy. ManoIT routes short analysis/patching to Vercel Sandbox and hour-plus builds/migrations to Daytona — the same agent moves between them by swapping sandbox=.
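That routing decision reduces to a few lines of plain Python. The helper below is a hypothetical sketch of ManoIT's split as described above, not an SDK API; in the real runtime the return value would be an adapter instance passed via sandbox=, not a string:

```python
def pick_sandbox(estimated_minutes: int, needs_snapshot: bool) -> str:
    """Route a task to a sandbox provider by its operational profile."""
    if estimated_minutes >= 60 or needs_snapshot:
        return "daytona"   # long-horizon work: snapshot cost dominates
    return "vercel"        # short analysis/patching: fast startup wins

print(pick_sandbox(5, False))   # vercel
print(pick_sandbox(180, True))  # daytona
```

The agent definition itself never changes; only this routing layer does, which is the whole point of the BYO Sandbox abstraction.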
3. Model-Native Harness — apply_patch, shell, skills, AGENTS.md
The 0.14 harness is built on the assumption that "the model directly drives the OS surface." Four first-class primitives are now baked into the SDK — apply_patch (patch-level edits), shell (shell commands), skills (progressive context disclosure), and AGENTS.md (repo-level agent persona). All four were lifted from operational lessons inside OpenAI Codex and promoted to default SDK capabilities in 0.14.
3.1 apply_patch — Goodbye to "Rewrite the Whole File"
The standard pattern up to 0.13 was "rewrite the entire file." Microedits in large files frequently produced regressions because the model would touch unrelated lines. 0.14's apply_patch only accepts patch-formatted edits. Two things improve simultaneously: token use drops (no full-file resends) and regressions drop (out-of-scope lines are forbidden). ManoIT now policy-disables shell and enables only apply_patch for code-modification agents.
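For reference, Codex-style patches use an envelope roughly like the following (the file path and hunk content here are illustrative, not from the SDK docs). Each hunk names its file and a context anchor, and lines outside the hunk cannot be touched:

```text
*** Begin Patch
*** Update File: src/app.py
@@ def main():
-    print("hello")
+    print("hello, world")
*** End Patch
```

Because only the changed lines and a little context travel over the wire, both failure modes above shrink at once.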
3.2 Skills — "Don't Hand Over Everything at Once"
Skills implement progressive context disclosure. The agent unfolds a skill's instructions only when it actually needs that domain knowledge mid-task. Instead of dropping 50KB into the system prompt at startup, you reveal 1–3KB on demand. In a world where 200K context windows are common, this turns out to be a quietly effective cost-reduction trick.
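A minimal sketch of progressive disclosure, independent of the SDK (the skill-registry shape below is hypothetical): one-line summaries are always in the prompt, while full instruction bodies load only on demand:

```python
SKILLS = {
    "sql-review": {
        "summary": "Review SQL migrations for destructive operations.",
        "body": "FULL-BODY: detailed rules, examples, and edge cases (1-3KB).",
    },
    "pr-style": {
        "summary": "Enforce ManoIT PR title and description conventions.",
        "body": "FULL-BODY: templates and lint rules for PR metadata.",
    },
}

def startup_prompt(skills: dict) -> str:
    # At startup, only the one-line summaries enter the context window.
    return "\n".join(f"- {name}: {s['summary']}" for name, s in skills.items())

def expand_skill(skills: dict, name: str) -> str:
    # Mid-task, the full instruction body is revealed on demand.
    return skills[name]["body"]

prompt = startup_prompt(SKILLS)
```

The startup prompt carries only the index; the 50KB payload the old pattern paid for up front is fetched one skill at a time.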
3.3 AGENTS.md — The Repo Carries Its Own Agent Spec
If every agent entering a repository must follow the same rules (language, conventions, prohibitions), eventually the repository has to carry that spec itself. The 0.14 harness auto-reads AGENTS.md from the workspace root and injects it into the model. It's likely to become the unifying standard that tidies up the era of per-tool files like CLAUDE.md, CURSOR.md, and .cursorrules. ManoIT's internal template:
```markdown
# AGENTS.md | Project: ManoIT Auto Blog | v1.0

## Identity
- Language: Korean (comments/docs), English (code/variables)
- Convention: snake_case (variables), PascalCase (classes)

## Tools Allowed
- apply_patch: yes
- shell: limited (no curl/wget to external)
- mcp:notion: read-only
- mcp:github: PR open + comment

## Forbidden
- Modify .env, secrets/*, deploy/*
- Direct egress except via approved MCP servers
- Force-push or rewrite git history

## Output
- All Korean prose with English technical terms in parentheses
- Code comments in Korean
- API response envelope: {success, data, error, meta}
```
4. Subagents — One Harness, Many Containers
The Subagent pattern in 0.14 is the model where "one agent spins up many children in parallel." Each child runs in an independent sandbox, and the parent harness aggregates results. The point is isolation — files produced by child A do not pollute child B's workspace. That single fact materially improves long-horizon automation reliability.
| Pattern | 0.13-and-earlier ops | 0.14 Subagents | Outcome |
|---|---|---|---|
| Multi-repo analysis | Serial loop | One child per repo, parallel | Wall-clock n× → ~1× (within concurrency limits) |
| Bulk PR reviews | Queue processing | One child per PR, parent aggregation | Report consistency ↑ |
| Long builds | One container accumulating state | Child-level build+test, parent gate | Failure isolation |
| Large datasets | Hand-rolled sharding | One shard per sandbox, parent merge | Less code |
Watch the model-call volume. As children multiply, calls multiply too. 0.14 doesn't force a summarization step when the parent aggregates, but ManoIT requires the parent to run a single summarization pass once child output tokens cross a threshold. Cost scales with child count, but context pollution stays bounded.
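The threshold policy is a few lines of plain Python. This is a sketch of ManoIT's in-house rule as described above, not an SDK feature; token counting is approximated by whitespace-split words as a stand-in for a real tokenizer:

```python
def aggregate(child_outputs: list[str], token_budget: int = 4000):
    """Merge child results; flag when a parent summarization pass is needed."""
    merged = "\n---\n".join(child_outputs)
    approx_tokens = len(merged.split())   # crude stand-in for a tokenizer
    return merged, approx_tokens > token_budget

merged, needs_summary = aggregate(["repo-a: clean", "repo-b: 2 issues"])
print(needs_summary)  # False for small outputs
```

When needs_summary comes back true, the parent makes one extra model call to compress the merged output before it enters its own context.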
5. Checkpoint · Snapshot · Rehydration — Containers That Are Allowed to Die
This is the first place the SDK itself guarantees durability. When a sandbox container expires or crashes, the harness spins up a new one and restores state from the last checkpoint to resume. Two stages happen in practice.
1) Snapshot — The harness snapshots the workspace, memory, and conversation state at intervals. Where it's stored depends on the provider adapter (Daytona has native snapshots; Modal uses volume snapshots).
2) Rehydration — The next call brings up a new container, unpacks the snapshot to restore the last state, and prompts the model to "continue from the last point." The model rediscovers where it left off through its memory.
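The two stages reduce to a serialize/restore round trip. The sketch below is illustrative plain Python (the Workspace class is hypothetical; real adapters snapshot at the provider level, not via json.dumps):

```python
import json

class Workspace:
    def __init__(self, state=None):
        self.state = state if state is not None else {"files": {}, "memory": []}

    def snapshot(self) -> str:
        # Stage 1: serialize workspace files + memory at a checkpoint.
        return json.dumps(self.state)

    @classmethod
    def rehydrate(cls, blob: str) -> "Workspace":
        # Stage 2: a fresh container restores the last state and resumes.
        return cls(json.loads(blob))

ws = Workspace()
ws.state["files"]["out.txt"] = "step 1 done"
ws.state["memory"].append("finished step 1")
blob = ws.snapshot()                # the container may die after this point
fresh = Workspace.rehydrate(blob)   # new container, same state
```

The fresh workspace is a different container with identical state, which is exactly what makes the expired container a non-event.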
This pattern removes the old constraint that "an hour-plus migration must fit inside a one-hour container." After 0.14 lands, ManoIT pilot-ran a three-hour monorepo migration twice; both runs survived mid-flight container expirations and converged to the same result.
6. MCP as First-Class — Externalizing the Tool Graph
The quietest but most consequential change in 0.14 is the MCP surface. MCP used to be an optional dependency wrapped by the SDK separately. In 0.14 it's built-in tooling — function tools and MCP tools are exposed to the model under the same tool-tree shape. Two things become natural.
First, externalizing the tools. Integration code for internal systems (Jira, Notion, GitHub) no longer lives in the SDK codebase — it lives in an MCP server that Claude Code, Cursor, and OpenAI agents can share simultaneously. Second, auth separation. OAuth/session handling moves into the MCP server, so secrets don't enter the agent code. In 0.14 the SDK passes the MCP server's tool spec through to the model unmodified and just relays the result.
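The "same tool-tree shape" idea can be shown with a small normalizer. The dict shapes below are illustrative, not the real MCP wire schema; the point is that the model sees one flat tool list regardless of where each tool lives:

```python
def unify_tools(function_tools: list[dict], mcp_servers: list[dict]) -> list[dict]:
    """Normalize local function tools and MCP-advertised tools into one list."""
    tools = [{"source": "function", **t} for t in function_tools]
    for server in mcp_servers:
        for t in server["tools"]:
            # MCP tools are tagged by server but otherwise indistinguishable.
            tools.append({"source": f"mcp:{server['name']}", **t})
    return tools

tools = unify_tools(
    [{"name": "grep_workspace"}],
    [{"name": "github", "tools": [{"name": "open_pr"}, {"name": "comment"}]}],
)
```

Auth never appears in this structure: OAuth and session handling stay on the MCP server side, so the tool list the model sees is credential-free by construction.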
7. ManoIT 4-Week Migration Checklist
The standard 4-week schedule ManoIT validated while moving internal RAG and DevSecOps automations to 0.14. The goal was framed not as a lift-and-shift but as moving without breaking operational SLOs.
| Week | Goal | Deliverables | Rollback if failed |
|---|---|---|---|
| Week 1 | Local devloop-fast | UnixLocalSandboxClient standard, shell/patch policy, AGENTS.md template | Revert to SDK 0.13 immediately |
| Week 2 | BYO Sandbox adapter eval | Vercel/Daytona/Modal bench, cost/latency comparison | No external adapter, in-house Docker |
| Week 3 | Subagent pilot | Bulk PR reviews, multi-repo analysis, parent summarization policy | Downgrade to serial processing |
| Week 4 | MCP externalization + checkpoint | Notion/GitHub MCP servers split out, 3-hour migration rehearsal | Disable MCP, revert to function tools temporarily |
The biggest snag in the field was AGENTS.md standardization in Week 1. Some repos already carried CLAUDE.md/CURSOR.md, and conflicts emerged. ManoIT designated AGENTS.md as the single authoritative source and demoted CLAUDE.md/CURSOR.md to auto-generated files. A Git hook regenerates them whenever AGENTS.md changes.
8. Security & Governance Signals — Lateral Movement, Injection, Audit
The 0.14 isolation model knocks down two threats at once. Lateral movement dies because compute can't see credentials. Prompt injection still happens, but even when it succeeds, the model's reach is "shell/patch inside the sandbox" — blast radius shrinks. Two responsibilities remain on the operator.
| Risk | 0.14's response | Operator responsibility |
|---|---|---|
| Outbound egress abuse | Delegated to sandbox policy | Enforce egress allowlists at the provider |
| Sandbox side channels | Workspace isolation | Encrypt snapshot stores, separate keys |
| MCP server over-permission | Pass tool spec as-is | Tighten RBAC/scope at the MCP server |
| Audit trails | Harness logs | Collect both control and compute via OpenTelemetry |
| Cost runaway | Snapshot / harness call counters | Bill sandbox vs model costs separately |
The last two rows are particularly hard to defer under Korean enterprise audit norms. ManoIT keeps harness calls billed against OpenAI and sandbox execution billed against the BYO provider, with alerts that fire if both curves spike together.
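The dual-curve alert reduces to comparing each billing series' latest point against its trailing average. This is a hypothetical sketch of that rule; the 2× ratio is an arbitrary example, not ManoIT's actual threshold:

```python
def dual_spike_alert(harness_costs: list[float],
                     sandbox_costs: list[float],
                     ratio: float = 2.0) -> bool:
    """Fire only when BOTH billing curves jump versus their trailing average."""
    def spiked(series: list[float]) -> bool:
        if len(series) < 2:
            return False
        baseline = sum(series[:-1]) / len(series[:-1])
        return series[-1] > ratio * baseline
    return spiked(harness_costs) and spiked(sandbox_costs)

# Both curves spike together -> likely a runaway agent loop, alert fires.
print(dual_spike_alert([1, 1, 1, 5], [2, 2, 2, 9]))
```

Requiring both curves to move at once filters out benign one-sided bumps (a pricey model call, a long but idle sandbox) and catches the loop pattern where every model call also burns compute.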
9. Closing — "Agent Infrastructure" Now Has a Standardized Surface
2025 was the year of "which prompt is better." 2026 shifts the competition to "who has the safer, longer-running agent infrastructure." OpenAI Agents SDK 0.14 pulls that surface inside the SDK and standardizes it — harness, sandbox, subagent, checkpoint, and MCP now share names and shapes inside the same SDK. The implication is simple: the cost of running an agent runtime in-house drops by roughly an order of magnitude. The gap that's left to fill is domain knowledge and policy — how to author AGENTS.md, what RBAC to wire into MCP servers, what work to split across subagents. That's now the real differentiator. 0.14 is the first minor release that codifies the proposition "an agent is not a model — it's infrastructure."
This article was produced by ManoIT's automated blog pipeline, cross-checked against the OpenAI Agents SDK 0.14 release notes, the OpenAI blog (2026-04-15), TechCrunch / Help Net Security / Dataconomy coverage, the GitHub openai-agents-python v0.14.0 release tag and examples/sandbox/unix_local_runner.py, and the OpenAI Developers Sandbox Agents guide. Written by Anthropic Claude (Opus); edited and technically reviewed by ManoIT.
© 2026 ManoIT — manoit.co.kr
Originally published at ManoIT Tech Blog.