
daniel jeong

Posted on • Originally published at manoit.co.kr

OpenAI Agents SDK 0.14: Sandbox Agents, Model-Native Harness, Subagents, Codex-Style Filesystem Tools

OpenAI Agents SDK 0.14 Deep Dive — Sandbox Agents, Model-Native Harness, Subagents, and Codex-Style Filesystem Tools Redefining the 2026 Agent Infrastructure Standard

On April 15, 2026, OpenAI shipped Agents SDK 0.14. It's a minor release on paper, but it changes the default shape of agent runtimes. The release lands on three pillars: (1) a Model-Native Harness where the model directly drives files, shells, and patches; (2) native Sandbox Agents that execute model-generated code in an isolated compute layer; and (3) the Subagent pattern that lets one parent run many children in parallel.

Stacked on top are Codex-style filesystem primitives (apply_patch, shell), the Skills primitive for progressive context disclosure, and AGENTS.md as a repo-pinned agent persona. Seven external sandbox adapters ship at launch — Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel — while locally you can run a fast Docker-free loop with UnixLocalSandboxClient. Model Context Protocol (MCP) is no longer an optional integration; it's now first-class built-in tooling. And long-horizon work is finally durable: the SDK persists state via checkpoint, snapshot, and rehydration, so a dying or expired container is no longer a session killer.

This article maps the 0.14 surface area across six axes, then shares the 4-week migration checklist ManoIT validated while moving internal RAG and DevSecOps automations onto the new runtime.

1. Why 0.14 Is the Inflection Point — From "Prompt Bundle" to "Runtime"

The original Agents SDK from spring 2025 was essentially a function-calling loop wrapped in handoffs and guardrails. You declared an Agent, attached tools, and let the model pick functions. Two things broke in production. First, when the model produced code, there was no safe place to actually run it. Second, anything longer than an hour was exposed to container restarts and session timeouts. 0.14 pulls both fixes inside the SDK. It's no longer a "prompt + tools" bundle — it's the infrastructure surface for executing agents.

The table below collapses the 0.13.x → 0.14.0 operating-surface delta into a single page.

| Axis | 0.13.x and earlier | 0.14.0 (2026-04-15) | Operational signal |
| --- | --- | --- | --- |
| Execution model | Function-call loop (Agent + tools) | Model-Native Harness (memory + filesystem + shell) | Model directly drives the OS surface |
| Code execution | Operator-run external containers | Native Sandbox Agents (new in v0.14.0) | SDK standardizes the compute layer |
| Credentials | Co-located with the model env | Control harness ↔ compute separation | Blocks lateral movement from injection |
| Filesystem tools | Custom functions per project | apply_patch + shell + AGENTS.md (Codex-style) | Patch-level edits, unified shell |
| Long-horizon tasks | Restart from scratch on expiry | Checkpoint, snapshot, rehydration | Resume from last point |
| Parallelism | One serial primary agent | Subagents routed across sandboxes | Concurrent per-container execution |
| External sandboxes | None (DIY integration) | Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel | BYO Sandbox abstraction |
| MCP | Optional dependency | First-class built-in tooling | Tool graph fully externalized |
| Local loop | Docker + external runner required | UnixLocalSandboxClient (no Docker) | Faster local dev |
| Package | `pip install openai-agents` | extras: `[docker]`, `[voice]`, `[redis]` | Modular install surface |
| Languages | Python + TypeScript | 0.14 new pieces: Python first, TS later | Sandbox/harness Python-led |

The two rows that carry the most operational weight are "Credentials" and "External sandboxes." As more agents run longer than an hour, the old shape — the model's generated code sharing a process with your API keys — stops being acceptable. 0.14 severs that link at the SDK level. The control harness (control plane) holds credentials; model-generated code only runs in a separate sandbox (compute plane). Even if prompt injection succeeds, lateral movement to the rest of the corporate network is cut off.
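The control/compute split above can be modeled in a few lines of plain Python. This is an illustrative sketch, not the SDK's implementation: a hypothetical `ControlHarness` keeps credentials in its own process, while a hypothetical `Sandbox` runs generated code in a subprocess whose environment is reduced to an explicit allowlist, so an injected payload that goes looking for keys finds nothing.

```python
import os
import subprocess
import sys
import tempfile


class Sandbox:
    """Compute plane: runs model-generated code with a scrubbed environment."""

    def run(self, code: str) -> str:
        with tempfile.TemporaryDirectory() as workdir:
            # Only an explicit allowlist of env vars crosses the boundary.
            clean_env = {"PATH": os.environ.get("PATH", "")}
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir, env=clean_env,
                capture_output=True, text=True, timeout=30,
            )
        return proc.stdout


class ControlHarness:
    """Control plane: holds credentials; they never reach the sandbox."""

    def __init__(self, api_key: str):
        self._api_key = api_key  # stays in this process only
        self._sandbox = Sandbox()

    def execute(self, generated_code: str) -> str:
        return self._sandbox.run(generated_code)


# A credential exists in the control plane's environment...
os.environ["OPENAI_API_KEY"] = "sk-demo-not-real"
harness = ControlHarness(api_key="sk-demo-not-real")

# ...but injected code that tries to read it sees a scrubbed environment.
leak_attempt = "import os; print(os.environ.get('OPENAI_API_KEY', 'NOT_FOUND'))"
print(harness.execute(leak_attempt).strip())  # prints NOT_FOUND
```

The point of the sketch is the direction of data flow: code moves down into the compute plane, results move up, and secrets never move at all.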

2. The 0.14 Runtime Architecture — Six Axes

The 0.14 runtime decomposes cleanly into six axes. Swapping external adapters does not change what the axes mean.

| Axis | Owner | Manifestation | Boundary |
| --- | --- | --- | --- |
| ① Harness | OpenAI SDK | memory + filesystem orchestrator | Holds credentials |
| ② Model | OpenAI / 3rd-party | Decision-maker for tool calls | No sandbox access |
| ③ Sandbox Client | SDK adapter | UnixLocal / BYO / 7 external | Delegates code execution |
| ④ Compute | Sandbox provider | files + shell + packages | Network/scope controls |
| ⑤ MCP Servers | Standard external tools | Tools as a service | OAuth/auth delegated |
| ⑥ Persistence | SDK + Provider | checkpoint/snapshot | Guarantees resume |

The most counter-intuitive separation is between ② and ④. "The model decides; the code runs elsewhere." Through 0.13 it was common for shell commands returned by the model to be executed in the same process. In 0.14, the model has no direct access to compute. The model asks the harness to "apply this patch and run this shell"; the harness routes that through the sandbox client to the compute layer.
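That ②/④ separation reduces to a routing table. The sketch below is a hypothetical miniature (the class names `ToolCall`, `SandboxClient`, and `Harness` are invented for illustration, not SDK API): the model only ever emits a structured request, and the harness is the sole component that maps it onto the compute side.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    """What the model is allowed to produce: a request, never an execution."""
    tool: str      # e.g. "shell" or "apply_patch"
    payload: str


class SandboxClient:
    """Compute side: the only place tool payloads actually run."""

    def __init__(self):
        self.log: list[str] = []

    def shell(self, cmd: str) -> str:
        self.log.append(f"shell:{cmd}")
        return f"$ {cmd}\nok"

    def apply_patch(self, patch: str) -> str:
        self.log.append(f"patch:{len(patch)} chars")
        return "patch applied"


class Harness:
    """Routes model requests to compute; the model never holds a client handle."""

    def __init__(self, client: SandboxClient):
        self._routes: dict[str, Callable[[str], str]] = {
            "shell": client.shell,
            "apply_patch": client.apply_patch,
        }

    def dispatch(self, call: ToolCall) -> str:
        if call.tool not in self._routes:
            raise ValueError(f"unknown tool: {call.tool}")
        return self._routes[call.tool](call.payload)


client = SandboxClient()
harness = Harness(client)
print(harness.dispatch(ToolCall("shell", "ls -la")))
```

Because the routing table is the only bridge, disabling a tool for a given agent is a one-line policy change rather than a prompt-engineering exercise.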

2.1 The Smallest Local Loop — UnixLocalSandboxClient

The first thing to reach for when 0.14 lands is UnixLocalSandboxClient. It uses a host temp directory as the sandbox workspace, with no Docker required, which makes it the fastest local dev mode.

```bash
# venv + install (Python 3.10+)
python -m venv .venv && source .venv/bin/activate

# Base install
pip install openai-agents

# Docker-backed sandbox
pip install "openai-agents[docker]"

# Voice / Redis session
pip install "openai-agents[voice]"
pip install "openai-agents[redis]"
```
```python
# examples/sandbox/unix_local_runner.py (0.14.0 idiomatic form)
from agents import Runner
from agents.sandbox import SandboxAgent, Manifest, UnixLocalSandboxClient

# 1) Manifest: declare initial workspace state
manifest = Manifest.from_dict({
    "files": {
        "README.md": "# ManoIT Auto Blog\nLocal sandbox demo.\n",
    },
})

# 2) Sandbox agent: model directly drives shell/patch/skills
agent = SandboxAgent(
    name="repo-inspector",
    instructions=(
        "Inspect the code/docs in this workspace. "
        "Make any changes via apply_patch only."
    ),
    default_manifest=manifest,
)

# 3) Sync run with the local Unix client
result = Runner.run_sync(
    agent,
    "Append the line 'Review complete' (in Korean) to README.md.",
    sandbox=UnixLocalSandboxClient(),
)
print(result.final_output)
```

Two operational nuances. First, the Manifest declaratively pins what state the sandbox starts in. What a CI container's Dockerfile used to do is now the SDK's responsibility. Second, Runner.run_sync owns the container's lifetime — workspaces are torn down post-run, and the next call starts in a fresh one. ManoIT calls this loop devloop-fast and policy-mandates that all pre-CI agent debugging happens here.

2.2 External Sandboxes — BYO Sandbox × 7

Production moves to external adapters. The same sandbox= interface accepts seven providers at launch.

| Provider | Execution shape | Strength | Operational signal |
| --- | --- | --- | --- |
| Blaxel | Micro VMs | First-class SDK tutorials | VPC controls early |
| Cloudflare | Workers Sandbox | Edge-adjacent, ms cold start | Auto regional routing |
| Daytona | SDK standard workspaces | Snapshot-friendly | Recommended for long-horizon |
| E2B | Micro VMs | Code execution focused | Strong observability/logs |
| Modal | Serverless containers | GPU routing | Pairs with model post-processing |
| Runloop | Runner pool | Benchmark-friendly | Automated scorecards |
| Vercel | Vercel Sandbox | Natural Next.js integration | Instant web demos |

"Which provider is right?" is the wrong question. The real operational splits are two: snapshot cost for long-horizon work and outbound network policy. ManoIT routes short analysis/patching to Vercel Sandbox and hour-plus builds/migrations to Daytona — the same agent moves between them by swapping sandbox=.

3. Model-Native Harness — apply_patch, shell, skills, AGENTS.md

The 0.14 harness is built on the assumption that "the model directly drives the OS surface." Four first-class primitives are now baked into the SDK — apply_patch (patch-level edits), shell (shell commands), skills (progressive context disclosure), and AGENTS.md (repo-level agent persona). All four were lifted from operational lessons inside OpenAI Codex and promoted to default SDK weapons in 0.14.

3.1 apply_patch — Goodbye to "Rewrite the Whole File"

The standard pattern up to 0.13 was "rewrite the entire file." Microedits in large files frequently produced regressions because the model would touch unrelated lines. 0.14's apply_patch only accepts patch-formatted edits. Two things improve simultaneously: token use drops (no full-file resends) and regressions drop (out-of-scope lines are forbidden). ManoIT now policy-disables shell and enables only apply_patch for code-modification agents.
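The mechanism behind "out-of-scope lines are forbidden" can be modeled in plain Python. This is a toy applier, not the SDK's actual Codex-style patch format: its one rule, that the patch context must match exactly once, is what makes edits to unrelated lines structurally impossible rather than merely discouraged.

```python
def apply_context_patch(source: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once.

    Zero matches means the model's view of the file is stale;
    multiple matches means the edit is ambiguous. Both are hard errors,
    so the only edits that land are precisely scoped ones.
    """
    count = source.count(old)
    if count == 0:
        raise ValueError("patch context not found (stale file view?)")
    if count > 1:
        raise ValueError(f"patch context ambiguous ({count} matches)")
    return source.replace(old, new, 1)


original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
patched = apply_context_patch(
    original,
    "    return a + b\n",
    "    return a + b  # checked\n",
)
```

Note how the token argument falls out for free: the model transmits only `old` and `new`, never the whole file.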

3.2 Skills — "Don't Hand Over Everything at Once"

Skills implement progressive context disclosure. The agent unfolds a skill's instructions only when it actually needs that domain knowledge mid-task. Instead of dropping 50KB into the system prompt at startup, you reveal 1–3KB on demand. In a world where 200K context windows are common, this turns out to be a quietly effective cost-reduction trick.
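Progressive disclosure is easy to sketch without any SDK at all. The registry below is hypothetical (the `Skill`/`SkillRegistry` names are invented for illustration): the system prompt carries one-line summaries by default, and a skill's full body is spliced in only after the agent signals it needs that domain.

```python
class Skill:
    def __init__(self, name: str, summary: str, body: str):
        self.name = name        # handle the agent asks for
        self.summary = summary  # ~1 line, always in the prompt
        self.body = body        # full instructions, revealed on demand


class SkillRegistry:
    """Prompt starts compact; bodies unfold only when a task needs them."""

    def __init__(self, skills: list[Skill]):
        self._skills = {s.name: s for s in skills}
        self._expanded: set[str] = set()

    def expand(self, name: str) -> None:
        self._expanded.add(name)

    def system_prompt(self) -> str:
        parts = []
        for s in self._skills.values():
            if s.name in self._expanded:
                parts.append(s.body)
            else:
                parts.append(f"[skill:{s.name}] {s.summary}")
        return "\n".join(parts)


registry = SkillRegistry([
    Skill("sql", "Write analytics SQL.", "SQL GUIDE:\n" + "- rule\n" * 40),
    Skill("deploy", "Deploy services.", "DEPLOY GUIDE:\n" + "- step\n" * 40),
])

compact = registry.system_prompt()   # summaries only
registry.expand("sql")               # the task turned out to need SQL
full = registry.system_prompt()      # sql body + deploy summary
```

Even in this toy, the compact prompt is a fraction of the expanded one; at 50KB-per-skill scale the difference is the cost-reduction trick the section describes.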

3.3 AGENTS.md — The Repo Carries Its Own Agent Spec

If every agent entering a repository must follow the same rules (language, conventions, prohibitions), eventually the repository has to carry that spec itself. The 0.14 harness auto-reads AGENTS.md from the workspace root and injects it into the model. It's likely to become the unifying standard that tidies up the era of per-tool files like CLAUDE.md, CURSOR.md, and .cursorrules. ManoIT's internal template:

```markdown
# AGENTS.md | Project: ManoIT Auto Blog | v1.0

## Identity
- Language: Korean (comments/docs), English (code/variables)
- Convention: snake_case (variables), PascalCase (classes)

## Tools Allowed
- apply_patch: yes
- shell: limited (no curl/wget to external)
- mcp:notion: read-only
- mcp:github: PR open + comment

## Forbidden
- Modify .env, secrets/*, deploy/*
- Direct egress except via approved MCP servers
- Force-push or rewrite git history

## Output
- All Korean prose with English technical terms in parentheses
- Code comments in Korean
- API response envelope: {success, data, error, meta}
```
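The auto-read-and-inject behavior described above is simple enough to mimic for tooling of your own. The helper below is a hypothetical sketch (the function names are invented, and the `<repo-spec>` wrapper is an assumption, not the harness's real injection format): read AGENTS.md from the workspace root if it exists, and prepend nothing otherwise.

```python
import tempfile
from pathlib import Path
from typing import Optional


def load_agent_spec(workspace: Path) -> Optional[str]:
    """Read AGENTS.md from the workspace root, if present."""
    spec = workspace / "AGENTS.md"
    return spec.read_text(encoding="utf-8") if spec.is_file() else None


def build_system_prompt(base: str, workspace: Path) -> str:
    """Inject the repo-pinned spec into the system prompt when it exists."""
    spec = load_agent_spec(workspace)
    if spec is None:
        return base
    return f"{base}\n\n<repo-spec>\n{spec}\n</repo-spec>"


# Demo against a throwaway workspace:
root = Path(tempfile.mkdtemp())
(root / "AGENTS.md").write_text(
    "## Tools Allowed\n- apply_patch: yes\n", encoding="utf-8"
)
prompt = build_system_prompt("You are a repo agent.", root)
```

The useful property is that the prompt logic has no per-repo branches: the repository itself decides what gets injected.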

4. Subagents — One Harness, Many Containers

The Subagent pattern in 0.14 is the model where "one agent spins up many children in parallel." Each child runs in an independent sandbox, and the parent harness aggregates results. The point is isolation — files produced by child A do not pollute child B's workspace. That single fact materially improves long-horizon automation reliability.

| Pattern | 0.13-and-earlier ops | 0.14 Subagents | Outcome |
| --- | --- | --- | --- |
| Multi-repo analysis | Serial loop | One child per repo, parallel | n×1 → 1× (within limits) |
| Bulk PR reviews | Queue processing | One child per PR, parent aggregation | Report consistency ↑ |
| Long builds | One container accumulating state | Child-level build+test, parent gate | Failure isolation |
| Large datasets | Hand-rolled sharding | One shard per sandbox, parent merge | Less code |

Watch the model-call volume. As children multiply, calls multiply too. 0.14 doesn't force a summarization step when the parent aggregates, but ManoIT requires the parent to run a single summarization pass once child output tokens cross a threshold. Cost scales with child count, but context pollution stays bounded.
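ManoIT's threshold rule can be sketched as a small fan-out/aggregate loop. Everything here is a stand-in: `child_review` plays the role of a subagent in its own sandbox, `summarize` plays the parent's single summarization pass, and the character threshold stands in for a token budget.

```python
from concurrent.futures import ThreadPoolExecutor

SUMMARY_THRESHOLD = 200  # chars here; a token budget in practice


def child_review(repo: str) -> str:
    """Stand-in for one subagent running in its own isolated sandbox."""
    return f"[{repo}] no blocking issues found"


def summarize(reports: list[str]) -> str:
    """Stand-in for the parent's single summarization pass."""
    return f"{len(reports)} repos reviewed; see per-repo reports"


def parent_aggregate(repos: list[str]) -> str:
    """Fan out one child per repo, then bound the context that flows up."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        reports = list(pool.map(child_review, repos))
    combined = "\n".join(reports)
    if len(combined) > SUMMARY_THRESHOLD:
        return summarize(reports)   # threshold crossed: compress once
    return combined                 # small enough: pass through verbatim


small = parent_aggregate(["repo-a"])
big = parent_aggregate([f"repo-{i}" for i in range(20)])
```

The shape matches the trade-off in the text: child count drives cost linearly, but the parent's context stays bounded by a single compression step.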

5. Checkpoint · Snapshot · Rehydration — Containers That Are Allowed to Die

This is the first place the SDK itself guarantees durability. When a sandbox container expires or crashes, the harness spins up a new one and restores state from the last checkpoint to resume. Two stages happen in practice.

1) Snapshot — The harness snapshots the workspace, memory, and conversation state at intervals. Where it's stored depends on the provider adapter (Daytona has native snapshots; Modal uses volume snapshots).

2) Rehydration — The next call brings up a new container, unpacks the snapshot to restore the last state, and prompts the model to "continue from the last point." The model rediscovers where it left off through its memory.

This pattern removes the old constraint that "an hour-plus migration must fit inside a one-hour container." After 0.14 lands, ManoIT pilot-ran a three-hour monorepo migration twice; both runs survived mid-flight container expirations and converged to the same result.
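The snapshot/rehydrate cycle is worth seeing as code, even in miniature. This sketch is not the SDK's persistence layer: it is a hypothetical checkpoint that serializes workspace files plus conversation memory to one JSON file, simulates the container dying, and restores the exact state into a "fresh" process.

```python
import json
import tempfile
from pathlib import Path


def snapshot(files: dict, memory: list, store: Path) -> None:
    """Persist workspace files + conversation memory as one checkpoint."""
    store.write_text(json.dumps({"files": files, "memory": memory}))


def rehydrate(store: Path):
    """A 'new container' restores the last state from the checkpoint."""
    state = json.loads(store.read_text())
    return state["files"], state["memory"]


store = Path(tempfile.mkdtemp()) / "checkpoint.json"

# Mid-task state at snapshot time:
files = {"migration.log": "step 42 done"}
memory = ["user: migrate repo", "agent: completed step 42"]
snapshot(files, memory, store)

# The container expires; in-process state is gone:
files, memory = None, None

# Rehydration into a fresh container recovers the last point:
files, memory = rehydrate(store)
```

Real adapters replace the JSON file with provider-native snapshots, but the contract is the same: everything the next container needs must be in the checkpoint, because nothing else survives.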

6. MCP as First-Class — Externalizing the Tool Graph

The quietest but most consequential change in 0.14 is the MCP surface. MCP used to be an optional dependency wrapped by the SDK separately. In 0.14 it's built-in tooling — function tools and MCP tools are exposed to the model under the same tool-tree shape. Two things become natural.

First, externalizing the tools. Integration code for internal systems (Jira, Notion, GitHub) no longer lives in the SDK codebase — it lives in an MCP server that Claude Code, Cursor, and OpenAI agents can share simultaneously. Second, auth separation. OAuth/session handling moves into the MCP server, so secrets don't enter the agent code. In 0.14 the SDK passes the MCP server's tool spec through to the model unmodified and just relays the result.
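"Same tool-tree shape" has a concrete consequence: merging is trivial and collisions become the only failure mode. The sketch below is hypothetical (the `as_tool_spec`/`tool_tree` helpers are invented for illustration, not the SDK's tool-registration API): local function tools and MCP-served tools flatten into one list, with origin surviving only as metadata.

```python
def as_tool_spec(name: str, description: str, source: str) -> dict:
    """One uniform spec shape regardless of where the tool lives."""
    return {"name": name, "description": description, "source": source}


function_tools = [
    as_tool_spec("search_docs", "Search internal docs", "function"),
]
mcp_tools = [
    as_tool_spec("github.open_pr", "Open a pull request", "mcp"),
    as_tool_spec("notion.read_page", "Read a Notion page", "mcp"),
]


def tool_tree(*groups: list) -> list:
    """Flatten every source into the single list the model sees.

    Origin is metadata only; a name collision across sources is the
    one thing that must be rejected up front.
    """
    merged = [tool for group in groups for tool in group]
    names = [tool["name"] for tool in merged]
    if len(names) != len(set(names)):
        raise ValueError("tool name collision across sources")
    return sorted(merged, key=lambda tool: tool["name"])


tree = tool_tree(function_tools, mcp_tools)
```

From the model's side there is no "MCP call" versus "function call"; there is only a tool name, which is exactly what lets auth and integration code move out of the agent codebase.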

7. ManoIT 4-Week Migration Checklist

The standard 4-week schedule ManoIT validated moving internal RAG and DevSecOps automations to 0.14. Framed not as "we lifted it over" but as moving without breaking operational SLOs.

| Week | Goal | Deliverables | Rollback if failed |
| --- | --- | --- | --- |
| Week 1 | Local devloop-fast | UnixLocalSandboxClient standard, shell/patch policy, AGENTS.md template | Revert to SDK 0.13 immediately |
| Week 2 | BYO Sandbox adapter eval | Vercel/Daytona/Modal bench, cost/latency comparison | No external adapter, in-house Docker |
| Week 3 | Subagent pilot | Bulk PR reviews, multi-repo analysis, parent summarization policy | Downgrade to serial processing |
| Week 4 | MCP externalization + checkpoint | Notion/GitHub MCP servers split out, 3-hour migration rehearsal | Disable MCP, revert to function tools temporarily |

The biggest snag in the field was AGENTS.md standardization in Week 1. Some repos already carried CLAUDE.md/CURSOR.md, and conflicts emerged. ManoIT designated AGENTS.md as the single authoritative source and demoted CLAUDE.md/CURSOR.md to auto-generated files. A Git hook regenerates them whenever AGENTS.md changes.
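The regeneration step of that hook is a few lines. This is a hypothetical sketch of the idea, not ManoIT's actual hook: mirror AGENTS.md into the per-tool files with an auto-generated banner so nobody edits the demoted copies by hand.

```python
import tempfile
from pathlib import Path

BANNER = "<!-- AUTO-GENERATED from AGENTS.md; edit AGENTS.md instead. -->\n\n"
DERIVED = ("CLAUDE.md", "CURSOR.md")


def regenerate(root: Path) -> list:
    """Mirror the authoritative AGENTS.md into each demoted per-tool file."""
    spec = (root / "AGENTS.md").read_text(encoding="utf-8")
    written = []
    for name in DERIVED:
        (root / name).write_text(BANNER + spec, encoding="utf-8")
        written.append(name)
    return written


# Demo against a throwaway repo root:
root = Path(tempfile.mkdtemp())
(root / "AGENTS.md").write_text("# AGENTS.md\n- apply_patch: yes\n", encoding="utf-8")
written = regenerate(root)
```

Wired to a pre-commit or post-merge Git hook on changes to AGENTS.md, the derived files can never drift from the single authoritative source.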

8. Security & Governance Signals — Lateral Movement, Injection, Audit

The 0.14 isolation model knocks down two threats at once. Lateral movement dies because compute can't see credentials. Prompt injection still happens, but even when it succeeds, the model's reach is "shell/patch inside the sandbox" — blast radius shrinks. Two responsibilities remain on the operator.

| Risk | 0.14's response | Operator responsibility |
| --- | --- | --- |
| Outbound egress abuse | Delegated to sandbox policy | Enforce egress allowlists at the provider |
| Sandbox side channels | Workspace isolation | Encrypt snapshot stores, separate keys |
| MCP server over-permission | Pass tool spec as-is | Tighten RBAC/scope at the MCP server |
| Audit trails | Harness logs | Collect both control and compute via OpenTelemetry |
| Cost runaway | Snapshot / harness call counters | Bill sandbox vs model costs separately |

The last two rows are particularly hard to defer under Korean enterprise audit norms. ManoIT keeps harness calls billed against OpenAI and sandbox execution billed against the BYO provider, with alerts that fire if both curves spike together.
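The "both curves spike together" alert reduces to a tiny detector. This sketch is illustrative, not ManoIT's monitoring code, and the window/factor parameters are arbitrary picks: the latest cost sample counts as a spike when it exceeds a multiple of the trailing-window mean, and the alert fires only when the harness and sandbox series spike jointly.

```python
def spikes(series: list, window: int = 3, factor: float = 2.0) -> bool:
    """True if the latest point exceeds `factor` x the prior window's mean."""
    if len(series) <= window:
        return False  # not enough history for a baseline
    baseline = sum(series[-window - 1:-1]) / window
    return baseline > 0 and series[-1] > factor * baseline


def joint_cost_alert(harness_costs: list, sandbox_costs: list) -> bool:
    """Fire only when the model bill and the sandbox bill spike together,
    which is the signature of a runaway agent rather than a busy day."""
    return spikes(harness_costs) and spikes(sandbox_costs)


# A runaway loop drives both bills up at once:
print(joint_cost_alert([1.0, 1.1, 0.9, 5.0], [2.0, 2.1, 1.9, 9.0]))
# A heavy but legitimate build moves only the sandbox bill:
print(joint_cost_alert([1.0, 1.1, 0.9, 1.0], [2.0, 2.1, 1.9, 9.0]))
```

Splitting the billing streams is what makes this detector possible at all: a single merged invoice cannot distinguish a runaway loop from ordinary load.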

9. Closing — "Agent Infrastructure" Now Has a Standardized Surface

2025 was the year of "which prompt is better." 2026 shifts the competition to "who has the safer, longer-running agent infrastructure." OpenAI Agents SDK 0.14 pulls that surface inside the SDK and standardizes it — harness, sandbox, subagent, checkpoint, and MCP now share names and shapes inside the same SDK. The implication is simple: the cost of running an agent runtime in-house drops by roughly an order of magnitude. The gap that's left to fill is domain knowledge and policy — how to author AGENTS.md, what RBAC to wire into MCP servers, what work to split across subagents. That's now the real differentiator. 0.14 is the first minor release that codifies the proposition "an agent is not a model — it's infrastructure."


This article was produced by ManoIT's automated blog pipeline, cross-checked against the OpenAI Agents SDK 0.14 release notes, the OpenAI blog (2026-04-15), TechCrunch / Help Net Security / Dataconomy coverage, the GitHub openai-agents-python v0.14.0 release tag and examples/sandbox/unix_local_runner.py, and the OpenAI Developers Sandbox Agents guide. Written by Anthropic Claude (Opus); edited and technically reviewed by ManoIT.

© 2026 ManoIT — manoit.co.kr


