
daniel jeong

Posted on • Originally published at manoit.co.kr

OpenAI Agents SDK 0.14: Sandbox Agents, Model-Native Harness, Subagents, Codex-Style Filesystem Tools

OpenAI Agents SDK 0.14 Deep Dive — Sandbox Agents, Model-Native Harness, Subagents, and Codex-Style Filesystem Tools Redefining the 2026 Agent Infrastructure Standard

On April 15, 2026, OpenAI shipped Agents SDK 0.14. It's a minor release on paper, but it changes the default shape of agent runtimes. The release lands on three pillars: (1) a Model-Native Harness where the model directly drives files, shells, and patches; (2) native Sandbox Agents that execute model-generated code in an isolated compute layer; and (3) the Subagent pattern that lets one parent run many children in parallel.

Stacked on top are Codex-style filesystem primitives (apply_patch, shell), the Skills primitive for progressive context disclosure, and AGENTS.md as a repo-pinned agent persona. Seven external sandbox adapters ship at launch — Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel — while locally you can run a fast Docker-free loop with UnixLocalSandboxClient. Model Context Protocol (MCP) is no longer an optional integration; it's now first-class built-in tooling. And long-horizon work is finally durable: the SDK persists state via checkpoint, snapshot, and rehydration, so a dying or expired container is no longer a session killer.

This article maps the 0.14 surface area across six axes, then shares the 4-week migration checklist ManoIT validated while moving internal RAG and DevSecOps automations onto the new runtime.

1. Why 0.14 Is the Inflection Point — From "Prompt Bundle" to "Runtime"

The original Agents SDK from spring 2025 was essentially a function-calling loop wrapped in handoffs and guardrails. You declared an Agent, attached tools, and let the model pick functions. Two things broke in production. First, when the model produced code, there was no safe place to actually run it. Second, anything longer than an hour was exposed to container restarts and session timeouts. 0.14 pulls both fixes inside the SDK. It's no longer a "prompt + tools" bundle — it's the infrastructure surface for executing agents.

The table below collapses the 0.13.x → 0.14.0 operating-surface delta into a single page.

| Axis | 0.13.x and earlier | 0.14.0 (2026-04-15) | Operational signal |
| --- | --- | --- | --- |
| Execution model | Function-call loop (Agent + tools) | Model-Native Harness (memory + filesystem + shell) | Model directly drives the OS surface |
| Code execution | Operator-run external containers | Native Sandbox Agents (new in v0.14.0) | SDK standardizes the compute layer |
| Credentials | Co-located with the model env | Control harness ↔ compute separation | Blocks lateral movement from injection |
| Filesystem tools | Custom functions per project | apply_patch + shell + AGENTS.md (Codex-style) | Patch-level edits, unified shell |
| Long-horizon tasks | Restart from scratch on expiry | Checkpoint, snapshot, rehydration | Resume from last point |
| Parallelism | One serial primary agent | Subagents routed across sandboxes | Concurrent per-container execution |
| External sandboxes | None (DIY integration) | Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel | BYO Sandbox abstraction |
| MCP | Optional dependency | First-class built-in tooling | Tool graph fully externalized |
| Local loop | Docker + external runner required | UnixLocalSandboxClient (no Docker) | Faster local dev |
| Package | `pip install openai-agents` | extras: `[docker]`, `[voice]`, `[redis]` | Modular install surface |
| Languages | Python + TypeScript | 0.14 new pieces: Python first, TS later | Sandbox/harness Python-led |

The two rows that carry the most operational weight are "Credentials" and "External sandboxes." As more agents run longer than an hour, the old shape — the model's generated code sharing a process with your API keys — stops being acceptable. 0.14 severs that link at the SDK level. The control harness (control plane) holds credentials; model-generated code only runs in a separate sandbox (compute plane). Even if prompt injection succeeds, lateral movement to the rest of the corporate network is cut off.
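The control/compute split above can be modeled in a few lines of plain Python. This is an illustrative sketch, not the SDK's implementation: a hypothetical `ControlHarness` keeps credentials in its own process, while a hypothetical `Sandbox` runs generated code in a subprocess whose environment is reduced to an explicit allowlist, so an injected payload that goes looking for keys finds nothing.

```python
import os
import subprocess
import sys
import tempfile


class Sandbox:
    """Compute plane: runs model-generated code with a scrubbed environment."""

    def run(self, code: str) -> str:
        with tempfile.TemporaryDirectory() as workdir:
            # Only an explicit allowlist of env vars crosses the boundary.
            clean_env = {"PATH": os.environ.get("PATH", "")}
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir, env=clean_env,
                capture_output=True, text=True, timeout=30,
            )
        return proc.stdout


class ControlHarness:
    """Control plane: holds credentials; they never reach the sandbox."""

    def __init__(self, api_key: str):
        self._api_key = api_key  # stays in this process only
        self._sandbox = Sandbox()

    def execute(self, generated_code: str) -> str:
        return self._sandbox.run(generated_code)


# A credential exists in the control plane's environment...
os.environ["OPENAI_API_KEY"] = "sk-demo-not-real"
harness = ControlHarness(api_key="sk-demo-not-real")

# ...but injected code that tries to read it sees a scrubbed environment.
leak_attempt = "import os; print(os.environ.get('OPENAI_API_KEY', 'NOT_FOUND'))"
print(harness.execute(leak_attempt).strip())  # prints NOT_FOUND
```

The point of the sketch is the direction of data flow: code moves down into the compute plane, results move up, and secrets never move at all.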

2. The 0.14 Runtime Architecture — Six Axes

The 0.14 runtime decomposes cleanly into six axes. Swapping external adapters does not change what the axes mean.

| Axis | Owner | Manifestation | Boundary |
| --- | --- | --- | --- |
| ① Harness | OpenAI SDK | memory + filesystem orchestrator | Holds credentials |
| ② Model | OpenAI / 3rd-party | Decision-maker for tool calls | No sandbox access |
| ③ Sandbox Client | SDK adapter | UnixLocal / BYO / 7 external | Delegates code execution |
| ④ Compute | Sandbox provider | files + shell + packages | Network/scope controls |
| ⑤ MCP Servers | Standard external tools | Tools as a service | OAuth/auth delegated |
| ⑥ Persistence | SDK + Provider | checkpoint/snapshot | Guarantees resume |

The most counter-intuitive separation is between ② and ④. "The model decides; the code runs elsewhere." Through 0.13 it was common for shell commands returned by the model to be executed in the same process. In 0.14, the model has no direct access to compute. The model asks the harness to "apply this patch and run this shell"; the harness routes that through the sandbox client to the compute layer.
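That ②/④ separation reduces to a routing table. The sketch below is a hypothetical miniature (the class names `ToolCall`, `SandboxClient`, and `Harness` are invented for illustration, not SDK API): the model only ever emits a structured request, and the harness is the sole component that maps it onto the compute side.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    """What the model is allowed to produce: a request, never an execution."""
    tool: str      # e.g. "shell" or "apply_patch"
    payload: str


class SandboxClient:
    """Compute side: the only place tool payloads actually run."""

    def __init__(self):
        self.log: list[str] = []

    def shell(self, cmd: str) -> str:
        self.log.append(f"shell:{cmd}")
        return f"$ {cmd}\nok"

    def apply_patch(self, patch: str) -> str:
        self.log.append(f"patch:{len(patch)} chars")
        return "patch applied"


class Harness:
    """Routes model requests to compute; the model never holds a client handle."""

    def __init__(self, client: SandboxClient):
        self._routes: dict[str, Callable[[str], str]] = {
            "shell": client.shell,
            "apply_patch": client.apply_patch,
        }

    def dispatch(self, call: ToolCall) -> str:
        if call.tool not in self._routes:
            raise ValueError(f"unknown tool: {call.tool}")
        return self._routes[call.tool](call.payload)


client = SandboxClient()
harness = Harness(client)
print(harness.dispatch(ToolCall("shell", "ls -la")))
```

Because the routing table is the only bridge, disabling a tool for a given agent is a one-line policy change rather than a prompt-engineering exercise.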

2.1 The Smallest Local Loop — UnixLocalSandboxClient

The first thing to reach for when 0.14 lands is UnixLocalSandboxClient. It uses a host temp directory as the sandbox workspace, with no Docker required, which makes it the fastest local dev mode.

```bash
# venv + install (Python 3.10+)
python -m venv .venv && source .venv/bin/activate

# Base install
pip install openai-agents

# Docker-backed sandbox
pip install "openai-agents[docker]"

# Voice / Redis session
pip install "openai-agents[voice]"
pip install "openai-agents[redis]"
```
```python
# examples/sandbox/unix_local_runner.py (0.14.0 idiomatic form)
from agents import Runner
from agents.sandbox import SandboxAgent, Manifest, UnixLocalSandboxClient

# 1) Manifest: declare initial workspace state
manifest = Manifest.from_dict({
    "files": {
        "README.md": "# ManoIT Auto Blog\nLocal sandbox demo.\n",
    },
})

# 2) Sandbox agent: model directly drives shell/patch/skills
agent = SandboxAgent(
    name="repo-inspector",
    instructions=(
        "Inspect the code/docs in this workspace. "
        "Make any changes via apply_patch only."
    ),
    default_manifest=manifest,
)

# 3) Sync run with the local Unix client
result = Runner.run_sync(
    agent,
    "Append the line 'Review complete' (in Korean) to README.md.",
    sandbox=UnixLocalSandboxClient(),
)
print(result.final_output)
```

Two operational nuances. First, the Manifest declaratively pins what state the sandbox starts in. What a CI container's Dockerfile used to do is now the SDK's responsibility. Second, Runner.run_sync owns the container's lifetime — workspaces are torn down post-run, and the next call starts in a fresh one. ManoIT calls this loop devloop-fast and policy-mandates that all pre-CI agent debugging happens here.

2.2 External Sandboxes — BYO Sandbox × 7

Production moves to external adapters. The same sandbox= interface accepts seven providers at launch.

| Provider | Execution shape | Strength | Operational signal |
| --- | --- | --- | --- |
| Blaxel | Micro VMs | First-class SDK tutorials | VPC controls early |
| Cloudflare | Workers Sandbox | Edge-adjacent, ms cold start | Auto regional routing |
| Daytona | SDK standard workspaces | Snapshot-friendly | Recommended for long-horizon |
| E2B | Micro VMs | Code execution focused | Strong observability/logs |
| Modal | Serverless containers | GPU routing | Pairs with model post-processing |
| Runloop | Runner pool | Benchmark-friendly | Automated scorecards |
| Vercel | Vercel Sandbox | Natural Next.js integration | Instant web demos |

"Which provider is right?" is the wrong question. The real operational splits are two: snapshot cost for long-horizon work and outbound network policy. ManoIT routes short analysis/patching to Vercel Sandbox and hour-plus builds/migrations to Daytona — the same agent moves between them by swapping sandbox=.

3. Model-Native Harness — apply_patch, shell, skills, AGENTS.md

The 0.14 harness is built on the assumption that "the model directly drives the OS surface." Four first-class primitives are now baked into the SDK — apply_patch (patch-level edits), shell (shell commands), skills (progressive context disclosure), and AGENTS.md (repo-level agent persona). All four were lifted from operational lessons inside OpenAI Codex and promoted to default SDK weapons in 0.14.

3.1 apply_patch — Goodbye to "Rewrite the Whole File"

The standard pattern up to 0.13 was "rewrite the entire file." Microedits in large files frequently produced regressions because the model would touch unrelated lines. 0.14's apply_patch only accepts patch-formatted edits. Two things improve simultaneously: token use drops (no full-file resends) and regressions drop (out-of-scope lines are forbidden). ManoIT now policy-disables shell and enables only apply_patch for code-modification agents.
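The mechanism behind "out-of-scope lines are forbidden" can be modeled in plain Python. This is a toy applier, not the SDK's actual Codex-style patch format: its one rule, that the patch context must match exactly once, is what makes edits to unrelated lines structurally impossible rather than merely discouraged.

```python
def apply_context_patch(source: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once.

    Zero matches means the model's view of the file is stale;
    multiple matches means the edit is ambiguous. Both are hard errors,
    so the only edits that land are precisely scoped ones.
    """
    count = source.count(old)
    if count == 0:
        raise ValueError("patch context not found (stale file view?)")
    if count > 1:
        raise ValueError(f"patch context ambiguous ({count} matches)")
    return source.replace(old, new, 1)


original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
patched = apply_context_patch(
    original,
    "    return a + b\n",
    "    return a + b  # checked\n",
)
```

Note how the token argument falls out for free: the model transmits only `old` and `new`, never the whole file.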

3.2 Skills — "Don't Hand Over Everything at Once"

Skills implement progressive context disclosure. The agent unfolds a skill's instructions only when it actually needs that domain knowledge mid-task. Instead of dropping 50KB into the system prompt at startup, you reveal 1–3KB on demand. In a world where 200K context windows are common, this turns out to be a quietly effective cost-reduction trick.
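Progressive disclosure is easy to sketch without any SDK at all. The registry below is hypothetical (the `Skill`/`SkillRegistry` names are invented for illustration): the system prompt carries one-line summaries by default, and a skill's full body is spliced in only after the agent signals it needs that domain.

```python
class Skill:
    def __init__(self, name: str, summary: str, body: str):
        self.name = name        # handle the agent asks for
        self.summary = summary  # ~1 line, always in the prompt
        self.body = body        # full instructions, revealed on demand


class SkillRegistry:
    """Prompt starts compact; bodies unfold only when a task needs them."""

    def __init__(self, skills: list[Skill]):
        self._skills = {s.name: s for s in skills}
        self._expanded: set[str] = set()

    def expand(self, name: str) -> None:
        self._expanded.add(name)

    def system_prompt(self) -> str:
        parts = []
        for s in self._skills.values():
            if s.name in self._expanded:
                parts.append(s.body)
            else:
                parts.append(f"[skill:{s.name}] {s.summary}")
        return "\n".join(parts)


registry = SkillRegistry([
    Skill("sql", "Write analytics SQL.", "SQL GUIDE:\n" + "- rule\n" * 40),
    Skill("deploy", "Deploy services.", "DEPLOY GUIDE:\n" + "- step\n" * 40),
])

compact = registry.system_prompt()   # summaries only
registry.expand("sql")               # the task turned out to need SQL
full = registry.system_prompt()      # sql body + deploy summary
```

Even in this toy, the compact prompt is a fraction of the expanded one; at 50KB-per-skill scale the difference is the cost-reduction trick the section describes.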

3.3 AGENTS.md — The Repo Carries Its Own Agent Spec

If every agent entering a repository must follow the same rules (language, conventions, prohibitions), eventually the repository has to carry that spec itself. The 0.14 harness auto-reads AGENTS.md from the workspace root and injects it into the model. It's likely to become the unifying standard that tidies up the era of per-tool files like CLAUDE.md, CURSOR.md, and .cursorrules. ManoIT's internal template:

```markdown
# AGENTS.md | Project: ManoIT Auto Blog | v1.0

## Identity
- Language: Korean (comments/docs), English (code/variables)
- Convention: snake_case (variables), PascalCase (classes)

## Tools Allowed
- apply_patch: yes
- shell: limited (no curl/wget to external)
- mcp:notion: read-only
- mcp:github: PR open + comment

## Forbidden
- Modify .env, secrets/*, deploy/*
- Direct egress except via approved MCP servers
- Force-push or rewrite git history

## Output
- All Korean prose with English technical terms in parentheses
- Code comments in Korean
- API response envelope: {success, data, error, meta}
```
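The auto-read-and-inject behavior described above is simple enough to mimic for tooling of your own. The helper below is a hypothetical sketch (the function names are invented, and the `<repo-spec>` wrapper is an assumption, not the harness's real injection format): read AGENTS.md from the workspace root if it exists, and prepend nothing otherwise.

```python
import tempfile
from pathlib import Path
from typing import Optional


def load_agent_spec(workspace: Path) -> Optional[str]:
    """Read AGENTS.md from the workspace root, if present."""
    spec = workspace / "AGENTS.md"
    return spec.read_text(encoding="utf-8") if spec.is_file() else None


def build_system_prompt(base: str, workspace: Path) -> str:
    """Inject the repo-pinned spec into the system prompt when it exists."""
    spec = load_agent_spec(workspace)
    if spec is None:
        return base
    return f"{base}\n\n<repo-spec>\n{spec}\n</repo-spec>"


# Demo against a throwaway workspace:
root = Path(tempfile.mkdtemp())
(root / "AGENTS.md").write_text(
    "## Tools Allowed\n- apply_patch: yes\n", encoding="utf-8"
)
prompt = build_system_prompt("You are a repo agent.", root)
```

The useful property is that the prompt logic has no per-repo branches: the repository itself decides what gets injected.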

4. Subagents — One Harness, Many Containers

The Subagent pattern in 0.14 is the model where "one agent spins up many children in parallel." Each child runs in an independent sandbox, and the parent harness aggregates results. The point is isolation — files produced by child A do not pollute child B's workspace. That single fact materially improves long-horizon automation reliability.

| Pattern | 0.13-and-earlier ops | 0.14 Subagents | Outcome |
| --- | --- | --- | --- |
| Multi-repo analysis | Serial loop | One child per repo, parallel | n×1 → 1× (within limits) |
| Bulk PR reviews | Queue processing | One child per PR, parent aggregation | Report consistency ↑ |
| Long builds | One container accumulating state | Child-level build+test, parent gate | Failure isolation |
| Large datasets | Hand-rolled sharding | One shard per sandbox, parent merge | Less code |

Watch the model-call volume. As children multiply, calls multiply too. 0.14 doesn't force a summarization step when the parent aggregates, but ManoIT requires the parent to run a single summarization pass once child output tokens cross a threshold. Cost scales with child count, but context pollution stays bounded.
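ManoIT's threshold rule can be sketched as a small fan-out/aggregate loop. Everything here is a stand-in: `child_review` plays the role of a subagent in its own sandbox, `summarize` plays the parent's single summarization pass, and the character threshold stands in for a token budget.

```python
from concurrent.futures import ThreadPoolExecutor

SUMMARY_THRESHOLD = 200  # chars here; a token budget in practice


def child_review(repo: str) -> str:
    """Stand-in for one subagent running in its own isolated sandbox."""
    return f"[{repo}] no blocking issues found"


def summarize(reports: list[str]) -> str:
    """Stand-in for the parent's single summarization pass."""
    return f"{len(reports)} repos reviewed; see per-repo reports"


def parent_aggregate(repos: list[str]) -> str:
    """Fan out one child per repo, then bound the context that flows up."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        reports = list(pool.map(child_review, repos))
    combined = "\n".join(reports)
    if len(combined) > SUMMARY_THRESHOLD:
        return summarize(reports)   # threshold crossed: compress once
    return combined                 # small enough: pass through verbatim


small = parent_aggregate(["repo-a"])
big = parent_aggregate([f"repo-{i}" for i in range(20)])
```

The shape matches the trade-off in the text: child count drives cost linearly, but the parent's context stays bounded by a single compression step.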

5. Checkpoint · Snapshot · Rehydration — Containers That Are Allowed to Die

This is the first place the SDK itself guarantees durability. When a sandbox container expires or crashes, the harness spins up a new one and restores state from the last checkpoint to resume. Two stages happen in practice.

1) Snapshot — The harness snapshots the workspace, memory, and conversation state at intervals. Where it's stored depends on the provider adapter (Daytona has native snapshots; Modal uses volume snapshots).

2) Rehydration — The next call brings up a new container, unpacks the snapshot to restore the last state, and prompts the model to "continue from the last point." The model rediscovers where it left off through its memory.

This pattern removes the old constraint that "an hour-plus migration must fit inside a one-hour container." After 0.14 lands, ManoIT pilot-ran a three-hour monorepo migration twice; both runs survived mid-flight container expirations and converged to the same result.
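The snapshot/rehydrate cycle is worth seeing as code, even in miniature. This sketch is not the SDK's persistence layer: it is a hypothetical checkpoint that serializes workspace files plus conversation memory to one JSON file, simulates the container dying, and restores the exact state into a "fresh" process.

```python
import json
import tempfile
from pathlib import Path


def snapshot(files: dict, memory: list, store: Path) -> None:
    """Persist workspace files + conversation memory as one checkpoint."""
    store.write_text(json.dumps({"files": files, "memory": memory}))


def rehydrate(store: Path):
    """A 'new container' restores the last state from the checkpoint."""
    state = json.loads(store.read_text())
    return state["files"], state["memory"]


store = Path(tempfile.mkdtemp()) / "checkpoint.json"

# Mid-task state at snapshot time:
files = {"migration.log": "step 42 done"}
memory = ["user: migrate repo", "agent: completed step 42"]
snapshot(files, memory, store)

# The container expires; in-process state is gone:
files, memory = None, None

# Rehydration into a fresh container recovers the last point:
files, memory = rehydrate(store)
```

Real adapters replace the JSON file with provider-native snapshots, but the contract is the same: everything the next container needs must be in the checkpoint, because nothing else survives.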

6. MCP as First-Class — Externalizing the Tool Graph

The quietest but most consequential change in 0.14 is the MCP surface. MCP used to be an optional dependency wrapped by the SDK separately. In 0.14 it's built-in tooling — function tools and MCP tools are exposed to the model under the same tool-tree shape. Two things become natural.

First, externalizing the tools. Integration code for internal systems (Jira, Notion, GitHub) no longer lives in the SDK codebase — it lives in an MCP server that Claude Code, Cursor, and OpenAI agents can share simultaneously. Second, auth separation. OAuth/session handling moves into the MCP server, so secrets don't enter the agent code. In 0.14 the SDK passes the MCP server's tool spec through to the model unmodified and just relays the result.
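"Same tool-tree shape" has a concrete consequence: merging is trivial and collisions become the only failure mode. The sketch below is hypothetical (the `as_tool_spec`/`tool_tree` helpers are invented for illustration, not the SDK's tool-registration API): local function tools and MCP-served tools flatten into one list, with origin surviving only as metadata.

```python
def as_tool_spec(name: str, description: str, source: str) -> dict:
    """One uniform spec shape regardless of where the tool lives."""
    return {"name": name, "description": description, "source": source}


function_tools = [
    as_tool_spec("search_docs", "Search internal docs", "function"),
]
mcp_tools = [
    as_tool_spec("github.open_pr", "Open a pull request", "mcp"),
    as_tool_spec("notion.read_page", "Read a Notion page", "mcp"),
]


def tool_tree(*groups: list) -> list:
    """Flatten every source into the single list the model sees.

    Origin is metadata only; a name collision across sources is the
    one thing that must be rejected up front.
    """
    merged = [tool for group in groups for tool in group]
    names = [tool["name"] for tool in merged]
    if len(names) != len(set(names)):
        raise ValueError("tool name collision across sources")
    return sorted(merged, key=lambda tool: tool["name"])


tree = tool_tree(function_tools, mcp_tools)
```

From the model's side there is no "MCP call" versus "function call"; there is only a tool name, which is exactly what lets auth and integration code move out of the agent codebase.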

7. ManoIT 4-Week Migration Checklist

The standard 4-week schedule ManoIT validated moving internal RAG and DevSecOps automations to 0.14. Framed not as "we lifted it over" but as moving without breaking operational SLOs.

| Week | Goal | Deliverables | Rollback if failed |
| --- | --- | --- | --- |
| Week 1 | Local devloop-fast | UnixLocalSandboxClient standard, shell/patch policy, AGENTS.md template | Revert to SDK 0.13 immediately |
| Week 2 | BYO Sandbox adapter eval | Vercel/Daytona/Modal bench, cost/latency comparison | No external adapter, in-house Docker |
| Week 3 | Subagent pilot | Bulk PR reviews, multi-repo analysis, parent summarization policy | Downgrade to serial processing |
| Week 4 | MCP externalization + checkpoint | Notion/GitHub MCP servers split out, 3-hour migration rehearsal | Disable MCP, revert to function tools temporarily |

The biggest snag in the field was AGENTS.md standardization in Week 1. Some repos already carried CLAUDE.md/CURSOR.md, and conflicts emerged. ManoIT designated AGENTS.md as the single authoritative source and demoted CLAUDE.md/CURSOR.md to auto-generated files. A Git hook regenerates them whenever AGENTS.md changes.
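The regeneration step of that hook is a few lines. This is a hypothetical sketch of the idea, not ManoIT's actual hook: mirror AGENTS.md into the per-tool files with an auto-generated banner so nobody edits the demoted copies by hand.

```python
import tempfile
from pathlib import Path

BANNER = "<!-- AUTO-GENERATED from AGENTS.md; edit AGENTS.md instead. -->\n\n"
DERIVED = ("CLAUDE.md", "CURSOR.md")


def regenerate(root: Path) -> list:
    """Mirror the authoritative AGENTS.md into each demoted per-tool file."""
    spec = (root / "AGENTS.md").read_text(encoding="utf-8")
    written = []
    for name in DERIVED:
        (root / name).write_text(BANNER + spec, encoding="utf-8")
        written.append(name)
    return written


# Demo against a throwaway repo root:
root = Path(tempfile.mkdtemp())
(root / "AGENTS.md").write_text("# AGENTS.md\n- apply_patch: yes\n", encoding="utf-8")
written = regenerate(root)
```

Wired to a pre-commit or post-merge Git hook on changes to AGENTS.md, the derived files can never drift from the single authoritative source.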

8. Security & Governance Signals — Lateral Movement, Injection, Audit

The 0.14 isolation model knocks down two threats at once. Lateral movement dies because compute can't see credentials. Prompt injection still happens, but even when it succeeds, the model's reach is "shell/patch inside the sandbox" — blast radius shrinks. Two responsibilities remain on the operator.

| Risk | 0.14's response | Operator responsibility |
| --- | --- | --- |
| Outbound egress abuse | Delegated to sandbox policy | Enforce egress allowlists at the provider |
| Sandbox side channels | Workspace isolation | Encrypt snapshot stores, separate keys |
| MCP server over-permission | Pass tool spec as-is | Tighten RBAC/scope at the MCP server |
| Audit trails | Harness logs | Collect both control and compute via OpenTelemetry |
| Cost runaway | Snapshot / harness call counters | Bill sandbox vs model costs separately |

The last two rows are particularly hard to defer under Korean enterprise audit norms. ManoIT keeps harness calls billed against OpenAI and sandbox execution billed against the BYO provider, with alerts that fire if both curves spike together.
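The "both curves spike together" alert reduces to a tiny detector. This sketch is illustrative, not ManoIT's monitoring code, and the window/factor parameters are arbitrary picks: the latest cost sample counts as a spike when it exceeds a multiple of the trailing-window mean, and the alert fires only when the harness and sandbox series spike jointly.

```python
def spikes(series: list, window: int = 3, factor: float = 2.0) -> bool:
    """True if the latest point exceeds `factor` x the prior window's mean."""
    if len(series) <= window:
        return False  # not enough history for a baseline
    baseline = sum(series[-window - 1:-1]) / window
    return baseline > 0 and series[-1] > factor * baseline


def joint_cost_alert(harness_costs: list, sandbox_costs: list) -> bool:
    """Fire only when the model bill and the sandbox bill spike together,
    which is the signature of a runaway agent rather than a busy day."""
    return spikes(harness_costs) and spikes(sandbox_costs)


# A runaway loop drives both bills up at once:
print(joint_cost_alert([1.0, 1.1, 0.9, 5.0], [2.0, 2.1, 1.9, 9.0]))
# A heavy but legitimate build moves only the sandbox bill:
print(joint_cost_alert([1.0, 1.1, 0.9, 1.0], [2.0, 2.1, 1.9, 9.0]))
```

Splitting the billing streams is what makes this detector possible at all: a single merged invoice cannot distinguish a runaway loop from ordinary load.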

9. Closing — "Agent Infrastructure" Now Has a Standardized Surface

2025 was the year of "which prompt is better." 2026 shifts the competition to "who has the safer, longer-running agent infrastructure." OpenAI Agents SDK 0.14 pulls that surface inside the SDK and standardizes it — harness, sandbox, subagent, checkpoint, and MCP now share names and shapes inside the same SDK. The implication is simple: the cost of running an agent runtime in-house drops by roughly an order of magnitude. The gap that's left to fill is domain knowledge and policy — how to author AGENTS.md, what RBAC to wire into MCP servers, what work to split across subagents. That's now the real differentiator. 0.14 is the first minor release that codifies the proposition "an agent is not a model — it's infrastructure."


This article was produced by ManoIT's automated blog pipeline, cross-checked against the OpenAI Agents SDK 0.14 release notes, the OpenAI blog (2026-04-15), TechCrunch / Help Net Security / Dataconomy coverage, the GitHub openai-agents-python v0.14.0 release tag and examples/sandbox/unix_local_runner.py, and the OpenAI Developers Sandbox Agents guide. Written by Anthropic Claude (Opus); edited and technically reviewed by ManoIT.

© 2026 ManoIT — manoit.co.kr


