On April 15, 2026, OpenAI announced the next evolution of the Agents SDK and shipped openai-agents-python v0.14.0/v0.14.1 with two headline features: Sandbox Agents and a model-native harness. Both are now GA for all API customers.
The one-sentence read: the default shape of "an AI agent" just moved from a chatbot with tool calls to a coworker with a computer.
The three trade-offs OpenAI is trying to escape
OpenAI frames the existing landscape as three approaches, each with a trade-off:
- Model-agnostic frameworks (LangChain, LlamaIndex, etc.) — flexible across models, but can't fully exploit frontier model capabilities. There's always a gap between "what the latest model can do" and "what the framework surfaces."
- Provider SDKs — closest to the model, but thin on harness observability. Hard to see what a production agent is actually doing.
- Managed agent APIs — easy to deploy, but constrain execution location and data access.
The response: split harness and compute. The harness owns the loop, memory, orchestration, observability. The sandbox owns execution, filesystem, network, tool calls. They run in different isolation domains.
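The split can be pictured with a toy sketch. Everything here (the `Harness` and `Sandbox` names, the command protocol) is illustrative, not the SDK's real API; the point is only that the loop, memory, and observability live on one side of the boundary while execution and the filesystem live on the other.

```python
from dataclasses import dataclass, field

class Sandbox:
    """Isolated execution domain: runs commands, holds a filesystem."""
    def __init__(self):
        self.files: dict[str, str] = {}

    def run(self, command: str, payload: str) -> str:
        if command == "write":
            path, _, content = payload.partition(":")
            self.files[path] = content
            return f"wrote {path}"
        if command == "read":
            return self.files.get(payload, "")
        raise ValueError(f"unknown command: {command}")

@dataclass
class Harness:
    """Owns orchestration, memory, observability. Never executes code itself."""
    sandbox: Sandbox
    memory: list[str] = field(default_factory=list)

    def step(self, command: str, payload: str) -> str:
        result = self.sandbox.run(command, payload)   # crosses the isolation boundary
        self.memory.append(f"{command} -> {result}")  # observability lives harness-side
        return result

harness = Harness(Sandbox())
harness.step("write", "output/report.md:done")
print(harness.step("read", "output/report.md"))  # -> done
```

Note that the harness holds no credentials the sandbox can reach, and the sandbox holds no memory the harness depends on. That asymmetry is the whole design.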
The primary reason is security. If model-generated code executes in the same environment as your credentials, you're one prompt injection away from exfiltration. Long-running agents also need to survive environment loss — which a clean split makes tractable.
What landed on the harness side
| Feature | What it does |
|---|---|
| Configurable memory | Reuse lessons across runs, progressive disclosure instead of shoving everything into context |
| Sandbox-aware orchestration | Route sub-agents into isolated sandboxes |
| Codex-like filesystem tools | read/write/edit as first-class |
| MCP support | Native Model Context Protocol (Anthropic's open standard) |
| Skills | Progressive disclosure of domain capabilities |
| AGENTS.md | Project-level custom instructions file |
| Shell tool | First-class shell command execution |
| Apply patch tool | First-class file patching |
Two items stand out.
AGENTS.md is essentially CLAUDE.md with a different name. If you've used Claude Code, the concept is identical: a project-root file containing custom instructions for the agent. OpenAI officially adopting this convention is itself a signal — "project-level agent config file" is becoming a cross-provider pattern.
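What goes in the file is up to the project; a plausible (entirely made-up) example looks like this:

```markdown
# AGENTS.md — illustrative example, not from any real project

## Build & test
- Install dependencies with `pip install -e .`
- Run `pytest` before proposing any patch

## Conventions
- Python 3.11+, type hints required
- Never modify files under `config/`
```

Same shape you'd put in a CLAUDE.md today, which is exactly the point.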
MCP support means OpenAI officially endorsed Anthropic's Model Context Protocol. When both major model providers support MCP, the tool ecosystem becomes portable. This is a bigger deal than the harness features IMO — it's infrastructure-level consolidation.
What landed on the sandbox side
Seven default providers ship out of the box:
- Blaxel
- Cloudflare
- Daytona
- E2B
- Modal
- Runloop
- Vercel
Plus BYO sandbox for anyone who wants to run their own isolation layer.
The most interesting piece is the Manifest abstraction — a portable format describing an agent workspace:
- Local file mounts
- Output directories
- Storage integrations (AWS S3, Google Cloud Storage, Azure Blob Storage, Cloudflare R2)
- Git repository cloning
If the sandbox container dies, the Manifest lets you reconstruct an identical workspace elsewhere. This is what unlocks the next piece:
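A Manifest-like structure can be sketched from the four capabilities above. The field names and `rehydrate_plan` helper below are assumptions for illustration, not the SDK's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class ManifestEntry:
    path: str            # mount point inside the sandbox
    source: str          # local dir, s3://, gs://, r2://, or a git URL
    mode: str = "ro"     # "ro" read-only, "rw" writable

@dataclass
class Manifest:
    entries: list[ManifestEntry] = field(default_factory=list)

    def rehydrate_plan(self) -> list[str]:
        """What a fresh container would do to reconstruct the workspace."""
        return [f"mount {e.source} at {e.path} ({e.mode})" for e in self.entries]

workspace = Manifest([
    ManifestEntry("dataroom/", "s3://my-bucket/source-docs", "ro"),
    ManifestEntry("output/", "s3://my-bucket/results", "rw"),
    ManifestEntry("repo/", "https://github.com/acme/project.git", "rw"),
])
print("\n".join(workspace.rehydrate_plan()))
```

Because the description is declarative, any container that can read it can rebuild the same workspace, which is what makes the durability story below more than a promise.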
Durable execution
Agent state is externalized from the container. Snapshotting and rehydration are built in. If an existing environment fails or expires, the agent resumes from its last checkpoint in a new container.
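The checkpoint-and-resume pattern itself is simple to demonstrate. This toy uses a JSON file as the snapshot store (the SDK presumably uses something more robust; that detail is an assumption):

```python
import json
import os
import tempfile

def run_job(items: list[str], checkpoint_path: str) -> list[str]:
    """Process items, snapshotting progress after every step."""
    done: list[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)           # rehydrate prior progress
    for item in items[len(done):]:        # skip already-completed work
        done.append(f"translated:{item}")
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)            # snapshot after each unit of work
    return done

ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
run_job(["a", "b"], ckpt)                       # first container does two items, then "dies"
result = run_job(["a", "b", "c", "d"], ckpt)    # new container resumes at item 3
print(result)
```

The second call never redoes items `a` and `b`; it picks up exactly where the dead container stopped.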
For long-running tasks — bulk translations, large refactors, overnight research — this is the difference between "I can run this" and "I can run this safely."
Permissions
Per-Manifest-entry file permissions, mapped to Unix filesystem semantics:
- `dataroom/` → read-only
- `output/` → writable
- `config/` → inaccessible
This is not a convenience feature. It's a structural limit on blast radius from prompt injection. Injection isn't prevented, but what an injected instruction can actually do is bounded.
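The bounded-blast-radius logic is worth making concrete. This sketch maps the three modes above to a simple check (the real SDK enforces this at the sandbox filesystem layer, not in Python; deny-by-default for unknown paths is my assumption):

```python
PERMISSIONS = {
    "dataroom/": "ro",    # read-only
    "output/": "rw",      # writable
    "config/": "none",    # inaccessible
}

def allowed(path: str, op: str) -> bool:
    """op is 'read' or 'write'; unknown paths are denied by default."""
    for prefix, mode in PERMISSIONS.items():
        if path.startswith(prefix):
            if mode == "none":
                return False
            if mode == "ro":
                return op == "read"
            return True  # rw
    return False

# An injected "overwrite the source data" instruction is simply a no-op:
print(allowed("dataroom/docs.txt", "write"))  # -> False
print(allowed("config/.env", "read"))         # -> False
print(allowed("output/result.md", "write"))   # -> True
```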
Scaling
One agent run can use one sandbox or many. Sandboxes spin up only when needed, sub-agents route to isolated containers, and container-level parallelism buys speed, so cost and performance become tunable on the same axis.
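The fan-out shape is the familiar one. In this stub each sub-agent is just an independent function call; in the SDK it would be an isolated container per sub-agent (the names here are illustrative, not SDK API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in for "provision an isolated sandbox, run the sub-agent, return results"
    return f"{task}: done"

tasks = ["refactor module A", "translate docs", "write tests"]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(run_subagent, tasks))
print(results)
```

Same orchestration pattern, but the isolation boundary means a misbehaving sub-agent can't touch its siblings' workspaces.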
Availability and pricing
- GA for all OpenAI API customers
- Standard API pricing — tokens + tool usage. No new tier.
- Python first (v0.14.0+). TypeScript coming later.
- Roadmap: Code mode and Subagents expanding to Python and TypeScript.
Ecosystem size at the time of this release: ~14.7M monthly Python downloads, ~1.5M TypeScript downloads, 19,000+ GitHub stars, 250+ contributors.
The Claude Code convergence
If you look at the naming, the parallel with Anthropic's Claude Code is hard to miss:
| OpenAI Agents SDK | Anthropic Claude Code |
|---|---|
| AGENTS.md | CLAUDE.md |
| Skills | Skills |
| Shell tool | Bash tool |
| Apply patch tool | Edit tool |
Both companies converged on the same answer to "what does an AI agent need?" The answer is: shell, filesystem, project-level instructions, tool-use standard (MCP), and a sandbox to do it all safely.
This matters because it redefines the default. "AI agent = chatbot with tools" is dead. "AI agent = semi-autonomous process with a computer" is the new baseline.
Practical angles for small teams
1. Wire Manifest to your data layer directly
Connect S3 or Cloudflare R2 through a Manifest entry. The agent reads source data and writes results itself — no wrapper scripts. This collapses a lot of "data prep → run → save results" plumbing.
2. Safe overnight jobs
With Durable execution, "run 50 translations overnight" is no longer a gamble. Container crashes resume from checkpoints.
3. Security-by-default code agents
Permissions structurally prevent common mistakes like "agent has access to .env and leaks credentials." Solo devs especially benefit — the security layer isn't "remember to configure it," it's "the Manifest says dataroom is read-only."
4. Isolated sub-agent parallelism
Complex tasks split into sub-agents, each in its own sandbox, running in parallel. A pattern that was DIY is now production-ready.
Trade-offs worth watching
TypeScript lag. Python-first is painful for TS/frontend-heavy teams. TypeScript is on the roadmap but timing isn't public.
Provider lock-in risk. Seven sandbox providers is great on paper, but depth of integration with any one will make migration costly. Manifest promises portability — real-world portability always has edges.
Billing predictability. Three cost axes (sandbox compute + tokens + tool usage) make budget modeling harder. Monitor carefully in the early rollout.
Prompt injection is not solved. Harness/compute split limits damage. It doesn't prevent injection. Input validation, least-privilege, and audit logging are still on you.
Bottom line
"AI agent" as an abstraction shifted from "a model that chats and calls tools" to "a semi-autonomous process with sandboxed compute, durable state, and Unix-style permissions." OpenAI formalized the direction Claude Code had been demonstrating, both companies are on MCP, and the practical implications for small teams are real — overnight jobs, direct data-layer access, structural security.
One more stack to learn. Probably worth it.