On April 15, 2026, OpenAI announced the next evolution of the Agents SDK and shipped openai-agents-python v0.14.0/v0.14.1 with two headline features: Sandbox Agents and a model-native harness. Both are now GA for all API customers.
The one-sentence read: the default shape of "an AI agent" just moved from a chatbot with tool calls to a coworker with a computer.
The three trade-offs OpenAI is trying to escape
OpenAI frames the existing landscape as three approaches, each with a trade-off:
- Model-agnostic frameworks (LangChain, LlamaIndex, etc.) — flexible across models, but can't fully exploit frontier model capabilities. There's always a gap between "what the latest model can do" and "what the framework surfaces."
- Provider SDKs — closest to the model, but thin on harness observability. Hard to see what a production agent is actually doing.
- Managed agent APIs — easy to deploy, but constrain execution location and data access.
The response: split harness and compute. The harness owns the loop, memory, orchestration, observability. The sandbox owns execution, filesystem, network, tool calls. They run in different isolation domains.
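The split can be pictured with a toy sketch. Everything here (the `Harness` and `Sandbox` names, the command protocol) is illustrative, not the SDK's real API; the point is only that the loop, memory, and observability live on one side of the boundary while execution and the filesystem live on the other.

```python
from dataclasses import dataclass, field

class Sandbox:
    """Isolated execution domain: runs commands, holds a filesystem."""
    def __init__(self):
        self.files: dict[str, str] = {}

    def run(self, command: str, payload: str) -> str:
        if command == "write":
            path, _, content = payload.partition(":")
            self.files[path] = content
            return f"wrote {path}"
        if command == "read":
            return self.files.get(payload, "")
        raise ValueError(f"unknown command: {command}")

@dataclass
class Harness:
    """Owns orchestration, memory, observability. Never executes code itself."""
    sandbox: Sandbox
    memory: list[str] = field(default_factory=list)

    def step(self, command: str, payload: str) -> str:
        result = self.sandbox.run(command, payload)   # crosses the isolation boundary
        self.memory.append(f"{command} -> {result}")  # observability lives harness-side
        return result

harness = Harness(Sandbox())
harness.step("write", "output/report.md:done")
print(harness.step("read", "output/report.md"))  # -> done
```

Note that the harness holds no credentials the sandbox can reach, and the sandbox holds no memory the harness depends on. That asymmetry is the whole design.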
The primary reason is security. If model-generated code executes in the same environment as your credentials, you're one prompt injection away from exfiltration. Long-running agents also need to survive environment loss — which a clean split makes tractable.
What landed on the harness side
| Feature | What it does |
|---|---|
| Configurable memory | Reuse lessons across runs, progressive disclosure instead of shoving everything into context |
| Sandbox-aware orchestration | Route sub-agents into isolated sandboxes |
| Codex-like filesystem tools | read/write/edit as first-class |
| MCP support | Native Model Context Protocol (Anthropic's open standard) |
| Skills | Progressive disclosure of domain capabilities |
| AGENTS.md | Project-level custom instructions file |
| Shell tool | First-class shell command execution |
| Apply patch tool | First-class file patching |
Two items stand out.
AGENTS.md is essentially CLAUDE.md with a different name. If you've used Claude Code, the concept is identical: a project-root file containing custom instructions for the agent. OpenAI officially adopting this convention is itself a signal — "project-level agent config file" is becoming a cross-provider pattern.
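What goes in the file is up to the project; a plausible (entirely made-up) example looks like this:

```markdown
# AGENTS.md — illustrative example, not from any real project

## Build & test
- Install dependencies with `pip install -e .`
- Run `pytest` before proposing any patch

## Conventions
- Python 3.11+, type hints required
- Never modify files under `config/`
```

Same shape you'd put in a CLAUDE.md today, which is exactly the point.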
MCP support means OpenAI officially endorsed Anthropic's Model Context Protocol. When both major model providers support MCP, the tool ecosystem becomes portable. This is a bigger deal than the harness features IMO — it's infrastructure-level consolidation.
What landed on the sandbox side
Seven default providers ship out of the box:
- Blaxel
- Cloudflare
- Daytona
- E2B
- Modal
- Runloop
- Vercel
Plus BYO sandbox for anyone who wants to run their own isolation layer.
The most interesting piece is the Manifest abstraction — a portable format describing an agent workspace:
- Local file mounts
- Output directories
- Storage integrations (AWS S3, Google Cloud Storage, Azure Blob Storage, Cloudflare R2)
- Git repository cloning
If the sandbox container dies, the Manifest lets you reconstruct an identical workspace elsewhere. This is what unlocks the next piece:
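A Manifest-like structure can be sketched from the four capabilities above. The field names and `rehydrate_plan` helper below are assumptions for illustration, not the SDK's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class ManifestEntry:
    path: str            # mount point inside the sandbox
    source: str          # local dir, s3://, gs://, r2://, or a git URL
    mode: str = "ro"     # "ro" read-only, "rw" writable

@dataclass
class Manifest:
    entries: list[ManifestEntry] = field(default_factory=list)

    def rehydrate_plan(self) -> list[str]:
        """What a fresh container would do to reconstruct the workspace."""
        return [f"mount {e.source} at {e.path} ({e.mode})" for e in self.entries]

workspace = Manifest([
    ManifestEntry("dataroom/", "s3://my-bucket/source-docs", "ro"),
    ManifestEntry("output/", "s3://my-bucket/results", "rw"),
    ManifestEntry("repo/", "https://github.com/acme/project.git", "rw"),
])
print("\n".join(workspace.rehydrate_plan()))
```

Because the description is declarative, any container that can read it can rebuild the same workspace, which is what makes the durability story below more than a promise.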
Durable execution
Agent state is externalized from the container. Snapshotting and rehydration are built in. If an existing environment fails or expires, the agent resumes from its last checkpoint in a new container.
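The checkpoint-and-resume pattern itself is simple to demonstrate. This toy uses a JSON file as the snapshot store (the SDK presumably uses something more robust; that detail is an assumption):

```python
import json
import os
import tempfile

def run_job(items: list[str], checkpoint_path: str) -> list[str]:
    """Process items, snapshotting progress after every step."""
    done: list[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)           # rehydrate prior progress
    for item in items[len(done):]:        # skip already-completed work
        done.append(f"translated:{item}")
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)            # snapshot after each unit of work
    return done

ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
run_job(["a", "b"], ckpt)                       # first container does two items, then "dies"
result = run_job(["a", "b", "c", "d"], ckpt)    # new container resumes at item 3
print(result)
```

The second call never redoes items `a` and `b`; it picks up exactly where the dead container stopped.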
For long-running tasks — bulk translations, large refactors, overnight research — this is the difference between "I can run this" and "I can run this safely."
Permissions
Per-Manifest-entry file permissions, mapped to Unix filesystem semantics:
- `dataroom/` → read-only
- `output/` → writable
- `config/` → inaccessible
This is not a convenience feature. It's a structural limit on blast radius from prompt injection. Injection isn't prevented, but what an injected instruction can actually do is bounded.
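The bounded-blast-radius logic is worth making concrete. This sketch maps the three modes above to a simple check (the real SDK enforces this at the sandbox filesystem layer, not in Python; deny-by-default for unknown paths is my assumption):

```python
PERMISSIONS = {
    "dataroom/": "ro",    # read-only
    "output/": "rw",      # writable
    "config/": "none",    # inaccessible
}

def allowed(path: str, op: str) -> bool:
    """op is 'read' or 'write'; unknown paths are denied by default."""
    for prefix, mode in PERMISSIONS.items():
        if path.startswith(prefix):
            if mode == "none":
                return False
            if mode == "ro":
                return op == "read"
            return True  # rw
    return False

# An injected "overwrite the source data" instruction is simply a no-op:
print(allowed("dataroom/docs.txt", "write"))  # -> False
print(allowed("config/.env", "read"))         # -> False
print(allowed("output/result.md", "write"))   # -> True
```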
Scaling
One agent run can use one sandbox or many. Sandboxes spin up only when needed, sub-agents route to isolated containers, and container-level parallelism buys speed, so cost and performance become tunable on the same axis.
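The fan-out shape is the familiar one. In this stub each sub-agent is just an independent function call; in the SDK it would be an isolated container per sub-agent (the names here are illustrative, not SDK API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in for "provision an isolated sandbox, run the sub-agent, return results"
    return f"{task}: done"

tasks = ["refactor module A", "translate docs", "write tests"]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(run_subagent, tasks))
print(results)
```

Same orchestration pattern, but the isolation boundary means a misbehaving sub-agent can't touch its siblings' workspaces.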
Availability and pricing
- GA for all OpenAI API customers
- Standard API pricing — tokens + tool usage. No new tier.
- Python first (v0.14.0+). TypeScript coming later.
- Roadmap: Code mode and Subagents expanding to Python and TypeScript.
Ecosystem size at the time of this release: ~14.7M monthly Python downloads, ~1.5M TypeScript downloads, 19,000+ GitHub stars, 250+ contributors.
The Claude Code convergence
If you look at the naming, the parallel with Anthropic's Claude Code is hard to miss:
| OpenAI Agents SDK | Anthropic Claude Code |
|---|---|
| AGENTS.md | CLAUDE.md |
| Skills | Skills |
| Shell tool | Bash tool |
| Apply patch tool | Edit tool |
Both companies converged on the same answer to "what does an AI agent need?" The answer is: shell, filesystem, project-level instructions, tool-use standard (MCP), and a sandbox to do it all safely.
This matters because it redefines the default. "AI agent = chatbot with tools" is dead. "AI agent = semi-autonomous process with a computer" is the new baseline.
Practical angles for small teams
1. Wire Manifest to your data layer directly
Connect S3 or Cloudflare R2 through a Manifest entry. The agent reads source data and writes results itself — no wrapper scripts. This collapses a lot of "data prep → run → save results" plumbing.
2. Safe overnight jobs
With Durable execution, "run 50 translations overnight" is no longer a gamble. Container crashes resume from checkpoints.
3. Security-by-default code agents
Permissions structurally prevent common mistakes like "agent has access to .env and leaks credentials." Solo devs especially benefit — the security layer isn't "remember to configure it," it's "the Manifest says dataroom is read-only."
4. Isolated sub-agent parallelism
Complex tasks split into sub-agents, each in its own sandbox, running in parallel. A pattern that was DIY is now production-ready.
Trade-offs worth watching
TypeScript lag. Python-first is painful for TS/frontend-heavy teams. TypeScript is on the roadmap but timing isn't public.
Provider lock-in risk. Seven sandbox providers is great on paper, but depth of integration with any one will make migration costly. Manifest promises portability — real-world portability always has edges.
Billing predictability. Three cost axes (sandbox compute + tokens + tool usage) make budget modeling harder. Monitor carefully in the early rollout.
Prompt injection is not solved. Harness/compute split limits damage. It doesn't prevent injection. Input validation, least-privilege, and audit logging are still on you.
Bottom line
"AI agent" as an abstraction shifted from "a model that chats and calls tools" to "a semi-autonomous process with sandboxed compute, durable state, and Unix-style permissions." OpenAI formalized the direction Claude Code had been demonstrating, both companies are on MCP, and the practical implications for small teams are real — overnight jobs, direct data-layer access, structural security.
One more stack to learn. Probably worth it.