BeanBean
Posted on • Originally published at nextfuture.io.vn

OpenAI Agents SDK 0.14: ship a sandbox coding agent in 15 min

What's new this week

OpenAI shipped Agents SDK 0.14 on April 15, 2026; the Python package is already at 0.14.3 on PyPI. The headline addition is Sandbox Agents, a new primitive for running agents inside an isolated workspace where they can inspect files, execute shell commands, edit repos, snapshot state, and resume a session later. Until this release, building that loop meant hand-rolling Docker orchestration around Responses calls. Now it is four objects: SandboxAgent, Manifest, SandboxRunConfig, and a sandbox client.

Why it matters for builders

AI engineers get a first-party replacement for the custom sandbox glue most teams have been maintaining. Before: spin up a container, mount a repo, wire stdin/stdout into tool calls, and log traces by hand. After: declare a Manifest with the files and repos the agent should see, pick a DockerSandboxClient, and the runner handles workspace materialization, tool routing, and OpenAI-native tracing. Snapshots plus serialized session resume mean a failed overnight agent run restarts from where it crashed instead of from the first prompt.
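The snapshot-and-resume idea is worth seeing concretely. The sketch below is a plain-Python analogue, not the SDK's actual session API: it checkpoints a turn counter and log to disk after every turn, so a process that dies mid-run picks up where the last snapshot left off instead of replaying from turn zero.

```python
import json
import tempfile
from pathlib import Path

def run_turns(state_file: Path, total_turns: int) -> dict:
    """Run numbered turns, checkpointing after each one.

    If state_file already holds a snapshot, resume from it
    instead of starting at turn 0 (the analogue of session resume).
    """
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"turn": 0, "log": []}

    while state["turn"] < total_turns:
        state["log"].append(f"completed turn {state['turn']}")
        state["turn"] += 1
        state_file.write_text(json.dumps(state))  # snapshot after every turn
    return state

state_path = Path(tempfile.mkdtemp()) / "session.json"
run_turns(state_path, total_turns=3)          # simulate a run that dies after 3 turns
final = run_turns(state_path, total_turns=5)  # resumed run finishes the remaining 2
print(final["turn"])  # 5, with no turn repeated
```

The SDK moves this bookkeeping into the sandbox client, but the contract is the same: durable state between turns, idempotent restart.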

Fullstack web engineers finally get a shippable path to "repo-aware" agents inside a product. A Next.js feature like "explain this PR" or "generate a migration script from this schema" no longer needs a homemade Code Interpreter. You hand the agent a Git repo through the Manifest, run inside a UnixLocalSandboxClient during local dev or Docker in production, and redact sensitive tool output at the tracing layer with a single flag. The SDK streams events so you can surface agent progress in a React client with the same server-sent events pattern you already use for chat completions.
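That server-sent events pattern is independent of the SDK: any stream of agent events can be framed for a browser `EventSource` consumer. Here is a stdlib-only sketch; the event names and payload shapes are invented for illustration, not the SDK's real event types.

```python
import json
from typing import Iterable

def to_sse(events: Iterable[dict]) -> Iterable[str]:
    """Frame a stream of event dicts as server-sent events.

    Each frame is `event: <type>` + `data: <json>` + a blank line,
    which is the wire format the browser's EventSource API expects.
    """
    for event in events:
        yield f"event: {event['type']}\n"
        yield f"data: {json.dumps(event['payload'])}\n\n"

# Hypothetical agent progress events; real event shapes will differ.
frames = "".join(to_sse([
    {"type": "tool_call", "payload": {"tool": "shell", "cmd": "pytest"}},
    {"type": "final", "payload": {"output": "all tests green"}},
]))
print(frames.startswith("event: tool_call\n"))  # True
```

On the React side, `new EventSource("/agent/stream")` plus an `addEventListener("tool_call", ...)` handler is enough to render a live progress feed.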

Indie makers get a small API surface for coding agents. The new SandboxAgent keeps the normal Agent/Runner shape, so a weekend project like "an agent that writes, tests, and deploys a cron job for me" is tens of lines of Python instead of a weekend of Kubernetes YAML. Install once, import, ship tonight.

Hands-on: try it in under 15 minutes

You need Python 3.10+, Docker, and an OpenAI API key with billing enabled. Sandbox Agents call the Responses API under the hood and are billed at the standard model rate; there is no extra sandbox surcharge.

```bash
pip install 'openai-agents>=0.14.3' docker
export OPENAI_API_KEY=sk-...
```

Create fix_repo.py. It spins up a Docker-isolated coding agent, points it at a scratch workspace, and asks it to patch a failing test:

```python
import asyncio

from docker import from_env

from agents import SandboxAgent, Runner, RunConfig
from agents.sandbox import (
    Manifest, LocalDir, SandboxRunConfig,
    DockerSandboxClient, DockerSandboxClientOptions,
    DEFAULT_PYTHON_SANDBOX_IMAGE,
)

# Same Agent shape as before, plus a manifest describing
# what the sandbox workspace should contain.
agent = SandboxAgent(
    name="repo-fixer",
    model="gpt-5.1",
    instructions="You are a senior Python engineer. Fix the failing test in /workspace.",
    default_manifest=Manifest(
        # Mount the local repo at /workspace inside the sandbox.
        local_dirs=[LocalDir(src="./my-repo", dst="/workspace")],
    ),
)

async def main():
    result = await Runner.run(
        agent,
        "Run pytest, find the failing test, patch the bug, re-run until green.",
        max_turns=12,  # hard ceiling on turns, and therefore on spend
        run_config=RunConfig(
            sandbox=SandboxRunConfig(
                client=DockerSandboxClient(from_env()),  # local Docker daemon
                options=DockerSandboxClientOptions(
                    image=DEFAULT_PYTHON_SANDBOX_IMAGE,
                ),
                snapshots=True,  # periodic state snapshots for crash recovery
            ),
        ),
    )
    print(result.final_output)

asyncio.run(main())
```

Run python fix_repo.py. The agent materializes ./my-repo into the container, shells out to pytest, reads the traceback, edits the source, and loops until the suite is green. Every tool span streams to the tracing dashboard at platform.openai.com/traces, including token usage per turn. Swap DockerSandboxClient for UnixLocalSandboxClient() if you want to skip Docker for a quick local experiment, but do not point it at your real home directory; the Unix client is a thin wrapper around the local filesystem.
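Since the Unix client works directly on the local filesystem, one cheap safety habit is to hand it a disposable copy of the repo rather than the original checkout. This stdlib snippet (no SDK involved) builds that scratch workspace:

```python
import shutil
import tempfile
from pathlib import Path

def scratch_copy(repo: Path) -> Path:
    """Copy a repo into a throwaway temp dir so an agent can edit freely."""
    workspace = Path(tempfile.mkdtemp(prefix="sandbox-ws-")) / repo.name
    shutil.copytree(repo, workspace)
    return workspace

# Demo with a tiny fake repo instead of a real checkout.
repo = Path(tempfile.mkdtemp()) / "my-repo"
repo.mkdir()
(repo / "app.py").write_text("print('hello')\n")

ws = scratch_copy(repo)
(ws / "app.py").write_text("print('patched')\n")  # agent edits land here...
print((repo / "app.py").read_text().strip())       # ...the original stays untouched
```

Point the manifest's `src` at the scratch path and diff it against the original afterward to review what the agent actually changed.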

Costs track the underlying model. A typical three-step fix on a 500-line repo with gpt-5.1 lands near $0.04-$0.10 per run. A long-horizon refactor using o5-pro with 30+ turns can reach a few dollars, so always set max_turns as a hard ceiling. Snapshots are stored on the sandbox client (local disk in the Docker case) and resume with SandboxRunConfig(session_state=...), so a crashed 20-minute run restarts at minute 14 rather than zero.
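Those per-run figures are easy to sanity-check yourself. The helper below uses the ~$2.50/M input rate quoted in the comparison table; the output rate is a placeholder assumption, so swap in current pricing before trusting the total:

```python
def run_cost(input_tokens: int, output_tokens: int,
             usd_per_m_input: float = 2.50,
             usd_per_m_output: float = 10.00) -> float:
    """Estimate one run's cost. The output rate is an assumed placeholder."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# A three-turn fix: ~20k input tokens (repo context + tracebacks), ~4k output.
print(round(run_cost(20_000, 4_000), 3))  # 0.09
```

A 30-turn refactor multiplies the input side fast, since each turn re-sends accumulated context, which is why `max_turns` doubles as a budget control.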

How it compares to alternatives

| | OpenAI Agents SDK 0.14 | Claude Agent SDK | LangGraph 0.4 |
| --- | --- | --- | --- |
| Starts at | Free SDK + pay-as-you-go (gpt-5.1 from ~$2.50/M input tokens) | Free SDK + Claude API ($5/M input for Opus 4.7) | Free OSS; LangSmith tracing from $39/mo |
| Best for | Python teams wanting sandboxed coding agents with snapshots and resume | Teams already on Claude that want MCP tools and 1M-token context | Teams that need fine-grained graph control over agent state transitions |
| Key limit | Sandbox surface is Python-only today; TypeScript support is still on the roadmap | No first-party Docker isolation - you wire containers yourself | Runs the state machine in your process; no built-in sandbox or snapshotting |
| Integration | Native Docker/UnixLocal sandbox clients, built-in OpenAI tracing, Redis sessions | CLI + SDK, MCP-first tool model, hooks and slash commands | Composable with any LLM provider through LangChain adapters |

Try it this week

Pick one slow maintenance task in your own repo tonight - a flaky test, a missing migration, schema drift, a stale dependency pin - and point the script above at it. If the agent ships a patch in under ten turns, you have a production-shaped automation in a single Python file, and building a custom orchestrator instead becomes hard to justify. For context on where this fits the broader tooling curve, our AI coding agents recap maps the cost-per-feature trade-offs, and our OpenAI Codex April 2026 update review covers the product-side cousin of this SDK.

