Why this guide?
OpenAI’s AgentKit gives you an end-to-end way to design, run, and ship agentic apps—bridging the gap between code, tools, UI, evals, and deployment. You’ll build a real, task-running agent you can run locally and embed in a web app. We’ll use the Agents SDK for code and ChatKit for the drop-in UI.
What you’ll build
- A Python (or TypeScript) agent that:
  - calls tools to fetch data and write files,
  - can run background tasks (e.g., long scrapes or reports),
  - exposes an embeddable chat interface with streaming.
- Lightweight eval hooks and guardrails (timeouts, schema checking).
- A “one-command” local dev loop.
Prereqs
- Python 3.10+ (or Node 18+), uv/pip (or pnpm/npm)
- An OpenAI API key with access to current models
- Basic familiarity with tool/function calling
Docs: AgentKit & platform overview, Agents SDK (Python & TS), background tasks, ChatKit.
Step 1 — Scaffolding the project
Choose Python or TypeScript.
Option A: Python
mkdir agentkit-demo && cd agentkit-demo
uv venv && source .venv/bin/activate
uv pip install -U openai-agents httpx pydantic fastapi uvicorn python-dotenv
# .env
OPENAI_API_KEY=sk-...
The OpenAI Agents SDK is the official library for building agentic workflows in code.
Option B: TypeScript
mkdir agentkit-demo && cd agentkit-demo
pnpm init
pnpm add openai-agents zod undici dotenv
Step 2 — Define your first tool (deterministic, side-effectful)
Tools are ordinary functions with a typed signature. The agent decides when to call them.
Python
# tools.py
import os

from pydantic import BaseModel

class SaveNoteArgs(BaseModel):
    title: str
    body: str

def save_note(args: SaveNoteArgs) -> str:
    path = f"notes/{args.title.replace(' ', '_')}.md"
    os.makedirs("notes", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# {args.title}\n\n{args.body}\n")
    return f"Saved note to {path}"
TypeScript
// tools.ts
import fs from "node:fs";
import { z } from "zod";

export const SaveNoteArgs = z.object({
  title: z.string().min(1),
  body: z.string().min(1),
});

export async function saveNote({ title, body }: z.infer<typeof SaveNoteArgs>) {
  const path = `notes/${title.replace(/\s+/g, "_")}.md`;
  await fs.promises.mkdir("notes", { recursive: true });
  await fs.promises.writeFile(path, `# ${title}\n\n${body}\n`);
  return `Saved note to ${path}`;
}
Step 3 — Create the agent and register tools
Python
# agent.py
from openai_agents import Agent, Tool, run

from tools import save_note, SaveNoteArgs

agent = Agent(
    model="gpt-5",  # choose your latest supported model
    name="TaskRunner",
    instructions=(
        "You are a helpful task-running agent. "
        "Prefer calling tools for side effects. Be concise."
    ),
    tools=[
        Tool(function=save_note, schema=SaveNoteArgs, name="save_note",
             description="Save a markdown note locally"),
    ],
    timeouts={"tool_call_s": 20, "response_s": 60},
)

if __name__ == "__main__":
    # quick CLI loop
    while True:
        user = input("You: ")
        for delta in run(agent, user):  # streams tokens & tool calls
            print(delta, end="", flush=True)
        print()
The Agents SDK exposes small, composable primitives (agents, tools, runs) with built-in streaming and tracing.
TypeScript
// agent.ts
import { Agent, tool } from "openai-agents";
import { saveNote, SaveNoteArgs } from "./tools";

export const agent = new Agent({
  model: "gpt-5",
  name: "TaskRunner",
  instructions:
    "Helpful task-runner. Use tools for any filesystem or network action.",
  tools: [
    tool({
      name: "save_note",
      description: "Save a markdown note locally",
      parameters: SaveNoteArgs,
      execute: saveNote,
    }),
  ],
  timeouts: { toolCallMs: 20_000, responseMs: 60_000 },
});
Step 4 — Run background tasks (long jobs)
Some work (web crawls, data pipelines) shouldn’t block chat. Use SDK background-mode helpers to spawn and track tasks.
# background.py
from openai_agents import background_task

@background_task(name="weekly_report")
def build_weekly_report():
    # ...long computation or API calls...
    return {"status": "ok", "url": "reports/2025-10-08.html"}
Register a tool that starts the job and returns a ticket; separately, another tool checks status by ticket ID. Pattern: start_* + get_*_status.
Step 5 — Add simple guardrails & schema-checked outputs
Ask the model to always return a typed schema when not using tools.
from pydantic import BaseModel

class AgentReply(BaseModel):
    kind: str  # "note" | "status" | "error"
    message: str

agent.output_schema = AgentReply  # SDK validates; on failure you can retry
This catches malformed outputs early and pairs nicely with retries and timeouts.
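If your SDK version doesn't do the validate-and-retry loop for you, it's easy to hand-roll. A sketch with pydantic v2 (parse_with_retry and the feedback wiring are illustrative, not an SDK API):

```python
from pydantic import BaseModel, ValidationError

class AgentReply(BaseModel):
    kind: str     # "note" | "status" | "error"
    message: str

def parse_with_retry(get_reply, max_retries: int = 2) -> AgentReply:
    """Call get_reply(feedback) until the raw JSON validates or retries
    run out; feedback carries the validation errors back to the model."""
    feedback = ""
    for attempt in range(max_retries + 1):
        raw = get_reply(feedback)
        try:
            return AgentReply.model_validate_json(raw)
        except ValidationError as err:
            if attempt == max_retries:
                raise
            feedback = (f"Your last reply was invalid: {err}. "
                        "Return JSON with keys kind and message.")
```

Here get_reply would wrap your model call, appending the feedback string to the conversation on retry.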
Step 6 — Add a minimal web UI with ChatKit
ChatKit provides embeddable widgets that connect to your agent backend. Drop it into any React page.
// app/page.tsx (Next.js)
import { Chat } from "@openai/chatkit/react";

export default function Page() {
  return (
    <main>
      <h1>TaskRunner</h1>
      <Chat
        endpoint="/api/chat" // your server endpoint
        title="TaskRunner"
        placeholder="Ask me to write and save a note…"
        showToolCallSteps // surfaces tool invocations in the UI
      />
    </main>
  );
}
Server route (pseudo):
// pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from "next";
import { agent } from "../../agent";

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { message } = req.body;
  res.setHeader("Content-Type", "text/event-stream"); // stream
  for await (const chunk of agent.run(message)) res.write(chunk);
  res.end();
}
Step 7 — Evals & iteration inside AgentKit
AgentKit’s platform view lets you design flows, visualize runs, and layer evals to measure correctness/latency across scenarios—so you can ship with confidence. Use it alongside your SDK project; import traces and compare runs as you tune prompts and tools.
End-to-end test script (sanity check)
# tests/test_happy_path.py
from openai_agents import run

from agent import agent

def test_save_note_roundtrip():
    user = "Create a note titled 'Sprint Summary' with 5 bullets."
    chunks = []
    for d in run(agent, user):
        chunks.append(d)
    assert any("Saved note to" in c for c in map(str, chunks))
Deployment snapshot
- Local: uvicorn app:api --reload (Python) or node server.mjs (TypeScript)
- Hosted: any serverless or container runtime; keep OPENAI_API_KEY in secrets
- Observability: ship logs/traces to AgentKit’s tracing UI to analyze tool calls and latencies.
Common pitfalls (and how to dodge them)
Letting the model “pretend” to do work
Always route side effects through tools; assert “Never claim completion without tool output” in your system instructions. Track tool success in logs.
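A minimal sketch of that tool-success logging, as a plain decorator (logged_tool is an illustrative name, not an SDK feature):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)

def logged_tool(fn):
    """Record every tool call and its outcome, so 'task completed' claims
    can be checked against actual tool output in the logs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            result = fn(*args, **kwargs)
            logging.info("tool %s succeeded: %.80r", fn.__name__, result)
            return result
        except Exception:
            logging.exception("tool %s failed", fn.__name__)
            raise
    return wrapper
```

Wrap each tool before registering it; the log becomes your audit trail for what the agent actually did.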
No timeouts or retries
Wrap tool calls with sane timeouts; implement backoff on transient failures. The SDK surfaces per-tool and per-response timeouts.
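A sketch of the backoff side, assuming transient failures surface as TimeoutError or ConnectionError (adjust the exception tuple to your client library):

```python
import random
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Run fn(); on a transient failure, retry with exponential backoff
    plus jitter. Non-transient exceptions propagate immediately."""
    for attempt in range(attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

The jitter factor keeps a fleet of workers from retrying in lockstep against a struggling upstream.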
Unbounded tool arguments
Validate arguments with pydantic/zod (length limits, enums). Reject dangerous inputs (paths, shell chars) before executing.
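For the filesystem case specifically, here is a stdlib-only sketch of path hardening (the allow-list regex and helper name are illustrative; tune them to your inputs):

```python
import os
import re

# Conservative allow-list: word chars, spaces, dots, hyphens; bounded length.
SAFE_TITLE = re.compile(r"^\w[\w .-]{0,80}$")

def safe_note_path(title: str, root: str = "notes") -> str:
    """Turn a user-supplied title into a file path, rejecting separators,
    traversal, and shell metacharacters before any filesystem call."""
    if not SAFE_TITLE.fullmatch(title):
        raise ValueError(f"rejected title: {title!r}")
    path = os.path.normpath(os.path.join(root, title.replace(" ", "_") + ".md"))
    if not path.startswith(root + os.sep):  # belt and braces
        raise ValueError(f"path escapes {root}/: {path}")
    return path
```

Reject-then-build is the key ordering: validate the raw argument first, and only then derive the path you actually touch.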
Background work blocking the chat
Use background-mode helpers + “ticket” pattern. Update the user with progress messages.
UI without visibility
Enable step rendering in ChatKit so users see tool calls and statuses. It builds trust.
No success metric
Add simple evals (did a file appear? does a URL return 200?); bake a tiny eval set per feature in AgentKit so regressions are obvious.
FAQs
How is AgentKit different from “just calling the API”?
AgentKit is the platform layer: visual workflows, tracing, evals, deployment aids, and UI building blocks. You still write agents with the Agents SDK (Python/TS).
Can I run multi-agent workflows?
Yes—compose multiple agents and coordinate via messages/tools. See the Cookbook’s multi-agent example for patterns.
What models should I use?
Use your latest general-purpose reasoning model (e.g., GPT-5 tier available to your account) and smaller models for cheap tool-arg synthesis when latency matters.
Does this work with computer-using agents?
Yes—pair with the Computer-Using Agent capability when tasks require actual UI control (e.g., clicking through sites). Add human-in-the-loop gates for sensitive steps.
Copy-paste starter (Python)
# quickstart
uv venv && source .venv/bin/activate
uv pip install -U openai-agents fastapi uvicorn pydantic python-dotenv
echo "OPENAI_API_KEY=sk-..." > .env
# app.py
import os

from fastapi import FastAPI
from openai_agents import Agent, Tool
from pydantic import BaseModel

class SaveNoteArgs(BaseModel):
    title: str
    body: str

def save_note(args: SaveNoteArgs) -> str:
    os.makedirs("notes", exist_ok=True)
    path = f"notes/{args.title.replace(' ', '_')}.md"
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# {args.title}\n\n{args.body}\n")
    return f"Saved note to {path}"

agent = Agent(
    model="gpt-5",
    name="TaskRunner",
    instructions="Use tools for side effects, be concise.",
    tools=[Tool(function=save_note, schema=SaveNoteArgs, name="save_note")],
    timeouts={"tool_call_s": 20, "response_s": 60},
)

api = FastAPI()

@api.post("/chat")
async def chat(body: dict):
    user = body.get("message", "")
    chunks = []
    async for delta in agent.arun(user):
        chunks.append(str(delta))
    return {"stream": "".join(chunks)}
Run:
uvicorn app:api --reload
Where to go next
AgentKit overview & platform: shipping flows, evals, tracing, deployment.
Agents SDK docs & repos: Python & TypeScript quickstarts, patterns, and APIs.
ChatKit docs: add a polished chat UI in minutes, wired to your backend.
Cookbook multi-agent example: advanced coordination patterns.
Final takeaway
With AgentKit + Agents SDK + ChatKit, you can go from “idea” to a task-running AI agent with real tools, background work, and an embeddable UI—fast. Start with one reliable tool, wire up background tasks, surface steps in the UI, and add evals early. That’s the path to agents you can trust in production.