Dhruv Joshi

Build a Task-Running AI Agent with the New ChatGPT AgentKit: A Step-by-Step Guide

Why this guide?

OpenAI’s AgentKit gives you an end-to-end way to design, run, and ship agentic apps—bridging the gap between code, tools, UI, evals, and deployment. You’ll build a real, task-running agent you can run locally and embed in a web app. We’ll use the Agents SDK for code and ChatKit for the drop-in UI.

What you’ll build

  • A Python (or TypeScript) agent that:
    • calls tools to fetch data and write files,
    • can run background tasks (e.g., long scrapes or reports),
    • exposes an embeddable chat interface with streaming.
  • Lightweight eval hooks and guardrails (timeouts, schema checking).
  • A “one-command” local dev loop.

Prereqs

  • Python 3.10+ (or Node 18+), uv/pip (or pnpm/npm)
  • An OpenAI API key with access to current models
  • Basic familiarity with tool/function calling

Docs: AgentKit & platform overview, Agents SDK (Python & TS), background tasks, ChatKit.

Step 1 — Scaffolding the project

Choose Python or TypeScript.

Option A: Python

mkdir agentkit-demo && cd agentkit-demo
uv venv && source .venv/bin/activate
uv pip install -U openai-agents httpx pydantic fastapi uvicorn python-dotenv

# .env
OPENAI_API_KEY=sk-...


The OpenAI Agents SDK is the official library for building agentic workflows in code.

Option B: TypeScript

mkdir agentkit-demo && cd agentkit-demo
pnpm init
pnpm add openai-agents zod undici dotenv


Step 2 — Define your first tool (deterministic, side-effectful)

Tools are ordinary functions with a typed signature. The agent decides when to call them.

Python

# tools.py
import os

from pydantic import BaseModel

class SaveNoteArgs(BaseModel):
    title: str
    body: str

def save_note(args: SaveNoteArgs) -> str:
    path = f"notes/{args.title.replace(' ', '_')}.md"
    os.makedirs("notes", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# {args.title}\n\n{args.body}\n")
    return f"Saved note to {path}"


TypeScript

// tools.ts
import * as fs from "node:fs";
import { z } from "zod";
export const SaveNoteArgs = z.object({
  title: z.string().min(1),
  body: z.string().min(1),
});

export async function saveNote({ title, body }: z.infer<typeof SaveNoteArgs>) {
  const path = `notes/${title.replace(/\s+/g, "_")}.md`;
  await fs.promises.mkdir("notes", { recursive: true });
  await fs.promises.writeFile(path, `# ${title}\n\n${body}\n`);
  return `Saved note to ${path}`;
}


Step 3 — Create the agent and register tools

Python

# agent.py
from dotenv import load_dotenv
from openai_agents import Agent, Tool, run
from tools import save_note, SaveNoteArgs

load_dotenv()  # pick up OPENAI_API_KEY from .env

agent = Agent(
    model="gpt-5",               # choose your latest supported model
    name="TaskRunner",
    instructions=(
        "You are a helpful task-running agent. "
        "Prefer calling tools for side effects. Be concise."
    ),
    tools=[
        Tool(function=save_note, schema=SaveNoteArgs, name="save_note",
             description="Save a markdown note locally")
    ],
    timeouts={"tool_call_s": 20, "response_s": 60},
)

if __name__ == "__main__":
    # quick CLI loop
    while True:
        user = input("You: ")
        for delta in run(agent, user):   # streams tokens & tool calls
            print(delta, end="", flush=True)
        print()


The Agents SDK exposes small, composable primitives (agents, tools, runs) with built-in streaming and tracing.

TypeScript

// agent.ts
import { Agent, tool } from "openai-agents";
import { saveNote, SaveNoteArgs } from "./tools";

export const agent = new Agent({
  model: "gpt-5",
  name: "TaskRunner",
  instructions:
    "Helpful task-runner. Use tools for any filesystem or network action.",
  tools: [
    tool({
      name: "save_note",
      description: "Save a markdown note locally",
      parameters: SaveNoteArgs,
      execute: saveNote,
    }),
  ],
  timeouts: { toolCallMs: 20_000, responseMs: 60_000 },
});


Step 4 — Run background tasks (long jobs)

Some work (web crawls, data pipelines) shouldn’t block chat. Use SDK background-mode helpers to spawn and track tasks.

# background.py
from openai_agents import background_task

@background_task(name="weekly_report")
def build_weekly_report():
    # ...long computation or API calls...
    return {"status": "ok", "url": "reports/2025-10-08.html"}


Register a tool that starts the job and returns a ticket; separately, another tool checks status by ticket ID. Pattern: start_* + get_*_status.
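
Here is a minimal sketch of that pattern in plain Python, using a thread and an in-memory job registry. Names like start_weekly_report, get_weekly_report_status, and JOBS are illustrative, not SDK API; expose both functions as tools the same way save_note is registered above.

# background_tools.py
import threading
import uuid

from pydantic import BaseModel

JOBS: dict[str, dict] = {}  # ticket_id -> {"status": ..., "result": ...}

class TicketArgs(BaseModel):
    ticket_id: str

def start_weekly_report() -> str:
    """Kick off the long job and hand back a ticket ID immediately."""
    ticket_id = str(uuid.uuid4())
    JOBS[ticket_id] = {"status": "running", "result": None}

    def _work():
        # ...long computation or API calls...
        JOBS[ticket_id] = {"status": "done", "result": "reports/latest.html"}

    threading.Thread(target=_work, daemon=True).start()
    return f"Started weekly report. Ticket: {ticket_id}"

def get_weekly_report_status(args: TicketArgs) -> str:
    """Let the agent poll the job by ticket ID."""
    job = JOBS.get(args.ticket_id)
    if job is None:
        return "Unknown ticket ID."
    return f"Status: {job['status']}, result: {job['result']}"

The agent can then start the job, tell the user the ticket, and check back later instead of blocking the chat.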

Step 5 — Add simple guardrails & schema-checked outputs

Ask the model to always return output that matches a typed schema when it isn’t calling tools.

from pydantic import BaseModel

class AgentReply(BaseModel):
    kind: str  # "note" | "status" | "error"
    message: str

agent.output_schema = AgentReply  # SDK validates; on failure you can retry


This catches malformed outputs early and pairs nicely with retries and timeouts.
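
If your SDK version doesn’t do the validation and retry for you, a hand-rolled guard is a few lines of pydantic. This is a sketch: it assumes you collect the agent’s final text into a single string (as the CLI loop above does), and the retry callback is illustrative.

# reply_guard.py
from pydantic import BaseModel, ValidationError

class AgentReply(BaseModel):
    kind: str      # "note" | "status" | "error"
    message: str

def parse_or_retry(raw: str, retry) -> AgentReply:
    """Validate the agent's final text; on failure, ask it once for valid JSON."""
    try:
        return AgentReply.model_validate_json(raw)
    except ValidationError as err:
        fixed = retry(
            f"Your last reply was not valid AgentReply JSON ({err}). "
            "Reply again with only valid JSON."
        )
        return AgentReply.model_validate_json(fixed)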

Step 6 — Add a minimal web UI with ChatKit

ChatKit provides embeddable widgets that connect to your agent backend. Drop it into any React page.

// app/page.tsx (Next.js)
import { Chat } from "@openai/chatkit/react";

export default function Page() {
  return (
    <main>
      <h1>TaskRunner</h1>
      <Chat
        endpoint="/api/chat"       // your server endpoint
        title="TaskRunner"
        placeholder="Ask me to write and save a note…"
        showToolCallSteps          // surfaces tool invocations in the UI
      />
    </main>
  );
}


Server route (pseudo):

// pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from "next";
import { agent } from "../../agent";

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { message } = req.body;
  res.setHeader("Content-Type", "text/event-stream"); // stream
  for await (const chunk of agent.run(message)) res.write(chunk);
  res.end();
}


Step 7 — Evals & iteration inside AgentKit

AgentKit’s platform view lets you design flows, visualize runs, and layer evals to measure correctness/latency across scenarios—so you can ship with confidence. Use it alongside your SDK project; import traces and compare runs as you tune prompts and tools.

End-to-end test script (sanity check)

# tests/test_happy_path.py
from openai_agents import run

from agent import agent

def test_save_note_roundtrip():
    user = "Create a note titled 'Sprint Summary' with 5 bullets."
    chunks = []
    for d in run(agent, user):   # same run() helper used in agent.py
        chunks.append(d)
    assert any("Saved note to" in c for c in map(str, chunks))


Deployment snapshot

  • Local: uvicorn app:api --reload or node server.mjs
  • Hosted: any serverless or container runtime; keep OPENAI_API_KEY in secrets
  • Observability: ship logs/traces to AgentKit’s tracing UI to analyze tool calls and latencies.

Common pitfalls (and how to dodge them)

Letting the model “pretend” to do work
Always route side effects through tools, and spell it out in your system instructions: “Never claim completion without tool output.” Track tool success in your logs.

No timeouts or retries
Wrap tool calls with sane timeouts; implement backoff on transient failures. The SDK surfaces per-tool and per-response timeouts.
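One plain-Python way to get both, without relying on any particular SDK option (the timeout, retry count, and exception list are arbitrary defaults, tune them for your tools):

# resilience.py
import random
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_guardrails(fn, *args, timeout_s: float = 20.0, retries: int = 2):
    """Run a tool function with a hard timeout and jittered exponential backoff."""
    for attempt in range(retries + 1):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(fn, *args).result(timeout=timeout_s)
        except (FutureTimeout, ConnectionError):
            if attempt == retries:
                raise
            time.sleep((2 ** attempt) + random.random())  # ~1s, 2s, 4s + jitter
        finally:
            pool.shutdown(wait=False)  # don't block on a stuck call; move on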

Unbounded tool arguments
Validate arguments with pydantic/zod (length limits, enums). Reject dangerous inputs (paths, shell chars) before executing.
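For example, a tightened version of SaveNoteArgs (the length limits and filename rule are arbitrary choices for this sketch):

# tools_safe.py
import re

from pydantic import BaseModel, Field, field_validator

class SaveNoteArgs(BaseModel):
    title: str = Field(min_length=1, max_length=120)
    body: str = Field(min_length=1, max_length=20_000)

    @field_validator("title")
    @classmethod
    def title_is_safe_filename(cls, v: str) -> str:
        # no path separators, traversal, or shell metacharacters in the filename
        if not re.fullmatch(r"[\w\- ]+", v):
            raise ValueError("title may only contain letters, digits, spaces, _ and -")
        return v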

Background work blocking the chat
Use background-mode helpers + “ticket” pattern. Update the user with progress messages.

UI without visibility
Enable step rendering in ChatKit so users see tool calls and statuses. It builds trust.

No success metric
Add simple evals (did a file appear? does a URL return 200?); bake a tiny eval set per feature in AgentKit so regressions are obvious.
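Two checks cover a surprising amount of ground; a sketch (the paths and URL are placeholders for your own artifacts):

# checks.py
import os

import httpx

def note_was_written(path: str = "notes/Sprint_Summary.md") -> bool:
    """Did the side effect actually land on disk, and is it non-empty?"""
    return os.path.exists(path) and os.path.getsize(path) > 0

def report_is_served(url: str = "http://localhost:8000/reports/latest") -> bool:
    """Does the artifact the agent claims to have published actually return 200?"""
    try:
        return httpx.get(url, timeout=5).status_code == 200
    except httpx.HTTPError:
        return False

Run these after each scenario in a small eval set and watch the pass rate as you change prompts or tools.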

FAQs

How is AgentKit different from “just calling the API”?

AgentKit is the platform layer: visual workflows, tracing, evals, deployment aids, and UI building blocks. You still write agents with the Agents SDK (Python/TS).

Can I run multi-agent workflows?

Yes—compose multiple agents and coordinate via messages/tools. See the Cookbook’s multi-agent example for patterns.
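
As a rough shape, reusing the same sketched Agent/Tool primitives from this guide (names are illustrative; check the SDK docs for the real handoff APIs): wrap a specialist agent as a tool of a coordinator.

# multi_agent.py
from pydantic import BaseModel

from openai_agents import Agent, Tool, run  # same sketched API as above

class ResearchArgs(BaseModel):
    question: str

researcher = Agent(
    model="gpt-5",
    name="Researcher",
    instructions="Answer research questions with cited facts only.",
)

def ask_researcher(args: ResearchArgs) -> str:
    # run the specialist to completion and hand its text back to the coordinator
    return "".join(str(d) for d in run(researcher, args.question))

coordinator = Agent(
    model="gpt-5",
    name="Coordinator",
    instructions="Break tasks down; delegate research via the ask_researcher tool.",
    tools=[Tool(function=ask_researcher, schema=ResearchArgs, name="ask_researcher",
                description="Delegate a research question to the Researcher agent")],
)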

What models should I use?

Use your latest general-purpose reasoning model (e.g., GPT-5 tier available to your account) and smaller models for cheap tool-arg synthesis when latency matters.

Does this work with computer-using agents?

Yes—pair with the Computer-Using Agent capability when tasks require actual UI control (e.g., clicking through sites). Add human-in-the-loop gates for sensitive steps.

Copy-paste starter (Python)

# quickstart
uv venv && source .venv/bin/activate
uv pip install -U openai-agents fastapi uvicorn pydantic python-dotenv
echo "OPENAI_API_KEY=sk-..." > .env

# app.py
import os

from dotenv import load_dotenv
from fastapi import FastAPI
from openai_agents import Agent, Tool
from pydantic import BaseModel

load_dotenv()  # pick up OPENAI_API_KEY from .env

class SaveNoteArgs(BaseModel):
    title: str
    body: str

def save_note(args: SaveNoteArgs) -> str:
    os.makedirs("notes", exist_ok=True)
    path = f"notes/{args.title.replace(' ', '_')}.md"
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# {args.title}\n\n{args.body}\n")
    return f"Saved note to {path}"

agent = Agent(
    model="gpt-5",
    name="TaskRunner",
    instructions="Use tools for side effects, be concise.",
    tools=[Tool(function=save_note, schema=SaveNoteArgs, name="save_note")],
    timeouts={"tool_call_s": 20, "response_s": 60},
)

api = FastAPI()

@api.post("/chat")
async def chat(body: dict):
    user = body.get("message", "")
    chunks = []
    async for delta in agent.arun(user):
        chunks.append(str(delta))
    return {"stream": "".join(chunks)}


Run:

uvicorn app:api --reload


Where to go next

  • AgentKit overview & platform: shipping flows, evals, tracing, deployment.

  • Agents SDK docs & repos: Python & TypeScript quickstarts, patterns, and APIs.

  • ChatKit docs: add a polished chat UI in minutes, wired to your backend.

  • Cookbook multi-agent example: advanced coordination patterns.

Final takeaway

With AgentKit + Agents SDK + ChatKit, you can go from “idea” to a task-running AI agent with real tools, background work, and an embeddable UI—fast. Start with one reliable tool, wire up background tasks, surface steps in the UI, and add evals early. That’s the path to agents you can trust in production.
