Emmanuel Mumba

Posted on • Originally published at deepdocs.dev

How to Use OpenAI AgentKit (2025): Build, Deploy, and Optimize AI Agents

Until recently, building AI agents meant juggling fragmented tools: manual orchestration, separate connectors, custom pipelines, and complex front-end work before anything went live. OpenAI’s new AgentKit changes that.

Launched at DevDay 2025, AgentKit is a complete toolkit for developers and enterprises to build, deploy, and optimize AI agents in one unified workflow. It consolidates what used to take weeks of setup into a streamlined process that moves from idea to deployment in hours.

This guide covers what AgentKit includes, how to use its components, and why it’s a major shift for developers building agentic systems.

1. What Is AgentKit?

AgentKit combines several OpenAI technologies into one developer platform. Instead of managing multiple scripts or APIs, developers can now design, test, and deploy intelligent agents using a structured set of tools:

  • Agent Builder – a visual canvas for creating and versioning multi-agent workflows.
  • Connector Registry – a central hub for managing how data and external tools connect across OpenAI products.
  • ChatKit – a front-end toolkit for embedding customizable chat interfaces in your applications.
  • Evals – built-in performance evaluation tools for datasets, automated grading, and optimization.
  • Reinforcement Fine-Tuning (RFT) – advanced customization to improve reasoning and tool-calling accuracy.

Together, these components form an end-to-end workflow: design → connect → deploy → optimize.

2. Build Agents Visually with Agent Builder

One of the most significant parts of AgentKit is the Agent Builder. Instead of writing orchestration logic by hand, developers get a visual canvas where they can connect nodes, define logic, set conditions, and integrate external tools.

It supports:

  • Drag-and-drop workflow creation
  • Preview runs and inline eval configuration
  • Versioning and rollback
  • Guardrails for safety and compliance

Each node can represent an agent, a logic block, a tool call, or a safety filter. Guardrails can automatically mask personally identifiable information (PII), detect jailbreaks, or flag unintended outputs.
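To make the node idea concrete, here is a minimal sketch in plain Python of a linear workflow where one node acts as a PII guardrail. It does not use any AgentKit API: `Node`, `run_workflow`, and `mask_pii` are hypothetical names chosen for illustration, and the email regex is a deliberately simple stand-in for a real PII detector.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Simplified pattern for illustration; real PII detection covers far more cases.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text: str) -> str:
    """Guardrail node: replace anything that looks like an email address."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


@dataclass
class Node:
    """One step on the canvas: a name plus a function from text to text."""
    name: str
    run: Callable[[str], str]


def run_workflow(nodes: list[Node], user_input: str) -> str:
    """Pass the input through each node in order."""
    value = user_input
    for node in nodes:
        value = node.run(value)
    return value


workflow = [
    Node("pii_guardrail", mask_pii),
    Node("normalize", str.strip),
    # Stand-in for a model call; a real agent node would invoke an LLM here.
    Node("agent", lambda text: f"Agent received: {text}"),
]

result = run_workflow(workflow, "  Contact me at jane.doe@example.com  ")
print(result)  # Agent received: Contact me at [REDACTED_EMAIL]
```

Placing the guardrail first means the downstream agent node never sees the raw address, which is the same ordering concern you would face when arranging nodes on a visual canvas.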

3. Manage Integrations with Connector Registry

As organizations deploy multiple agents, data governance becomes a challenge. OpenAI’s Connector Registry provides a single place to manage all integrations across different workspaces and teams.

Through the registry, administrators can securely connect data sources such as:

  • Dropbox
  • Google Drive
  • Microsoft Teams
  • SharePoint

It also supports third-party MCP (Model Context Protocol) servers, allowing developers to extend connectivity to their own APIs or proprietary tools.

For enterprises, this means less duplication, consistent permission management, and clear visibility into which connectors each agent can access. It’s especially valuable for larger environments where compliance and audit trails matter as much as functionality.
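The governance idea can be sketched as a small permission table. The following is an illustrative model only, not the Connector Registry's actual interface: the `ConnectorRegistry` class, its methods, and the agent names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectorRegistry:
    """Hypothetical in-memory model: connector name -> agents allowed to use it."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def register(self, connector: str) -> None:
        """Add a connector with no agents granted yet."""
        self.grants.setdefault(connector, set())

    def grant(self, connector: str, agent: str) -> None:
        """Allow an agent to use a connector that is already registered."""
        if connector not in self.grants:
            raise KeyError(f"unknown connector: {connector}")
        self.grants[connector].add(agent)

    def can_access(self, agent: str, connector: str) -> bool:
        return agent in self.grants.get(connector, set())

    def connectors_for(self, agent: str) -> list[str]:
        """Audit view: every connector a given agent can reach."""
        return sorted(c for c, agents in self.grants.items() if agent in agents)


registry = ConnectorRegistry()
for name in ("dropbox", "google_drive", "sharepoint"):
    registry.register(name)

registry.grant("google_drive", "support_agent")
registry.grant("sharepoint", "support_agent")
registry.grant("dropbox", "research_agent")

print(registry.connectors_for("support_agent"))             # ['google_drive', 'sharepoint']
print(registry.can_access("research_agent", "sharepoint"))  # False
```

Keeping grants in one place is what makes the audit question ("which connectors can this agent access?") a single lookup rather than a search across per-agent configuration.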

4. Embed Chat Experiences with ChatKit

One of the hardest parts of launching an AI agent is creating a smooth, interactive front-end. ChatKit removes that complexity by offering embeddable chat interfaces that connect directly to your workflows.

There are two main ways to use ChatKit:

  1. OpenAI-hosted integration – Embed ChatKit in your web app and let OpenAI handle hosting and scaling.
  2. Advanced integration – Run it on your own infrastructure using the ChatKit Python SDK and React bindings.

Here’s a minimal example using the hosted option.

Backend (FastAPI example):
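In the hosted setup, the backend's main job is to exchange the server-side API key for a short-lived client secret that the browser can safely use. Below is a minimal sketch of that exchange using only the Python standard library, so it can sit behind a FastAPI route or any other framework. The endpoint URL, the `OpenAI-Beta` header value, the payload shape, and the `client_secret` response field are assumptions to verify against the current ChatKit documentation; `build_session_request` and `create_client_secret` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current ChatKit documentation.
CHATKIT_SESSIONS_URL = "https://api.openai.com/v1/chatkit/sessions"


def build_session_request(api_key: str, workflow_id: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) the request that asks for a ChatKit session."""
    # Assumed payload shape: the workflow to run and an identifier for the end user.
    payload = {"workflow": {"id": workflow_id}, "user": user_id}
    return urllib.request.Request(
        CHATKIT_SESSIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "OpenAI-Beta": "chatkit_beta=v1",  # assumed beta header value
        },
        method="POST",
    )


def create_client_secret(api_key: str, workflow_id: str, user_id: str) -> str:
    """Send the request and return the secret the front-end needs."""
    request = build_session_request(api_key, workflow_id, user_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["client_secret"]  # assumed response field
```

A FastAPI route would call `create_client_secret(...)` with the API key read from the environment and return the secret as JSON. The API key itself never leaves the server.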

Frontend (React):

Once embedded, the chat widget handles message threading, streaming responses, and visualizing model reasoning automatically. Developers can customize the UI theme, prompt behavior, and feature set without rebuilding the front-end from scratch.

Canva integrated ChatKit into its developer documentation in less than an hour, creating a conversational support assistant that guides users through building apps and integrations.

HubSpot and LegalOn also use ChatKit to power internal knowledge assistants and customer support tools.

For most teams, what used to take weeks of UI engineering can now be done in a single day.

5. Measure and Improve with Evals

Agent performance depends on continuous testing, and that’s where Evals comes in. Originally launched to evaluate model prompts, Evals has expanded into a full testing framework within AgentKit.

New capabilities include:

  • Datasets for building and reusing eval cases
  • Trace grading to assess complex workflows end-to-end
  • Automated prompt optimization to improve outputs based on grader feedback
  • Third-party model support for cross-model evaluation

Developers can now run evaluations across multiple agents, track where logic breaks, and automatically generate better prompts.
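The underlying loop (a dataset of cases, an agent under test, and a grader) can be sketched in a few lines. This is a generic illustration rather than the Evals API: `EvalCase`, `run_eval`, `exact_match_grader`, and `toy_agent` are hypothetical names, and the toy agent is a lookup table standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def exact_match_grader(output: str, expected: str) -> bool:
    """Simplest possible grader: case-insensitive exact match."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(
    agent: Callable[[str], str],
    dataset: list[EvalCase],
    grader: Callable[[str, str], bool],
) -> dict:
    """Run every case through the agent and collect the failures."""
    failures = []
    for case in dataset:
        output = agent(case.prompt)
        if not grader(output, case.expected):
            failures.append((case.prompt, output))
    return {
        "passed": len(dataset) - len(failures),
        "total": len(dataset),
        "failures": failures,
    }


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent call."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "I don't know")


dataset = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "paris"),
    EvalCase("largest planet", "Jupiter"),
]

report = run_eval(toy_agent, dataset, exact_match_grader)
print(f"{report['passed']}/{report['total']} passed")  # 2/3 passed
print(report["failures"])  # [('largest planet', "I don't know")]
```

Because the dataset and grader are separate from the agent, the same cases can be rerun after every prompt or workflow change to see whether a fix in one place broke behavior elsewhere.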

Organizations like Carlyle report that using Evals cut development time on multi-agent systems by half and increased accuracy by 30%. For production environments, this kind of automated testing is critical to ensuring reliable, repeatable behavior.

6. Fine-Tune Agent Behavior with Reinforcement Fine-Tuning (RFT)

Beyond evaluation, OpenAI is introducing Reinforcement Fine-Tuning (RFT), a technique that lets developers train reasoning models to perform specific tasks better.

Currently available for o4-mini and in private beta for GPT-5, RFT enables:

  • Custom tool calls – teaching agents when and how to invoke functions effectively
  • Custom graders – defining success metrics tailored to your domain

This gives teams fine-grained control over how their agents reason, plan, and respond. For use cases like financial advice, research assistance, or enterprise automation, it bridges the gap between general-purpose AI and task-specific reliability.
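A custom grader is, at its core, a function that turns a model output into a score. The sketch below shows one possible scoring scheme for tool calls, with partial credit for choosing the right tool but getting some arguments wrong. It is an illustration of the concept, not the grader format RFT expects: `grade_tool_call`, the dictionary shape, and the `get_quote` tool are all hypothetical.

```python
def grade_tool_call(predicted: dict, reference: dict) -> float:
    """Return a score in [0, 1].

    0.0 if the wrong tool was chosen; otherwise 0.5 for the right tool
    plus up to 0.5 for the fraction of reference arguments matched.
    """
    if predicted.get("name") != reference["name"]:
        return 0.0
    ref_args = reference.get("arguments", {})
    if not ref_args:
        return 1.0  # right tool, nothing else to check
    pred_args = predicted.get("arguments", {})
    matching = sum(1 for key, value in ref_args.items() if pred_args.get(key) == value)
    return 0.5 + 0.5 * matching / len(ref_args)


reference = {"name": "get_quote", "arguments": {"ticker": "ACME", "currency": "USD"}}

exact = {"name": "get_quote", "arguments": {"ticker": "ACME", "currency": "USD"}}
partial = {"name": "get_quote", "arguments": {"ticker": "ACME", "currency": "EUR"}}
wrong_tool = {"name": "search_news", "arguments": {}}

print(grade_tool_call(exact, reference))       # 1.0
print(grade_tool_call(partial, reference))     # 0.75
print(grade_tool_call(wrong_tool, reference))  # 0.0
```

A graded score, rather than a pass/fail flag, gives the training signal more to work with: a call to the right tool with one wrong argument is rewarded more than a call to the wrong tool entirely.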

7. Why AgentKit Changes the Game

AgentKit represents a shift from loosely connected tools to a single, structured ecosystem. A few years ago, developers had to manually stitch together APIs, handle safety and orchestration logic, and build UI components from scratch.

Now, the workflow looks more like this:

  1. Design the agent visually with Agent Builder.
  2. Connect data and APIs via Connector Registry.
  3. Deploy instantly with ChatKit.
  4. Evaluate and optimize performance with Evals and RFT.

The result is faster iteration, better governance, and stronger collaboration between developers, designers, and compliance teams. By standardizing each step, OpenAI effectively turned “building an agent” into a repeatable development process rather than an experimental project.

💡 Pro tip: As you start building with AgentKit, keeping your docs current can be tough. Deepdocs automates that: it updates and fixes your documentation on every commit, right from GitHub.

8. Getting Started

Here’s what’s currently available:

  • ChatKit and Evals are generally available.
  • Agent Builder is in beta.
  • Connector Registry is rolling out to API, ChatGPT Enterprise, and ChatGPT Edu customers.

All tools are included under standard API model pricing. Developers can explore them directly in the OpenAI platform dashboard.

9. Conclusion

AgentKit isn’t just another feature drop; it’s a new way to build AI systems. It brings everything together: workflow design, data connections, deployment, and testing. That means developers can focus more on creating and less on setup.

Whether you’re a small team or a large company, AgentKit makes it faster and safer to build, test, and launch real AI agents, marking a big step forward in how we create intelligent tools.
