Jangwook Kim

Posted on Jun 11 • Originally published at effloow.com

OpenAI Agent Builder and Evals Winddown Migration Checklist

#openai #agentbuilder #evals #agentssdk

OpenAI's Agent Builder and Evals winddown turns a product update into a practical migration problem. Teams that used visual agent flows, dashboard evals, prompt optimizers, datasets, graders, or release gates now need to decide what should move to code, what belongs in ChatGPT Workspace Agents, what should be retired, and what evidence must be captured before access changes.

OpenAI's AgentKit announcement now carries a June 3, 2026 update saying Agent Builder and Evals will no longer be available on the OpenAI platform from November 30, 2026 onward. OpenAI's Evals guide says existing Evals content remains available during the transition, becomes read-only for existing users on October 31, 2026, and is scheduled to shut down on November 30, 2026. OpenAI's Agent Builder migration guide points users toward exporting workflows as Agents SDK code, recreating suitable workflows as Workspace Agents, and validating behavior manually.

This article is for agent teams, developer-tool vendors, and platform owners who need a migration plan that can survive procurement, security review, and release management. Effloow Lab also ran a small OpenAI API check against synthetic migration cases. The lab did not access a real Agent Builder workspace, export a customer workflow, run a Workspace Agent, or migrate production Evals. Treat it as a prompt-harness sanity check, not proof of automatic migration.

Public lab note: /lab-runs/openai-agent-builder-evals-winddown-migration-checklist-2026

Why This Matters

The risk is not only losing a UI. The risk is losing operational knowledge that lives inside visual flows, eval datasets, graders, prompt experiments, and dashboard-only review habits.

A visual agent prototype can hide important product decisions: which tools are safe to call, who approves risky actions, which prompt variants were rejected, what trace reviewers look for, and which cases block release. An eval project can hide equally important quality decisions: the source of test cases, the grader definition, the human acceptance threshold, and the baseline result that proved a prompt was safe enough to ship.

If those assets are still only "in the platform," migration planning should start now. October 31, 2026 matters because the Evals platform becomes read-only for existing users on that date, according to the current OpenAI docs. November 30, 2026 matters because OpenAI's public AgentKit update says Agent Builder and Evals will no longer be available on the OpenAI platform from that point onward.
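
As a rough planning aid, the two dates above can be turned into a countdown. This is a minimal sketch: the dates are the ones this article cites from OpenAI's docs, while the function name `days_until` and the dictionary keys are illustrative assumptions.

```python
from datetime import date

# Dates as cited in this article; re-check the OpenAI sources before relying on them.
EVALS_READ_ONLY = date(2026, 10, 31)
PLATFORM_SHUTDOWN = date(2026, 11, 30)


def days_until(today: date) -> dict[str, int]:
    """Return days remaining before each milestone (negative once it has passed)."""
    return {
        "evals_read_only": (EVALS_READ_ONLY - today).days,
        "shutdown": (PLATFORM_SHUTDOWN - today).days,
    }


print(days_until(date(2026, 6, 11)))
```

Counted from June 11, 2026, the date of the lab run, that leaves 142 days until the read-only date and 172 days until shutdown.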

The practical goal is not to recreate every prototype. The goal is to classify every workflow before the deadline:

Keep as code with Agents SDK.
Recreate as a ChatGPT Workspace Agent.
Replace with a separate evaluation workflow.
Retire because no owner, usage, or business value remains.

That classification needs evidence. A team should not decide from a product name alone. A support escalation agent with API deployment, custom Python tools, approval gates, and trace review is a different migration problem from a revenue-ops weekly summary prompt that reads workspace docs and runs on a schedule.

What OpenAI Currently Recommends

OpenAI gives two clear migration directions for Agent Builder workflows. The migration guide says users can export an existing Agent Builder workflow as Agents SDK code, then either continue with the Agents SDK in an application or use the export to help recreate a ChatGPT Workspace Agent.

The same guide draws an important boundary: Agents SDK is best for building agents through code, while ChatGPT Workspace Agents are best for building agents through natural language and sharing them with teams. It also says the export process does not convert the workflow graph or guarantee that every behavior transfers unchanged.

That last sentence should shape the whole migration plan. Export is a starting artifact, not acceptance evidence.

OpenAI's Agents SDK overview frames the SDK as the place to own orchestration, tool execution, approvals, and state. The Agents SDK product update adds that the updated SDK supports controlled workspaces, file inspection, command execution, sandbox execution, configurable memory, MCP, skills, custom instructions, shell tools, apply-patch-style edits, and sandbox providers. That makes it a credible path for product workflows that need runtime control, deployment ownership, and engineering review.

Workspace Agents are different. OpenAI's ChatGPT Enterprise and Edu release notes say Workspace Agents are generally available in ChatGPT Business, Enterprise, and Edu, with agent safeguards and admin visibility. The same release notes also show that availability and controls can depend on workspace settings, plan details, and admin enablement. OpenAI's Business and Enterprise/Edu rate card says credit-based pricing for Workspace Agents is anticipated to begin on July 6, 2026.

Do not turn those source facts into a guarantee for a specific customer. A migration plan still has to verify the actual workspace, admin settings, app connections, permissions, schedule support, and Slack or other integration behavior.

Evidence From Effloow Lab

Effloow Lab ran a bounded OpenAI API check on June 11, 2026 using five synthetic migration cases. The prompt asked the model to classify each case into a recommended path, list migration risks, identify evidence to collect, and name claims that should not be made.

The saved artifact records model gpt-5.5-2026-04-23, response status completed, 411 input tokens, 1,588 output tokens, and 1,999 total tokens. The first run hit the configured output cap, so the lab was rerun with a larger cap and the completed artifact was saved.

The output was useful because it reinforced a migration rule that is easy to miss: classify by workflow shape, not by product label.

For a support escalation workflow with custom Python tools, approval gates, trace review, and API-backend deployment, the lab recommended an Agents SDK or code migration path. The risk list included custom tool reimplementation, approval behavior, trace review parity, and backend deployment contracts.

For a prompt-driven weekly account-summary helper that reads connected workspace docs, the lab classified Workspace Agents as a candidate, but only if scheduling, Slack posting, connector permissions, and workspace access are verified.

For a research team depending on Evals datasets and graders, the lab recommended exporting or archiving the evaluation assets and rebuilding regression checks outside Evals before shutdown. It explicitly warned against claiming that historical results, graders, or datasets transfer automatically.

For an ownerless legal ops prototype, the lab recommended triage first. No migration path can be selected until there is an owner, an export inventory, data-source knowledge, and a workspace-access decision.

This is not a benchmark, not a user study, and not an official OpenAI migration guarantee. Its value is narrower: it gives article readers a compact decision pattern they can adapt to their own inventory.

Migration Decision Matrix

Use this matrix before deciding whether to move a workflow into Agents SDK, Workspace Agents, a replacement eval workflow, or retirement.

Workflow Shape	Likely Path	Evidence Needed Before Claiming Success
Custom tools, API backend, approval gates, trace review	Agents SDK	Exported code, tool parity tests, approval tests, trace/review mapping, deployment proof
Natural-language team workflow over connected docs	Workspace Agents candidate	Workspace access, admin enablement, connector permissions, schedule and sharing behavior
Evals datasets, graders, prompt regression gates	Replacement eval process	Dataset export, grader definitions, baseline results, release thresholds, owner
Visual prototype with no owner or usage evidence	Retire or triage	Owner, users, business value, data map, risk classification
Customer-facing migration service	Inventory-first checklist	Decision tree, artifact template, risk matrix, explicit unknowns

The most important row is the last one. If you are a developer-tool vendor or consultant, the deliverable buyers need is often not "we migrated the agent." It is an evidence packet that proves which path is appropriate and what remains unverified.
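
The first four rows of the matrix can be sketched as a small triage function. Every class, field, and function name here is hypothetical, and the ordering of the checks is a simplifying assumption: a real workflow may need more than one path, and the result is only a starting suggestion until the evidence column is filled in.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    AGENTS_SDK = "Agents SDK"
    WORKSPACE_AGENT = "Workspace Agents candidate"
    EVAL_REPLACEMENT = "Replacement eval process"
    TRIAGE = "Retire or triage"


@dataclass
class WorkflowShape:
    # Illustrative fields; adapt them to your own inventory.
    has_owner: bool
    has_custom_tools: bool = False
    has_api_backend: bool = False
    needs_approval_gates: bool = False
    depends_on_evals: bool = False
    reads_connected_docs: bool = False


def suggest_path(shape: WorkflowShape) -> Path:
    """Return one primary suggestion; it is not acceptance evidence."""
    if not shape.has_owner:
        return Path.TRIAGE  # without an owner, no path can be chosen yet
    if shape.depends_on_evals:
        return Path.EVAL_REPLACEMENT
    if shape.has_custom_tools or shape.has_api_backend or shape.needs_approval_gates:
        return Path.AGENTS_SDK
    if shape.reads_connected_docs:
        return Path.WORKSPACE_AGENT
    return Path.TRIAGE  # nothing recognizable: send it back to triage
```

Checking ownership first mirrors the lab's ownerless legal ops case: triage comes before any path decision.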

Build The Inventory Before Exporting

Start with a workflow inventory, not code. Exported Agents SDK code can help, but it does not tell you why a workflow existed, who approved it, which behavior was acceptable, or which risks were manually managed.

For every Agent Builder workflow, capture:

Workflow name, owner, business function, and current user group.
Whether the workflow is production, pilot, demo, or abandoned.
Trigger pattern: manual, scheduled, chat-driven, event-driven, API-driven, or unknown.
Tool list, permissions, scopes, secrets, data sources, and write actions.
Human approval gates and what they block.
Trace review process and who reviews it.
Expected outputs and known failure modes.
Security constraints: private data, regulated data, external tools, audit retention.
Current replacement decision: Agents SDK, Workspace Agent, eval replacement, retire, or unknown.
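
One way to keep that inventory reviewable is a structured record per workflow. The sketch below is an assumption about shape, not an OpenAI format: the class name, field names, and the sample workflow are all hypothetical. The `unknowns` helper lists fields that are still unset so gaps stay visible instead of being guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class WorkflowInventoryEntry:
    # Hypothetical field names mirroring the checklist above.
    name: str
    owner: Optional[str]
    business_function: str
    status: str = "unknown"    # production, pilot, demo, abandoned
    trigger: str = "unknown"   # manual, scheduled, chat-driven, event-driven, API-driven
    tools: list[str] = field(default_factory=list)
    write_actions: list[str] = field(default_factory=list)
    approval_gates: list[str] = field(default_factory=list)
    trace_reviewers: list[str] = field(default_factory=list)
    security_constraints: list[str] = field(default_factory=list)
    decision: str = "unknown"  # Agents SDK, Workspace Agent, eval replacement, retire

    def unknowns(self) -> list[str]:
        """Return the names of fields that are still None or 'unknown'."""
        return sorted(k for k, v in asdict(self).items() if v in (None, "unknown"))


entry = WorkflowInventoryEntry(
    name="support-escalation",  # hypothetical workflow
    owner=None,
    business_function="customer support",
    status="production",
    tools=["lookup_ticket"],
)
print(json.dumps(asdict(entry), indent=2))
print(entry.unknowns())
```

For this sample entry the unknowns are the decision, the owner, and the trigger, which is exactly the list a reviewer should chase before anyone exports code.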

Then export. The migration guide says to open the workflow in Agent Builder, select Code, choose Agents SDK in the code dialog, select TypeScript or Python, and copy the complete export. That export should go into a version-controlled migration branch or review packet.

Do not skip behavior tests. OpenAI's migration guide says to review control flow, triggers, tools, and permissions as you test the migrated agent. It also says connected apps, authentication, publishing, and permission configuration require separate review in ChatGPT, while Agents SDK implementations require validation of runtime configuration, tools, authentication, permissions, and deployment.

That means your migration acceptance criteria should be written before the replacement is built:

Same task accepted or rejected as before.
Same high-risk actions require human approval.
Same private data remains protected.
Same or stricter tool permissions.
Same or better logging for release review.
Clear behavior when a connector, tool, or auth token fails.
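
Several of these criteria can be written as explicit checks that compare a recorded baseline with the replacement's behavior on the same case. This is a minimal sketch under stated assumptions: `Observation` and `parity_failures` are hypothetical names, and how you record observations from the original workflow is left to your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    # Hypothetical record of how one test case behaved.
    accepted: bool
    approval_required: bool
    tool_scopes: frozenset[str]


def parity_failures(baseline: Observation, candidate: Observation) -> list[str]:
    """Return reasons the candidate breaks parity; an empty list means pass."""
    failures = []
    if candidate.accepted != baseline.accepted:
        failures.append("task accept/reject decision changed")
    if baseline.approval_required and not candidate.approval_required:
        failures.append("human approval gate was dropped")
    extra = candidate.tool_scopes - baseline.tool_scopes
    if extra:
        # Stricter permissions pass; broader ones do not.
        failures.append(f"new tool scopes granted: {sorted(extra)}")
    return failures
```

The scope check is deliberately one-directional: a candidate with fewer scopes passes, which matches the "same or stricter tool permissions" criterion.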

Replace Evals With A Release Gate, Not A Spreadsheet

The Evals winddown is more than a dashboard move. Evals often become the memory of a team's quality bar. If you only copy rows into a spreadsheet, you lose grader logic, run history, prompt versions, and the release decision process.

OpenAI's Evals docs describe evaluations as tests for model outputs, with task descriptions, test inputs, testing criteria, graders, and result analysis. The same docs now say the platform is being deprecated. Current guidance also points newer experimentation toward Datasets, while advanced use cases still reference Evals during the transition.

For migration, capture these assets:

Dataset rows and source notes.
Prompt versions and model settings.
Grader definitions and thresholds.
Human annotations and reviewer notes.
Historical baseline results.
Release gates that depended on those results.
Any external model configuration or custom endpoint setup.
Known edge cases discovered from production logs.
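
Once those assets sit in a local folder, a checksum manifest makes later edits or losses detectable. The sketch below assumes you have already saved the assets as files by whatever export route is available to you; the folder layout and the function name `build_manifest` are illustrative, and nothing here calls an OpenAI API.

```python
import hashlib
from pathlib import Path


def build_manifest(archive_dir: Path) -> dict[str, dict]:
    """Map each archived file to its size and SHA-256 digest."""
    manifest = {}
    for path in sorted(archive_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            # Use POSIX-style relative paths so the manifest is portable.
            manifest[path.relative_to(archive_dir).as_posix()] = {
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
    return manifest
```

Store the resulting manifest outside the archived folder, for example alongside the release notes, so rebuilding it later does not include the manifest itself.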

Then rebuild the gate in a place your team can own after November 30, 2026. That could be a test suite around Responses API calls, an internal eval runner, a CI job, a vendor eval platform, or a lightweight review process that combines model output, automated checks, and human judgment.

For many teams, the safest first replacement is simple:

Store representative test cases as JSONL.
Run the candidate prompt or agent against those cases.
Score deterministic checks with code.
Send judgment-heavy cases to human review or an LLM grader that is clearly labeled as a grader.
Fail the release if required cases regress.
Archive outputs and reviewer notes with the release.

That is not as polished as a platform UI, but it keeps the release gate alive and auditable.
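
The six steps above can be sketched as a very small harness. Everything here is an assumption for illustration: the JSONL field names (`id`, `input`, `must_contain`, `required`), the function names, and the stub candidate that stands in for a real prompt or agent call.

```python
import json
from typing import Callable


def load_cases(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def run_gate(cases: list[dict], candidate: Callable[[str], str]) -> dict:
    """Score deterministic checks in code; queue everything else for review."""
    failed, blocking, needs_review = [], [], []
    for case in cases:
        output = candidate(case["input"])
        if "must_contain" in case:
            if case["must_contain"].lower() not in output.lower():
                failed.append(case["id"])
                if case.get("required", True):
                    blocking.append(case["id"])
        else:
            # Judgment-heavy case: left for a human or a clearly labeled LLM grader.
            needs_review.append({"id": case["id"], "output": output})
    return {
        "release_ok": not blocking,
        "failed": failed,
        "blocking": blocking,
        "needs_review": needs_review,
    }


CASES = """
{"id": "refund-1", "input": "Refund order 1001", "must_contain": "approval"}
{"id": "tone-1", "input": "Customer is upset about a delay"}
"""


def stub_candidate(prompt: str) -> str:
    # Stand-in for a real prompt or agent call.
    if "Refund" in prompt:
        return "Escalating for human approval."
    return "We are sorry for the delay."


result = run_gate(load_cases(CASES), stub_candidate)
print(result["release_ok"], result["failed"], [r["id"] for r in result["needs_review"]])
```

In CI, a false `release_ok` would fail the job, and the full `result` dictionary can be archived with the release alongside reviewer notes.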

Buyer Checklist For A Credible Migration

If you are evaluating a vendor, agency, or internal platform team, ask for artifacts instead of promises.

The minimum packet should include:

A complete workflow inventory with owner and retirement decision.
The Agent Builder export, if access exists.
A path decision for each workflow: Agents SDK, Workspace Agent, eval replacement, retire, or hold.
Tool and permission map.
Approval and human-review map.
Evals asset inventory with datasets, graders, thresholds, and baseline results.
A replacement test plan.
A list of [DATA NOT AVAILABLE] facts that remain unverified.
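
A reviewer can check the minimum packet mechanically before reading it in detail. The keys below are hypothetical names for the eight items above, not a standard format. An artifact that cannot be produced, such as the export when access no longer exists, should be recorded with an explicit note and not left out.

```python
# Hypothetical keys, one per item in the minimum packet.
REQUIRED_ARTIFACTS = [
    "workflow_inventory",
    "agent_builder_export",
    "path_decisions",
    "tool_permission_map",
    "approval_review_map",
    "evals_asset_inventory",
    "replacement_test_plan",
    "unverified_facts",
]


def missing_artifacts(packet: dict) -> list[str]:
    """Return required keys that are absent or None, in checklist order."""
    return [key for key in REQUIRED_ARTIFACTS if packet.get(key) is None]
```

A value such as "not available: no workspace access" counts as present here, because it states the gap explicitly and keeps it in front of the reviewer.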

The strongest packet also includes a migration rehearsal:

One representative workflow exported and rebuilt.
One eval set archived and rerun in a replacement harness.
One high-risk tool call tested for approval behavior.
One workspace-agent candidate checked against the actual workspace settings.
One rollback or retirement decision documented.

For Effloow-style technical content, this packet is the proof a buyer can actually inspect. A credible article, tool package, or migration service should be able to show the evidence chain: sources checked, lab notes saved, synthetic or sandbox data labeled, unsupported claims removed, and unknowns preserved.

Evidence Grade

This guide rests on two kinds of evidence: official OpenAI sources, which were checked, and a small synthetic migration triage that Effloow Lab ran through the OpenAI API. It is not hands-on proof of Agent Builder export, Workspace Agent creation, or production Evals migration.

Common Mistakes

The first mistake is assuming Agent Builder export means migration is done. OpenAI's own migration guide says the export does not guarantee unchanged behavior. Treat it like generated starter code.

The second mistake is moving every workflow to code. If the task is a natural-language team workflow over connected workspace apps, Workspace Agents may be a better fit, assuming the team's ChatGPT plan, admin controls, app connections, and permissions support it.

The third mistake is moving every workflow to Workspace Agents. If the workflow needs custom deployment, backend APIs, deterministic approvals, strict trace review, or product integration, Agents SDK is usually the stronger candidate.

The fourth mistake is waiting until Evals is read-only. Once the platform is read-only, teams may still be able to view existing content, but the window for improving, editing, or reconstructing the evaluation process becomes tighter.

The fifth mistake is fabricating parity. Do not claim automatic migration, universal feature parity, fixed migration effort, workspace availability, or continued Evals access unless the current source and the customer's evidence prove it.

FAQ

Q: What is the Agent Builder and Evals winddown date?

OpenAI's AgentKit announcement says Agent Builder and Evals will no longer be available on the OpenAI platform from November 30, 2026 onward. The Evals docs also say Evals becomes read-only for existing users on October 31, 2026.

Q: Should I migrate Agent Builder workflows to Agents SDK or Workspace Agents?

Use Agents SDK when the workflow should live as code, own tool execution, manage state, run in an application, or integrate with a backend. Consider Workspace Agents when the workflow is better expressed through natural language, shared in ChatGPT, and governed through workspace admin controls.

Q: Can I trust the Agent Builder export as production code?

No. Treat it as a starting point. The OpenAI migration guide says the process does not convert the workflow graph or guarantee unchanged behavior. You still need parity tests, tool checks, authentication review, and deployment validation.

Q: What should happen to Evals datasets and graders?

Inventory and archive them before the read-only date. Capture datasets, grader definitions, prompt versions, historical results, thresholds, and release criteria. Then rebuild the release gate in a process your team controls after shutdown.

Q: Did Effloow migrate a real Agent Builder workflow for this article?

No. Effloow Lab ran a bounded OpenAI API check using synthetic migration cases and verified official OpenAI sources. Real workspace access, Agent Builder export behavior, Workspace Agent creation, and production Evals migration are [DATA NOT AVAILABLE] for this run.

Key Takeaways

Agent Builder and Evals migration should start with inventory, not code. Classify each workflow by ownership, data access, tool risk, approval needs, deployment target, and evaluation dependency.

Agents SDK is the likely path for code-owned product workflows. Workspace Agents may fit natural-language team workflows if workspace access, admin settings, connectors, and scheduling behavior are verified. Evals users need a replacement release gate in place before the October 31, 2026 read-only date.

The safest migration message is disciplined: export what you can, preserve what matters, test behavior explicitly, and label every unknown instead of turning a deadline into unsupported product claims.
