DEV Community

Joakim William Hauge


How to Add Execution Budgets to OpenAI Agents SDK

Often, the fastest way for AI agents to become expensive in production is not model pricing.

It’s runaway execution.

A simple workflow starts retrying.
A tool loops recursively.
The agent keeps reasoning without converging.
Suddenly a single session can burn 10–50x the expected cost.

Most teams discover this after deployment.

This article shows a simple way to add the following to OpenAI Agents SDK workflows using TypeScript:

  • execution budgets
  • runtime ceilings
  • step limits

No complex infrastructure required.


# The Problem

A basic agent loop often looks harmless:

```ts
// Nothing in this loop limits how many times it runs,
// how long it takes, or how much it costs.
while (!taskComplete) {
  const result = await agent.run();
}
```

But in production, autonomous systems can drift into:

* recursive retries
* repeated tool invocation
* escalating token usage
* unstable recovery behaviour

The workflow technically “works.”

Economically, it stopped making sense a long time ago.

---

# What We Want

We want simple runtime constraints like:



```ts
const budget = {
  maxSteps: 15,             // maximum agent steps per session
  maxRuntimeMs: 30000,      // 30 seconds of wall-clock time
  maxEstimatedCostUsd: 1.50 // estimated spend ceiling in USD
};
```

If the workflow exceeds those boundaries:

  • execution stops
  • the system fails safely
  • costs remain bounded
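To make "fails safely" concrete, the guard can throw a dedicated error type that records which limit tripped, so callers can tell a deliberate budget stop apart from an ordinary failure. This is a minimal sketch; `BudgetExceededError`, `BudgetLimit`, and `runSafely` are hypothetical names introduced here, not part of the OpenAI Agents SDK.

```typescript
// Which ceiling was hit (hypothetical names for illustration).
type BudgetLimit = "steps" | "runtime" | "cost";

// A distinct error class separates "we stopped on purpose"
// from genuine failures such as network or model errors.
class BudgetExceededError extends Error {
  readonly limit: BudgetLimit;

  constructor(limit: BudgetLimit) {
    super(`Execution budget exceeded: ${limit}`);
    this.name = "BudgetExceededError";
    this.limit = limit;
    // Keeps `instanceof` working when compiling to older targets such as ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical wrapper: runs a workflow and turns budget stops into a
// structured result instead of an unhandled exception.
async function runSafely<T>(
  workflow: () => Promise<T>
): Promise<{ ok: true; value: T } | { ok: false; limit: BudgetLimit }> {
  try {
    return { ok: true, value: await workflow() };
  } catch (err) {
    if (err instanceof BudgetExceededError) {
      return { ok: false, limit: err.limit };
    }
    throw err; // Anything else is a real failure, so let it propagate.
  }
}
```

A caller can then log `limit`, return a partial answer, or show the user a clear message instead of surfacing a raw exception.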

Think of it as circuit breakers for autonomous execution.

---

# Basic Setup

Install the OpenAI Agents SDK for TypeScript:

```bash
npm install @openai/agents
```

Then create a simple agent loop.


# Step 1: Track Runtime State

We’ll maintain a lightweight execution context:

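```ts
// Per-session counters used by the budget guard.
type ExecutionState = {
  steps: number;            // agent steps completed so far
  startedAt: number;        // epoch milliseconds when the session began
  estimatedCostUsd: number; // running cost estimate in USD
};
```
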
Initialize it:



```ts
const state: ExecutionState = {
  steps: 0,
  startedAt: Date.now(),
  estimatedCostUsd: 0
};
```
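A single module-level `state` object would be shared by every request running in the same process, so one session's steps and cost would count against another's. A minimal sketch that creates fresh state per session instead; `createExecutionState` is a hypothetical helper, and the `ExecutionState` type is repeated so the block stands alone.

```typescript
type ExecutionState = {
  steps: number;
  startedAt: number;
  estimatedCostUsd: number;
};

// Creates a fresh budget context for one agent session so counters
// are never shared between concurrent requests. `now` is injectable for tests.
function createExecutionState(now: number = Date.now()): ExecutionState {
  return { steps: 0, startedAt: now, estimatedCostUsd: 0 };
}
```

Calling `createExecutionState()` at the start of each session keeps the counters scoped to that single run.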

# Step 2: Define Budget Constraints

Now define simple execution ceilings:

```ts
const LIMITS = {
  maxSteps: 15,
  maxRuntimeMs: 30_000,
  maxEstimatedCostUsd: 1.5
};
```

These numbers do not need to be perfect initially.

The important thing is that execution becomes bounded.
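Since the first numbers are educated guesses, it can help to treat them as defaults that individual workflows override, and to reject nonsensical values early. A sketch under that assumption; `BudgetLimits`, `DEFAULT_LIMITS`, and `resolveLimits` are illustrative names, not part of any SDK.

```typescript
type BudgetLimits = {
  maxSteps: number;
  maxRuntimeMs: number;
  maxEstimatedCostUsd: number;
};

// Starting defaults; tune them from real traces over time.
const DEFAULT_LIMITS: Readonly<BudgetLimits> = {
  maxSteps: 15,
  maxRuntimeMs: 30_000,
  maxEstimatedCostUsd: 1.5,
};

// Merges per-workflow overrides onto the defaults and validates the result.
function resolveLimits(overrides: Partial<BudgetLimits> = {}): BudgetLimits {
  const limits: BudgetLimits = { ...DEFAULT_LIMITS, ...overrides };
  for (const [name, value] of Object.entries(limits)) {
    // Every ceiling must be a finite, positive number.
    if (!Number.isFinite(value) || value <= 0) {
      throw new RangeError(`Invalid budget limit ${name}: ${value}`);
    }
  }
  return limits;
}
```

For example, `resolveLimits({ maxSteps: 5 })` keeps the default runtime and cost ceilings while tightening the step limit for a cheaper workflow.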

# Step 3: Create a Budget Guard

Now we add a lightweight guard function:

```ts
// Throws before a step starts if any budget is already exhausted.
// Uses >= so the loop stops once a limit is reached instead of
// allowing one extra step past it (with >, maxSteps: 15 allows 16 runs).
function enforceBudget(state: ExecutionState): void {
  const runtimeMs = Date.now() - state.startedAt;

  if (state.steps >= LIMITS.maxSteps) {
    throw new Error("Execution budget exceeded: max steps");
  }

  if (runtimeMs >= LIMITS.maxRuntimeMs) {
    throw new Error("Execution budget exceeded: runtime");
  }

  if (state.estimatedCostUsd >= LIMITS.maxEstimatedCostUsd) {
    throw new Error("Execution budget exceeded: cost");
  }
}
```

This becomes your runtime governance layer.
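Because `enforceBudget` reads `Date.now()` and a global `LIMITS` object directly, it is awkward to unit-test. A variant sketch that takes the limits and the current time as parameters (a common dependency-injection pattern, not something the original prescribes); `checkBudget` is a hypothetical name, and the types are repeated so the block is self-contained.

```typescript
type ExecutionState = { steps: number; startedAt: number; estimatedCostUsd: number };
type BudgetLimits = { maxSteps: number; maxRuntimeMs: number; maxEstimatedCostUsd: number };

// Returns the first exhausted limit, or null if the next step may run.
// Pure: no clock or globals, so it is easy to test deterministically.
function checkBudget(
  state: ExecutionState,
  limits: BudgetLimits,
  now: number
): "steps" | "runtime" | "cost" | null {
  if (state.steps >= limits.maxSteps) return "steps";
  if (now - state.startedAt >= limits.maxRuntimeMs) return "runtime";
  if (state.estimatedCostUsd >= limits.maxEstimatedCostUsd) return "cost";
  return null;
}

// Throwing wrapper for use at the top of the agent loop.
function enforceBudget(
  state: ExecutionState,
  limits: BudgetLimits,
  now: number = Date.now()
): void {
  const exceeded = checkBudget(state, limits, now);
  if (exceeded !== null) {
    throw new Error(`Execution budget exceeded: ${exceeded}`);
  }
}
```

Since `checkBudget` returns which limit tripped, that value can also be written to execution traces before the loop stops.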

---

# Step 4: Wrap Agent Execution

Now wrap every execution cycle:



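```ts
// `agent.run` and `response.done` stand in for however your setup
// executes one step and reports completion.
while (true) {
  enforceBudget(state); // throws once any limit is reached

  const response = await agent.run({
    input: userPrompt
  });

  state.steps += 1;

  // Placeholder: a flat per-step cost estimate.
  // Replace with a token-based estimate in production.
  state.estimatedCostUsd += 0.08;

  if (response.done) {
    break;
  }
}
```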

That’s it.

Now your workflow has:

  • bounded runtime (checked between steps)
  • bounded execution depth
  • bounded economic exposure
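One caveat: `enforceBudget` only runs between steps, so a single slow `agent.run` call can still overrun `maxRuntimeMs`. A minimal sketch of a per-step timeout using `Promise.race`; `withTimeout` is a hypothetical helper. It stops *waiting* for the step but does not cancel the underlying work; real cancellation needs something like an `AbortSignal` that the called API honours.

```typescript
// Rejects if `work` does not settle within `ms` milliseconds.
// The timer is always cleared so it cannot keep the process alive.
function withTimeout<T>(work: Promise<T>, ms: number, label = "step"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Execution budget exceeded: ${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical use inside the loop: give each step only the time left in the budget.
// const remainingMs = LIMITS.maxRuntimeMs - (Date.now() - state.startedAt);
// const response = await withTimeout(agent.run({ input: userPrompt }), remainingMs);
```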

# Why This Matters

Most AI systems behave normally most of the time.

The problem comes from tail events:

  • recursive loops
  • retry storms
  • unstable tool chains
  • runaway context growth

Those edge cases are where infrastructure cost explodes.

Execution budgets help constrain the blast radius before instability compounds.
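Recursive loops and unstable tool chains often show up as the same tool being called with the same arguments over and over. A small sketch of a repeat detector that could sit next to the step budget; `ToolCallTracker` and its default threshold are assumptions, and keying on `JSON.stringify` only works when arguments are built with a consistent key order.

```typescript
// Counts identical (tool, arguments) pairs within one session and throws
// once the same call has been made more than `maxRepeats` times.
class ToolCallTracker {
  private readonly counts = new Map<string, number>();
  private readonly maxRepeats: number;

  constructor(maxRepeats = 3) {
    this.maxRepeats = maxRepeats;
  }

  record(toolName: string, args: unknown): void {
    // Simple signature; assumes arguments serialize with a stable key order.
    const key = `${toolName}:${JSON.stringify(args)}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    if (count > this.maxRepeats) {
      throw new Error(`Execution budget exceeded: repeated tool call ${toolName} (${count}x)`);
    }
  }
}
```

Calling `record(name, args)` before each tool invocation would stop a loop that keeps issuing the same request, even while the step and cost budgets still have room.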


# Production Improvements

The simple example above is intentionally minimal.

In production you’d likely also add:

  • token-based estimation
  • tool-call budgets
  • recursion detection
  • timeout policies
  • per-user limits
  • adaptive thresholds
  • execution tracing

But even basic runtime ceilings go a long way toward operational safety.
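For token-based estimation, the flat `0.08` per step can be replaced with a calculation from token counts. A sketch under stated assumptions: the per-million-token prices below are placeholders, not real OpenAI pricing, and `TokenUsage` is a hypothetical shape you would fill from whatever usage data your SDK or API response exposes.

```typescript
// Placeholder prices in USD per 1M tokens. Look up current pricing for
// the model you actually use; these numbers are illustrative only.
type ModelPricing = { inputPerMillionUsd: number; outputPerMillionUsd: number };

const EXAMPLE_PRICING: ModelPricing = { inputPerMillionUsd: 2, outputPerMillionUsd: 8 };

// Hypothetical per-step usage shape.
type TokenUsage = { inputTokens: number; outputTokens: number };

// Estimated USD cost of one step from its token usage.
function estimateStepCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  return (
    (usage.inputTokens / 1_000_000) * pricing.inputPerMillionUsd +
    (usage.outputTokens / 1_000_000) * pricing.outputPerMillionUsd
  );
}
```

Inside the loop, `state.estimatedCostUsd += estimateStepCostUsd(usage, pricing)` would replace the flat increment. Because earlier conversation context is typically billed again as input on later steps, input tokens tend to grow as a session runs, which is the runaway context growth a flat estimate misses.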


# The Distributed Systems Parallel

Distributed systems eventually adopted:

  • timeouts
  • circuit breakers
  • bounded failure domains
  • retry limits

because unconstrained execution became dangerous at scale.
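Retry limits carry over directly to agent workflows, where a failing tool call retried without bound is one way a retry storm starts. A minimal sketch of a bounded retry with exponential backoff; `retryWithLimit` and its defaults are illustrative, not taken from any library.

```typescript
// Retries `fn` at most `maxAttempts` times, doubling the delay after each failure.
// After the final attempt, the last error is rethrown instead of looping forever.
async function retryWithLimit<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```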

Autonomous AI systems are starting to encounter similar operational realities.

As agents become:

  • more autonomous
  • more persistent
  • more deeply integrated

runtime governance becomes increasingly important.


# Final Thoughts

A lot of teams focus exclusively on:

  • prompts
  • reasoning quality
  • tool orchestration

But long-term production systems also need:

  • bounded execution
  • operational constraints
  • economic predictability

Because eventually, the challenge is not just building autonomous systems.

It is building governable autonomous systems.
