I was running an AI agent: nothing fancy, just a research task. Left it running overnight.
Woke up to a bill I didn't expect.
The agent hadn't done anything malicious. It just... kept going. Looping, retrying, calling the model over and over, because nobody told it to stop. No token cap. No cost limit. No guardrails whatsoever.
That's the thing nobody talks about with AI agents: they're eager. Give them a task and they'll spend whatever it takes to finish it — or to try to finish it. And if you're not watching, you find out the hard way.
That's why I built Joule.
## The Core Problem
Most agent frameworks (LangChain, CrewAI, AutoGen) are great at building agents. Defining tools, chaining steps, routing between models. They solve the "how do I make the agent do things" problem well.
But none of them solve the "how do I stop the agent from doing too much" problem.
There's no built-in budget. No hard stop. No governance. You're basically handing your agent a credit card with no limit and hoping for the best.
## What Joule Does Differently
Joule is an agent runtime where every task runs inside a budget envelope. You set limits before execution. The agent operates within them. When a limit is hit, it stops — cleanly, with a structured result.
Here's the simplest usage:
```typescript
import { Joule } from '@joule/core';

const answer = await Joule.simple("Summarize the top 3 HN stories today");
console.log(answer);
```
Auto-detects your API keys. Applies a sensible default budget. Returns a string. No surprises.
For production, you take control:
```typescript
const joule = new Joule({
  providers: { anthropic: { enabled: true } },
  budget: {
    maxTokens: 50_000,
    maxCostUsd: 0.50,
  },
  governance: {
    constitution: 'default',
    requireApproval: ['shell_exec'],
  },
});

const result = await joule.execute({
  description: "Analyze our Q4 metrics and draft a summary",
  budget: 'medium',
});

console.log(result.result);
console.log(`Cost: $${result.budgetUsed.costUsd} | Tokens: ${result.budgetUsed.tokensUsed}`);
```
The agent runs. Hits a step where it would exceed $0.50. Stops. Returns what it has. You get a budgetUsed breakdown with every execution — token count, dollar cost, latency, tool calls.
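The enforcement loop behind this is conceptually simple: check limits before each step, stop cleanly when the next step would exceed them. A minimal sketch of the idea, using a hypothetical `BudgetEnvelope` helper (my illustration, not Joule's actual internals):

```typescript
// Hypothetical sketch of a budget envelope: check before each step,
// stop cleanly when the next step would exceed a limit.
interface Usage {
  tokens: number;
  costUsd: number;
}

class BudgetEnvelope {
  private used: Usage = { tokens: 0, costUsd: 0 };

  constructor(private limits: Usage) {}

  // Would the next step still fit inside the remaining budget?
  canAfford(next: Usage): boolean {
    return (
      this.used.tokens + next.tokens <= this.limits.tokens &&
      this.used.costUsd + next.costUsd <= this.limits.costUsd
    );
  }

  record(step: Usage): void {
    this.used.tokens += step.tokens;
    this.used.costUsd += step.costUsd;
  }

  get budgetUsed(): Usage {
    return { ...this.used };
  }
}

// Agent loop: stop *before* the step that would blow the budget,
// returning whatever partial progress was made.
function runSteps(envelope: BudgetEnvelope, steps: Usage[]): Usage {
  for (const step of steps) {
    if (!envelope.canAfford(step)) break; // clean stop, partial result
    envelope.record(step);
  }
  return envelope.budgetUsed;
}
```

The key design point is checking *before* executing a step, not after: an after-the-fact check still lets one expensive call overshoot the cap.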
## Budget Enforcement Across 7 Dimensions
One thing I wanted to get right was what gets budgeted. Token and cost limits are obvious. But agents can go wrong in other ways too.
Joule tracks and enforces limits across:
| Dimension | What it limits |
| --- | --- |
| Tokens | Total LLM tokens consumed |
| Cost (USD) | Dollar spend on API calls |
| Latency | Wall-clock time |
| Tool calls | Number of tool invocations |
| Escalations | Upgrades from cheap → expensive models |
| Energy (Wh) | Estimated compute energy |
| Carbon (gCO₂) | Estimated emissions |
The energy and carbon tracking came from my other research in LLM inference optimization — it felt wrong to build a "cost-aware" runtime that ignored the environmental cost. So those dimensions are first-class.
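Energy and carbon are estimates derived from usage, not measurements. A hedged sketch of how token counts might map to Wh and gCO₂ — the coefficients here are placeholder assumptions for illustration, not Joule's actual estimation model:

```typescript
// Illustrative only: both coefficients are assumed values, not
// Joule's real estimation model.
const WH_PER_1K_TOKENS = 0.3; // assumed inference energy per 1k tokens
const GCO2_PER_WH = 0.4;      // assumed grid carbon intensity

function estimateEnergyWh(tokens: number): number {
  return (tokens / 1000) * WH_PER_1K_TOKENS;
}

function estimateCarbonG(tokens: number): number {
  return estimateEnergyWh(tokens) * GCO2_PER_WH;
}
```

Even rough estimates like this are enough to enforce a budget dimension: the envelope only needs a monotonically increasing counter to stop against.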
## Smart Routing: Start Cheap, Escalate Only When Needed
One of the key things that makes Joule efficient is the model router. By default, it picks the smallest model that can handle the task — local Ollama, gpt-4o-mini, Haiku — and only escalates to a larger model if the task genuinely needs it and the budget allows it.
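A minimal sketch of the routing idea: walk a cheapest-first list and pick the first model that clears the task's difficulty bar and fits the remaining budget. The model names, capability scores, and prices below are illustrative assumptions, not Joule's actual routing table:

```typescript
interface ModelTier {
  name: string;
  capability: number;        // 1 = simple, 3 = hard (assumed scale)
  costPer1kTokensUsd: number; // placeholder prices for illustration
}

// Cheapest first; "escalation" means walking further down this list.
const MODELS: ModelTier[] = [
  { name: "ollama-local", capability: 1, costPer1kTokensUsd: 0 },
  { name: "gpt-4o-mini", capability: 2, costPer1kTokensUsd: 0.00015 },
  { name: "claude-3-5-haiku", capability: 2, costPer1kTokensUsd: 0.0008 },
  { name: "claude-sonnet", capability: 3, costPer1kTokensUsd: 0.003 },
];

function routeModel(
  taskDifficulty: number,
  expectedTokens: number,
  remainingBudgetUsd: number,
): string | null {
  for (const m of MODELS) {
    const cost = (expectedTokens / 1000) * m.costPer1kTokensUsd;
    if (m.capability >= taskDifficulty && cost <= remainingBudgetUsd) {
      return m.name; // smallest model that can do the job within budget
    }
  }
  return null; // nothing fits: stop instead of overspending
}
```

Returning `null` rather than silently escalating is the point: when budget and capability can't both be satisfied, the runtime stops instead of spending.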
For simple tasks, the planning prompt drops from ~2900 tokens to ~50 by stripping tool descriptions entirely. It's not a trick — it's just: don't send information the model doesn't need.
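The trimming itself is the unglamorous part: for a simple task, skip serializing tool schemas into the planning prompt at all. A rough sketch of that idea (hypothetical helper, not Joule's code):

```typescript
interface Tool {
  name: string;
  description: string; // can run to hundreds of tokens per tool
}

// Hypothetical: include tool descriptions only when the task needs tools.
function buildPlanningPrompt(
  task: string,
  tools: Tool[],
  needsTools: boolean,
): string {
  if (!needsTools) {
    // Lean path: just the task, no tool schemas.
    return `Answer directly:\n${task}`;
  }
  // Full path: task plus every tool schema.
  const toolDocs = tools
    .map((t) => `- ${t.name}: ${t.description}`)
    .join("\n");
  return `You may use these tools:\n${toolDocs}\n\nTask:\n${task}`;
}
```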
The result: in benchmarks against CrewAI across 30 tasks, Joule was 1.5x faster on average, with the biggest gap on generation tasks (1.8x). And that's before counting the budget enforcement and governance that CrewAI doesn't do at all.
## Governance: What the Agent Is Allowed to Do
Budget is about how much. Governance is about what.
Joule has a constitutional layer — three tiers of rules:
- **Hard boundaries** — never violated, no override. ("Never expose PII.")
- **Soft boundaries** — can be overridden with authority + audit trail. ("Prefer local models.")
- **Aspirational principles** — guide behavior, don't block execution. ("Minimize token usage.")
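The three tiers map to different enforcement behavior. A hedged sketch of how an evaluator might treat them — the tier names come from above, but the logic is my illustration, not Joule's implementation:

```typescript
type Tier = "hard" | "soft" | "aspirational";

interface RuleResult {
  allowed: boolean;
  audit?: string; // soft overrides and aspirational misses leave a trail
}

function evaluateRule(
  tier: Tier,
  violated: boolean,
  override?: { authority: string },
): RuleResult {
  if (!violated) return { allowed: true };
  switch (tier) {
    case "hard":
      // Never overridable, regardless of who asks.
      return { allowed: false };
    case "soft":
      // Overridable, but only with authority + an audit entry.
      return override
        ? { allowed: true, audit: `overridden by ${override.authority}` }
        : { allowed: false };
    case "aspirational":
      // Guides behavior; never blocks, but gets logged.
      return { allowed: true, audit: "aspirational principle not met" };
  }
}
```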
You configure it in YAML:
```yaml
governance:
  constitution: default
  requireApproval:
    - shell_exec   # human-in-the-loop before running shell commands
    - file_delete  # prevent accidental data loss
budget:
  maxCostUsd: 1.00 # hard stop
```
And there's a trust scoring system — agents earn autonomy through clean behavior. New agents get watched closely. Clean track record → more tools unlocked, less oversight. A violation → demotion, increased scrutiny, or quarantine.
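A sketch of that trust-scoring idea — autonomy is earned slowly on clean runs and lost quickly on violations. The thresholds and increments here are my assumptions for illustration, not Joule's actual numbers:

```typescript
interface AgentTrust {
  score: number; // 0..100, assumed scale
}

// Asymmetric updates: trust builds slowly, a violation costs a lot.
function recordRun(trust: AgentTrust, violation: boolean): AgentTrust {
  const score = violation
    ? Math.max(0, trust.score - 30)   // demotion / increased scrutiny
    : Math.min(100, trust.score + 2); // slow earn per clean run
  return { score };
}

function autonomyLevel(
  trust: AgentTrust,
): "quarantine" | "supervised" | "trusted" {
  if (trust.score < 20) return "quarantine";
  if (trust.score < 70) return "supervised";
  return "trusted";
}
```

The asymmetry is the design choice worth noting: one violation should undo many clean runs, otherwise a misbehaving agent can "buy back" trust too cheaply.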
It sounds elaborate, but it basically answers the question: if I'm running 10 agents in parallel, how do I know they're not doing something I didn't ask for?
## Current Status
The runtime is in active development. 1140 tests passing across 91 files. The core — task execution, budget enforcement, model routing, governance, multi-agent crews — is solid. The observability layer (React dashboard, Prometheus metrics, OTLP/Langfuse export) is working.
Known rough edges: the computer/desktop automation agent is good for Office tasks but struggles with complex browser workflows. The governance layer is implemented but still maturing.
## Why This Matters Now
Agent frameworks are everywhere. The "build an agent in 10 lines" demo is easy. Shipping an agent to production (one that doesn't blow your budget, doesn't do things you didn't authorize, and that you can debug after the fact) is still genuinely hard.
That's the gap Joule is trying to close.
If you've ever woken up to a surprise API bill, you already understand the problem. The solution shouldn't be "just remember to add a token limit." It should be baked into the runtime.
GitHub: github.com/Aagam-Bothara/Joule
Feedback, issues, and contributions very welcome. Still early — but the foundation is there.