I built a runtime that stops AI agents from over-engineering everything

#ai #claude #llm #opensource

I gave Claude a budget. It changed what it built.

Over the past few weeks I've been exploring whether LLMs can work with execution budgets and whether that constraint changes how they behave.

The short answer: it does.

The Problem

Researchers studying frontier models found that they are consistently over-optimistic about their budgets. Instead of stopping early and alerting the user, they keep spending tokens on work that is unlikely to succeed. Even after fine-tuning that specifically targets budget awareness, calibration caps at 47%.

You cannot train your way out of this. External enforcement is required.

What I Tried

I started giving Claude implementation tasks with a fixed token budget.

The behavior changed.

Instead of building everything it could think of, the model focused on completing the requested specification before asking for more budget.

Unconstrained Claude on a REST API task added:

- JWT authentication (not requested)
- Health check endpoint (not requested)
- Full test suite (not requested)
- nodemon dev dependency (not requested)
- URL validation middleware (not requested)

Token Sensei at budget 400 delivered:

- The API
- Nothing else

The Numbers

Three tasks. Two conditions each. Same model (claude-sonnet-4-5).

| Task | Unconstrained | Token Sensei | Savings |
|---|---|---|---|
| Notes REST API | 3,584 tokens | 2,578 tokens | 28% |
| Bookmark REST API | 4,000+ (truncated) | 1,604 tokens | 60%+ |
| Python CLI | 2,466 tokens | 993 tokens | 60% |

The Bookmark REST API task is the most striking. The unconstrained model was still generating when it hit the response ceiling, so its real cost is unknown and 4,000 tokens is only a lower bound. Token Sensei completed the same task in 1,604 tokens and stopped.
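
For anyone checking the Savings column, here is a minimal sketch of the arithmetic. It assumes savings means one minus the ratio of constrained to unconstrained tokens, rounded to the nearest percent. The post does not state the formula, but this one reproduces the table. The function name `savingsPercent` is made up for illustration.

```javascript
// Assumed formula: savings = 1 - (constrained / unconstrained), as a rounded percent.
function savingsPercent(unconstrainedTokens, constrainedTokens) {
  return Math.round((1 - constrainedTokens / unconstrainedTokens) * 100);
}

console.log(savingsPercent(3584, 2578)); // Notes REST API: 28
console.log(savingsPercent(4000, 1604)); // Bookmark REST API: 60 (lower bound, run was truncated)
console.log(savingsPercent(2466, 993));  // Python CLI: 60
```

Because the unconstrained Bookmark run was cut off at the ceiling, 4,000 understates its true cost, which is why the table reports that row as "60%+".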

How Token Sensei Works

1. Set a task and a budget.
2. The runtime streams output and tracks exact token usage.
3. When the budget is exhausted, execution pauses.
4. A checkpoint shows what was completed and what remains.
5. You decide: approve a loan to continue, or ship what exists.
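
The steps above can be sketched as a small budget tracker. This is an illustration under stated assumptions, not the Token Sensei source: `BudgetedRun`, `consume`, and the chunk shape `{ text, tokens }` are hypothetical names, and the stream is simulated with an array so the example runs without an API key.

```javascript
// Hypothetical sketch; class, method, and field names are illustrative only.
class BudgetedRun {
  constructor(budget) {
    this.budget = budget; // tokens currently approved
    this.used = 0;        // tokens reported so far
    this.output = "";
  }

  // Record one streamed chunk. Returns false once the budget is spent,
  // telling the caller to stop reading and show a checkpoint.
  record(text, tokens) {
    this.output += text;
    this.used += tokens;
    return this.used < this.budget;
  }

  get paused() {
    return this.used >= this.budget;
  }

  // Approving a loan raises the ceiling so work can continue.
  approveLoan(extraTokens) {
    this.budget += extraTokens;
  }
}

// Feed chunks until the simulated stream ends or the budget runs out.
// Returns the index of the next unread chunk.
function consume(run, chunks, startIndex = 0) {
  let i = startIndex;
  while (i < chunks.length) {
    const { text, tokens } = chunks[i];
    i += 1;
    if (!run.record(text, tokens)) break;
  }
  return i;
}

const chunks = [
  { text: "routes ", tokens: 150 },
  { text: "handlers ", tokens: 200 },
  { text: "storage ", tokens: 120 },
];

const run = new BudgetedRun(300);
let next = consume(run, chunks);    // pauses after the second chunk
console.log(run.paused, run.used);  // true 350

run.approveLoan(200);               // user approves a loan; budget is now 500
next = consume(run, chunks, next);  // reads the final chunk
console.log(run.paused, run.used);  // false 470
```

Two simplifications are worth noting. The check runs after each chunk arrives, so usage can overshoot the budget by up to one chunk. A real runtime would also most likely continue with a follow-up request to the provider rather than resume the same stream.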

The key insight: the budget does not just control how much the model produces. It changes what the model optimizes for. Under constraint, it prioritizes completing the requested work. Reduced token usage is an outcome of that shift, not the objective.

The Protocol

Token Sensei is both a runtime and a protocol. The LOAN_REQUEST format tells the model how to surface checkpoints:

LOAN_REQUEST
Completed: ✓ [working deliverables produced]
Remaining: • [specific remaining items]
Requested Budget: [minimum needed]
This budget will complete: ✓ [specific deliverables]

The runtime enforces budget externally using exact provider-reported token counts — not estimates.
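
To use the protocol outside the runtime, you would need to read a LOAN_REQUEST block back out of model output. Here is a minimal parsing sketch. It assumes each field fits on a single line and that Requested Budget is a plain integer, neither of which the format above guarantees. `parseLoanRequest` and the sample values are hypothetical.

```javascript
// Hypothetical parser; assumes one line per field, which the post does not specify.
function parseLoanRequest(text) {
  const lines = text.split("\n").map((line) => line.trim());
  const start = lines.indexOf("LOAN_REQUEST");
  if (start === -1) return null; // no checkpoint in this output

  const fields = {};
  for (const line of lines.slice(start + 1)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    // Drop the leading check mark or bullet that the format puts before values.
    const value = line.slice(sep + 1).trim().replace(/^[✓•]\s*/, "");
    fields[key] = value;
  }

  const budget = Number.parseInt(fields["Requested Budget"], 10);
  return {
    completed: fields["Completed"] ?? "",
    remaining: fields["Remaining"] ?? "",
    requestedBudget: Number.isNaN(budget) ? null : budget,
    willComplete: fields["This budget will complete"] ?? "",
  };
}

// Made-up sample for illustration, not real model output.
const sample = [
  "LOAN_REQUEST",
  "Completed: ✓ CRUD routes for notes",
  "Remaining: • input validation",
  "Requested Budget: 300",
  "This budget will complete: ✓ input validation",
].join("\n");

console.log(parseLoanRequest(sample));
```

Returning `null` when the marker is missing lets a caller distinguish "the model finished" from "the model is asking for more budget".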

Try It

git clone https://github.com/shouvik12/token-sensei
cd token-sensei
npm install
node server.js

Open localhost:3000. Set a task, set a budget, run.

MIT licensed. Would love feedback from the dev.to community — especially whether the checkpoint UX feels right and whether the protocol is something you'd use outside the runtime.

Every AI task should have a budget.
