DEV Community

Protos Galaxias
How Upgrading to AI SDK v6 Cut My Agent's Token Usage by 57%

I built Browse Bot, an AI browser extension that acts like an agent: it can click buttons, fill forms, and navigate pages based on natural language commands. During beta testing we ran into problems: it would hang mid-task, burn through tokens, and sometimes just stop without explanation.

Then AI SDK v6 came out, bringing changes that let Browse Bot make a significant leap in efficiency.

Let me explain.

The Problem: Death by a Thousand LLM Calls

Our internal test suite runs real tasks through the agent: "add items from favorites to cart," "fill out this form," "log in to the site." Each test launches Firefox with the extension installed, gives the agent a text prompt, and checks if it succeeded (Did the DOM update? Did localStorage change? Did it call the right tool?).

Before the upgrade, the pass rate was unstable: some tasks worked, but others hung. The agent would click the wrong element, or worse, click nothing at all while the LLM kept thinking.

When we dug into the logs, we discovered:

  • Hidden LLM calls everywhere. Every time the agent needed to click something, it would parse the page, search for the element via another LLM call, then click. That's 3-4 extra calls per action.
  • Context growing linearly. Each step added more tokens. By step 5, the context was bloated with redundant information from steps 1-4.
  • No verification loop. The agent would click, assume success, and move on. If the click failed (dynamic page, wrong element, etc.), it would just continue with broken state.

The Upgrade: Three Changes, One Library

AI SDK v6 introduced hooks and better control over the agent loop. That's what made these fixes possible.

1. References Instead of Element Search

We started assigning unique references to each interactive element during page parsing: @1, @2, @3, etc. When the agent wants to click something, it just says "click @5" — no additional LLM call to search for the element.

This alone removed 3-4 hidden calls per task.
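To make the idea concrete, here is a minimal sketch of reference-based addressing. The names and shapes are illustrative, not Browse Bot's actual API: the parser assigns `@1`, `@2`, etc. to interactive elements, the model cites a reference directly, and resolving it is a map lookup rather than an LLM call.

```typescript
// Illustrative sketch: assign short references to parsed elements so the
// model can say "click @5" without a separate element-search LLM call.

interface ParsedElement {
  tag: string;
  label: string; // visible text or aria-label
}

interface RefMap {
  prompt: string;                    // compact listing shown to the LLM
  byRef: Map<string, ParsedElement>; // "@5" -> element, used at click time
}

function assignReferences(elements: ParsedElement[]): RefMap {
  const byRef = new Map<string, ParsedElement>();
  const lines: string[] = [];
  elements.forEach((el, i) => {
    const ref = `@${i + 1}`;
    byRef.set(ref, el);
    lines.push(`${ref} <${el.tag}> "${el.label}"`);
  });
  return { prompt: lines.join("\n"), byRef };
}

// The model answers "click @2"; resolving it is a plain map lookup.
const page = assignReferences([
  { tag: "button", label: "Log in" },
  { tag: "input", label: "Email" },
]);
console.log(page.byRef.get("@2")?.label); // "Email"
```

The trade-off noted later applies here: the references are only valid for the snapshot they were parsed from, so a re-render invalidates the map.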

2. Context Compression Between Steps

AI SDK v6 has a prepareStep hook. We used it to replace old step results with single-line summaries. Instead of carrying the full HTML snapshot from step 1 into step 6, the context now just says: "Step 1: Clicked login button. Result: Login form appeared."

Context now stays manageable even for multi-step tasks.

P.S. To be honest, further testing showed that we had compressed the context too much: the bot started forgetting the beginning of the conversation. This has since been fixed.
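Here is a simplified sketch of the compression itself (the wiring into AI SDK's `prepareStep` hook and the real message shapes are omitted; the helper names are hypothetical). The latest step stays verbatim, earlier steps collapse to one line, and the original task text is always kept, since dropping it was exactly the over-compression bug from the P.S.

```typescript
// Illustrative sketch of per-step context compression. Older step results
// (which may contain large DOM snapshots) are collapsed to one line each.

interface StepRecord {
  action: string; // e.g. "Clicked login button"
  result: string; // full tool output, possibly a large page snapshot
}

// Truncate a result to a single short line.
function summarize(result: string): string {
  const firstLine = result.split("\n")[0];
  return firstLine.length > 80 ? firstLine.slice(0, 77) + "..." : firstLine;
}

// Keep the task and the latest step verbatim; compress everything older.
function compressHistory(task: string, steps: StepRecord[]): string[] {
  const history: string[] = [`Task: ${task}`];
  steps.forEach((s, i) => {
    const result =
      i === steps.length - 1 ? s.result : summarize(s.result);
    history.push(`Step ${i + 1}: ${s.action}. Result: ${result}`);
  });
  return history;
}
```

In the real extension this runs before each model call, so the context size stays roughly flat instead of growing with every step.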

3. Software Guardrail After Actions

After every click or form fill, we inject a verification instruction into the prompt: "Read the page. Confirm the result matches expectations. If not, stop and notify the user."

It's just a prompt addition, but it works. The agent now catches its own mistakes instead of silently continuing with broken state.
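The mechanism can be sketched in a few lines (tool names here are illustrative, not Browse Bot's actual set): after any tool that mutates the page, append the verification instruction before the next model call; read-only tools skip it.

```typescript
// Illustrative sketch of the post-action guardrail. After a mutating tool
// runs, a verification instruction is appended to the conversation.

const MUTATING_TOOLS = new Set(["click", "fillForm", "navigate"]);

const VERIFY_INSTRUCTION =
  "Read the page. Confirm the result matches expectations. " +
  "If not, stop and notify the user.";

function withGuardrail(messages: string[], lastTool: string): string[] {
  if (!MUTATING_TOOLS.has(lastTool)) return messages; // read-only tools skip it
  return [...messages, VERIFY_INSTRUCTION];
}
```

Because the appended instruction triggers another model turn, each guarded action costs one extra LLM call, which is the latency trade-off discussed later.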

The Results

We ran the same test suite before and after. Same tasks, same pages, same Firefox setup.

Metrics for the same task:

  • Tokens: 65k → 28k (-57%)
  • LLM calls: ~9 → 6 (-33%)
  • Task completion: unstable → stable
  • Hidden costs: eliminated

The pass rate went from "sometimes works" to "consistently works."

Trade-offs

These changes aren't free:

  • References can break on dynamic pages. If the page re-renders between parsing and action, @5 might point to the wrong element. We handle this by re-parsing when the DOM changes, but it adds complexity.
  • Compression loses detail. Summarizing old steps means the agent can't revisit them in full. For most tasks, this is fine. For debugging, it's annoying.
  • Guardrails add latency. Every verification step is another LLM call. We decided the reliability gain was worth it, but it's a trade-off.

If You're Building AI Agents

A few things that helped us:

  • Run real tasks. Our test suite uses actual websites (login forms, e-commerce carts, etc.). Synthetic benchmarks didn't catch the problems beta users found.
  • Measure end-to-end, not just API calls. Token count matters, but "task completes correctly" matters more.
  • Don't ignore library updates. AI SDK v6 wasn't on our roadmap, but upgrading ended up fixing a surprising number of problems.

Browse Bot is open source and available for Chrome and Firefox. It works with OpenRouter, OpenAI, xAI, Ollama, and LM Studio (so you can run it locally if you want). No account required.

All the technical details, including those for this update, are in the GitHub repository.
