If you’re routing across multiple LLMs, you probably already know this feeling:
One model happily accepts your massive conversation.
The next model chokes, truncates half the important bits, and hallucinates the rest.
Same app. Same user. Different context window. Chaos.
Backboard.io now includes Adaptive Context Management, a system that automatically manages conversation state when your app moves between models with different context sizes.
P.S. If you bring your own keys from any of the frontier providers or OpenRouter, you can use this for free!
You still get access to 17,000+ LLMs on the platform.
You just don’t have to personally babysit their context windows anymore.
And yes, it’s included for free.
The Problem: Context Windows Are Inconsistent (and Annoying)
In a multi‑model setup, this is what actually happens:
You start on a large‑context model. Everything fits:
system prompt
conversation history
tool calls + tool responses
RAG chunks
web search results
random runtime metadata you forgot you added
Your router decides to send the next request to a smaller‑context model.
Suddenly your carefully curated “state” is too big to fit. Something has to go.
Most platforms respond with:
“Cool, just write truncation and summarization logic that:
prioritizes what matters,
handles overflow nicely,
doesn’t break when you add a new tool,
and works for every model you might ever route to.”
So we all end up writing the same brittle code:
```python
if tokens > limit:
    drop_old_messages()
    maybe_summarize()
    hope_nothing_important_was_there()
```
In a multi‑model system, that logic gets complicated and fragile fast.
What We Shipped: Adaptive Context Management
Backboard now automatically handles context transitions when models change.
There’s no extra endpoint and no new config. It runs inside the Backboard runtime whenever a request is routed to a model.
When that happens, Backboard:
Looks up the model’s context window.
Dynamically budgets it:
20% reserved for raw state
80% freed via summarization
Within that 20% “raw state” budget, we prioritize:
system prompt
recent messages
tool calls
RAG results
web search context
Whatever fits in that 20% goes through unchanged.
Everything else is handled by intelligent summarization.
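To make the budgeting step concrete, here is a minimal sketch of the idea in Python. The helper names (`budget_raw_state`, `count_tokens`) and the data shapes are ours, not Backboard's internals; only the 20% fraction and the priority order come from the description above.

```python
# Sketch of the raw-state budgeting described above.
# Assumption: 20% of the window is kept verbatim, filled in priority order;
# everything that doesn't fit is routed to summarization.
RAW_STATE_FRACTION = 0.20
PRIORITY = ["system_prompt", "recent_messages", "tool_calls",
            "rag_results", "web_search_context"]

def budget_raw_state(items, context_limit, count_tokens):
    """Fill the raw-state budget in priority order; return (kept, overflow)."""
    budget = int(context_limit * RAW_STATE_FRACTION)
    kept, overflow, used = [], [], 0
    for kind in PRIORITY:
        for item in items.get(kind, []):
            cost = count_tokens(item)
            if used + cost <= budget:
                kept.append(item)
                used += cost
            else:
                overflow.append(item)  # handed off to summarization
    return kept, overflow
```

For example, with a tiny window, the system prompt and the most recent message survive verbatim while older messages overflow into the summarization path.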
You don’t write the logic. You just route between models.
How Intelligent Summarization Works
When we need to compress, we follow a simple rule:
First try the model you’re switching to.
“Hey smaller model, summarize this so you can still understand what’s going on.”
If the summary still doesn’t fit:
We fall back to the larger model that was previously in use to generate a more efficient summary.
This preserves the important parts of the conversation while ensuring the final state always fits within the new model’s context window.
All of this happens automatically during the request, including across tool calls.
No manual orchestration. No custom jobs. No extra service.
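The fallback rule can be sketched in a few lines. The `summarize` and `count_tokens` callables here are hypothetical stand-ins; the actual implementation is internal to Backboard.

```python
# Sketch of the two-step summarization fallback described above:
# try the target (smaller) model first, then fall back to the larger
# model that was previously in use if the summary still doesn't fit.
def compress_overflow(overflow_text, target_model, previous_model,
                      token_budget, summarize, count_tokens):
    """Return a summary of overflow_text that fits within token_budget."""
    summary = summarize(target_model, overflow_text)
    if count_tokens(summary) <= token_budget:
        return summary
    # The target model's summary was too large; ask the previous,
    # larger model for a tighter one.
    return summarize(previous_model, overflow_text)
```

The design choice is deliberate: letting the destination model summarize first means the summary is phrased in terms that model "understands," and the larger model is only consulted when compression genuinely fails.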
You Should Rarely Hit 100% Context Again
Because Adaptive Context Management runs continuously:
It reshapes and compresses state before you slam into the limit.
It keeps a buffer in the context window instead of riding at 99.9% and hoping for the best.
Mid‑conversation model switches stop being a coin flip on whether something vital gets chopped.
Your job: define the routing logic and features.
Our job: make sure the context window doesn’t quietly wreck them.
You Still Get Visibility: context_usage in msg
This is not a black box.
We expose context usage directly in the msg endpoint so you can see what’s happening in real time.
Example response:
```json
"context_usage": {
  "used_tokens": 1302,
  "context_limit": 8191,
  "percent": 19.9,
  "summary_tokens": 0,
  "model": "gpt-4"
}
```
You can track:
how much context is currently used
how close you are to the limit
how many tokens are from summarization
which model is currently managing the context
If you like graphs and dashboards, this gives you the raw data without forcing you to build your own context tracking system from scratch.
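As a starting point for that kind of tracking, here is a small client-side helper. The field names match the example response above; the 80% warning threshold is our own choice, not a Backboard default.

```python
# Sketch of a client-side headroom check over the context_usage field.
# Assumption: you've parsed the msg response into a dict.
def context_headroom(context_usage, warn_at=80.0):
    """Return (remaining percent, should_warn) from a context_usage dict."""
    remaining = 100.0 - context_usage["percent"]
    return remaining, context_usage["percent"] >= warn_at

usage = {"used_tokens": 1302, "context_limit": 8191, "percent": 19.9,
         "summary_tokens": 0, "model": "gpt-4"}
remaining, warn = context_headroom(usage)
```

Feeding `remaining` into a dashboard, and alerting on `warn`, gives you visibility without building a token-counting pipeline of your own.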
The Bigger Idea: Treat Models Like Infrastructure
Backboard’s thesis is simple:
You should be able to treat models as interchangeable infrastructure.
Your state should just move with the user.
That only works if state can move safely between:
cheap and expensive models
long‑context and short‑context models
different providers and pricing tiers
Adaptive Context Management is the safety layer that makes that viable:
You route across thousands of models.
Backboard keeps the conversation state aligned with each model’s constraints.
You don’t write ad‑hoc truncation and summarization logic per model.
You focus on product behavior.
We handle the context window drama.
Adaptive Context Management is free and live today in the Backboard API.
No feature flag. No extra pricing line.
You can start building with it now at:
If you’re already routing across multiple models and have horror stories about context windows, I’d love to hear them.