Kacper Włodarczyk

Posted on • Originally published at oss.vstorm.co

Context Window Blindness: Why Your AI Agent Doesn't Know It's Running Out of Space

On Monday I showed how agents waste tokens by getting stuck in loops — repeating the same tool call dozens of times, burning money on nothing. Today — a quieter problem that costs just as much, and is far harder to spot.

Your AI agent is flying blind. It has no idea its context window is 90% full.


The Problem: Two Different Realities

When you run a long agent task, here's what you see: a status bar showing "Context: 87% used." It's right there in the TUI. You can see the agent is almost out of space.

But the model can't see the status bar. It has no idea.

From the model's perspective, every message it writes, every tool call it makes, every plan it sketches — all of that just continues normally. It has no signal that the conversation history is filling up. It keeps producing long responses, initiating multi-step plans, making tool calls that generate pages of output.

Then at 90%: auto-compression kicks in. The model's working memory gets force-compressed. It loses the thread of what it did 40 messages ago. It starts contradicting its earlier decisions.

This is context window blindness: the gap between what the user sees and what the model knows about its own situation.
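To make the gap concrete: usage tracking lives entirely in the orchestration layer, which can count tokens across the history; the model only ever sees the messages themselves. A minimal sketch (function names and the 4-chars-per-token heuristic are illustrative, not any library's API):

```python
# Hypothetical sketch: the orchestration layer can measure context usage,
# the model cannot. A real runtime would use the provider's tokenizer
# instead of the crude ~4 chars/token estimate below.

def context_usage(messages: list[str], context_limit: int) -> float:
    """Return the fraction of the context window consumed."""
    estimated_tokens = sum(len(m) for m in messages) / 4
    return estimated_tokens / context_limit

history = ["hello " * 100] * 50  # 50 messages of ~600 chars each
usage = context_usage(history, context_limit=8_000)
print(f"Context: {usage:.0%} used")  # visible to the user, never to the model
```

The number this prints is exactly the status bar the user sees; nothing equivalent ever appears in the prompt.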

The Fix: LimitWarnerCapability

In pydantic-deep v0.3.8 — the modular agent runtime for Python — we added LimitWarnerCapability. The solution: inject usage information directly into the conversation as a user message, at two thresholds.

At ~70% usage:

You are approaching the context limit. Begin wrapping up your current task. Avoid starting new complex subtasks.

At ~85% usage:

CRITICAL: Your context window is almost full. Use /compact NOW before continuing.

These are injected as user messages — not system prompt modifications. The model treats them as authoritative input.

from pydantic_deep import DeepAgent

agent = DeepAgent(
    model="claude-opus-4-5",
    context_manager=True,  # default — enables LimitWarnerCapability
)

Enabled by default. No configuration needed.

BM25 History Search

We also rewrote search_conversation_history from naive substring matching to BM25 ranking:

# Before v0.3.8
results = [msg for msg in history if query in msg.content]

# After v0.3.8 — BM25 ranked, zero external deps
results = search_conversation_history(history, "explain the authentication flow")
# Rare terms rank higher. Multi-word queries tokenized properly.

Pure Python. Zero external dependencies. Standard Lucene BM25 formula.
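For readers who want to see what "Lucene BM25, zero dependencies" looks like, here is a compact sketch of the same idea. It mirrors the approach, not the library's exact code; the function names and corpus are illustrative:

```python
import math
import re

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_search(docs: list[str], query: str,
                k1: float = 1.2, b: float = 0.75) -> list[int]:
    """Return document indices ranked by Lucene-style BM25, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    scores = []
    for toks in tokenized:
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            # Rare terms get a higher idf, so they dominate the ranking.
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = toks.count(term)
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(toks) / avgdl)
            )
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

history = [
    "user asked about the database schema",
    "assistant explained the authentication flow with JWT tokens",
    "user reported a bug in the login page",
]
ranked = bm25_search(history, "explain the authentication flow")
print(ranked)  # the authentication message ranks first
```

Note what substring matching gets wrong here: `"explain the authentication flow" in msg` matches nothing, while BM25 tokenizes the query and lets the rare terms ("authentication", "flow") pull the right message to the top.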

EvictionCapability

Large tool outputs are intercepted via the after_tool_execute hook before they enter message history, not trimmed after the fact. The difference matters on long tasks: an output that never enters history never has to be compressed out of it.
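A minimal sketch of what such a hook might do. The hook name after_tool_execute comes from the post; the size threshold and truncation policy below are assumptions for illustration:

```python
# Hedged sketch: replace oversized tool output with a short stub
# *before* it is appended to message history.

MAX_TOOL_OUTPUT_CHARS = 2_000

def after_tool_execute(tool_name: str, output: str) -> str:
    """Pass small outputs through; evict large ones, keeping a preview."""
    if len(output) <= MAX_TOOL_OUTPUT_CHARS:
        return output
    head = output[:500]
    return (
        f"[{tool_name} output evicted: {len(output)} chars exceeded the "
        f"{MAX_TOOL_OUTPUT_CHARS}-char limit. First 500 chars:]\n{head}"
    )

big_output = "x" * 50_000  # e.g. a full file read
stored = after_tool_execute("read_file", big_output)
print(len(stored))  # a few hundred chars instead of 50,000
```

Because the stub is what enters history, the 70%/85% thresholds above are reached far later, and compression has less to destroy when it does run.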

Key Takeaways

  • Models have no intrinsic awareness of context usage — that info lives in the orchestration layer
  • LimitWarnerCapability bridges that gap with runtime user message injection at 70%/85%
  • BM25 replaces naive substring search for conversation history
  • EvictionCapability prevents large outputs from entering history at all

Full write-up: oss.vstorm.co/blog/context-window-blindness-ai-agents-limit-warner

GitHub: github.com/vstorm-co/pydantic-deep

Have you hit this? How did the hallucinations manifest?
