Context window overflow occurs when an AI agent's tool outputs exceed the token limit the large language model (LLM) can process at once. The agent doesn't crash; it silently truncates data, loses earlier context, or produces incomplete results. This post shows how the Memory Pointer Pattern fixes it: from single-agent to multi-agent coordination where 145KB of data never enters any LLM context.
This demo uses Strands Agents. The Memory Pointer Pattern is framework-agnostic and can be applied with LangGraph, AutoGen, or other agent frameworks that support tool context.
Working code: github.com/aws-samples/sample-why-agents-fail
Series: Why AI Agents Fail
- Context Window Overflow (this post) — Memory Pointer Pattern for large data
- MCP Tools That Never Respond — Async pattern for slow external APIs
- AI Agent Reasoning Loops — Detect and block repeated tool calls
The Problem: Agents Can't Handle Large Tool Outputs
When an AI agent calls a tool that returns large data (server logs, database results, file contents), the response can overflow the LLM's context window. The agent doesn't crash with a clear error. It silently degrades: truncating data, losing context, or failing to complete the task.
Research from IBM (Solving Context Window Overflow in AI Agents, 2025) quantifies this:
- In Materials Science workflows, tool outputs can reach 2M+ elements
- Traditional approach consumed 20,822,181 tokens and failed
- The same workflow with memory pointers used 1,234 tokens and succeeded
- That's a reduction of over 16,000x in this workflow
Community observation (Context Window Limits Explained, Airbyte 2025) confirms teams discover these limits "the hard way" through silent errors. The agent appears to work but produces incomplete or wrong results.
The concept of passing references instead of raw data has also been validated in multi-agent settings. Research from Amazon (Towards Effective GenAI Multi-Agent Collaboration, 2024) introduces "payload referencing," where agents exchange pointers to shared data instead of embedding large payloads in messages. This improved performance on code-intensive tasks by 23% and achieved 90% end-to-end goal success rates in enterprise benchmarks. This is exactly what we implement below with Strands Swarm.
Why This Happens
By default, the agent framework injects each tool's raw output directly into the conversation history. When the output is small (a few KB), this works fine. But when a tool returns 200KB of server logs:
- The full output gets injected into the conversation
- The LLM's context window fills up
- Older context (including the original question) gets pushed out
- The LLM can't reason about the data because it can't see it all
- The agent either fails or produces incomplete answers
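The failure mode above can be sketched in a few lines of plain Python. This is a rough, framework-agnostic illustration, not Strands code: it uses the common ~4 characters-per-token heuristic (real tokenizers vary), and the 128K window is a typical example rather than a specific model's limit.

```python
# Naive approach: every tool call dumps its full raw output into the history.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 chars per token

CONTEXT_WINDOW = 128_000  # tokens; typical example, varies by model

history = [
    "system: You are an incident-response agent.",
    "user: Why is payment-service failing?",
]

for service in ("payment-service", "auth-service", "cache-layer"):
    raw_logs = f"ERROR {service} timeout ...\n" * 8000  # ~200KB per service
    history.append(f"tool: {raw_logs}")

used = sum(estimate_tokens(m) for m in history)
print(f"Tokens used: {used:,} / {CONTEXT_WINDOW:,}")  # well over the limit
```

Three tool calls of raw logs already blow past the window, and the oldest messages (including the original question) are the first to be evicted.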
Solution 1: Single Agent with Strands ToolContext
The first approach uses agent.state, a native key-value store scoped to each agent instance. Tools write large data there via ToolContext and return only a short pointer string to the model:
```python
from strands import Agent, tool, ToolContext

# context=True injects ToolContext as the last parameter — required to access agent.state
@tool(context=True)
def fetch_application_logs(app_name: str, tool_context: ToolContext, hours: int = 24) -> str:
    """Fetch application logs. Returns a memory pointer for large datasets."""
    logs = generate_logs(app_name, hours)  # Could be 200KB+
    if len(str(logs)) > 20_000:  # Threshold: store externally above 20KB
        pointer = f"logs-{app_name}"
        # Store the full payload in agent.state — it never enters the LLM context
        tool_context.agent.state.set(pointer, logs)
        # Return only the pointer key (52 bytes) — this is all the LLM sees
        return f"Data stored as pointer '{pointer}'. Use analyze tools to query it."
    return str(logs)  # Small enough to return directly

@tool(context=True)
def analyze_error_patterns(data_pointer: str, tool_context: ToolContext) -> str:
    """Analyze errors — resolves pointer from agent.state."""
    # Retrieve the full dataset from agent.state using the pointer key
    data = tool_context.agent.state.get(data_pointer)
    errors = [e for e in data if e["level"] == "ERROR"]
    # Return a summary (not raw data) — keeps the response small
    return f"Found {len(errors)} errors across {len(set(e['service'] for e in errors))} services"
```
The LLM never sees the 200KB. It only sees "Data stored as pointer 'logs-payment-service'" (52 bytes). The next tool reads the full data from agent.state and returns a summary. Strands handles this natively, with no global dicts, no hashlib, no external infrastructure.
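To see why this pattern ports to any framework, here is a minimal dependency-free sketch. A plain module-level dict stands in for the external store (exactly the global-dict bookkeeping that Strands's agent.state replaces); all names and the log payload are illustrative, not the Strands API.

```python
# Framework-agnostic sketch of the Memory Pointer Pattern.
STORE: dict = {}  # stand-in for an external state store
POINTER_THRESHOLD = 20_000  # bytes; store externally above this size

def fetch_logs(app_name: str) -> str:
    """Tool: returns raw data if small, otherwise a short pointer key."""
    logs = [{"level": "ERROR", "service": app_name, "msg": "timeout"}] * 3000
    if len(str(logs)) > POINTER_THRESHOLD:
        pointer = f"logs-{app_name}"
        STORE[pointer] = logs  # full payload never reaches the LLM
        return f"Data stored as pointer '{pointer}'"
    return str(logs)

def analyze_errors(pointer: str) -> str:
    """Tool: resolves the pointer and returns only a small summary."""
    logs = STORE[pointer]
    errors = [e for e in logs if e["level"] == "ERROR"]
    return f"Found {len(errors)} errors"

print(fetch_logs("payment-service"))   # short pointer message, not the raw logs
print(analyze_errors("logs-payment-service"))
```

The two capabilities — a store reachable from tools, and a short reference string that travels through the conversation — are all the pattern needs, which is why it maps onto LangGraph, AutoGen, and others.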
Single Agent Results
| Metric | Without Pointers | With Memory Pointers |
|---|---|---|
| Data in context | 214KB (full logs) | 52 bytes (pointer) |
| Agent behavior | Truncates/fails | Processes all data |
| Errors detected | Partial | Complete (all services) |
Solution 2: Multi-Agent with Strands Swarm
A single agent works for linear pipelines. But real-world incident response involves specialized roles: someone fetches data, someone analyzes it, someone writes the report. Strands Swarm coordinates multiple agents autonomously: define agents with different tools, and the Swarm handles handoffs.
This is the same "payload referencing" pattern from the Amazon multi-agent collaboration paper. Agents exchange pointers to shared data instead of passing raw payloads. The difference is that Strands Swarm handles the coordination automatically, and provides invocation_state as the official API for sharing data across agents.
```python
import json

from strands import Agent, tool, ToolContext
from strands.multiagent import Swarm

# invocation_state is a dict shared across all agents in the Swarm — the cross-agent store
@tool(context=True)
def fetch_application_logs(app_name: str, tool_context: ToolContext, hours: int = 6) -> str:
    logs = generate_logs(app_name, hours)  # 145KB+
    pointer = f"logs-{app_name}"
    # Store in invocation_state so all downstream agents can access it without re-fetching
    tool_context.invocation_state[pointer] = logs
    # Only the pointer string travels through the LLM context to the next agent
    return f"Stored as '{pointer}'. Hand off to analyzer."

@tool(context=True)
def analyze_error_patterns(logs_pointer: str, tool_context: ToolContext) -> str:
    # Resolve the pointer to the full dataset — no LLM context consumed
    logs = tool_context.invocation_state.get(logs_pointer)
    errors = [l for l in logs if l["level"] == "ERROR"]
    result = {"total_errors": len(errors)}  # additional fields omitted for brevity
    # Store analysis results as another pointer for the reporter agent
    tool_context.invocation_state["error_analysis"] = result
    return json.dumps(result)

# Each agent has a focused role; the Swarm decides the handoff order autonomously
collector = Agent(name="collector", tools=[fetch_application_logs], model=MODEL)
analyzer = Agent(name="analyzer", tools=[analyze_error_patterns, detect_latency_anomalies], model=MODEL)
reporter = Agent(name="reporter", tools=[generate_incident_report], model=MODEL)

swarm = Swarm([collector, analyzer, reporter], entry_point=collector)
result = swarm("Fetch logs, analyze, and generate incident report.")
```
The Swarm automatically:
- Starts with the collector, which fetches 145KB of logs and stores them in invocation_state
- The collector hands off to the analyzer with the pointer "logs-payment-service"
- The analyzer runs error and latency analysis, stores results in invocation_state, and hands off to the reporter
- The reporter generates the final incident report
No orchestration code or manual handoff logic is needed. Each agent has its own tools, and the Swarm figures out the flow from the agent descriptions and the task. All data sharing happens via tool_context.invocation_state — the same ToolContext API used in the single-agent example, backed by a different store.
Swarm Results
```
Status: COMPLETED
Agents: collector → analyzer → reporter
Time: ~14s
Shared store:
  logs-payment-service: 145,310 bytes
  error_analysis: 135 bytes
  latency_analysis: 70 bytes
```
145KB of logs processed by three agents. None of it ever entered any LLM context window.
Follow-up Investigation
After the swarm completes, the data stays in the shared store. A separate investigator agent can drill into specific services without re-fetching:
```python
# The investigator reuses invocation_state populated by the swarm — no data re-fetch needed
investigator = Agent(
    name="investigator",
    tools=[get_error_details, analyze_error_patterns],
    model=MODEL,
)

# Each question resolves the pointer from invocation_state and runs analysis in-memory
investigator("Which service had the most errors?")
investigator("Show me the error logs for cache-layer")
investigator("What status codes are those errors returning?")
# All queries read from the same 145KB already in invocation_state — no re-fetch, no context overflow
```
When to Use Each Approach
Single agent + agent.state — linear pipelines where one agent handles fetch + analyze + report. Use ToolContext to access tool_context.agent.state from tools.
Swarm + invocation_state — specialized roles, complex workflows, or when you want autonomous coordination. Use ToolContext to access tool_context.invocation_state — the official Strands API for multi-agent data sharing. The Swarm handles handoffs, timeouts, and repetitive handoff detection.
Both — use SlidingWindowConversationManager as additional protection. It automatically trims conversation history and handles ContextWindowOverflowException with retry.
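As a rough plain-Python sketch of what sliding-window trimming does (illustrative only, not the Strands implementation): keep the system prompt and drop everything except the most recent N messages.

```python
# Illustrative sliding-window trim: retain the system prompt plus the last N messages.
def trim_history(messages: list, window_size: int = 10) -> list:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-window_size:]

history = [{"role": "system", "content": "You are an agent."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(50)]

trimmed = trim_history(history, window_size=10)
print(len(trimmed))  # 11: the system prompt plus the last 10 messages
```

Trimming is a complement to memory pointers, not a substitute: it caps how much history survives, while pointers keep large payloads out of the history in the first place.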
These approaches are part of context engineering for AI agents: the practice of deciding what information enters the LLM's context window and when.
Try It Yourself
You need Python 3.9+, uv, and an OpenAI API key.
```bash
git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agents-wasting-tokens/01-context-overflow-demo
uv venv && uv pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"

uv run python test_context_overflow.py  # Single-agent: 4 scenarios
uv run python swarm_demo.py             # Multi-agent: Collector → Analyzer → Reporter
```
Or open test_context_overflow.ipynb in Kiro, VS Code, or your preferred notebook environment.
Key Takeaways
- Context overflow is silent — agents don't crash, they produce wrong results
- Memory pointers solve it — store large data externally, pass references
- >16,000x token reduction — validated by IBM Research on the Materials Science benchmark
- Single-agent uses agent.state — @tool(context=True) + ToolContext to store and retrieve data outside the context
- Multi-agent uses invocation_state — same ToolContext API, shared across all agents in the Swarm; no orchestration code needed
- Data persists for follow-up — after the pipeline completes, stored data is available for investigation without re-fetching
Frequently Asked Questions
Why do AI agents run out of context?
AI agents run out of context when tool responses are injected directly into the LLM conversation history. Each response consumes tokens. When cumulative tool outputs exceed the model's context window limit, the LLM loses earlier context, truncates data, or fails entirely. This happens silently: the agent appears to work but produces incomplete or wrong results.
What is the Memory Pointer Pattern for AI agents?
The Memory Pointer Pattern stores large tool outputs (logs, datasets, query results) in external state instead of the LLM context window. Tools return a short reference key (the "pointer") that subsequent tools use to retrieve the full data. IBM Research validated this pattern with a reduction of over 16,000x on the Materials Science benchmark.
How does agent.state differ from invocation_state in Strands Agents?
agent.state is scoped to a single agent instance. Use it for linear pipelines where one agent handles all steps. invocation_state is shared across all agents in a Strands Swarm. Use it when multiple specialized agents need to exchange data without passing large payloads through the LLM context.
Can I use the Memory Pointer Pattern with LangGraph or other frameworks?
Yes. The pattern requires two capabilities: a shared key-value store accessible from tools, and the ability to pass short reference strings through the LLM context. LangGraph provides this through its state management, AutoGen through shared memory, and CrewAI through task context. The Strands implementation uses ToolContext as the native API.
References
Research
- Solving Context Window Overflow in AI Agents — IBM Research, Nov 2025
- Towards Effective GenAI Multi-Agent Collaboration — Amazon, Dec 2024
- Context Window Limits Explained — Airbyte blog (community observation), Dec 2025
- Efficient On-Device Agents via Adaptive Context Management — Nov 2025
Implementation
- Strands Agent State — ToolContext and agent.state
- Strands Swarm — Multi-agent orchestration
- Strands Conversation Management — Sliding window and context overflow
Have you hit context window limits in your agents? What strategies worked for you? Share in the comments.
Next in this series: MCP Tools That Never Respond — async patterns for slow external APIs.
All code in this series is open source under the MIT-0 License. Star the repository to follow updates.
Thanks!