DEV Community

Harshal Sant
Harshal Sant

Posted on

ContextLens — py-spy/pprof but for what's inside your LLM prompt

In multi-turn agent loops, the full context re-sends on every API call. A tool result added at turn 3 gets billed again at turns 4, 5, 6, 7... forever. Most of it is never read again.

Standard observability tools tell you the total token count. They never tell you what's in there or how much of it is waste.

That's what ContextLens fixes.


What it does

ContextLens is a diagnostic profiler for LLM agent context windows. It:

  • Decomposes the context window into regions: system prompt, tool schemas, tool results, retrieved chunks, user messages, assistant messages
  • Tracks which blocks get re-billed across turns using SHA-256 content hashing
  • Runs 5 waste detectors and ranks findings by dollar cost
  • Prints a concrete one-line fix for each finding
  • Renders an interactive D3 treemap report as a self-contained HTML file

No API key required. Works offline on saved traces.


The five detectors

Detector What it finds
Duplicate Same block re-sent verbatim across multiple turns
Near-Duplicate >85% Jaccard similarity between distinct blocks
Stale Tool Result Tool output never referenced by a later assistant message
Unused Tool Schema Tool defined every turn but never called
Redundant Retrieval Retrieved chunk with <15% overlap with model output

---Run the built-in demo (simulates a 30-turn agent loop, no API key needed):

python -c "import contextlens; contextlens.demo()"
python examples/demo.py
Live capture — Anthropic

import anthropic
import contextlens as cl

client = anthropic.Anthropic()

with cl.capture_anthropic(client, model="claude-3-5-sonnet-20241022") as collector:
for turn in range(20):
client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system="You are a helpful assistant.",
messages=build_messages(turn),
)

report = cl.analyze_trace(collector.build_trace())
print(f"Recoverable waste: {report.recoverable_tokens:,} tokens (${report.recoverable_cost_usd:.4f})")
Live capture — OpenAI

import openai
import contextlens as cl

client = openai.OpenAI()

with cl.capture_openai(client, model="gpt-4o") as collector:
for turn in range(20):
client.chat.completions.create(model="gpt-4o", messages=build_messages(turn))

report = cl.analyze_trace(collector.build_trace())
Analyze a saved trace

report = cl.analyze_file("trace.json")
html = cl.render_html_report(report)
open("report.html", "w").write(html)
Example terminal output

+---------------------------------------------------------------------+
| ContextLens | Run demo-001 |
| Model: claude-3-5-sonnet-20241022 | Provider: anthropic | Turns: 30 |
+---------------------------------------------------------------------+

Context Composition by Region


Region Tokens Cost (USD) Share
assistant_message 11,490 $0.0345 ###....... 25.5%
tool_result 10,333 $0.0310 ##........ 22.9%
tool_schema 9,450 $0.0284 ##........ 21.0%
retrieved_content 5,805 $0.0174 #......... 12.9%
user_message 4,740 $0.0142 #......... 10.5%
system 3,240 $0.0097 #......... 7.2%
TOTAL 45,058 $0.1352

Re-billing: 43,185 tokens (95.8%) re-billing waste -> $0.1296 recoverable

Top Waste Findings
# Type Sev. Wasted Tokens Cost Fix
1 duplicate medium 7,084 $0.0213 Cache or externalize...
2 redundant_ret medium 5,805 $0.0174 Use a re-ranker...
3 unused_schema low 3,150 $0.0095 Remove send_email...
Try the live demo
No install, no API key: https://huggingface.co/spaces/Harshal0610/contextlens

Links
GitHub: https://github.com/HarshalSant/contextlens
Install: pip install contextlens-profiler
License: MIT
Feedback welcome — especially from anyone running multi-turn agent loops at scale. What waste patterns do you run into most?

Quickstart


bash
pip install contextlens-profiler

Enter fullscreen mode Exit fullscreen mode

Top comments (0)