ContextLens — py-spy/pprof but for what's inside your LLM prompt

#ai #llm #opensource #python

In multi-turn agent loops, the full context re-sends on every API call. A tool result added at turn 3 gets billed again at turns 4, 5, 6, 7... forever. Most of it is never read again.

Standard observability tools tell you the total token count. They never tell you what's in there or how much of it is waste.

That's what ContextLens fixes.

What it does

ContextLens is a diagnostic profiler for LLM agent context windows. It:

Decomposes the context window into regions: system prompt, tool schemas, tool results, retrieved chunks, user messages, assistant messages
Tracks which blocks get re-billed across turns using SHA-256 content hashing
Runs 5 waste detectors and ranks findings by dollar cost
Prints a concrete one-line fix for each finding
Renders an interactive D3 treemap report as a self-contained HTML file

No API key required. Works offline on saved traces.

The five detectors

Detector	What it finds
Duplicate	Same block re-sent verbatim across multiple turns
Near-Duplicate	>85% Jaccard similarity between distinct blocks
Stale Tool Result	Tool output never referenced by a later assistant message
Unused Tool Schema	Tool defined every turn but never called
Redundant Retrieval	Retrieved chunk with <15% overlap with model output

---Run the built-in demo (simulates a 30-turn agent loop, no API key needed):

python -c "import contextlens; contextlens.demo()"
python examples/demo.py
Live capture — Anthropic

import anthropic
import contextlens as cl

client = anthropic.Anthropic()

with cl.capture_anthropic(client, model="claude-3-5-sonnet-20241022") as collector:
for turn in range(20):
client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system="You are a helpful assistant.",
messages=build_messages(turn),
)

report = cl.analyze_trace(collector.build_trace())
print(f"Recoverable waste: {report.recoverable_tokens:,} tokens (${report.recoverable_cost_usd:.4f})")
Live capture — OpenAI

import openai
import contextlens as cl

client = openai.OpenAI()

with cl.capture_openai(client, model="gpt-4o") as collector:
for turn in range(20):
client.chat.completions.create(model="gpt-4o", messages=build_messages(turn))

report = cl.analyze_trace(collector.build_trace())
Analyze a saved trace

report = cl.analyze_file("trace.json")
html = cl.render_html_report(report)
open("report.html", "w").write(html)
Example terminal output

Context Composition by Region

Region Tokens Cost (USD) Share
assistant_message 11,490 $0.0345 ###....... 25.5%
tool_result 10,333 $0.0310 ##........ 22.9%
tool_schema 9,450 $0.0284 ##........ 21.0%
retrieved_content 5,805 $0.0174 #......... 12.9%
user_message 4,740 $0.0142 #......... 10.5%
system 3,240 $0.0097 #......... 7.2%
TOTAL 45,058 $0.1352

Re-billing: 43,185 tokens (95.8%) re-billing waste -> $0.1296 recoverable

Top Waste Findings
# Type Sev. Wasted Tokens Cost Fix
1 duplicate medium 7,084 $0.0213 Cache or externalize...
2 redundant_ret medium 5,805 $0.0174 Use a re-ranker...
3 unused_schema low 3,150 $0.0095 Remove send_email...
Try the live demo
No install, no API key: https://huggingface.co/spaces/Harshal0610/contextlens

Links
GitHub: https://github.com/HarshalSant/contextlens
Install: pip install contextlens-profiler
License: MIT
Feedback welcome — especially from anyone running multi-turn agent loops at scale. What waste patterns do you run into most?

Quickstart


bash
pip install contextlens-profiler

DEV Community

ContextLens — py-spy/pprof but for what's inside your LLM prompt

What it does

The five detectors

Quickstart

Top comments (0)