<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harshal Sant</title>
    <description>The latest articles on DEV Community by Harshal Sant (@harshal_sant_be921c5039f2).</description>
    <link>https://dev.to/harshal_sant_be921c5039f2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3716736%2Fca8db413-fdf7-4e2c-b738-5b48d3a2a072.png</url>
      <title>DEV Community: Harshal Sant</title>
      <link>https://dev.to/harshal_sant_be921c5039f2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harshal_sant_be921c5039f2"/>
    <language>en</language>
    <item>
      <title>ContextLens — py-spy/pprof but for what's inside your LLM prompt</title>
      <dc:creator>Harshal Sant</dc:creator>
      <pubDate>Mon, 08 Jun 2026 15:32:34 +0000</pubDate>
      <link>https://dev.to/harshal_sant_be921c5039f2/contextlens-py-spypprof-but-for-whats-inside-your-llm-prompt-59l7</link>
      <guid>https://dev.to/harshal_sant_be921c5039f2/contextlens-py-spypprof-but-for-whats-inside-your-llm-prompt-59l7</guid>
      <description>&lt;p&gt;In multi-turn agent loops, the full context is re-sent on every API call. A tool result added at turn 3 gets billed again at turns 4, 5, 6, 7, and every turn after that until the run ends. Much of it is never referenced again.&lt;/p&gt;

&lt;p&gt;Standard observability tools tell you the &lt;em&gt;total&lt;/em&gt; token count. They never tell you &lt;em&gt;what's in there&lt;/em&gt; or &lt;em&gt;how much of it is waste&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's what &lt;strong&gt;ContextLens&lt;/strong&gt; fixes.&lt;/p&gt;
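
&lt;p&gt;To put a rough number on re-billing, here is a minimal sketch of the arithmetic. It is not part of ContextLens: the function name, the block size, and the price of $3 per million input tokens are all assumptions chosen for illustration.&lt;/p&gt;

```python
def rebilled_tokens(block_tokens: int, added_at_turn: int, total_turns: int) -> int:
    """Input tokens billed for one block on the turns after the one that introduced it."""
    later_turns = max(total_turns - added_at_turn, 0)
    return block_tokens * later_turns


# Hypothetical numbers: a 2,000-token tool result added at turn 3 of a 30-turn loop.
extra = rebilled_tokens(2000, added_at_turn=3, total_turns=30)

# Assumed price of $3 per million input tokens; check your provider's current pricing.
cost_usd = extra * 3.0 / 1_000_000

print(f"{extra:,} re-billed tokens, about ${cost_usd:.4f}")
```

&lt;p&gt;Under those assumptions, one tool result costs 27 times its own size in extra input tokens by the end of the run.&lt;/p&gt;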




&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;ContextLens is a diagnostic profiler for LLM agent context windows. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decomposes the context window into regions: system prompt, tool schemas, tool results, retrieved chunks, user messages, assistant messages&lt;/li&gt;
&lt;li&gt;Tracks which blocks get re-billed across turns using SHA-256 content hashing&lt;/li&gt;
&lt;li&gt;Runs 5 waste detectors and ranks findings by dollar cost&lt;/li&gt;
&lt;li&gt;Prints a concrete one-line fix for each finding&lt;/li&gt;
&lt;li&gt;Renders an interactive D3 treemap report as a self-contained HTML file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No API key required. Works offline on saved traces.&lt;/p&gt;
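
&lt;p&gt;The content-hashing idea can be sketched in a few lines. This is an illustration only, not the ContextLens implementation: the helper names and the list-of-strings trace format are hypothetical, and the real tool may normalize or group blocks differently.&lt;/p&gt;

```python
import hashlib
from collections import defaultdict


def block_hash(text: str) -> str:
    """Return a SHA-256 hex digest that identifies a block by its exact content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rebilled_blocks(turns: list[list[str]]) -> dict[str, int]:
    """Map each content hash to the number of turns that re-sent it after its first appearance."""
    turns_seen = defaultdict(int)
    for blocks in turns:
        # A set ensures a block repeated inside one turn is counted once for that turn.
        for digest in {block_hash(b) for b in blocks}:
            turns_seen[digest] += 1
    # The first appearance is necessary; every later appearance is a re-bill.
    return {d: n - 1 for d, n in turns_seen.items() if n > 1}


# Hypothetical three-turn trace: each turn re-sends everything that came before it.
turns = [
    ["system prompt", "user: hi"],
    ["system prompt", "user: hi", "tool result A"],
    ["system prompt", "user: hi", "tool result A", "user: thanks"],
]
rebills = rebilled_blocks(turns)
```

&lt;p&gt;Hashing exact content makes the comparison cheap, but a single changed character produces a different digest, so this only catches verbatim repeats.&lt;/p&gt;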




&lt;h2&gt;
  
  
  The five detectors
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Detector&lt;/th&gt;
&lt;th&gt;What it finds&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Duplicate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same block re-sent verbatim across multiple turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Near-Duplicate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;gt;85% Jaccard similarity between distinct blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stale Tool Result&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool output never referenced by a later assistant message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unused Tool Schema&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool defined every turn but never called&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Redundant Retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieved chunk with &amp;lt;15% overlap with model output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
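
&lt;p&gt;As a rough illustration of the Near-Duplicate check, the sketch below computes Jaccard similarity over lowercase word sets and applies the 85% threshold from the table. The word-level tokenization and the function names are assumptions; the post does not say which unit ContextLens compares.&lt;/p&gt;

```python
import re


def token_set(text: str) -> set[str]:
    """Split text into a set of lowercase word tokens (an assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    sa, sb = token_set(a), token_set(b)
    if not sa and not sb:
        return 1.0  # Two empty blocks are treated as identical.
    return len(sa.intersection(sb)) / len(sa.union(sb))


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """True for distinct blocks whose similarity exceeds the threshold."""
    return a != b and jaccard(a, b) > threshold


# Hypothetical tool results: the second differs from the first by one extra word.
first = "status ok rows 120 latency 35 ms region us east"
second = "status ok rows 120 latency 35 ms region us east cached"
unrelated = "weather in Paris is sunny"

similarity = jaccard(first, second)
```

&lt;p&gt;Set-based Jaccard ignores word order and repetition, which is usually acceptable for spotting tool results that differ only by a timestamp or a counter.&lt;/p&gt;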

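&lt;p&gt;Run the built-in demo (simulates a 30-turn agent loop, no API key needed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -c "import contextlens; contextlens.demo()"
python examples/demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Live capture: Anthropic
&lt;/h2&gt;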

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import anthropic
import contextlens as cl

client = anthropic.Anthropic()

# build_messages(turn) is your own function that returns the message list for that turn.
with cl.capture_anthropic(client, model="claude-3-5-sonnet-20241022") as collector:
    for turn in range(20):
        client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system="You are a helpful assistant.",
            messages=build_messages(turn),
        )

report = cl.analyze_trace(collector.build_trace())
print(f"Recoverable waste: {report.recoverable_tokens:,} tokens (${report.recoverable_cost_usd:.4f})")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Live capture: OpenAI
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import openai
import contextlens as cl

client = openai.OpenAI()

# build_messages(turn) is your own function that returns the message list for that turn.
with cl.capture_openai(client, model="gpt-4o") as collector:
    for turn in range(20):
        client.chat.completions.create(model="gpt-4o", messages=build_messages(turn))

report = cl.analyze_trace(collector.build_trace())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Analyze a saved trace
&lt;/h2&gt;

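&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import contextlens as cl

report = cl.analyze_file("trace.json")
html = cl.render_html_report(report)

# Write the self-contained HTML report; the with block closes the file handle.
with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Example terminal output
&lt;/h2&gt;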

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------------------------------------------------------------------+
| ContextLens | Run demo-001                                          |
| Model: claude-3-5-sonnet-20241022 | Provider: anthropic | Turns: 30 |
+---------------------------------------------------------------------+

Context Composition by Region

  Region              Tokens    Cost (USD)   Share
  assistant_message   11,490    $0.0345      ###....... 25.5%
  tool_result         10,333    $0.0310      ##........ 22.9%
  tool_schema          9,450    $0.0284      ##........ 21.0%
  retrieved_content    5,805    $0.0174      #......... 12.9%
  user_message         4,740    $0.0142      #......... 10.5%
  system               3,240    $0.0097      #.........  7.2%
  TOTAL               45,058    $0.1352

Re-billing: 43,185 tokens (95.8%) re-billing waste -&amp;gt; $0.1296 recoverable

Top Waste Findings
  #   Type              Sev.   Wasted Tokens  Cost      Fix
  1   duplicate         medium     7,084     $0.0213   Cache or externalize...
  2   redundant_ret     medium     5,805     $0.0174   Use a re-ranker...
  3   unused_schema     low        3,150     $0.0095   Remove send_email...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Try the live demo
&lt;/h2&gt;

&lt;p&gt;No install, no API key: &lt;a href="https://huggingface.co/spaces/Harshal0610/contextlens" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/Harshal0610/contextlens&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/HarshalSant/contextlens" rel="noopener noreferrer"&gt;https://github.com/HarshalSant/contextlens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Install: &lt;code&gt;pip install contextlens-profiler&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;License: MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback welcome, especially from anyone running multi-turn agent loops at scale. What waste patterns do you run into most?&lt;/p&gt;

&lt;h2&gt;
  
  
  Quickstart
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install contextlens-profiler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
