<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AdityaSharma2804</title>
    <description>The latest articles on DEV Community by AdityaSharma2804 (@adityasharma2804).</description>
    <link>https://dev.to/adityasharma2804</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3917303%2Fed962c34-328c-4f25-8f43-df3eae65263e.png</url>
      <title>DEV Community: AdityaSharma2804</title>
      <link>https://dev.to/adityasharma2804</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adityasharma2804"/>
    <language>en</language>
    <item>
      <title>I Built My Own LLM Observability Tool — Here’s Why and How</title>
      <dc:creator>AdityaSharma2804</dc:creator>
      <pubDate>Thu, 07 May 2026 06:49:45 +0000</pubDate>
      <link>https://dev.to/adityasharma2804/i-built-my-own-llm-observability-tool-heres-why-and-how-3619</link>
      <guid>https://dev.to/adityasharma2804/i-built-my-own-llm-observability-tool-heres-why-and-how-3619</guid>
      <description>&lt;p&gt;When I started building applications on top of OpenAI and Anthropic APIs, I quickly ran into a frustrating problem. I had no idea how much money I was spending, how fast my API calls were, or how often they were failing. I'd run a script, it would finish, and I'd have no visibility into what actually happened under the hood.&lt;/p&gt;

&lt;p&gt;Commercial tools like LangSmith and Helicone exist for this — but they require account setup, SDK changes, and monthly fees. I didn't want any of that. I wanted something I could drop into any Python project in one line and immediately get visibility. So I built &lt;strong&gt;llm-lens&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Every time you call &lt;code&gt;client.chat.completions.create(...)&lt;/code&gt;, a lot of things happen that you never see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How long did it take?&lt;/li&gt;
&lt;li&gt;How many tokens did you use?&lt;/li&gt;
&lt;li&gt;How much did it cost?&lt;/li&gt;
&lt;li&gt;Did it fail silently?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building a serious application on top of LLMs, these questions matter. Cost can spiral quickly. Latency affects user experience. Errors need to be caught and understood.&lt;/p&gt;

&lt;p&gt;The existing solutions solve this, but they come with friction. You need to wrap your calls in their SDK, create an account, set up a project, and pay a monthly fee. For a developer who just wants local visibility with zero setup, there was no good answer.&lt;/p&gt;




&lt;h2&gt;The Solution: Zero-Configuration Instrumentation&lt;/h2&gt;

&lt;p&gt;llm-lens works by &lt;strong&gt;monkey-patching&lt;/strong&gt; the OpenAI and Anthropic SDKs at import time. This means it replaces the internal &lt;code&gt;create()&lt;/code&gt; method on both SDK clients with a wrapper — without you changing a single line of your existing code.&lt;/p&gt;

&lt;p&gt;Here's all you need to add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;llm_lens&lt;/span&gt;   &lt;span class="c1"&gt;# patches both SDKs automatically
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ^ this call is now fully tracked
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No decorators. No wrappers. No config files. Just an import at the top.&lt;/p&gt;




&lt;h2&gt;How It Works Internally&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;import llm_lens&lt;/code&gt;, the library calls &lt;code&gt;patch_all()&lt;/code&gt;, which does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Imports the OpenAI and Anthropic SDK classes&lt;/li&gt;
&lt;li&gt;Saves a reference to their original &lt;code&gt;create()&lt;/code&gt; methods&lt;/li&gt;
&lt;li&gt;Replaces them with a wrapper function&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The wrapper does this on every API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User code calls create()
    → wrapper starts a timer with time.perf_counter()
    → calls the original SDK method
    → extracts usage.input_tokens, usage.output_tokens, model
    → calculates cost from a pricing table
    → writes a record to SQLite at ~/.llm_lens/calls.db
    → returns the original response untouched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user gets the exact same response they would have gotten without llm-lens. The only difference is that a record was quietly saved in the background.&lt;/p&gt;
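
&lt;p&gt;To make that flow concrete, here is a minimal sketch of what the OpenAI half of such a patch could look like, written against the v1 SDK layout. It is an illustration rather than the library's actual internals, and &lt;code&gt;_log_call&lt;/code&gt; here is a print-based stand-in for the real SQLite write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

from openai.resources.chat.completions import Completions

# Keep a handle to the original method so the wrapper can delegate to it.
_original_create = Completions.create

def _log_call(response, latency_ms, status, error):
    # Stand-in for the real SQLite write described above.
    usage = getattr(response, "usage", None)
    print(status, round(latency_ms, 1), usage, error)

def _patched_create(self, *args, **kwargs):
    start = time.perf_counter()
    try:
        response = _original_create(self, *args, **kwargs)
    except Exception as exc:
        _log_call(None, (time.perf_counter() - start) * 1000, "error", str(exc))
        raise  # the caller still sees the original exception
    _log_call(response, (time.perf_counter() - start) * 1000, "ok", None)
    return response  # untouched response, exactly as the SDK returned it

Completions.create = _patched_create
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;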




&lt;h2&gt;What Gets Tracked&lt;/h2&gt;

&lt;p&gt;Every API call logs the following to a local SQLite database:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;latency_ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;End-to-end response time in milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;input_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prompt tokens from the usage object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;output_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Completion tokens from the usage object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cost_usd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calculated cost in USD, stored to 8 decimal places&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model string returned by the API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ok&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exception message if the call failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;timestamp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UTC datetime of the call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pricing table covers &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;gpt-4o-mini&lt;/code&gt;, &lt;code&gt;gpt-4-turbo&lt;/code&gt;, &lt;code&gt;claude-3-5-sonnet&lt;/code&gt;, &lt;code&gt;claude-3-5-haiku&lt;/code&gt;, and &lt;code&gt;claude-3-opus&lt;/code&gt;. Fuzzy model matching handles version suffixes automatically, so &lt;code&gt;gpt-4o-2024-08-06&lt;/code&gt; resolves correctly.&lt;/p&gt;
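
&lt;p&gt;As a rough illustration of how a pricing table plus prefix-based fuzzy matching can work (the per-million-token rates below are placeholders, not the library's actual numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Placeholder USD rates per million tokens; the real table ships with the library.
PRICING = {
    "gpt-4o":            {"input": 2.50, "output": 10.00},
    "gpt-4o-mini":       {"input": 0.15, "output": 0.60},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}

def resolve_pricing(model):
    # Exact match first; otherwise try the longest known prefix, so
    # "gpt-4o-2024-08-06" falls back to "gpt-4o" instead of going unpriced.
    if model in PRICING:
        return PRICING[model]
    for known in sorted(PRICING, key=len, reverse=True):
        if model.startswith(known):
            return PRICING[known]
    return None

def estimate_cost_usd(model, input_tokens, output_tokens):
    rates = resolve_pricing(model)
    if rates is None:
        return 0.0
    cost = input_tokens / 1_000_000 * rates["input"] + output_tokens / 1_000_000 * rates["output"]
    return round(cost, 8)  # matches the 8-decimal precision stored in cost_usd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;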




&lt;h2&gt;Accessing Your Data&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CLI — instant visibility in the terminal:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llm-lens              &lt;span class="c"&gt;# rich table of every tracked call&lt;/span&gt;
llm-lens stats        &lt;span class="c"&gt;# total calls, error rate, avg latency, total cost&lt;/span&gt;
llm-lens serve        &lt;span class="c"&gt;# starts dashboard at localhost:8000&lt;/span&gt;
llm-lens config &lt;span class="nb"&gt;set &lt;/span&gt;cost_alert_usd 0.10  &lt;span class="c"&gt;# set a cost alert&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Live Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;llm-lens serve&lt;/code&gt; starts a FastAPI server and opens a single-page dashboard built in vanilla JS with Chart.js. It auto-refreshes every 5 seconds and shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stats bar: total calls, error rate, avg latency, total cost&lt;/li&gt;
&lt;li&gt;A latency-per-call line chart&lt;/li&gt;
&lt;li&gt;A per-call error bar chart with red/green color coding&lt;/li&gt;
&lt;li&gt;A red alert banner if your cost threshold is breached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No build step. No npm. Just a single HTML file served by FastAPI.&lt;/p&gt;




&lt;h2&gt;Tech Stack Decisions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why SQLite?&lt;/strong&gt; No external database dependency. Data lives at &lt;code&gt;~/.llm_lens/calls.db&lt;/code&gt; on your machine. Works offline, works instantly, no setup required.&lt;/p&gt;
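
&lt;p&gt;A minimal version of the storage layer fits in a handful of lines. The schema below is a sketch whose columns mirror the tracking table above, not a dump of the real one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3
from pathlib import Path

DB_PATH = Path.home() / ".llm_lens" / "calls.db"

def init_db():
    # Create the directory and table on first use; no external database needed.
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS calls ("
            "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
            "  timestamp TEXT, model TEXT, status TEXT, error TEXT,"
            "  latency_ms REAL, input_tokens INTEGER,"
            "  output_tokens INTEGER, cost_usd REAL)"
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;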

&lt;p&gt;&lt;strong&gt;Why monkey-patching?&lt;/strong&gt; It's the only approach that requires zero changes to existing code. The alternative — wrapping calls manually — defeats the purpose of a zero-configuration tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why vanilla JS for the dashboard?&lt;/strong&gt; No build step. No node_modules. The entire frontend is a single HTML file that loads Chart.js from a CDN. Anyone can open it, read it, and understand it in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why FastAPI?&lt;/strong&gt; Async, fast, and gives you automatic OpenAPI docs at &lt;code&gt;/docs&lt;/code&gt; for free. The REST API has five endpoints: &lt;code&gt;/calls&lt;/code&gt;, &lt;code&gt;/stats&lt;/code&gt;, &lt;code&gt;/alert&lt;/code&gt;, &lt;code&gt;/health&lt;/code&gt;, and &lt;code&gt;/&lt;/code&gt; for the dashboard.&lt;/p&gt;
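
&lt;p&gt;A stripped-down version of two of those endpoints might look like the following. This is a sketch against the schema above, not the project's actual route code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3
from pathlib import Path

from fastapi import FastAPI

DB_PATH = Path.home() / ".llm_lens" / "calls.db"
app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.get("/stats")
def stats():
    # Aggregate straight from SQLite: call count, errors, average latency, total cost.
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT COUNT(*), SUM(status = 'error'), AVG(latency_ms), SUM(cost_usd) FROM calls"
        ).fetchone()
    return {
        "total_calls": row[0],
        "errors": row[1] or 0,
        "avg_latency_ms": row[2] or 0.0,
        "total_cost_usd": row[3] or 0.0,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;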




&lt;h2&gt;Cost Alerts&lt;/h2&gt;

&lt;p&gt;You can set a cost threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llm-lens config &lt;span class="nb"&gt;set &lt;/span&gt;cost_alert_usd 0.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This stores a value in &lt;code&gt;~/.llm_lens/config.json&lt;/code&gt;. The dashboard's &lt;code&gt;/alert&lt;/code&gt; endpoint checks the total spend against this threshold on every refresh. If you've crossed it, a red banner appears at the top of the dashboard.&lt;/p&gt;

&lt;p&gt;This is particularly useful when you're iterating quickly and lose track of how many API calls you've made.&lt;/p&gt;
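
&lt;p&gt;The check itself is small enough to sketch in a few lines; the config path and key follow the CLI command above, and the helper name is illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".llm_lens" / "config.json"

def cost_alert_triggered(total_cost_usd):
    # No config file means no threshold has been set, so never alert.
    if not CONFIG_PATH.exists():
        return False
    threshold = json.loads(CONFIG_PATH.read_text()).get("cost_alert_usd")
    return threshold is not None and total_cost_usd &gt;= threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;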




&lt;h2&gt;Privacy First&lt;/h2&gt;

&lt;p&gt;All data is stored locally at &lt;code&gt;~/.llm_lens/calls.db&lt;/code&gt;. Nothing leaves your machine unless you deploy the server yourself. No third party ever sees your API calls, prompts, or token usage.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;llm-lens is open source and available on GitHub at &lt;a href="https://github.com/AdityaSharma2804/llm-lens" rel="noopener noreferrer"&gt;github.com/AdityaSharma2804/llm-lens&lt;/a&gt;. The live demo dashboard is at &lt;a href="https://llm-lens.onrender.com" rel="noopener noreferrer"&gt;llm-lens.onrender.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Planned features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Async support&lt;/strong&gt; for asyncio-based applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming response tracking&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-model breakdown&lt;/strong&gt; in the dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse migration&lt;/strong&gt; for high-volume production use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination scoring&lt;/strong&gt; — running a second cheap model call to score response confidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack/email alerts&lt;/strong&gt; when cost thresholds are breached&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-lens-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: The PyPI package is &lt;code&gt;llm-lens-py&lt;/code&gt; but the import name is &lt;code&gt;llm_lens&lt;/code&gt; — using hyphens in the distribution name and underscores in the import name is a common Python packaging convention.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add one import to your project. That's all it takes.&lt;/p&gt;

&lt;p&gt;If you find it useful, a GitHub star goes a long way. And if you run into bugs or have feature requests, open an issue — contributions are welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aditya Sharma is a B.Tech CSE student at Manipal University Jaipur (2026). You can find him on GitHub at &lt;a href="https://github.com/AdityaSharma2804" rel="noopener noreferrer"&gt;@AdityaSharma2804&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
