I Built My Own LLM Observability Tool — Here’s Why and How

When I started building applications on top of OpenAI and Anthropic APIs, I quickly ran into a frustrating problem. I had no idea how much money I was spending, how fast my API calls were, or how often they were failing. I'd run a script, it would finish, and I'd have no visibility into what actually happened under the hood.

Commercial tools like LangSmith and Helicone exist for this — but they require account setup, SDK changes, and monthly fees. I didn't want any of that. I wanted something I could drop into any Python project in one line and immediately get visibility. So I built llm-lens.


The Problem

Every time you call client.chat.completions.create(...), a lot of things happen that you never see:

  • How long did it take?
  • How many tokens did you use?
  • How much did it cost?
  • Did it fail silently?

If you're building a serious application on top of LLMs, these questions matter. Cost can spiral quickly. Latency affects user experience. Errors need to be caught and understood.

The existing solutions solve this, but they come with friction. You need to wrap your calls in their SDK, create an account, set up a project, and pay a monthly fee. For a developer who just wants local visibility with zero setup, there was no good answer.


The Solution: Zero-Configuration Instrumentation

llm-lens works by monkey-patching the OpenAI and Anthropic SDKs at import time. This means it replaces the internal create() method on both SDK clients with a wrapper — without you changing a single line of your existing code.

Here's all you need to add:

import llm_lens   # patches both SDKs automatically
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# ^ this call is now fully tracked

That's it. No decorators. No wrappers. No config files. Just an import at the top.


How It Works Internally

When you run import llm_lens, the library calls patch_all(), which does the following (a rough sketch follows the list):

  1. Imports the OpenAI and Anthropic SDK classes
  2. Saves a reference to their original create() methods
  3. Replaces them with a wrapper function
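In code, the idea looks roughly like this. This is a condensed illustration rather than llm-lens's actual source: the exact module path varies across openai-python versions, and _tracked_create is a stand-in name.

# Illustrative sketch of the patching step, not the library's real internals.
# The import path below is for openai-python v1.x and may differ by version.
import functools

from openai.resources.chat.completions import Completions

_original_create = Completions.create          # step 2: keep the original

@functools.wraps(_original_create)
def _tracked_create(self, *args, **kwargs):
    # step 3: the wrapper (detailed in the next section) times the call,
    # logs the result, and delegates to the saved original
    return _original_create(self, *args, **kwargs)

Completions.create = _tracked_create           # replace it on the class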

The wrapper does this on every API call:

User code calls create()
    → wrapper starts a timer with time.perf_counter()
    → calls the original SDK method
    → extracts usage.input_tokens, usage.output_tokens, model
    → calculates cost from a pricing table
    → writes a record to SQLite at ~/.llm_lens/calls.db
    → returns the original response untouched

The user gets the exact same response they would have gotten without llm-lens. The only difference is that a record was quietly saved in the background.
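Continuing the sketch above, the wrapper body reduces to roughly the following. The helpers estimate_cost() and log_call() are hypothetical stand-ins for the library's internals. One detail worth knowing: OpenAI's usage object names the fields prompt_tokens/completion_tokens, while Anthropic's uses input_tokens/output_tokens.

# Rough sketch of the wrapper; estimate_cost() and log_call() are
# hypothetical stand-ins, not llm-lens's actual functions.
# _original_create is the saved method from the patching sketch above.
import time

def _tracked_create(self, *args, **kwargs):
    start = time.perf_counter()
    try:
        response = _original_create(self, *args, **kwargs)
    except Exception as exc:
        log_call(status="error", error=str(exc),
                 latency_ms=(time.perf_counter() - start) * 1000)
        raise                      # the caller still sees the failure
    usage = response.usage         # prompt_tokens/completion_tokens on OpenAI;
                                   # input_tokens/output_tokens on Anthropic
    log_call(
        status="ok",
        model=response.model,
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens,
        cost_usd=estimate_cost(response.model, usage.prompt_tokens,
                               usage.completion_tokens),
        latency_ms=(time.perf_counter() - start) * 1000,
    )
    return response                # the original response, untouched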


What Gets Tracked

Every API call logs the following to a local SQLite database:

Field           Description
latency_ms      End-to-end response time in milliseconds
input_tokens    Prompt tokens from the usage object
output_tokens   Completion tokens from the usage object
cost_usd        Calculated cost in USD (8-decimal precision)
model           Model string returned by the API
status          ok or error
error           Exception message if the call failed
timestamp       UTC datetime of the call
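Those fields map naturally onto a single SQLite table. The schema below is an assumption for illustration (I haven't confirmed the real table or column names), but it shows how little is needed:

import sqlite3
from pathlib import Path

# Assumed schema for illustration; the real table/column names may differ.
db_path = Path.home() / ".llm_lens" / "calls.db"
db_path.parent.mkdir(exist_ok=True)
conn = sqlite3.connect(db_path)
conn.execute("""
    CREATE TABLE IF NOT EXISTS calls (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp     TEXT,     -- UTC datetime of the call
        model         TEXT,     -- model string returned by the API
        latency_ms    REAL,     -- end-to-end response time
        input_tokens  INTEGER,  -- prompt tokens
        output_tokens INTEGER,  -- completion tokens
        cost_usd      REAL,     -- stored at 8-decimal precision
        status        TEXT,     -- 'ok' or 'error'
        error         TEXT      -- exception message, if any
    )
""")
conn.commit()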

The pricing table covers gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-5-sonnet, claude-3-5-haiku, and claude-3-opus. Fuzzy model matching handles version suffixes automatically, so gpt-4o-2024-08-06 resolves correctly.
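One simple way to implement that fuzzy matching is a longest-prefix lookup, sketched below. The prices here are placeholders rather than the library's actual table; always check the providers' current price lists.

# Longest-prefix pricing lookup. Prices are placeholders
# (USD per 1M input/output tokens), not authoritative figures.
PRICING = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def resolve_pricing(model: str):
    # Try longer names first so "gpt-4o-2024-08-06" matches "gpt-4o"
    # and "gpt-4o-mini-..." matches "gpt-4o-mini" rather than "gpt-4o".
    for name in sorted(PRICING, key=len, reverse=True):
        if model.startswith(name):
            return PRICING[name]
    return None

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    prices = resolve_pricing(model)
    if prices is None:
        return 0.0                 # unknown model: track tokens, skip cost
    per_in, per_out = prices
    return input_tokens * per_in / 1e6 + output_tokens * per_out / 1e6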


Accessing Your Data

CLI — instant visibility in the terminal:

llm-lens              # rich table of every tracked call
llm-lens stats        # total calls, error rate, avg latency, total cost
llm-lens serve        # starts dashboard at localhost:8000
llm-lens config set cost_alert_usd 0.10  # set a cost alert

Live Dashboard:

Running llm-lens serve starts a FastAPI server and opens a single-page dashboard built in vanilla JS with Chart.js. It auto-refreshes every 5 seconds and shows:

  • A stats bar: total calls, error rate, avg latency, total cost
  • A latency-per-call line chart
  • An error-per-call bar chart with red/green color coding
  • A red alert banner if your cost threshold is breached

No build step. No npm. Just a single HTML file served by FastAPI.


Tech Stack Decisions

Why SQLite? No external database dependency. Data lives at ~/.llm_lens/calls.db on your machine. Works offline, works instantly, no setup required.
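A side benefit: because it's just a file, you can skip the CLI entirely and query it with the stdlib sqlite3 module (assuming the calls table sketched earlier):

import sqlite3
from pathlib import Path

# Ad-hoc per-model rollup straight from the local database.
conn = sqlite3.connect(Path.home() / ".llm_lens" / "calls.db")
rows = conn.execute(
    "SELECT model, COUNT(*), SUM(cost_usd) FROM calls GROUP BY model"
)
for model, calls, cost in rows:
    print(f"{model}: {calls} calls, ${cost:.4f}")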

Why monkey-patching? It's the only approach that requires zero changes to existing code. The alternative — wrapping calls manually — defeats the purpose of a zero-configuration tool.

Why vanilla JS for the dashboard? No build step. No node_modules. The entire frontend is a single HTML file that loads Chart.js from a CDN. Anyone can open it, read it, and understand it in minutes.

Why FastAPI? Async, fast, and gives you automatic OpenAPI docs at /docs for free. The REST API has five endpoints: /calls, /stats, /alert, /health, and / for the dashboard.
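For flavor, here's roughly what wiring a few of those endpoints looks like in FastAPI. The handler bodies and the fetch_stats() helper are assumptions, not the project's real code:

from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.get("/stats")
def stats():
    return fetch_stats()   # hypothetical helper aggregating the SQLite store

@app.get("/", response_class=HTMLResponse)
def dashboard():
    # the whole frontend: one HTML file, Chart.js loaded from a CDN
    return Path("dashboard.html").read_text()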


Cost Alerts

You can set a cost threshold:

llm-lens config set cost_alert_usd 0.10

This stores a value in ~/.llm_lens/config.json. The dashboard's /alert endpoint checks the total spend against this threshold on every refresh. If you've crossed it, a red banner appears at the top of the dashboard.
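The check itself is tiny: read the threshold, sum the spend, compare. A sketch, reusing the assumed config.json key and calls table from earlier:

import json
import sqlite3
from pathlib import Path

LENS_DIR = Path.home() / ".llm_lens"

def alert_status() -> dict:
    # read the configured threshold (may be absent)
    config = json.loads((LENS_DIR / "config.json").read_text())
    threshold = config.get("cost_alert_usd")
    # total spend across every tracked call
    total = sqlite3.connect(LENS_DIR / "calls.db").execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM calls"
    ).fetchone()[0]
    return {"total_usd": total,
            "alert": threshold is not None and total >= threshold}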

This is particularly useful when you're iterating quickly and lose track of how many API calls you've made.


Privacy First

All data is stored locally at ~/.llm_lens/calls.db. Nothing leaves your machine unless you deploy the server yourself. No third party ever sees your API calls, prompts, or token usage.


What's Next

llm-lens is open source and available on GitHub at github.com/AdityaSharma2804/llm-lens. The live demo dashboard is at llm-lens.onrender.com.

Planned features include:

  • Async support for asyncio-based applications
  • Streaming response tracking
  • Per-model breakdown in the dashboard
  • ClickHouse migration for high-volume production use cases
  • Hallucination scoring — running a second cheap model call to score response confidence
  • Slack/email alerts when cost thresholds are breached

Try It

pip install llm-lens-py

Note: The PyPI package is llm-lens-py but the import name is llm_lens, since hyphens aren't valid in Python module names.

Add one import to your project. That's all it takes.

If you find it useful, a GitHub star goes a long way. And if you run into bugs or have feature requests, open an issue — contributions are welcome.


Aditya Sharma is a B.Tech CSE student at Manipal University Jaipur (2026). You can find him on GitHub at @AdityaSharma2804.
