<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AdityaSharma2804</title>
    <description>The latest articles on DEV Community by AdityaSharma2804 (@adityasharma2804).</description>
    <link>https://dev.to/adityasharma2804</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3917303%2Fed962c34-328c-4f25-8f43-df3eae65263e.png</url>
      <title>DEV Community: AdityaSharma2804</title>
      <link>https://dev.to/adityasharma2804</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adityasharma2804"/>
    <language>en</language>
    <item>
      <title>I Built My Own LLM Observability Tool — Here’s Why and How</title>
      <dc:creator>AdityaSharma2804</dc:creator>
      <pubDate>Thu, 07 May 2026 06:49:45 +0000</pubDate>
      <link>https://dev.to/adityasharma2804/i-built-my-own-llm-observability-tool-heres-why-and-how-3619</link>
      <guid>https://dev.to/adityasharma2804/i-built-my-own-llm-observability-tool-heres-why-and-how-3619</guid>
      <description>&lt;p&gt;When I started building applications on top of OpenAI and Anthropic APIs, I quickly ran into a frustrating problem. I had no idea how much money I was spending, how fast my API calls were, or how often they were failing. I'd run a script, it would finish, and I'd have no visibility into what actually happened under the hood.&lt;/p&gt;

&lt;p&gt;Commercial tools like LangSmith and Helicone exist for this — but they require account setup, SDK changes, and monthly fees. I didn't want any of that. I wanted something I could drop into any Python project in one line and immediately get visibility. So I built &lt;strong&gt;llm-lens&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Every time you call &lt;code&gt;client.chat.completions.create(...)&lt;/code&gt;, a lot of things happen that you never see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How long did it take?&lt;/li&gt;
&lt;li&gt;How many tokens did you use?&lt;/li&gt;
&lt;li&gt;How much did it cost?&lt;/li&gt;
&lt;li&gt;Did it fail silently?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building a serious application on top of LLMs, these questions matter. Cost can spiral quickly. Latency affects user experience. Errors need to be caught and understood.&lt;/p&gt;

&lt;p&gt;The existing solutions solve this, but they come with friction. You need to wrap your calls in their SDK, create an account, set up a project, and pay a monthly fee. For a developer who just wants local visibility with zero setup, there was no good answer.&lt;/p&gt;




&lt;h2&gt;The Solution: Zero-Configuration Instrumentation&lt;/h2&gt;

&lt;p&gt;llm-lens works by &lt;strong&gt;monkey-patching&lt;/strong&gt; the OpenAI and Anthropic SDKs at import time. This means it replaces the internal &lt;code&gt;create()&lt;/code&gt; method on both SDK clients with a wrapper — without you changing a single line of your existing code.&lt;/p&gt;

&lt;p&gt;Here's all you need to add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;llm_lens&lt;/span&gt;   &lt;span class="c1"&gt;# patches both SDKs automatically
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ^ this call is now fully tracked
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No decorators. No wrappers. No config files. Just an import at the top.&lt;/p&gt;




&lt;h2&gt;How It Works Internally&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;import llm_lens&lt;/code&gt;, the library calls &lt;code&gt;patch_all()&lt;/code&gt;, which does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Imports the OpenAI and Anthropic SDK classes&lt;/li&gt;
&lt;li&gt;Saves a reference to their original &lt;code&gt;create()&lt;/code&gt; methods&lt;/li&gt;
&lt;li&gt;Replaces them with a wrapper function&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The wrapper does this on every API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User code calls create()
    → wrapper starts a timer with time.perf_counter()
    → calls the original SDK method
    → extracts usage.input_tokens, usage.output_tokens, model
    → calculates cost from a pricing table
    → writes a record to SQLite at ~/.llm_lens/calls.db
    → returns the original response untouched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user gets the exact same response they would have gotten without llm-lens. The only difference is that a record was quietly saved in the background.&lt;/p&gt;
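
&lt;p&gt;To make that flow concrete, here is a minimal sketch of what the OpenAI half of such a patch could look like, written against the v1 SDK layout. It is an illustration rather than the library's actual internals, and &lt;code&gt;_log_call&lt;/code&gt; here is a print-based stand-in for the real SQLite write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

from openai.resources.chat.completions import Completions

# Keep a handle to the original method so the wrapper can delegate to it.
_original_create = Completions.create

def _log_call(response, latency_ms, status, error):
    # Stand-in for the real SQLite write described above.
    usage = getattr(response, "usage", None)
    print(status, round(latency_ms, 1), usage, error)

def _patched_create(self, *args, **kwargs):
    start = time.perf_counter()
    try:
        response = _original_create(self, *args, **kwargs)
    except Exception as exc:
        _log_call(None, (time.perf_counter() - start) * 1000, "error", str(exc))
        raise  # the caller still sees the original exception
    _log_call(response, (time.perf_counter() - start) * 1000, "ok", None)
    return response  # untouched response, exactly as the SDK returned it

Completions.create = _patched_create
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;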




&lt;h2&gt;What Gets Tracked&lt;/h2&gt;

&lt;p&gt;Every API call logs the following to a local SQLite database:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;latency_ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;End-to-end response time in milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;input_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prompt tokens from the usage object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;output_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Completion tokens from the usage object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cost_usd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calculated cost in USD, stored to 8 decimal places&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model string returned by the API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ok&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exception message if the call failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;timestamp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UTC datetime of the call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pricing table covers &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;gpt-4o-mini&lt;/code&gt;, &lt;code&gt;gpt-4-turbo&lt;/code&gt;, &lt;code&gt;claude-3-5-sonnet&lt;/code&gt;, &lt;code&gt;claude-3-5-haiku&lt;/code&gt;, and &lt;code&gt;claude-3-opus&lt;/code&gt;. Fuzzy model matching handles version suffixes automatically, so &lt;code&gt;gpt-4o-2024-08-06&lt;/code&gt; resolves correctly.&lt;/p&gt;
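
&lt;p&gt;As a rough illustration of how a pricing table plus prefix-based fuzzy matching can work (the per-million-token rates below are placeholders, not the library's actual numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Placeholder USD rates per million tokens; the real table ships with the library.
PRICING = {
    "gpt-4o":            {"input": 2.50, "output": 10.00},
    "gpt-4o-mini":       {"input": 0.15, "output": 0.60},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}

def resolve_pricing(model):
    # Exact match first; otherwise try the longest known prefix, so
    # "gpt-4o-2024-08-06" falls back to "gpt-4o" instead of going unpriced.
    if model in PRICING:
        return PRICING[model]
    for known in sorted(PRICING, key=len, reverse=True):
        if model.startswith(known):
            return PRICING[known]
    return None

def estimate_cost_usd(model, input_tokens, output_tokens):
    rates = resolve_pricing(model)
    if rates is None:
        return 0.0
    cost = input_tokens / 1_000_000 * rates["input"] + output_tokens / 1_000_000 * rates["output"]
    return round(cost, 8)  # matches the 8-decimal precision stored in cost_usd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;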




&lt;h2&gt;Accessing Your Data&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CLI — instant visibility in the terminal:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llm-lens              &lt;span class="c"&gt;# rich table of every tracked call&lt;/span&gt;
llm-lens stats        &lt;span class="c"&gt;# total calls, error rate, avg latency, total cost&lt;/span&gt;
llm-lens serve        &lt;span class="c"&gt;# starts dashboard at localhost:8000&lt;/span&gt;
llm-lens config &lt;span class="nb"&gt;set &lt;/span&gt;cost_alert_usd 0.10  &lt;span class="c"&gt;# set a cost alert&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Live Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;llm-lens serve&lt;/code&gt; starts a FastAPI server and opens a single-page dashboard built in vanilla JS with Chart.js. It auto-refreshes every 5 seconds and shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stats bar: total calls, error rate, avg latency, total cost&lt;/li&gt;
&lt;li&gt;A latency-per-call line chart&lt;/li&gt;
&lt;li&gt;A per-call error bar chart with red/green color coding&lt;/li&gt;
&lt;li&gt;A red alert banner if your cost threshold is breached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No build step. No npm. Just a single HTML file served by FastAPI.&lt;/p&gt;




&lt;h2&gt;Tech Stack Decisions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why SQLite?&lt;/strong&gt; No external database dependency. Data lives at &lt;code&gt;~/.llm_lens/calls.db&lt;/code&gt; on your machine. Works offline, works instantly, no setup required.&lt;/p&gt;
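
&lt;p&gt;A minimal version of the storage layer fits in a handful of lines. The schema below is a sketch whose columns mirror the tracking table above, not a dump of the real one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3
from pathlib import Path

DB_PATH = Path.home() / ".llm_lens" / "calls.db"

def init_db():
    # Create the directory and table on first use; no external database needed.
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS calls ("
            "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
            "  timestamp TEXT, model TEXT, status TEXT, error TEXT,"
            "  latency_ms REAL, input_tokens INTEGER,"
            "  output_tokens INTEGER, cost_usd REAL)"
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;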

&lt;p&gt;&lt;strong&gt;Why monkey-patching?&lt;/strong&gt; It's the only approach that requires zero changes to existing code. The alternative — wrapping calls manually — defeats the purpose of a zero-configuration tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why vanilla JS for the dashboard?&lt;/strong&gt; No build step. No node_modules. The entire frontend is a single HTML file that loads Chart.js from a CDN. Anyone can open it, read it, and understand it in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why FastAPI?&lt;/strong&gt; Async, fast, and gives you automatic OpenAPI docs at &lt;code&gt;/docs&lt;/code&gt; for free. The REST API has five endpoints: &lt;code&gt;/calls&lt;/code&gt;, &lt;code&gt;/stats&lt;/code&gt;, &lt;code&gt;/alert&lt;/code&gt;, &lt;code&gt;/health&lt;/code&gt;, and &lt;code&gt;/&lt;/code&gt; for the dashboard.&lt;/p&gt;
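
&lt;p&gt;A stripped-down version of two of those endpoints might look like the following. This is a sketch against the schema above, not the project's actual route code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3
from pathlib import Path

from fastapi import FastAPI

DB_PATH = Path.home() / ".llm_lens" / "calls.db"
app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.get("/stats")
def stats():
    # Aggregate straight from SQLite: call count, errors, average latency, total cost.
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT COUNT(*), SUM(status = 'error'), AVG(latency_ms), SUM(cost_usd) FROM calls"
        ).fetchone()
    return {
        "total_calls": row[0],
        "errors": row[1] or 0,
        "avg_latency_ms": row[2] or 0.0,
        "total_cost_usd": row[3] or 0.0,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;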




&lt;h2&gt;Cost Alerts&lt;/h2&gt;

&lt;p&gt;You can set a cost threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llm-lens config &lt;span class="nb"&gt;set &lt;/span&gt;cost_alert_usd 0.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This stores a value in &lt;code&gt;~/.llm_lens/config.json&lt;/code&gt;. The dashboard's &lt;code&gt;/alert&lt;/code&gt; endpoint checks the total spend against this threshold on every refresh. If you've crossed it, a red banner appears at the top of the dashboard.&lt;/p&gt;

&lt;p&gt;This is particularly useful when you're iterating quickly and lose track of how many API calls you've made.&lt;/p&gt;
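
&lt;p&gt;The check itself is small enough to sketch in a few lines; the config path and key follow the CLI command above, and the helper name is illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".llm_lens" / "config.json"

def cost_alert_triggered(total_cost_usd):
    # No config file means no threshold has been set, so never alert.
    if not CONFIG_PATH.exists():
        return False
    threshold = json.loads(CONFIG_PATH.read_text()).get("cost_alert_usd")
    return threshold is not None and total_cost_usd &gt;= threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;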




&lt;h2&gt;Privacy First&lt;/h2&gt;

&lt;p&gt;All data is stored locally at &lt;code&gt;~/.llm_lens/calls.db&lt;/code&gt;. Nothing leaves your machine unless you deploy the server yourself. No third party ever sees your API calls, prompts, or token usage.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;llm-lens is open source and available on GitHub at &lt;a href="https://github.com/AdityaSharma2804/llm-lens" rel="noopener noreferrer"&gt;github.com/AdityaSharma2804/llm-lens&lt;/a&gt;. The live demo dashboard is at &lt;a href="https://llm-lens.onrender.com" rel="noopener noreferrer"&gt;llm-lens.onrender.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Planned features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Async support&lt;/strong&gt; for asyncio-based applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming response tracking&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-model breakdown&lt;/strong&gt; in the dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse migration&lt;/strong&gt; for high-volume production use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination scoring&lt;/strong&gt; — running a second cheap model call to score response confidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack/email alerts&lt;/strong&gt; when cost thresholds are breached&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-lens-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: The PyPI package is &lt;code&gt;llm-lens-py&lt;/code&gt; but the import name is &lt;code&gt;llm_lens&lt;/code&gt; — using hyphens in the distribution name and underscores in the import name is a common Python packaging convention.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add one import to your project. That's all it takes.&lt;/p&gt;

&lt;p&gt;If you find it useful, a GitHub star goes a long way. And if you run into bugs or have feature requests, open an issue — contributions are welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aditya Sharma is a B.Tech CSE student at Manipal University Jaipur (2026). You can find him on GitHub at &lt;a href="https://github.com/AdityaSharma2804" rel="noopener noreferrer"&gt;@AdityaSharma2804&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
