LangSmith wants $400/month. Helicone needs you to proxy your AI traffic through their servers. Both require accounts, API keys, and sending your data to someone else's cloud.
I just wanted to know what my AI agents were costing me.
So I built bifrost-monitor — a Python decorator that tracks every AI call locally. No accounts. No infrastructure. No data leaving your machine.
Here's the full setup:
```python
from bifrost_monitor import monitor

@monitor(name="support-agent", model="claude-sonnet-4-6")
async def handle_ticket(ticket):
    response = await client.messages.create(...)
    return response
```
That's it. Every call gets tracked — duration, tokens, cost, errors — stored in a local SQLite file.
## Why I Built This
I was running multiple AI agents in production. Some used Claude, some used GPT-4o, one used Gemini. I had zero visibility into what any of them cost.
The existing options felt wrong:
| | LangSmith | Helicone | bifrost-monitor |
|---|---|---|---|
| Setup | Account + API key + proxy | Account + proxy | `pip install` |
| Cost | $400/mo+ | $50/mo+ | Free |
| Data location | Their cloud | Their cloud | Your machine |
I didn't need a dashboard. I needed a decorator and a CLI.
## How It Works
### The Decorator
The @monitor decorator wraps your function without changing it. Sync or async — it detects automatically:
```python
@monitor(name="classifier", model="gpt-4o")
def classify_email(email: str) -> str:
    response = client.chat.completions.create(...)
    return response.choices[0].message.content

@monitor(name="summarizer", model="claude-sonnet-4-6")
async def summarize_doc(doc: str) -> str:
    response = await anthropic_client.messages.create(...)
    return response.content[0].text
```
Under the hood it:
- Times execution with `time.perf_counter()`
- Auto-extracts token counts from the response object (duck-typed — works with Anthropic and OpenAI responses)
- Calculates cost using built-in pricing
- Records everything to SQLite
- Re-raises any exceptions after recording them
The function behaves identically. Zero code changes to your business logic.
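The sync/async dispatch described above can be sketched in a few lines. This is a hypothetical reconstruction, not the package's actual source — the `record` helper stands in for the SQLite write:

```python
import asyncio
import functools
import time


def monitor_sketch(name: str, model: str):
    """Sketch of a monitoring decorator: times the call, records the
    outcome, and re-raises any exception after recording it."""
    def decorator(fn):
        def record(status: str, duration: float) -> None:
            # The real package would persist this to SQLite.
            print(f"{name} [{model}] {status} in {duration:.4f}s")

        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except Exception:
                    record("error", time.perf_counter() - start)
                    raise
                record("success", time.perf_counter() - start)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                record("error", time.perf_counter() - start)
                raise
            record("success", time.perf_counter() - start)
            return result
        return sync_wrapper
    return decorator
```

The key detail is checking `asyncio.iscoroutinefunction` at decoration time, so the wrapper is only a coroutine when the wrapped function is.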
### Auto Token Extraction
This is the part I'm most pleased with. The decorator inspects your function's return value and duck-type detects token usage:
```python
# Anthropic responses → extracts:
#   usage.input_tokens
#   usage.output_tokens
#   usage.cache_read_input_tokens (prompt caching)
#   usage.cache_creation_input_tokens

# OpenAI responses → extracts:
#   usage.prompt_tokens
#   usage.completion_tokens
```
If your function returns something without a `.usage` attribute, it still tracks everything else — duration, status, errors. Tokens just show as zero.
### Built-in Pricing
13 models ship with current pricing (as of mid-2025):
- Anthropic — Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 (including cache token rates)
- OpenAI — GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
- Google — Gemini 2.5 Pro, Gemini 2.5 Flash
Custom models are one call:
```python
from bifrost_monitor import ModelPricing

pricing = ModelPricing()
pricing.add_model(
    "my-fine-tune",
    input_per_m=5.0,
    output_per_m=15.0,
    cache_read_per_m=0.5,      # optional
    cache_creation_per_m=1.0,  # optional
)
```
Cost calculations use 8-decimal precision — accurate down to fractions of a cent across thousands of calls.
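The arithmetic itself is simple: multiply token counts by per-million rates, then round. A sketch, using the same `input_per_m`/`output_per_m` units as above (this is an illustrative helper, not the package's internal function):

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Per-call cost in dollars from per-million-token rates."""
    raw = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    return round(raw, 8)  # 8-decimal precision


# 1,200 input + 300 output tokens at $5/M in, $15/M out:
print(call_cost(1200, 300, 5.0, 15.0))  # → 0.0105
```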
### The CLI
Query everything from your terminal:
```shell
# What am I spending, broken down by model?
$ bifrost-monitor costs --group-by model

# Which agents are failing?
$ bifrost-monitor errors --last 7d

# Full summary for a specific agent
$ bifrost-monitor summary --name support-agent --last 24h

# Recent runs with status
$ bifrost-monitor runs --last 24h --status error
```
Output is color-coded Rich tables — green for success, red for errors, yellow for timeouts.
## Architecture Decisions
### Pluggable Storage
Storage is behind a protocol:
```python
@runtime_checkable
class RunStore(Protocol):
    def save(self, record: RunRecord) -> None: ...
    def query(self, **kwargs: Any) -> list[RunRecord]: ...
```
SQLite is the default (zero-config, stored at `~/.bifrost-monitor/runs.db`). But the protocol means you could plug in PostgreSQL, DynamoDB, or anything else without changing a line of application code.
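For example, here's a toy in-memory backend that satisfies the protocol structurally — no inheritance required. The `RunRecord` fields are stand-ins for illustration, not the package's actual model:

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class RunRecord:
    # Stand-in record; the real package uses a richer Pydantic model.
    name: str
    status: str


@runtime_checkable
class RunStore(Protocol):
    def save(self, record: RunRecord) -> None: ...
    def query(self, **kwargs: Any) -> list[RunRecord]: ...


class InMemoryStore:
    """Alternative backend: a list plus exact attribute-match filtering."""

    def __init__(self) -> None:
        self._records: list[RunRecord] = []

    def save(self, record: RunRecord) -> None:
        self._records.append(record)

    def query(self, **kwargs: Any) -> list[RunRecord]:
        return [r for r in self._records
                if all(getattr(r, k, None) == v for k, v in kwargs.items())]
```

Because `RunStore` is `@runtime_checkable`, `isinstance(InMemoryStore(), RunStore)` passes even though `InMemoryStore` never mentions the protocol.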
### Pydantic Models, Not Dicts
Every data structure is a Pydantic model:
```python
class TokenUsage(BaseModel):
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```
No untyped dictionaries floating around. Pyright strict mode passes with zero errors.
### Three-Index SQLite Schema
The database indexes `name`, `started_at`, and `status` — the three fields you filter on most. Queries stay fast even with thousands of recorded runs.
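A sketch of what such a schema could look like — the column names here are illustrative, not the package's actual DDL. `EXPLAIN QUERY PLAN` confirms SQLite picks the index for a status filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    started_at TEXT NOT NULL,
    status TEXT NOT NULL,
    duration_s REAL,
    cost_usd REAL
);
CREATE INDEX idx_runs_name ON runs (name);
CREATE INDEX idx_runs_started_at ON runs (started_at);
CREATE INDEX idx_runs_status ON runs (status);
""")

# SQLite reports an index search rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM runs WHERE status = 'error'"
).fetchall()
print(plan)
```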
## What I Learned
Prompt caching changes the cost math. Claude's cache tokens are 10x cheaper than standard input tokens. If you're not tracking cache hit rates, you're probably overestimating your costs. bifrost-monitor tracks `cache_read_tokens` and `cache_creation_tokens` separately so you can see the real numbers.
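To make that concrete, here's an illustrative calculation with a hypothetical $3/M input rate and cache reads at one-tenth of that (real rates vary by model):

```python
# Hypothetical rates for illustration only.
input_per_m = 3.00
cache_read_per_m = input_per_m / 10        # one-tenth of the input rate

tokens = 100_000                           # input tokens per call

# No caching: every token billed at the standard input rate.
cost_no_cache = tokens * input_per_m / 1_000_000

# 90% cache hit rate: only 10% billed at the standard rate.
cost_cached = (tokens * 0.10 * input_per_m
               + tokens * 0.90 * cache_read_per_m) / 1_000_000

print(cost_no_cache)          # $0.30 per call without caching
print(round(cost_cached, 4))  # $0.057 per call — over 5x cheaper
```

The gap is exactly why estimating cost from raw input token counts alone overstates the bill for cache-heavy workloads.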
The decorator pattern is underrated for observability. Zero changes to the monitored function. No inheritance, no mixins, no context managers wrapping your code. Just @monitor and you're done.
Property-based testing catches edge cases you won't think of. I used Hypothesis to verify that cost calculations are always non-negative, monotonically increasing with token count, and consistent across cache/non-cache scenarios. Three property tests caught two bugs that unit tests missed.
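Hypothesis expresses these invariants as `@given` rules; a dependency-free sketch with random sampling shows the shape of the tests. The `cost` function here is a stand-in, not the package's real one:

```python
import random


def cost(input_tokens: int, output_tokens: int,
         input_per_m: float = 5.0, output_per_m: float = 15.0) -> float:
    # Stand-in cost function under test.
    return round((input_tokens * input_per_m
                  + output_tokens * output_per_m) / 1_000_000, 8)


random.seed(0)
for _ in range(1_000):
    i = random.randrange(0, 1_000_000)
    o = random.randrange(0, 1_000_000)
    c = cost(i, o)
    # Invariant 1: cost is never negative.
    assert c >= 0
    # Invariant 2: cost never decreases as token counts grow.
    assert cost(i + 1, o) >= c
    assert cost(i, o + 1) >= c
```

Hypothesis does the same thing far more cleverly — shrinking failures to minimal counterexamples — but the invariants themselves are this simple.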
## Testing
99 tests. 95% coverage. 0.41 seconds.
The test suite includes:
- Unit tests for pricing accuracy (including cache token math)
- Decorator tests for both sync and async functions
- Token extraction tests against mock Anthropic and OpenAI response objects
- Property-based tests (Hypothesis) for cost calculation invariants
- Integration tests for the full decorator → storage → query pipeline
## Try It
```shell
pip install bifrost-monitor
```
```python
from bifrost_monitor import monitor

@monitor(name="my-agent", model="claude-sonnet-4-6")
async def my_agent(input: str):
    response = await client.messages.create(...)
    return response

# Later:
# $ bifrost-monitor costs --group-by model
```
Full source on GitHub — MIT licensed, 99 tests, typed with a `py.typed` marker.
If you're running AI agents and don't know what they cost, this is the fastest way to find out. One import, five minutes, full visibility.
This is the 9th open-source package I've shipped under github.com/Jbermingham1 — each one solves a specific pain point I hit building AI systems in production.