LangSmith wants $400/month. Helicone needs you to proxy your AI traffic through their servers. Both require accounts, API keys, and sending your data to someone else's cloud.
I just wanted to know what my AI agents were costing me.
So I built bifrost-monitor — a Python decorator that tracks every AI call locally. No accounts. No infrastructure. No data leaving your machine.
Here's the full setup:
```python
from bifrost_monitor import monitor

@monitor(name="support-agent", model="claude-sonnet-4-6")
async def handle_ticket(ticket):
    response = await client.messages.create(...)
    return response
```
That's it. Every call gets tracked — duration, tokens, cost, errors — stored in a local SQLite file.
## Why I Built This
I was running multiple AI agents in production. Some used Claude, some used GPT-4o, one used Gemini. I had zero visibility into what any of them cost.
The existing options felt wrong:
| | LangSmith | Helicone | bifrost-monitor |
|---|---|---|---|
| Setup | Account + API key + proxy | Account + proxy | `pip install` |
| Cost | $400/mo+ | $50/mo+ | Free |
| Data location | Their cloud | Their cloud | Your machine |
I didn't need a dashboard. I needed a decorator and a CLI.
## How It Works
### The Decorator
The @monitor decorator wraps your function without changing it. Sync or async — it detects automatically:
```python
@monitor(name="classifier", model="gpt-4o")
def classify_email(email: str) -> str:
    response = client.chat.completions.create(...)
    return response.choices[0].message.content

@monitor(name="summarizer", model="claude-sonnet-4-6")
async def summarize_doc(doc: str) -> str:
    response = await anthropic_client.messages.create(...)
    return response.content[0].text
```
Under the hood it:
- Times execution with `time.perf_counter()`
- Auto-extracts token counts from the response object (duck-typed — works with Anthropic and OpenAI responses)
- Calculates cost using built-in pricing
- Records everything to SQLite
- Re-raises any exceptions after recording them
The function behaves identically. Zero code changes to your business logic.
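The sync/async dispatch described above can be sketched in a few lines. This is a hypothetical reconstruction, not the package's actual source — the `record` helper stands in for the SQLite write:

```python
import asyncio
import functools
import time


def monitor_sketch(name: str, model: str):
    """Sketch of a monitoring decorator: times the call, records the
    outcome, and re-raises any exception after recording it."""
    def decorator(fn):
        def record(status: str, duration: float) -> None:
            # The real package would persist this to SQLite.
            print(f"{name} [{model}] {status} in {duration:.4f}s")

        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except Exception:
                    record("error", time.perf_counter() - start)
                    raise
                record("success", time.perf_counter() - start)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                record("error", time.perf_counter() - start)
                raise
            record("success", time.perf_counter() - start)
            return result
        return sync_wrapper
    return decorator
```

The key detail is checking `asyncio.iscoroutinefunction` at decoration time, so the wrapper is only a coroutine when the wrapped function is.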
### Auto Token Extraction
This is the part I'm most pleased with. The decorator inspects your function's return value and duck-type detects token usage:
```python
# Anthropic responses → extracts:
#   usage.input_tokens
#   usage.output_tokens
#   usage.cache_read_input_tokens (prompt caching)
#   usage.cache_creation_input_tokens

# OpenAI responses → extracts:
#   usage.prompt_tokens
#   usage.completion_tokens
```
If your function returns something without a `.usage` attribute, it still tracks everything else — duration, status, errors. Tokens just show as zero.
### Built-in Pricing
13 models ship with current pricing (as of mid-2025):
- Anthropic — Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 (including cache token rates)
- OpenAI — GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
- Google — Gemini 2.5 Pro, Gemini 2.5 Flash
Custom models are one call:
```python
from bifrost_monitor import ModelPricing

pricing = ModelPricing()
pricing.add_model(
    "my-fine-tune",
    input_per_m=5.0,
    output_per_m=15.0,
    cache_read_per_m=0.5,      # optional
    cache_creation_per_m=1.0,  # optional
)
```
Cost calculations use 8-decimal precision — accurate down to fractions of a cent across thousands of calls.
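The arithmetic itself is simple: multiply token counts by per-million rates, then round. A sketch, using the same `input_per_m`/`output_per_m` units as above (this is an illustrative helper, not the package's internal function):

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Per-call cost in dollars from per-million-token rates."""
    raw = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    return round(raw, 8)  # 8-decimal precision


# 1,200 input + 300 output tokens at $5/M in, $15/M out:
print(call_cost(1200, 300, 5.0, 15.0))  # → 0.0105
```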
### The CLI
Query everything from your terminal:
```shell
# What am I spending, broken down by model?
$ bifrost-monitor costs --group-by model

# Which agents are failing?
$ bifrost-monitor errors --last 7d

# Full summary for a specific agent
$ bifrost-monitor summary --name support-agent --last 24h

# Recent runs with status
$ bifrost-monitor runs --last 24h --status error
```
Output is color-coded Rich tables — green for success, red for errors, yellow for timeouts.
## Architecture Decisions
### Pluggable Storage
Storage is behind a protocol:
```python
@runtime_checkable
class RunStore(Protocol):
    def save(self, record: RunRecord) -> None: ...
    def query(self, **kwargs: Any) -> list[RunRecord]: ...
```
SQLite is the default (zero-config, stored at `~/.bifrost-monitor/runs.db`). But the protocol means you could plug in PostgreSQL, DynamoDB, or anything else without changing a line of application code.
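For example, here's a toy in-memory backend that satisfies the protocol structurally — no inheritance required. The `RunRecord` fields are stand-ins for illustration, not the package's actual model:

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class RunRecord:
    # Stand-in record; the real package uses a richer Pydantic model.
    name: str
    status: str


@runtime_checkable
class RunStore(Protocol):
    def save(self, record: RunRecord) -> None: ...
    def query(self, **kwargs: Any) -> list[RunRecord]: ...


class InMemoryStore:
    """Alternative backend: a list plus exact attribute-match filtering."""

    def __init__(self) -> None:
        self._records: list[RunRecord] = []

    def save(self, record: RunRecord) -> None:
        self._records.append(record)

    def query(self, **kwargs: Any) -> list[RunRecord]:
        return [r for r in self._records
                if all(getattr(r, k, None) == v for k, v in kwargs.items())]
```

Because `RunStore` is `@runtime_checkable`, `isinstance(InMemoryStore(), RunStore)` passes even though `InMemoryStore` never mentions the protocol.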
### Pydantic Models, Not Dicts
Every data structure is a Pydantic model:
```python
class TokenUsage(BaseModel):
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```
No untyped dictionaries floating around. Pyright strict mode passes with zero errors.
### Three-Index SQLite Schema
The database indexes `name`, `started_at`, and `status` — the three fields you filter on most. Queries stay fast even with thousands of recorded runs.
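A sketch of what such a schema could look like — the column names here are illustrative, not the package's actual DDL. `EXPLAIN QUERY PLAN` confirms SQLite picks the index for a status filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    started_at TEXT NOT NULL,
    status TEXT NOT NULL,
    duration_s REAL,
    cost_usd REAL
);
CREATE INDEX idx_runs_name ON runs (name);
CREATE INDEX idx_runs_started_at ON runs (started_at);
CREATE INDEX idx_runs_status ON runs (status);
""")

# SQLite reports an index search rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM runs WHERE status = 'error'"
).fetchall()
print(plan)
```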
## What I Learned
Prompt caching changes the cost math. Claude's cache tokens are 10x cheaper than standard input tokens. If you're not tracking cache hit rates, you're probably overestimating your costs. bifrost-monitor tracks `cache_read_tokens` and `cache_creation_tokens` separately so you can see the real numbers.
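To make that concrete, here's an illustrative calculation with a hypothetical $3/M input rate and cache reads at one-tenth of that (real rates vary by model):

```python
# Hypothetical rates for illustration only.
input_per_m = 3.00
cache_read_per_m = input_per_m / 10        # one-tenth of the input rate

tokens = 100_000                           # input tokens per call

# No caching: every token billed at the standard input rate.
cost_no_cache = tokens * input_per_m / 1_000_000

# 90% cache hit rate: only 10% billed at the standard rate.
cost_cached = (tokens * 0.10 * input_per_m
               + tokens * 0.90 * cache_read_per_m) / 1_000_000

print(cost_no_cache)          # $0.30 per call without caching
print(round(cost_cached, 4))  # $0.057 per call — over 5x cheaper
```

The gap is exactly why estimating cost from raw input token counts alone overstates the bill for cache-heavy workloads.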
The decorator pattern is underrated for observability. Zero changes to the monitored function. No inheritance, no mixins, no context managers wrapping your code. Just @monitor and you're done.
Property-based testing catches edge cases you won't think of. I used Hypothesis to verify that cost calculations are always non-negative, monotonically increasing with token count, and consistent across cache/non-cache scenarios. Three property tests caught two bugs that unit tests missed.
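Hypothesis expresses these invariants as `@given` rules; a dependency-free sketch with random sampling shows the shape of the tests. The `cost` function here is a stand-in, not the package's real one:

```python
import random


def cost(input_tokens: int, output_tokens: int,
         input_per_m: float = 5.0, output_per_m: float = 15.0) -> float:
    # Stand-in cost function under test.
    return round((input_tokens * input_per_m
                  + output_tokens * output_per_m) / 1_000_000, 8)


random.seed(0)
for _ in range(1_000):
    i = random.randrange(0, 1_000_000)
    o = random.randrange(0, 1_000_000)
    c = cost(i, o)
    # Invariant 1: cost is never negative.
    assert c >= 0
    # Invariant 2: cost never decreases as token counts grow.
    assert cost(i + 1, o) >= c
    assert cost(i, o + 1) >= c
```

Hypothesis does the same thing far more cleverly — shrinking failures to minimal counterexamples — but the invariants themselves are this simple.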
## Testing
99 tests. 95% coverage. 0.41 seconds.
The test suite includes:
- Unit tests for pricing accuracy (including cache token math)
- Decorator tests for both sync and async functions
- Token extraction tests against mock Anthropic and OpenAI response objects
- Property-based tests (Hypothesis) for cost calculation invariants
- Integration tests for the full decorator → storage → query pipeline
## Try It
```shell
pip install bifrost-monitor
```
```python
from bifrost_monitor import monitor

@monitor(name="my-agent", model="claude-sonnet-4-6")
async def my_agent(input: str):
    response = await client.messages.create(...)
    return response

# Later:
# $ bifrost-monitor costs --group-by model
```
Full source on GitHub — MIT licensed, 99 tests, typed with a `py.typed` marker.
If you're running AI agents and don't know what they cost, this is the fastest way to find out. One import, five minutes, full visibility.
This is the 9th open-source package I've shipped under github.com/Jbermingham1 — each one solves a specific pain point I hit building AI systems in production.