Mukunda Rao Katta

Posted on May 25

AgentSnap: Usage Tracking With Cost Aggregation for LLM Agents

#hermeschallenge #ai #python #agents

You want to know how much your agent is costing you. Not from the billing dashboard (which shows totals), but from the code: which model, which run, which task type, how many tokens, how many dollars. You want this queryable, filterable, and sortable.

agentsnap is a JSONL-based usage tracker with cost aggregation. Record every LLM call. Query by run, model, or time window. Get cost totals at any granularity.

The Shape of the Fix

from agentsnap import AgentSnap

snap = AgentSnap(path="./logs/usage.jsonl")

# After every LLM call
snap.record(
    run_id="run-abc123",
    model="claude-sonnet-4-6",
    input_tokens=1240,
    output_tokens=380,
)

# Query at any time
totals = snap.total_cost()
print(f"Total cost: ${totals.usd:.4f}")
print(f"Total tokens: {totals.tokens:,}")

# By run
run_cost = snap.cost_for_run("run-abc123")
print(f"Run cost: ${run_cost.usd:.4f}")

# Recent
today = snap.since(hours=24)
print(f"Last 24h: ${today.usd:.4f}")

One record() call per LLM call. Query functions aggregate on-demand from the JSONL file.

What It Does NOT Do

agentsnap does not fetch actual costs from provider billing APIs. It computes costs from token counts using rates you configure (or built-in defaults). The built-in rates may drift from actual provider pricing; update them as prices change.

It does not provide real-time alerting. It stores records; you query them. For real-time cost alerts (alert when cost exceeds $X in a time window), pair with agent-event-bus and subscribe to cost update events.

It does not handle token counting itself. You pass the token counts from response.usage.input_tokens and response.usage.output_tokens. The model's API response provides these values.

Inside the Library

The core data structure is a JSONL file with one record per LLM call:

{"run_id": "run-abc", "model": "claude-sonnet-4-6", "input_tokens": 1240, "output_tokens": 380, "ts": 1748107200.5}

The cost computation uses configurable per-model rates:

DEFAULT_RATES = {
    "claude-opus-4-5": {"input": 15.00 / 1_000_000, "output": 75.00 / 1_000_000},
    "claude-sonnet-4-6": {"input": 3.00 / 1_000_000, "output": 15.00 / 1_000_000},
    "claude-haiku-4-5": {"input": 0.25 / 1_000_000, "output": 1.25 / 1_000_000},
    "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}

class AgentSnap:
    def __init__(self, path: str, rates: dict | None = None):
        self._path = Path(path)
        self._rates = {**DEFAULT_RATES, **(rates or {})}
        self._lock = threading.Lock()

    def _compute_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        rates = self._rates.get(model, {"input": 0, "output": 0})
        return (input_tokens * rates["input"]) + (output_tokens * rates["output"])

    def record(self, run_id: str, model: str, input_tokens: int, output_tokens: int, **metadata) -> None:
        cost_usd = self._compute_cost(model, input_tokens, output_tokens)
        entry = {
            "run_id": run_id,
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost_usd": cost_usd,
            "ts": time.time(),
            **metadata,
        }
        with self._lock:
            with self._path.open("a") as f:
                f.write(json.dumps(entry) + "\n")

    def load_all(self) -> list[dict]:
        if not self._path.exists():
            return []
        return [json.loads(line) for line in self._path.read_text().splitlines() if line.strip()]

    def total_cost(self) -> UsageSummary:
        records = self.load_all()
        return self._summarize(records)

    def cost_for_run(self, run_id: str) -> UsageSummary:
        records = [r for r in self.load_all() if r["run_id"] == run_id]
        return self._summarize(records)

    def since(self, hours: float) -> UsageSummary:
        cutoff = time.time() - (hours * 3600)
        records = [r for r in self.load_all() if r["ts"] >= cutoff]
        return self._summarize(records)

    def _summarize(self, records: list[dict]) -> UsageSummary:
        return UsageSummary(
            usd=sum(r.get("cost_usd", 0) for r in records),
            tokens=sum(r.get("input_tokens", 0) + r.get("output_tokens", 0) for r in records),
            calls=len(records),
        )

When to Use It

Use it from day one on any agent that makes LLM calls. The JSONL overhead is negligible. Discovering that one task type accounts for 80% of your costs is only possible if you have been recording.

Use it for per-run cost analysis before and after prompt changes. Record costs for 100 runs with the old prompt and 100 with the new. Compare average cost and total cost per task type.

Use it for customer billing in multi-tenant systems. Group by run_id (which you can map to a customer). The JSONL query functions give you per-customer totals without a database.

Skip it for one-off scripts. If you run an agent once and never need to audit the cost, step log records are enough.

Install

pip install @mukundakatta/agentsnap

# From npm (TypeScript) or PyPI (Python)
pip install agentsnap-py

from agentsnap import AgentSnap
from agent_run_id import RunContext

snap = AgentSnap(path="./logs/usage.jsonl")

async def run_agent(task: str) -> str:
    ctx = RunContext.start()

    while True:
        response = await call_llm(messages)

        snap.record(
            run_id=str(ctx.run_id),
            model=response.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            task_type=classify_task(task),
        )

        if response.stop_reason == "end_turn":
            return extract_text(response)

        # ... continue loop

Sibling Libraries

Library	What it solves
`agent-run-id`	Generates run_id to pass to snap.record()
`agent-event-bus`	Publish cost events for real-time subscribers
`token-budget-pool`	Block runs that would exceed a total budget
`llm-budget-window`	Time-windowed hourly/daily cost caps
`agent-step-log`	Structured step log that includes token counts

The cost visibility stack: agentsnap for query-able usage records, agent-event-bus to broadcast cost events, token-budget-pool to block overruns, agent-run-id for grouping.

What's Next

Cost alerts: snap.watch(threshold_usd=10, window_hours=1, callback=alert_fn) that monitors the rolling window and fires the callback when total cost exceeds the threshold. Pairs with agent-event-bus for push-based alerting.

Per-model breakdown: snap.by_model() that returns a dict of model name to UsageSummary. Useful for auditing which models are being used most and what they cost relative to each other.

Export formats: snap.export_csv(), snap.export_parquet() for integration with analytics platforms like Redshift, BigQuery, or Superset.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.

DEV Community