A Rails app starts calling OpenAI or Anthropic. A few months later someone in finance asks "who's burning $X a month on this and on what?" The answer requires per-user, per-feature, per-tenant attribution — and the obvious solutions all want you to give up something I wasn't willing to give up.
This is the design rationale behind llm_cost_tracker, a Rails Engine I've been building. It's not the only way to solve this problem; it's the way that fit the constraints I cared about.
The constraint set
Three non-negotiables shaped every other choice:
- No new infra. A Rails app already has a database, a request lifecycle, an authentication layer, a dashboard pattern. Anything I bolt on should reuse those, not duplicate them.
- No prompt storage. Prompt content is regulated data in a lot of contexts — PII, customer transcripts, medical, legal. The tracker has no business holding it.
- No traffic redirection. Direct calls to OpenAI / Anthropic / Gemini are the simplest path and the one with fewest failure modes. A proxy adds a hop, a key rotation surface, and a vendor relationship.
Those three rules ruled out most of the existing landscape.
Why not a proxy
The first instinct for "track LLM spend" is to put a proxy in front. Helicone, Portkey, LiteLLM Proxy, OpenRouter — they all model the problem this way: route OpenAI traffic through proxy.example.com, the proxy sees the request and response, logs cost, forwards to OpenAI.
It's a clean separation. It also means your API keys live in their config, not yours; their downtime is your downtime; their TLS and data-residency posture is yours by default; their rate limiting sits between your code and the provider; and a new SDK feature waits on their proxy supporting it.
For some teams that trade is fine. It wasn't fine for me — the LLM call is already the most expensive and most reliability-sensitive thing in the request, and putting another hop in front of it felt like the wrong move.
The alternative: capture what we need inside the Ruby process, on the way out. Patch the official SDK methods at boot, or wrap the underlying Faraday client. The call still goes straight to OpenAI / Anthropic / Gemini; we just observe the request and response as they pass through.
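A minimal sketch of the "patch at boot" idea, using Ruby's Module#prepend. Everything here is hypothetical: FakeCompletions stands in for an SDK resource class, and CallLog and CostObserver are illustration names, not the gem's internals. The point is only that the original method still runs unchanged and the observer sees the request params and the response.

```ruby
# Hypothetical stand-in for an SDK resource class; not the real OpenAI gem.
class FakeCompletions
  def create(model:, messages:)
    # Pretend this made the HTTP call and returned the provider's response.
    { model: model, usage: { input_tokens: 12, output_tokens: 30 } }
  end
end

# Hypothetical in-memory recorder, used here instead of a database table.
module CallLog
  def self.entries
    @entries ||= []
  end
end

# Prepended module: sits in front of the original method and reaches it via super.
module CostObserver
  def create(**params)
    response = super
    # Record usage metadata only; the messages are never copied.
    CallLog.entries << { model: params[:model], usage: response[:usage] }
    response
  end
end

FakeCompletions.prepend(CostObserver)

response = FakeCompletions.new.create(
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }]
)
```

The caller's code and the return value are untouched, which is what makes this kind of observation removable: drop the prepend and the call behaves exactly as before.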
Why ActiveRecord, not a TSDB
The second fork: where does the data go? Cost tracking is shaped like a time series — append-mostly rows, aggregations over time windows. A TSDB (Timescale, ClickHouse, Influx) is the textbook answer.
I picked Postgres / MySQL via ActiveRecord anyway, for one reason: the data is operational, not analytical. It needs to join to your users table, your subscriptions table, your tenants table. It needs to live behind the same row-level security (RLS) and the same backups as the rest of your app data. Standing up a separate TSDB to query "show me LLM cost for tenant 42 last month" makes that join harder, not easier.
Three tables ship in the install generator: llm_cost_tracker_calls (one row per LLM call, with token counts and total cost), llm_cost_tracker_call_line_items (per-component breakdown — input, output, cache reads, hosted tool charges), and llm_cost_tracker_call_tags (the attribution rows). For the LLM volumes most Rails apps see today, a single Postgres handles this fine.
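To make the "tenant 42 last month" question concrete, here is a rough in-memory sketch of how call rows and tag rows relate. The column names (id, total_cost, created_at, call_id, key, value), the dates, and the costs are all assumptions for illustration; the gem's real schema may differ, and in an app this would be an ActiveRecord join rather than array filtering.

```ruby
require "date"

# Hypothetical call rows; Rational keeps the sums exact for the example.
calls = [
  { id: 1, model: "gpt-4o-mini", total_cost: Rational("0.0020"), created_at: Date.new(2025, 5, 3) },
  { id: 2, model: "gpt-4o-mini", total_cost: Rational("0.0100"), created_at: Date.new(2025, 5, 20) },
  { id: 3, model: "gpt-4o-mini", total_cost: Rational("0.0300"), created_at: Date.new(2025, 6, 1) }
]

# Hypothetical tag rows: one row per (call, key, value).
tags = [
  { call_id: 1, key: "tenant_id", value: "42" },
  { call_id: 2, key: "tenant_id", value: "42" },
  { call_id: 2, key: "feature", value: "support_chat" },
  { call_id: 3, key: "tenant_id", value: "42" }
]

# Sum the cost of calls carrying a given tag within a date range.
def cost_for_tag(calls, tags, key:, value:, range:)
  ids = tags.select { |t| t[:key] == key && t[:value] == value }.map { |t| t[:call_id] }
  calls
    .select { |c| ids.include?(c[:id]) && range.cover?(c[:created_at]) }
    .sum { |c| c[:total_cost] }
end

may = Date.new(2025, 5, 1)..Date.new(2025, 5, 31)
tenant_42_may = cost_for_tag(calls, tags, key: "tenant_id", value: "42", range: may)
```

Because the tag value is just the tenant's id, the same filter can join straight to a tenants table in the app database, which is the argument for keeping the data there.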
Why block-scoped tags
Attribution is the whole game. Tokens × the model's rate gives you a total; tags answer "whose total is it?"
The mechanism is a block:
LlmCostTracker.with_tags(user_id: current_user.id, feature: "support_chat") do
  client.chat.completions.create(model: "gpt-4o-mini", messages: ...)
end
Anything that hits a tracked SDK or Faraday client inside that block picks up the tags. You call it from an around_action in a controller, around perform in a job, or at a feature module's entry point. The SDK call itself doesn't change.
The reason it's not a kwarg on the SDK call: I don't control the SDK call. The OpenAI gem's client.chat.completions.create has its own signature; threading a tag through it would mean either monkey-patching the call shape or asking every caller to use a wrapper. Block-scoped context fits Ruby's grain — same shape as ActiveSupport::CurrentAttributes, same shape as Rails request-store patterns.
Tags merge across nested blocks (inner wins), get sanitized for high-cardinality or secret-shaped values, and end up as a row per (call, key, value) in the database. Group by, filter, breakdown.
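One way block-scoped tags with inner-wins merging could work, sketched with fiber-local storage. TagContext, its storage key, and rows_for are hypothetical names for illustration, not the gem's API, and the sanitization step is left out.

```ruby
# Hypothetical sketch of block-scoped tag context; not the gem's implementation.
module TagContext
  KEY = :tag_context_sketch

  # Tags visible at this point in the call stack (empty outside any block).
  def self.current
    Thread.current[KEY] || {}
  end

  # Merge new tags over the enclosing ones for the duration of the block.
  # On key collisions the inner block's value wins.
  def self.with_tags(**tags)
    previous = Thread.current[KEY]
    Thread.current[KEY] = current.merge(tags)
    yield
  ensure
    # Restore the enclosing context even if the block raised.
    Thread.current[KEY] = previous
  end

  # Flatten the current tags into one row per (call, key, value).
  def self.rows_for(call_id)
    current.map { |key, value| { call_id: call_id, key: key.to_s, value: value.to_s } }
  end
end

inner = nil
rows = nil
TagContext.with_tags(user_id: 7, feature: "support_chat") do
  TagContext.with_tags(feature: "summaries") do
    inner = TagContext.current
    rows = TagContext.rows_for(101)
  end
end
```

Here inner holds user_id from the outer block and feature from the inner one, and the context is empty again once the outer block exits. Note that Thread.current[] is fiber-local in Ruby, so work handed to another thread or fiber would not see these tags in this sketch.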
Why a frozen pricing snapshot per call
Prices change. OpenAI cut prompt caching rates twice in the last year; Anthropic introduced 1-hour cache TTL with its own rate; Gemini rolled out context-length-tiered pricing. If you compute cost lazily — "the rate is whatever the current price table says" — you have a moving floor under historical reports.
So every call freezes its pricing snapshot at write time: the exact per-component rate that produced its cost, stamped on the row. Run a report today for a period three months back and you get what it cost then. Update the price table tomorrow and historical numbers don't shift.
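A small sketch of the snapshot idea under stated assumptions: the price table, the model name "example-model", the per-million-token rates, and record_call are all made up for illustration. The rates are Rationals only to keep the arithmetic exact in the example.

```ruby
# Hypothetical price table in USD per million tokens; the figures are invented.
PRICES = {
  "example-model" => { input: Rational("0.50"), output: Rational("2.00") }
}

PER_MILLION = 1_000_000

# Compute cost once, at write time, and stamp the rates used onto the row.
def record_call(model:, input_tokens:, output_tokens:)
  snapshot = PRICES.fetch(model).dup.freeze
  cost = (input_tokens * snapshot[:input] + output_tokens * snapshot[:output]) / PER_MILLION
  {
    model: model,
    input_tokens: input_tokens,
    output_tokens: output_tokens,
    pricing_snapshot: snapshot,
    total_cost: cost
  }
end

row = record_call(model: "example-model", input_tokens: 1_000, output_tokens: 500)

# Later, the provider halves its prices and the table is updated.
PRICES["example-model"] = { input: Rational("0.25"), output: Rational("1.00") }

# Recomputing from the stored snapshot reproduces the original cost.
recomputed = (row[:input_tokens] * row[:pricing_snapshot][:input] +
              row[:output_tokens] * row[:pricing_snapshot][:output]) / PER_MILLION
```

The stored total and the recomputed one agree because neither reads the live table; only calls recorded after the update pick up the new rates.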
The trade-off is storage: a few hundred bytes per call for the snapshot. At the volumes we're talking about for LLM calls, that's invisible next to the message bodies themselves (which we don't store).
What's there now
Version 0.11.0 instruments three official SDKs (OpenAI, Anthropic, RubyLLM) and ships Faraday middleware for everything else — OpenAI-compatible APIs like Groq, DeepSeek, OpenRouter; Azure OpenAI on both endpoint styles; Gemini; custom gateways. The mounted dashboard at /llm-costs has pages for cost overview, top models, the call ledger, tag breakdowns, data-quality signals, and a pricing reference. Budget guardrails block calls before send when an estimate would cross a configured monthly, daily, or per-call cap.
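The guardrail check can be pictured roughly like this. It is a sketch, not the gem's code: BudgetExceeded, the cap values, and check_budget! are hypothetical, and how the estimate and running totals get computed is outside its scope.

```ruby
# Hypothetical error raised instead of sending the request.
class BudgetExceeded < StandardError; end

# Hypothetical caps in USD.
CAPS = { per_call: 0.50, daily: 20.0, monthly: 300.0 }.freeze

# Raise before the call goes out if the estimate would cross any cap.
def check_budget!(estimate:, spent_today:, spent_this_month:, caps: CAPS)
  raise BudgetExceeded, "per-call cap" if estimate > caps[:per_call]
  raise BudgetExceeded, "daily cap" if spent_today + estimate > caps[:daily]
  raise BudgetExceeded, "monthly cap" if spent_this_month + estimate > caps[:monthly]
  true
end

allowed = check_budget!(estimate: 0.10, spent_today: 5.0, spent_this_month: 100.0)

blocked_reason =
  begin
    check_budget!(estimate: 0.10, spent_today: 19.95, spent_this_month: 100.0)
    nil
  rescue BudgetExceeded => e
    e.message
  end
```

Checking against the estimate rather than the actual cost is what lets the block happen before any tokens are billed.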
What it deliberately isn't: prompt or completion storage, trace replay, eval framework, model-routing logic, sidecar service, OpenTelemetry exporter. Each of those would justify a separate gem.
If your shape is "Rails app, direct API calls to one or two providers, finance asking where the spend goes" — this is the layer I wanted to exist.