DEV Community

Ankit Virdi

I built react-native-llm-meter, LLM cost tracking for Expo apps

If you ship Claude, GPT, or Gemini calls from a React Native app, you have a problem nobody's solved well: you don't know what's happening on the device.

Server-side observability is excellent. Langfuse, Helicone, LangSmith, and Stripe's token-meter all work great for Node backends. None of them works cleanly in an Expo app: they assume a server, they pull in Node-only APIs, they don't ship AsyncStorage adapters, and streaming breaks under Hermes.

So I built it.

react-native-llm-meter is on npm. It currently ships three providers, two storage adapters, streaming TTFT, a dev overlay, budget alerts, and an optional remote sink.

```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```
```tsx
import { Meter, MeterProvider } from "react-native-llm-meter";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.EXPO_PUBLIC_ANTHROPIC_API_KEY });
const meter = new Meter();
const client = meter.wrap(anthropic);

export default function App() {
  return (
    <MeterProvider meter={meter}>
      <YourApp client={client} />
    </MeterProvider>
  );
}
```

Every call through the wrapped client gets recorded with provider, model, token counts, latency, TTFT for streams, and computed cost. It's the same interface as the original SDK; you only change the construction.
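I won't reproduce the library's internals here, but the general technique behind a transparent wrapper like this is a recursive Proxy that forwards every property access and times any method call. A minimal sketch (not the actual implementation; `Recorder` and the `path` bookkeeping are illustrative):

```typescript
// Minimal sketch of a transparent instrumentation wrapper: a Proxy
// that forwards property access unchanged and times async method calls.
type Recorder = (info: { path: string; latencyMs: number }) => void;

function wrapClient<T extends object>(target: T, record: Recorder, path = ""): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      const here = path ? `${path}.${String(prop)}` : String(prop);
      if (typeof value === "function") {
        // Replace the method with a timed version that forwards args and result.
        return async (...args: unknown[]) => {
          const start = Date.now();
          const result = await value.apply(obj, args);
          record({ path: here, latencyMs: Date.now() - start });
          return result;
        };
      }
      if (value && typeof value === "object") {
        // Recurse so nested namespaces (e.g. client.messages) are also wrapped.
        return wrapClient(value as object, record, here);
      }
      return value;
    },
  }) as T;
}
```

Because the Proxy forwards everything it doesn't intercept, the wrapped object keeps the original SDK's surface, which is why only the construction site changes.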

What you get

```typescript
meter.summary()
// {
//   count: 47,
//   totalCostUsd: 0.0894,
//   inputTokens: 24103,
//   outputTokens: 7379,
//   latencyP50: 612,
//   latencyP95: 1840,
//   ttftP50: 287,
//   ttftP95: 612,
//   byModel: { ... }
// }
```

The same data is available through useMetrics() for live UI, or meter.getEvents({ from, to }) if you want to roll your own.
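Rolling your own aggregation over the raw events is a short reduce. The field names below assume the per-event shape described earlier (model, token counts, cost); check the package's actual typings before relying on them:

```typescript
// Assumed per-event shape; see the library's exported types for the real one.
interface MeterEvent {
  model: string;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

// Roll raw events up into per-model spend, e.g. for a custom dashboard screen.
function costByModel(events: MeterEvent[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of events) {
    out[e.model] = (out[e.model] ?? 0) + e.costUsd;
  }
  return out;
}
```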

Streaming TTFT

This is the part that took me longest. Total latency is easy, but time to first token isn't, because every provider streams differently and "first token" means something different in each SDK.

ttftMs is captured separately from latencyMs. They answer different questions — TTFT is perceived responsiveness (how long the user waited before anything showed), latency is total wall-clock duration. A model can have low TTFT and high latency, or vice versa.

Detection rules:

| Provider | First-token signal |
| --- | --- |
| Anthropic | First `content_block_delta` chunk |
| OpenAI | First chunk where `choices[0].delta.content` is non-empty |
| Google | First chunk where `candidates[0].content.parts[0].text` is non-empty |

For OpenAI streaming you also need stream_options: { include_usage: true } to get usage at all. The library can't fix that, because it's a provider quirk, but it warns when usage is missing so you catch it in dev.
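The detection rules above boil down to "watch the stream, record the clock at the first contentful chunk, and grab usage off the final chunk". A sketch against an OpenAI-shaped stream (the chunk type is simplified here; the library wires this logic into wrap() for you):

```typescript
// Simplified OpenAI streaming chunk: content deltas, plus a final
// usage-only chunk when stream_options.include_usage is set.
interface OpenAIChunk {
  choices: { delta: { content?: string } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

// Consume a stream, capturing time-to-first-token and final token usage.
async function meterStream(
  stream: AsyncIterable<OpenAIChunk>,
  startMs: number,
): Promise<{ ttftMs: number | null; usage: OpenAIChunk["usage"] }> {
  let ttftMs: number | null = null;
  let usage: OpenAIChunk["usage"] = null;
  for await (const chunk of stream) {
    if (ttftMs === null && chunk.choices[0]?.delta.content) {
      ttftMs = Date.now() - startMs; // first non-empty content delta
    }
    if (chunk.usage) usage = chunk.usage; // only present with include_usage
  }
  return { ttftMs, usage };
}
```

If `usage` comes back null here, the request was made without include_usage, which is exactly the condition the dev-time warning catches.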

Storage

Two adapters: AsyncStorageAdapter (works everywhere, day-bucketed retention) and SqliteAdapter (for higher volume, via expo-sqlite). There's a migration helper for moving from one to the other. Skip both and events live in memory.

Budgets

```typescript
meter.setBudget({
  daily: 5,
  weekly: 25,
  onCross: ({ period, threshold, spend }) => {
    Alert.alert(`${period} limit hit`, `$${spend.toFixed(2)} / $${threshold}`);
  },
});
```

Alerts are soft: they only fire the callback and never block the request. Hard circuit-breakers would change wrap()'s failure semantics and need more thought; they're currently on the roadmap.
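The soft-alert semantics reduce to "fire the callback exactly once per period when cumulative spend crosses the threshold". A hypothetical sketch of that logic (the library's internals may differ):

```typescript
// Fire onCross exactly once when cumulative spend crosses the threshold.
// Soft semantics: the request that crossed the line already went through.
function makeBudgetCheck(
  threshold: number,
  onCross: (spend: number) => void,
): (costUsd: number) => void {
  let fired = false;
  let spend = 0;
  return (costUsd: number) => {
    spend += costUsd;
    if (!fired && spend >= threshold) {
      fired = true; // latch so repeated calls don't re-alert
      onCross(spend);
    }
  };
}
```

A hard circuit-breaker would instead have to throw (or reject) inside the wrapped call, which is why it changes the failure semantics callers have to handle.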

Dev overlay

```typescript
import { MeterOverlay } from "react-native-llm-meter/overlay";
```

Floating, draggable, and shown only when __DEV__ is true by default, so it doesn't ship to production. The subpath import keeps react-native out of non-RN bundles.

What it deliberately doesn't do

No prompt content, ever. It records token counts, latency, model name, cost, and your supplied metadata. The wrapper structurally never sees prompt strings: no debug mode, no opt-in flag. Mobile apps handle sensitive content; if you want prompt logging, this is the wrong tool.

No server-side observability. If your LLM calls happen from Node, use Langfuse or Helicone. They're better at that. This is for the case where calls happen on the device.

No web. The core is platform-agnostic, but the web build isn't done.

No hosted dashboard. It's a library. The remote sink lets you POST events to your own endpoint: Sentry, Datadog, whatever you want.

Model Token Costs

Hardcoded in src/pricing/table.ts as a snapshot of published rates. There's a PR template for updates that takes two minutes. Unknown models log a one-time warning per provider/model pair so you spot drift in dev, not in your billing.
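Cost computation from a table like that is just per-million-token arithmetic. The model name and rates below are made-up placeholders, not the values the package ships:

```typescript
// Per-million-token rates keyed by model (placeholder values only).
const PRICING: Record<string, { inPerM: number; outPerM: number }> = {
  "example-model": { inPerM: 3, outPerM: 15 },
};

// Returns null for unknown models so the caller can emit its
// one-time "unknown model" warning instead of silently reporting $0.
function computeCostUsd(model: string, inputTokens: number, outputTokens: number): number | null {
  const p = PRICING[model];
  if (!p) return null;
  return (inputTokens / 1_000_000) * p.inPerM + (outputTokens / 1_000_000) * p.outPerM;
}
```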

Try it

```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```

Repo: github.com/ankitvirdi4/react-native-llm-meter

Bugs, PRs, and stale-pricing fixes are all welcome. If you've shipped Claude or GPT in an Expo app and hit something I should know about, tell me; I'd like a shot at it!

