DEV Community

Ankit Virdi

I built react-native-llm-meter, LLM cost tracking for Expo apps

If you ship Claude, GPT, or Gemini calls from a React Native app, you have a problem nobody's solved well: you don't know what's happening on the device.

Server-side observability is excellent. Langfuse, Helicone, LangSmith, and Stripe's token-meter all work great for Node backends. None of them works cleanly in an Expo app: they assume a server, they pull in Node-only APIs, they don't ship AsyncStorage adapters, and streaming breaks under Hermes.

So I built it.

react-native-llm-meter is on npm. It currently ships three providers, two storage adapters, streaming TTFT, a dev overlay, budget alerts, and an optional remote sink.

```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```
```tsx
import { Meter, MeterProvider } from "react-native-llm-meter";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.EXPO_PUBLIC_ANTHROPIC_API_KEY });
const meter = new Meter();
const client = meter.wrap(anthropic);

export default function App() {
  return (
    <MeterProvider meter={meter}>
      <YourApp client={client} />
    </MeterProvider>
  );
}
```

Every call through the wrapped client gets recorded with provider, model, token counts, latency, TTFT for streams, and computed cost. It's the same interface as the original SDK; you only change the construction.
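I won't reproduce the library's internals here, but the general technique behind a transparent wrapper like this is a recursive Proxy that forwards every property access and times any method call. A minimal sketch (not the actual implementation; `Recorder` and the `path` bookkeeping are illustrative):

```typescript
// Minimal sketch of a transparent instrumentation wrapper: a Proxy
// that forwards property access unchanged and times async method calls.
type Recorder = (info: { path: string; latencyMs: number }) => void;

function wrapClient<T extends object>(target: T, record: Recorder, path = ""): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      const here = path ? `${path}.${String(prop)}` : String(prop);
      if (typeof value === "function") {
        // Replace the method with a timed version that forwards args and result.
        return async (...args: unknown[]) => {
          const start = Date.now();
          const result = await value.apply(obj, args);
          record({ path: here, latencyMs: Date.now() - start });
          return result;
        };
      }
      if (value && typeof value === "object") {
        // Recurse so nested namespaces (e.g. client.messages) are also wrapped.
        return wrapClient(value as object, record, here);
      }
      return value;
    },
  }) as T;
}
```

Because the Proxy forwards everything it doesn't intercept, the wrapped object keeps the original SDK's surface, which is why only the construction site changes.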

What you get

```typescript
meter.summary()
// {
//   count: 47,
//   totalCostUsd: 0.0894,
//   inputTokens: 24103,
//   outputTokens: 7379,
//   latencyP50: 612,
//   latencyP95: 1840,
//   ttftP50: 287,
//   ttftP95: 612,
//   byModel: { ... }
// }
```

The same data is available through useMetrics() for live UI, or meter.getEvents({ from, to }) if you want to roll your own.
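Rolling your own aggregation over the raw events is a short reduce. The field names below assume the per-event shape described earlier (model, token counts, cost); check the package's actual typings before relying on them:

```typescript
// Assumed per-event shape; see the library's exported types for the real one.
interface MeterEvent {
  model: string;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

// Roll raw events up into per-model spend, e.g. for a custom dashboard screen.
function costByModel(events: MeterEvent[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of events) {
    out[e.model] = (out[e.model] ?? 0) + e.costUsd;
  }
  return out;
}
```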

Streaming TTFT

This is the part that took me longest. Total latency is easy, but time to first token isn't, because every provider streams differently and "first token" means something different in each SDK.

ttftMs is captured separately from latencyMs. They answer different questions — TTFT is perceived responsiveness (how long the user waited before anything showed), latency is total wall-clock duration. A model can have low TTFT and high latency, or vice versa.

Detection rules:

| Provider | First-token signal |
| --- | --- |
| Anthropic | First `content_block_delta` chunk |
| OpenAI | First chunk where `choices[0].delta.content` is non-empty |
| Google | First chunk where `candidates[0].content.parts[0].text` is non-empty |

For OpenAI streaming you also need stream_options: { include_usage: true } to get usage at all. The library can't fix that, because it's a provider quirk, but it warns when usage is missing so you catch it in dev.
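The detection rules above boil down to "watch the stream, record the clock at the first contentful chunk, and grab usage off the final chunk". A sketch against an OpenAI-shaped stream (the chunk type is simplified here; the library wires this logic into wrap() for you):

```typescript
// Simplified OpenAI streaming chunk: content deltas, plus a final
// usage-only chunk when stream_options.include_usage is set.
interface OpenAIChunk {
  choices: { delta: { content?: string } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

// Consume a stream, capturing time-to-first-token and final token usage.
async function meterStream(
  stream: AsyncIterable<OpenAIChunk>,
  startMs: number,
): Promise<{ ttftMs: number | null; usage: OpenAIChunk["usage"] }> {
  let ttftMs: number | null = null;
  let usage: OpenAIChunk["usage"] = null;
  for await (const chunk of stream) {
    if (ttftMs === null && chunk.choices[0]?.delta.content) {
      ttftMs = Date.now() - startMs; // first non-empty content delta
    }
    if (chunk.usage) usage = chunk.usage; // only present with include_usage
  }
  return { ttftMs, usage };
}
```

If `usage` comes back null here, the request was made without include_usage, which is exactly the condition the dev-time warning catches.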

Storage

Two adapters: AsyncStorageAdapter (works everywhere, day-bucketed retention) and SqliteAdapter (for higher volume, via expo-sqlite). There's a migration helper for moving from one to the other. Skip both and events live in memory.

Budgets

```typescript
meter.setBudget({
  daily: 5,
  weekly: 25,
  onCross: ({ period, threshold, spend }) => {
    Alert.alert(`${period} limit hit`, `$${spend.toFixed(2)} / $${threshold}`);
  },
});
```

Alerts are soft: they only fire the callback and never block the request. Hard circuit-breakers would change wrap()'s failure semantics and need more thought; they're currently on the roadmap.
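The soft-alert semantics reduce to "fire the callback exactly once per period when cumulative spend crosses the threshold". A hypothetical sketch of that logic (the library's internals may differ):

```typescript
// Fire onCross exactly once when cumulative spend crosses the threshold.
// Soft semantics: the request that crossed the line already went through.
function makeBudgetCheck(
  threshold: number,
  onCross: (spend: number) => void,
): (costUsd: number) => void {
  let fired = false;
  let spend = 0;
  return (costUsd: number) => {
    spend += costUsd;
    if (!fired && spend >= threshold) {
      fired = true; // latch so repeated calls don't re-alert
      onCross(spend);
    }
  };
}
```

A hard circuit-breaker would instead have to throw (or reject) inside the wrapped call, which is why it changes the failure semantics callers have to handle.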

Dev overlay

```typescript
import { MeterOverlay } from "react-native-llm-meter/overlay";
```

Floating, draggable, and shown only when __DEV__ is true by default, so it doesn't ship to production. The subpath import keeps react-native out of non-RN bundles.

What it deliberately doesn't do

No prompt content, ever. It records token counts, latency, model name, cost, and your supplied metadata. The wrapper structurally never sees prompt strings: no debug mode, no opt-in flag. Mobile apps handle sensitive content; if you want prompt logging, this is the wrong tool.

No server-side observability. If your LLM calls happen from Node, use Langfuse or Helicone. They're better at that. This is for the case where calls happen on the device.

No web. The core is platform-agnostic, but the web build isn't done.

No hosted dashboard. It's a library. The remote sink lets you POST events to your own endpoint: Sentry, Datadog, whatever you want.

Model Token Costs

Hardcoded in src/pricing/table.ts as a snapshot of published rates. There's a PR template for updates that takes two minutes. Unknown models log a one-time warning per provider/model pair so you spot drift in dev, not in your billing.
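Cost computation from a table like that is just per-million-token arithmetic. The model name and rates below are made-up placeholders, not the values the package ships:

```typescript
// Per-million-token rates keyed by model (placeholder values only).
const PRICING: Record<string, { inPerM: number; outPerM: number }> = {
  "example-model": { inPerM: 3, outPerM: 15 },
};

// Returns null for unknown models so the caller can emit its
// one-time "unknown model" warning instead of silently reporting $0.
function computeCostUsd(model: string, inputTokens: number, outputTokens: number): number | null {
  const p = PRICING[model];
  if (!p) return null;
  return (inputTokens / 1_000_000) * p.inPerM + (outputTokens / 1_000_000) * p.outPerM;
}
```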

Try it

```bash
npm install react-native-llm-meter @react-native-async-storage/async-storage
```

Repo: github.com/ankitvirdi4/react-native-llm-meter

Bugs, PRs, and stale-pricing fixes are all welcome. If you've shipped Claude or GPT in an Expo app and hit something I should know about, tell me; I'd like a shot at it!

