<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ankit Virdi</title>
    <description>The latest articles on DEV Community by Ankit Virdi (@ankitvirdi4).</description>
    <link>https://dev.to/ankitvirdi4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908228%2Fd6c523c5-9a15-40ba-9d39-dff80fd8b597.jpeg</url>
      <title>DEV Community: Ankit Virdi</title>
      <link>https://dev.to/ankitvirdi4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ankitvirdi4"/>
    <language>en</language>
    <item>
      <title>I built react-native-llm-meter, LLM cost tracking for Expo apps</title>
      <dc:creator>Ankit Virdi</dc:creator>
      <pubDate>Fri, 01 May 2026 23:29:36 +0000</pubDate>
      <link>https://dev.to/ankitvirdi4/i-built-react-native-llm-meter-llm-cost-tracking-for-expo-apps-410h</link>
      <guid>https://dev.to/ankitvirdi4/i-built-react-native-llm-meter-llm-cost-tracking-for-expo-apps-410h</guid>
      <description>&lt;p&gt;If you ship Claude, GPT, or Gemini calls from a React Native app, you have a problem nobody's solved well, you don't know what's happening on the device.&lt;/p&gt;

&lt;p&gt;Server-side observability is excellent. Langfuse, Helicone, LangSmith, and Stripe's token-meter all work beautifully for Node backends. None of them work cleanly in an Expo app: they assume a server, they pull in Node-only APIs, they don't ship AsyncStorage adapters, and streaming breaks under Hermes.&lt;/p&gt;

&lt;p&gt;So I built it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;react-native-llm-meter&lt;/code&gt; is on npm. It currently ships three providers, two storage adapters, streaming TTFT, a dev overlay, budget alerts, and an optional remote sink.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;react-native-llm-meter @react-native-async-storage/async-storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Meter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MeterProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react-native-llm-meter&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;anthropic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXPO_PUBLIC_ANTHROPIC_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Meter&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;MeterProvider&lt;/span&gt; &lt;span class="nx"&gt;meter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;YourApp&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/MeterProvider&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every call through the wrapped client gets recorded with provider, model, token counts, latency, TTFT for streams, and computed cost. The interface is identical to the original SDK; you only change how the client is constructed.&lt;/p&gt;

&lt;h2&gt;What you get&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;// {&lt;/span&gt;
&lt;span class="c1"&gt;//   count: 47,&lt;/span&gt;
&lt;span class="c1"&gt;//   totalCostUsd: 0.0894,&lt;/span&gt;
&lt;span class="c1"&gt;//   inputTokens: 24103,&lt;/span&gt;
&lt;span class="c1"&gt;//   outputTokens: 7379,&lt;/span&gt;
&lt;span class="c1"&gt;//   latencyP50: 612,&lt;/span&gt;
&lt;span class="c1"&gt;//   latencyP95: 1840,&lt;/span&gt;
&lt;span class="c1"&gt;//   ttftP50: 287,&lt;/span&gt;
&lt;span class="c1"&gt;//   ttftP95: 612,&lt;/span&gt;
&lt;span class="c1"&gt;//   byModel: { ... }&lt;/span&gt;
&lt;span class="c1"&gt;// }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same data is available through &lt;code&gt;useMetrics()&lt;/code&gt; for live UI, or &lt;code&gt;meter.getEvents({ from, to })&lt;/code&gt; if you want to roll your own.&lt;/p&gt;
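
&lt;p&gt;For instance, a minimal live cost readout. This is a sketch, not the library's documented example: it assumes &lt;code&gt;useMetrics()&lt;/code&gt; re-renders as events arrive and returns the same shape as &lt;code&gt;meter.summary()&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Text } from "react-native";
import { useMetrics } from "react-native-llm-meter";

// Must render inside the MeterProvider shown earlier.
function CostBadge() {
  const { count, totalCostUsd } = useMetrics();
  // JSX text: "$" is literal, the expression fills in the running spend.
  return &amp;lt;Text&amp;gt;{count} calls, ${totalCostUsd.toFixed(4)}&amp;lt;/Text&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;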

&lt;h2&gt;Streaming TTFT&lt;/h2&gt;

&lt;p&gt;This is the part that took me longest. Total latency is easy, but time to first token isn't: every provider streams differently, and "first token" means something different in each SDK.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ttftMs&lt;/code&gt; is captured separately from &lt;code&gt;latencyMs&lt;/code&gt;. They answer different questions: TTFT is perceived responsiveness (how long the user waited before &lt;em&gt;anything&lt;/em&gt; showed), while latency is total wall-clock duration. A model can have low TTFT and high latency, or vice versa.&lt;/p&gt;

&lt;p&gt;Detection rules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;First-token signal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;First &lt;code&gt;content_block_delta&lt;/code&gt; chunk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;First chunk where &lt;code&gt;choices[0].delta.content&lt;/code&gt; is non-empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;First chunk where &lt;code&gt;candidates[0].content.parts[0].text&lt;/code&gt; is non-empty&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For OpenAI streaming you also need &lt;code&gt;stream_options: { include_usage: true }&lt;/code&gt; to get usage data at all. The library can't fix that, because it's a provider quirk, but it warns when usage is missing so you catch it in dev.&lt;/p&gt;
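
&lt;p&gt;Concretely, that flag goes on the create call itself. A sketch with a meter-wrapped OpenAI client, assuming the wrapper preserves the SDK's call signature; the model name and env var are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import OpenAI from "openai";
import { Meter } from "react-native-llm-meter";

const meter = new Meter();
const openai = meter.wrap(new OpenAI({ apiKey: process.env.EXPO_PUBLIC_OPENAI_API_KEY }));

const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  // Without this, OpenAI omits token usage on streamed responses,
  // so the meter can't compute cost for the call.
  stream_options: { include_usage: true },
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;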

&lt;h2&gt;Storage&lt;/h2&gt;

&lt;p&gt;Two adapters: &lt;code&gt;AsyncStorageAdapter&lt;/code&gt; (works everywhere, day-bucketed retention) and &lt;code&gt;SqliteAdapter&lt;/code&gt; (for higher volume, via &lt;code&gt;expo-sqlite&lt;/code&gt;). There's a migration helper for moving from one to the other. Skip both and events live in memory only.&lt;/p&gt;
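
&lt;p&gt;Wiring an adapter in would look something like this. This is a sketch: the &lt;code&gt;storage&lt;/code&gt; option name, the adapter's constructor argument, and its export path are my assumptions, so check the README for the exact signature.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import AsyncStorage from "@react-native-async-storage/async-storage";
import { Meter, AsyncStorageAdapter } from "react-native-llm-meter";

// Persist events across app restarts instead of keeping them in memory.
const meter = new Meter({
  storage: new AsyncStorageAdapter(AsyncStorage),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;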

&lt;h2&gt;Budgets&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBudget&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;daily&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;weekly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;onCross&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;spend&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Alert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; limit hit`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`$&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;spend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; / $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Soft alerts only fire the callback; they don't block the request. Hard circuit-breakers would change &lt;code&gt;wrap()&lt;/code&gt;'s failure semantics and need more thought, so they're currently on the roadmap.&lt;/p&gt;

&lt;h2&gt;Dev overlay&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MeterOverlay&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react-native-llm-meter/overlay&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Floating, draggable, and enabled only when &lt;code&gt;__DEV__&lt;/code&gt; is true by default, so it doesn't ship to production. The subpath import keeps &lt;code&gt;react-native&lt;/code&gt; out of non-RN bundles.&lt;/p&gt;
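
&lt;p&gt;Dropping it in is one line of JSX. A sketch, assuming &lt;code&gt;MeterOverlay&lt;/code&gt; takes no required props and reads the meter from the surrounding &lt;code&gt;MeterProvider&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { MeterOverlay } from "react-native-llm-meter/overlay";

export default function App() {
  return (
    &amp;lt;MeterProvider meter={meter}&amp;gt;
      &amp;lt;YourApp client={client} /&amp;gt;
      {/* Renders only in development builds by default. */}
      &amp;lt;MeterOverlay /&amp;gt;
    &amp;lt;/MeterProvider&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;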

&lt;h2&gt;What it deliberately doesn't do&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;No prompt content, ever.&lt;/strong&gt; Token counts, latency, model name, cost, and your supplied metadata. The wrapper structurally never sees prompt strings: no debug mode, no opt-in flag. Mobile apps handle sensitive content; if you want prompt logging, this is the wrong tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No server-side observability.&lt;/strong&gt; If your LLM calls happen from Node, use Langfuse or Helicone. They're better at that. This is for the case where calls happen on the device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No web.&lt;/strong&gt; The core is platform-agnostic, but the web build isn't done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No hosted dashboard.&lt;/strong&gt; It's a library. The remote sink lets you POST events to your own endpoint: Sentry, Datadog, whatever you want.&lt;/p&gt;
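
&lt;p&gt;If you do want events off the device, I'd expect the configuration to look roughly like this. This is hypothetical: the sink option name and fields may differ from the real API, and the URL is a placeholder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Meter } from "react-native-llm-meter";

// Hypothetical shape; consult the README for the real option names.
const meter = new Meter({
  remoteSink: {
    url: "https://metrics.example.com/llm-events",
    // Events would be POSTed as JSON to your own endpoint.
    headers: { Authorization: "Bearer ..." },
  },
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;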

&lt;h2&gt;Model token costs&lt;/h2&gt;

&lt;p&gt;Pricing is hardcoded in &lt;code&gt;src/pricing/table.ts&lt;/code&gt; as a snapshot of published rates. There's a PR template for updates that takes two minutes. Unknown models log a one-time warning per provider/model pair, so you spot drift in dev, not on your bill.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;react-native-llm-meter @react-native-async-storage/async-storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="https://github.com/ankitvirdi4/react-native-llm-meter" rel="noopener noreferrer"&gt;github.com/ankitvirdi4/react-native-llm-meter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bugs, PRs, and stale-pricing fixes are all welcome. If you've shipped Claude or GPT in an Expo app and hit something I should know about, tell me; I'd like a shot at it!&lt;/p&gt;




&lt;p&gt;Built by &lt;a href="https://github.com/ankitvirdi4" rel="noopener noreferrer"&gt;Ankit Virdi&lt;/a&gt;&lt;/p&gt;

</description>
      <category>reactnative</category>
      <category>llm</category>
      <category>ai</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
