<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: serkan</title>
    <description>The latest articles on DEV Community by serkan (@serkanubayy).</description>
    <link>https://dev.to/serkanubayy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979976%2Faba5eed0-bde1-417e-af68-8bbaec2cf5b4.png</url>
      <title>DEV Community: serkan</title>
      <link>https://dev.to/serkanubayy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/serkanubayy"/>
    <language>en</language>
    <item>
      <title>8 Practical Ways to Reduce Your LLM API Costs (With Real Numbers)</title>
      <dc:creator>serkan</dc:creator>
      <pubDate>Sun, 21 Jun 2026 17:54:31 +0000</pubDate>
      <link>https://dev.to/serkanubayy/8-practical-ways-to-reduce-your-llm-api-costs-with-real-numbers-4l36</link>
      <guid>https://dev.to/serkanubayy/8-practical-ways-to-reduce-your-llm-api-costs-with-real-numbers-4l36</guid>
      <description>&lt;p&gt;LLM API bills can spiral fast once you're in production. Here are eight concrete techniques that actually move the needle, ranked roughly by impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Cache repeated prompts
&lt;/h2&gt;

&lt;p&gt;If your app sends the exact same request repeatedly (same system prompt, same user query), you're paying for the same computation over and over. Even a simple in-memory cache keyed on the exact prompt text can eliminate a meaningful chunk of spend. In our own usage data, repeated identical prompts accounted for a noticeable share of total cost.&lt;/p&gt;
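
&lt;p&gt;A minimal sketch of an exact-match cache, assuming a hypothetical async function &lt;code&gt;callModel(prompt)&lt;/code&gt; that wraps whatever provider call you already make. The names here are placeholders for illustration, not part of any SDK.&lt;/p&gt;

```javascript
// Wraps a model call with an in-memory cache keyed on the exact prompt text.
// `callModel` is a hypothetical async function: (prompt) => Promise of a response.
function createPromptCache(callModel) {
  const cache = new Map()

  return async function cachedCall(prompt) {
    // Repeat prompt: serve the stored response, no API call and no cost
    if (cache.has(prompt)) return cache.get(prompt)

    const result = await callModel(prompt)
    cache.set(prompt, result)
    return result
  }
}
```

&lt;p&gt;This only helps when the full prompt text matches character for character, and the cache grows without bound, so a real deployment would likely want a size limit or expiry on top.&lt;/p&gt;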

&lt;h2&gt;
  
  
  2. Use cheaper models for simpler tasks
&lt;/h2&gt;

&lt;p&gt;Not every request needs your most capable model. Classification, simple extraction, and short-form responses often work fine on smaller, cheaper models (GPT-4o-mini, Claude Haiku, Gemini Flash). Reserve the expensive models for tasks that actually need the reasoning power.&lt;/p&gt;
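
&lt;p&gt;One way to wire this up is a small routing table from task type to model tier. The task names and the &lt;code&gt;small-model&lt;/code&gt; / &lt;code&gt;large-model&lt;/code&gt; labels below are hypothetical placeholders; substitute the model IDs your provider actually offers.&lt;/p&gt;

```javascript
// Hypothetical task types mapped to a cheaper tier.
// Anything not listed falls through to the expensive default.
const MODEL_BY_TASK = new Map([
  ["classification", "small-model"],
  ["extraction", "small-model"],
  ["short_answer", "small-model"],
])
const DEFAULT_MODEL = "large-model"

function pickModel(taskType) {
  // Map lookups never match inherited keys, so unknown task types are safe
  return MODEL_BY_TASK.get(taskType) ?? DEFAULT_MODEL
}
```

&lt;p&gt;Defaulting unknown tasks to the larger model trades some cost for safety: a new feature starts on the capable model until you've checked that the cheaper one is good enough.&lt;/p&gt;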

&lt;h2&gt;
  
  
  3. Trim your system prompts
&lt;/h2&gt;

&lt;p&gt;Long system prompts get sent with every single request. If yours has grown organically over months of tweaks, audit it — every redundant sentence is a recurring cost multiplied by your request volume.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Set hard output limits
&lt;/h2&gt;

&lt;p&gt;Set &lt;code&gt;max_tokens&lt;/code&gt; aggressively. Open-ended generation tasks can produce far more output than you need, and you pay for every token generated, whether you use it or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Batch requests where possible
&lt;/h2&gt;

&lt;p&gt;Some providers offer batch APIs at a discount (often 50% off) for non-real-time workloads. If you're processing things asynchronously (summarizing a backlog, generating reports), skipping the batch API means paying full price for work that didn't need an immediate answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Monitor for anomalies, not just totals
&lt;/h2&gt;

&lt;p&gt;A monthly total doesn't tell you when something went wrong. A buggy retry loop or an unexpected usage spike can burn through a budget in hours. Daily-level monitoring with alerting on deviations from your normal spend catches this before it becomes a surprise bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. A/B test before committing
&lt;/h2&gt;

&lt;p&gt;Before switching your whole app to a "cheaper" model, actually measure it. Sometimes a cheaper model needs more retries or longer prompts to get usable output, which erases the savings. Compare cost AND output quality side by side on real traffic.&lt;/p&gt;
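
&lt;p&gt;The retry effect can be put into rough numbers. The sketch below assumes a simple model in which every attempt costs the same and failed attempts are retried until one succeeds; &lt;code&gt;costPerUsableOutput&lt;/code&gt; and its parameters are hypothetical names for illustration.&lt;/p&gt;

```javascript
// Expected cost of one usable output when each attempt succeeds with
// probability `successRate` and failures are retried until success.
// Expected number of attempts is 1 / successRate (geometric model, an assumption).
function costPerUsableOutput(costPerCall, successRate) {
  if (!(successRate > 0) || successRate > 1) {
    throw new RangeError("successRate must be greater than 0 and at most 1")
  }
  return costPerCall / successRate
}
```

&lt;p&gt;Under this model, a model that costs half as much per call but produces usable output only 40% of the time ends up 25% more expensive per usable result than one that succeeds every time.&lt;/p&gt;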

&lt;h2&gt;
  
  
  8. Know your per-feature cost breakdown
&lt;/h2&gt;

&lt;p&gt;If you can't answer "which feature in my app costs the most," you can't prioritize optimization. Tagging requests by feature or use case (even just in your logs) turns a vague cost problem into a concrete, fixable one.&lt;/p&gt;
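
&lt;p&gt;Once requests are tagged, the breakdown itself is a few lines. This sketch assumes each logged request has a &lt;code&gt;feature&lt;/code&gt; tag and a &lt;code&gt;cost&lt;/code&gt; in dollars; both field names are assumptions about your log format.&lt;/p&gt;

```javascript
// Sums cost per feature tag and returns the features sorted by total cost,
// most expensive first. Expects an array of { feature, cost } records.
function costByFeature(requests) {
  const totals = new Map()
  for (const req of requests) {
    totals.set(req.feature, (totals.get(req.feature) ?? 0) + req.cost)
  }

  return [...totals.entries()]
    .map(([feature, cost]) => ({ feature, cost }))
    .sort((a, b) => b.cost - a.cost)
}
```

&lt;p&gt;The first entry in the result is the feature to look at first.&lt;/p&gt;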




&lt;p&gt;I built &lt;a href="https://llmwatch-rho.vercel.app" rel="noopener noreferrer"&gt;LLMWatch&lt;/a&gt; after running into most of these problems myself — it's a proxy that logs cost/latency per request, flags repeated prompts you could cache, and warns you when spend spikes. Free tier covers 1,000 requests/month if you want to see your own breakdown.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building Simple Anomaly Detection for API Cost Tracking (No ML Required)</title>
      <dc:creator>serkan</dc:creator>
      <pubDate>Sun, 21 Jun 2026 15:40:01 +0000</pubDate>
      <link>https://dev.to/serkanubayy/building-simple-anomaly-detection-for-api-cost-tracking-no-ml-required-94h</link>
      <guid>https://dev.to/serkanubayy/building-simple-anomaly-detection-for-api-cost-tracking-no-ml-required-94h</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;If you're tracking costs for any kind of usage-based API (LLM calls, cloud compute, etc.), you eventually want to know: "is today's spend unusual?"&lt;/p&gt;

&lt;p&gt;You don't need machine learning for this. A simple statistical comparison against a rolling average works well enough for most cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The approach
&lt;/h2&gt;

&lt;p&gt;Calculate the average daily cost over the past N days (in the function below, every entry before the last one), then compare today's cost against that average. If today is more than 2x the average (the threshold I used), flag it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectAnomaly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dailyCosts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// dailyCosts is an array of { date, cost }, sorted chronologically&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dailyCosts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;previousDays&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dailyCosts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;avgCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;previousDays&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;previousDays&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dailyCosts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;dailyCosts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isAnomaly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;avgCost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;avgCost&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;isAnomaly&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;today&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;average&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;avgCost&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why 2x and not something more sophisticated
&lt;/h2&gt;

&lt;p&gt;I considered standard deviation-based approaches (z-scores), but for small datasets (a week or two of daily costs), a simple multiplier is more predictable and easier to explain to users. "Your spend is 2x your average" is immediately understandable. "Your spend is 2.3 standard deviations above the mean" requires more context.&lt;/p&gt;

&lt;p&gt;If you have enough historical data (weeks or months), z-scores become more reliable since they account for natural variance in your data.&lt;/p&gt;
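
&lt;p&gt;With that much history, a z-score variant might look like the sketch below. The function name &lt;code&gt;detectAnomalyZScore&lt;/code&gt;, the default threshold of 3, and the use of the sample standard deviation are assumptions for illustration, not what the 2x detector does.&lt;/p&gt;

```javascript
function detectAnomalyZScore(dailyCosts, threshold = 3) {
  // dailyCosts is an array of { date, cost }, sorted chronologically
  const previous = dailyCosts.slice(0, -1).map((d) => d.cost)
  if (2 > previous.length) return null // need at least two earlier days to measure spread

  const today = dailyCosts[dailyCosts.length - 1]
  const mean = previous.reduce((sum, c) => sum + c, 0) / previous.length

  // Sample variance: divide by n - 1 because the history is a sample, not the full population
  const variance =
    previous.reduce((sum, c) => sum + (c - mean) ** 2, 0) / (previous.length - 1)
  const stdDev = Math.sqrt(variance)
  if (stdDev === 0) return null // flat history: the z-score is undefined

  const zScore = (today.cost - mean) / stdDev
  return zScore > threshold ? { today: today.cost, average: mean, zScore } : null
}
```

&lt;p&gt;With a perfectly flat history the standard deviation is zero, so this sketch returns &lt;code&gt;null&lt;/code&gt; even for a large spike; the 2x multiplier check is a reasonable fallback for that case.&lt;/p&gt;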

&lt;h2&gt;
  
  
  A related problem: detecting duplicate work
&lt;/h2&gt;

&lt;p&gt;While building this, I also added detection for repeated identical requests — useful for catching when an app is re-sending the same prompt to an LLM API without caching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;findDuplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
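  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// null prototype, so a prompt like "constructor" can't collide with inherited keys&lt;/span&gt;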
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;totalCost&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;
    &lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;totalCost&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(([,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;potentialSavings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;totalCost&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;totalCost&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The savings estimate assumes you'd cache after the first call, so you only pay once instead of N times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this lives
&lt;/h2&gt;

&lt;p&gt;Built this into &lt;a href="https://llmwatch-rho.vercel.app" rel="noopener noreferrer"&gt;LLMWatch&lt;/a&gt;, a tool I'm building for tracking LLM API costs. Both checks run client-side on data the dashboard already has, so there's no extra infrastructure needed — just array operations on data you're already fetching.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I was integrating Paddle Billing into a SaaS product. The checkout kept failing with:</title>
      <dc:creator>serkan</dc:creator>
      <pubDate>Wed, 17 Jun 2026 18:26:08 +0000</pubDate>
      <link>https://dev.to/serkanubayy/i-was-integrating-paddle-billing-into-a-saas-product-the-checkout-kept-failing-with-2pe7</link>
      <guid>https://dev.to/serkanubayy/i-was-integrating-paddle-billing-into-a-saas-product-the-checkout-kept-failing-with-2pe7</guid>
      <description>&lt;p&gt;I spent weeks debugging this. I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different price IDs (verified active in dashboard)&lt;/li&gt;
&lt;li&gt;Different client-side tokens (regenerated multiple times)&lt;/li&gt;
&lt;li&gt;Removing the &lt;code&gt;customer&lt;/code&gt; object from &lt;code&gt;Checkout.open()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Removing the &lt;code&gt;settings&lt;/code&gt; object entirely&lt;/li&gt;
&lt;li&gt;Hardcoding values instead of using env variables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing worked. The error was always identical, regardless of what I changed in my JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual fix
&lt;/h2&gt;

&lt;p&gt;The issue had nothing to do with code. In Paddle's dashboard, under &lt;strong&gt;Checkout &amp;gt; General&lt;/strong&gt;, there's a field called &lt;strong&gt;Default Payment Link&lt;/strong&gt;. If it's not set, Paddle can't create transactions — even with a perfectly valid &lt;code&gt;priceId&lt;/code&gt; and token.&lt;/p&gt;

&lt;p&gt;Once I set this field to my app's URL, checkout started working immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is easy to miss
&lt;/h2&gt;

&lt;p&gt;The error message points to &lt;code&gt;transaction_checkout_id&lt;/code&gt;, which makes you think the problem is in how you're calling &lt;code&gt;Checkout.open()&lt;/code&gt;. It gives zero indication that a dashboard setting unrelated to your code is the actual blocker.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you're stuck on this
&lt;/h2&gt;

&lt;p&gt;Check &lt;strong&gt;Checkout &amp;gt; General &amp;gt; Default Payment Link&lt;/strong&gt; in the Paddle dashboard before spending hours debugging your integration code. Save yourself the time I lost.&lt;/p&gt;




&lt;p&gt;Building &lt;a href="https://llmwatch-rho.vercel.app" rel="noopener noreferrer"&gt;LLMWatch&lt;/a&gt; — a tool for tracking LLM API costs. This was part of getting payments working for it.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>saas</category>
      <category>paddle</category>
    </item>
    <item>
      <title>How to track OpenAI API costs with a simple proxy</title>
      <dc:creator>serkan</dc:creator>
      <pubDate>Thu, 11 Jun 2026 17:50:49 +0000</pubDate>
      <link>https://dev.to/serkanubayy/how-to-track-openai-api-costs-with-a-simple-proxy-2l5f</link>
      <guid>https://dev.to/serkanubayy/how-to-track-openai-api-costs-with-a-simple-proxy-2l5f</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;You're building with the OpenAI API and suddenly get a $200 bill. Which feature caused it? Which user? Which prompt? You have no idea.&lt;/p&gt;

&lt;p&gt;This is a common problem for developers building with LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution
&lt;/h2&gt;

&lt;p&gt;I built LLMWatch – a lightweight proxy that sits between your app and OpenAI. It logs every request with exact cost, latency, and token usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Point your client's &lt;code&gt;baseURL&lt;/code&gt; at the proxy and add your LLMWatch key as a header:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://llmwatch-rho.vercel.app/api/proxy&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;defaultHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-llmwatch-key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;your_llmwatch_key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Every request is now logged.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exact cost per request&lt;/strong&gt; – See which prompt costs $0.001 vs $0.05&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token breakdown&lt;/strong&gt; – Prompt tokens vs completion tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency tracking&lt;/strong&gt; – Which requests are slow?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost alerts&lt;/strong&gt; – Get an email when you hit 80% of your monthly budget&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://llmwatch-rho.vercel.app" rel="noopener noreferrer"&gt;llmwatch-rho.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create a project and copy your API key&lt;/li&gt;
&lt;li&gt;Change your baseURL&lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Free tier: 1,000 requests/month. Pro plan: $20/month, unlimited requests.&lt;/p&gt;

&lt;p&gt;Would love feedback from anyone building with LLMs!&lt;/p&gt;

</description>
      <category>openai</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
