DEV Community: serkan

8 Practical Ways to Reduce Your LLM API Costs (With Real Numbers)

serkan — Sun, 21 Jun 2026 17:54:31 +0000

LLM API bills can spiral fast once you're in production. Here are eight concrete techniques that actually move the needle, ranked roughly by impact.

1. Cache repeated prompts

If your app sends the same system prompt or common queries repeatedly, you're paying for the same computation over and over. Even a simple in-memory cache keyed on the exact prompt text can eliminate a meaningful chunk of spend — in our own usage data, repeated identical prompts accounted for a noticeable share of total cost.
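A minimal sketch of the exact-match cache described above. `callModel` is a hypothetical stand-in for whatever function sends a prompt to your provider. This version has no eviction or expiry, which a production cache would likely need.

```javascript
// Wrap a model-calling function with an in-memory cache keyed on the exact prompt text.
// `callModel` is hypothetical: any function that takes a prompt and returns a
// response (or a promise of one).
function withPromptCache(callModel) {
  const cache = new Map()

  return function cachedCall(prompt) {
    if (cache.has(prompt)) return cache.get(prompt)

    // Store whatever callModel returns. If it is a promise, concurrent
    // identical requests share the same in-flight call.
    const result = callModel(prompt)
    cache.set(prompt, result)

    // Do not keep failed calls in the cache, so a later request can retry.
    Promise.resolve(result).catch(() => cache.delete(prompt))
    return result
  }
}
```

Wrapping once (`const cachedCall = withPromptCache(callModel)`) and calling `cachedCall(prompt)` everywhere means a repeated prompt only reaches the provider the first time.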

2. Use cheaper models for simpler tasks

Not every request needs your most capable model. Classification, simple extraction, and short-form responses often work fine on smaller, cheaper models (GPT-4o-mini, Claude Haiku, Gemini Flash). Reserve the expensive models for tasks that actually need the reasoning power.
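One way to sketch this routing is a small lookup table. The task names and the choice of which tasks count as "simple" are assumptions for illustration, so check them against your own quality measurements.

```javascript
// Hypothetical routing table: task type -> cheaper model.
const MODEL_BY_TASK = new Map([
  ["classification", "gpt-4o-mini"],
  ["extraction", "gpt-4o-mini"],
  ["short_answer", "gpt-4o-mini"],
])

// Anything not listed falls back to the more capable, more expensive model.
const DEFAULT_MODEL = "gpt-4o"

function pickModel(taskType) {
  return MODEL_BY_TASK.get(taskType) ?? DEFAULT_MODEL
}
```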

3. Trim your system prompts

Long system prompts get sent with every single request. If yours has grown organically over months of tweaks, audit it — every redundant sentence is a recurring cost multiplied by your request volume.
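To put a number on that recurring cost, a rough calculation like the one below can help. The token counts, request volume, and price are made-up inputs, not real pricing.

```javascript
// Monthly cost of sending the system prompt with every request.
// All inputs are assumptions you supply; the price is in USD per million input tokens.
function systemPromptMonthlyCost({ promptTokens, requestsPerMonth, pricePerMillionInputTokens }) {
  const tokensPerMonth = promptTokens * requestsPerMonth
  return (tokensPerMonth / 1_000_000) * pricePerMillionInputTokens
}

// Example with made-up numbers: trimming a 1,200-token prompt down to 800 tokens.
const costBefore = systemPromptMonthlyCost({
  promptTokens: 1200,
  requestsPerMonth: 500_000,
  pricePerMillionInputTokens: 2.5,
})
const costAfter = systemPromptMonthlyCost({
  promptTokens: 800,
  requestsPerMonth: 500_000,
  pricePerMillionInputTokens: 2.5,
})
console.log(`Monthly savings: $${costBefore - costAfter}`)
```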

4. Set hard output limits

Use max_tokens aggressively. Open-ended generation tasks can produce far more output than you need, and you pay per token either way.
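A sketch of what this can look like with a Chat Completions-style request body. The model name is just an example, and some models or providers use a differently named limit parameter, so check the docs for the one you call. A hard cap can cut a response off mid-sentence, so it is worth checking for that too.

```javascript
// Build a Chat Completions-style request body with a hard output cap.
function buildRequest(prompt, maxOutputTokens) {
  return {
    model: "gpt-4o-mini", // example model name
    messages: [{ role: "user", content: prompt }],
    max_tokens: maxOutputTokens,
  }
}

// In OpenAI's Chat Completions responses, finish_reason is "length" when the
// output was cut off by the token limit.
function wasTruncated(response) {
  return response.choices.some((choice) => choice.finish_reason === "length")
}
```

If `wasTruncated` fires often for a feature, the cap is probably too tight for that task.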

5. Batch requests where possible

Some providers offer batch APIs at a discount (often 50% off) for non-real-time workloads. If you're processing things asynchronously (summarizing a backlog, generating reports), skipping the batch API leaves money on the table.
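As an illustration, here is a sketch that turns a list of prompts into a JSONL batch input file, modeled on the one-request-per-line format OpenAI documents for its Batch API. Other providers use different formats, and the `request-N` IDs are an arbitrary choice.

```javascript
// Convert prompts into JSONL: one JSON request object per line.
// The field names follow OpenAI's documented batch input format; verify
// against your provider's docs before relying on this.
function toBatchJsonl(prompts, model) {
  return prompts
    .map((prompt, i) =>
      JSON.stringify({
        custom_id: `request-${i}`, // used to match results back to inputs
        method: "POST",
        url: "/v1/chat/completions",
        body: { model, messages: [{ role: "user", content: prompt }] },
      })
    )
    .join("\n")
}
```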

6. Monitor for anomalies, not just totals

A monthly total doesn't tell you when something went wrong. A buggy retry loop or an unexpected usage spike can burn through a budget in hours. Daily-level monitoring with alerting on deviations from your normal spend catches this before it becomes a surprise bill.
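Daily-level monitoring starts with rolling request logs up into per-day totals. A minimal sketch, assuming each log entry looks like `{ timestamp, cost }` with an ISO-8601 timestamp (that shape is an assumption, not a standard):

```javascript
// Group request costs by UTC calendar day and return them in chronological order.
function dailyTotals(requests) {
  const totals = new Map()
  for (const { timestamp, cost } of requests) {
    const date = new Date(timestamp).toISOString().slice(0, 10) // "YYYY-MM-DD" in UTC
    totals.set(date, (totals.get(date) ?? 0) + cost)
  }
  return [...totals.entries()]
    .map(([date, cost]) => ({ date, cost }))
    .sort((a, b) => a.date.localeCompare(b.date))
}
```

The resulting array of `{ date, cost }` is the kind of input a spend-deviation check can run against.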

7. A/B test before committing

Before switching your whole app to a "cheaper" model, actually measure it. Sometimes a cheaper model needs more retries or longer prompts to get usable output, which erases the savings. Compare cost AND output quality side by side on real traffic.
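One way to make that comparison concrete is to divide total spend by the number of usable outputs, so retries and failures show up in the price. This is a sketch; the `{ cost, usable }` record shape and how you decide "usable" are assumptions.

```javascript
// Each run is { cost: number, usable: boolean }; count every retry as its own run.
function costPerUsableOutput(runs) {
  const totalCost = runs.reduce((sum, run) => sum + run.cost, 0)
  const usableCount = runs.filter((run) => run.usable).length
  // No usable outputs at all means the effective cost is unbounded.
  return usableCount === 0 ? Infinity : totalCost / usableCount
}
```

A model that costs a third as much per call but needs four attempts per usable answer comes out more expensive on this metric.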

8. Know your per-feature cost breakdown

If you can't answer "which feature in my app costs the most," you can't prioritize optimization. Tagging requests by feature or use case (even just in your logs) turns a vague cost problem into a concrete, fixable one.
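If each logged request carries a feature tag, the breakdown is a small aggregation. A minimal sketch, assuming log entries shaped like `{ feature, cost }` (the field names are illustrative):

```javascript
// Sum cost per feature tag and return features sorted from most to least expensive.
function costByFeature(requests) {
  const totals = new Map()
  for (const { feature = "untagged", cost } of requests) {
    totals.set(feature, (totals.get(feature) ?? 0) + cost)
  }
  return [...totals.entries()]
    .map(([feature, cost]) => ({ feature, cost }))
    .sort((a, b) => b.cost - a.cost)
}
```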

I built LLMWatch after running into most of these problems myself — it's a proxy that logs cost/latency per request, flags repeated prompts you could cache, and warns you when spend spikes. Free tier covers 1,000 requests/month if you want to see your own breakdown.

Building Simple Anomaly Detection for API Cost Tracking (No ML Required)

serkan — Sun, 21 Jun 2026 15:40:01 +0000

The problem

If you're tracking costs for any kind of usage-based API (LLM calls, cloud compute, etc.), you eventually want to know: "is today's spend unusual?"

You don't need machine learning for this. A simple statistical comparison against a rolling average works well enough for most cases.

The approach

Calculate the average daily cost over the past N days, then compare today's cost against that average. If it's significantly higher (I used 2x as the threshold), flag it.

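function detectAnomaly(dailyCosts) {
  // dailyCosts is an array of { date, cost }, sorted chronologically.
  // The last entry is "today"; every earlier entry forms the baseline, so
  // pass only the last N + 1 days to get an N-day rolling average.
  if (dailyCosts.length < 2) return null

  const previousDays = dailyCosts.slice(0, -1)
  const avgCost = previousDays.reduce((sum, d) => sum + d.cost, 0) / previousDays.length

  const today = dailyCosts[dailyCosts.length - 1]
  // Skip the check when the baseline is zero; otherwise any spend at all would be flagged.
  const isAnomaly = avgCost > 0 && today.cost > avgCost * 2

  return isAnomaly ? { today: today.cost, average: avgCost } : null
}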

Why 2x and not something more sophisticated

I considered standard deviation-based approaches (z-scores), but for small datasets (a week or two of daily costs), a simple multiplier is more predictable and easier to explain to users. "Your spend is 2x your average" is immediately understandable. "Your spend is 2.3 standard deviations above the mean" requires more context.

If you have enough historical data (weeks or months), z-scores become more reliable since they account for natural variance in your data.
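For comparison, here is a sketch of what a z-score variant could look like. It takes the same `{ date, cost }` input, sorted chronologically. The default threshold of 3 and the use of the sample standard deviation are assumptions, not something I tuned.

```javascript
// Flag today's cost if it sits more than `threshold` standard deviations
// above the mean of the previous days.
function detectAnomalyZScore(dailyCosts, threshold = 3) {
  // Need at least two previous days to compute a sample standard deviation.
  if (dailyCosts.length < 3) return null

  const previous = dailyCosts.slice(0, -1).map((d) => d.cost)
  const today = dailyCosts[dailyCosts.length - 1].cost

  const mean = previous.reduce((sum, c) => sum + c, 0) / previous.length
  const variance =
    previous.reduce((sum, c) => sum + (c - mean) ** 2, 0) / (previous.length - 1)
  const stdDev = Math.sqrt(variance)

  // A perfectly flat history has no variance to measure against;
  // the simple multiplier rule is a better fit in that case.
  if (stdDev === 0) return null

  const zScore = (today - mean) / stdDev
  return zScore > threshold ? { today, average: mean, zScore } : null
}
```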

A related problem: detecting duplicate work

While building this, I also added detection for repeated identical requests — useful for catching when an app is re-sending the same prompt to an LLM API without caching.

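function findDuplicates(requests) {
  // Use a Map rather than a plain object: prompts such as "constructor" or
  // "toString" would otherwise collide with inherited Object.prototype keys.
  const counts = new Map()
  for (const req of requests) {
    const entry = counts.get(req.prompt) ?? { count: 0, totalCost: 0 }
    entry.count++
    entry.totalCost += req.cost
    counts.set(req.prompt, entry)
  }

  return [...counts.entries()]
    .filter(([, data]) => data.count > 1)
    .map(([prompt, data]) => ({
      prompt,
      count: data.count,
      // Keep one average-priced call; count the rest as avoidable spend.
      potentialSavings: data.totalCost - (data.totalCost / data.count),
    }))
}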

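The savings estimate assumes you'd cache after the first call, so you pay once instead of N times. It prices that one remaining call at the average cost across the duplicates, since identical prompts can still produce outputs of different lengths.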

Where this lives

Built this into LLMWatch, a tool I'm building for tracking LLM API costs. Both checks run client-side on data the dashboard already has, so there's no extra infrastructure needed — just array operations on data you're already fetching.

I was integrating Paddle Billing into a SaaS product. The checkout kept failing with:

serkan — Wed, 17 Jun 2026 18:26:08 +0000

I spent weeks debugging this. I tried:

Different price IDs (verified active in dashboard)
Different client-side tokens (regenerated multiple times)
Removing the customer object from Checkout.open()
Removing the settings object entirely
Hardcoding values instead of using env variables

Nothing worked. The error was always identical, regardless of what I changed in my JavaScript.

The actual fix

The issue had nothing to do with code. In Paddle's dashboard, under Checkout > General, there's a field called Default Payment Link. If it's not set, Paddle can't create transactions — even with a perfectly valid priceId and token.

Once I set this field to my app's URL, checkout started working immediately.

Why this is easy to miss

The error message points to transaction_checkout_id, which makes you think the problem is in how you're calling Checkout.open(). It gives zero indication that a dashboard setting unrelated to your code is the actual blocker.

If you're stuck on this

Check Checkout Settings > General > Default Payment Link before spending hours debugging your integration code. Save yourself the time I lost.

Building LLMWatch — a tool for tracking LLM API costs. This was part of getting payments working for it.

How to track OpenAI API costs with a simple proxy

serkan — Thu, 11 Jun 2026 17:50:49 +0000

The problem

You're building with the OpenAI API and suddenly get a $200 bill. Which feature caused it? Which user? Which prompt? You have no idea.

It's a common surprise for developers building with LLMs.

The solution

I built LLMWatch – a lightweight proxy that sits between your app and OpenAI. It logs every request with exact cost, latency, and token usage.

How it works

Change one line in your code:

import OpenAI from "openai"

// Before (this is the SDK's default base URL, including the /v1 path)
const openai = new OpenAI({
  baseURL: "https://api.openai.com/v1"
})

// After
const openai = new OpenAI({ 
  baseURL: "https://llmwatch-rho.vercel.app/api/proxy",
  defaultHeaders: {
    "x-llmwatch-key": "your_llmwatch_key"
  }
})

That's it. Every request is now logged.

What you get

Exact cost per request – See which prompt costs $0.001 vs $0.05
Token breakdown – Prompt tokens vs completion tokens
Latency tracking – Which requests are slow?
Cost alerts – Get an email when you hit 80% of your monthly budget
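To give a sense of the arithmetic behind per-request cost and the 80% budget alert, here is a rough sketch. It is not LLMWatch's actual implementation. The `prompt_tokens` and `completion_tokens` names match the usage object in OpenAI's Chat Completions responses, while the pricing object and its numbers are hypothetical placeholders.

```javascript
// Cost of one request in USD, given token usage and per-million-token prices.
function requestCost(usage, pricing) {
  return (
    (usage.prompt_tokens * pricing.inputPerMillion +
      usage.completion_tokens * pricing.outputPerMillion) /
    1_000_000
  )
}

// True once spend reaches the given fraction (default 80%) of the monthly budget.
function budgetAlertDue(spendSoFar, monthlyBudget, threshold = 0.8) {
  return monthlyBudget > 0 && spendSoFar >= monthlyBudget * threshold
}

// Example with placeholder prices (USD per million tokens), not real pricing.
const examplePricing = { inputPerMillion: 2, outputPerMillion: 8 }
const exampleUsage = { prompt_tokens: 1000, completion_tokens: 500 }
console.log(requestCost(exampleUsage, examplePricing))
```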

Getting started

Sign up at llmwatch-rho.vercel.app
Create a project and copy your API key
Change your baseURL
Done

Free tier: 1,000 requests/month. Pro plan: $20/month unlimited.

Would love feedback from anyone building with LLMs!