DEV Community

brian austin
How to build a production-ready AI cost monitor in 30 lines of Node.js

With GPT-5.5 and DeepSeek v4 both launching this week, every developer I know is asking the same question: how much is my AI actually costing me?

Here's a lightweight cost monitor you can drop into any Node.js app in 10 minutes.

The problem

You build a feature, ship it, and three days later you check your billing dashboard and see a number that makes you want to sit down. Token costs are invisible until they're not.

The fix is a simple usage tracker that logs every API call with token counts and running totals.

The code

// ai-cost-monitor.js
const fs = require('fs');

// Per-token rates in USD (e.g. 0.000003 = $3 per million input tokens).
const COSTS = {
  'claude-3-5-sonnet': { input: 0.000003, output: 0.000015 },
  'gpt-4o': { input: 0.0000025, output: 0.00001 },
  'gpt-5.5': { input: 0.000005, output: 0.000020 },
  'deepseek-v4': { input: 0.0000014, output: 0.0000028 },
};

class CostMonitor {
  constructor(logFile = './ai-costs.json') {
    this.logFile = logFile;
    this.session = { calls: 0, inputTokens: 0, outputTokens: 0, cost: 0 };
  }

  track(model, inputTokens, outputTokens) {
    // Unknown models silently fall back to the claude-3-5-sonnet rate.
    const rate = COSTS[model] || { input: 0.000003, output: 0.000015 };
    const cost = (inputTokens * rate.input) + (outputTokens * rate.output);

    this.session.calls++;
    this.session.inputTokens += inputTokens;
    this.session.outputTokens += outputTokens;
    this.session.cost += cost;

    this._persist();

    // Warn once the session crosses $1 (note: fires on every call past the threshold)
    if (this.session.cost > 1.0) {
      console.warn(`⚠️  AI spend this session: $${this.session.cost.toFixed(4)}`);
    }

    return cost;
  }

  _persist() {
    const log = { ...this.session, updatedAt: new Date().toISOString() };
    fs.writeFileSync(this.logFile, JSON.stringify(log, null, 2));
  }

  summary() {
    return `${this.session.calls} calls | $${this.session.cost.toFixed(4)} | ${(this.session.inputTokens + this.session.outputTokens).toLocaleString()} tokens`;
  }
}

module.exports = new CostMonitor();
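As a quick sanity check on the rate table (per-token USD), here's the arithmetic for a hypothetical gpt-4o call with 1,000 input and 500 output tokens:

```javascript
// Rates copied from COSTS above; USD per token.
const rate = { input: 0.0000025, output: 0.00001 }; // gpt-4o

const cost = 1000 * rate.input + 500 * rate.output;
console.log(cost.toFixed(4)); // "0.0075" — i.e. $2.50/M input, $10/M output
```

If that number looks off against your provider's pricing page, fix the table before trusting any of the totals below.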

Usage with Claude

const Anthropic = require('@anthropic-ai/sdk');
const monitor = require('./ai-cost-monitor');

const client = new Anthropic();

async function chat(userMessage) {
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: userMessage }]
  });

  // Track the cost
  const cost = monitor.track(
    'claude-3-5-sonnet',
    response.usage.input_tokens,
    response.usage.output_tokens
  );

  console.log(`This call: $${cost.toFixed(6)} | Session: ${monitor.summary()}`);
  return response.content[0].text;
}

// Example
chat('Explain async/await in one paragraph').then(console.log);

Output

This call: $0.000084 | Session: 12 calls | $0.0012 | 4,832 tokens
This call: $0.000091 | Session: 13 calls | $0.0013 | 5,201 tokens
...
⚠️  AI spend this session: $1.0023

Usage with GPT-5.5

const { OpenAI } = require('openai');
const monitor = require('./ai-cost-monitor');

const client = new OpenAI();

async function chat(userMessage) {
  const response = await client.chat.completions.create({
    model: 'gpt-5.5',
    messages: [{ role: 'user', content: userMessage }]
  });

  monitor.track(
    'gpt-5.5',
    response.usage.prompt_tokens,
    response.usage.completion_tokens
  );

  return response.choices[0].message.content;
}
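Both usage snippets repeat the same track-after-call boilerplate. One way to factor it out is a small higher-order wrapper (a sketch; `withTracking` and its `getUsage` callback are my own names, not part of either SDK):

```javascript
// withTracking: wraps an async API call so its token usage is tracked automatically.
// getUsage maps the provider's response to { inputTokens, outputTokens }, since
// Anthropic (usage.input_tokens / usage.output_tokens) and OpenAI
// (usage.prompt_tokens / usage.completion_tokens) name the fields differently.
function withTracking(monitor, model, getUsage, fn) {
  return async (...args) => {
    const response = await fn(...args);
    const { inputTokens, outputTokens } = getUsage(response);
    monitor.track(model, inputTokens, outputTokens);
    return response;
  };
}
```

Then the GPT version of `chat` collapses to wrapping `client.chat.completions.create` once, and every call through the wrapper is tracked without repeating the bookkeeping.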

Add a daily budget cap

track(model, inputTokens, outputTokens, dailyBudget = 5.0) {
  // ... existing track code ...

  // Load today's running total from the persistent log and add this call.
  // Assumes a _getTodayTotal() helper that reads today's accumulated spend.
  const todayTotal = this._getTodayTotal() + cost;

  if (todayTotal > dailyBudget) {
    throw new Error(`Daily AI budget exceeded: $${todayTotal.toFixed(4)} / $${dailyBudget}`);
  }

  return cost;
}

Now your app throws before it ruins your month.

The flat-rate escape hatch

If you're building something where usage is unpredictable — a user-facing chatbot, a background agent, anything with unbounded conversation length — monitoring helps but doesn't solve the problem. The cost is variable by definition.

The escape hatch I use: SimplyLouie wraps the Claude API at a flat $2/month. For internal tools, I monitor my own API costs with the tracker above; for user-facing features, I route through the flat-rate wrapper so the cost stays $2/month regardless of conversation volume.

Not the right solution for every use case, but it eliminates the "check billing and want to sit down" experience entirely for those apps.

Discussion

What's the most surprising AI cost you've encountered in production?

I've seen developers shocked by how fast context window accumulation compounds costs in multi-turn conversations. What's your war story? And are you tracking costs per-request, per-session, or just checking the monthly dashboard and hoping for the best?
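That compounding is easy to underestimate because it's roughly quadratic: if every turn resends the full history, turn k pays input costs for about k message pairs. A back-of-envelope sketch (the 200-tokens-per-pair figure is an assumption, not a measurement):

```javascript
const TOKENS_PER_PAIR = 200;   // assumed average user + assistant message pair
const INPUT_RATE = 0.000003;   // claude-3-5-sonnet input rate, USD per token

// Input cost of an n-turn conversation when each turn resends the full history:
// turn k carries ~k pairs of context, so total pairs grow as n(n+1)/2.
function conversationInputCost(turns) {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    total += k * TOKENS_PER_PAIR * INPUT_RATE;
  }
  return total;
}

console.log(conversationInputCost(10).toFixed(4)); // 10 turns
console.log(conversationInputCost(50).toFixed(4)); // 5x the turns, ~23x the cost
```

That nonlinearity is exactly what a per-request dashboard hides and a per-session tracker exposes.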
