I spent six weeks shipping the wrong thing.
I built PromptFuel because I was hemorrhaging money on API calls. Not because I was building at scale—I wasn't. I was building dumb. I'd write a prompt in isolation, test it once, ship it, then wonder why my OpenAI bill jumped $200. Turns out I was doing things like:
- Asking GPT-4 to write validation logic that Haiku could handle just fine
- Sending full context windows when 30% of it was redundant
- Retrying identical requests with slightly different temperatures instead of picking one and sticking with it
- Including examples in prompts that the model was already trained on
The real kicker? None of this was visible. I had no idea which requests were wasteful, which models were overkill for my tasks, or where I was throwing money away. I just had a credit card statement and regret.
So I built a tool to see what I was actually doing. And then I optimized it. Here's how.
## The Problem Nobody Talks About
Choosing the right model for a job isn't about capabilities. Claude 3 Haiku can validate JSON, classify text, and format output just as well as GPT-4o for most real work. The difference is cost: Haiku is roughly 10x cheaper per token.
But without visibility, you default to the expensive one. Because it's safe. Because you can't see the waste.
After I started measuring, I found:
- 35% of my requests didn't need GPT-4o. They were hitting it because it was the default, not because it was the right tool.
- 20% of my prompts had bloat. Instructions that contradicted each other, examples I copy-pasted but never used, context I included "just in case."
- 15% of requests were duplicates. Same input, same model, within minutes. If I'd cached or batched them, that spend would have been pure savings.
Total: 40% waste. $800 → $480. Not revolutionary, but real money for an indie project.
The fix wasn't rocket science. It was boring infrastructure: measure, analyze, optimize, repeat.
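The duplicate-request problem is the cheapest one to fix by hand. Here's a minimal sketch of an in-memory dedup cache, the kind of thing I mean by "boring infrastructure" — everything in it (names, TTL, shape) is my own illustration, not part of PromptFuel:

```typescript
// Minimal TTL cache: identical (model, prompt) pairs inside the
// window reuse the first response instead of billing a second call.
type CacheEntry = { value: string; expires: number };

class PromptCache {
  private store = new Map<string, CacheEntry>();
  constructor(private ttlMs: number = 5 * 60 * 1000) {}

  async getOrCall(
    model: string,
    prompt: string,
    call: () => Promise<string>,
  ): Promise<string> {
    const key = `${model}\u0000${prompt}`; // separator avoids key collisions
    const hit = this.store.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit: no API spend
    const value = await call();
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```

Ten lines of caching, and the 15% duplicate bucket goes to zero.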
## Step 1: See What You're Actually Doing
Install PromptFuel:
```
npm install -g promptfuel
```
That's it. No API keys, no auth, no bullshit. The tool runs locally.
Now run this on any prompt or code snippet:
```
pf optimize --input "Your prompt here"
```
Or point it at a file:
```
pf optimize --file my-prompt.txt
```
You get back:
- Token count — exactly what you'll be charged for
- Cost estimates — broken down by model (Haiku, Sonnet, GPT-4o, etc.)
- Optimization suggestions — what you can trim without losing meaning
- Model recommendations — which model actually makes sense for this task
Example output:

```
Current prompt: 412 tokens

Optimization suggestions:
- Remove redundant instruction (line 8)
- Simplify JSON schema example (saves 34 tokens)
- Collapse repeated context (saves 18 tokens)

Cost per call:
- GPT-4o:            $0.006   (❌ overpowered)
- Claude 3.5 Sonnet: $0.002   (✓ recommended)
- Claude 3 Haiku:    $0.0004  (✓ if you only need classification)

Estimated monthly (1000 calls):
- Current setup: $6.12
- Optimized:     $1.84
```
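The arithmetic behind estimates like these is nothing exotic: token count times a per-model rate, times call volume. A sketch — the prices below are illustrative assumptions, not PromptFuel's actual rate table, so check current provider pricing before trusting the numbers:

```typescript
// Illustrative per-1K-token prices (assumed; real pricing changes often
// and usually splits input vs output tokens).
const pricePer1kTokens: Record<string, number> = {
  "gpt-4o": 0.005,
  "claude-3.5-sonnet": 0.003,
  "claude-3-haiku": 0.00025,
};

// Cost of a single call: token count scaled by the per-1K rate.
function costPerCall(tokens: number, model: string): number {
  return (tokens / 1000) * pricePer1kTokens[model];
}

// Projected monthly spend for a fixed call volume.
function monthlyCost(tokens: number, model: string, calls: number): number {
  return costPerCall(tokens, model) * calls;
}
```

Run your own prompt sizes through math like this once and the model choice usually stops being a debate.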
That's the insight. That's what I was missing.
## Step 2: Understand Your Actual Costs
Open the dashboard:
```
pf dashboard
```
Your default browser opens to a local dashboard showing:
- All your recent prompts and their token counts
- Cost distribution — which requests ate the most budget
- Model usage — are you using the expensive ones too much?
- Optimization opportunities — ranked by potential savings
The dashboard doesn't need your API keys. It's analyzing local data. But it will tell you which of your shipped prompts are costing way more than they should.
Spend 10 minutes here. You'll probably find something you didn't realize you were doing.
## Step 3: Integrate into Your Stack
Once you see the waste, you'll want to catch it earlier. That's where the SDK and MCP server come in.
### Option A: JavaScript SDK (for Next.js, Node apps)
```
npm install @promptfuel/sdk
```

```javascript
import { PromptOptimizer } from '@promptfuel/sdk';

const optimizer = new PromptOptimizer();

const prompt = `You are a helpful assistant...
Classify the following text into categories...
[20 more lines of context you don't actually need]`;

const analysis = await optimizer.analyze(prompt);

console.log(`This prompt costs $${analysis.costPerCall.gpt4o}`);
console.log(`Optimized version: $${analysis.optimized.costPerCall.gpt4o}`);

// Actually use the optimized version
const optimizedPrompt = analysis.optimized.text;
```
### Option B: Claude Code MCP Server (for use in Claude directly)
If you're like me and you use Claude for a lot of your thinking, add the PromptFuel MCP server to your Claude Code settings. Then ask Claude directly:
```
@promptfuel optimize my prompt for cost
[paste your prompt]
```
Claude runs it through PromptFuel's analysis and tells you exactly where you're bleeding money. Then it generates an optimized version.
Both approaches catch waste before it ships.
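One pattern that falls naturally out of "catch it before it ships" is a budget gate in CI. This is a sketch of the idea, not part of any SDK — I'm declaring the analysis shape locally (assumed from the snippet above) so the gate logic stands on its own:

```typescript
// Analysis shape assumed from the SDK example; declared locally so
// this gate compiles without the package installed.
interface Analysis {
  tokenCount: number;
  costPerCall: { gpt4o: number };
}

// Returns human-readable violations; an empty list means the prompt ships.
function budgetViolations(
  analysis: Analysis,
  maxTokens: number,
  maxCostUsd: number,
): string[] {
  const problems: string[] = [];
  if (analysis.tokenCount > maxTokens) {
    problems.push(`token count ${analysis.tokenCount} exceeds budget ${maxTokens}`);
  }
  if (analysis.costPerCall.gpt4o > maxCostUsd) {
    problems.push(`per-call cost $${analysis.costPerCall.gpt4o} exceeds $${maxCostUsd}`);
  }
  return problems;
}
```

Wire something like this into your test suite and a bloated prompt fails the build instead of failing your credit card.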
## What Happened Next
After I actually measured and optimized my stuff, here's what I learned:
- You don't need the expensive model as often as you think. Most of my classification, formatting, and even some reasoning tasks work fine on Haiku.
- Prompt bloat is real. Every instruction that contradicts another one, every "just in case" example, every "let me explain the context" paragraph adds tokens and confusion.
- Token count scales weird. I thought I'd save 10%. I saved 40%. Because once you see the pattern, you fix it everywhere.
For me: $800 → $480/month. For you, it might be different. But it won't be zero.
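The first lesson — you don't need the expensive model as often as you think — can be made mechanical with a tiny routing table. This mapping is my own default, not a vendor recommendation; adjust the task names and models to your workload:

```typescript
// Naive task→model routing: narrow, well-specified work goes to the
// small model by default; only open-ended work gets the big one.
type Task = "classify" | "format" | "extract" | "reason";

function pickModel(task: Task): string {
  switch (task) {
    case "classify":
    case "format":
    case "extract":
      return "claude-3-haiku"; // cheap model handles mechanical tasks
    case "reason":
      return "gpt-4o"; // reserve the expensive model for open-ended work
    default:
      throw new Error(`unknown task: ${task}`);
  }
}
```

The point isn't the specific mapping — it's that model choice becomes an explicit, reviewable decision instead of a hard-coded default.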
## Getting Started (Right Now)
- Install: `npm install -g promptfuel`
- Optimize a single prompt: `pf optimize --file your-prompt.txt`
- Open the dashboard: `pf dashboard`
- If you like it, integrate the SDK or MCP server into your workflow
No commitment. No API keys. No upsell. Just a free tool that shows you where your money's going.
The tool exists because I was tired of guessing. If you are too, give it a try: https://promptfuel.vercel.app?utm_source=devto&utm_medium=social&utm_campaign=max