I spent six weeks shipping the wrong thing.
I built PromptFuel because I was hemorrhaging money on API calls. Not because I was building at scale—I wasn't. I was building dumb. I'd write a prompt in isolation, test it once, ship it, then wonder why my OpenAI bill jumped $200. Turns out I was doing things like:
- Asking GPT-4 to write validation logic that Haiku could handle just fine
- Sending full context windows when 30% of it was redundant
- Retrying identical requests with slightly different temperatures instead of picking one and sticking with it
- Including examples in prompts that the model was already trained on
The real kicker? None of this was visible. I had no idea which requests were wasteful, which models were overkill for my tasks, or where I was throwing money away. I just had a credit card statement and regret.
So I built a tool to see what I was actually doing. And then I optimized it. Here's how.
## The Problem Nobody Talks About
Choosing the right model for a job isn't about capabilities. Claude 3 Haiku can validate JSON, classify text, and format output just as well as GPT-4o for most real work. The difference is cost: Haiku is roughly 10x cheaper per token.
But without visibility, you default to the expensive one. Because it's safe. Because you can't see the waste.
After I started measuring, I found:
- 35% of my requests didn't need GPT-4o. They were hitting it because it was the default, not because it was the right tool.
- 20% of my prompts had bloat. Instructions that contradicted each other, examples I copy-pasted but never used, context I included "just in case."
- 15% of requests were duplicates. Same input, same model, within minutes. If I'd cached or batched them, that spend would have been pure savings.
Total: 40% waste. $800 → $480. Not revolutionary, but real money for an indie project.
The fix wasn't rocket science. It was boring infrastructure: measure, analyze, optimize, repeat.
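The duplicate-request problem is the cheapest one to fix by hand. Here's a minimal sketch of an in-memory dedup cache, the kind of thing I mean by "boring infrastructure" — everything in it (names, TTL, shape) is my own illustration, not part of PromptFuel:

```typescript
// Minimal TTL cache: identical (model, prompt) pairs inside the
// window reuse the first response instead of billing a second call.
type CacheEntry = { value: string; expires: number };

class PromptCache {
  private store = new Map<string, CacheEntry>();
  constructor(private ttlMs: number = 5 * 60 * 1000) {}

  async getOrCall(
    model: string,
    prompt: string,
    call: () => Promise<string>,
  ): Promise<string> {
    const key = `${model}\u0000${prompt}`; // separator avoids key collisions
    const hit = this.store.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit: no API spend
    const value = await call();
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```

Ten lines of caching, and the 15% duplicate bucket goes to zero.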
## Step 1: See What You're Actually Doing
Install PromptFuel:
```
npm install -g promptfuel
```
That's it. No API keys, no auth, no bullshit. The tool runs locally.
Now run this on any prompt or code snippet:
```
pf optimize --input "Your prompt here"
```
Or point it at a file:
```
pf optimize --file my-prompt.txt
```
You get back:
- Token count — exactly what you'll be charged for
- Cost estimates — broken down by model (Haiku, Sonnet, GPT-4o, etc.)
- Optimization suggestions — what you can trim without losing meaning
- Model recommendations — which model actually makes sense for this task
Example output:

```
Current prompt: 412 tokens

Optimization suggestions:
- Remove redundant instruction (line 8)
- Simplify JSON schema example (saves 34 tokens)
- Collapse repeated context (saves 18 tokens)

Cost per call:
- GPT-4o:            $0.006   (❌ overpowered)
- Claude 3.5 Sonnet: $0.002   (✓ recommended)
- Claude 3 Haiku:    $0.0004  (✓ if you only need classification)

Estimated monthly (1000 calls):
- Current setup: $6.12
- Optimized:     $1.84
```
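The arithmetic behind estimates like these is nothing exotic: token count times a per-model rate, times call volume. A sketch — the prices below are illustrative assumptions, not PromptFuel's actual rate table, so check current provider pricing before trusting the numbers:

```typescript
// Illustrative per-1K-token prices (assumed; real pricing changes often
// and usually splits input vs output tokens).
const pricePer1kTokens: Record<string, number> = {
  "gpt-4o": 0.005,
  "claude-3.5-sonnet": 0.003,
  "claude-3-haiku": 0.00025,
};

// Cost of a single call: token count scaled by the per-1K rate.
function costPerCall(tokens: number, model: string): number {
  return (tokens / 1000) * pricePer1kTokens[model];
}

// Projected monthly spend for a fixed call volume.
function monthlyCost(tokens: number, model: string, calls: number): number {
  return costPerCall(tokens, model) * calls;
}
```

Run your own prompt sizes through math like this once and the model choice usually stops being a debate.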
That's the insight. That's what I was missing.
## Step 2: Understand Your Actual Costs
Open the dashboard:
```
pf dashboard
```
Your default browser opens to a local dashboard showing:
- All your recent prompts and their token counts
- Cost distribution — which requests ate the most budget
- Model usage — are you using the expensive ones too much?
- Optimization opportunities — ranked by potential savings
The dashboard doesn't need your API keys. It's analyzing local data. But it will tell you which of your shipped prompts are costing way more than they should.
Spend 10 minutes here. You'll probably find something you didn't realize you were doing.
## Step 3: Integrate into Your Stack
Once you see the waste, you'll want to catch it earlier. That's where the SDK and MCP server come in.
### Option A: JavaScript SDK (for Next.js, Node apps)
```
npm install @promptfuel/sdk
```

```javascript
import { PromptOptimizer } from '@promptfuel/sdk';

const optimizer = new PromptOptimizer();

const prompt = `You are a helpful assistant...
Classify the following text into categories...
[20 more lines of context you don't actually need]`;

const analysis = await optimizer.analyze(prompt);

console.log(`This prompt costs $${analysis.costPerCall.gpt4o}`);
console.log(`Optimized version: $${analysis.optimized.costPerCall.gpt4o}`);

// Actually use the optimized version
const optimizedPrompt = analysis.optimized.text;
```
### Option B: Claude Code MCP Server (for use in Claude directly)
If you're like me and you use Claude for a lot of your thinking, add the PromptFuel MCP server to your Claude Code settings. Then ask Claude directly:
```
@promptfuel optimize my prompt for cost
[paste your prompt]
```
Claude runs it through PromptFuel's analysis and tells you exactly where you're bleeding money. Then it generates an optimized version.
Both approaches catch waste before it ships.
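One pattern that falls naturally out of "catch it before it ships" is a budget gate in CI. This is a sketch of the idea, not part of any SDK — I'm declaring the analysis shape locally (assumed from the snippet above) so the gate logic stands on its own:

```typescript
// Analysis shape assumed from the SDK example; declared locally so
// this gate compiles without the package installed.
interface Analysis {
  tokenCount: number;
  costPerCall: { gpt4o: number };
}

// Returns human-readable violations; an empty list means the prompt ships.
function budgetViolations(
  analysis: Analysis,
  maxTokens: number,
  maxCostUsd: number,
): string[] {
  const problems: string[] = [];
  if (analysis.tokenCount > maxTokens) {
    problems.push(`token count ${analysis.tokenCount} exceeds budget ${maxTokens}`);
  }
  if (analysis.costPerCall.gpt4o > maxCostUsd) {
    problems.push(`per-call cost $${analysis.costPerCall.gpt4o} exceeds $${maxCostUsd}`);
  }
  return problems;
}
```

Wire something like this into your test suite and a bloated prompt fails the build instead of failing your credit card.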
## What Happened Next
After I actually measured and optimized my stuff, here's what I learned:
- You don't need the expensive model as often as you think. Most of my classification, formatting, and even some reasoning tasks work fine on Haiku.
- Prompt bloat is real. Every instruction that contradicts another one, every "just in case" example, every "let me explain the context" paragraph adds tokens and confusion.
- Token count scales weird. I thought I'd save 10%. I saved 40%. Because once you see the pattern, you fix it everywhere.
For me: $800 → $480/month. For you, it might be different. But it won't be zero.
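The first lesson — you don't need the expensive model as often as you think — can be made mechanical with a tiny routing table. This mapping is my own default, not a vendor recommendation; adjust the task names and models to your workload:

```typescript
// Naive task→model routing: narrow, well-specified work goes to the
// small model by default; only open-ended work gets the big one.
type Task = "classify" | "format" | "extract" | "reason";

function pickModel(task: Task): string {
  switch (task) {
    case "classify":
    case "format":
    case "extract":
      return "claude-3-haiku"; // cheap model handles mechanical tasks
    case "reason":
      return "gpt-4o"; // reserve the expensive model for open-ended work
    default:
      throw new Error(`unknown task: ${task}`);
  }
}
```

The point isn't the specific mapping — it's that model choice becomes an explicit, reviewable decision instead of a hard-coded default.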
## Getting Started (Right Now)
- Install: `npm install -g promptfuel`
- Optimize a single prompt: `pf optimize --file your-prompt.txt`
- Open the dashboard: `pf dashboard`
- If you like it, integrate the SDK or MCP server into your workflow
No commitment. No API keys. No upsell. Just a free tool that shows you where your money's going.
The tool exists because I was tired of guessing. If you are too, give it a try: https://promptfuel.vercel.app?utm_source=devto&utm_medium=social&utm_campaign=max