Diallo West

Posted on May 8

How I Built an API That Cuts LLM Token Costs by 11-22%

#ai #openai #python #javascript

I've been building AI-powered tools for the past year, and one thing kept bugging me: I was wasting money on tokens.

Not because my prompts were bad — but because they were verbose. Every prompt I wrote had filler words, redundant phrases, and unnecessary politeness that inflated my token counts without improving the output.

So I built Fortress Token Optimizer — an API that compresses prompts before they reach the LLM. Same meaning, fewer tokens, lower cost.

The Problem

Look at a typical prompt:

Could you please help me analyze this sales data and provide detailed
insights and recommendations for improvement?

18 tokens. But the LLM doesn't need "Could you please help me" — that's 5 tokens of politeness that doesn't change the output.

After optimization:

Analyze this sales data and provide detailed insights and
recommendations for improvement?

14 tokens. 22% saved. The model produces the same quality response.

Real Benchmarks

I tested across 5 prompt styles (casual chatty, business, technical):

Prompt Type	Before	After	Savings
Casual chatty (cover letter request)	75 tokens	58 tokens	23%
Technical (debugging help)	100 tokens	92 tokens	8%
Learning request (ML resources)	90 tokens	81 tokens	10%
Business analysis	77 tokens	74 tokens	4%
Project planning	69 tokens	61 tokens	12%
Average	82 tokens	73 tokens	11%

The pattern: the chattier the prompt, the more savings. Casual prompts with filler like "basically", "I was wondering if", "um", "please help me" see 15-23% savings. Technical prompts that are already dense save less.

How It Works

Four optimization passes, server-side:

Phrase compression — removes filler ("Could you please help me" → removed)
Deduplication — "analyze the data and provide analysis" → "analyze the data"
Meta-removal — strips instructions-about-instructions
Sentence optimization — tightens phrasing without changing meaning

It's not a regex. The optimizer understands prompt structure — it won't strip a code block or remove meaningful qualifiers.

Usage

Three lines in Python or JavaScript:

pip install fortress-optimizer

from fortress_optimizer import FortressClient

client = FortressClient(api_key="fk_your_key")
result = client.optimize("Could you please help me analyze this data")

print(result["optimization"]["optimized_prompt"])
# → "Analyze this data"
print(f"{result['tokens']['savings_percentage']}% saved")
# → "22% saved"

npm install fortress-optimizer

const { FortressClient } = require('fortress-optimizer');
const client = new FortressClient(process.env.FORTRESS_API_KEY);
const result = await client.optimize('Your prompt here');

Also available as a VS Code extension that runs in the background.

What Does This Save At Scale?

At 500 prompts/day with balanced optimization (~11% savings):

Model	Monthly Savings	Annual Savings
GPT-4 ($0.03/1K)	$4.05	$48.60
Claude Opus ($0.015/1K)	$2.03	$24.30
GPT-4o ($0.005/1K)	$0.68	$8.10

For a team of 10 engineers at 500 prompts/day each, that's $486/year on GPT-4 — and it compounds as models get more expensive or usage grows.

The savings are modest for individual developers, but they add up for teams running batch processing, RAG pipelines, or high-volume applications.

Three Optimization Levels

Level	Savings	Use Case
Conservative	~5%	Production prompts, minimal changes
Balanced	~11-15%	General use (default)
Aggressive	~15-22%	Batch processing, cost-sensitive

Free to Try

50,000 tokens/month free, no credit card. Get a key and try it on your existing prompts.

I'd love feedback — especially if you're running high-volume LLM workloads where token costs are a real line item.

Links:

DEV Community