AgentAutopsy Team
How I Cut My AI API Costs by 70% (With Real Invoice Numbers)

───

I've been building an AI-powered IELTS speaking practice app. The core pipeline is straightforward: Whisper for speech-to-text → GPT-4o for evaluation and feedback → TTS for audio response.
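For anyone curious about the shape of that pipeline: it's three stages composed in sequence. A minimal sketch of the data flow — the stage functions here are hypothetical stand-ins, not my production code; in the real app each one wraps an API call (Whisper, GPT-4o, TTS respectively):

```python
from typing import Callable

def run_session(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # speech-to-text stage (Whisper)
    evaluate: Callable[[str], str],       # scoring + feedback stage (GPT-4o)
    synthesize: Callable[[str], bytes],   # audio response stage (TTS)
) -> bytes:
    """One practice session: learner audio in, spoken feedback out."""
    transcript = transcribe(audio)
    feedback = evaluate(transcript)
    return synthesize(feedback)

# Wiring with dummy stages just to show the flow:
audio_out = run_session(
    b"raw-audio",
    transcribe=lambda a: "I enjoy reading books.",
    evaluate=lambda t: f"Feedback on: {t}",
    synthesize=lambda f: f.encode(),
)
print(audio_out)  # b'Feedback on: I enjoy reading books.'
```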

Before launch, I ran the numbers and nearly killed the project on the spot. At 100 daily active users doing 3 sessions each, API costs alone would eat ¥3,000+/month (~$420). No revenue yet. Just burning cash.
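For context, here's the back-of-envelope version of that math for the GPT-4o stage alone (the per-session token counts are illustrative assumptions, not measured figures — Whisper and TTS come on top of this):

```python
# Rough pre-launch cost estimate for the LLM stage.
# Per-session token counts below are assumptions for illustration.
DAU = 100
SESSIONS_PER_DAY = 3
DAYS = 30

INPUT_TOKENS_PER_SESSION = 1_200    # prompt + transcript (assumed)
OUTPUT_TOKENS_PER_SESSION = 600     # feedback text (assumed)
GPT4O_INPUT_PER_M = 5.00            # USD per 1M input tokens (official)
GPT4O_OUTPUT_PER_M = 15.00          # USD per 1M output tokens (official)

sessions = DAU * SESSIONS_PER_DAY * DAYS
cost = sessions * (
    INPUT_TOKENS_PER_SESSION / 1e6 * GPT4O_INPUT_PER_M
    + OUTPUT_TOKENS_PER_SESSION / 1e6 * GPT4O_OUTPUT_PER_M
)
print(f"{sessions} sessions/month ≈ ${cost:.0f}/month (LLM stage alone)")
```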

Then I found SubRouter. Here's what happened after a few months of real usage.

The Actual Numbers

No cherry-picking. These are my real account stats:

| Metric | Value |
| --- | --- |
| Total spent (SubRouter) | ¥3,538.78 (~$490) |
| Total API requests | 6,447 |
| Total tokens consumed | 7.15 million+ |
| Equivalent cost at official pricing | ¥8,000–¥15,000 ($1,100–$2,070) |
| Savings | ¥4,461–¥11,461 (~56–76%) |

What Is SubRouter?

SubRouter is an API gateway that proxies requests to major AI models — OpenAI, Anthropic, Google — at 60–75% below official pricing. The interface is 100% OpenAI-compatible, so any SDK that supports a custom base_url works without modification.

Important: these are full-power, unmodified models. No downgraded or distilled versions. Same models, same capabilities, same responses — just cheaper.

Current pricing comparison (per million tokens):

| Model | Official Input Price | SubRouter | Savings |
| --- | --- | --- | --- |
| GPT-4o | $5.00 | ~$0.69 | 86% |
| Claude Sonnet 4 | $3.00 | ~$0.99 | 67% |
| Claude Opus 4 | $15.00 | ~$3.97 | 74% |
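The savings column is just (official − SubRouter) / official; a quick sanity check of the table's numbers:

```python
def savings_pct(official: float, discounted: float) -> int:
    """Percentage saved versus official per-token pricing, rounded."""
    return round((official - discounted) / official * 100)

print(savings_pct(5.00, 0.69))   # GPT-4o -> 86
print(savings_pct(3.00, 0.99))   # Claude Sonnet 4 -> 67
print(savings_pct(15.00, 3.97))  # Claude Opus 4 -> 74
```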

Migration: The Actual Steps

Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-subrouter-your-key",
    base_url="https://api.subrouter.ai/v1",
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Evaluate my speaking sample."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Environment Variables (Recommended)

```bash
# .env
OPENAI_API_KEY=sk-subrouter-your-key
OPENAI_BASE_URL=https://api.subrouter.ai/v1
```

```python
# Zero changes to your code — the SDK reads both variables from the environment
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

curl (Quick Test)

```bash
curl https://api.subrouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-subrouter-your-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Node.js

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SUBROUTER_API_KEY,
  baseURL: 'https://api.subrouter.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```

What I've Noticed After a Few Months

Latency: Negligible difference. I run a streaming TTS pipeline where perceived latency matters — users can't tell the difference.
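Streaming works unchanged through the proxy because the wire format is OpenAI's. If you handle the stream manually instead of through the SDK, each chunk arrives as a `data:` line carrying a JSON delta; a small parser sketch (the sample line below is illustrative, but it matches the chat-completions SSE shape):

```python
import json

def delta_text(sse_line: str) -> str:
    """Extract the content delta from one chat-completions SSE line."""
    line = sse_line.strip()
    if not line.startswith("data:"):
        return ""
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":          # end-of-stream sentinel
        return ""
    chunk = json.loads(payload)
    # Finish chunks carry an empty delta, hence the `or ""` fallback.
    return chunk["choices"][0]["delta"].get("content") or ""

sample = 'data: {"choices":[{"delta":{"content":"Hel"}}]}'
print(delta_text(sample))           # Hel
print(delta_text("data: [DONE]"))   # (empty string)
```

In practice you'd just pass `stream=True` to the SDK and iterate the response; the parser above is only for rolling your own HTTP client.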

Reliability: No spikes in error rates. My pipeline has retries built in, but I've rarely needed them.
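The retry layer I mean is nothing exotic — a minimal exponential-backoff wrapper in the same spirit (the names and retry counts here are my own choices for illustration, not anything SubRouter-specific):

```python
import time
import random

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo with a function that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```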

Model availability: GPT-4o, GPT-5, Claude Sonnet 4, Claude Opus 4, Gemini — all available. Full-power versions, not distilled.

Billing: Per-token, same structure as official. No subscriptions, no mystery charges.

When This Makes Sense

✅ Indie developers and side projects — most impactful when you're self-funding
✅ MVPs and pre-revenue products — cut burn rate before you have users
✅ High-volume AI agents and automation — savings compound fast
✅ Multi-model prototyping — one account, all models

⚠️ Enterprise with strict data compliance — review their data processing terms first
⚠️ Latency-critical production systems — benchmark your specific use case

Bottom Line

7.15 million tokens. ¥3,538 paid. ¥8,000–¥15,000 would have gone to official APIs.

That's a $600–$1,500 difference that went into server costs, marketing, and actually shipping features instead of burning into API overhead.

Register here (free credits on signup): https://subrouter.ai/register?aff=IdWY

───

Building an AI IELTS speaking coach — happy to discuss the Whisper+GPT-4o+TTS pipeline architecture if anyone's working on something similar.
