DEV Community

Archit Mittal
Archit Mittal

Posted on • Originally published at architmittal.com

How I Cut a Client's AI API Bill from ₹85K to ₹12K/Month — Without Losing Quality

₹85,000 per month. That was the AI API bill sitting in my client's inbox when they called me in a mild panic last quarter. They run a mid-sized e-commerce operation in Pune — about 4,000 orders a day — and had integrated AI into customer support, product descriptions, and internal reporting. The AI was working beautifully. The invoice was not.

"Archit bhai, AI toh kaam kar raha hai, lekin cost control se bahar ja raha hai." (The AI is working, but the costs are spiralling out of control.)

Three weeks later, their monthly bill was ₹12,400. Same tasks. Same quality. No corners cut. Here's exactly what changed.

The Real Problem: Every Task Was Using the Most Expensive Model

When I audited their setup, the issue was obvious within five minutes. Every single API call — whether it was classifying a customer complaint into one of 8 categories or generating a 2,000-word product description — was hitting the same premium model. It's the most common mistake I see with businesses adopting AI: they pick one model during the proof-of-concept phase and never revisit that decision as they scale.

Think of it this way. You wouldn't hire a senior chartered accountant to do data entry. But that's essentially what was happening — a top-tier reasoning model was being used to answer "Is this complaint about shipping or billing?"

Fix #1: Model Routing — The Single Biggest Cost Lever

Model routing is the practice of sending each task to the cheapest model that can handle it at acceptable quality. I categorized their ~47 distinct API call types into three tiers.

Tier Task Types Model Class Cost Impact
Simple Classification, extraction, yes/no decisions, formatting Lightweight (Haiku-class) ~90% cheaper per call
Medium Summarization, short-form content, template-based responses Mid-tier (Sonnet-class) ~50% cheaper per call
Complex Long product descriptions, nuanced customer responses, analytics reports Premium (Opus-class) Full price — but only 12% of calls

The result? 68% of their API calls moved to the lightweight tier, 20% to mid-tier, and only 12% stayed on premium. That single change dropped the bill from ₹85K to roughly ₹38K. No quality degradation — we ran A/B tests on customer satisfaction scores for two weeks before fully switching.

Fix #2: Prompt Caching — Stop Paying for the Same Context Twice

Their customer support bot sent the same 1,200-token system prompt with every single API call. That's company policies, tone guidelines, product catalog context — all identical across thousands of daily calls. Every call was paying full input token pricing for information the model had already processed moments ago.

Prompt caching solves this. The first call processes the full system prompt, and subsequent calls within the cache window reference it at a fraction of the cost. For their volume — around 6,000 support interactions per day — this alone saved ₹8,000-10,000 monthly.

Quick math: 6,000 calls/day x 1,200 tokens x 30 days = 216 million input tokens/month just on repeated system prompts. At standard pricing, that's a significant chunk of the bill that prompt caching nearly eliminates.

Fix #3: Batching Non-Urgent Requests

Not everything needs a real-time response. Their internal reporting pipeline — daily sales summaries, inventory alerts, marketing performance digests — was making individual API calls as each data point came in. Sixty to eighty calls that could easily be batched into three or four.

We restructured their reporting to collect data throughout the day and process it in batch windows — once at 6 AM, once at 2 PM, once at 10 PM. Batch API pricing is typically 50% cheaper than real-time, and for internal reports, a few hours of delay is completely acceptable.

"Pehle har choti cheez ke liye alag call jaati thi. Ab ek baar mein sab ho jaata hai." (Earlier, every small thing triggered a separate call. Now everything processes at once.)

Fix #4: Output Token Discipline

This one is subtle but adds up fast. Their product description prompts asked the model to "write a detailed, comprehensive product description." The model happily obliged — averaging 800-1,000 tokens per response when the actual requirement was 200-300 tokens for their product cards.

We rewrote prompts with explicit length constraints and structured output formats. Instead of open-ended generation, the model received exact specifications: "Write a product description in exactly 3 sentences. First sentence: what it is. Second: key benefit. Third: who it's for."

Output tokens are more expensive than input tokens on most providers. Cutting average output length by 60% across thousands of daily calls compounded into real savings.

The Final Numbers

Optimization Monthly Savings
Model routing ₹47,000
Prompt caching ₹9,500
Request batching ₹8,200
Output token discipline ₹7,900
Total monthly bill ₹12,400 (down from ₹85,000)

That's an 85% reduction. The AI does exactly the same work. The customer satisfaction score actually went up by 3% — likely because the lighter models respond faster, and customers prefer quicker replies over marginally more eloquent ones.

What Most People Get Wrong About AI Costs

The instinct is to shop for a cheaper provider. "Should I switch from OpenAI to Claude? From Claude to an open-source model?" Sometimes that helps, but the real leverage is architectural. I've seen businesses switch providers three times and still overpay because the fundamental pattern — one model for everything, no caching, verbose outputs — never changes.

If your AI API bill is higher than you'd like, start with these questions: How many of your API calls actually need a premium model? Are you sending the same context repeatedly? Can any of your calls be batched? Are your prompts asking for more output than you use?

The answers usually reveal that 60-80% of your bill is waste hiding in plain sight. You don't need to spend less on AI. You need to spend smarter.


FAQ

How much do AI API calls typically cost for a small business in India?
Most small businesses using AI APIs spend between ₹15,000 and ₹1,00,000 per month depending on volume and model choice. The biggest cost driver is using premium models for tasks that cheaper models handle equally well.

What is model routing and how does it reduce AI costs?
Model routing means sending each task to the cheapest AI model that can handle it well. Simple classification tasks go to lightweight models, while complex reasoning goes to premium models. This alone can cut costs by 40-60%.

Can prompt caching really save money on AI API bills?
Yes. If your application sends similar system prompts or context repeatedly, prompt caching avoids reprocessing those tokens. Depending on your use case, this can reduce input token costs by 50-90% on repeated calls.

Is it worth switching AI providers just to save costs?
Not always. The real savings come from architectural changes — model routing, caching, batching, and output token optimization — not from switching providers. That said, comparing pricing across providers for your specific use case is always smart.


Archit Mittal helps businesses automate chaos. Follow on LinkedIn: @automate-archit

Related reads:

Top comments (0)