Abid niazi

How I Handle 150K Free API Calls/Month with a 5-Key Rotation System in Next.js

Free-tier API limits are tight: Gemini's allows 15 requests/minute per key. That's fine for a side project, until your tools start getting real traffic.

I built ToolForge, a free toolkit with 61 tools including 4 AI-powered features (summarizer, grammar checker, paraphraser, essay writer). All powered by Google Gemini's free tier.

The challenge: 15 req/min × 1 key = dead tool after 15 requests in the same minute.

The solution: rotate across 5 API keys with automatic failover.

The Problem

Gemini's free tier is generous — 15 requests per minute, 1,500 per day per key. But a single key means:

  • Request #16 in any given minute gets a 429 error
  • Your "free AI tool" stops working during peak hours
  • Users lose trust and never come back

I needed a system that:

  1. Distributes requests across multiple keys
  2. Automatically skips rate-limited keys
  3. Never shows a 429 to the end user
  4. Works without any external state management

The Solution: Round-Robin with Failover

// src/lib/gemini-keys.ts (simplified)

const API_KEYS = [
  process.env.GEMINI_API_KEY_1,
  process.env.GEMINI_API_KEY_2,
  process.env.GEMINI_API_KEY_3,
  process.env.GEMINI_API_KEY_4,
  process.env.GEMINI_API_KEY_5,
].filter(Boolean) as string[];

let currentKeyIndex = 0;

export async function fetchWithKeyRotation(
  prompt: string,
  model: string = 'gemini-2.5-flash-lite'
) {
  const maxAttempts = API_KEYS.length;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const key = API_KEYS[currentKeyIndex];

    try {
      const response = await fetch(
        `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'x-goog-api-key': key,
          },
          body: JSON.stringify({
            contents: [{ parts: [{ text: prompt }] }],
          }),
        }
      );

      if (response.status === 429) {
        // Rate limited — rotate to next key
        currentKeyIndex = (currentKeyIndex + 1) % API_KEYS.length;
        continue;
      }

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      // Success — rotate for next request (spread load)
      currentKeyIndex = (currentKeyIndex + 1) % API_KEYS.length;

      const data = await response.json();
      return data.candidates?.[0]?.content?.parts?.[0]?.text || '';

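    } catch (error) {
      // Network failure or non-429 API error: move on to the next key,
      // and rethrow once every key has been tried.
      currentKeyIndex = (currentKeyIndex + 1) % API_KEYS.length;
      if (attempt === maxAttempts - 1) throw error;
    }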
  }

  throw new Error('All API keys exhausted');
}

How It Works

  1. Round-robin by default: Each request uses the next key in sequence. This spreads load evenly — key 1 handles request 1, key 2 handles request 2, etc.

  2. Automatic failover on 429: If a key is rate-limited, the loop immediately tries the next key. No delay, no retry timer.

  3. Wraps around: After key 5, it goes back to key 1. The modulo operator handles this.

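  4. No external state: currentKeyIndex lives in server memory. On a single warm serverless instance on Vercel, this works well. Each additional instance keeps its own counter, and a cold start resets it to 0, so the rotation is only approximately even across instances. For strict coordination you'd want shared state such as Redis, but for our scale, this is enough.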
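
One caveat with the shared counter: when several requests are in flight at once, each one advances currentKeyIndex while the others are mid-retry, so a request can skip a key or try the same one twice. Here is a minimal sketch of one way around that, assuming each request claims a starting index up front and then walks the keys in its own local order. nextStartIndex and keyOrder are hypothetical helpers, not part of the original module.

```typescript
// Hypothetical helpers (not in the original gemini-keys.ts).

// Shared counter: advanced exactly once per incoming request.
let nextIndex = 0;

// Claim a starting position for this request. The read and the write happen
// in one synchronous step, so concurrent requests on the same instance
// cannot claim the same slot.
export function nextStartIndex(keyCount: number): number {
  if (keyCount <= 0) throw new Error('No API keys configured');
  const start = nextIndex % keyCount;
  nextIndex = (start + 1) % keyCount;
  return start;
}

// The order in which one request should try the keys: begin at `start`,
// wrap around, and visit every key exactly once.
export function keyOrder<T>(keys: readonly T[], start: number): T[] {
  return keys.map((_, i) => keys[(start + i) % keys.length]);
}

// Example: three requests arriving back to back, with 3 placeholder keys.
const demoKeys = ['key-A', 'key-B', 'key-C'];
const orders = [0, 1, 2].map(() =>
  keyOrder(demoKeys, nextStartIndex(demoKeys.length))
);
console.log(orders);
// [ ['key-A','key-B','key-C'], ['key-B','key-C','key-A'], ['key-C','key-A','key-B'] ]
```

Inside fetchWithKeyRotation, the loop would then iterate over keyOrder(API_KEYS, nextStartIndex(API_KEYS.length)) instead of re-reading currentKeyIndex on every attempt.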

The Math

Keys      Requests/Min   Requests/Day   Requests/Month
1 key     15             1,500          45,000
3 keys    45             4,500          135,000
5 keys    75             7,500          225,000

With 5 keys, the ceiling is 75 requests per minute and 7,500 per day, or 225,000 over a 30-day month, all on the free tier.
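
The table is plain multiplication, sketched below so the numbers are easy to rerun for a different key count. The per-key limits are the ones quoted in this post; the 30-day month is an assumption, and freeTierCapacity is a hypothetical helper.

```typescript
// Per-key limits as quoted in this post (check the current Gemini docs
// before relying on them).
const PER_KEY_PER_MINUTE = 15;
const PER_KEY_PER_DAY = 1_500;
const DAYS_PER_MONTH = 30; // assumption: a 30-day month

// Theoretical ceilings for a pool of `keyCount` free-tier keys.
function freeTierCapacity(keyCount: number) {
  return {
    perMinute: keyCount * PER_KEY_PER_MINUTE,
    perDay: keyCount * PER_KEY_PER_DAY,
    perMonth: keyCount * PER_KEY_PER_DAY * DAYS_PER_MONTH,
  };
}

console.log(freeTierCapacity(5));
// { perMinute: 75, perDay: 7500, perMonth: 225000 }
```

The daily cap is the binding one: 75 requests per minute sustained for a full day would be 108,000 requests, far more than the 7,500 the daily quota allows.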

Using It in API Routes

Every AI tool calls the same function:

// src/app/api/summarize/route.ts
import { fetchWithKeyRotation } from '@/lib/gemini-keys';

export async function POST(req: Request) {
  const { text } = await req.json();

  const prompt = `Summarize the following text in 3-5 sentences:\n\n${text}`;
  const summary = await fetchWithKeyRotation(prompt);

  return Response.json({ summary });
}

The AI Summarizer, Grammar Checker, Paraphraser, and Essay Writer all use the exact same fetchWithKeyRotation() function. Adding a new AI tool takes 10 minutes — just create a new API route with a different prompt.
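
The route above trusts its input and lets any thrown error surface as a generic 500. A more defensive version might look like the sketch below. The 10,000-character cap, the error messages, and the makeSummarizeHandler factory are assumptions for illustration; the factory takes the generation function as a parameter only so the sketch stays self-contained.

```typescript
type Generate = (prompt: string) => Promise<string>;

const MAX_INPUT_CHARS = 10_000; // assumed cap, tune to your prompt budget

// Returns a null error when the body is usable, otherwise a message for the client.
function validateText(
  body: unknown
): { text: string; error: null } | { text: null; error: string } {
  const text = (body as { text?: unknown } | null)?.text;
  if (typeof text !== 'string' || text.trim() === '') {
    return { text: null, error: 'Field "text" must be a non-empty string.' };
  }
  if (text.length > MAX_INPUT_CHARS) {
    return { text: null, error: `Text is limited to ${MAX_INPUT_CHARS} characters.` };
  }
  return { text, error: null };
}

function makeSummarizeHandler(generate: Generate) {
  return async function POST(req: Request): Promise<Response> {
    // Malformed JSON becomes null and is rejected by validateText.
    const body = await req.json().catch(() => null);
    const { text, error } = validateText(body);
    if (error !== null) return Response.json({ error }, { status: 400 });

    try {
      const summary = await generate(
        `Summarize the following text in 3-5 sentences:\n\n${text}`
      );
      return Response.json({ summary });
    } catch {
      // Every key failed: answer with 503 rather than leaking the upstream error.
      return Response.json(
        { error: 'The summarizer is busy. Please try again shortly.' },
        { status: 503 }
      );
    }
  };
}
```

In src/app/api/summarize/route.ts this could be wired up as export const POST = makeSummarizeHandler(fetchWithKeyRotation);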

Rate Limiting Users (Not Just Keys)

Keys handle API-side limits. But we also need to prevent individual users from draining all keys:

// Simple IP-based rate limit
const rateLimitMap = new Map<string, number[]>();

function isRateLimited(ip: string, maxRequests = 5, windowMs = 60000) {
  const now = Date.now();
  const timestamps = rateLimitMap.get(ip) || [];
  const recent = timestamps.filter(t => now - t < windowMs);

  if (recent.length >= maxRequests) return true;

  recent.push(now);
  rateLimitMap.set(ip, recent);
  return false;
}

This gives each IP address 5 requests per minute: enough for genuine use, not enough to abuse.
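
Two details the snippet leaves open are where the ip value comes from and what happens to entries for visitors who never return (they stay in the Map indefinitely). A sketch of both, assuming the app sits behind a proxy that sets the x-forwarded-for header. getClientIp and pruneStale are hypothetical helpers, not code from ToolForge.

```typescript
// Hypothetical helpers to pair with isRateLimited.

// Anything with a Headers-like get() works here, including req.headers.
type HeaderSource = { get(name: string): string | null };

// Take the first address in x-forwarded-for, which is the original client
// when the header is set by a trusted proxy. Falls back to a shared bucket.
function getClientIp(headers: HeaderSource): string {
  const forwarded = headers.get('x-forwarded-for');
  const first = forwarded?.split(',')[0]?.trim();
  return first ? first : 'unknown';
}

// Drop IPs whose newest timestamp is older than the window, so the Map
// does not grow without bound. Returns how many entries were removed.
function pruneStale(
  map: Map<string, number[]>,
  now: number,
  windowMs = 60_000
): number {
  let removed = 0;
  map.forEach((timestamps, ip) => {
    const newest = timestamps[timestamps.length - 1] ?? 0;
    if (now - newest >= windowMs) {
      map.delete(ip);
      removed++;
    }
  });
  return removed;
}
```

A route would call isRateLimited(getClientIp(req.headers)) before touching any key and return a 429 of its own if it comes back true; pruneStale(rateLimitMap, Date.now()) can run every so often from the same code path. Like currentKeyIndex, the Map lives in one instance's memory, so the limit is enforced per instance rather than globally, and x-forwarded-for is only trustworthy when a proxy you control sets it.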

Lessons Learned

  1. Free tiers are viable at scale if you architect around their limits. Don't default to paid APIs until you've exhausted creative solutions.

  2. Round-robin beats random selection. Random can hit the same key several times in a row; round-robin cycles through every key before repeating one.

  3. Always rotate after success too, not just on failure. Otherwise the first key handles nearly all the traffic while the others sit idle until it gets rate-limited.

  4. Monitor key usage. We log which key handled each request. This tells us if one key is consistently rate-limited (meaning traffic exceeded capacity).
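
For the monitoring in point 4, a minimal sketch of per-key counters follows. The post does not show its logging code, so recordKeyResult, rateLimitedShare, and the stats shape are assumptions; keys are identified by index so the secrets never reach the logs.

```typescript
// Hypothetical per-key counters (the original logging code is not shown).
type KeyStats = { ok: number; rateLimited: number; failed: number };

const keyStats = new Map<number, KeyStats>();

// Record the outcome of one upstream call for the key at `keyIndex`.
function recordKeyResult(keyIndex: number, httpStatus: number): KeyStats {
  const stats = keyStats.get(keyIndex) ?? { ok: 0, rateLimited: 0, failed: 0 };
  if (httpStatus === 429) stats.rateLimited++;
  else if (httpStatus >= 200 && httpStatus < 300) stats.ok++;
  else stats.failed++;
  keyStats.set(keyIndex, stats);
  return stats;
}

// Share of calls on this key that came back 429; 0 if the key is unused.
function rateLimitedShare(keyIndex: number): number {
  const s = keyStats.get(keyIndex);
  if (!s) return 0;
  const total = s.ok + s.rateLimited + s.failed;
  return total === 0 ? 0 : s.rateLimited / total;
}

recordKeyResult(0, 200);
recordKeyResult(0, 429);
console.log(rateLimitedShare(0)); // 0.5
```

A share that stays high on every key at once suggests traffic has outgrown the pool; a high share on a single key points at that key specifically.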

Try It

All 4 AI tools are live and free at freetoolforge.org.

Each tool handles hundreds of daily users without a single paid API call.


Built by Abid Niazi — Full Stack Developer, Pakistan
ToolForge: freetoolforge.org — 61 free tools, no ads, no signup
