
Vhub Systems

How to Handle API Rate Limits Before They Break Your Production Integration

You didn't hit an API rate limit because you missed the docs — you hit it because the free tier worked perfectly in testing, and nobody told you about the burst limit you'd blow through on your first real batch.

The Problem

The integration looked perfect. You built it against the sandbox, validated every edge case, and shipped it to production. Then real users arrived.

Within hours, your logs filled with 429 errors. Jobs failed silently. User-facing features went dark. Your support inbox started filling up while you were asleep.

This is the exact moment thousands of developers describe in the same words: "I built the whole thing against the free tier and it worked perfectly — then in production with real data it started throwing 429s everywhere and I had no retry logic."

The free tier worked. The paid tier broke. That's not a coincidence.

Why It Happens

Most API documentation shows a headline rate limit — calls per minute, requests per day. What it doesn't show prominently:

  • Burst limits vs. sustained limits. You can make requests in a burst window, but sustained throughput is lower. Free tier testing never surfaces this because your test data volume doesn't trigger burst windows.
  • Per-endpoint limits. The global limit applies to one endpoint family; a specific search or write endpoint has a separate, lower cap. You discover this only after fixing the global limit.
  • Per-user or per-key limits on paid tiers. Your paid account has a pool limit, but individual API keys within that account can have their own sub-limits.

None of this appears in the summary table you read during development. It's scattered across SDK changelogs, community forum threads, and footnotes you didn't find because your free-tier integration never needed them.

The second failure mode is the absence of graceful degradation. If your integration doesn't handle 429 responses, it doesn't queue, back off, or fall back — it just fails and moves on. "My integration just fails silently when it hits the rate limit — there's no fallback, no queue, no retry — it just drops the job."

The Defense Pattern

Three layers protect you from production rate limit failures:

1. Exponential Backoff with Jitter

When you receive a 429, don't retry immediately and don't sleep for a fixed interval. Use exponential backoff:

async function withBackoff(fn, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and give up after the final attempt.
      if (err.status !== 429 || attempt === maxRetries - 1) throw err;

      // Honor the server's Retry-After header when present (this assumes a
      // value in seconds; the header may also be an HTTP date). Otherwise use
      // exponential backoff with jitter, capped at 60 seconds.
      const retryAfter = err.headers?.['retry-after'];
      const delay = retryAfter
        ? parseInt(retryAfter, 10) * 1000
        : Math.min(1000 * 2 ** attempt + Math.random() * 500, 60000);

      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

The jitter prevents thundering herd — when all your queued jobs retry at the same second after a rate limit window resets.
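To see the helper in action, here's a self-contained simulation: a fake endpoint that returns 429 twice before succeeding. The delays are shortened so the demo runs quickly, and `flakyEndpoint` is illustrative, not a real API client:

```javascript
// Same retry logic as above, with demo-scale delays.
async function withBackoff(fn, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt === maxRetries - 1) throw err;
      const delay = Math.min(10 * 2 ** attempt + Math.random() * 5, 100);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

// Fake endpoint: rate-limited on the first two calls, succeeds on the third.
let attempts = 0;
async function flakyEndpoint() {
  attempts += 1;
  if (attempts < 3) {
    const err = new Error('Too Many Requests');
    err.status = 429;
    throw err;
  }
  return { ok: true };
}

const result = await withBackoff(flakyEndpoint);
console.log(`succeeded after ${attempts} attempts`); // succeeded after 3 attempts
```

The first two 429s are absorbed by the loop; the caller only ever sees the successful response.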

2. Queue-Based Request Throttling

A retry loop handles individual failures. A request queue handles sustained throughput:

import PQueue from 'p-queue';

const queue = new PQueue({
  interval: 60000,   // one-minute window
  intervalCap: 50,   // adjust to your API's sustained limit
  carryoverConcurrencyCount: true, // carry unfinished jobs into the next interval's cap
});

const result = await queue.add(() => apiClient.getData(params));

If your integration hits rate limits, adjust intervalCap downward until 429s stop. The queue absorbs bursty request patterns without losing jobs.
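If you're curious what the queue is doing under the hood, here's a stripped-down sketch of the same interval-cap idea. This toy version is for illustration only — p-queue handles concurrency, priorities, and edge cases it ignores:

```javascript
// Toy interval-capped throttle: at most `cap` tasks start per `interval` ms.
function makeThrottle(cap, interval) {
  let windowStart = Date.now();
  let startedThisWindow = 0;
  const pending = [];

  function drain() {
    const now = Date.now();
    if (now - windowStart >= interval) {
      windowStart = now; // new window: reset the counter
      startedThisWindow = 0;
    }
    while (pending.length > 0 && startedThisWindow < cap) {
      startedThisWindow += 1;
      const { fn, resolve, reject } = pending.shift();
      fn().then(resolve, reject);
    }
    if (pending.length > 0) {
      // Window is full: try again when it resets.
      setTimeout(drain, windowStart + interval - now);
    }
  }

  return (fn) =>
    new Promise((resolve, reject) => {
      pending.push({ fn, resolve, reject });
      drain();
    });
}

const throttle = makeThrottle(2, 50); // 2 tasks per 50 ms window
const startTimes = [];
await Promise.all(
  [0, 1, 2, 3].map(() => throttle(async () => startTimes.push(Date.now())))
);
```

Running this, the first two tasks start immediately and the other two wait for the next 50 ms window — excess requests are delayed, never dropped.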

3. Cached Fallback Responses

For read-heavy integrations, cache the last successful response. If retries are exhausted, return stale data rather than an error — stale data is better than a broken feature for most read operations.
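A minimal sketch of that fallback, assuming retries have already been exhausted by the time the error surfaces. The function name, cache shape, and demo endpoint are illustrative:

```javascript
// In-memory last-good cache: serve stale data when the API is rate limited.
const cache = new Map();

async function getWithFallback(key, fetchFresh) {
  try {
    const fresh = await fetchFresh();
    cache.set(key, fresh); // remember the last successful response
    return { data: fresh, stale: false };
  } catch (err) {
    if (err.status === 429 && cache.has(key)) {
      // Rate limited, but we have a previous answer: degrade gracefully.
      return { data: cache.get(key), stale: true };
    }
    throw err; // no cached copy, or a different error: surface it
  }
}

// Demo: first call succeeds and seeds the cache; second call gets a 429.
let calls = 0;
async function rateLimitedAfterFirst() {
  calls += 1;
  if (calls > 1) {
    const err = new Error('Too Many Requests');
    err.status = 429;
    throw err;
  }
  return { users: 3 };
}

const first = await getWithFallback('users', rateLimitedAfterFirst);
const second = await getWithFallback('users', rateLimitedAfterFirst);
console.log(first.stale, second.stale); // false true
```

Surfacing the `stale` flag lets the UI label the data as possibly outdated instead of showing an error state.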

Use Apify to Audit Rate Limits Before You Build

The root cause of most production rate limit incidents is an incomplete pre-build audit. Developers read the headline rate limit table and stop. The per-endpoint caps, burst windows, and Retry-After header format are buried in SDK documentation pages and changelog entries.

You can use Apify to scrape the full API documentation before you build — pulling every rate limit reference across documentation pages into a single dataset:

import { Actor } from 'apify';

await Actor.init();

const input = {
  startUrls: [{ url: 'https://docs.example-api.com/rate-limits' }],
  linkSelector: 'a[href*="rate"], a[href*="limit"], a[href*="quota"]',
  // apify/web-scraper expects pageFunction as a string; it runs inside the
  // page, so the DOM is available directly.
  pageFunction: `async function pageFunction(context) {
    const elements = document.querySelectorAll('table, .rate-limits, [class*="quota"]');
    return {
      url: context.request.url,
      rateLimitSections: Array.from(elements).map((el) => el.innerText),
    };
  }`,
};

const run = await Actor.call('apify/web-scraper', input);
const dataset = await Actor.openDataset(run.defaultDatasetId);
const items = await dataset.getData();

console.log(JSON.stringify(items.items, null, 2));
await Actor.exit();

Run this before you write a single line of integration code. The output gives you a complete map of every rate limit reference in the documentation — burst limits, per-endpoint limits, quota headers, and Retry-After formats. Apify's free tier covers this documentation scrape at $0.

The alternative is discovering these limits in production, at 2am, under customer-facing load.

Monitoring: Catch It Before Your Users Do

Three metrics to track once you ship:

429 rate per minute. A healthy integration has near-zero 429s under normal operation. If 429s exceed a small percentage of total requests in any 5-minute window, something is wrong — alert before it compounds.

Rate limit remaining header. Most APIs return headers like X-RateLimit-Remaining. Log this value with every response. If it approaches zero before your request volume justifies it, you have a secondary limit you haven't mapped.

Daily quota pre-alert at 80%. Alert when you've consumed 80% of the daily limit. That gives you enough runway to throttle requests before exhaustion — rather than discovering the limit when the day's remaining work is already queued.
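All three checks can hang off a single response hook in whatever HTTP client you use. A sketch — the header name follows the common X-RateLimit-Remaining convention, but verify it against your API, and a real implementation would de-duplicate alerts rather than fire on every request past the threshold:

```javascript
// Track 429 rate, the remaining-quota header, and daily consumption.
function makeRateLimitMonitor({ dailyQuota, alert }) {
  let total = 0;
  let rateLimited = 0;

  return function onResponse(status, headers) {
    total += 1;
    if (status === 429) rateLimited += 1;

    const remaining = Number(headers['x-ratelimit-remaining'] ?? NaN);
    if (remaining === 0) {
      alert('quota exhausted: X-RateLimit-Remaining hit zero');
    }
    if (total / dailyQuota >= 0.8) {
      alert(`80% of daily quota consumed (${total}/${dailyQuota})`);
    }
    return { total, rateLimited, ratio: rateLimited / total };
  };
}

// Demo with a 10-request daily quota.
const alerts = [];
const record = makeRateLimitMonitor({ dailyQuota: 10, alert: (m) => alerts.push(m) });

for (let i = 0; i < 7; i++) record(200, { 'x-ratelimit-remaining': String(9 - i) });
const stats = record(429, {}); // 8th request trips the 80% pre-alert
console.log(stats.rateLimited, alerts.length); // 1 1
```

Feed `stats.ratio` into your alerting window and you have the 429-rate metric as well, from the same hook.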

The Real Cost of Skipping This

  • Pre-build rate limit audit: $0 (an afternoon and Apify's free tier)
  • Exponential backoff implementation: 30–60 minutes
  • Production incident response: 4–12 hours of engineering time, customer-facing failure window, possible SLA breach

"I hit the wall at scale with no graceful degradation path — everything that relied on that API call just broke at the same time."

Build the queue before you ship. Run the documentation scrape before you build. Set the monitoring alert before you go live.

The free tier worked because it couldn't show you the real limits. Now you can find them before your users do.
