Mang-Git Ng for Anvil

Originally published at useanvil.com

Implementing 429 retries and throttling for API rate-limits

Learn how to handle 429 Too Many Requests responses when consuming 3rd party APIs.

Most APIs in the wild implement rate-limits. They say "you can only make X number of requests in Y seconds". If you exceed the specified rate-limits, their servers will reject your requests for a period of time, basically saying, "sorry we didn't process your request, please try again in 10 seconds."

Many language-specific SDKs and clients, even from major API providers, don't come with built-in rate-limit handling. For example, Dropbox's node client does not implement throttling.

Some companies provide an external module like GitHub's plugin-throttling package for their node clients. But often it's up to you to implement it.

These rate-limits can be annoying to deal with, especially if you're working with a restrictive sandbox and trying to get something up and running quickly.

Handling these in an efficient manner is more complex than it seems. This post will walk through a number of different implementations and the pros and cons of each. We'll finish with an example script you can use to run benchmarks against the API of your choice. All examples will be in vanilla JavaScript.

Quick and dirty ⏱️

Maybe you just want to get something working quickly without error. The easiest way around a rate-limit is to delay requests so they fit within the specified window.

For example, if an API allows 6 requests over 3 seconds, you can make a request every 500ms without exceeding the limit (3000ms / 6 = 500ms).

for (const item of items) {
  await callTheAPI(item)
  await sleep(500) // HACK!
}

Where sleep is:

function sleep (milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds))
}

This is poor practice! It can still error if you are on the edge of the time window, and it can't handle legitimate bursts. What if you only need to make 6 requests? The code above takes 3 seconds, but the API allows all 6 in parallel, which would be significantly faster.

The sleep approach is fine for hobby projects and quick scripts; I admit I've used it in local script situations. But you probably want to keep it out of your production code.

There are better ways!

The dream

The ideal solution hides the details of the API's limits from the developer. I don't want to think about how many requests I can make; just make all the requests efficiently and tell me the results.

My ideal in JavaScript:

const responses = await Promise.all(items.map((item) => (
  callTheAPI(item)
)))

As an API consumer, I also want all my requests to finish as fast as they can within the bounds of the rate-limits.

Assuming 10 requests at the previous example limits of 6 requests over 3 seconds, what is the theoretical minimum time? Let's also assume the API can handle all 6 requests in parallel, and that a single request takes 200ms.

  • The first 6 requests complete in 200ms, but the API's rate-limit forces us to wait out the full 3 second window before more can start
  • The last 4 requests should start at the 3 second mark, and only take 200ms
  • Theoretical Total: 3200ms or 3.2 seconds
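
To generalize that arithmetic, here is a quick sketch (theoreticalMinimumMS is a hypothetical helper for illustration; it is not part of the benchmark script):

function theoreticalMinimumMS (numRequests, maxRequests, windowMS, requestMS) {
  // Every full batch before the last must wait out a whole rate-limit window
  const fullWindowsToWait = Math.floor((numRequests - 1) / maxRequests)
  return fullWindowsToWait * windowMS + requestMS
}

theoreticalMinimumMS(10, 6, 3000, 200) // => 3200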

Ok, let's see how close we can get.

Handling the error response

The first thing we need to nail down is how to handle the error responses when the API limits are exceeded.

If you exceed an API provider's rate-limit, their server should respond with a 429 status code (Too Many Requests) and a Retry-After header.

429
Retry-After: 5

The Retry-After header may contain either the number of seconds to wait or a date when the rate-limit is lifted.

The header's date format is not an ISO 8601 date, but an 'HTTP date' format:

<day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT

An example:

Mon, 29 Mar 2021 04:58:00 GMT

Fortunately if you are a JavaScript / Node user, this format is parsable by passing it to the Date constructor.

Here's a function that parses both formats in JavaScript:

function getMillisToSleep (retryHeaderString) {
  let millisToSleep = Math.round(parseFloat(retryHeaderString) * 1000)
  if (isNaN(millisToSleep)) {
    millisToSleep = Math.max(0, new Date(retryHeaderString) - new Date())
  }
  return millisToSleep
}

getMillisToSleep('4') // => 4000
getMillisToSleep('Mon, 29 Mar 2021 04:58:00 GMT') // => 4000

Now we can build out a function that uses the Retry-After header to retry when we encounter a 429 HTTP status code:

async function fetchAndRetryIfNecessary (callAPIFn) {
  const response = await callAPIFn()
  if (response.status === 429) {
    const retryAfter = response.headers.get('retry-after')
    const millisToSleep = getMillisToSleep(retryAfter)
    await sleep(millisToSleep)
    return fetchAndRetryIfNecessary(callAPIFn)
  }
  return response
}

This function will continue to retry until it no longer gets a 429 status code.

// Usage
const response = await fetchAndRetryIfNecessary(async () => (
  await fetch(apiURL, requestOptions)
))
console.log(response.status) // => 200
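
As written, this will retry forever if the server keeps responding with 429s. In production you would probably want to cap the number of attempts. A minimal sketch of that variation (the maxRetries parameter is my addition for illustration):

async function fetchAndRetryWithCap (callAPIFn, maxRetries = 5) {
  const response = await callAPIFn()
  if (response.status === 429 && maxRetries > 0) {
    const retryAfter = response.headers.get('retry-after')
    await sleep(getMillisToSleep(retryAfter))
    return fetchAndRetryWithCap(callAPIFn, maxRetries - 1)
  }
  return response
}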

Now we're ready to make some requests!

Setup

I'm working with a local API and running 10 and 20 requests with the same example limits from above: 6 requests over 3 seconds.

The best theoretical performance we can expect with these parameters is:

  • 10 requests: 3.2 seconds
  • 20 requests: 9.2 seconds

Let's see how close we can get!

Baseline: sleep between requests

Remember the "quick and dirty" request method we talked about at the beginning? We'll use its behavior and timing as a baseline to improve on.

A reminder:

const items = [...10 items...]
for (const item of items) {
  await callTheAPI(item)
  await sleep(3000 / 6)
}

So how does it perform?

  • With 10 requests: about 7 seconds
  • With 20 requests: about 14 seconds

Our theoretical time for serial requests with 500ms sleeps is 5 seconds for 10 requests and 10 seconds for 20, but each request also takes time to complete, so the real totals come out a little higher.

Here's a 10 request pass:

⏱️ Running Benchmark Sleep between requests, no retry
Request Start: 0 attempt:0 2021-03-29T00:53:09.629Z
Request End:   0 attempt:0 200 344ms
Request Start: 1 attempt:0 2021-03-29T00:53:10.479Z
Request End:   1 attempt:0 200 252ms
Request Start: 2 attempt:0 2021-03-29T00:53:11.236Z
Request End:   2 attempt:0 200 170ms
Request Start: 3 attempt:0 2021-03-29T00:53:11.910Z
Request End:   3 attempt:0 200 174ms
Request Start: 4 attempt:0 2021-03-29T00:53:12.585Z
Request End:   4 attempt:0 200 189ms
Request Start: 5 attempt:0 2021-03-29T00:53:13.275Z
Request End:   5 attempt:0 200 226ms
Request Start: 6 attempt:0 2021-03-29T00:53:14.005Z
Request End:   6 attempt:0 200 168ms
Request Start: 7 attempt:0 2021-03-29T00:53:14.675Z
Request End:   7 attempt:0 200 195ms
Request Start: 8 attempt:0 2021-03-29T00:53:15.375Z
Request End:   8 attempt:0 200 218ms
Request Start: 9 attempt:0 2021-03-29T00:53:16.096Z
Request End:   9 attempt:0 200 168ms
✅ Total Sleep between requests, no retry: 7136ms

Approach 1: serial with no sleep

Now that we have a function for handling the error and retrying, let's try removing the sleep call from the baseline.

const items = [...10 items...]
for (const item of items) {
  await fetchAndRetryIfNecessary(() => callTheAPI(item))
}

Looks like about 4.7 seconds, definitely an improvement, but not quite at the theoretical level of 3.2 seconds.

⏱️ Running Benchmark Serial with no limits
Request Start: 0 attempt:0 2021-03-29T00:59:01.118Z
Request End:   0 attempt:0 200 327ms
Request Start: 1 attempt:0 2021-03-29T00:59:01.445Z
Request End:   1 attempt:0 200 189ms
Request Start: 2 attempt:0 2021-03-29T00:59:01.634Z
Request End:   2 attempt:0 200 194ms
Request Start: 3 attempt:0 2021-03-29T00:59:01.828Z
Request End:   3 attempt:0 200 177ms
Request Start: 4 attempt:0 2021-03-29T00:59:02.005Z
Request End:   4 attempt:0 200 179ms
Request Start: 5 attempt:0 2021-03-29T00:59:02.185Z
Request End:   5 attempt:0 200 196ms
Request Start: 6 attempt:0 2021-03-29T00:59:02.381Z
Request End:   6 attempt:0 429 10ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 00:59:05 GMT sleep for 2609 ms
Request Start: 6 attempt:1 2021-03-29T00:59:05.156Z
Request End:   6 attempt:1 200 167ms
Request Start: 7 attempt:0 2021-03-29T00:59:05.323Z
Request End:   7 attempt:0 200 176ms
Request Start: 8 attempt:0 2021-03-29T00:59:05.499Z
Request End:   8 attempt:0 200 208ms
Request Start: 9 attempt:0 2021-03-29T00:59:05.707Z
Request End:   9 attempt:0 200 157ms
✅ Total Serial with no limits: 4746ms

Approach 2: parallel with no throttling

Let’s try burning through all requests in parallel just to see what happens.

const items = [...10 items...]
const responses = await Promise.all(items.map((item) => (
  fetchAndRetryIfNecessary(() => callTheAPI(item))
)))

This run took about 4.3 seconds. A slight improvement over the previous serial approach, but the retries are slowing us down. You can see that 4 of the requests had to retry.

⏱️ Running Benchmark Parallel with no limits
Request Start: 0 attempt:0 2021-03-29T00:55:01.463Z
Request Start: 1 attempt:0 2021-03-29T00:55:01.469Z
Request Start: 2 attempt:0 2021-03-29T00:55:01.470Z
Request Start: 3 attempt:0 2021-03-29T00:55:01.471Z
Request Start: 4 attempt:0 2021-03-29T00:55:01.471Z
Request Start: 5 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 6 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 7 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 8 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 9 attempt:0 2021-03-29T00:55:01.473Z
Request End:   5 attempt:0 429 250ms
❗ Retrying:   5 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT sleep for 3278 ms
Request End:   6 attempt:0 429 261ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT sleep for 3267 ms
Request End:   8 attempt:0 429 261ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT sleep for 3267 ms
Request End:   2 attempt:0 429 264ms
❗ Retrying:   2 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT sleep for 3266 ms
Request End:   1 attempt:0 200 512ms
Request End:   3 attempt:0 200 752ms
Request End:   0 attempt:0 200 766ms
Request End:   4 attempt:0 200 884ms
Request End:   7 attempt:0 200 1039ms
Request End:   9 attempt:0 200 1158ms
Request Start: 5 attempt:1 2021-03-29T00:55:05.155Z
Request Start: 6 attempt:1 2021-03-29T00:55:05.156Z
Request Start: 8 attempt:1 2021-03-29T00:55:05.157Z
Request Start: 2 attempt:1 2021-03-29T00:55:05.157Z
Request End:   2 attempt:1 200 233ms
Request End:   6 attempt:1 200 392ms
Request End:   8 attempt:1 200 513ms
Request End:   5 attempt:1 200 637ms
✅ Total Parallel with no limits: 4329ms

This looks pretty reasonable with only 4 retries, but this approach does not scale. Retries in this scenario only get worse when there are more requests. If we had, say, 20 requests, a number of them would need to retry more than once: we'd need 4 separate 3 second windows to complete all 20 requests, so some requests would need to retry at least 3 times.

Additionally, the rate-limiter implementation my example server uses shifts the Retry-After timestamp on subsequent requests when a client is already at the limit: it returns a Retry-After timestamp based on the 6th-oldest request timestamp plus 3 seconds.

That means if you keep making requests while you're already at the limit, the server drops old timestamps and pushes the Retry-After timestamp later. As a result, the Retry-After timestamps some waiting requests are holding go stale: those requests retry, fail, and trigger yet another retry, which pushes the Retry-After timestamp out even further. It all spirals into a vicious cycle of mostly retries. Very bad.

Here is a shortened log of it attempting to make 20 requests. Some requests needed to retry 35 times (❗) because of the shifting window and stale Retry-After headers. It eventually finished, but took a whole minute. Bad implementation, do not use.

⏱️ Running Benchmark Parallel with no limits

...many very messy requests...

Request End:   11 attempt:32 200 260ms
Request End:   5 attempt:34 200 367ms
Request End:   6 attempt:34 200 487ms
✅ Total Parallel with no limits: 57964ms
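
One partial mitigation for this stampede, independent of the approaches below, is to add a little random jitter on top of the Retry-After sleep so waiting requests don't all hit the window edge at the same instant. A sketch of the idea (the 1000ms spread is an arbitrary choice):

// Inside fetchAndRetryIfNecessary, instead of sleeping exactly millisToSleep:
await sleep(millisToSleep + Math.round(Math.random() * 1000)) // add jitter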

Approach 3: parallel with async.mapLimit

A simple solution to the problem above seems to be running only n requests in parallel at a time. For example, our demo API allows 6 requests in a time window, so just allow 6 in parallel, right? Let's try it out.

There is a node package called async that implements this behavior (among many other things) in a function called mapLimit.

import mapLimit from 'async/mapLimit'
import asyncify from 'async/asyncify'

const items = [...10 items...]
const responses = await mapLimit(items, 6, asyncify((item) => (
  fetchAndRetryIfNecessary(() => callTheAPI(item))
)))

After many 10-request runs, 5.5 seconds was about the best case, slower than even the serial runs.

⏱️ Running Benchmark Parallel with `async.mapLimit`
Request Start: 0 attempt:0 2021-03-29T17:20:42.144Z
Request Start: 1 attempt:0 2021-03-29T17:20:42.151Z
Request Start: 2 attempt:0 2021-03-29T17:20:42.151Z
Request Start: 3 attempt:0 2021-03-29T17:20:42.152Z
Request Start: 4 attempt:0 2021-03-29T17:20:42.152Z
Request Start: 5 attempt:0 2021-03-29T17:20:42.153Z
Request End:   1 attempt:0 200 454ms
Request Start: 6 attempt:0 2021-03-29T17:20:42.605Z
Request End:   6 attempt:0 429 11ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT sleep for 4384 ms
Request End:   5 attempt:0 200 571ms
Request Start: 7 attempt:0 2021-03-29T17:20:42.723Z
Request End:   7 attempt:0 429 15ms
❗ Retrying:   7 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT sleep for 4262 ms
Request End:   2 attempt:0 200 728ms
Request Start: 8 attempt:0 2021-03-29T17:20:42.879Z
Request End:   8 attempt:0 429 12ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT sleep for 4109 ms
Request End:   4 attempt:0 200 891ms
Request Start: 9 attempt:0 2021-03-29T17:20:43.044Z
Request End:   9 attempt:0 429 12ms
❗ Retrying:   9 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT sleep for 3944 ms
Request End:   3 attempt:0 200 1039ms
Request End:   0 attempt:0 200 1163ms
Request Start: 6 attempt:1 2021-03-29T17:20:47.005Z
Request Start: 7 attempt:1 2021-03-29T17:20:47.006Z
Request Start: 8 attempt:1 2021-03-29T17:20:47.007Z
Request Start: 9 attempt:1 2021-03-29T17:20:47.007Z
Request End:   8 attempt:1 200 249ms
Request End:   9 attempt:1 200 394ms
Request End:   6 attempt:1 200 544ms
Request End:   7 attempt:1 200 671ms
✅ Total Parallel with `async.mapLimit`: 5534ms

At 20 requests, it finished in about 16 seconds. The upside is that it does not suffer from the retry death spiral we saw in the previous parallel implementation! But it's still slow: mapLimit caps concurrency, not rate, so as soon as one of the 6 in-flight requests finishes, the next one starts immediately, pushing past 6 requests per 3 second window and triggering 429s. Let's keep digging.

⏱️ Running Benchmark Parallel with `async.mapLimit`
Request Start: 0 attempt:0 2021-03-29T17:25:21.166Z
Request Start: 1 attempt:0 2021-03-29T17:25:21.173Z
Request Start: 2 attempt:0 2021-03-29T17:25:21.173Z
Request Start: 3 attempt:0 2021-03-29T17:25:21.174Z
Request Start: 4 attempt:0 2021-03-29T17:25:21.174Z
Request Start: 5 attempt:0 2021-03-29T17:25:21.174Z
Request End:   0 attempt:0 200 429ms
Request Start: 6 attempt:0 2021-03-29T17:25:21.596Z
Request End:   6 attempt:0 429 19ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT sleep for 5385 ms
Request End:   5 attempt:0 200 539ms
Request Start: 7 attempt:0 2021-03-29T17:25:21.714Z
Request End:   7 attempt:0 429 13ms
❗ Retrying:   7 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT sleep for 5273 ms
Request End:   2 attempt:0 200 664ms
Request Start: 8 attempt:0 2021-03-29T17:25:21.837Z
Request End:   8 attempt:0 429 10ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT sleep for 5152 ms
Request End:   1 attempt:0 200 1068ms
Request Start: 9 attempt:0 2021-03-29T17:25:22.241Z

.... more lines ....

❗ Retrying:   17 attempt:2 at Mon, 29 Mar 2021 17:25:37 GMT sleep for 3987 ms
Request Start: 19 attempt:1 2021-03-29T17:25:37.001Z
Request Start: 17 attempt:2 2021-03-29T17:25:37.002Z
Request End:   19 attempt:1 200 182ms
Request End:   17 attempt:2 200 318ms
✅ Total Parallel with `async.mapLimit`: 16154ms

Approach 4: winning with a token bucket

So far none of the approaches have been optimal. They have all been slow, triggered many retries, or both.

The ideal scenario that would get us close to our theoretical minimum time of 3.2 seconds for 10 requests would be to only attempt 6 requests for each 3 second time window. e.g.

  1. Burst 6 requests in parallel
  2. Wait until the frame resets
  3. GOTO 1

The 429 error handling is nice and we will keep it, but we should treat it as an exceptional case as it is unnecessary work. The goal here is to make all the requests without triggering a retry under common circumstances.

Enter the token bucket algorithm. Our desired behavior is its intended purpose: you have n tokens to spend over some time window—in our case 6 tokens over 3 seconds. Once all tokens are spent, you need to wait the window duration to receive a new set of tokens.

Here is a simple implementation of a token bucket for our specific purpose. It counts up until it hits maxRequests; any request beyond that waits maxRequestWindowMS and then attempts to acquire a token again.

class TokenBucketRateLimiter {
  constructor ({ maxRequests, maxRequestWindowMS }) {
    this.maxRequests = maxRequests
    this.maxRequestWindowMS = maxRequestWindowMS
    this.reset()
  }

  reset () {
    this.count = 0
    this.resetTimeout = null
  }

  scheduleReset () {
    // Only the first token in the set triggers the resetTimeout
    if (!this.resetTimeout) {
      this.resetTimeout = setTimeout(() => (
        this.reset()
      ), this.maxRequestWindowMS)
    }
  }

  async acquireToken (fn) {
    this.scheduleReset()

    if (this.count === this.maxRequests) {
      await sleep(this.maxRequestWindowMS)
      return this.acquireToken(fn)
    }

    this.count += 1
    await nextTick()
    return fn()
  }
}
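
The class above relies on a nextTick helper that isn't shown. A minimal version of what I'm assuming here is a promise that resolves on the next turn of the event loop, so the token accounting settles before fn actually fires:

function nextTick () {
  // Defer to the next event loop turn (Node-specific; sleep(0) works similarly)
  return new Promise((resolve) => setImmediate(resolve))
}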

Let's try it out!

const items = [...10 items...]
const tokenBucket = new TokenBucketRateLimiter({
  maxRequests: 6,
  maxRequestWindowMS: 3000
})
const promises = items.map((item) => (
  fetchAndRetryIfNecessary(() => (
    tokenBucket.acquireToken(() => callTheAPI(item))
  ))
))
const responses = await Promise.all(promises)

With 10 requests it's about 4 seconds. The best so far, and with no retries!

⏱️ Running Benchmark Parallel with a token bucket
Request Start: 0 attempt:0 2021-03-29T01:14:17.700Z
Request Start: 1 attempt:0 2021-03-29T01:14:17.707Z
Request Start: 2 attempt:0 2021-03-29T01:14:17.708Z
Request Start: 3 attempt:0 2021-03-29T01:14:17.709Z
Request Start: 4 attempt:0 2021-03-29T01:14:17.709Z
Request Start: 5 attempt:0 2021-03-29T01:14:17.710Z
Request End:   2 attempt:0 200 301ms
Request End:   4 attempt:0 200 411ms
Request End:   5 attempt:0 200 568ms
Request End:   3 attempt:0 200 832ms
Request End:   0 attempt:0 200 844ms
Request End:   1 attempt:0 200 985ms
Request Start: 6 attempt:0 2021-03-29T01:14:20.916Z
Request Start: 7 attempt:0 2021-03-29T01:14:20.917Z
Request Start: 8 attempt:0 2021-03-29T01:14:20.918Z
Request Start: 9 attempt:0 2021-03-29T01:14:20.918Z
Request End:   8 attempt:0 200 223ms
Request End:   6 attempt:0 200 380ms
Request End:   9 attempt:0 200 522ms
Request End:   7 attempt:0 200 661ms
✅ Total Parallel with token bucket: 3992ms

And 20 requests? It takes about 10 seconds total. The whole run is super clean with no retries. This is exactly the behavior we are looking for!

⏱️ Running Benchmark Parallel with a token bucket
Request Start: 0 attempt:0 2021-03-29T22:30:51.321Z
Request Start: 1 attempt:0 2021-03-29T22:30:51.329Z
Request Start: 2 attempt:0 2021-03-29T22:30:51.329Z
Request Start: 3 attempt:0 2021-03-29T22:30:51.330Z
Request Start: 4 attempt:0 2021-03-29T22:30:51.330Z
Request Start: 5 attempt:0 2021-03-29T22:30:51.331Z
Request End:   5 attempt:0 200 354ms
Request End:   2 attempt:0 200 507ms
Request End:   3 attempt:0 200 624ms
Request End:   4 attempt:0 200 969ms
Request End:   0 attempt:0 200 980ms
Request End:   1 attempt:0 200 973ms
Request Start: 6 attempt:0 2021-03-29T22:30:54.538Z
Request Start: 7 attempt:0 2021-03-29T22:30:54.539Z
Request Start: 8 attempt:0 2021-03-29T22:30:54.540Z
Request Start: 9 attempt:0 2021-03-29T22:30:54.541Z
Request Start: 10 attempt:0 2021-03-29T22:30:54.541Z
Request Start: 11 attempt:0 2021-03-29T22:30:54.542Z
Request End:   8 attempt:0 200 270ms
Request End:   10 attempt:0 200 396ms
Request End:   6 attempt:0 200 525ms
Request End:   7 attempt:0 200 761ms
Request End:   11 attempt:0 200 762ms
Request End:   9 attempt:0 200 870ms
Request Start: 12 attempt:0 2021-03-29T22:30:57.746Z
Request Start: 13 attempt:0 2021-03-29T22:30:57.746Z
Request Start: 14 attempt:0 2021-03-29T22:30:57.747Z
Request Start: 15 attempt:0 2021-03-29T22:30:57.748Z
Request Start: 16 attempt:0 2021-03-29T22:30:57.748Z
Request Start: 17 attempt:0 2021-03-29T22:30:57.749Z
Request End:   15 attempt:0 200 340ms
Request End:   13 attempt:0 200 461ms
Request End:   17 attempt:0 200 581ms
Request End:   16 attempt:0 200 816ms
Request End:   12 attempt:0 200 823ms
Request End:   14 attempt:0 200 962ms
Request Start: 18 attempt:0 2021-03-29T22:31:00.954Z
Request Start: 19 attempt:0 2021-03-29T22:31:00.955Z
Request End:   19 attempt:0 200 169ms
Request End:   18 attempt:0 200 294ms
✅ Total Parallel with a token bucket: 10047ms

Approach 4.1: using someone else's token bucket

The token bucket implementation above was for demonstration purposes. In production, you might not want to maintain your own token bucket if you can help it.

If you're using node, there is a node module called limiter that implements token bucket behavior. The library is more general than our TokenBucketRateLimiter class above, but we can use it to achieve the exact same behavior:

import { RateLimiter } from 'limiter'
class LimiterLibraryRateLimiter {
  constructor ({ maxRequests, maxRequestWindowMS }) {
    this.maxRequests = maxRequests
    this.maxRequestWindowMS = maxRequestWindowMS
    this.limiter = new RateLimiter(this.maxRequests, this.maxRequestWindowMS, false)
  }

  async acquireToken (fn) {
    if (this.limiter.tryRemoveTokens(1)) {
      await nextTick()
      return fn()
    } else {
      await sleep(this.maxRequestWindowMS)
      return this.acquireToken(fn)
    }
  }
}

Usage is exactly the same as the previous example; just swap LimiterLibraryRateLimiter in place of TokenBucketRateLimiter:

const items = [...10 items...]
const rateLimiter = new LimiterLibraryRateLimiter({
  maxRequests: 6,
  maxRequestWindowMS: 3000
})
const promises = items.map((item) => (
  fetchAndRetryIfNecessary(() => (
    rateLimiter.acquireToken(() => callTheAPI(item))
  ))
))
const responses = await Promise.all(promises)

Other considerations

With the token bucket in the two approaches above, we have a workable solution for consuming APIs with rate-limits in production. Depending on your architecture there may be some other considerations.

Success rate-limit headers

APIs with rate-limits often return rate-limit headers on a successful request. e.g.

HTTP: 200
X-Ratelimit-Limit: 40         # Number of total requests in the window
X-Ratelimit-Remaining: 30     # Number of remaining requests in the window
X-Ratelimit-Reset: 1617054237 # Seconds since epoch until the window resets

These header names are only a convention at the time of writing (there is no standard), but many APIs use the headers shown above.

You could drive your token bucket with the values from these headers rather than keeping state in your API client.
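
For illustration, here is a rough sketch of syncing the TokenBucketRateLimiter from earlier with those headers after each successful response (assuming fetch-style response headers and the header names above; this is not part of the benchmark script):

function syncBucketFromHeaders (tokenBucket, response) {
  const limit = parseInt(response.headers.get('x-ratelimit-limit'), 10)
  const remaining = parseInt(response.headers.get('x-ratelimit-remaining'), 10)
  if (!isNaN(limit) && !isNaN(remaining)) {
    // Trust the server's view of the window over our local count
    tokenBucket.maxRequests = limit
    tokenBucket.count = limit - remaining
  }
}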

Throttling in a distributed system

If you have multiple nodes making requests to a rate-limited API, storing the token bucket state locally on a single node will not work. A couple of options to minimize the number of retries might be:

  • X-Ratelimit headers: using the headers described above
  • Shared state: keeping the token bucket state in something available to all nodes, like Redis (see the sketch after this list)
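
To make the shared state option concrete, here is a minimal fixed-window counter on top of Redis (a sketch only; it assumes the node-redis v4 client and glosses over edge cases like atomicity between INCR and PEXPIRE):

async function acquireDistributedToken (redisClient, key, maxRequests, windowMS) {
  // INCR returns the new count; the first increment also starts the window
  const count = await redisClient.incr(key)
  if (count === 1) {
    await redisClient.pExpire(key, windowMS)
  }
  if (count > maxRequests) {
    // Over the limit: wait out the window and try again
    await sleep(windowMS)
    return acquireDistributedToken(redisClient, key, maxRequests, windowMS)
  }
}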

Verdict: use a token bucket

Hopefully it's clear that using a token bucket is the best way to implement API throttling. Overall this implementation is clean, scalable, and about as fast as we can go without triggering retries. And if there is a retry? You’re covered by the 429 Too Many Requests handling discussed in the beginning.

Even if you don't use JavaScript, the ideas discussed here are transferable to any language. Feel free to re-implement the TokenBucketRateLimiter above in your favorite language if you can’t find a suitable alternative!

Note: check out the example script I used to run these benchmarks. You should be able to use it against your own API by putting your request code into the callTheAPI function.

If you have questions, please do not hesitate to contact us at: developers@useanvil.com.
