<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mang-Git Ng</title>
    <description>The latest articles on DEV Community by Mang-Git Ng (@manggit).</description>
    <link>https://dev.to/manggit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F599688%2Ffa4ea94b-d503-488e-918e-73c329b136c3.jpeg</url>
      <title>DEV Community: Mang-Git Ng</title>
      <link>https://dev.to/manggit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/manggit"/>
    <language>en</language>
    <item>
      <title>Implementing 429 retries and throttling for API rate-limits</title>
      <dc:creator>Mang-Git Ng</dc:creator>
      <pubDate>Tue, 30 Mar 2021 17:20:10 +0000</pubDate>
      <link>https://dev.to/anvilfoundry/implementing-429-retries-and-throttling-for-api-rate-limits-1nne</link>
      <guid>https://dev.to/anvilfoundry/implementing-429-retries-and-throttling-for-api-rate-limits-1nne</guid>
      <description>&lt;p&gt;&lt;em&gt;Learn how to handle 429 Too Many Requests responses when consuming 3rd party APIs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most APIs in the wild implement rate-limits. They say "you can only make X number of requests in Y seconds". If you exceed the specified rate-limits, their servers will reject your requests for a period of time, basically saying, "sorry we didn't process your request, please try again in 10 seconds."&lt;/p&gt;

&lt;p&gt;Many language-specific SDKs and clients, even from major API providers, don't come with built-in rate-limit handling. For example, Dropbox's &lt;a href="https://github.com/dropbox/dropbox-sdk-js"&gt;node client&lt;/a&gt; does not implement throttling.&lt;/p&gt;

&lt;p&gt;Some companies provide an external module like GitHub's &lt;a href="https://github.com/octokit/plugin-throttling.js"&gt;plugin-throttling package&lt;/a&gt; for their &lt;a href="https://github.com/octokit/graphql.js"&gt;node&lt;/a&gt; &lt;a href="https://github.com/octokit/rest.js/"&gt;clients&lt;/a&gt;. But often it's up to you to implement.&lt;/p&gt;

&lt;p&gt;These rate-limits can be annoying to deal with, especially if you're working with a restrictive sandbox and trying to get something up and running quickly.&lt;/p&gt;

&lt;p&gt;Handling these in an efficient manner is more complex than it seems. This post will walk through a number of different implementations and the pros and cons of each. We'll finish with an &lt;a href="https://gist.github.com/benogle/723e60aa020f85ad6adc8e7beb70e705"&gt;example script&lt;/a&gt; you can use to run benchmarks against the API of your choice. All examples will be in vanilla JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick and dirty ⏱️
&lt;/h2&gt;

&lt;p&gt;Maybe you just want to get something working quickly without error. The easiest way around a rate-limit is to delay requests so they fit within the specified window.&lt;/p&gt;

&lt;p&gt;For example if an API allowed 6 requests over 3 seconds, the API will allow a request every 500ms and not fail (&lt;code&gt;3000 / 6 = 500&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// HACK!&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;sleep&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;milliseconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;milliseconds&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;This is poor practice!&lt;/em&gt; It still could error if you are on the edge of the time window, and it can't handle legitimate bursts. What if you only need to make 6 requests? The code above will take 3 seconds, but the API allows doing all 6 in parallel, which will be significantly faster.&lt;/p&gt;

&lt;p&gt;The sleep approach is fine for hobby projects, quick scripts, etc—I admit I've used it in local script situations. But you probably want to keep it out of your production code.&lt;/p&gt;

&lt;p&gt;There are better ways!&lt;/p&gt;

&lt;h2&gt;
  
  
  The dream
&lt;/h2&gt;

&lt;p&gt;The ideal solution hides the details of the API's limits from the developer. I don't want to think about how many requests I can make, just make all the requests efficiently and tell me the results.&lt;/p&gt;

&lt;p&gt;My ideal in JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As an API consumer, I also want all my requests to finish as fast as they can within the bounds of the rate-limits.&lt;/p&gt;

&lt;p&gt;Assuming &lt;strong&gt;10&lt;/strong&gt; requests at the previous example limits of &lt;strong&gt;6&lt;/strong&gt; requests over &lt;strong&gt;3&lt;/strong&gt; seconds, what is the theoretical limit? Let's also assume the API can make all 6 requests in parallel, and a single request takes &lt;strong&gt;200ms&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first 6 requests should complete in 200ms, but need to take 3 seconds because of the API's rate-limit&lt;/li&gt;
&lt;li&gt;The last 4 requests should start at the 3 second mark, and only take 200ms&lt;/li&gt;
&lt;li&gt;Theoretical Total: 3200ms or 3.2 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ok, let's see how close we can get.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling the error response
&lt;/h2&gt;

&lt;p&gt;The first thing we need to nail down is how to handle the error responses when the API limits are exceeded.&lt;/p&gt;

&lt;p&gt;If you exceed an API provider's rate-limit, their server should respond with a &lt;code&gt;429&lt;/code&gt; status code  (&lt;code&gt;Too Many Requests&lt;/code&gt;) and a &lt;code&gt;Retry-After&lt;/code&gt; header.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;429
Retry-After: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Retry-After&lt;/code&gt; header may be either in &lt;strong&gt;seconds&lt;/strong&gt; to wait or a &lt;strong&gt;date&lt;/strong&gt; when the rate-limit is lifted.&lt;/p&gt;

&lt;p&gt;The header's &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Date"&gt;date format&lt;/a&gt; is not an &lt;a href="https://en.wikipedia.org/wiki/ISO_8601"&gt;ISO 8601&lt;/a&gt; date, but an 'HTTP date' format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;day-name&amp;gt;, &amp;lt;day&amp;gt; &amp;lt;month&amp;gt; &amp;lt;year&amp;gt; &amp;lt;hour&amp;gt;:&amp;lt;minute&amp;gt;:&amp;lt;second&amp;gt; GMT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mon, 29 Mar 2021 04:58:00 GMT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fortunately if you are a JavaScript / Node user, this format is parsable by passing it to the &lt;code&gt;Date&lt;/code&gt; constructor.&lt;/p&gt;

&lt;p&gt;Here's a function that parses both formats in JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;getMillisToSleep&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryHeaderString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;millisToSleep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryHeaderString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;millisToSleep&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;millisToSleep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryHeaderString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;millisToSleep&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;getMillisToSleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; 4000&lt;/span&gt;
&lt;span class="nx"&gt;getMillisToSleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Mon, 29 Mar 2021 04:58:00 GMT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; 4000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can build out a function that uses the &lt;code&gt;Retry-After&lt;/code&gt; header to retry when we encounter a &lt;code&gt;429&lt;/code&gt; HTTP status code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callAPIFn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;callAPIFn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retryAfter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retry-after&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;millisToSleep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;getMillisToSleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;millisToSleep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;callAPIFn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function will continue to retry until it no longer gets a &lt;code&gt;429&lt;/code&gt; status code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiURL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;requestOptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; 200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we're ready to make some requests!&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;I'm working with a local API and running &lt;strong&gt;10&lt;/strong&gt; and &lt;strong&gt;20&lt;/strong&gt; requests with the same example limits from above: &lt;strong&gt;6&lt;/strong&gt; requests over &lt;strong&gt;3&lt;/strong&gt; seconds.&lt;/p&gt;

&lt;p&gt;The best theoretical performance we can expect with these parameters is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 requests: 3.2 seconds&lt;/li&gt;
&lt;li&gt;20 requests: 9.2 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's see how close we can get!&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline: sleep between requests
&lt;/h2&gt;

&lt;p&gt;Remember the "quick and dirty" request method we talked about at the beginning? We'll use its behavior and timing as a baseline to improve on.&lt;/p&gt;

&lt;p&gt;A reminder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So how does it perform?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With 10 requests: about 7 seconds&lt;/li&gt;
&lt;li&gt;With 20 requests: about 14 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our theoretical time for serial requests is 5 seconds at 10 requests, and 10 seconds for 20 requests, but there is some overhead for each request, so the real times are a little higher.&lt;/p&gt;

&lt;p&gt;Here's a 10 request pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Sleep between requests, no retry
Request Start: 0 attempt:0 2021-03-29T00:53:09.629Z
Request End:   0 attempt:0 200 344ms
Request Start: 1 attempt:0 2021-03-29T00:53:10.479Z
Request End:   1 attempt:0 200 252ms
Request Start: 2 attempt:0 2021-03-29T00:53:11.236Z
Request End:   2 attempt:0 200 170ms
Request Start: 3 attempt:0 2021-03-29T00:53:11.910Z
Request End:   3 attempt:0 200 174ms
Request Start: 4 attempt:0 2021-03-29T00:53:12.585Z
Request End:   4 attempt:0 200 189ms
Request Start: 5 attempt:0 2021-03-29T00:53:13.275Z
Request End:   5 attempt:0 200 226ms
Request Start: 6 attempt:0 2021-03-29T00:53:14.005Z
Request End:   6 attempt:0 200 168ms
Request Start: 7 attempt:0 2021-03-29T00:53:14.675Z
Request End:   7 attempt:0 200 195ms
Request Start: 8 attempt:0 2021-03-29T00:53:15.375Z
Request End:   8 attempt:0 200 218ms
Request Start: 9 attempt:0 2021-03-29T00:53:16.096Z
Request End:   9 attempt:0 200 168ms
✅ Total Sleep between requests, no retry: 7136ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach 1: serial with no sleep
&lt;/h2&gt;

&lt;p&gt;Now we have a function for handling the error and retrying, let's try removing the sleep call from the baseline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks like about 4.7 seconds, definitely an improvement, but not quite at the theoretical level of 3.2 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Serial with no limits
Request Start: 0 attempt:0 2021-03-29T00:59:01.118Z
Request End:   0 attempt:0 200 327ms
Request Start: 1 attempt:0 2021-03-29T00:59:01.445Z
Request End:   1 attempt:0 200 189ms
Request Start: 2 attempt:0 2021-03-29T00:59:01.634Z
Request End:   2 attempt:0 200 194ms
Request Start: 3 attempt:0 2021-03-29T00:59:01.828Z
Request End:   3 attempt:0 200 177ms
Request Start: 4 attempt:0 2021-03-29T00:59:02.005Z
Request End:   4 attempt:0 200 179ms
Request Start: 5 attempt:0 2021-03-29T00:59:02.185Z
Request End:   5 attempt:0 200 196ms
Request Start: 6 attempt:0 2021-03-29T00:59:02.381Z
Request End:   6 attempt:0 429 10ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 00:59:05 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;2609 ms
Request Start: 6 attempt:1 2021-03-29T00:59:05.156Z
Request End:   6 attempt:1 200 167ms
Request Start: 7 attempt:0 2021-03-29T00:59:05.323Z
Request End:   7 attempt:0 200 176ms
Request Start: 8 attempt:0 2021-03-29T00:59:05.499Z
Request End:   8 attempt:0 200 208ms
Request Start: 9 attempt:0 2021-03-29T00:59:05.707Z
Request End:   9 attempt:0 200 157ms
✅ Total Serial with no limits: 4746ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach 2: parallel with no throttling
&lt;/h2&gt;

&lt;p&gt;Let’s try burning through all requests in parallel just to see what happens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This run took about 4.3 seconds. A slight improvement over the previous serial approach, but the retry is slowing us down. You can see the last 4 requests all had to retry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with no limits
Request Start: 0 attempt:0 2021-03-29T00:55:01.463Z
Request Start: 1 attempt:0 2021-03-29T00:55:01.469Z
Request Start: 2 attempt:0 2021-03-29T00:55:01.470Z
Request Start: 3 attempt:0 2021-03-29T00:55:01.471Z
Request Start: 4 attempt:0 2021-03-29T00:55:01.471Z
Request Start: 5 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 6 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 7 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 8 attempt:0 2021-03-29T00:55:01.472Z
Request Start: 9 attempt:0 2021-03-29T00:55:01.473Z
Request End:   5 attempt:0 429 250ms
❗ Retrying:   5 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3278 ms
Request End:   6 attempt:0 429 261ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3267 ms
Request End:   8 attempt:0 429 261ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3267 ms
Request End:   2 attempt:0 429 264ms
❗ Retrying:   2 attempt:1 at Mon, 29 Mar 2021 00:55:05 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3266 ms
Request End:   1 attempt:0 200 512ms
Request End:   3 attempt:0 200 752ms
Request End:   0 attempt:0 200 766ms
Request End:   4 attempt:0 200 884ms
Request End:   7 attempt:0 200 1039ms
Request End:   9 attempt:0 200 1158ms
Request Start: 5 attempt:1 2021-03-29T00:55:05.155Z
Request Start: 6 attempt:1 2021-03-29T00:55:05.156Z
Request Start: 8 attempt:1 2021-03-29T00:55:05.157Z
Request Start: 2 attempt:1 2021-03-29T00:55:05.157Z
Request End:   2 attempt:1 200 233ms
Request End:   6 attempt:1 200 392ms
Request End:   8 attempt:1 200 513ms
Request End:   5 attempt:1 200 637ms
✅ Total Parallel with no limits: 4329ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks pretty reasonable with only 4 retries, but &lt;strong&gt;this approach does not scale&lt;/strong&gt;. Retries in this scenario only get worse when there are more requests. If we had, say, 20 requests, a number of them would need to retry more than once—we'd need 4 separate 3 second windows to complete all 20 requests, so some requests would need to retry &lt;em&gt;at best&lt;/em&gt; 3 times.&lt;/p&gt;

&lt;p&gt;Additionally, the &lt;a href="https://github.com/tj/node-ratelimiter/blob/f929a523d43f5a393e626d7a685f15b77fab658d/index.js#L67-L73"&gt;ratelimiter implementation&lt;/a&gt; my example server uses will shift the &lt;code&gt;Retry-After&lt;/code&gt; timestamp on subsequent requests when a client is already at the limit—it returns a &lt;code&gt;Retry-After&lt;/code&gt; timestamp based on the 6th oldest request timestamp + 3 seconds.&lt;/p&gt;

&lt;p&gt;That means if you make more requests when you’re already at the limit, it drops old timestamps and shifts the &lt;code&gt;Retry-After&lt;/code&gt; timestamp later. As a result, the &lt;code&gt;Retry-After&lt;/code&gt; timestamps for some requests waiting to retry become stale. They retry but fail because their timestamps were stale. The failure triggers &lt;em&gt;yet another&lt;/em&gt; retry, &lt;em&gt;and&lt;/em&gt; causes the &lt;code&gt;Retry-After&lt;/code&gt; timestamp to be pushed out &lt;em&gt;even further&lt;/em&gt;. All this spirals into a vicious loop of mostly retries. Very bad.&lt;/p&gt;

&lt;p&gt;Here is a shortened log of it attempting to make 20 requests. Some requests needed to retry 35 times (❗) because of the shifting window and stale &lt;code&gt;Retry-After&lt;/code&gt; headers. It eventually finished, but took a whole minute. Bad implementation, do not use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with no limits

...many very messy requests...

Request End:   11 attempt:32 200 260ms
Request End:   5 attempt:34 200 367ms
Request End:   6 attempt:34 200 487ms
✅ Total Parallel with no limits: 57964ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach 3: parallel with async.mapLimit
&lt;/h2&gt;

&lt;p&gt;It seems like a simple solution to the problem above would be only running &lt;code&gt;n&lt;/code&gt; number of requests in parallel at a time. For example, our demo API allows 6 requests in a time window, so just allow 6 in parallel, right? Let’s try it out.&lt;/p&gt;

&lt;p&gt;There is a node package called &lt;a href="https://www.npmjs.com/package/async"&gt;async&lt;/a&gt; implementing this behavior (among many other things) in a function called &lt;code&gt;mapLimit&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;mapLimit&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;async/mapLimit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;asyncify&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;async/asyncify&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mapLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;asyncify&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After many 10-request runs, 5.5 seconds was about the best case, slower than even the serial runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with &lt;span class="sb"&gt;`&lt;/span&gt;async.mapLimit&lt;span class="sb"&gt;`&lt;/span&gt;
Request Start: 0 attempt:0 2021-03-29T17:20:42.144Z
Request Start: 1 attempt:0 2021-03-29T17:20:42.151Z
Request Start: 2 attempt:0 2021-03-29T17:20:42.151Z
Request Start: 3 attempt:0 2021-03-29T17:20:42.152Z
Request Start: 4 attempt:0 2021-03-29T17:20:42.152Z
Request Start: 5 attempt:0 2021-03-29T17:20:42.153Z
Request End:   1 attempt:0 200 454ms
Request Start: 6 attempt:0 2021-03-29T17:20:42.605Z
Request End:   6 attempt:0 429 11ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;4384 ms
Request End:   5 attempt:0 200 571ms
Request Start: 7 attempt:0 2021-03-29T17:20:42.723Z
Request End:   7 attempt:0 429 15ms
❗ Retrying:   7 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;4262 ms
Request End:   2 attempt:0 200 728ms
Request Start: 8 attempt:0 2021-03-29T17:20:42.879Z
Request End:   8 attempt:0 429 12ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;4109 ms
Request End:   4 attempt:0 200 891ms
Request Start: 9 attempt:0 2021-03-29T17:20:43.044Z
Request End:   9 attempt:0 429 12ms
❗ Retrying:   9 attempt:1 at Mon, 29 Mar 2021 17:20:47 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3944 ms
Request End:   3 attempt:0 200 1039ms
Request End:   0 attempt:0 200 1163ms
Request Start: 6 attempt:1 2021-03-29T17:20:47.005Z
Request Start: 7 attempt:1 2021-03-29T17:20:47.006Z
Request Start: 8 attempt:1 2021-03-29T17:20:47.007Z
Request Start: 9 attempt:1 2021-03-29T17:20:47.007Z
Request End:   8 attempt:1 200 249ms
Request End:   9 attempt:1 200 394ms
Request End:   6 attempt:1 200 544ms
Request End:   7 attempt:1 200 671ms
✅ Total Parallel with &lt;span class="sb"&gt;`&lt;/span&gt;async.mapLimit&lt;span class="sb"&gt;`&lt;/span&gt;: 5534ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 20 requests, it finished in about 16 seconds. The upside is that it does not suffer from the retry death spiral we saw in the previous parallel implementation! But it's still slow. Let's keep digging.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with &lt;span class="sb"&gt;`&lt;/span&gt;async.mapLimit&lt;span class="sb"&gt;`&lt;/span&gt;
Request Start: 0 attempt:0 2021-03-29T17:25:21.166Z
Request Start: 1 attempt:0 2021-03-29T17:25:21.173Z
Request Start: 2 attempt:0 2021-03-29T17:25:21.173Z
Request Start: 3 attempt:0 2021-03-29T17:25:21.174Z
Request Start: 4 attempt:0 2021-03-29T17:25:21.174Z
Request Start: 5 attempt:0 2021-03-29T17:25:21.174Z
Request End:   0 attempt:0 200 429ms
Request Start: 6 attempt:0 2021-03-29T17:25:21.596Z
Request End:   6 attempt:0 429 19ms
❗ Retrying:   6 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;5385 ms
Request End:   5 attempt:0 200 539ms
Request Start: 7 attempt:0 2021-03-29T17:25:21.714Z
Request End:   7 attempt:0 429 13ms
❗ Retrying:   7 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;5273 ms
Request End:   2 attempt:0 200 664ms
Request Start: 8 attempt:0 2021-03-29T17:25:21.837Z
Request End:   8 attempt:0 429 10ms
❗ Retrying:   8 attempt:1 at Mon, 29 Mar 2021 17:25:27 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;5152 ms
Request End:   1 attempt:0 200 1068ms
Request Start: 9 attempt:0 2021-03-29T17:25:22.241Z

.... more lines ....

❗ Retrying:   17 attempt:2 at Mon, 29 Mar 2021 17:25:37 GMT &lt;span class="nb"&gt;sleep &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;3987 ms
Request Start: 19 attempt:1 2021-03-29T17:25:37.001Z
Request Start: 17 attempt:2 2021-03-29T17:25:37.002Z
Request End:   19 attempt:1 200 182ms
Request End:   17 attempt:2 200 318ms
✅ Total Parallel with &lt;span class="sb"&gt;`&lt;/span&gt;async.mapLimit&lt;span class="sb"&gt;`&lt;/span&gt;: 16154ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach 4: winning with a token bucket
&lt;/h2&gt;

&lt;p&gt;So far none of the approaches have been optimal. They have all been slow, triggered many retries, or both.&lt;/p&gt;

&lt;p&gt;The ideal scenario that would get us close to our theoretical minimum time of 3.2 seconds for 10 requests would be to only attempt 6 requests for each 3 second time window. e.g.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Burst 6 requests in parallel&lt;/li&gt;
&lt;li&gt;Wait until the frame resets&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GOTO&lt;/code&gt; 1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;429&lt;/code&gt; error handling is nice and we will keep it, but we should treat it as an exceptional case as it is unnecessary work. The goal here is to make all the requests without triggering a retry under common circumstances.&lt;/p&gt;

&lt;p&gt;Enter the &lt;a href="https://en.wikipedia.org/wiki/Token_bucket"&gt;token bucket algorithm&lt;/a&gt;. Our desired behavior is its intended purpose: you have &lt;code&gt;n&lt;/code&gt; tokens to spend over some time window—in our case 6 tokens over 3 seconds. Once all tokens are spent, you need to wait the window duration to receive a new set of tokens.&lt;/p&gt;

&lt;p&gt;Here is a simple implementation of a token bucket for our specific purpose. It will count up until it hits the &lt;code&gt;maxRequests&lt;/code&gt;, any requests beyond that will wait the &lt;code&gt;maxRequestWindowMS&lt;/code&gt;, then attempt to acquire the token again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nx"&gt;TokenBucketRateLimiter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;constructor&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRequests&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;reset&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resetTimeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;scheduleReset&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Only the first token in the set triggers the resetTimeout&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resetTimeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resetTimeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;acquireToken&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scheduleReset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;acquireToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nextTick&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try it out!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokenBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;TokenBucketRateLimiter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promises&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;tokenBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;acquireToken&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With 10 requests it's about 4 seconds. The best so far, and with no retries!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with a token bucket
Request Start: 0 attempt:0 2021-03-29T01:14:17.700Z
Request Start: 1 attempt:0 2021-03-29T01:14:17.707Z
Request Start: 2 attempt:0 2021-03-29T01:14:17.708Z
Request Start: 3 attempt:0 2021-03-29T01:14:17.709Z
Request Start: 4 attempt:0 2021-03-29T01:14:17.709Z
Request Start: 5 attempt:0 2021-03-29T01:14:17.710Z
Request End:   2 attempt:0 200 301ms
Request End:   4 attempt:0 200 411ms
Request End:   5 attempt:0 200 568ms
Request End:   3 attempt:0 200 832ms
Request End:   0 attempt:0 200 844ms
Request End:   1 attempt:0 200 985ms
Request Start: 6 attempt:0 2021-03-29T01:14:20.916Z
Request Start: 7 attempt:0 2021-03-29T01:14:20.917Z
Request Start: 8 attempt:0 2021-03-29T01:14:20.918Z
Request Start: 9 attempt:0 2021-03-29T01:14:20.918Z
Request End:   8 attempt:0 200 223ms
Request End:   6 attempt:0 200 380ms
Request End:   9 attempt:0 200 522ms
Request End:   7 attempt:0 200 661ms
✅ Total Parallel with token bucket: 3992ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And 20 requests? It takes about 10 seconds total. The whole run is super clean with no retries. This is exactly the behavior we are looking for!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;⏱️ Running Benchmark Parallel with a token bucket
Request Start: 0 attempt:0 2021-03-29T22:30:51.321Z
Request Start: 1 attempt:0 2021-03-29T22:30:51.329Z
Request Start: 2 attempt:0 2021-03-29T22:30:51.329Z
Request Start: 3 attempt:0 2021-03-29T22:30:51.330Z
Request Start: 4 attempt:0 2021-03-29T22:30:51.330Z
Request Start: 5 attempt:0 2021-03-29T22:30:51.331Z
Request End:   5 attempt:0 200 354ms
Request End:   2 attempt:0 200 507ms
Request End:   3 attempt:0 200 624ms
Request End:   4 attempt:0 200 969ms
Request End:   0 attempt:0 200 980ms
Request End:   1 attempt:0 200 973ms
Request Start: 6 attempt:0 2021-03-29T22:30:54.538Z
Request Start: 7 attempt:0 2021-03-29T22:30:54.539Z
Request Start: 8 attempt:0 2021-03-29T22:30:54.540Z
Request Start: 9 attempt:0 2021-03-29T22:30:54.541Z
Request Start: 10 attempt:0 2021-03-29T22:30:54.541Z
Request Start: 11 attempt:0 2021-03-29T22:30:54.542Z
Request End:   8 attempt:0 200 270ms
Request End:   10 attempt:0 200 396ms
Request End:   6 attempt:0 200 525ms
Request End:   7 attempt:0 200 761ms
Request End:   11 attempt:0 200 762ms
Request End:   9 attempt:0 200 870ms
Request Start: 12 attempt:0 2021-03-29T22:30:57.746Z
Request Start: 13 attempt:0 2021-03-29T22:30:57.746Z
Request Start: 14 attempt:0 2021-03-29T22:30:57.747Z
Request Start: 15 attempt:0 2021-03-29T22:30:57.748Z
Request Start: 16 attempt:0 2021-03-29T22:30:57.748Z
Request Start: 17 attempt:0 2021-03-29T22:30:57.749Z
Request End:   15 attempt:0 200 340ms
Request End:   13 attempt:0 200 461ms
Request End:   17 attempt:0 200 581ms
Request End:   16 attempt:0 200 816ms
Request End:   12 attempt:0 200 823ms
Request End:   14 attempt:0 200 962ms
Request Start: 18 attempt:0 2021-03-29T22:31:00.954Z
Request Start: 19 attempt:0 2021-03-29T22:31:00.955Z
Request End:   19 attempt:0 200 169ms
Request End:   18 attempt:0 200 294ms
✅ Total Parallel with a token bucket: 10047ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach 4.1: using someone else's token bucket
&lt;/h2&gt;

&lt;p&gt;The token bucket implementation above was for demonstration purposes. In production, you might not want to maintain your own token bucket if you can help it.&lt;/p&gt;

&lt;p&gt;If you're using node, there is a node module called &lt;a href="https://www.npmjs.com/package/limiter"&gt;limiter&lt;/a&gt; that implements token bucket behavior. The library is more general than our &lt;code&gt;TokenBucketRateLimiter&lt;/code&gt; class above, but we can use it to achieve the exact same behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RateLimiter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;limiter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nx"&gt;LimiterLibraryRateLimiter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;constructor&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRequests&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;acquireToken&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tryRemoveTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nextTick&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;acquireToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage is exactly the same as the previous example, just swap &lt;code&gt;LimiterLibraryRateLimiter&lt;/code&gt; in place of &lt;code&gt;TokenBucketRateLimiter&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;LimiterLibraryRateLimiter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;maxRequests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxRequestWindowMS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promises&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;fetchAndRetryIfNecessary&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;rateLimiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;acquireToken&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;callTheAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promises&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Other considerations
&lt;/h2&gt;

&lt;p&gt;With the token bucket in the two approaches above, we have a workable solution for consuming APIs with rate-limits in production. Depending on your architecture there may be some other considerations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Success rate-limit headers
&lt;/h3&gt;

&lt;p&gt;APIs with rate-limits often return &lt;a href="https://stackoverflow.com/a/16022625"&gt;rate-limit headers&lt;/a&gt; on a successful request. e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;HTTP: 200
X-Ratelimit-Limit: 40         &lt;span class="c"&gt;# Number of total requests in the window&lt;/span&gt;
X-Ratelimit-Remaining: 30     &lt;span class="c"&gt;# Number of remaining requests in the window&lt;/span&gt;
X-Ratelimit-Reset: 1617054237 &lt;span class="c"&gt;# Seconds since epoch til reset of the window&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The header names are convention &lt;a href="https://tools.ietf.org/id/draft-polli-ratelimit-headers-00.html#header-specifications"&gt;at the time of writing&lt;/a&gt;, but many APIs use the headers specified above.&lt;/p&gt;

&lt;p&gt;You could run your token bucket with the value from these headers rather than keep state in your API client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Throttling in a distributed system
&lt;/h3&gt;

&lt;p&gt;If you have multiple nodes making requests to a rate-limited API, storing the token bucket state locally on a single node will not work. A couple options to minimize the number of retries might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X-Ratelimit headers&lt;/strong&gt;: Using the headers described above&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared state&lt;/strong&gt;: You could keep the token bucket state in something available to all nodes like &lt;code&gt;redis&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Verdict: use a token bucket
&lt;/h2&gt;

&lt;p&gt;Hopefully it's clear that using a token bucket is the best way to implement API throttling. Overall this implementation is clean, scalable, and about as fast as we can go without triggering retries. And if there is a retry? You’re covered by the &lt;code&gt;429 Too Many Requests&lt;/code&gt; handling discussed in the beginning.&lt;/p&gt;

&lt;p&gt;Even if you don't use JavaScript, the ideas discussed here are transferable to any language. Feel free to re-implement the &lt;code&gt;TokenBucketRateLimiter&lt;/code&gt; above in your favorite language if you can’t find a suitable alternative!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: check out the &lt;a href="https://gist.github.com/benogle/723e60aa020f85ad6adc8e7beb70e705"&gt;example script&lt;/a&gt; I used to run these benchmarks. You should be able to use it against your own API by putting your request code into the &lt;code&gt;callTheAPI&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;If you have questions, please do not hesitate to contact us at: &lt;a href="mailto:developers@useanvil.com"&gt;developers@useanvil.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SpectaQL - Autegenerated Documentation for GraphQL</title>
      <dc:creator>Mang-Git Ng</dc:creator>
      <pubDate>Fri, 19 Mar 2021 06:18:49 +0000</pubDate>
      <link>https://dev.to/manggit/spectaql-autegenerated-documentation-for-graphql-2ppo</link>
      <guid>https://dev.to/manggit/spectaql-autegenerated-documentation-for-graphql-2ppo</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/anvilco/spectaql"&gt;https://github.com/anvilco/spectaql&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Writing and publishing good documentation on APIs is hard. That's why Anvil created and open sourced our Documentation Generator for GraphQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.useanvil.com/blog/2021-03-17-autogenerate-graphql-docs-with-spectaql"&gt;Learn&lt;/a&gt; about the thinking behind this project. &lt;/p&gt;

</description>
      <category>graphql</category>
      <category>node</category>
      <category>javascript</category>
      <category>githunt</category>
    </item>
  </channel>
</rss>
