Wait, Wait, Wait ... Now Go! ⌚⏳

#node #ratelimit #webdev #bottleneck

Hello everyone,

Thank you for joining me for this article. I know the title is a bit vague, so to be clear: we are not going to talk about the Go programming language. Instead, today's discussion is about deliberately slowing down your code, because sometimes you need to do it. One of the reasons, if not the most common one, is to avoid being blocked by an API's rate limiter.

Prerequisites

Before we begin, as always, we want to know the minimum we need in order to start and to be efficient and productive.

Prologue

Let us address the matter at hand: we want, deliberately, to stop our code, or some parts of it, from executing too many times in a short period of time, and then carry on with our day. Why is that? As I mentioned in the beginning, one of the main reasons, and the one I am most familiar with, is to stay within the rate limit of a public-facing API.

Ever encountered a 429 Too Many Requests error? We will now demonstrate how to avoid it, and quite elegantly, if I may add.

Use case

I would like to talk about the motivation for implementing this scenario, because you might not need it at all. There are certain use cases where you will have this implementation buried somewhere inside a helper function in your code, and nobody knows about it but you. The use case we are addressing in our example is rate limiting. It is far more common than the others, which is why I decided to build today's example around it. I also want to emphasize that we are not going to implement a rate limiter on our own service; rather, we are going to deal with one on a remote API that we have no control over.

Assume that you are assigned a task to get all the information about the buses in your city, and in order to do that you need to communicate with an external API, probably provided by the bus company or the city or whoever. The issue is that the data set is fairly large, let's assume 1,000,000 records, and you cannot get all of them in one go, so you basically need to paginate your way through the job. But alas, you get the data in chunks, and on the 10th attempt to get the next chunk you receive an error stating that you sent too many requests to the server and now need to take a break. We need to understand that when this error occurs, we have failed to accomplish the task, because we did not retrieve all the records.

Drill down on the solutions

There are more than enough ways to handle this matter. You can argue, for example, that you do not need the entire data set, or you can manually re-run the function from the point where it failed, or maybe even argue that this is not your problem because you are not responsible for something you have no control over (true story, by the way). But you do realize that you add +1 to your failure counter as a developer who should be able to solve any task handed to you.

We want to talk about the solution that will give us 100 percent success on this matter. We want it to be fully automatic, with no human intervention needed, and, from my point of view the most important aspect of the matter, we take full ownership of the task because we are accountable for it. Accountability is by far one of the most valued traits any employee can have, one that managers love and appreciate (we will leave this for another talk).

So, by now we fully understand the what, where and who, but we have not yet determined the how. If you think about it a bit, we actually only need to do some 4th grade math in order to find the time period we need to wait. In physics, the time period 'T' is equal to 1 divided by the frequency 'f'.

    T = 1 / f

This equation still does not answer our question. We need to understand what we are looking for in it. The easy part is the 1: it stands for one second, a constant we cannot change. Let us try to understand what the 'f' for frequency stands for. The 'f' tells us how many executions or attempts we can have in a period of one second, so that the other side of the equation remains true at all times.

Let's see an example. Assume that we can call the remote API 300 times in one minute. Our equation works in seconds, so first we need to convert the quota to seconds. One minute consists of 60 seconds, so we divide 300 attempts by 60 and get 5 attempts per second.

   // 300 attempts per one minute

   f = 300 / 60
   f = 5 attempts per second

Now we want to place this value in the equation:

   // T = 1 / f

   T = 1 / 5
   T = 0.20 seconds
   T = 200 milliseconds

As we can see here, in order not to violate the rate limit, we may make at most 5 requests to the API per second, which means waiting at least 200 milliseconds between executions. That was not that hard, but wait: JavaScript is asynchronous in nature. How will we make the requests run sequentially? The real question we are asking is how to make the HTTP requests to the remote API wait the minimum time period between executions. This is where we are going to use a tool called Bottleneck.

Bottleneck is a lightweight and zero-dependency Task Scheduler and Rate Limiter for Node.js and the browser.

With the help of this tool we can apply some logic, and not that complex if I may add, to solve the three-part problem that we noted above.

I will give my 2 cents on this tool and how it works, from a bird's eye view and in the simplest manner I can. The tool is instantiated with a constructor that receives some options, and the instance that is created holds a number of methods for particular use cases. We will need the wrap method in this example. The wrap method receives a function as an argument and returns a throttled version of it.
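To make the idea of a "throttled version" of a function a bit more concrete, here is a minimal sketch that does not use Bottleneck at all. The name wrapWithMinTime is hypothetical, and the logic is a deliberate simplification: it only spaces out the start of each call by a minimum delay and ignores concurrency limits, priorities and everything else the real library takes care of.

```javascript
// Hypothetical helper, NOT part of Bottleneck: a simplified illustration of
// wrapping a function so that consecutive calls start at least minTimeMs apart.
function wrapWithMinTime(fn, minTimeMs) {
  let nextStart = 0; // earliest timestamp (ms) at which the next call may start

  return function throttled(...args) {
    const now = Date.now();
    const startAt = Math.max(now, nextStart);
    nextStart = startAt + minTimeMs; // reserve the following time slot

    return new Promise((resolve, reject) => {
      setTimeout(() => {
        // Promise.resolve().then(...) handles sync and async functions alike
        Promise.resolve().then(() => fn(...args)).then(resolve, reject);
      }, startAt - now);
    });
  };
}

// Usage: three calls made at once, started roughly 0, 200 and 400 ms from now.
const double = wrapWithMinTime(async (n) => n * 2, 200);
Promise.all([double(1), double(2), double(3)]).then((results) => {
  console.log(results); // [ 2, 4, 6 ]
});
```

Each call returns a promise right away, and the wrapper decides when the underlying function actually runs. That is the same shape we get from the real wrap method.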

Let us see it in action

We will open Visual Studio Code and create a new folder anywhere; I will do it on my Desktop. We will open the integrated terminal, then create a file for our code with 'touch main.js' and initialize our project with 'npm init -y'. The last step is installing the bottleneck and axios npm packages with the 'npm i -S axios bottleneck' command.

  mkdir ratelimitmycode
  cd ratelimitmycode
  touch main.js
  npm init -y
  npm i -S axios bottleneck

I will use the JSONPlaceholder API for this example, and you can change it to any other url that you want to test.

When we look at what we need to code here, we basically need two things: an HTTP client, which is why we installed axios, and the target url.

  // main.js

  const axios = require('axios');
  const url = 'https://jsonplaceholder.typicode.com/todos/';


  const main = async () => {

    // fetch the whole collection in a single request
    const res = await axios.get(url);
    const data = res.data; // axios has already parsed the response body

  } // main


  main();

As you can see in this piece of code, there is nothing fancy in it. All we do here is fetch the entire data set that the remote API has to offer under this endpoint. We will now implement a different data fetching approach, based on a particular item id, and see what happens.

  // main.js

  const axios = require('axios');
  const url = 'https://jsonplaceholder.typicode.com/todos/';


  const main = async () => {

    for (let i=1; i<=200; i++) {
      // one request per item id, each awaited before the next one starts
      const res = await axios.get(url + i);
      const data = res.data;
      console.log(data);
    } // for

  } // main


  main();

This particular endpoint holds 200 records and we can address each of them by its unique id. In this piece of code we loop 200 times, calling the same endpoint with a different id each time to retrieve a different piece of data.

We also need to remember that the particular API we are calling has no rate limit rule turned on. Let us assume that there was a rate limit rule here and that we would fail after several attempts. What would we do? Well, we already answered this question earlier: we need to calculate the rate limit quota and act according to it.

We will assume the following: the API has a rate limit of 300 requests per minute and it holds 1 million records. As the math above showed, we may make at most 5 requests per second, so at one record per request it will take approximately 2 days and 7.5 hours (200,000 seconds) to complete the fetching successfully. Do not be frightened by this long time period. We will not be fetching 1 million records here, but we also need to understand that some tasks simply take a very long time.
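The arithmetic above can be captured in a couple of small helpers. This is only a sketch: the function names minTimeMs and formatDuration are made up for illustration, and the estimate assumes one record per request with the requests running strictly one after the other.

```javascript
// Hypothetical helpers for the arithmetic above; the names are illustrative.

// Minimum delay between requests, in milliseconds, for a quota of
// `maxRequests` per `windowSeconds`. Rounded up to stay on the safe side.
function minTimeMs(maxRequests, windowSeconds) {
  return Math.ceil((windowSeconds * 1000) / maxRequests);
}

// Rough total time for `totalRequests` requests spaced `delayMs` apart,
// formatted as days, hours and minutes (leftover seconds are dropped).
function formatDuration(totalRequests, delayMs) {
  const totalSeconds = Math.floor((totalRequests * delayMs) / 1000);
  const days = Math.floor(totalSeconds / 86400);
  const hours = Math.floor((totalSeconds % 86400) / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  return `${days}d ${hours}h ${minutes}m`;
}

const delay = minTimeMs(300, 60);
console.log(delay); // 200
console.log(formatDuration(1000000, delay)); // 2d 7h 33m
```

Running the numbers this way before starting a long job gives you a realistic expectation of how long it will take.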

    // import the package
    const { default: Bottleneck } = require("bottleneck");


    // create the instance with a constructor
    const limiter = new Bottleneck({
        minTime: 200, //ms
        maxConcurrent: 1,
    });

    // create a throttled version of fn, where fn is any function that returns a promise
    const throttled = limiter.wrap( fn );

In this piece of code we see only configuration. Let us explain it. First we import the package, then we create an instance, passing some configuration options, and finally we create a throttled version of the function that respects the rate limit.

Let us look at the configuration options and understand what we see:

The minTime property holds the minimum time we need to wait between executions, in milliseconds. By default it is set to 0.
The maxConcurrent property holds the number of jobs that can be executing at the same time. By default it is null, which means no limit, and I recommend always setting this value explicitly. This property exists to make sure that, in case one job takes longer than the minTime value we set, no additional jobs start running concurrently, because that can break all of our logic and math.
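As a side note, a quota such as 300 requests per minute can also be described to Bottleneck as a budget that refills on an interval, using its reservoir options. The snippet below is only a sketch of an options object under the assumption that the remote API counts requests in fixed one-minute windows. The variable name options is arbitrary; please check the Bottleneck documentation for the exact semantics before relying on it.

```javascript
// Options object only; it would be passed as `new Bottleneck(options)`.
// Assumption: the remote API allows 300 requests per fixed 60-second window.
const options = {
  reservoir: 300,                      // requests available right now
  reservoirRefreshAmount: 300,         // value the reservoir is reset to
  reservoirRefreshInterval: 60 * 1000, // reset every minute (ms); the docs ask for a multiple of 250
  minTime: 200,                        // still spread the requests evenly
  maxConcurrent: 1,                    // one request in flight at a time
};

console.log(options);
```

Keeping minTime alongside the reservoir spreads the requests evenly instead of spending the whole budget in one burst at the start of each window.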

Once we have constructed our instance, we use the wrap method on the function that is responsible for getting the data: we pass that function as an argument and get back a new, throttled function.

Let us see an implementation with the latest code snippet from earlier. We will attempt to get 200 records and see how long it takes us with the configurations we set.

  // main.js

  const { default: Bottleneck } = require("bottleneck");
  const axios = require('axios');
  const url = 'https://jsonplaceholder.typicode.com/todos/';

  const limiter = new Bottleneck({
        minTime: 200, //ms
        maxConcurrent: 1,
  });


  // wrap the request function once; the throttled version takes the same arguments
  const throttledGet = limiter.wrap( (id) => axios.get(url + id) );


  const main = async () => {

    const requests = [];

    for (let i=1; i<=200; i++) {
      // calling the throttled function queues a job and returns a promise
      requests.push( throttledGet(i) );
    } // for


    console.log( Date.now() );
    const responses = await Promise.all( requests );
    console.log( Date.now() );

    /* rest of your code goes here */

  } // main


  main();

You can see that there is a bit going on here. First of all, we wrap the request function once with limiter.wrap and get a throttled version of it. Inside the loop we call the throttled function with a particular id; each call queues a job in the limiter and immediately returns a promise, which we push into the requests array. We then await the requests array with Promise.all, which resolves once all the requests have completed and gives us a single array of responses, ordered the same way as the requests. We print the timestamps in the terminal before and after the promises resolve, so we can see the time difference in milliseconds. According to our math it should take roughly 40 seconds in total to get 200 records (200 requests at 200 milliseconds each); this may vary with your hardware and connection, which can add a couple more seconds to the total. Once you have the data you can do whatever you want with it, and you have no more need for the remote API until the next time.

Pay attention that we do not await the throttled function inside the loop; we collect the promises and let the limiter decide when each request actually starts. Note also that wrap must receive a function, not the result of calling one: writing limiter.wrap( axios.get(url + i) ) would fire the request immediately and hand wrap a promise, so nothing would be throttled. Here the wrapped function is a one-line arrow function because it is a simple implementation, but in case we have more complex logic we will definitely create a dedicated helper function that builds the HTTP request with axios and pass that to the wrap method.
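Even with throttling in place, a single rejected request makes Promise.all reject as a whole, so it can be worth retrying when the server still answers with 429. The following is a hedged sketch and not part of Bottleneck or axios: the names sleep, retryDelayMs and withRetryOn429 are hypothetical, and the error shape (error.response.status and error.response.headers) is assumed to be the one axios produces for non-2xx responses.

```javascript
// Hypothetical helpers: retry a call when the server answers 429.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// How long to wait before retrying. Uses the Retry-After header when it is
// given in seconds; the HTTP-date form is not handled in this sketch.
function retryDelayMs(error, fallbackMs) {
  const headers = error.response && error.response.headers;
  const header = headers ? headers['retry-after'] : undefined;
  const seconds = Number(header);
  return header != null && Number.isFinite(seconds) && seconds >= 0
    ? seconds * 1000
    : fallbackMs;
}

// `call` is any function that returns a promise, for example a throttled request.
async function withRetryOn429(call, { maxRetries = 3, fallbackMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const status = error.response && error.response.status;
      // give up on any other error, or once the retries are used up
      if (status !== 429 || attempt >= maxRetries) throw error;
      await sleep(retryDelayMs(error, fallbackMs));
    }
  }
}
```

In the loop, each throttled call could then be wrapped, for example withRetryOn429(() => throttled(i)), where throttled stands for whatever rate-limited request function you built with limiter.wrap.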

I hope this was informative for you and that it will become useful in the future. I also recommend reading the documentation of Bottleneck; it has more to offer than what we covered in this article.

On a personal note, I would really appreciate it if you could provide some feedback on what you are reading; this would help me a lot. I am talking about my English skills, or something that I failed to address in the article itself. Whatever you find can be very valuable for me to improve.

Stay tuned for the next one
Like, subscribe, comment and whatever ...
Thank you & Goodbye