Although JavaScript is designed to be single-threaded, you can still do things concurrently.
For example, we can read multiple files concurrently:
const readFile = require('util').promisify(require('fs').readFile);
const readAllFiles = async (paths) => {
  return await Promise.all(paths.map(p => readFile(p, 'utf8')));
};
However, reading files can be quite resource-intensive. If there are more than 10k paths, you will probably hear the fans on your machine speed up as it struggles. Your Node server/program will also respond significantly slower, because 10k+ file reading operations are queued on libuv's thread pool and compete with everything else the Node process needs to do.
The solution is simple: limit the number of file reading operations in the thread pool. In other words, limit the number of concurrent calls to readFile.
Let's define a generic function asyncLimit(fn, n), which returns a function that does exactly what fn does, but with the number of concurrent calls to fn limited to n. We will assume fn returns a Promise.
const asyncLimit = (fn, n) => {
  return function (...args) {
    return fn.apply(this, args);
  };
};
Since we know that asyncLimit returns a function that does whatever fn does, we write this out first. Note that we don't use an arrow function, as fn might need the binding to this. An arrow function does not have its own this binding.
If you are not familiar with this in JavaScript, read my article explaining what this is later. For now, just ignore it.
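To see why the wrapper is written as a regular function, here is a minimal sketch. The names wrapWithFunction, wrapWithArrow, and user are made up for illustration and are not part of asyncLimit.

```javascript
// Wrapper written as a regular function: `this` is whatever the caller supplies.
const wrapWithFunction = (fn) => function (...args) {
  return fn.apply(this, args);
};

// Wrapper written as an arrow function: `this` is taken from the enclosing
// scope, not from the object the wrapper is called on.
const wrapWithArrow = (fn) => (...args) => fn.apply(this, args);

const user = {
  // Resolves to true only when called with `this` bound to `user`.
  isBoundToUser() {
    return Promise.resolve(this === user);
  },
};

user.viaFunction = wrapWithFunction(user.isBoundToUser);
user.viaArrow = wrapWithArrow(user.isBoundToUser);

const bindings = Promise.all([user.viaFunction(), user.viaArrow()]);
bindings.then(([viaFunction, viaArrow]) => {
  console.log({ viaFunction, viaArrow }); // { viaFunction: true, viaArrow: false }
});
```

Only the regular-function wrapper forwards the caller's this to fn, which matters when fn is a method.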
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return function (...args) {
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    return p;
  };
};
Since fn returns a Promise, we can keep track of the progress of each call by keeping the promises they return. We keep those promises in the list pendingPromises.
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    if (pendingPromises.length >= n) {
      await Promise.race(pendingPromises);
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    return p;
  };
};
We mark the returned function as async, which lets us use await inside it. We want to execute fn only if there are fewer than n concurrent calls going on. pendingPromises contains all the promises that have not been removed yet, so we can just check pendingPromises.length to find out how many concurrent calls there are.
If pendingPromises.length >= n, we need to wait until one of the pendingPromises finishes before executing. So we added await Promise.race(pendingPromises).
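As a quick illustration of what Promise.race gives us here, the sketch below races two timers. The delay helper and its labels are hypothetical and only exist for this example.

```javascript
// Hypothetical helper: resolves with `label` after `ms` milliseconds.
const delay = (ms, label) =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

const slow = delay(50, 'slow');
const fast = delay(10, 'fast');

// Promise.race settles as soon as the first of its inputs settles,
// which is the "one of the pending calls has finished" signal we need.
const winner = Promise.race([slow, fast]);

winner.then((label) => {
  console.log(`${label} finished first`); // fast finished first
});
```

Note that Promise.race also rejects if the first promise to settle rejects, so a rejected call would surface as an error at this await.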
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    if (pendingPromises.length >= n) {
      await Promise.race(pendingPromises);
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    await p;
    pendingPromises = pendingPromises.filter(pending => pending !== p);
    return p;
  };
};
We want to remove each promise from pendingPromises once it has finished. First we execute fn, which returns p. Then we add p to pendingPromises. After this, we do await p; p will have finished once execution moves past this line. So we simply filter out p from pendingPromises.
We are almost done. Let's recap what we are doing here:
if pendingPromises.length < n
- we call fn and obtain the promise p
- push p onto pendingPromises
- wait for p to finish
- remove p from pendingPromises
- return p
if pendingPromises.length >= n, we wait until one of the pendingPromises resolves/rejects before doing the above.
There is one problem with our code, though. Let's consider the following:
const f = asyncLimit(someFunction, 1);
f(); // 1st call, someFunction returns promise p1
f(); // 2nd call, someFunction returns promise p2
f(); // 3rd call, someFunction returns promise p3
The first call goes perfectly and pendingPromises.length becomes 1.
Since pendingPromises.length >= 1, both the 2nd and 3rd calls will await Promise.race([p1]). This means that when p1 finishes, both the 2nd and 3rd calls get notified and execute someFunction concurrently.
Put simply, our code does not make the 3rd call wait until the 2nd call has finished!
We know that the 2nd call gets notified first and resumes from await Promise.race([p1]). The 2nd call executes someFunction and pushes its promise to pendingPromises, then it does await p.
As the 2nd call does await p, the 3rd call resumes from await Promise.race([p1]). And here is where the problem is: the current implementation allows the 3rd call to execute someFunction and everything that follows.
But what we want is for the 3rd call to check pendingPromises.length >= n again and do await Promise.race([p2]). To do this, we can simply change if to while.
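The difference can be observed with a small experiment. This is only a sketch: makeLimit, its recheck flag, and measurePeak are names invented here so that both variants can be run side by side, and the task is a plain timer that never rejects.

```javascript
// The limiter from above; `recheck` selects `while` (true) or `if` (false).
const makeLimit = (fn, n, recheck) => {
  let pendingPromises = [];
  return async function (...args) {
    if (recheck) {
      while (pendingPromises.length >= n) {
        await Promise.race(pendingPromises);
      }
    } else if (pendingPromises.length >= n) {
      await Promise.race(pendingPromises);
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    await p;
    pendingPromises = pendingPromises.filter((pending) => pending !== p);
    return p;
  };
};

// Sends three calls through a limit of 1 and returns the highest number
// of tasks that were running at the same time.
const measurePeak = async (recheck) => {
  let active = 0;
  let peak = 0;
  const task = async () => {
    active += 1;
    peak = Math.max(peak, active);
    await new Promise((resolve) => setTimeout(resolve, 20));
    active -= 1;
  };
  const limited = makeLimit(task, 1, recheck);
  await Promise.all([limited(), limited(), limited()]);
  return peak;
};

const peaks = Promise.all([measurePeak(false), measurePeak(true)]);
peaks.then(([withIf, withWhile]) => {
  console.log({ withIf, withWhile }); // { withIf: 2, withWhile: 1 }
});
```

With if, the 2nd and 3rd calls overlap, so the peak is 2 despite the limit of 1. With while, the 3rd call goes back to waiting and the peak stays at 1.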
So the final code would be:
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    while (pendingPromises.length >= n) {
      await Promise.race(pendingPromises).catch(() => {});
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    await p.catch(() => {});
    pendingPromises = pendingPromises.filter(pending => pending !== p);
    return p;
  };
};
Notice that I have added .catch(() => {}) to the Promise.race and await p. This is because we don't care whether the promise resolves or rejects; we just want to know when it has finished.
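The empty catch handlers only affect the limiter's own bookkeeping. Because the function still returns p itself, a rejection reaches the caller as usual. The sketch below tries this with a made-up flaky task (hypothetical, for illustration only) and assumes a Node version that has Promise.allSettled (12.9 or later).

```javascript
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    while (pendingPromises.length >= n) {
      await Promise.race(pendingPromises).catch(() => {});
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    await p.catch(() => {});
    pendingPromises = pendingPromises.filter((pending) => pending !== p);
    return p;
  };
};

// Hypothetical task: rejects for odd ids, resolves with the id for even ones.
const flaky = (id) =>
  new Promise((resolve, reject) => {
    setTimeout(() => {
      if (id % 2 === 1) {
        reject(new Error(`task ${id} failed`));
      } else {
        resolve(id);
      }
    }, 5);
  });

const limitedFlaky = asyncLimit(flaky, 2);

// allSettled lets us inspect every outcome instead of stopping at the first error.
const outcomes = Promise.allSettled([1, 2, 3, 4].map((id) => limitedFlaky(id)));
outcomes.then((results) => {
  console.log(results.map((r) => r.status));
  // [ 'rejected', 'fulfilled', 'rejected', 'fulfilled' ]
});
```

The failing calls neither block the queue nor disappear: the limiter keeps going, and each caller still sees its own rejection.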
I have published this to npm if you wish to use it. Here is the GitHub link if you want to see how I added tests for this function.
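Coming back to the file reading example from the beginning, usage could look roughly like the sketch below. The temporary directory, the 20 files, and the limit of 5 are arbitrary choices for this demonstration; it uses fs.promises.readFile instead of the promisified readFile, and fs.rmSync for cleanup assumes Node 14.14 or later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    while (pendingPromises.length >= n) {
      await Promise.race(pendingPromises).catch(() => {});
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    await p.catch(() => {});
    pendingPromises = pendingPromises.filter((pending) => pending !== p);
    return p;
  };
};

// Demo setup: write 20 small files into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'async-limit-'));
const paths = Array.from({ length: 20 }, (_, i) => {
  const file = path.join(dir, `file-${i}.txt`);
  fs.writeFileSync(file, `contents ${i}`);
  return file;
});

// At most 5 reads are in flight at any moment.
const limitedReadFile = asyncLimit((p) => fs.promises.readFile(p, 'utf8'), 5);

// Promise.all keeps the results in the same order as `paths`.
const contents = Promise.all(paths.map((p) => limitedReadFile(p)));

contents.then((texts) => {
  console.log(`read ${texts.length} files`); // read 20 files
  fs.rmSync(dir, { recursive: true, force: true }); // clean up the demo files
});
```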
What do you think? Did you follow the tutorial?
EDIT:
- removed async for asyncLimit. Thanks to @benjaminblack
Top comments (14)
I implemented this whole thing with working code; let me know if I am doing something incorrectly.
Note: I am using setTimeout for async calls so promise failure will not happen.
hey, tried your code, did not work...
so, I assigned the same delay to all of those, like 2 seconds... I expected to see it process 2 tasks, then schedule 2 more; so the result would be observable every 2 seconds, 2 tasks each time
using the corrections from @kusha and the data and function of your code it did work as expected!
hurray!
exceptional implementation to deal with REST API
Can you post your working snippet here?
hey I'm terrible with markdown, but the example below is in a lib. Basically it limits the number of active connections to an HTTP REST API service to a defined number (8 in this case); as each HTTP request is resolved, it starts another connection, always keeping 8 active connections to the API server. It completely hides the connection/handshake delay while guaranteeing no 429 errors (too many requests). From my experience it is the fastest safe approach, as long as you know the maximum number of calls per second of the API service
Great article! Some minor improvements.
This is nice! 👍👍👍
Thank you so much for this post, I learned a lot from this brilliant idea. I also added types (TypeScript) for this function in case someone needs them:
As written, asyncLimit will not resolve until the async function completes, because of await p.catch(() => {});. Instead,
But we need the async function fn to complete before resolving the async function (...args) {. Isn't it?
It's difficult to mentally follow the chain of promises here, because you have an async function (const asyncLimit = async...) which returns an async function (return async function) which itself awaits at least two promises (one in a loop) before resolving.
Note that asyncLimit does not have to be async, as it does not use await; removing async from the function signature would help comprehension a bit, since you would reduce one level of Promise-ception and stay out of limbo.
I guess it doesn't really matter, because the promise returned by asyncLimit can be used by the calling code. But attaching a finally clause to p instead of awaiting it allows the function to return immediately, instead of waiting for p to resolve.
oh, I mistyped haha! Thanks for catching this! asyncLimit shouldn't be an async function.
No, because the function returns p itself, not the chained promise. The caller can attach its own .catch() clauses to p.
As in,
Thanks!! Benjamin's reply is accurate! :)
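For readers following this thread, here is one way the finally idea might look. This is a sketch of my reading of the suggestion, not the commenter's original snippet, and measurePeak is a made-up helper used only to check the limit.

```javascript
const asyncLimit = (fn, n) => {
  let pendingPromises = [];
  return async function (...args) {
    while (pendingPromises.length >= n) {
      await Promise.race(pendingPromises).catch(() => {});
    }
    const p = fn.apply(this, args);
    pendingPromises.push(p);
    // Clean up when p settles, without awaiting it here. The empty catch keeps
    // this side chain from causing an unhandled rejection when p rejects.
    p.catch(() => {}).finally(() => {
      pendingPromises = pendingPromises.filter((pending) => pending !== p);
    });
    // The caller still receives p's own result or rejection.
    return p;
  };
};

// Sends five timer tasks through a limit of 2 and returns the highest
// number of tasks that were running at the same time.
const measurePeak = async () => {
  let active = 0;
  let highest = 0;
  const task = async () => {
    active += 1;
    highest = Math.max(highest, active);
    await new Promise((resolve) => setTimeout(resolve, 10));
    active -= 1;
  };
  const limited = asyncLimit(task, 2);
  await Promise.all([1, 2, 3, 4, 5].map(() => limited()));
  return highest;
};

const peak = measurePeak();
peak.then((value) => console.log(`peak concurrency: ${value}`)); // peak concurrency: 2
```

The limit holds because the length check and the push happen in the same synchronous stretch of code, with no await in between.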