Aleksei Revin
Request coalescing

TBH, you don’t get that excited about challenges these days. But when you do, you get properly involved.

This time, it was a BlueSky developer article about request coalescing.

The problem, in short, is the following: once you release some popular content on your platform, the time it takes for the cache to warm up can be long enough to drag both your DB and your web servers down into the mud.

Let me quickly demonstrate that:

let cache = new Map<string, string>();

// Just a fake db
// takes the `key` on `get` and returns a promise that
// resolves to `${key}: handled` after 100 ms
// also has a `getCounter` method that returns the number
// of times `get` was called
const db = (() => {
  let counter = 0;
  const getCounter = () => counter;
  const get = async (key: string) => {
    counter += 1;
    return new Promise<string>((resolve) => {
      setTimeout(() => resolve(`${key}: handled`), 100);
    });
  };
  return Object.freeze({
    get,
    getCounter,
  });
})();

// just a fake route handler
// checks if path is in cache, and if 
// not - triggers db.get and stores in cache
const handleUrl = async (path: string) => {
  if (cache.has(path)) {
    return cache.get(path);
  }

  const r = await db.get(path);
  cache.set(path, r);
  return r;
};

// just a fake server
const server = () => ({
  handle: async (path: string) => handleUrl(path),
});

// test runner
(async () => {
  const s = server();
  const startTime = Date.now();
  // simulating concurrent requests to the server
  const results = await Promise.all([
    ...Array(10)
      .fill(undefined)
      .map(() => s.handle("url")),
    ...Array(10)
      .fill(undefined)
      .map(() => s.handle("url-2")),
  ]);
  const endTime = Date.now();
  console.log(results);
  console.log("db counter: ", db.getCounter());
  console.log(`Execution time: ${endTime - startTime} ms`);
})();

The result of that call would be:

npx tsx benchmark_unoptimized.ts
[
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url-2: handled', 'url-2: handled',
  'url-2: handled', 'url-2: handled',
  'url-2: handled', 'url-2: handled',
  'url-2: handled', 'url-2: handled',
  'url-2: handled', 'url-2: handled'
]
db counter:  20
Execution time: 101 ms

As you can see, before the result was stored in the cache, we had already triggered 20 database calls for 20 requests, all within the first 100 ms. Now imagine some rap star posting to 5 million followers at once, with at least 500,000 of them opening the post as soon as the push notification arrives.

The author of the original blog post tackles this problem with Go and channels, essentially subscribing all listeners to a queue while they wait for the initial database request to resolve.

Not bad! But can we do that in JavaScript? Even easier:

let cache = new Map<string, string>();

// const db = ...; // same fake db as in the first example

// this is where we pass the Server context to map
// different requests to the same handler
const handleUrl = async (path: string, ctx: Map<string, CtxHandler>) => {
  if (cache.has(path)) {
    return cache.get(path);
  }

  // just defaulting to throw to make it TS consistent
  const { process } = ctx.get(path) ?? {
    process: () => {
      throw new Error("Handler not found");
    },
  };
  // this is where we wrap the call to the DB
  // with a deduping handler
  const r = await process(() => db.get(path));
  cache.set(path, r);
  return r;
};

// the deduping handler expects a function returning a Promise of `string`
// and itself returns a Promise of `string`
type CtxHandler = {
  process: (fn: () => Promise<string>) => Promise<string>;
};

// This is where we define a Demo context handler
// which is assigned per-route if needed
const ctx = () => {
  // store is a map of path -> CtxHandler:
  // it holds the deduplicating handler for each path.
  // Once the deduplicator has processed the path,
  // it removes the handler from the store - 
  // from now on we don't need it, as the value lives in cache.
  // Once the cache expires, the handler will be added 
  // back to the store on request
  const store = new Map<string, CtxHandler>();
  return {
    store,
    add: (path: string) => {
      store.set(
        path,
        (() => {
          let _q: Promise<string> | undefined;
          // deduplicator is very simple:
          //   if there's no active promise, create one 
          //   and store it in `_q`
          //   all subsequent calls will return the same promise
          //   once the promise resolves, remove it from the store
          return {
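            process: async (fn) => {
              if (!_q) {
                _q = fn();
              }
              // if `_q` is defined, we just forward it
              try {
                return await _q;
              } finally {
                // clean up even if the DB call rejects, so a failed
                // promise is not replayed to every later request
                _q = undefined;
                store.delete(path);
              }
            },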
          };
        })()
      );
    },
  };
};

// Here we initialize the server with a Context,
// kept deliberately simple

const server = (c = ctx()) => ({
  handle: async (path: string) => {
    // if `path` is not in cache and not in store, add it to store
    if (!cache.has(path) && !c.store.has(path)) {
      c.add(path);
    }
    return handleUrl(path, c.store);
  },
});

// test runner
(async () => {
  const s = server();
  const startTime = Date.now();
  const results = await Promise.all([
    ...Array(10)
      .fill(undefined)
      .map(() => s.handle("url")),
    ...Array(10)
      .fill(undefined)
      .map(() => s.handle("url_2")),
  ]);
  const endTime = Date.now();
  console.log(results);
  console.log("db counter: ", db.getCounter());
  console.log(`Execution time: ${endTime - startTime} ms`);
})();


As a result, we get:

 npx tsx benchmark.ts
[
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url: handled',   'url: handled',
  'url_2: handled', 'url_2: handled',
  'url_2: handled', 'url_2: handled',
  'url_2: handled', 'url_2: handled',
  'url_2: handled', 'url_2: handled',
  'url_2: handled', 'url_2: handled'
]
db counter:  2
Execution time: 101 ms

So, we hit the DB only twice: once for each of the 2 paths!
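The same idea can be compressed into a small generic helper. The sketch below is a minimal variant, not the code from the original article: `coalesce`, `inFlight`, `fakeLoad`, and `demo` are hypothetical names introduced here for illustration. It keeps one in-flight promise per key and clears it in `finally`, so a rejected load is not replayed to later callers.

```typescript
// Hypothetical helper: one shared in-flight promise per key.
const inFlight = new Map<string, Promise<string>>();

const coalesce = (
  key: string,
  load: () => Promise<string>
): Promise<string> => {
  const pending = inFlight.get(key);
  // A load for this key is already running: join it instead of starting another.
  if (pending) return pending;

  // First caller for this key: start the load and drop the entry once it
  // settles, whether it resolved or rejected.
  const started = load().finally(() => inFlight.delete(key));
  inFlight.set(key, started);
  return started;
};

// Hypothetical stand-in for a slow data source; counts how often it is hit.
let calls = 0;
const fakeLoad = async (key: string): Promise<string> => {
  calls += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `${key}: handled`;
};

// 10 concurrent requests for each of two keys.
const demo = async () => {
  const results = await Promise.all([
    ...Array.from({ length: 10 }, () => coalesce("url", () => fakeLoad("url"))),
    ...Array.from({ length: 10 }, () => coalesce("url_2", () => fakeLoad("url_2"))),
  ]);
  return { results, calls };
};
```

Calling `demo()` issues 20 concurrent requests, but `fakeLoad` should only run once per key. A real handler would still check the cache first and write the resolved value into it, as in the examples above; the helper only covers the window while the cache is cold.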