Mock all you want: supporting ES modules in the Testdouble.js mocking library

Gil Tayar (giltayar) ・15 min read

ES modules are a new way of using modules in JavaScript. Having ES modules (ESM) in Node.js means that you can now write:

import fs from 'fs'
import {doSomething} from './mylib.mjs'

instead of using the classic CommonJS (CJS) modules:

const fs = require('fs')
const {doSomething} = require('./mylib.js')

If you want to learn more about the whys and the hows (and are maybe wondering about that .mjs extension...), see my Node.TLV talk about ESM in Node.js.

This blog post, however, is not about Node.js ES modules, but about how I went about retrofitting my favorite mocking framework to support ES modules. The experience itself was great (and so was the encouragement from Justin Searls (@searls)), but I want to talk about the more technical aspects of how to build a mocking library to support ES modules in Node.js.

So buckle your seatbelts. It's a long and deeply technical ride.

Testdouble.js

Testdouble.js is a fabulous mocking library. It can mock any function, method, or module. So mocking a CommonJS module would go something like this:

const td = require('testdouble')

const {doSomething} = td.replace('../src/mylib')

td.when(doSomething()).thenReturn('done')

Which would mean that app code that does this:

const {doSomething} = require('./mylib')

console.log(doSomething())

Would print done to the console. And, lastly, if we call:

td.reset()

Then the above app code will call the original mylib.js, and not the mocked version. Note also that calling td.replace multiple times with different mocks replaces the original module multiple times.

Simple, clear, and to the point. Really nice mocking framework! Unfortunately, it only supports CommonJS modules.

How Testdouble.js works in CommonJS

Testdouble uses a technique that is used by all the various mocking libraries, auto-transpiler libraries (think babel-register or ts-node), and others. They monkey-patch Node.js' module loading code.

Specifically, Testdouble.js overrides Module._load and inserts its own loading mechanism, so that if a CommonJS module needs mocking (because it was td.replace-ed), it loads the mocked code instead of the original module's code. And, obviously, if the module doesn't need mocking, it calls the original Module._load.

An important thing to remember (and this matters when I talk about ESM support) is that Module._load is called only when Node.js needs to load the module's source code. If the module was already loaded, and is in the cache, then it won't be called, and the mocking won't work. This is why Testdouble.js always deletes a mocked module from the cache immediately after creating it, so that the tests are able to call td.replace as many times as they want to change the mock.

Till now, I've always said that it is Testdouble.js that does the mocking, but that is not strictly true. Testdouble.js uses another package, quibble, that does all the "dirty work" of replacing a module for it. Quibble does only module replacement, and so its API is pretty simple, and much simpler than Testdouble.js':

const quibble = require('quibble')

quibble('./mylib', {doSomething: () => 'done'})

When mocking a module, you specify the path to the module, plus the replacement you want for the module.exports of that module. The above code is equivalent to the testdouble code we showed earlier.

Kudos to Justin Searls for splitting out the module replacement code to a separate package. It made adding ESM support much easier, as most of the work needed to be done in Quibble, separated from the noise of a general purpose mocking library.

Why do we even need ES module support

But, but, but (I hear you saying), why do we even need explicit ESM support? Won't the Module._load monkey patching (or any other various monkey-patching tricks around require) work with ES modules?

The answer is an emphatic "no". For two reasons.

The first is simple: When importing ES modules (using import), Node.js does not go through the same code paths that load CommonJS modules (using require). So monkey patching Module._load won't work because it just isn't called!

Second, and more importantly: the designers and implementors of ES Module support in Node.js designed it in such a way that monkey-patching is not supported. To accommodate code that does need to hook into the module loading, there is an official way to hook into it, and it is the only way to affect how ES modules are loaded in Node.js.

Hooking into the ES Module loading mechanism

So how does one hook into the ES module loading mechanism? One word: loaders. This is the official API that enables us to hook into the ES module loading mechanism. How does one go about using it?

It's actually pretty easy and straightforward. First, you write a module (has to be ESM!) that exports various hook functions. For example, the following loader module adds a console.log("loaded") to all modules:

// my-loader.mjs
export async function transformSource(source,
                                      context,
                                      defaultTransformSource) {
  const { url } = context;

  // The default hook is async and resolves to an object of the form {source}.
  const { source: originalSource } = await defaultTransformSource(source, context, defaultTransformSource);

  return {source: `${originalSource};\nconsole.log('loaded ${url}');`}
}

Node.js calls this loader module's transformSource function (note that it is exported by this module, so Node.js can easily import the module and call the function) whenever it has loaded the source, enabling the loader to transform the source. A TypeScript transpiler, for example, could easily use this hook to transform the source from TypeScript to JavaScript.

But how does Node.js know about this loader module? By us adding it to the Node command line:

node --loader=./my-loader.mjs

There is no API to load a loader: the only way to load a loader is via the command-line. (Will this change? Doesn't seem likely.)

Note: ES module loaders are an experimental mechanism, and some parts of them are bound to change. The information about loaders here is relevant to May 2020.

So now that we know how to hook into the ES module loading mechanism, we can start understanding how we implemented module replacement in Quibble. Oh, but one last thing! We saw above that we need to enable multiple replacements, and the ability to reset. In the CommonJS implementation of Quibble, this was done by deleting the cache entry for the module whenever we replaced it with a mock, so that Node.js always calls Module._load. Unfortunately, this won't work in ES modules because there is no way to clear the ESM cache, as it is separate from the CJS one, and not exposed by Node.js. So how do we do it for ESM? Patience, patience...

How to use the Quibble ESM support

But before we explain how it works, let's see how to use it. As you will see, it is very similar to Quibble CJS support. Let's assume we have a module:

// mylib.mjs
export function doSomething() {
  return 'did something'
}

export default 'doing'

This module has one "named export" (doSomething), and one "default export" (the value 'doing'). In ESM, these are separate, unlike in CJS.

First, to replace a module, use quibble.esm(...):

await quibble.esm('./mylib.mjs', {doSomething: () => 'done'}, 'yabadabadoing')

Why await? We'll see why when we discuss implementation, but intuitively, it makes sense, given that ESM is an asynchronous module system (to understand the why, I again refer you to the YouTube video above that discusses the why and how of ESM), whereas CJS is synchronous.

To "reset" all ESM modules back to their original modules, we use:

quibble.reset()

Besides these two functions, there's a third function, used by testdouble.js (for reasons we won't get into in this blog post):

const {module, modulePath} = await quibble.esmImportWithPath('./mylib.mjs')

This returns the module mentioned (just like await import('./mylib.mjs') does), and the full path to the module file.

That's it. That's the Quibble ESM API. The next sections explain how it works.

ESM replacement in Quibble

[Figure: Architecture of ESM support in Quibble]

As you can see, quibble has three separate parts:

  • The store, which is stored globally in global.__quibble, and stores all the mocking information.
  • The API, quibble.js, which updates the store with the mocks based on calls to quibble.esm() and quibble.reset().
  • The module loader, quibble.mjs, which implements the mocking based on the data written to store. This file is the loader specified in node --loader=....

Let's start explaining the Quibble ESM architecture, by explaining each part one by one. I usually like to start with the data model, so let's start with that:

The Store (global.__quibble)

The store, which is available in global.__quibble, has the following properties:

  • The important property is quibbledModules, which is a Map from the absolute path of the module to the mocks for the named and default exports. When you're doing quibble.esm(modulePath, namedExportsReplacement, defaultExportReplacement), you're basically doing global.__quibble.quibbledModules.set(absoluteModulePath, {namedExportsReplacement, defaultExportReplacement})

  • But the more interesting property is stubModuleGeneration: a number that starts at 1 and is incremented whenever quibble.esm finds no quibbledModules map and has to create one (for example, on the first call after a quibble.reset). Remember that we can't delete modules in ESM? This property enables us to have multiple "generations" (versions) of the same module in memory, and use only the latest one. How? We'll see later.

The API (quibble.esm/reset/esmImportWithPath(...))

This is also pretty simple. Let's start by looking at the code, block by block. You can follow here, and also try and follow from this flowchart that expresses most of the details from here:

[Figure: quibble.esm flowchart]

quibble.esm = async function (importPath, namedExportStubs, defaultExportStub) {
  checkThatLoaderIsLoaded()

The signature we've already explained. The first line of the function checks that the loader is loaded. How? It checks that there's a global.__quibble. If not, it throws an exception. Good DX, but not very interesting code-wise. Let's continue:

  if (!global.__quibble.quibbledModules) {
    global.__quibble.quibbledModules = new Map()
    ++global.__quibble.stubModuleGeneration
  }

We'll see later that quibble.reset deletes the quibbledModules (because no more mocking needed, right?), so this restores it, and increments the generation (I promise we'll see what this generation thing is for when we get to the module loader!).

I want to skip ahead to the last lines, which are the important ones:

global.__quibble.quibbledModules.set(fullModulePath, {
  defaultExportStub,
  namedExportStubs
})

When we talked about the store, we said that this is the crux of quibble.esm: writing the mocks to the store. Well, these are the lines that do it! So why all the rest of the lines? They're there for one reason: figuring out the fullModulePath. How do we do that?

Well, it depends. The "import path", which is what the user puts in quibble.esm('./mylib.mjs') can be one of three things, and the absolute path is figured out based on this:

  • An absolute path. This can theoretically happen, but is not very practical. In this case, we just use the path as the fullModulePath!
  • A relative path. The path is relative to the caller file (the file that called quibble.esm), so we need to figure out the absolute path of the caller file. This is done in hackErrorStackToGetCallerFile(), and I won't go into the details, because it's the same hack that is used in CJS: create an Error and retrieve the stack from that. I just modified it a bit: the stack when the module is ESM may have URLs and not file paths, because ESM is URL-based. Once we have the caller file, we can absolutize the relative path to get the absolute path.
  • A bare specifier. In ESM parlance, a bare-specifier is something that is not a path, but is supposed to be a package in node_modules. Examples: lodash, uuid/v4, fs. This is the more difficult one, because to figure out which module file Node.js loads for the package, we need to duplicate the same algorithm that Node.js uses to figure it out. And that is a problematic thing, especially in ES modules, where we need to take care of things like the conditional exports. I really wanted to avoid it. So I had a trick up my sleeve, which we'll see in a second when we look at the code.
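The error-stack trick from the relative-path case can be sketched as follows. This is an illustration and not the body of hackErrorStackToGetCallerFile: the helper names getCallerFile, toFilePath, and absolutize are hypothetical, and the sketch relies on V8's Error.prepareStackTrace hook, which is specific to V8-based runtimes such as Node.js.

```javascript
const path = require('path')
const { fileURLToPath } = require('url')

// Hypothetical helper: report the file (or file:// URL) of the code that called it.
function getCallerFile () {
  const originalPrepare = Error.prepareStackTrace
  try {
    // Ask V8 for structured call sites instead of a formatted string.
    Error.prepareStackTrace = (_error, callSites) => callSites
    const callSites = new Error().stack
    // callSites[0] is getCallerFile itself; callSites[1] is the code that called it.
    return callSites[1].getFileName()
  } finally {
    Error.prepareStackTrace = originalPrepare
  }
}

// ESM frames may report file:// URLs while CJS frames report plain paths, so normalize.
function toFilePath (fileNameOrUrl) {
  return fileNameOrUrl.startsWith('file://') ? fileURLToPath(fileNameOrUrl) : fileNameOrUrl
}

// Turn a relative import path into an absolute one, relative to the caller's file.
function absolutize (importPath, callerFile) {
  return path.resolve(path.dirname(toFilePath(callerFile)), importPath)
}

console.log(getCallerFile())
console.log(absolutize('./mylib.mjs', '/project/test/mylib.test.js'))
```

A real implementation also has to skip the stack frames that belong to the mocking library itself, so that it finds the test file rather than its own source file. The sketch leaves that part out.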

So let's look at the code:

  const importPathIsBareSpecifier = isBareSpecifier(importPath)
  const isAbsolutePath = path.isAbsolute(importPath)
  const callerFile = isAbsolutePath || importPathIsBareSpecifier ? undefined : hackErrorStackToGetCallerFile()

  const fullModulePath = importPathIsBareSpecifier
    ? await importFunctionsModule.dummyImportModuleToGetAtPath(importPath)
    : isAbsolutePath
      ? importPath
      : path.resolve(path.dirname(callerFile), importPath)

The first two lines figure out which kind of module this is. The third line figures out the caller file if the module path is relative.
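The isBareSpecifier helper is not shown in this post. A plausible version, written here as an assumption and not as Quibble's actual implementation, treats anything that is not a relative path, an absolute path, or a URL as a bare specifier. The name looksLikeBareSpecifier is made up for this sketch.

```javascript
const path = require('path')

// Assumed logic (not copied from Quibble): a bare specifier names a package or a
// built-in module, so it is neither relative, nor absolute, nor a scheme:// URL.
function looksLikeBareSpecifier (importPath) {
  const isRelative = /^\.\.?(\/|$)/.test(importPath) // '.', '..', './x', '../x'
  const isUrl = /^[a-z][a-z\d+.-]*:\/\//i.test(importPath) // 'file://...', 'https://...'
  return !isRelative && !isUrl && !path.isAbsolute(importPath)
}

for (const importPath of ['lodash', 'uuid/v4', 'fs', './mylib.mjs', '/abs/mylib.mjs']) {
  console.log(importPath, looksLikeBareSpecifier(importPath))
}
```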

The last lines generate the module path. The most interesting one is what we do when the import path is a bare specifier. Let's look at dummyImportModuleToGetAtPath, which is used to get the absolute path to the bare specifier module file:

async function dummyImportModuleToGetAtPath (modulePath) {
  try {
    await import(modulePath + '?__quibbleresolvepath')
  } catch (error) {
    if (error.code === 'QUIBBLE_RESOLVED_PATH') {
      return error.resolvedPath
    } else {
      throw error
    }
  }

  throw new Error(
    'Node.js is not running with the Quibble loader. Run node with "--loader=quibble"'
  )
}

This is interesting. We import the bare specifier, but add a ?__quibbleresolvepath to it. What? How does that help? Remember: we have a loader running, and that loader (as we'll see later), will catch requests for a module, notice the __quibbleresolvepath, figure out the module path (we'll see how later), and throw an exception with the module path, which this code catches.

Sneaky!

There. We've covered how quibble.esm(...) works. quibble.reset is MUCH simpler:

quibble.reset = function () {
  delete global.__quibble.quibbledModules
}

That's it (it has stuff for CJS, but we're ignoring that). We're just deleting quibbledModules so that the loader will know that there are no replacements to do, and that it should return all the original modules.

The last one is quibble.esmImportWithPath, and we won't describe the implementation, because it's mostly similar to quibble.esm, except for one line:

await import(fullImportPath + '?__quibbleoriginal')

After determining the full import path (in exactly the same way done by quibble.esm) it import-s the module, but adds ?__quibbleoriginal to it. The loader will see this "signal" and know that even if the module is quibbled, it should load the original module this time.

Notice the repeated use of query parameters in the code. This is a recurring theme, and we'll see it used in one more place, the most important one.

The Module Loader (quibble.mjs)

We finally come to the module you've all been waiting for: the module loader. To remind you, this is the module we specify when we run node: node --loader=quibble, and Node.js will call it in various phases of loading the module. Each such "phase" is a call to a different named export function. We will concern ourselves with two interesting hook functions:

  • resolve(specifier, {parentURL}, defaultResolve): an async function that (and this is important) Node.js will call even if the module is in the cache. It will do this to determine what the full path to the module is, given the specifier (what we called the "import path" above), and the parentURL (what we called "caller file" above). The important thing to understand about this function is that the resulting URL is the cache key of the module.

  • getSource(url, context, defaultGetSource): an async function that retrieves the source of the module, in case the module is not in the cache. The defaultGetSource just reads the file from the disk, but our implementation will return some artificially produced source if the module needs to be mocked. The important thing to understand about this function is that the URL it receives is the URL returned by the resolve hook.

But what are these URLs we're constantly talking about? Why are we dealing with http URLs and not file paths? The answer is simple: the ES modules specification in JavaScript says that module paths are URLs and not file paths. They could be http://... URLs or file://... URLs or whatever conforms to the URI spec. Node.js currently supports only file://... URLs, but we could easily write a loader that supports loading from HTTP. Node.js keeps the URLs, and translates them to a file path on the disk (using new URL(url).pathname) only when actually reading the source file.
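A short illustration of why URLs are convenient here: a query parameter changes the URL string without changing the path it points at. The file path below is made up for the example.

```javascript
// URL is a global in Node.js, so no import is needed.
const original = new URL('file:///project/src/mylib.mjs')
const decorated = new URL('file:///project/src/mylib.mjs?__quibble=2')

// The full URL strings differ...
console.log(original.href === decorated.href) // false

// ...but the path component is the same, so both point at the same file.
console.log(original.pathname === decorated.pathname) // true

// The query string carries the extra information (as a string).
console.log(decorated.searchParams.get('__quibble') === '2') // true
```

For code that needs a real operating-system path, fileURLToPath from the url module is usually a safer choice than reading pathname directly, because it also handles Windows drive letters and percent-encoded characters.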

Let's start going over the code of each hook function. You can follow here.

resolve(specifier, {parentURL}, defaultResolve)

We first prepare an inner function that will be used in other parts of this function:

const resolve = () => defaultResolve(
  specifier.includes('__quibble')
    ? specifier.replace('?__quibbleresolvepath', '').replace('?__quibbleoriginal', '')
    : specifier,
  context
)

This function, when called, will call the default resolver to get the default URL for the module. The nice thing about this, is that if the specifier ("import path") is a bare-specifier, then it will resolve the full module path for us! We have to remove the query parameters, because bare specifiers aren't really URLs, so query parameters aren't allowed. The fact that we can let Node.js resolve a specifier for us is why we use it in the next lines:

if (specifier.includes('__quibbleresolvepath')) {
  const resolvedPath = new URL(resolve().url).pathname
  const error = new Error()
  error.code = 'QUIBBLE_RESOLVED_PATH'
  error.resolvedPath = resolvedPath
  throw error
}

Remember when explaining quibble.esm we appended ?__quibbleresolvepath to get at the full module path? This is where it's used. We throw an exception here, and attach all the information to the error, so that quibble.esm can use it.

Sneaky! But let's continue:

  if (!global.__quibble.quibbledModules || specifier.includes('__quibbleoriginal')) {
    return resolve()
  }

We default to the default resolver in two cases: there are no quibbled modules (because quibble.reset was called), or because quibble.esmImportWithPath imported the path with an additional ?__quibbleoriginal (see above for the reason why). Let's continue:

const {url} = resolve()
if (url.startsWith('nodejs:')) {
  return {url}
}

We now resolve the specifier. If the module is an internal module (e.g. fs, dns) then the URL has a nodejs scheme, and we don't need to do anything, just return what was resolved.

All the above was just setting the stage. Now come the important lines:

    return { url: `${url}?__quibble=${global.__quibble.stubModuleGeneration}` }

We "decorate" the URL with a ?__quibble with the generation. This decoration will notify getSource, that gets this URL, to return a mocked source, and not the original source. This also allows the original module to have a regular URL (without __quibble) and the mocked one a "decorated" URL (with __quibble). This is more important than it seems, because it enables both versions of the module to reside in memory. How? Remember that the cache key for the module is the full URL returned by the resolve hook. So if the URLs differ by a query parameter, then both versions of the module (the original and the mocked) reside in the cache.

And because the resolve hook is called before checking the cache, then that means we can route Node.js to whatever version of the module we want, based on whether it needs to be mocked or not, and this can change on the fly.

Sneaky!

But why do we append the generation? Why not just __quibble? Similar to the above, this allows us to generate a different version of the mock every time we need it. And because we can quibble.reset and then quibble.esm a different mock module, then we will need a different cache key for the new version of the mock module. This is the reason for the mock generation.

Sneaky!
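The interplay between the resolve hook and the module cache can be simulated without a real loader. Everything below is a toy model under stated assumptions: fakeCache, toyResolve, toyImport, toyQuibble, and toyReset are hypothetical stand-ins for Node.js and Quibble internals, and a "module" here is just a string saying which version was loaded.

```javascript
// Toy stand-ins (assumptions for illustration, not Node.js or Quibble internals).
const store = { stubModuleGeneration: 1, quibbledModules: undefined }
const fakeCache = new Map() // cache key (resolved URL) -> "loaded module"
let loadCount = 0

// Mimics the resolve hook: decorate the URL only while mocks are active.
function toyResolve (url) {
  return store.quibbledModules
    ? `${url}?__quibble=${store.stubModuleGeneration}`
    : url
}

// Mimics import(): resolving always runs, loading happens only on a cache miss.
function toyImport (url) {
  const key = toyResolve(url)
  if (!fakeCache.has(key)) {
    loadCount++
    fakeCache.set(key, key.includes('__quibble') ? `mock of ${key}` : `original ${key}`)
  }
  return fakeCache.get(key)
}

// Mimics the store handling shown earlier for quibble.esm.
function toyQuibble () {
  if (!store.quibbledModules) {
    store.quibbledModules = new Map()
    ++store.stubModuleGeneration
  }
}

// Mimics quibble.reset.
function toyReset () {
  delete store.quibbledModules
}

const url = 'file:///project/src/mylib.mjs'
const results = []
results.push(toyImport(url)) // original, loaded and cached
toyQuibble()
results.push(toyImport(url)) // mock under the generation 2 key
toyReset()
results.push(toyImport(url)) // original again, served from the cache
toyQuibble()
results.push(toyImport(url)) // mock under a new generation 3 key
console.log(results, loadCount)
```

Nothing is ever removed from the toy cache. Old mock generations simply stop being reachable, which is the trade-off this design accepts in exchange for not needing a way to clear the ESM cache.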

And so we reach the end of our journey, with the last hook, the one that actually returns the mocked module:

getSource (url, context, defaultGetSource)

As in resolve, we define a function to get the default source:

  const source = () => defaultGetSource(url, context, defaultGetSource)

Now we check whether quibble.reset was called, and so we can return the original source:

if (!global.__quibble.quibbledModules) {
  return source()
}

And here we check that we need to quibble the module, and if we do, we call transformModuleSource(stubsInfo):

const shouldBeQuibbled = new URL(url).searchParams.get('__quibble')

if (!shouldBeQuibbled) {
  return source()
} else {
  const stubsInfo = getStubsInfo(url) // find the stubs in global.__quibble.quibbledModules

  return stubsInfo ? { source: transformModuleSource(stubsInfo) } : source()
}
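The getStubsInfo helper is not shown in this post. Since transformModuleSource destructures its argument as [moduleKey, stubs], one plausible shape (an assumption, not Quibble's actual code) is a lookup that reduces the URL to its path and returns the matching Map entry. The name findStubsInfo and the explicit quibbledModules parameter are made up here so that the sketch does not depend on a global.

```javascript
// Assumed implementation for illustration; the real getStubsInfo may differ.
function findStubsInfo (moduleUrl, quibbledModules) {
  if (!quibbledModules) return undefined
  // Dropping the query string (?__quibble=N) leaves the path used as the Map key.
  const modulePath = new URL(moduleUrl).pathname
  // Map entries are [key, value] pairs, that is, [moduleKey, stubs].
  return [...quibbledModules.entries()].find(([key]) => key === modulePath)
}

const quibbledModules = new Map([
  ['/project/src/mylib.mjs', {
    namedExportStubs: { doSomething: () => 'done' },
    defaultExportStub: 'yabadabadoing'
  }]
])

const found = findStubsInfo('file:///project/src/mylib.mjs?__quibble=2', quibbledModules)
console.log(found && found[0])
console.log(findStubsInfo('file:///project/src/other.mjs?__quibble=2', quibbledModules))
```

Returning undefined for an unknown module lets the caller fall back to the original source, which matches the stubsInfo check in the snippet above.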

And, now, here it is, in all its glory: the mocked module code generation:

function transformModuleSource ([moduleKey, stubs]) {
  return `
${Object.keys(stubs.namedExportStubs || {})
  .map(
    (name) =>
      `export let ${name} = global.__quibble.quibbledModules.get(${JSON.stringify(
        moduleKey
      )}).namedExportStubs["${name}"]`
  )
  .join(';\n')};
${
  stubs.defaultExportStub
    ? `export default global.__quibble.quibbledModules.get(${JSON.stringify(
        moduleKey
      )}).defaultExportStub;`
    : ''
}
`
}

What do we do here? This is a code generator that generates a named export for each of the mocked named exports. The value of the named export comes from the store, which the generated code accesses. Same goes for the default export.
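To see what the generator produces, the snippet below repeats the function from above (so that it runs on its own) and applies it to a made-up module path and made-up stubs.

```javascript
// Repeated from above so that this snippet is self-contained.
function transformModuleSource ([moduleKey, stubs]) {
  return `
${Object.keys(stubs.namedExportStubs || {})
  .map(
    (name) =>
      `export let ${name} = global.__quibble.quibbledModules.get(${JSON.stringify(
        moduleKey
      )}).namedExportStubs["${name}"]`
  )
  .join(';\n')};
${
  stubs.defaultExportStub
    ? `export default global.__quibble.quibbledModules.get(${JSON.stringify(
        moduleKey
      )}).defaultExportStub;`
    : ''
}
`
}

// Made-up input: one named export and one default export.
const generated = transformModuleSource([
  '/project/src/mylib.mjs',
  { namedExportStubs: { doSomething: () => 'done' }, defaultExportStub: 'yabadabadoing' }
])

// Prints the source text of the mocked module: one "export let" line per named
// stub, followed by an "export default" line.
console.log(generated)
```

Note that the generated source never contains the stub values themselves, only lookups into the store, which is why functions and other non-serializable values work as stubs. One detail visible in the code: because the default export is guarded by a truthiness check, a falsy default stub such as 0 or an empty string would produce no export default line.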

And the journey is done.

Summary

We covered a lot here. But it's actually more complicated than it seems. Let's try and summarize the important things to remember:

  • The store (global.__quibble) holds all the mocks per each mocked module's absolute module path.
  • The API stores the information in the store. Since it needs the full module path, it makes use of the fact that the resolver can return the module path of bare specifiers (by adding a query parameter to signal this), to do just that.
  • The module loader's resolve hook checks for signals from the API that tell it to resolve the module path using the default resolver. It also adds __quibble for the getSource hook to tell it that it needs to return the source of the mocked module.
  • The __quibble query parameter has a "generation" number added to it to enable multiple versions of the mock to be used and discarded.
  • The getSource hook looks at the __quibble parameter to determine whether to return the original source or whether to return the code of the mocked module.
  • The mocked module source code exports named and default exports, whose values come from the global store.

The future

How fragile is this? What are the odds that some change renders the design above obsolete? I don't know really, but the above hooks have been stable for a pretty long time (minor changes notwithstanding), so I'm pretty confident that I'll be able to navigate Quibble and Testdouble.js through changes in loaders.

There is one change on the horizon, however, that is somewhat worrying:

WIP: Move ESM loaders to worker thread #31229

Posted by bmeck:

This has some widespread implications:

  • dynamicInstantiate no longer exists since there is no 1st class references between loaders and the thread they are operating on
  • only 1 shared loader is spawned for all the threads it affects, unlike currently where node spins up a new loader on each thread
  • data is done by passing messages which are serialized
  • loaders can no longer be affected by mutated globals from non-loader code

This roughly follows some of the older design docs and discussions from @nodejs/modules .

This does not seek to allow having multiple user specified loaders, nor is it seeking to change the loader API signatures, it is purely about moving them off thread and the implications of such.

This does introduce a new type of Worker for loading an internal entry point and also expands the worker_threads API for convenience by allowing a transferList in the workerData to avoid extraneous postMessages.

This will need quite a large writeup on how it works and how data is transferred but this seems a good point to start discussions.

If implemented, this change will move the loaders to a worker thread. In general, this is a good thing, but it also means that the way the API and the module loader communicate today—through the global scope—will not work, and we will need a way to communicate the stubs and other things between the API and the loader. I am certain that if this PR is fully implemented, a way to do this will be given.

Thanks

I'd like to thank Justin Searls (@searls) for his encouragement and quickness in accepting the PRs. (Not to mention patience at my frequent zigzags in the code!)
