How to Build a Module Bundler: Complete Guide to 8 Core Techniques with Code Examples



Let's walk through how module bundlers work. I want to show you the important parts by building a simple one together. Think of it like taking a box of separate Lego pieces and connecting them into a single, solid model ready for display. We will focus on eight key methods.

We start with the entry point, usually a file like index.js. Our first job is to find it. The module resolver is our file system detective. It takes a request like ./utils and figures out the exact location of the file on your computer.

The code begins by checking if the path is relative. It joins the path with the directory of the file that asked for it. For a non-relative request like lodash, it has a different job. It walks up the folder tree, looking for a node_modules folder.

When it finds node_modules/lodash, it doesn't stop. It looks inside for a package.json file. This file tells it which file is the main entry, like lib/index.js. If that file doesn't exist, it tries index.js as a backup. It also needs to try adding extensions like .js or .ts if the path doesn't have one.

// A simplified look at the resolution logic (a method on our resolver class)
// Assumes: const path = require('path');
async resolve(request, fromDir) {
  // Is it a direct path?
  if (request.startsWith('.')) {
    let candidate = path.resolve(fromDir, request);
    // Does it have an extension? If not, add one.
    if (!path.extname(candidate)) {
      candidate = await this.tryWithExtensions(candidate);
    }
    return candidate;
  }

  // It's a package name, look in node_modules
  let dir = fromDir;
  while (true) {
    const pkgPath = path.join(dir, 'node_modules', request);
    if (await this.exists(pkgPath)) {
      // Found the package folder, now find its main file
      return this.resolvePackageMain(pkgPath);
    }
    // Go up one directory and check again
    const parent = path.dirname(dir);
    if (parent === dir) break; // We just checked the filesystem root
    dir = parent;
  }
  throw new Error(`Module not found: ${request}`);
}
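
The resolver above leans on three helpers it never shows. Here is a minimal sketch of what they might look like, written as standalone functions for brevity (on a resolver class they would be methods). The extension list and the error messages are assumptions for illustration, not a fixed convention.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Assumed list of extensions to try, in order of preference
const EXTENSIONS = ['.js', '.ts', '.json'];

// True if the path can be accessed on disk
async function exists(file) {
  try {
    await fs.access(file);
    return true;
  } catch {
    return false;
  }
}

// Append each known extension until one matches a real file
async function tryWithExtensions(candidate) {
  for (const ext of EXTENSIONS) {
    if (await exists(candidate + ext)) return candidate + ext;
  }
  throw new Error(`Module not found: ${candidate}`);
}

// Read package.json, follow "main", and fall back to index.js
async function resolvePackageMain(pkgDir) {
  let main = 'index.js';
  try {
    const raw = await fs.readFile(path.join(pkgDir, 'package.json'), 'utf8');
    const pkg = JSON.parse(raw);
    if (typeof pkg.main === 'string') main = pkg.main;
  } catch {
    // No readable package.json: keep the index.js default
  }
  const mainPath = path.join(pkgDir, main);
  if (await exists(mainPath)) return mainPath;

  // "main" pointed at a missing file, so try index.js as a backup
  const fallback = path.join(pkgDir, 'index.js');
  if (await exists(fallback)) return fallback;
  throw new Error(`Cannot find entry file for package at ${pkgDir}`);
}
```

Real resolvers also handle directories, the "exports" field, and symlinks. This sketch skips all of that on purpose.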

Once we can find files, we need to understand their relationships. This is the dependency graph. We begin at the entry file, parse it to see what it imports, then find and parse those files, and repeat. The result is a map of every file and what it needs.

To do this, we must read the source code. We need to find the import and export statements. We can't just use simple text search; we need to understand the code's structure. This is where an Abstract Syntax Tree, or AST, comes in.

Think of an AST as a detailed family tree for your code. Each part of your code—a variable, a function call, an import—becomes a node in this tree. We use a tool like Babel's parser to create this tree from raw text.

const parser = require('@babel/parser');
const code = `import { formatDate } from './dateUtils.js';`;
const ast = parser.parse(code, { sourceType: 'module' });

console.log(ast.program.body[0].type); // 'ImportDeclaration'
console.log(ast.program.body[0].source.value); // './dateUtils.js'

With the AST, we can reliably traverse it and collect all the dependencies. We write a small visitor that acts like a security guard checking IDs, looking specifically for nodes of type ImportDeclaration.

const traverse = require('@babel/traverse').default;
const dependencies = [];

traverse(ast, {
  ImportDeclaration(path) {
    // Record where this import is pointing to
    dependencies.push(path.node.source.value);
  },
  ExportNamedDeclaration(path) {
    // Also check for re-exports: export { x } from './module'
    if (path.node.source) {
      dependencies.push(path.node.source.value);
    }
  },
  ExportAllDeclaration(path) {
    // And for star re-exports: export * from './module'
    dependencies.push(path.node.source.value);
  }
});
// Now 'dependencies' contains ['./dateUtils.js']
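
Collecting imports for one file is only half of the graph step. We still have to repeat it for every file we discover and hand out module IDs along the way. Here is a rough sketch of that loop. The `readModule` and `resolve` callbacks are hypothetical stand-ins: in a real bundler, `readModule` would read the file and run the visitor above, and `resolve` would be our resolver. An in-memory file table is used so the example runs without touching the disk.

```javascript
const path = require('path');

// Breadth-first walk from the entry file.
// readModule(file)            -> { source, requests } (requests = raw import strings)
// resolve(request, fromFile)  -> unique path of the imported file
function createGraph(entry, readModule, resolve) {
  const ids = new Map([[entry, 0]]); // file path -> numeric module ID
  const queue = [entry];
  const graph = [];

  while (queue.length > 0) {
    const file = queue.shift();
    const { source, requests } = readModule(file);
    const mapping = {}; // import string -> module ID

    for (const request of requests) {
      const target = resolve(request, file);
      if (!ids.has(target)) {
        ids.set(target, ids.size); // hand out the next free ID
        queue.push(target);        // visit each file only once
      }
      mapping[request] = ids.get(target);
    }
    graph.push({ id: ids.get(file), file, source, mapping });
  }
  return graph;
}

// A tiny in-memory project, including a circular import between a.js and b.js
const files = {
  '/src/index.js': { source: "import './a.js'; import './b.js';", requests: ['./a.js', './b.js'] },
  '/src/a.js': { source: "import './b.js';", requests: ['./b.js'] },
  '/src/b.js': { source: "import './a.js';", requests: ['./a.js'] }
};

const graph = createGraph(
  '/src/index.js',
  (file) => files[file],
  (request, fromFile) => path.posix.join(path.posix.dirname(fromFile), request)
);

for (const node of graph) {
  console.log(node.id, node.file, node.mapping);
}
```

Because IDs are handed out in the same order files enter the queue, the resulting array is already sorted by ID with the entry at position 0. The `ids` map doubles as the "already seen" check, so circular imports do not cause an endless loop.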

We have a graph, but it's just data. Our browser can't execute import statements directly from this graph. We need to transform each module's source code into a new form. We will replace modern ES module syntax with a simple function call system our bundle can understand.

The goal is to turn import { capitalize } from './stringLib'; into something like const { capitalize } = __require__(2);. The number 2 is a unique ID we will give to the ./stringLib module.

Similarly, export const version = '1.0'; needs to become module.exports.version = '1.0';. We are essentially translating from ES module language to CommonJS-style language, which is easier to bundle.

Let's look at transforming an import statement. We find the ImportDeclaration node in the AST. We replace the entire node with a new variable declaration.

const t = require('@babel/types');

// Suppose we know the './stringLib' module has ID = 2
const moduleId = 2;

// Create the call to our runtime function: __require__(2)
const requireCall = t.callExpression(
  t.identifier('__require__'),
  [t.numericLiteral(moduleId)]
);

// Create the destructuring: const { capitalize } = __require__(2);
const newDeclaration = t.variableDeclaration('const', [
  t.variableDeclarator(
    t.objectPattern([
      t.objectProperty(
        t.identifier('capitalize'),
        t.identifier('capitalize'),
        false,
        true // shorthand
      )
    ]),
    requireCall
  )
]);

// 'path' is the NodePath our ImportDeclaration visitor receives;
// it wraps the import node and lets us swap it out
path.replaceWith(newDeclaration);

We must do this for every module, and also handle exports. After all transformations, we wrap each module's code in a function. This creates a private scope, so one module's variables don't clash with another's.

function(module, exports, __require__) {
  // The transformed module code lives here
  const { capitalize } = __require__(2);
  module.exports.greet = (name) => capitalize(name);
}
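
Since the bundler works with source text, the wrapping step is really just string concatenation. Here is a small sketch of that idea. `wrapModule` is a hypothetical helper name, and the `new Function` call is only there to check that the wrapped text behaves like a module. The real bundle places the text straight into the registry array instead.

```javascript
// Wrap already-transformed module code in the module function
function wrapModule(transformedCode) {
  return `function(module, exports, __require__) {\n${transformedCode}\n}`;
}

const wrapped = wrapModule("module.exports.greet = (name) => 'Hello, ' + name;");

// Turn the text into a real function so we can call it by hand
const moduleFn = new Function(`return (${wrapped});`)();

// Play the role of the runtime: supply a fresh module object
const mod = { exports: {} };
moduleFn(mod, mod.exports, () => {
  throw new Error('this module has no dependencies');
});

console.log(mod.exports.greet('world')); // prints "Hello, world"
```

Anything declared inside the wrapper stays local to it. Only what the code attaches to `module.exports` is visible from outside.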

Now we have a collection of wrapped modules. How do we run them? We need a runtime, a small piece of machinery that stitches everything together. This runtime does two main things: it stores all modules, and it provides a __require__ function to load them.

We build a modules registry, which is just an array. The index of each module in the array is its ID. Module 0 might be our entry point, module 1 could be a library, and so on.

The __require__ function is surprisingly simple. It receives an ID, looks up the module function in the registry, and calls it.

// The final bundle output starts with this runtime
(function(modules) {
  // A cache for modules that have already been executed
  const moduleCache = [];

  // The require function our transformed code will call
  function __require__(moduleId) {
    // Return cached module if it exists
    if (moduleCache[moduleId]) {
      return moduleCache[moduleId].exports;
    }

    // Create a new module object for this id
    const module = { exports: {} };
    // Store it in cache immediately to handle circular dependencies
    moduleCache[moduleId] = module;

    // Get the module's wrapper function and execute it
    // It receives the module object, its exports, and the __require__ function
    modules[moduleId].call(module.exports, module, module.exports, __require__);

    // Return the now-populated exports
    return module.exports;
  }

  // Start the application by requiring the entry module (id 0)
  __require__(0);
})([
  // Our modules registry array starts here
  // Each element is a wrapped module function
  function(module, exports, __require__) {
    // Transformed code for module ID 0 (entry point)
    const utils = __require__(1);
    console.log(utils.greet('world'));
  },
  function(module, exports, __require__) {
    // Transformed code for module ID 1
    module.exports.greet = (name) => `Hello, ${name}`;
  }
]);

Handling exports is the other side of the coin. When our transformed module function runs, it receives the empty module.exports object. Its job is to attach properties to it. The runtime __require__ function then returns this object to whatever module asked for it.

This approach also solves a subtle problem: circular dependencies. What if module A imports from module B, and module B imports from module A? Our simple cache handles this. When __require__(A) is called, we create and cache the module object for A before executing its function. If, while executing, module A calls __require__(B), that call will run. If module B then tries to __require__(A), it will find the partially constructed module object for A in the cache and return it immediately, even if A's code hasn't finished running yet. The catch is that B only sees what A had exported up to that point. Anything A exports later is still undefined if B reads it right away, for example through destructuring.
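
To make the circular case concrete, here is a small experiment. It uses the same runtime logic, wrapped in a hypothetical `createRuntime` function that returns `__require__` so we can drive it by hand. Module 0 plays the role of A and module 1 plays B, and a `log` array records what each one sees.

```javascript
// Same caching logic as the runtime, but returned instead of auto-started
function createRuntime(modules) {
  const moduleCache = [];
  function __require__(id) {
    if (moduleCache[id]) return moduleCache[id].exports;
    const module = { exports: {} };
    moduleCache[id] = module; // cached before the module body runs
    modules[id].call(module.exports, module, module.exports, __require__);
    return module.exports;
  }
  return __require__;
}

const log = [];

const requireFn = createRuntime([
  // Module 0 ("A"): exports one value, requires B, then exports another
  function (module, exports, __require__) {
    module.exports.early = 'set before requiring B';
    const b = __require__(1);
    module.exports.late = 'set after requiring B';
    log.push(`A sees b.name = ${b.name}`);
  },
  // Module 1 ("B"): requires A while A is still in the middle of running
  function (module, exports, __require__) {
    const a = __require__(0);
    log.push(`B sees a.early = ${a.early}`);
    log.push(`B sees a.late = ${a.late}`);
    module.exports.name = 'B';
  }
]);

requireFn(0);
console.log(log.join('\n'));
// B sees a.early = set before requiring B
// B sees a.late = undefined
// A sees b.name = B
```

B gets A's exports object without an infinite loop, but `late` is not there yet. If B held on to the `a` object and read `a.late` later, inside a function called after startup, it would find the value.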

The final step is putting it all together in a single file. Our bundler's main function orchestrates the process: resolve, create graph, transform, generate the runtime, and concatenate everything into a string of JavaScript.

async function build(entryPath) {
  // 1. Resolve the absolute path of the entry
  const entry = await resolver.resolve(entryPath, process.cwd());

  // 2. Create the dependency graph
  const graph = await createGraph(entry);

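  // 3. Transform each module in the graph.
  //    The graph must be ordered by module ID, with the entry at ID 0,
  //    and transformModule must return the module already wrapped as
  //    function(module, exports, __require__) { ... }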
  const transformedModules = [];
  for (const moduleNode of graph) {
    const transformedCode = transformModule(
      moduleNode.source,
      moduleNode.id,
      moduleNode.mapping // Maps import paths to module IDs
    );
    transformedModules.push(transformedCode);
  }

  // 4. Generate the final bundle with runtime
  const bundle = `
    (function(modules) {
      const moduleCache = [];
      function __require__(id) {
        if (moduleCache[id]) return moduleCache[id].exports;
        const module = { exports: {} };
        moduleCache[id] = module;
        modules[id].call(module.exports, module, module.exports, __require__);
        return module.exports;
      }
      __require__(0);
    })([
      ${transformedModules.join(',\n')}
    ])
  `;

  return bundle;
}
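
A quick way to gain confidence in the output is to generate a bundle and actually run it. The sketch below pulls the string-building step into a hypothetical `generateBundle` helper and executes the result with Node's built-in `vm` module. The two module strings are written by hand here and stand in for what the transform step would produce.

```javascript
const vm = require('vm');

// wrappedModules: array of source strings, each a complete
// function(module, exports, __require__) { ... } expression, ordered by ID
function generateBundle(wrappedModules) {
  return `(function(modules) {
  const moduleCache = [];
  function __require__(id) {
    if (moduleCache[id]) return moduleCache[id].exports;
    const module = { exports: {} };
    moduleCache[id] = module;
    modules[id].call(module.exports, module, module.exports, __require__);
    return module.exports;
  }
  __require__(0);
})([
${wrappedModules.join(',\n')}
]);`;
}

const bundle = generateBundle([
  // Module 0: the entry point
  `function(module, exports, __require__) {
    const { greet } = __require__(1);
    console.log(greet('world'));
  }`,
  // Module 1: a small library
  `function(module, exports, __require__) {
    module.exports.greet = (name) => 'Hello, ' + name;
  }`
]);

// Run the bundle in a fresh context with a stub console that records output
const output = [];
vm.runInNewContext(bundle, { console: { log: (msg) => output.push(msg) } });
console.log(output); // [ 'Hello, world' ]
```

Running the bundle in a separate context is a cheap smoke test. If a module ID is wrong or a wrapper is malformed, it fails here instead of in the browser.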

These are the core ideas. You start with a resolver to find files, build a graph to see connections, transform the code to a compatible format, and wrap it in a runtime that knows how to load pieces on demand. When you use tools like Webpack, they are doing all of this, plus many optimizations. Seeing it built step-by-step makes those tools feel less like magic and more like a set of clear, logical operations. You can start with this basic structure and add features like handling CSS files, code splitting, or tree shaking. The foundation is understanding these eight techniques: resolution, graphing, AST parsing, transformation, wrapping, registry creation, runtime design, and final concatenation.

