Intro
When you learn front-end development for the first time, you learn a simple fact: to run JavaScript in the browser, you add a <script>
tag in an HTML file. You can set the src
attribute to link to a JS file, or just write the JS inside the tag, and if you're not using much JavaScript, you may not need to know much more than this to run it.
But these days, web applications and sites tend to use a lot of JavaScript. JS is gaining more and more responsibility in the browser, and on the server too. For this reason, we also have a lot of techniques and tools to help us construct our JS projects. And that can mean a lot of wrestling with build tools you don't fully understand, even if you're an experienced dev.
Of course, every layer of complexity or configuration is meant to improve either the developer's or the end user's experience. But over time, it adds up to quite a heap of technical knowledge. And while we have tools and frameworks that can abstract away these details for you, it always pays to know what's really going on, so you can take greater control of your projects.
In this two-part article, I'll help you do that. We will demystify the key methods and tools commonly used to set up modern JS projects, examining what they do, why they exist, and their pros and cons. By the end, you should be able to make intelligent decisions about build tools and feel confident in tackling their configuration.
In Part 1, we'll ground ourselves with a solid understanding of JavaScript modules and packages.
Prerequisites
- Basic understanding of HTML, JavaScript and JSON
- Basic understanding of Git
Modules
Modules are at the heart of modern front-end development. This is because the JavaScript for our project is likely to be spread across different files. Even if we wrote all the JS ourselves, a single large file is not very nice to manage. And we definitely need other files if we're using code other people wrote.
As a starting point, we could split up our JS files into separate scripts. Perhaps one for each page of the website or application, for example. But when you add multiple <script>
tags in HTML, by default the JavaScript runs as if it was still all in a single file. If I declare a function or a variable globally (not inside a function), I can use it later on, even in a different file.
So if I have two scripts like this:
<script src="./script1.js"></script>
<script src="./script2.js"></script>
With a function like this in script1:
function sayHi() {
alert("hi");
}
I can use the function in script2:
sayHi();
// the "hi" alert shows in the browser
This behaviour probably seemed OK back when JS was a novelty that could add a sprinkling of interactivity to mostly 'static' websites. But for today's complex sites and applications, it's far from ideal.
Firstly, we have to manage dependency order. Because script2 depends on a function from script1, the <script>
tag for script2 must come after script1 in the HTML. The scripts are run in order, and you can't use a function or variable before it is defined. This may not seem like a big deal here, but the problem grows with the number of scripts you have. Keeping track of what script needs to go where is manual work that we don't want to do.
Secondly, the relationships between parts of our code are implicit, which can break our code in unexpected ways. If there is another sayHi
function somewhere after script1, and I call it later expecting to get the script1 version, actually it will have been overwritten by the other implementation. This happens for function declarations and variables defined with var
. It's even worse for any variables (including function expressions) defined with const
or let
- these cause an error if another already exists with the same name. For code that uses a lot of libraries and frameworks, this is simply unworkable.
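For example, here is a minimal sketch of that silent overwrite, assuming a hypothetical third script (script3.js) is loaded between script1 and script2:
// script3.js - loaded after script1.js
function sayHi() {
  alert("hello from script3");
}

// script2.js - loaded last
sayHi();
// shows "hello from script3", not "hi" - script1's version was silently replaced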
Finally, global variables and functions create a security issue. Anyone running your code could run a script to access your code's inner workings and even change what it does.
So how do we deal with this? The answer is modules. But before proper module systems were used in JavaScript, there was a way to reduce the issues of global scope a little, sometimes referred to as the 'module pattern'.
The 'module pattern'
Before modules, library authors would wrap their code in a single function, so the inner workings were not in global scope and couldn't conflict with anything. Inside that function, they would add all of their library's public API to a single object (e.g. the $
object in the famous example of jQuery), and add it to the window object in order to make it globally accessible. Then finally they would call the wrapping function to make it work.
The catch here is that the name of your wrapping function would still end up on the global scope. This problem was sidestepped with an obscure technique. We can define functions without names in JavaScript if they are part of an expression (a line of code that resolves to a value). Usually expressions are assigned to a variable, but they don't have to be - just writing a value such as 3
is a perfectly valid line of JavaScript.
However, a function not assigned to a variable is treated as a function declaration by the JavaScript compiler, so it will expect a function name. Putting the expression inside parentheses convinces the compiler it's not a function declaration. We then need to call the function, and since we can't do it later (there's no name to reference), we have to do it immediately. Hence we get this weird-looking construction called an immediately invoked function expression (IIFE):
(function() {
const library = {}
// my library code...
window.library = library;
})();
This pattern was very widely used in JavaScript code, but definitely wasn't ideal. Each 'module' still exposes a public object to all other scripts, and we still have to manage the dependency order of scripts.
So letâs look at how a module system can solve these problems for us.
ECMAScript Modules (ESM)
There is a module syntax described in the ECMAScript specification (which defines JavaScript). It uses the keywords import
and export
to link dependencies.
For example, in script1 I could have:
export function sayHi() {
alert("hi");
}
Then, in script2:
import { sayHi } from "./script1.js";
sayHi();
// the "hi" alert shows in the browser
We can immediately see how much nicer this is - the relationships between different files are now explicitly defined. And the code inside each module is scoped to the module, not globally scoped. So sayHi
can only be used where it is imported. If I don't import it, I get an error:
sayHi();
// Uncaught ReferenceError: sayHi is not defined
Though I could have a different definition of sayHi
if I wanted, and both would work fine in their respective files. If I both import sayHi
and define a new version with the same name, I will get a (different) error when running the program, but at least the conflict of names is obvious in the file, not implicit.
import { sayHi } from "./script1.js";
function sayHi() {}
sayHi();
// Uncaught SyntaxError: Identifier 'sayHi' has already been declared
One way to avoid naming conflicts is using the module object syntax, which adds any exports from the imported file to an object with whatever name you assign it.
import * as Script1 from "./script1.js";
Script1.sayHi()
// the "hi" alert shows in the browser
Using Modules in the Browser
Now comes the complicated bit. ESM has become the standard for modern web applications, but there is a bit of history.
Firstly, this system was only added to the ECMAScript specification in the 2015 release commonly referred to as ES6 (the 6th release). Before this, browsers simply did not understand ESM syntax or have the functionality to wrap up scripts into isolated modules.
The first JS module system was actually created for running JavaScript outside the browser, by the CommonJS project in 2009. It was then implemented in Node.js, the popular JS runtime environment.
CommonJS modules use a different syntax to ESM. For example, here is a named export in script1:
module.exports.sayHi = function sayHi() {
console.log("hi");
}
And hereâs the import in script2:
const { sayHi } = require('./script1')
sayHi()
// "hi" is logged to the console
This system made it easy for authors of server-side and command-line JS projects to split up their code and use other people's modules. But there was an obstacle to implementing it in the browser.
In Node.js, a JS file can be located on the machine and loaded immediately, so the code can run synchronously - line by line. The browser environment is different. HTML was not designed with modules in mind. You can't just hand the browser a bunch of modules and get it to run them in the right order. You have to specify individual script tags, and each one is a network request that takes time to complete. So unless we want to block the rendering of the page while we wait for each script, that makes the process asynchronous.
Thus began the tradition of using tools to transform JavaScript modules into code the browser can run correctly. Early approaches to this transformed the code at the time of execution. There was a popular specification called Asynchronous Module Definition (AMD), where your scripts defined a function which wrapped the code to be executed and listed the dependencies required, but did not call that function. At runtime, a third-party library known as a script loader retrieved the script elements from the DOM, sorted out the required order and added the module code into the DOM dynamically.
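As a rough sketch, an AMD module looks something like this (using the RequireJS-style define; the module path and names here are illustrative):
// script2.js, written as an AMD module
define(["./script1"], function (script1) {
  // this factory function is only called once ./script1 has been loaded
  script1.sayHi();
});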
Other libraries were available as command-line build tools. These compiled the CommonJS code before it was delivered to the browser. A popular tool called Browserify was created to handle the dependencies between modules and bundle all the code inside a single script tag. It was later overtaken by webpack and other bundlers that could handle ESM syntax and provide other features we'll talk about in the next article.
Unfortunately, before ESM there was no consensus on the best module approach, so library authors needed their code to work for users using either CommonJS or AMD. This led to Universal Module Definition (UMD), which basically took authors back to using an IIFE, then adding code to export the module in whatever way was being used at runtime.
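As a rough illustration, a UMD wrapper looks something like this (a simplified sketch - real UMD wrappers vary, and myLibrary is just a placeholder name):
(function (root, factory) {
  if (typeof module === "object" && module.exports) {
    // CommonJS environment (e.g. Node.js)
    module.exports = factory();
  } else if (typeof define === "function" && define.amd) {
    // AMD environment (e.g. the RequireJS script loader)
    define(factory);
  } else {
    // fall back to a browser global
    root.myLibrary = factory();
  }
})(this, function () {
  const library = {};
  // my library code...
  return library;
});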
You won't need to use these approaches in your projects, but you will hear them mentioned in bundler documentation. The bundler needs to handle different types of exports and imports, because many popular libraries are still not exported using ESM (for example, React still exports CommonJS modules). And when we bundle for the browser, we usually output our code in an IIFE so it can run in older browsers, while still being isolated from any other JS that may be running on the page.
Default exports
Another legacy of previous module systems is that exporting a single object containing all the library's functionality was very common. ESM has a syntax that allows for this called default exports.
For example, let's say I have a file called MyLibrary.js, and all I want to export from it is a class called MyLibrary
. If I export it with export default
, it can then be imported without curly brackets:
import MyLibrary from "./MyLibrary.js"
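For completeness, the export side of MyLibrary.js might look something like this (a minimal sketch):
// MyLibrary.js
export default class MyLibrary {
  // my library code...
}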
To bundle or not to bundle?
OK, we're done with the history lesson and we want to use ESM syntax. But we still have a shoot-out between bundling the modules before they hit the browser, and just using the browser implementation. If you want to use import
statements without bundling, the <script>
tag for each module must have its type
attribute set to 'module'. With our example scripts, that looks like this:
<script type="module" src="./script1.js"></script>
<script type="module" src="./script2.js"></script>
When the browser reads these files, it figures out the dependencies before any of the code is executed, so we don't have to worry about the order. If script1 imports something from script2, that tells the browser to execute script2 first, even though it comes later in the HTML.
Note that this makes the imports static - they're not allowed to change during execution. So you can't use an import or export statement inside a function or condition - it must be at the 'top level'. (There is an alternative called 'dynamic imports', however, which we'll cover in Part 2).
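For example, a minimal sketch of what is not allowed - moving the import inside a condition throws a syntax error:
if (true) {
  import { sayHi } from "./script1.js";
  // SyntaxError: imports must be at the top level of a module (exact message varies by browser)
}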
So now that this exists in the browser, why bother using build tools instead? Well, there are a number of reasons why bundling JavaScript has become so popular. One is that it cuts down on network requests - why make the browser request multiple files when we can send just one? Another is that build tools can handle other useful operations before outputting our code, which we'll cover in Part 2.
But perhaps the biggest reason relates to how we use other peopleâs code.
Sharing code
We often want to use other people's libraries and frameworks, known as dependencies, in our JS. What's the best way to do this?
You could go to the website or GitHub repository of the library you want to use and download the code, then add it to your project. But that's slow, and you would have to also download all of the libraries that it depends on. Since JS projects these days have so many dependencies, that's not really feasible. And this is just for one version of the library, never mind dealing with updates.
A slightly more common approach is linking in a <script>
tag to the URL of a server that the library code (and its dependencies) is stored on. To do this, the author must have first publicised this link and the server must allow cross-origin requests (requests coming from other URLs). Additionally, in order for the code to get to the browser quickly, they will most likely use a content delivery network (CDN).
CDN Links
A CDN is a network of servers across the world. Companies such as Cloudflare, Akamai, Amazon, Google, Microsoft and many others provide these networks, which code authors pay to handle serving their resource, e.g. a website, application or library.
The author usually puts their code on their own server (known as the origin server), but either sets up a redirect or alters their URLs to point to the CDN instead. The CDN servers pull that resource from the origin and cache it so wherever the user is, there should be a copy nearby which can reach them quickly. Periodically, the CDN servers request the resource from the origin server to ensure it is up-to-date, or the author can manually invalidate the cache.
The advantage of using a library via a CDN is that you don't have to download the code or its dependencies yourself, and it's generally fast and reliable. Plus it can be better for the user: because multiple sites can access a library from the same CDN, the user's browser might already have cached the resource you're linking to, in which case it doesn't need to fetch it again over the network. So why not use CDN links for everything?
Firstly, there are some small drawbacks that might be an issue, such as not being able to work offline or not having control of serving the resources needed for your project. Another simple one is that there might not be a CDN link available for the library you want to use.
But there are bigger reasons too:
- Admin. During development, you need to add all the <script> tags in the right place. Then as time goes on, you need to manually manage what version of the library you need when third-party packages are updated.
- Performance. This is the same problem we mentioned earlier - the browser has to make a separate request for each <script> tag (unless the resource is in the browser cache), which will slow down load times in production.
For a very small project, these downsides might be no problem. But for everything else, this is why package managers are so popular. Package managers provide a system for managing dependencies. All the code is stored on the developer's computer, so the application can be developed offline if necessary. And you can use a bundler to output only one <script>
tag (or however many you prefer).
The downside of packages? They come from the Node.js world. You have to run Node.js on your computer, and a bundler to join all the modules together. This can be a hurdle for junior developers, and adds some complexity to getting started with JS projects. But for most JS devs this isn't a big deal - we've accepted the inconvenience of setting up Node and a bundler config every once in a while, in return for the benefits it provides.
It does mean, however, that native ESM in the browser was somewhat dead on arrival in 2015. It has its use cases, but people were already used to using packages and a bundler, and they had lots of reasons to stick with that.
So let's dive into how packages work.
Packages
A package is a library of code together with information about the code, such as the version number. A package manager is the tool that lets you download, manage, and if necessary create those packages. It has a command-line interface which communicates with the registry, which is the database of all the packages that have been published by users. These aren't all public (access can be scoped to a particular organisation, for example), but many are.
We'll be looking at the most popular package manager for JS, npm. In the early years of npm, there were some issues that led some developers at Facebook and elsewhere to create yarn, a similar alternative. However, npm is generally considered to have caught up with the main yarn improvements. There is also pnpm, which offers some speed and disk space improvements, but is less widely used. As a beginner, you should get used to npm for the foreseeable future, since the others are similar to use anyway.
To use npm, you just need to download Node.js from the Node.js website - npm is included with it because it's so central to JS development these days. You can find the list of npm packages on the npm website. To install a package, navigate to your project's directory and run npm install
followed by the package name, e.g. npm install react
. There are built-in aliases (names for the same command) such as npm i
, which you can use if your prefer. If you need to uninstall a package, you can use npm uninstall
, which also has aliases such as npm remove
or npm r
.
The install command downloads a package from the registry. The installation is local by default - you can only use the package in the directory you installed it in. This is the standard approach nowadays and is useful when you need to use different versions of a package for different projects. However, you can install a package globally if you need to, using the global flag (option) -g
, e.g. npm install -g react
. Then it will be installed somewhere accessible everywhere (the exact location depends on the operating system).
Packages are downloaded to a folder npm creates in your project called node_modules, with a subfolder for each package. Most packages in turn have their own dependencies (sub-dependencies), which are also installed to your machine in the node_modules folder. This means the node_modules folder gets very big very quickly! You definitely don't want to commit this large folder into version control, so add the node_modules folder to your .gitignore
file.
Security and npm
Please note that anyone can upload code to npm, and it relies on user reports to take down malicious code. In particular there is a practice called typosquatting, where malicious code is uploaded with a slightly different name from a popular npm package so users might install it by accident. So be careful what you install, and check the package information on the npm website, including the 'weekly downloads' figure, to determine if a package seems legitimate and reliable.
Using a package
Once you have installed a package, you can import it in any JS file. You might expect you need to specify the path to the folder in node_modules, but luckily bundlers allow bare specifiers, for example:
import React from "react"
When we do this, the bundler (or a plugin) knows to look for the dependency in node_modules and import it from there.
package.json
When you install a package, an entry is added to a file called package.json
, which npm creates for you if you haven't created it already. This lists your dependencies in the dependencies
property, and the file should be committed to version control. That way, when someone else clones your project to work on it, they can just run npm install
and all the dependencies listed in package.json
will be installed for them.
The package.json
is also used by package authors to specify information about their package. This includes the package name, description, version number, and devDependencies
, which lists dependencies they used to create the package, but which package users don't need installed. There is also peerDependencies
, which should be specified by plugin authors who are adding functionality to another package without using that package directly.
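As a rough sketch, the relevant parts of a package.json might look something like this (the package names and version numbers here are purely illustrative):
{
  "name": "my-library",
  "version": "1.0.0",
  "description": "An example package",
  "dependencies": {
    "lodash": "^4.17.21"
  },
  "devDependencies": {
    "jest": "^29.0.0"
  },
  "peerDependencies": {
    "react": "^18.0.0"
  }
}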
You will often see package users installing packages that aren't used in the application code, such as testing libraries, with the --save-dev
flag, so they are listed in devDependencies
. This is not strictly necessary, since bundlers usually eliminate code that is not imported by your application (covered in the next article). However, it has become something of a standard practice, which you can follow if you like.
Peer dependencies are also worth understanding because, since npm 7, npm automatically installs peer dependencies by default, meaning you may see an error relating to conflicting versions of a package being installed in your project. This error can be ignored with npm install --legacy-peer-deps, though it's better in the long run to resolve the conflict.
Finally, you might see some custom properties in package.json
that are not documented by npm. These can be used to specify configuration to build tools, which we'll cover in Part 2.
Package versions
Now we can get to one of the main advantages and complications of package managers - managing package versions.
When packages are published, they are classified by a version number. The traditional approach in software is that any release has three aspects to it, represented by three numbers separated by dots: the 'major' release version for major feature additions (1.x.x), the 'minor' release version for minor feature additions (x.1.x), and the 'patch' release version for small bug fixes (x.x.1). These numbers are incremented whenever necessary, so a package could skip straight from version 1.1.0 to version 1.2.0 for example, if the change made was a minor feature.
npm packages are recommended to follow a strict, systemised interpretation of this approach called semantic versioning (often abbreviated to 'semver'). This aims to make working with dependencies more reliable. The main rule is that minor and patch releases cannot contain breaking changes, i.e. changes that would break code currently using the package.
When you install a npm package, you can specify a version using @
if that version is available, e.g. npm install react@16.14.0
. But if no version is specified, and we donât already have the package installed, npm will always install the latest version. In the example of npm install react
, that may look like this in the dependencies
property of package.json
:
"react": "^18.2.0"
OK, so version 18.2.0 was installed. But what's that funny thing before the number? Well that's called a caret, and it complicates things a little.
In theory, every package user and author could always specify exact versions of packages. This can be configured for all installations by running npm config set save-exact=true
, or for a particular package with npm install --save-exact
. You then need to update packages individually when you are sure it's safe to do so (minor and patch releases aren't supposed to break stuff - that doesn't mean they can't!). With this approach, package versions are specified in package.json
with no caret.
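In that case, the entry in the dependencies property is just the exact version - for example:
"react": "18.2.0"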
There are a couple of downsides to this: firstly, you have to track and manage all the versions of your dependencies yourself. Secondly, say you install package A and package B in your project, but those packages depend on package C, and have specified slightly different versions of it. npm will have to install both versions of package C in your node_modules folder, which could be a waste of space - the packages probably would have worked fine with the same version of package C.
So npm allows some flexibility - we can specify acceptable ranges of package versions. The default is that new patch or minor versions of the package will be tolerated (since they should not break existing code). This is what the caret in package.json
indicates. I can therefore run npm update
(or npm install
) with a particular package name, and the package will be updated to the latest non-major version. Or I can run npm update
without arguments to update all my packages at once.
You may also see a tilde before the package version, e.g. "react": "~17.0.0"
. This means that only updated patch versions are allowed, so React 17.0.1 could be installed but not React 17.1.0. This can be configured with npm config set save-prefix='~'
.
Locking package versions
Unfortunately there is a side-effect of allowing version ranges. Say someone else clones my repository and runs npm install
, and in the time since I installed the packages myself, there have been some minor or patch releases. The package.json
file stays exactly the same, but the other user gets slightly different versions of the packages, which could lead to different behaviour.
To get around this, npm introduced another file called package-lock.json
, which is also created when you install packages. This file is sort of like a historical record: it shows the exact version of all the dependencies and sub-dependencies that were installed using npm install
, and it should be committed to version control along with package.json
.
When another user runs npm install
in your project, npm will usually install the exact versions specified in package-lock.json
rather than the more recent minor or patch versions that could be installed according to package.json
. This ensures consistency of package installations while still maintaining the benefits of specifying dependency ranges.
For example, your package.json
might list "react": "^18.1.0"
. Without a package-lock.json
, the other user could get React 18.2.0 installed instead. But if the package-lock.json
file says 18.1.0, that is what the other user will get.
There is one exception, in the scenario that package.json
has been edited manually to specify a different version. If it was edited to say 18.2.0, for example, the other user will get that when running npm install
, even though package-lock.json
said 18.1.0. The installation causes package-lock.json
to be updated to list 18.2.0.
Note: package-lock.json
used to always win out in this case, but since users tended to see package.json
as the 'source of truth' of their package versions, npm altered the behaviour in 2017.
Initialisers
So now you understand packages and the purpose of module bundling, but you still might be unsure how to get started building a JS application. One way to do this is by using a type of npm package known as an initialiser.
Initialisers are designed to configure your local environment for building a certain type of app, including setting up the bundler. The name of these packages begins with 'create', and they contain code that can be run on the command line to install certain packages and set up config files for you.
You can run these packages with the npm command npx
, which temporarily installs and runs the package specified in your local directory, then removes the installation once its work is done. This makes it ideal for initialisers, since they are only needed once in a project. So an example would be npx create-react-app
. You could also run npm init react-app
(no 'create' in the package name), which is a slightly newer equivalent (init is also aliased to create).
Initialisers like this make it quicker to get up and running with a project. You can then refer to their documentation to get an idea of what they are doing for you and how to continue. Usually you can also customise the setup to some degree to suit your needs.
One downside is that you need a fairly clear idea of your app's requirements before you start, unless you want to spend some additional time untangling the config later. Another is that outsourcing configuration means you don't really need to understand it. Usually in development this works fine until something doesn't work as you want it to, at which point it causes you a headache. In this series we're diving into the details that some people ignore, so you can make a properly informed decision about how to set up your project, whether you choose to use an initialiser or not.
Conclusion
We've covered lots of information about modules and npm here, but we didn't get to how build tools work. I'll cover that in the next article because it's a big topic. First, let's recap the key points:
- Modules are key to modern JS, but the systems we use today are heavily influenced by the under-powered original design of HTML and the 'historical' developments since then. This is why we ended up using ESM, but not necessarily the browser implementation.
- Using packages via Node.js is so popular partly because it allows us to bundle our code to be fetched with a single network request, as well as managing different versions of our dependencies more easily. However, it is not without its own complications and drawbacks.
- Even though certain approaches have become very popular in the JavaScript community, you should always look to use the right tool for the job. That requires a solid understanding of the tools, which is what this series is all about.
Footnote: CDNs and npm
You may have heard of an open-source project called unpkg.com which uses Cloudflare to provide a CDN link for every package available on npm. This is a free service relying on sponsors, and it doesn't provide any uptime or support guarantees. So the project author doesn't recommend relying on it for anything critical to a business. However, it can be very useful for demos, personal projects and trying out packages without a local environment set up.
Sources
I cross-reference my sources as much as possible. If you think some information in this article is incorrect, please leave a polite comment or message me with supporting evidence.
* = particularly recommended for further study
- History of Web Development: JavaScript Modules - Tan Li Hau
- * JavaScript modules - MDN
- Why Web Modules? - RequireJS docs
- Formalize top-level ES exports - React issue (GitHub)
- Adding and Leveraging a CDN on Your Website - CSS Tricks
- 7 Reasons NOT to use a Content Delivery Network - sitepoint
- JavaScript package managers compared: npm, Yarn, or pnpm? - Sebastian Weber
- About npm - npm docs
- CLI Commands - npm docs
- package.json - npm docs
- Presenting v7.0.0 of the npm CLI - GitHub blog
- Why use SemVer? - npm blog
- package-lock.json - npm docs
- Everything You Wanted To Know About package-lock.json But Were Too Afraid To Ask - James Quigley
- unpkg: An open source CDN for npm - Kent C Dodds