Pedro Figueiredo

Posted on Dec 30, 2023

Splitting and Caching React Chunks

#javascript #webdev #react #programming

There are only two hard things in Computer Science: cache invalidation and naming things. - Phil Karlton

Caching is still one of the hardest things to get right. Nevertheless, it is indispensable for optimizing the performance of almost any system.

For a SPA (Single Page Application), the perfect caching strategy would be to split our code into two categories: common chunks and route chunks. That way we could, in theory, download only the JS/CSS needed for a given route and re-use all the common (already cached) parts of the code.

Code Chunks

A "chunk," as defined by Webpack, is one of the distinct pieces that collectively form a bundle. In the context of Webpack, a bundle is the final output built from an entry point, and it is composed of these individual chunks of code.

Another property that distinguishes chunks is their independence: each one is downloaded as a separate file, e.g. a bundle with 4 chunks will lead to 4 HTTP requests.

Code Splitting

To deliver the best experience in a web application, it is crucial that the browser only downloads and executes the bare minimum JS/CSS that's needed for a given page/route and that it progressively downloads and executes the rest as needed.

To achieve these sorts of optimizations, code bundlers (e.g. Webpack, Rollup...) tend to provide a way to split chunks per route. This means that each route has a predefined set of chunks that the browser downloads as the user navigates through the different pages/routes.

Above we can see a very simplistic view of how navigating into the /page1 route would lead to only downloading the necessary parts - common-chunk.js and page1-chunk.js.
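The same idea can be modeled in a few lines of plain JavaScript. This is only a toy sketch: the route table and the /page2 entry are made up for illustration, and a Set stands in for the browser cache. It shows that the common chunk is downloaded once and then re-used on later navigations.

```javascript
// Toy model: which chunks the browser still has to download for a route.
// The route table is hypothetical; only common-chunk.js and page1-chunk.js
// are named in the article.
const routeChunks = {
  '/page1': ['common-chunk.js', 'page1-chunk.js'],
  '/page2': ['common-chunk.js', 'page2-chunk.js'],
};

function createNavigator() {
  const cached = new Set(); // stands in for the browser cache

  // Returns the chunks that must be fetched for `route`, then marks them cached.
  return function navigate(route) {
    const needed = routeChunks[route] || [];
    const toDownload = needed.filter((chunk) => !cached.has(chunk));
    toDownload.forEach((chunk) => cached.add(chunk));
    return toDownload;
  };
}

const navigate = createNavigator();
console.log(navigate('/page1')); // [ 'common-chunk.js', 'page1-chunk.js' ]
console.log(navigate('/page2')); // [ 'page2-chunk.js' ]
```

The second navigation only fetches the route chunk, which is exactly the benefit we are after when we split common code from route code.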

Identifying the common code

For the most part, bundlers do a "good enough" job by creating multiple independent chunks that can be re-used across a web app, so that users only download the bare minimum that's needed for a given page. Take a quick peek at the Webpack docs below on when it creates a new chunk automatically:



Webpack will automatically split chunks based on these conditions:

- New chunk can be shared OR modules are from the node_modules folder
- New chunk would be bigger than 20kb (before min+gz)
- Maximum number of parallel requests when loading chunks on demand would be lower or equal to 30
- Maximum number of parallel requests at initial page load would be lower or equal to 30

When trying to fulfill the last two conditions, bigger chunks are preferred.

However, we can probably do better with very little work, just by looking at our app and understanding which parts of the code we should extract into independent chunks.

We could spend a lot of time optimizing this process, so we should probably stick to the parts that have the most impact on our bundle size and are used in most routes. These usually fall under one of the following categories:

[related vendors] react, react-dom and react-router-dom
[globally used vendors] lodash, css-in-js lib, etc...
[global styles] CSS Reset files, global styles files, etc...

Extracting common parts into chunks

Once we have identified all the common parts of code we want to extract into independent chunks, it's usually very straightforward to actually set this up in whatever bundler you are using.

Here is an example of how we could create a new chunk for all the React-related vendors using Webpack:



const path = require('path');

module.exports = {
  // ...
  optimization: {
    splitChunks: {
      cacheGroups: {
        reactVendor: {
          test: /[\\/]node_modules[\\/](react|react-dom|react-router-dom)[\\/]/,
          name: 'vendor-react',
          chunks: 'all',
        },
      },
    },
  },
};

This Webpack config uses splitChunks combined with cacheGroups to create specific code chunks that pass the given test regex:



reactVendor: {
  test: /[\\/]node_modules[\\/](react|react-dom|react-router-dom)[\\/]/,
  name: 'vendor-react',
  chunks: 'all',
},

After we apply this change to the config, Webpack will generate a new chunk named vendor-react.js that includes any code coming from react, react-dom, or react-router-dom.
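To get a feel for what the test regex selects, we can run it against a few module paths in isolation. The sample paths below are made up for illustration; Webpack applies the pattern to the real resource path of each module.

```javascript
// The same pattern used as `test` in the reactVendor cache group.
// [\\/] matches either a backslash or a forward slash, so the pattern works
// with both Windows and POSIX module paths.
const reactVendorTest = /[\\/]node_modules[\\/](react|react-dom|react-router-dom)[\\/]/;

// Example module paths (hypothetical).
const samples = [
  '/app/node_modules/react/index.js',
  '/app/node_modules/react-dom/client.js',
  'C:\\app\\node_modules\\react-router-dom\\dist\\index.js',
  '/app/node_modules/react-redux/lib/index.js', // other package, not in the group
  '/app/src/components/App.js',                 // application code
];

for (const modulePath of samples) {
  console.log(reactVendorTest.test(modulePath), modulePath);
}
```

Note that the trailing [\\/] matters: it stops the react alternative from also matching packages whose names merely start with "react", such as react-redux.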

The last step to get this working is to include this newly produced asset in our HTML file:



<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>My App</title>
    <script src="./vendor-react.js"></script>
  </head>
  <body>
    <script src="./app.js"></script>
  </body>
</html>

Since we need React to run our application, it's better to include it in the head tag, so it's picked up faster by the browser.

However, to fully leverage this new chunk, it's crucial to configure caching, ensuring it is downloaded only once and subsequently "reused" across all page navigations.

Caching Static Assets (Chunks)

To efficiently cache static assets, there are certain aspects we need to take into consideration:

what's the best strategy
when to invalidate the cache
how to invalidate the cache

In my opinion, the best way to cache static assets is a combination of CDN distribution with cache busting, so that's what we will discuss below.

CDN Distribution

A content delivery network (CDN) is a geographically distributed group of servers that caches content close to end users. A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos. - Cloudflare

The CDN will be the first opportunity for caching, since that's the system we hit when visiting a web app for the first time. Bear in mind that it will not prevent the consumers of a web app from downloading the assets (that's the browser's job), but it will make load times much faster by placing the static assets closer to the user's device, at least once they are cached.

Enabling caching on a CDN is usually well documented by each provider, but the most common practice is setting the Cache-Control header to public and max-age with a value greater than 0, like so:



Cache-Control: public, max-age=31536000, immutable

The best (and simplest) practice in this regard is to add this Cache-Control header only to the assets created by the bundler, and not to the ones that are not, meaning: .html, .png, .gif... That's because those files don't go through a bundling process that lets us change their name depending on their content.
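As a rough sketch, this rule could be expressed as a small helper that decides whether a file gets the long-lived header. Both the helper name and the way it recognises bundled assets are assumptions: it expects a hex content hash in the file name (content hashing is covered in the Cache Busting section), so adjust the pattern to your own naming scheme.

```javascript
const path = require('path');

// Long-lived policy for files whose name changes whenever their content does.
const IMMUTABLE = 'public, max-age=31536000, immutable';

// Assumption: a bundler-generated asset is recognised by a hex content hash
// in its file name, e.g. app.575d644de9307ca8621d.js.
const HASHED_ASSET = /\.[0-9a-f]{8,}(\.bundle)?\.(js|css)$/;

// Hypothetical helper: returns the Cache-Control value for a file, or null
// when the file should not get the long-lived header (.html, .png, .gif...).
function cacheControlFor(filePath) {
  return HASHED_ASSET.test(path.basename(filePath)) ? IMMUTABLE : null;
}

console.log(cacheControlFor('dist/app.575d644de9307ca8621d.js'));
console.log(cacheControlFor('dist/index.html'));
```

How the returned value is attached to the response depends on your CDN or static host, so that part is left out here.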

CDN Cache Invalidation

Invalidating the cache is a very sensitive process and, as such, it should be as straightforward as possible, so that we can minimize errors such as delivering outdated assets.

The simplest and most effective strategy for this is to invalidate the cache every time we make a new deployment, for all the assets that are not part of our bundling process, such as:

.html
.png
.jpg
.gif
...

This is simply because, even if these assets change, they can still keep the same name, and we don't really have a way to signal to the CDN whether they are outdated or not.
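A minimal sketch of that selection step, assuming we already have the list of deployed file paths, could look like this. The helper name and the sample paths are hypothetical, and the extension set only contains the ones listed above.

```javascript
const path = require('path');

// Extensions listed above: files that keep their name when their content changes.
// Assumption: extend this set to whatever non-bundled files your app ships.
const NON_BUNDLED_EXTENSIONS = new Set(['.html', '.png', '.jpg', '.gif']);

// Hypothetical helper: picks the deployed files whose CDN cache entry
// should be invalidated on every deployment.
function pathsToInvalidate(deployedFiles) {
  return deployedFiles.filter((file) =>
    NON_BUNDLED_EXTENSIONS.has(path.extname(file).toLowerCase())
  );
}

console.log(
  pathsToInvalidate([
    '/index.html',
    '/app.575d644de9307ca8621d.js',
    '/images/logo.png',
  ])
); // [ '/index.html', '/images/logo.png' ]
```

The resulting list would then be handed to your CDN's invalidation API or CLI, which differs per provider and is not shown here.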

Cache Busting

Cache busting is a technique used by web developers to force the browser to load the most recent version of a file, rather than a previously cached version. - keyCDN

We still need to somehow signal to both the CDN and the browser whether they should fetch a new asset or re-use the cached one. The best way to do that is through cache busting.

Cache busting can be applied with different strategies, such as:



// File name versioning
"index.v2.js"

// File path versioning
"/v2/index.js"

// Query params versioning
"index.js?v=2"

// File content hashing
"[name].[contenthash].js"

File content hashing

From my experience, content hashing is the best cache busting strategy for 3 reasons:

1- we don't rely on past data (like in versioning)
2- we can still use cached assets even after new deployments
3- cache busting is done automatically by having differently named files when their content changes
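The mechanism behind these three points can be illustrated with Node's built-in crypto module. This is an illustration only: Webpack computes [contenthash] internally with its own hash function and length, so real file names will differ, and hashedFileName is a made-up helper.

```javascript
const crypto = require('crypto');

// Illustration only: derives a file name from the file's content, in the
// spirit of Webpack's [name].[contenthash].js template.
function hashedFileName(name, content) {
  const contentHash = crypto
    .createHash('sha256')
    .update(content)
    .digest('hex')
    .slice(0, 20);
  return `${name}.${contentHash}.js`;
}

const v1 = hashedFileName('app', 'console.log("hello");');
const v1Again = hashedFileName('app', 'console.log("hello");');
const v2 = hashedFileName('app', 'console.log("hello, world");');

console.log(v1 === v1Again); // true: unchanged content keeps its name (and its cache entry)
console.log(v1 === v2);      // false: changed content gets a new name
```

Because the name depends only on the content, a new deployment that does not touch a chunk leaves its URL, and therefore its cached copy, untouched.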

In Webpack, to generate the chunk names using their content hashes, we can change the output format like so:



// webpack config
const path = require('path');

module.exports = {
  entry: {
    app: './src/app.js',
  },
  output: {
    // initial (entry) chunk names
    filename: '[name].[contenthash].js', // 🆕
    // non-initial chunk names
    chunkFilename: '[name].[contenthash].bundle.js', // 🆕
    path: path.resolve(__dirname, 'dist/app'),
    // removes old bundles from the directory
    clean: true, // 🆕
  },
  // ...
};

Once we build our web app with such a configuration, we will have something similar to this:



📁 dist/
 ├── 📄 index.html
 ├── 📄 app.575d644de9307ca8621d.js
 └── 📄 vendor-react.7c8b137d61bff5cad6e7.bundle.js

Summary

1- Extract common parts of code into their own chunks;
2- Distribute your web app through a CDN;
3- Invalidate non-bundled assets on deployment;
4- Set up a caching strategy through cache busting;

Conclusion

Distributing cached assets is the best way to spare clients from downloading unnecessary code and to give them much faster loading times.

One of the best ways to do it is by distributing your web app through a CDN and setting up a strategy that invalidates cached assets when their content changes.

Make sure to follow me on twitter if you want to read about TypeScript best practices or just web development in general!