Read interactive version of this post on my blog.
Intro
Working on big projects brings many difficult challenges, and keeping the application's bundle size in check is one of them. As a project grows, you will inevitably start splitting big feature areas into separate modules or sub-applications, delegating development to other teams or, sometimes, even other companies. Before long you have a huge application, with tens of teams building hundreds of modules, all to be packed, bundled and shipped to the user.
Controlling bundle size becomes critical at this point: one module, one bad apple, can ruin everything. Fortunately, webpack does a lot of optimisation under the hood to make sure you ship no more code than required. However, and I have witnessed this over and over again, there is still one simple mistake you can make that will prevent webpack from working its magic. Let's talk about that.
TL;DR
We all know by now that webpack does "tree shaking" to optimise bundle size. Just in case: "tree shaking" is a term commonly used in the JavaScript context for dead-code elimination. In simple words, export-ed code that is never import-ed and executed is detected as unused, so it can be safely removed to decrease bundle size.
What you might not know is that it's not webpack itself that cleans up dead code. Of course, it does the bulk of the "preparation" work, but it is the terser package that actually *cuts off* unused code. Terser is a JavaScript parser, mangler and compressor toolkit for ES6+.
Let's lay this out: webpack takes your modules, concatenates them into chunks and feeds them to terser for minification (all of this, obviously, happens only if optimization is enabled).
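For reference, here is a minimal webpack.config.js sketch that switches these steps on explicitly. In production mode webpack already enables all three flags by default, so spelling them out is only for illustration.

```javascript
// webpack.config.js (illustrative sketch)
module.exports = {
  // "production" mode turns the optimizations below on by default
  mode: "production",
  optimization: {
    usedExports: true,        // track which exports are actually used
    concatenateModules: true, // try to merge modules into one scope
    minimize: true,           // hand the chunks to the minimizer (terser by default)
  },
};
```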
Time to highlight a few key points:
- By default, webpack always tries to concatenate code from different modules (files) into one scope and create a chunk from it later. E.g. if moduleA.js imports a few methods from moduleB.js, the result is a chunk-[hash].js containing the code from both files within one scope, as if it had been written in one file in the first place (essentially removing the "module" concept). When modules can't be concatenated, though, webpack registers those files as modules, so they can be accessed globally via the internal helper __webpack_require__ later.
- By default, terser doesn't remove top-level (global) declarations in your code (the toplevel flag is false). E.g. if you build a library with a global-scope API, you don't want it removed during minification. In essence, only "obviously" dead (unreachable) code, or code that is unused within nested scopes, is removed.
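To make the two outcomes from the first point concrete, here is a deliberately simplified sketch. It is not webpack's real output: the `registry` object and the `fakeRequire` helper are hypothetical stand-ins for what `__webpack_require__` does, and `square`/`cube` are made-up functions.

```javascript
// Case 1: concatenated. Both "files" live in one function scope,
// as if they had been written in a single file.
const concatenated = (function () {
  // from moduleB.js
  function square(x) { return x * x; }
  function cube(x) { return x * x * x; } // never referenced in this scope
  // from moduleA.js
  return square(3);
})();

// Case 2: not concatenated. Each file becomes an entry in a module
// registry and is reached through a require-like helper.
const registry = {
  "./moduleB.js": function (moduleExports) {
    moduleExports.square = function (x) { return x * x; };
    moduleExports.cube = function (x) { return x * x * x; };
  },
  "./moduleA.js": function (moduleExports, requireFn) {
    const b = requireFn("./moduleB.js");
    moduleExports.result = b.square(3);
  },
};

const cache = {};
function fakeRequire(id) {
  if (cache[id]) return cache[id];      // each module runs only once
  const moduleExports = (cache[id] = {});
  registry[id](moduleExports, fakeRequire);
  return moduleExports;
}

const registered = fakeRequire("./moduleA.js").result;
console.log(concatenated, registered); // both compute square(3)
```

In the first case `cube` is an unused local declaration, which a minifier can spot easily. In the second case it is a property assigned on an exports object, which is much harder to prove unused.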
You probably saw this coming: terser can remove your unused export-s only if webpack scoped them in a way that makes unused declarations easy to detect.
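The difference between top-level and nested scope can be seen in a small sketch. The function names are made up, and the comments describe what terser's documented `toplevel`, `unused` and `dead_code` compress options are expected to do with default settings, not the verified output of a particular terser version.

```javascript
// Top-level scope: with terser's default `toplevel: false`,
// this unreferenced function is expected to survive minification.
function unusedAtTopLevel() {
  return "kept";
}

// Nested scope: unreferenced declarations in here are fair game.
function area(width, height) {
  const unusedLocal = width + height; // unreferenced local, can be dropped
  if (false) {
    console.log("unreachable");       // dead code, can be dropped
  }
  return width * height;
}

console.log(area(2, 3));
```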
For optimization, webpack relies heavily on the static structure of ES2015 module syntax, i.e. the import and export keywords, and, as of now, doesn't work for other module types. We can see this ourselves in the source.
ModuleConcatenationPlugin puts code from JavaScript files into "chunks" (basically concatenating code from different files into one scope). It bails out, though, if it cannot identify ECMAScript module exports.
As you can see, messing up module interfaces prevents ModuleConcatenationPlugin (the optimization plugin) from doing its job.
We all love and use babel to transpile modern ES syntax in our modules, but in this situation babel-preset-env becomes a bad friend of ours: by default, modules are transpiled to the "commonjs" standard, and that's precisely what we don't want when pulling multiple packages together into one application! We have to make sure to set modules: false in the preset config. Webpack can do the majority of its optimizations only for Harmony (ES2015) modules!
Do not transpile modules with your babel setup! If you do, webpack will treat files as blobs, losing the ability to prepare chunks properly for terser to cut off unused code.
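A minimal sketch of the relevant Babel config, assuming Babel 7, where the preset is published as @babel/preset-env. Only the modules option matters here, and any other preset options you use are left out. Note that in Babel 7 the documented default for modules is "auto" rather than "commonjs", but setting it to false explicitly still makes the intent unambiguous.

```javascript
// babel.config.js (illustrative sketch)
module.exports = {
  presets: [
    [
      "@babel/preset-env",
      {
        // keep import/export statements untouched so webpack can analyse them
        modules: false,
      },
    ],
  ],
};
```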
Well, technically it's not that straightforward, of course. Webpack does a ton of processing on its side to build the concatenated code, and it also keeps track of provided and used export-s before even calling terser, so the "combined" code with all modules is still valid for terser. But once again: it won't work for anything but static ES module syntax.
Under the hood
There is quite a complex process going on under the hood, from the moment you pass webpack.config.js to the compiler until the bundle is generated. We'll briefly touch on the parts that are interesting for our discussion.
The compilation phase is where all the fun happens. Below you can see its main steps.
Ultimately, during compilation webpack builds a dependency graph for the entry point specified in your webpack.config.js (or for several of them, if the configuration specifies multiple entry points).
In essence, the idea is to start from the entry point, go through the dependent modules, build them and connect them together: modules are the nodes and dependencies are the edges of the graph.
(0) | Start from the entry module (Compilation.js#1033) |
(1) | Build the module (Compilation.js#1111) |
(2) | After the build, process the module's dependencies (Compilation.js#1095) |
(3) | Add the dependencies to the module (Compilation.js#843) |
To build a module means to generate an AST while extracting all the needed information (export-s, import-s etc.). Webpack relies on acorn.Parser (from acorn) to build and process the AST.
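The traversal described above can be sketched as a toy graph builder. This is only an illustration under simplifying assumptions: the `files` map is a hypothetical stand-in for real source files, and "building" a module here just looks up its imports instead of parsing an AST.

```javascript
// Hypothetical stand-in for the file system: each "file" lists what it imports.
const files = {
  "./index.js": ["./moduleA.js", "./moduleB.js"],
  "./moduleA.js": ["./moduleB.js"],
  "./moduleB.js": [],
};

function buildGraph(entry) {
  const graph = new Map(); // module id -> array of dependency ids
  const queue = [entry];   // (0) start from the entry module

  while (queue.length > 0) {
    const id = queue.shift();
    if (graph.has(id)) continue; // already built, skip

    // (1) "build" the module: here we only look up its imports,
    // webpack would parse the source into an AST at this point
    const dependencies = files[id];

    // (3) add the dependencies to the module (the edges of the graph)
    graph.set(id, dependencies);

    // (2) process the dependencies: each one gets built in turn
    queue.push(...dependencies);
  }
  return graph;
}

const graph = buildGraph("./index.js");
console.log([...graph.keys()]);
```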
Next comes the optimization phase.
FlagDependencyUsagePlugin hooks into the compilation phase, identifies usedExports and assigns them to the module. Basically, the idea is to find what "moduleA" imports from "moduleB" in order to set moduleB's usedExports. This process requires a lot of recursive traversing and "counting references".
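The core idea can be illustrated with a flat toy model. It ignores the recursive traversal the real plugin performs, and every name in it (`importRecords`, `providedExports`, `collectUsedExports`, the file names) is hypothetical.

```javascript
// Hypothetical import records: who imports which names from where.
const importRecords = [
  { importer: "./index.js", source: "./math.js", names: ["square"] },
  { importer: "./index.js", source: "./format.js", names: ["toFixed"] },
  { importer: "./format.js", source: "./math.js", names: ["round"] },
];

// What each module provides (its export-s).
const providedExports = {
  "./math.js": ["square", "cube", "round"],
  "./format.js": ["toFixed", "toPercent"],
};

// Collect, per imported module, the set of export names somebody uses.
function collectUsedExports(records) {
  const used = {};
  for (const { source, names } of records) {
    if (!used[source]) used[source] = new Set();
    for (const name of names) used[source].add(name);
  }
  return used;
}

const usedExports = collectUsedExports(importRecords);

// Anything provided but never used is a candidate for removal.
const unused = {};
for (const [id, provided] of Object.entries(providedExports)) {
  const usedSet = usedExports[id] || new Set();
  unused[id] = provided.filter((name) => !usedSet.has(name));
}

console.log(unused);
```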
As you know, webpack has a pipeline of plugins working on events. If you want to learn more, check out my other post, Tapable library as a core of webpack architecture.
FlagDependencyUsagePlugin.js follows up on what HarmonyImportDependencyParserPlugin.js found about dependency usage.
(1) | Once an importSpecifier is detected, the variable is marked as an "imported var" for further tracking |
(2) | Listen for calls (the AST method-call element): webpack is smart, and an imported method is not necessarily a used one, so it needs to make sure the method is called as well |
(3) | A call of an imported method is detected and saved as a dependency (it later ends up in usedExports of the imported module) |
Once again, for this to work, import-s/export-s must remain in the package (not transpiled).
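The three steps above can be mimicked on a hand-written, heavily trimmed ESTree-style AST. This is a sketch of the idea only, not webpack's parser plugin: `findUsedImports` is a hypothetical helper, and it looks at top-level call statements only.

```javascript
// Hand-written, trimmed AST for:
//   import { square, cube } from "./math.js";
//   square(4);
const program = {
  type: "Program",
  body: [
    {
      type: "ImportDeclaration",
      source: { type: "Literal", value: "./math.js" },
      specifiers: [
        { type: "ImportSpecifier", imported: { name: "square" }, local: { name: "square" } },
        { type: "ImportSpecifier", imported: { name: "cube" }, local: { name: "cube" } },
      ],
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "square" },
        arguments: [{ type: "Literal", value: 4 }],
      },
    },
  ],
};

function findUsedImports(ast) {
  const importedVars = new Map(); // local name -> { source, exportName }
  const used = [];

  for (const node of ast.body) {
    if (node.type === "ImportDeclaration") {
      // (1) mark every import specifier as an "imported var"
      for (const spec of node.specifiers) {
        importedVars.set(spec.local.name, {
          source: node.source.value,
          exportName: spec.imported.name,
        });
      }
    } else if (
      node.type === "ExpressionStatement" &&
      node.expression.type === "CallExpression"
    ) {
      // (2) listen for calls
      const callee = node.expression.callee;
      const hit = callee.type === "Identifier" && importedVars.get(callee.name);
      // (3) a called import is recorded as a used dependency
      if (hit) used.push(hit);
    }
  }
  return used;
}

const usedImports = findUsedImports(program);
console.log(usedImports);
```

Here `cube` is imported but never called, so it does not show up in the result.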
Interesting finds
There are way too many interesting things I've noticed in webpack's source code that deserve a mention. They probably need a separate post.
I'll highlight just a few of them.
Remember that error when you run webpack for the first time but forgot to install the webpack-cli package? It is not a peerDependency, so webpack provides quite useful guidance for users on how to solve it.
It simply tries to resolve the package, and a thrown error means it's not installed!
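The pattern looks roughly like this. It is a sketch that assumes a CommonJS context where Node's `require.resolve` is available; the package name in the usage line is made up.

```javascript
// Returns true when Node can resolve the package from this file's location.
function isInstalled(packageName) {
  try {
    require.resolve(packageName);
    return true;
  } catch (err) {
    // resolution failed, so the package is treated as not installed
    return false;
  }
}

console.log(isInstalled("some-package-that-is-not-installed-xyz"));
```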
Another rather big surprise is how many independent package dependencies webpack has. Literally for everything:
1) tapable package for event-driven architecture
2) terser for minification
3) acorn for AST processing
4) watchpack to watch file changes
That's obviously very nice, since all of them can be reused for different purposes in other tools!