This is me: 🐣.
And my thoughts while implementing a JavaScript monorepo using lerna and yarn workspaces, as well as git submodules.
Disclaimers
- The term monorepo seems to be controversial when it comes to project structuring; some may prefer multi-package (lerna itself once was "A tool for managing javascript monorepos", it's now "A tool for managing JavaScript projects with multiple packages").
- Not a step-by-step guide on tools; links to well-maintained official docs will be provided.
- To record (not to debate) my own thoughts and implementation details on 'monorepo'. Corrections and guidance are welcome!
Monorepo What and Why
TL; DR
Back in the early days of my web projects as a noob, I would typically create repositories like one named frontend and another named server, maintained and git-versioned separately. In the real world, two simple repositories may not cover many of the more complicated scenarios. Think about those lovely UI components you would like to pet and spread, and those clever utils/middlewares you want to extract and share.
frontend # a standalone repo
├── scripts
├── components
│   ├── some-lovely-ui
│   └── ...
├── index.html
└── ...
server # a standalone repo
├── utils
│   ├── some-mighty-util
│   └── ...
├── middlewares
│   ├── some-clever-middleware
│   └── ...
├── router.js
├── app.js
├── package.json
└── ...
The noob structure
Yes, we must protect our innovative ideas, by creating a few more standalone repositories, which should turn the whole project into a booming repo-society.
webapp # standalone
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── some-shell-script
├── some-lint-config
├── some-lang-config
├── some-ci-config
├── some-bundler-config
└── ...
server # standalone as it was
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── same-old-confs
└── ...
whateverapp # say, an electron-app
├── same-old-js # a standalone javascript-domain repo, again
└── ...
some-lovely-ui # needs to be independently bootstrapped and managed
├── same-old-setup
└── ...
some-mighty-util # shares an almost identical structure
├── same-old-structure
└── ...
some-clever-middleware # inherits absolute pain
├── same-old-pain
└── ...
The real world?
With the help of the link command provided by yarn (yarn link) or npm (npm link), you can easily try out development features across projects and packages on the fly. Say you are developing Project A and Package B simultaneously, but as separate repos, and want to use B as a dependency of A. Run yarn link under Package B and yarn link B under Project A, and you will find a symlinked folder like path/to/A/node_modules/B (the same location a yarn add would install into) which points to your still-active local Package B. Details are beyond scope; dig for yourself.
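To make the symlink idea concrete, here is a minimal Node.js sketch of roughly what linking leaves behind on disk. It does not call yarn at all; it only imitates the end result with a hand-made symlink inside a temp folder. The names project-a and package-b are made up for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical layout in a throwaway temp folder: two separate "repos".
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), "link-demo-"));
const packageB = path.join(sandbox, "package-b");
const projectA = path.join(sandbox, "project-a");

fs.mkdirSync(packageB);
fs.mkdirSync(path.join(projectA, "node_modules"), { recursive: true });
fs.writeFileSync(
  path.join(packageB, "package.json"),
  JSON.stringify({ name: "package-b", version: "0.0.1", main: "index.js" })
);
fs.writeFileSync(path.join(packageB, "index.js"), "module.exports = 'hello from B';\n");

// Imitate the end result of linking: A/node_modules/package-b points at B's live folder.
const linkPath = path.join(projectA, "node_modules", "package-b");
fs.symlinkSync(packageB, linkPath, "junction"); // the type argument only matters on Windows

// Resolve "package-b" the way code living in project A would.
const entry = require.resolve("package-b", { paths: [projectA] });
const resolvedFromA = require(entry);
const isSymlink = fs.lstatSync(linkPath).isSymbolicLink();

console.log(resolvedFromA, isSymlink);
```

Because the folder under node_modules is only a pointer, any edit made in package-b is what project-a loads the next time it starts, with no reinstall in between.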
So far so good, until you quickly find yourself annoyed by what everybody wants to get rid of: Repository Bootstrapping. If you care about maintainability and consistency, almost identical configurations have to be set up for version control, dependency control, bundling, linting, CI, etc. Meanwhile, almost identical solutions have to be made to avoid madness, for example against one of the baddest villains: the 'node_modules' 🕳️.
The Silver Lining
While dirty jobs cannot be avoided, there is still a silver lining here: dirty jobs done once and for all, at least to get rid of the duplicated pain.
The approach is simple. Step zero: since all the repositories we've built are meant to serve the same big blueprint, joining them into one single repository sounds just modern and intuitive.
the [project] root
├── apps
│   ├── webapp
│   ├── server
│   ├── some-lovely-ui
│   ├── some-mighty-util
│   └── ...
└── ...
The what?
Such an approach looks like a history rewind. As I've not-very-deeply learned, many ancient projects in corporations used to be structured in a monolithic way, but gradually suffered from maintenance and collaboration problems. Wait, still?
What is the confusion? What is our goal by putting things together? Our wish:
- Be saved from redundant jobs.
- Promote code consistency.
- Make version control easy.
- Make best practices possible for all sub-projects.
MANAGEABILITY, I think.
Manageability Up
The [project] root
├── apps
│   ├── webapp
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── lint-configs # sub-project-wide lint, can extend or override global confs
│   │   ├── lang-configs # sub-project-wide, can extend or override global confs
│   │   ├── bundler-configs # sub-project-wide
│   │   ├── README.md
│   │   └── ...
│   ├── server
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── sub-project-level-confs
│   │   └── ...
│   ├── some-lovely-ui
│   │   ├── sub-project-level-stuff
│   │   └── ...
│   ├── some-clever-middleware
│   │   └── ...
│   └── ...
├── package.json # global manifests, deps, resolutions, root-only deps (husky for instance)
├── .gitignore # git once for all
├── .git # git once for all
├── dotenvs # dotenvs for all
├── shell-scripts # maintenance for all
├── lint-configs # lint for all
├── lang-configs # helpers for all
├── ci-configs # publish made handy
├── bundler-configs # bundler for all
└── ...
The advanced structure
Here we've introduced several familiar faces into the root of the project directory: manifests and config files that once dwelled only in each sub-project. This makes these configs take effect project-wide, allowing a baseline to be set and shared among all sub-projects, aka code consistency. A sub-project may still hold its private-scope configs to override or extend the global standard if a variation has to be made, in many cases thanks to the inheritance-like feature in most dev toolchains.
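As one illustration of that inheritance-like behavior, here is a small sketch assuming ESLint's legacy .eslintrc format, where a config in a sub-folder is merged on top of the configs found in its parent folders. The file locations and the chosen rule are hypothetical, and newer ESLint versions with flat config work differently.

```javascript
// Two hypothetical files shown in one listing; in a real repo each object
// would be the module.exports of its own .eslintrc.js.

// [root]/.eslintrc.js: the project-wide baseline
const rootConfig = {
  root: true, // ESLint stops searching parent folders here
  extends: ["eslint:recommended"],
  rules: { semi: ["error", "always"] },
};

// [root]/apps/webapp/.eslintrc.js: inherits the baseline, overrides one rule
const webappConfig = {
  env: { browser: true },
  rules: { semi: ["error", "never"] },
};

module.exports = { rootConfig, webappConfig };
```

The sub-project file stays tiny because everything it does not mention falls through to the root baseline.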
The model must be flexible. Take code linting for instance: to save lives, almost every well-maintained framework, frontend and backend alike, has its own boilerplates and CLI tools to the rescue, which usually have linting covered. Though many of them use eslint and promote the JavaScript Standard Style, there are ts / js variations, there are still tslint advocates, there are complexities between lint plugins, and for sure there can be strict-or-not conventions in your company. Wise choices have to be made on your own responsibility.
Bravo?
Let's now bravely call our project a monorepo already! By the name we infer (?) that this is basically a project with all its ingredient parts in a single / monophonic repository. Meanwhile, serving a project-wide but extendable development standard is made possible.
Manageability achieved! Now who be the manager?
Sir, we have a problem!
- The installing process for a JS project is never satisfying. It creates a fat and tricky node_modules. Multiple projects in one?
  - Not human-life-saving: I have to cd and run yarn add per sub-project folder.
  - Not battery-life-saving: a sub-project's deps are installed under its own directory. On the global scale, heavy loads of duplication are produced and will keep expanding.
- Cleverer ideas and methods are needed for handling sub-project versions and cross-dependency relations.
Introducing Lerna
In practice, I didn't get here by having headaches resolving package dependencies and then searching for a solution. I heard of lerna first, found it handy for performing semver bumps, and only then got to know the monorepo goodness.
As described on its website, lerna is a tool for managing JavaScript projects with multiple packages.
A lerna init command creates a new lerna project (or upgrades an existing project into one), which is typically structured like:
root
├── lerna.json
├── package.json
├── node_modules
└── packages
    ├── packageA
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    ├── packageB
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    └── ...
It looks pretty much like a lerna.json file introduced into our previous mono-structure. The file is the config for your globally npm-installed or yarn-added lerna command line tool; a project-wide lerna should also be automatically added to devDependencies in root/package.json.
A minimal effective lerna config looks like:
// [project/root]/lerna.json
{
  "packages": ["packages/*"],
  "version": "independent",
  "npmClient": "yarn" // or npm, pnpm?
  // ...
}
The packages entry is a glob list that matches the locations of sub-projects. For instance, ["clients/*", "services/*", "hero"] makes the valid sub-projects (those having a valid package.json) located directly under clients and services, as well as the exact hero project located under the root, recognized as lerna packages.
The version entry, if given a valid semver string, makes all packages always share the same version number. "independent" means packages are versioned separately, in parallel.
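For intuition on how such a glob list picks folders, here is a deliberately simplified matcher. It is only a sketch: it understands exact names and a single trailing /* and nothing else, it skips the package.json check, and the matchPackages helper and the folder list are made up for illustration. Lerna's real glob handling is more capable.

```javascript
// Simplified matcher: supports exact names and one trailing "/*" only.
function matchPackages(patterns, candidateDirs) {
  return candidateDirs.filter((dir) =>
    patterns.some((pattern) => {
      if (pattern.endsWith("/*")) {
        const parent = pattern.slice(0, -2); // "clients/*" -> "clients"
        const rest = dir.slice(parent.length + 1); // what follows "clients/"
        // Only direct children count, so "clients/web/src" is rejected.
        return dir.startsWith(parent + "/") && rest.length > 0 && !rest.includes("/");
      }
      return dir === pattern; // exact folder such as "hero"
    })
  );
}

// Hypothetical folders found in the repo.
const dirs = ["clients/web", "clients/web/src", "services/api", "hero", "docs"];
const matched = matchPackages(["clients/*", "services/*", "hero"], dirs);

console.log(matched);
```

Here docs is ignored because no pattern mentions it, and clients/web/src is ignored because it sits one level too deep.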
Useful Commands
- lerna bootstrap (once, from any location, project wide):
  - Installs dependencies for every single package (sub-projects only, root dependencies not included), no per-directory by-hand installs.
  - With a --hoist flag, can resolve duplication of common dependencies.
  - Links cross dependencies, with the same results (see lerna add and lerna link) as performing yarn links per package.
- lerna clean: removes installs (purges the node_modules folder) from every package (root excepted).
- lerna version and lerna publish as lerna's selling point:
  BETTER READ THE DOCS FOR THIS SECTION BY YOURSELF
  You must be smart if you use conventional commits in your repo at the same time; it gives you many more advantages.
Use Conventional Commits
A repo that follows Conventional Commits has its commit messages structured as follows:
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
We have husky the git hooks manager, as well as commitizen the commit util. They create interactive prompts as you git commit, simplifying the making of commit messages. Dig into them yourself.
The information provided in a conventional commit message correlates with the Semantic Versioning spec very well. Typically, given that a full semver number can be MAJOR.MINOR.PATCH-PRERELEASE:
- As a possible value of the type section, a fix commit stands for a PATCH semver bump.
- A feat commit stands for a MINOR bump.
- The BREAKING CHANGE optional footer stands for a MAJOR bump.
This makes it easier to write automated tools on top of it.
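As a taste of such a tool, the mapping above fits in a few lines. This is a minimal sketch, not what lerna runs internally: the bumpFor helper is hypothetical, and it also treats the ! marker after the type or scope as breaking, which the Conventional Commits spec allows next to the footer.

```javascript
// Matches "<type>[optional scope][!]: <description>" on the first line.
const HEADER = /^(\w+)(?:\(([^)]*)\))?(!)?: (.+)$/;

function bumpFor(message) {
  const [header, ...rest] = message.split("\n");
  const match = HEADER.exec(header);
  if (!match) return null; // not a conventional commit

  const [, type, , bang] = match;
  const hasBreakingFooter = rest.some((line) => /^BREAKING[ -]CHANGE: /.test(line));

  if (bang || hasBreakingFooter) return "major";
  if (type === "feat") return "minor";
  if (type === "fix") return "patch";
  return null; // docs, chore, etc.: no bump implied by the spec
}

console.log(bumpFor("fix(package-c/error-interception): fix type defs"));
console.log(bumpFor("feat(package-b): add folder draggability"));
console.log(bumpFor("perf(package-a)!: bump electron version"));
```

Real tooling layers presets on top of this, so types other than feat and fix may still trigger a release depending on configuration.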
Meanwhile with lerna, an illustrative workflow for a conventional version bump:
- Current package versions (independently versioned)
- Make some updates
  - A MAJOR level performance update on Package A, with "perf(package-a)!: bump electron version" as the commit message.
  - A MINOR level feature update on Package B, with a "feat(package-b): add folder draggability" commit message.
  - A PATCH level fix on Package C, with a "fix(package-c/error-interception): fix type defs" commit message.
  - No modifications on Package D.
- Perform lerna version with the --conventional-commits flag. The process and the results:
  - Read current versions from the package.json files.
  - Read from git history (and actual code changes) to determine which commit was made in which package.
  - Resolve commit messages and generate the corresponding version bumps.
  - Once confirmed, it will:
    - Modify the version field in each package.json.
    - Create a git commit as well as new version tags (the message format can be configured in lerna.json).
    - Push to remote.
- New versions
You should read the docs for prerelease bumps and more capabilities utilizing lerna.
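The arithmetic behind the workflow above can be sketched like this. The starting version numbers are invented, since only the bump levels are known, and the sketch assumes plain MAJOR.MINOR.PATCH strings at 1.0.0 or above, so prereleases and 0.x conventions are left out.

```javascript
// Apply one bump level to a plain MAJOR.MINOR.PATCH version string.
function nextVersion(version, bump) {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  if (bump === "patch") return `${major}.${minor}.${patch + 1}`;
  return version; // nothing releasable happened in this package
}

// Hypothetical starting versions for the four packages.
const current = {
  "package-a": "1.2.3",
  "package-b": "1.4.0",
  "package-c": "2.0.1",
  "package-d": "1.0.0",
};

// Bump levels taken from the commits in the workflow; package-d has none.
const bumps = { "package-a": "major", "package-b": "minor", "package-c": "patch" };

const next = Object.fromEntries(
  Object.entries(current).map(([name, version]) => [name, nextVersion(version, bumps[name])])
);

console.log(next);
```

In independent mode each package moves on its own, which is why package-d keeps its version while the other three move.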
Introducing Yarn Workspaces
Using lerna to handle package installs, though feasible, is not a very good idea, especially when you have root-only dependencies and when you are using Yarn (the classic version).
This article focuses on yarn 1.x only. Having introduced the currently-not-very-widely-supported but advanced Plug'n'Play feature, Yarn 2.x has diverged very significantly, both conceptually and functionally. It's even incubating its own release workflow, fyi.
Hoist in Lerna
says this official blog from yarn, which also introduced yarn workspaces and its relationship with Lerna
With the above said, and I don't really remember since which version, Lerna does provide a --hoist flag for its bootstrap command to solve the duplicated installation problem.
root
├── package.json # deps: lerna
├── node_modules
│   ├── typescript @4.0.0 # HOISTED because of being a common dep
│   ├── lodash ^4.17.10 # HOISTED because of being a common dep
│   ├── lerna # root only
│   └── ...
├── package A
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package B
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package C
│   ├── package.json # deps: lodash ^4.17.20, wattf @1.0.0
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── wtfdotsh # got an executable from wattf
│   │   │   └── ...
│   │   ├── lodash ^4.17.20 # only package C asks for this version of lodash
│   │   ├── wattf @1.0.0 # package C's private treasure
│   │   └── ...
│   └── ...
└── ...
This means that common dependencies around the repo get recognized and installed only once into project/root/node_modules, while the binary executable of each (if it has one) is still accessible per package/dir/node_modules/.bin, as required by package scripts.
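The layout in the tree can be reproduced with a toy model: anything requested with the same version by more than one package goes to the root, the rest stays local. To be clear, this is only my sketch of the picture above, with a made-up planHoisting helper and version ranges compared as plain strings; it is not Lerna's actual algorithm.

```javascript
// Toy model: a dependency is hoisted when more than one package asks for
// the very same "name@version" string.
function planHoisting(packages) {
  const requesters = {}; // "name@version" -> how many packages ask for it
  for (const deps of Object.values(packages)) {
    for (const [name, version] of Object.entries(deps)) {
      const key = `${name}@${version}`;
      requesters[key] = (requesters[key] || 0) + 1;
    }
  }

  const hoisted = [];
  const local = {};
  for (const [pkg, deps] of Object.entries(packages)) {
    local[pkg] = [];
    for (const [name, version] of Object.entries(deps)) {
      const key = `${name}@${version}`;
      if (requesters[key] > 1) {
        if (!hoisted.includes(key)) hoisted.push(key); // installed once, at the root
      } else {
        local[pkg].push(key); // stays in the package's own node_modules
      }
    }
  }
  return { hoisted, local };
}

const plan = planHoisting({
  "package-a": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package-b": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package-c": { lodash: "^4.17.20", wattf: "1.0.0" },
});

console.log(plan);
```

Package C keeps its own lodash because its range differs as a string, which is exactly the kind of near-duplicate a real hoisting algorithm has to reason about more carefully.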
However, this very positive feature is only available during lerna bootstrap, while in most common cases we install new packages during development, using a package manager.
Plus, Lerna knows the disadvantages of hoisting, and it doesn't have a way to solve them.
So far with Lerna:
Good for managing "macro"-scopic packages.
😬 Bad at resolving microscopic dependencies.
- Easy-to-break package symlinks.
- Undesirable overhead control.
Nohoist in Yarn
Finally we welcome Yarn Workspaces on stage. And she comes with such a duty:
- She has Hoisting as her key feature.
- She knows the caveats of hoisting as well, and provides a nohoist option (very helpful, PLEASE DO READ THIS).
It's even easier to call her number, by modifying your existing repo/root/package.json.
[root]/package.json
{
  "private": true,
  // pretty familiar setup like Lerna
  "workspaces": ["workspace-a", "workspace-b", "services/*"]
}
This turns a repo into workspaces
Now, instead of lerna bootstrap, calling yarn [install/add] anywhere in the repo and anytime during dev applies hoisting (honestly, more time consuming, but tolerable by all means).
What about nohoisting? Sometimes you don't want some package / workspace to have some of its deps installed globally even though they share common versions. It's as simple as adding yet another entry with glob patterns.
[root]/package.json
{
  "private": true,
  "workspaces": {
    // this is even more like Lerna
    "packages": ["workspace-a", "workspace-b", "services/*"],
    // exceptions here, globs
    "nohoist": ["**/react-native", "**/react-native/**"]
  }
}
DETAILS? AGAIN, PLEASE DO READ THIS FINE BLOG FROM YARN.
Friendship
It's easy to notice similarities in the way Lerna and Yarn manifest a monorepo. In fact, the integration of both is encouraged by Yarn and programmatically supported in Lerna.
[root]/lerna.json
{
  "npmClient": "yarn",
  "useWorkspaces": true
  // ...
}
This joins hands together
With the above useWorkspaces set to true, we get Lerna to read package / workspace globs from package.json instead.
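In other words, the glob list has a single source of truth. A rough sketch of that decision follows; resolvePackageGlobs is a hypothetical helper of mine, not Lerna's code, and it accepts both the array and the object shape of the workspaces entry.

```javascript
// Decide where the package globs come from, given both config files as objects.
function resolvePackageGlobs(lernaJson, rootPackageJson) {
  if (!lernaJson.useWorkspaces) {
    // Without workspaces, lerna.json owns the list.
    return lernaJson.packages || ["packages/*"];
  }
  const workspaces = rootPackageJson.workspaces;
  // Yarn accepts either a plain array or an object with a "packages" key.
  if (Array.isArray(workspaces)) return workspaces;
  if (workspaces && Array.isArray(workspaces.packages)) return workspaces.packages;
  throw new Error("useWorkspaces is true but package.json has no workspaces");
}

const globs = resolvePackageGlobs(
  { npmClient: "yarn", useWorkspaces: true },
  {
    private: true,
    workspaces: {
      packages: ["workspace-a", "workspace-b", "services/*"],
      nohoist: ["**/react-native", "**/react-native/**"],
    },
  }
);

console.log(globs);
```

Keeping the list in package.json only means the two tools can never disagree about what counts as a package.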
Our original goal
- [x] A manageable monorepo
- [x] Package / Workspace versioning made easy
- [x] Low level dependency well controlled
Not an Intruder - Git Submodules
In my actual dev experience, I've run into scenarios as follows:
- I have to pick some package out, cuz I want to open-source it.
- I am not satisfied with a certain dependency, so I'd better fork it and constantly modify and use it in action.
A non-perfect solution
With Git Submodules, we can leverage git as an external dependency management tool as well. In a nutshell, it makes it possible to place a package inside a big repo while keeping its own private-scope git storage. For details of implementation, please read the above links and this github blog.
For a quick peek, see this sample project structure:
root
├── apps
│   ├── auth-web # a lerna package / yarn workspace
│   ├── electron-app # a lerna package / yarn workspace
│   └── ...
├── nest-services # a lerna package / yarn workspace
├── submodules
│   ├── awesome-plugin # MUST NOT be a lerna package / yarn workspace
│   │   ├── node_modules # deps manually installed
│   │   ├── package.json # nohoist anything
│   │   └── .git # has its own git history with its own remote origin
│   ├── some-framework-adapter # MUST NOT be a lerna package / yarn workspace
│   │   ├── .tsconfig.json # private configs
│   │   ├── .ci-conf # SHOULD have its own CI config
│   │   ├── .eslintrc # MAY break code consistency.
│   │   ├── .git
│   │   └── ...
│   └── ...
├── package.json
├── lerna.json
├── .gitmodules # the config for submodules
├── .git # project git history
└── ...
And this config:
# [root]/.gitmodules
[submodule "submodules/awesome-plugin"]
  path = submodules/awesome-plugin
  url = https://github.com/awesome-plugin
[submodule "submodules/some-framework-adapter"]
  path = submodules/some-framework-adapter
  url = https://private.gitlab.com/some-framework-adapter
Caveats:
- The implementation is tricky.
- It's recommended that a submodule not be a Lerna package / workspace, meaning we should regard it as a completely standalone project and perform everything separately.
- Can possibly break the code consistency.
USE WITH CAUTION.
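One cheap guard for the second caveat is to check that no submodule path is swallowed by a workspace glob. The sketch below is only an idea under simplifying assumptions: both helpers are hypothetical, the .gitmodules parsing is naive, and only exact names and dir/* globs are understood.

```javascript
// Pull every "path = ..." value out of a .gitmodules file (naive parsing).
function submodulePaths(gitmodulesText) {
  return gitmodulesText
    .split("\n")
    .map((line) => /^\s*path\s*=\s*(.+?)\s*$/.exec(line))
    .filter(Boolean)
    .map((match) => match[1]);
}

// Simplified glob check: exact names and "<dir>/*" only.
function coveredBy(globs, dir) {
  return globs.some((glob) =>
    glob.endsWith("/*")
      ? dir.startsWith(glob.slice(0, -1)) && !dir.slice(glob.length - 1).includes("/")
      : dir === glob
  );
}

const gitmodules = `
[submodule "submodules/awesome-plugin"]
  path = submodules/awesome-plugin
  url = https://github.com/awesome-plugin
[submodule "submodules/some-framework-adapter"]
  path = submodules/some-framework-adapter
  url = https://private.gitlab.com/some-framework-adapter
`;

const paths = submodulePaths(gitmodules);
const safe = paths.filter((p) => coveredBy(["apps/*", "nest-services"], p));
const risky = paths.filter((p) => coveredBy(["apps/*", "submodules/*"], p));

console.log({ paths, safe, risky });
```

An empty safe list is the good outcome; the risky list shows what would happen if someone carelessly added submodules/* to the workspaces.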
Conclusion - your own responsibility
As I've been sticking with the Lerna-Yarn-Workspaces scheme for a while, question marks constantly emerge. Here are some notes of mine.
- Git commits must be strictly governed, or they could easily end up a mess. For instance, you should always avoid blending changes in various packages into one commit.
- Handle dependencies carefully. I've made mistakes while dealing with multiple Nestjs projects. Nest, with the help of its CLI tool, has its own monorepo mode. I radically tried to merge the Nest monorepo into the Lerna-Yarn-Workspaces one, so I moved all nest-ly common deps (say: express, typescript, prettier plugins) to the project root and made every nest workspace a yarn workspace. This ended up with warnings everywhere, breaking the overall ecosystem. It turned out I had to leave nest inside its own playground to find my inner peace back.
I've also investigated Rushstack a bit, another monorepo implementation from Microsoft. It works best with pnpm and has many conceptual differences from Lerna. For me the most significant one is that it doesn't encourage a root package.json, and it has its own ideas on husky and pre-commit git hooks. Moreover, its configs are somewhat complicated and should suit LARGE monorepos, down to things like detailed file permissions, I think.
I still use Lerna and Yarn for my own convenience and simplicity. And now the final question: should I always PUT EVERYTHING IN, company-wide for example, like some big firms do; or should I be cool and do it project by project; or even completely avoid this approach?
The answer? Maintaining monorepos isn't easy; weigh the pros and cons on your own responsibility.
References
Monorepos in Git | Atlassian Git Tutorial
Guide to Monorepos for Front-end Code
Misconceptions about Monorepos: Monorepo != Monolith
License Compliance Question · Issue #673 · microsoft/rushstack
https://www.youtube.com/watch?v=PvabBs_utr8&feature=youtu.be&t=16m24s
[rush] Support Husky for git commit hooks · Issue #711 · microsoft/rushstack
[rush] Add support for git hooks by nchlswhttkr · Pull Request #916 · microsoft/rushstack