- The Two Extremes of Code Organization
- Finding the middle ground
- I need your help!
- So what's the plan?
- A quick word before we begin
- The Attempts
I like simple code, in the "low coupling, high cohesion" sort of way where code is packaged into understandable, self-contained clumps that each do an understandable "thing". That way I don't have to understand everything all at once. Instead I can get an overview at the high level and dive into the details when relevant to the work that needs doing.
We all chop our code into understandable abstractions already: We write functions and classes into separate files and folders. But as our project grows so does the need to keep organizing our code's abstractions, and at some point a project becomes too overwhelming if the only organizing tool is files & folders.
This code-organizing dynamic can be thought of as a spectrum, and if we put "files & folders" as the least extreme solution what's the most extreme approach? That's where we split all our code into separate repositories, so our product ends up entirely composed of generic "lego blocks" that snap together and none of the individual parts know about each other. But both these extremes have problems:
Files & Folders ◄─────────► Everything's a Repository
This is a great place to start a new project, basically all projects should start here. But there is a scale challenge. Given constant growth it becomes increasingly difficult to keep sub-systems decoupled, because there are no hard separations between systems: Files and folders inevitably degrade into a code-jungle where search-results return too many hits, auto-complete gives too many suggestions, and modules easily end up importing each other in ways that couple concepts together. If you're the original author you might not see that degradation, but newcomers will be increasingly confused and slow to get up to speed. At some point it just becomes too much for newcomers to get an overview, and if you do nothing the code-jungle will spread and suffocate development, and will be a source of countless frustrations and bugs.
On the other side of the spectrum is the Everything's a Repository pattern, where we turn every abstraction into its own separate repository that can be used by possibly many other products. It's like the ultimate open-source dream where all the code lives as independent lego-blocks, and our product just wires together a bunch of separate dependencies and all the details are taken care of by each of those separate projects.
The end result is complete code isolation: We can open a single repository and really focus on just that one code-concept, there's truly no code-jungle anymore 🎉.
But this is a dangerous path that quickly turns into a different jungle: Precisely because each package is so isolated we now have a huge overhead for introducing changes, because each change has to be woven into the intricate web of sub-projects.
The challenge is that an individual sub-package has no context of the overall product, so when we dive into one library to make a change we lose sight of the overall product. And it gets very frustrating dealing with the different dependencies and their versions, e.g. if we upgrade one sub-package it becomes a manual process of going through its consumers and making them pull in the new version until we reach our product. And what if we then find the change to the library wasn't quite right for our product? It can be hard to replicate the exact needs of our product inside each library, and this back-and-forth quickly becomes very destructive.
With just a few separate repositories we'll be spending more time juggling versions and ensuring they all work correctly with each other than we do actually adding valuable changes to our product.
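To make that version-juggling a bit more concrete, here is a minimal sketch of how a single library change ripples outwards. The repository names and the `bumpWaves` helper are made up for illustration, they are not from any real project or tool. The idea is simply to walk from the changed library to its consumers, then to their consumers, until we reach the product.

```typescript
// Hypothetical repositories: each key is a repo, each value lists the repos it depends on.
type DependencyMap = Record<string, string[]>;

const repos: DependencyMap = {
  "product": ["ui-kit", "api-client"],
  "ui-kit": ["design-tokens"],
  "api-client": ["http-utils"],
  "design-tokens": [],
  "http-utils": [],
};

// Returns the repos that must pull in a new version after `changed` is released,
// grouped into "waves": wave 1 consumes `changed` directly, wave 2 consumes wave 1, etc.
// Simplification: each repo is listed once, in the first wave that reaches it.
function bumpWaves(deps: DependencyMap, changed: string): string[][] {
  const waves: string[][] = [];
  const seen = new Set<string>([changed]);
  let frontier: string[] = [changed];

  while (frontier.length > 0) {
    const next: string[] = [];
    for (const [repo, uses] of Object.entries(deps)) {
      if (seen.has(repo)) continue;
      if (uses.some((dep) => frontier.includes(dep))) {
        seen.add(repo);
        next.push(repo);
      }
    }
    if (next.length > 0) waves.push(next);
    frontier = next;
  }
  return waves;
}

console.log(bumpWaves(repos, "design-tokens"));
```

Every wave is a separate round of releasing, bumping, and re-testing, and that is with only five imaginary repositories.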
ℹ️ BTW this "multiple repositories" approach is great for open-source, because that's a low-trust, high-latency work environment where the workflow must be optimized for letting separate groups move at their own individual pace, but it is an extremely poor fit for a team whose value is their product.
This article-series exists because I want to find ways to group code at higher levels than files & folders without suffering the drawbacks of multiple repositories. The Monorepo pattern is the solution, but there are pitfalls and multiple ways of organizing a monorepo that make this a problem worth exploring.
This series is all about pragmatism: I expect you and I to be normal "in-the-trenches programmers" who just want to make products, and we don't have time for complex workflows or perfectly divine principles. We want a simple way to organize code into separate projects when and where it makes sense, so code can migrate towards its own apps or shared libraries when its size and complexity warrant it. We want to continuously manage complexity without getting sucked into the jungles of either extreme, and we want to do it in a way that is as straightforward as possible.
This pragmatism is important because we don't need to find perfection. We just need a straightforward way to extract code. Maybe that code is deep inside the product, maybe it's some hardcoded functions, maybe it's a concept that's been copy-pasted across multiple systems, maybe it lacks tests, whatever the case it's a shared pattern that just needs to be extracted without too much ceremony. It can be improved later, but right now we just want to put a box around it. After all, the whole product can be tested and deployed together, I just want a simple way to continuously refactor so I can avoid the code-jungle.
Basically we want to find the lowest barrier for grouping pieces of code, with as little technical and workflow overhead as possible to accomplish that.
ℹ️ BTW the Monorepo pattern is probably more commonly seen where each package is versioned and published individually. That's a common pattern for open-source solutions. But that is explicitly not the goal for this series, where we focus on a team that just wants to focus on their product and wants a way to organize the code so it's easy to understand.
For this guide we're using Nodejs + TypeScript, which unfortunately causes some (or all) of the complexities we're about to encounter. If you're coming from another language you may wonder why these articles exist at all because for you it's easy to extract code into local packages, but for worse or worse it's not that easy in the Nodejs + TypeScript universe… as we're about to see.
Spoiler: I don't know what I'm doing! I'm not a TypeScript expert, I'm not a Monorepo guru, I can't offer the golden solution for this problem. I need your help to work through ideas and insights to explore the possible solutions. How do you organize your code? Do you have a preferred tool? I'm very interested in exploring what's out there.
First, let's go over the Files & Folders example so we have some starting point to use for exploring the different monorepo solutions. Then we'll move into actually trying various ways of pulling the code-jungle apart.
ℹ️ BTW, to keep our learnings easy we'll use a simple example to illustrate the complexity of a real product, but as a result it won't actually warrant any code-organizing. So please imagine the code is complex enough that we definitely need to reorganize 😅.
Let's pretend we're building a web-service called webby, and it's grown to this Files & Folders structure:
│ ├── analytics.spec.ts
│ ├── analytics.ts
│ ├── api.ts
│ ├── client.tsx
│ ├── index.ts
│ ├── logging.ts
│ ├── pages/
│ ├── server.tsx
│ └── types.ts
ℹ️ BTW, I've prepared the Files & Folders solution above via VSCode on GitHub1s.com if you'd like to explore the code yourself. You can also clone the repository (email@example.com:gaggle/exploring-the-monorepo.git) and check out the attempt-files-&-folders branch. It's also fine to just continue reading, as we'll cover all the necessary details as we get to them.
Depending on your experience-level you can maybe get a feel for the product just from this overview… Safe to say client.tsx relates to the frontend, so possibly server.tsx is the HTML-serving backend for that. That'd make api.ts a backend, but what does analytics.ts connect to? Maybe both? And maybe you don't know what that prisma folder is about? How do we know what areas connect to what?
The package.json file doesn't give an overview either because it is an overwhelming superset of all the dependencies for the product, with no way to tell which one belongs to what part of the product.
If we put ourselves in the shoes of someone just getting started this lack of overview makes it difficult to get familiar with the product. If each file is hundreds of lines and contains dozens or more classes and functions it's going to be difficult to understand how it all fits together! This is one big project after all, so imagine search-results are giving back too many results, with too many similar-sounding functions, and tests are taking too long to run, and it's just too difficult to get a grasp of exactly how it all fits together, so it all just feels like a big soup of code that's hard to work in.
It's this lack of overview that we want the monorepo pattern to improve on.
(At this point I want to make it clear that just adding more files & folders isn't the solution, because it won't make it easier to search, it won't help the tests to run faster, it won't help the overview. I realize our specific example is quite trivial, but I'm asking you to imagine this project is so massively complex that a junior hire comes in and clearly gets lost in what is to them a sea of folders, files, classes, and functions. The code itself may be well-factored, but we need a higher level of abstraction.)
Here's a cheat-sheet dependency graph of how the different modules actually relate to each other:
    ┌─────┐ ┌─────┐
    │ web │ │ api ├─┐
    └────┬┘ └┬────┘ │
         │   │      │
         │   │      │
         │   │      │
       ┌─▼───▼─┐   ┌▼──────────┐
       │ types │   │ analytics │
       └───────┘   └┬──────────┘
                    │
      ┌─────────┐   │
      │ logging ◄───┘
      └─────────┘
These are the "clumps of code" that we'd like to see separated into separate packages. Of course this just reflects my architectural opinions, but let's imagine we've arrived at this diagram together as a result of great collaborative meetings.
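If it helps, the same diagram can be written down as data and checked by a few lines of code. This is only a sketch: the edges are my reading of the diagram (web and api use types, api uses analytics, analytics uses logging), and `dependsOn` and `buildOrder` are names I made up for the illustration. The function does a depth-first topological sort, so each package comes out after the packages it depends on, and it complains if two packages ever end up depending on each other.

```typescript
// Assumption: these edges are my reading of the dependency diagram above.
const dependsOn: Record<string, string[]> = {
  web: ["types"],
  api: ["types", "analytics"],
  analytics: ["logging"],
  types: [],
  logging: [],
};

// Depth-first topological sort: returns packages ordered so that every
// package appears after the packages it depends on. Throws on a cycle.
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(pkg: string, path: string[]): void {
    if (state.get(pkg) === "done") return;
    if (state.get(pkg) === "visiting") {
      throw new Error(`Circular dependency: ${[...path, pkg].join(" -> ")}`);
    }
    state.set(pkg, "visiting");
    for (const dep of graph[pkg] ?? []) {
      visit(dep, [...path, pkg]);
    }
    state.set(pkg, "done");
    order.push(pkg);
  }

  for (const pkg of Object.keys(graph)) {
    visit(pkg, []);
  }
  return order;
}

console.log(buildOrder(dependsOn));
```

The point isn't the sorting itself, it's that once the clumps have explicit names and explicit edges, "who is allowed to import whom" becomes something we can check instead of something we have to remember.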
web is straightforward:
$ npm ci
$ npm run web:start
> Started on port 3000
And ditto for api:
$ npm run api+db:start
[api] api started at http://localhost:3002
It isn't so important what it does though, we just need to re-organize it 😂.
Below is the list of attempts. Please add suggestions for tools or methodologies I haven't tried; the whole point of this article-series is to learn the different ways of arranging code.