Jon Lauridsen

Posted on Jun 26, 2021 • Edited on Jan 29, 2022

Exploring the TypeScript Monorepo (a practical, hands-on adventure)

#node #typescript #productivity #monorepo

The Two Extremes of Code Organization
- Files & Folders
- Everything's a Repository
Finding the middle ground
- A note on TypeScript
I need your help!
So what's the plan?
A quick word before we begin
The Attempts

I like simple code, in the "low coupling, high cohesion" sort of way where code is packaged into understandable, self-contained clumps that each do an understandable "thing". That way I don't have to understand everything all at once. Instead I can get a high-level overview and dive into the details when they're relevant to the work that needs doing.

We all chop our code into understandable abstractions already: We write functions and classes into separate files and folders. But as our project grows so does the need to keep organizing our code's abstractions, and at some point a project becomes too overwhelming if the only organizing tool is files & folders.

The Two Extremes of Code Organization

This code-organizing dynamic can be thought of as a spectrum, and if we put "files & folders" as the least extreme solution what's the most extreme approach? That's where we split all our code into separate repositories, so our product ends up entirely composed of generic "lego blocks" that snap together and none of the individual parts know about each other. But both these extremes have problems:

  Files & Folders ◄─────────► Everything's a Repository

Files & Folders

This is a great place to start a new project; basically all projects should start here. But there is a scale challenge. Given constant growth it becomes increasingly difficult to keep sub-systems decoupled, because there are no hard separations between systems: files and folders inevitably degrade into a code-jungle where search results return too many hits, auto-complete gives too many suggestions, and modules easily end up importing each other in ways that couple concepts together. If you're the original author you might not see that degradation, but newcomers will be increasingly confused and slow to get up to speed. At some point it just becomes too much for newcomers to get an overview, and if you do nothing the code-jungle will spread and suffocate development, becoming a source of countless frustrations and bugs.

Everything's a Repository

On the other side of the spectrum is the Everything's a Repository pattern, where we turn every abstraction into its own separate repository that can be used by possibly many other products. It's like the ultimate open-source dream where all the code lives as independent lego-blocks, and our product just wires together a bunch of separate dependencies and all the details are taken care of by each of those separate projects.

The end result is complete code isolation: We can open a single repository and really focus on just that one code-concept, there's truly no code-jungle anymore 🎉.

But this is a dangerous path that quickly turns into a different jungle: precisely because each package is so isolated, we now have a huge overhead for introducing changes, because each change has to be woven into the intricate web of sub-projects.

The challenge is that an individual sub-package has no context of the overall product, so when we dive into one library to make a change we lose sight of the overall product. And it gets very frustrating dealing with the different dependencies and their versions, e.g. if we upgrade one sub-package it becomes a manual process of going through its consumers and making each of them pull in the new version until we reach our product. And what if we then find the change to the library wasn't quite right for our product? It can be hard to replicate the exact needs of our product inside each library, and this back-and-forth quickly becomes very destructive.

With just a few separate repositories we'll be spending more time juggling versions and ensuring they all work correctly with each other than we do actually adding valuable changes to our product.

ℹ️ BTW this "multiple repositories" approach is great for open-source, because that's a low-trust, high-latency work environment where the workflow must be optimized for letting separate groups move at their own individual pace, but it is an extremely poor fit for a team whose value is their product.

Finding the middle ground

This article-series exists because I want to find ways to group code at higher levels than files & folders without suffering the drawbacks of multiple repositories. The Monorepo pattern is the solution, but there are pitfalls, and there are multiple ways of organizing a monorepo, which makes this a problem worth exploring.

This series is all about pragmatism: I expect you and I to be normal "in-the-trenches programmers" who just want to make products, and we don't have time for complex workflows or perfectly divine principles. We want a simple way to organize code into separate projects when and where it makes sense, so code can migrate towards its own app or shared library when its size and complexity warrant it. We want to continuously manage complexity without getting sucked into the jungles of either extreme, and we want to do it in a way that is as straightforward as possible.

This pragmatism is important because we don't need to find perfection. We just need a straightforward way to extract code. Maybe that code is deep inside the product, maybe it's some hardcoded functions, maybe it's a concept that's been copy-pasted across multiple systems, maybe it lacks tests, whatever the case it's a shared pattern that just needs to be extracted without too much ceremony. It can be improved later, but right now we just want to put a box around it. After all, the whole product can be tested and deployed together, I just want a simple way to continuously refactor so I can avoid the code-jungle.

Basically we want to find the lowest barrier for grouping pieces of code, with as little technical and workflow overhead as possible to accomplish that.

ℹ️ BTW the Monorepo pattern is probably more commonly seen where each package is versioned and published individually. That's a common pattern for open-source solutions. But that is explicitly not the goal for this series, where we focus on a team that just wants to focus on its product and wants a way to organize the code so it's easy to understand.

A note on TypeScript

For this guide we're using Node.js + TypeScript, which unfortunately causes some (or all) of the complexities we're about to encounter. If you're coming from another language you may wonder why these articles exist at all, because for you it's easy to extract code into local packages, but for worse or worse it's not that easy in the Node.js + TypeScript universe… as we're about to see.

I need your help!

Spoiler: I don't know what I'm doing! I'm not a TypeScript expert, I'm not a Monorepo guru, and I can't offer the golden solution for this problem. I need your help to work through ideas and insights to explore the possible solutions. How do you organize your code? Do you have a preferred tool? I'm very interested in exploring what's out there.

So what's the plan?

First, let's go over the Files & Folders example so we have some starting point to use for exploring the different monorepo solutions. Then we'll move into actually trying various ways of pulling the code-jungle apart.

ℹ️ BTW, to keep our learnings easy we'll use a simple example to illustrate the complexity of a real product, but as a result it won't actually warrant any code-organizing. So please imagine the code is complex enough that we definitely need to reorganize 😅.

Let's pretend we're building a web-service called webby, and it's grown to this Files & Folders structure:

webby
├── package.json
├── prisma/
├── src
│  ├── analytics.spec.ts
│  ├── analytics.ts
│  ├── api.ts
│  ├── client.tsx
│  ├── index.ts
│  ├── logging.ts
│  ├── pages/
│  ├── server.tsx
│  └── types.ts
├── tsconfig.json
└── typings/

ℹ️ BTW, I've prepared the Files & Folders solution above via VSCode on GitHub1s.com if you'd like to explore the code yourself. You can also clone the repository (git@github.com:gaggle/exploring-the-monorepo.git) and check out the attempt-files-&-folders branch. It's also fine to just continue reading, as we'll cover all the necessary details as we get to them.

Depending on your experience-level you can maybe get a feel for the product just from this overview… Safe to say client.tsx relates to the frontend, so possibly server.tsx is the HTML-serving backend for that. That'd make api.ts a backend, but what does analytics.ts connect to? Maybe both? And maybe you don't know what that prisma folder is about? How do we know what areas connect to what?

And the package.json file doesn't give an overview either because it is an overwhelming superset of all the dependencies for the product, with no way to tell which one belongs to what part of the product.
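To make that concrete, here is a minimal sketch of what a single package.json loses. The mapping below is hypothetical: the package names, and which part of webby uses them, are assumptions for illustration and are not read from the real repository. Once the per-part lists are flattened into one root list, there is no way to recover which part needs which dependency.

```typescript
// Hypothetical mapping of webby's parts to the third-party packages they import.
// These names are illustrative assumptions, not the real project's dependencies.
const importsByPart: Record<string, string[]> = {
  web: ["react", "react-dom", "lodash"],
  api: ["express", "@prisma/client"],
  analytics: ["lodash"],
};

// A single root package.json only records the union of all dependencies.
function flatten(byPart: Record<string, string[]>): string[] {
  const all: string[] = [];
  for (const part of Object.keys(byPart)) {
    for (const dep of byPart[part]) {
      if (all.indexOf(dep) === -1) all.push(dep);
    }
  }
  return all.sort();
}

// With per-part lists we can still answer "who uses this dependency?".
function usersOf(byPart: Record<string, string[]>, dep: string): string[] {
  return Object.keys(byPart).filter((part) => byPart[part].indexOf(dep) !== -1);
}

const rootDependencies = flatten(importsByPart);
const lodashUsers = usersOf(importsByPart, "lodash");

console.log(rootDependencies); // one flat list, ownership is gone
console.log(lodashUsers); // only answerable from the per-part lists
```

The flat list is all a newcomer sees in a Files & Folders project, while the per-part view is roughly what separate package.json files would provide.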

If we put ourselves in the shoes of someone just getting started, this lack of overview makes it difficult to get familiar with the product. If each file is hundreds of lines long and contains dozens or more classes and functions, it's going to be difficult to understand how it all fits together! This is one big project after all, so imagine search results are giving back too many results, with too many similar-sounding functions, and tests are taking too long to run, and it's just too difficult to get a grasp of exactly how it all fits together, so it all just feels like a big soup of code that's hard to work in.

It's this lack of overview that we want the monorepo pattern to improve on.

(At this point I want to make it clear that just adding more files & folders isn't the solution, because it won't make it easier to search, it won't help the tests run faster, and it won't help the overview. I realize our specific example is quite trivial, but I'm asking you to imagine this project is so massively complex that a junior hire comes in and clearly gets lost in what is to them a sea of folders, files, classes, and functions. The code itself may be well-factored, but we need a higher level of abstraction.)

A quick word before we begin

Here's a cheat-sheet dependency graph of how the different modules actually relate to each other:

    ┌─────┐ ┌─────┐
    │ web │ │ api ├─┐
    └────┬┘ └┬────┘ │
         │   │      │
         │   │      │
         │   │      │
       ┌─▼───▼─┐   ┌▼──────────┐
       │ types │   │ analytics │
       └───────┘   └┬──────────┘
                    │
      ┌─────────┐   │
      │ logging ◄───┘
      └─────────┘

These are the "clumps of code" that we'd like to see separated into separate packages. Of course this just reflects my architectural opinions, but let's imagine we've arrived at this diagram together as a result of great collaborative meetings.
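As a rough sketch, the diagram can also be written down as data, assuming each arrow means "depends on". The names webbyGraph and buildOrder are made up for this illustration. The function is an ordinary depth-first topological sort: it lists each package after the packages it depends on, and it refuses circular dependencies.

```typescript
// Each key depends on the modules listed in its array (edges read off the diagram).
type Graph = Record<string, string[]>;

const webbyGraph: Graph = {
  web: ["types"],
  api: ["types", "analytics"],
  analytics: ["logging"],
  types: [],
  logging: [],
};

// Depth-first topological sort: dependencies come before their dependents.
function buildOrder(graph: Graph): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  const visit = (name: string, path: string[]): void => {
    if (state[name] === "done") return;
    if (state[name] === "visiting") {
      // We came back to a module that is still being processed: a cycle.
      throw new Error("Circular dependency: " + path.concat(name).join(" -> "));
    }
    state[name] = "visiting";
    for (const dep of graph[name] ?? []) {
      visit(dep, path.concat(name));
    }
    state[name] = "done";
    order.push(name);
  };

  for (const name of Object.keys(graph)) {
    visit(name, []);
  }
  return order;
}

const order = buildOrder(webbyGraph);
console.log(order.join(", "));
```

An order like this is what a monorepo tool would need if the packages had to be compiled one after another, and the cycle check is the kind of hard separation that plain files & folders cannot enforce.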

Starting web is straightforward:

$ npm ci
$ npm run web:start
> Started on port 3000

And ditto for api:

$ npm run api+db:start
[api] api started at http://localhost:3002

It's not really important what "webby" really is, but just to satisfy anyone curious: web is a simple React frontend that queries api for data, and the actual "product" looks like this:

It isn't so important what it does though, we just need to re-organize it 😂.

The Attempts

Below is the list of attempts. Please add suggestions for tools or methodologies I haven't tried; the whole point of this article-series is to learn the different ways of arranging code.

Oldest comments (6)

Andrei Dascalu • Jun 28 '21

"And what if we then find the change to the library wasn't quite right for our product?" - a little bit on the "why" thingie.

A library isn't used by the product. The whole point of a monorepo is to package a "product" in a way that still allows separation between different applications and their common dependencies, but keeps them in a single repository.

But that's kind of the thing, even monorepo comes at a cost. It's great (like you said) when you want to make quick changes to the shared codebase but without the overhead of updating packages in each consumer.

But the bit above isn't true. A change to a library serves its consumer. The product is the whole of the applications of the monorepo, but still a shared component serves a subset of that. If a change to the shared codebase isn't right ... you do the same thing you do regardless of organisational pattern: you revert.

Thing is, once your application becomes production-worthy, the monorepo's cost becomes obvious when you start needing to be able to version components properly and maintain hard dependencies through the package manager. The whole advantage of having independent applications (whether in a single repo or not) is to be able to deploy and version them individually, and if you're indeed focused on the product, that's pretty invaluable for maintaining integrity once in production.

When considering monorepo (as in, different apps in a single repo) it's often better to first consider a properly DDD'ed monolith.

Jon Lauridsen • Jun 28 '21 • Edited

I think I largely agree. What I write is from a practical point of view: I change the library to improve the product, but when I then use that change in the app I now see I didn't quite get the requirements right… So I go back to the library to do better. Maybe that means reverting and re-doing the change, or I can extend the original change further, but either way it's a problem for me when that back-and-forth takes too long or imposes too many restrictions.

The details differ depending on your workflow:
1) Change the library, version it, release it, realize it wasn't quite right. In this scenario it takes too long to go through changes.
2) Link the two together in a local development. Changes go fast now, but it is a hard requirement that the library has tests for all the app's use-cases (otherwise it doesn't make sense as a library). So landing changes becomes more strict.

I want a solution that separates the code, but doesn't introduce the hard requirements of versioning/releasing or tests.

At this point you're not wrong to suggest a monolith, that is definitely a solution here. If there's such a delicate coupling between the systems then separating them is probably too early. I buy your argument on this.

But for me it becomes problematic when that direction pushes me into the "code jungle" corner. I seek the practical option of being able to separate that code, even when the code is not conceptually well separated yet. I find by putting a box around it (making the library) I can start the process of separation. Suddenly the code sits there naked in its own project, making its points of coupling scream out.

I hope that clarifies the point you quoted and disagreed with.

I like your points on the drawbacks of monorepos, they're quite true. I see a lifecycle of a library or app beyond what this article describes, where a library may be individually versioned and deployed inside the monorepo, and even a further matured lifecycle where it escapes the monorepo entirely to become its own repository. I don't explore any of that in this article-series, but the monorepo pattern will support all that quite smoothly.

Anyway, thanks for your feedback!

John Hartnup • Jun 28 '21

Have you looked at NX?

It’s an opinionated framework for building monorepos with Typescript.

Jon Lauridsen • Jun 28 '21 • Edited

I investigated it a month or so ago, but got lost in all its configuration. The problem I really struggled with is that it doesn't separate dependencies per-project, instead it globs them all together in one big root package.json. I really struggle with that because I very much want the clarity that comes from having separate package.jsons!

It's possible I should give it another try though. Do you think it'll somehow handle the "strict:false for analytics" case I'm butting heads with?