loading...
Cover image for The issue with Monorepos
Squash.io

The issue with Monorepos

awenzi profile image Yuelin Wen Originally published at squash.io ・9 min read

A monorepo  (also called a monolithic repository) is an arrangement where a single version control system (VCS) repository is used for all the code and projects in an organization.  The alternative to monorepo is having multiple repositories for different projects or even functionalities. In that case, the system will be called a polyrepo. In this article, we will define the idea of a monorepo and identify some of its benefits and some famous companies using it. Finally, we will focus on whether a monorepo is suitable for everyone.  

By the end of this piece, you’ll notice that even though the monorepo system is used by many tech giants such as Google, Facebook, and Uber, it still has its challenges. I found the observation by  Google's spokesman interesting in this regard. She acknowledges that using monorepo isn't for everyone. But how does she support that argument? I’ll be focusing on some of her reasons by looking at the advantages and disadvantages of the system.

Why Some Tech Giants Employ a Monorepo setup

The fact that some of the biggest tech giants of our time are using a monorepo system implies that it has some benefits. What could be some of the great attributes of this setup which attract them to it? Let's look at the leading three.  

Extensive Code Sharing and Collaboration Across Teams

The main advantage of a monorepo is that all code exists in a single repository. Therefore, team members across an organization work in the same place. This results in a lowered sense of ownership and blurred boundaries. Consequently, there is an increased likelihood of code reuse, less code duplication, and more teams working on shared infrastructure. 

The unclear boundaries which such a system creates make changes made by other developers feel less invasive, a situation that fosters more interaction among developers and teams. As a result of these advantages, more useful libraries build-up and more projects get to share the same dependencies.

Simplified Dependency Management With a Single Version of Truth

Since a monorepo system places all the code in the same place, everything is built and updated at the same time. There is no need to reach out to other repositories and constantly check for potential breaks. For instance, technical debt generates in polyrepo when changes occur, but dependency updates don't immediately follow.  In a monorepo, changes in a dependency immediately lead to an update of the dependent code and no technical debt is accrued, assuming best practices are in place. 

In a monorepo, library updates are also made easier when files are not hosted in different repositories. Moreover, because everything is updated constantly, there will be only one version for all the dependencies, avoiding the diamond dependency problem. The diamond dependency problem occurs when two projects depend on one dependency. However, the challenge comes about when the projects depend on different versions of the dependency. The diamond dependency makes release without breakage very difficult. In certain instances, it can make the building of a project impossible. 

Atomic Changes/Large-scale Refactoring

When developers have one repository, they search through one database to make atomic changes across the system. On the same basis, code refactoring is made easier since developers do not have to locate different repositories. The layout of the codebase will be organized in a single tree so that developers can easily trace dependencies. For instance, Google gives each source file a unique identification by assigning the file path as their namespace.

If the Tech Giants are Using a Monorepo Setup, Shouldn’t We All?

So, if the monerepo system is employed by some of the biggest tech giants in the world, shouldn’t we be falling over each other to use the same system? Aimee Lucide, a mobile engineer on Uber’s Driver Signups team, seems to have provided the answer. She gave a speech about how Uber started from a monorepo and then switched to a polyrepo. However, Uber would later switch back to monorepo when the business grew huge and successful.

In her presentation, Lucide explained that the reason the monorepo became the rideshare company’s ultimate choice is that their system comes with an Android app studio, it has a single source of truth, and it makes code sharing easier. However, she also admits that her team quickly encountered some serious problems with the monorepo when their codebase scaled. As Uber’s mobile codebase got bigger, Aimee explains, her team started to experience an integrated development environment (IDE) lag.  Eventually, developers could no longer scroll without the code freezing up. She explained that every time they tried to build, git slowed down. The reason was that the whole exact clone would have to pull out. 

When the codebase is huge, it consumes a huge portion of the developer’s time when they try to work on a project; even the tiniest one. When everyone is working on a single repository, a broken master sometimes occurs. Broken master happens when one person accidentally breaks the code. Consequently, others get unexpected errors that they can’t figure out.

So why did Uber switch back to a monorepo while they still didn't have the answer to the problems encountered on the system? Uber got big enough to throw its resources at the problem.  In Lucide’s own words: “As your company starts to get bigger, you can actually invest resources to mitigate some of the original downsides.”

From Lucide’s answer, it is clear that the monorepo challenge persists. What’s different is that Uber can now throw money at them. So, the real question for anyone who wants to use a monorepo is simple: Are you willing to invest as much as Google, Facebook, Twitter, and Uber? For many of us the answer is equally simple: No. 

What’s Good For Google is Not Good For All 

In the same way buying out Chelsea Market may not be as much a smart move to you as it could be to Google, the benefits of a monorepo to Google may just not be advantageous to you. As noted above, the tech giants are able to use the monerepo system because they have the resources to throw at the challenges. Hence, one could say the reason is not based on the fact that the system has more benefits for everyone. 

Let us take a closer look at the weaknesses of some of the supposed benefits that the tech giants like Google see in the monorepo system. This will make you realize why a monorepo may not be for everyone. 

Extensive Code Sharing and Collaboration Across Teams

Because a single repository hosts all the projects, code reuse must be more frequent, and code duplication must be lower right? This is not always the case. Why? 

Even in a medium-sized monorepo codebase, much like the case for Uber, mentioned above, it doesn’t really make sense for a developer to grab the whole extract clone of the entire repository. Not only does it take a long time, but it also creates a lot of unnecessary trouble in either searching or editing. Therefore, there are typically two methods required for any monorepo that hopes to scale.

Virtual File System (VFS) that Enables Developers to Access Only a Portion of the Code Locally

According to Rachel Potvin, engineering manager at Google, Google has about 2 billion lines of code power. She was speaking at a Silicon Valley conference. This is for all the Google internet services such as Google Search, Google Maps, Google Docs, Google+ and so on. Despite years of experiments, there was still not an optimal open-source version control system that supported their code in one repository. 

To solve the challenge, Google built its own version control system called Piper. This system came with features that help developers to browse and change codes in this vast pool of lines. One feature mentioned by Potvin in her Silicon Valley conference presentation which is used by developers to access Piper is CitC. CitC allows developers to browse through the repository without having to clone or sync. While only being able to modify files stored in the developer’s machine, developers can browse and edit files freely across the whole repository.

Sophisticated Source Code Indexing and Tooling for Code Searching and Discovery 

Because of the complexity of a monorepo version control system, a tool to help perform a search across the entire repository is crucial. Google invested in building a custom plug-in for the Eclipse IDE to make handling billions of lines of code possible. The Code-indexing of Google also supports static analysis cross-referencing. These are the type of investments required when codebases scale in a monorepo.

Even with tools that satisfy the requirements, developers will only get to access small portions of the code when working on a project. How different is it to view a portion of the tree using a Virtual File System when compared to looking at different repositories? Tools for easily iterating over many different repositories and collecting the results are readily available. The scaling of the codebase, to a large extent, makes working with a single VCS much similar to working with multiple repositories because of the tool requirements. Collaboration and code sharing has, in fact, more to do with engineering culture than ways to store files.

Simplified Dependency Management/One Version of Truth

It is unreasonable to believe that since a monorepo has all code in a single repository and everything is built at the same time, dependency management is unnecessary. At scale, huge efforts are required to help update on a large scale and run automated tests to ensure code health in place of the usual dependency management in polyrepo. 

Because it is easy to create dependencies, developers put less effort into establishing well-thought-out APIs. Abandoned projects can also be under the radar and continue to be maintained and updated, leading to an unnecessary loss in productivity. How do the giants deal with this challenge? 

Google invested in creating new tools beyond standard tools and human efforts to deal with codebase complexity, underused/unnecessary dependencies, and code discovery. On top of that, more investment is put into making tools and indexing. This creates better search and browsing.

Moreover, not all the projects are deployed at the same time, previously deployed software can be redeployed if needed. This means that in a monorepo, multiple versions of code exist, build artifacts will have to be carefully tracked, much like what happens in polyrepo, where different versions of dependencies need to be reconciled and managed.

Atomic Change/Large Scale Refactoring

A monorepo system may be counter-intuitive. Editing or searching across the whole codebase at the local machine is a much more difficult task than you might expect.

While implementing a very sophisticated VFS, as mentioned above, makes using monorepo or polyrepo the same, making changes to a fundamental library API and having it reviewed by affected teams can be impossible because of the merge conflict in the case for a monorepo. When it happens, developers, much like they do in a polyrepo, will do one of two things: working around the API issue or deprecate the current API and replace it with a new one.

To make refactoring across large amounts of codes, automated tooling for refactoring will need to be in place to make the whole process possible.

More Issues to Look out for in a Monorepo

Anyone who wants to use a monorepo system will need to be conscious of other issues with the system. I focus on some of the leading ones below. 

Tight Coupling

Because of the nature of monorepos, certain issues can be experienced. These include unclear boundaries, less sense of ownership, and collaboration. Consequently, projects in a monorepo are prone to depend on each other, creating tight coupling. This leads to brittle software. Brittle software happens when any small fixes can imply alternation in many other parts of the codebase making maintenance impossible without fracturing the whole system.

A polyrepo, on the other hand, can mitigate this kind of issue by offering defined boundaries and responsibilities to each team.

Broken Master

Broken master is also a result of the unclear boundaries in a monorepo. All failing tests on the master are considered to be a broken master. The broken master can seriously reduce productivity if left unsolved. Therefore, before any monorepo projects start, ways or tools to eliminate broken master are required.

The broken master does a lot less harm in a polyrepo because a break in one or two repositories doesn’t have huge impacts on others.

Is it Good for Everyone?

The answer to our question regarding whether a monorepo is good for everyone is simple: No.  

No one can question the fact that a monorepo system has its advantages. That is the reason why big tech companies line up to use it. However, there are some who believe the amount of effort put into simply keeping projects in a monorepo VCS running is impractical for smaller organizations. I will leave you with some questions if you may want to switch to the monorepo system. How much is the sense of collaboration worth to your business? Is your company willing to invest more to get the benefit of the system? After all, like Rachel Potvin said in her Silicon Valley conference presentation, monorepo is not for everyone. Therefore, corporations have to look at their specific contexts, think hard, and select the system that best meets their unique needs.

Posted on Apr 3 by:

Squash.io

On demand staging/dev/QA environments for web apps and microservices. Save time and iterate faster with disposable staging sites for each branch of code.

Discussion

markdown guide
 

My company uses a polyrepo setup. Each in-house project/library has its own repository.

However, we handle the external dependencies issue by having a single dedicated repository housing (most) of our third-party library dependencies — the statically linked ones, anyway.

This ensures the team (and the CI!) is always building against the same dependency versions. It also allows fixes to be pushed company-wide, whether it comes from the third-party library's maintainer or from us. Our default build systems and scripts look for this repository in a predictable relative location, but it is trivial to override it for any individual dependency. That way, if someone needs to pull in or use a different version of one library, they can do so.

The result is that builds are smooth as butter. :)

One might wonder why we wouldn't have a repository for each individual third-party dependency, but it works for the exact same reasons large corporations benefit from a monorepo, chief of which being simplicity. Each of those libraries are small, and a developer usually needs most or all of them. If one third-party library gets unruly, it's simple to move it out into its own repository. The team addresses that on a case-by-case basis.