Santiago Carmuega

Posted on Aug 5, 2020

Finding middle ground between Monorepo and Polyrepo

#productivity #architecture #devops

A lot has been said against and in favor of each option. The discussion seems as visceral as tabs vs spaces.

Can't we find middle ground between the two?

Nobody really likes having a single 30 GB repo, with 200 developers pushing commits at all times, 1000+ branches, 10+ languages and a myriad of incompatible toolchains.

At the same time, it is very frustrating to constantly switch between 20 different repos in a single day, opening and closing 20 matching PRs and publishing 10 different libraries, just to get your shiny new button into the frontend.

How do you strike a good balance between the two?

(FYI, my questions are not rhetorical, I'm really asking)

I'm inclined to think that the "scope" of a repo should be tied to two, sometimes conflicting, dimensions:

ownership boundaries
bounded contexts

Repo by Ownership Boundary

Having a single repo for all of the code owned by a single team is very practical. By "team" I mean a group of ~10 developers working together on the same backlog.

It will never grow to the size of a "mega repo" (unless your team is composed of 10x developers working 24x7 😉), so you shouldn't hit any VCS limits.

Your repo interactions (PRs, issues, branching, labels, etc.) can reflect a workflow tailor-made for the team, instead of compromising on a workflow that has to fit the whole company.

The stack / toolchain of the repo should (?) be relatively cohesive since it only contains stuff selected by the members of the team.

Last but not least, if you live by the "you build it, you run it" motto, all of the CI/CD workflows would fit very nicely within the scope of this single repo.

Repo by Bounded Context

Imagine a relatively simple subsystem within your company, composed of a web frontend, an API layer and a couple of worker services, all of them nicely orchestrated to perform intrinsically related tasks (as @alxgrk posted in the comments, a concept very well explained by DDD as Bounded Context).

Having all of the components in a single repo allows us to push new features that span all of the layers in one cohesive and self-contained PR.

If you need to change something in your API contracts, the same PR would include the changes to the frontend layer that consumes the API and the changes to the workers that execute the actual task. It is very easy to visualize, to review and to fit into your mental model as a whole.
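As a rough illustration, here is a minimal Python sketch of a contract that lives in one place in the repo and is imported by every layer. All names (`ThumbnailRequest`, `build_request`, `handle_api_call`, `run_worker`) are hypothetical, and a real frontend would likely be written in another language. The only point is that a change to the contract shows up next to its consumers in the same diff.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ThumbnailRequest:
    """Hypothetical API contract shared by every layer in the repo."""
    image_id: str
    width: int
    height: int


def build_request(image_id: str, size: int) -> dict:
    """Frontend side: builds the payload sent to the API."""
    return asdict(ThumbnailRequest(image_id=image_id, width=size, height=size))


def handle_api_call(payload: dict) -> ThumbnailRequest:
    """API side: parses the payload; unknown or missing fields raise TypeError."""
    return ThumbnailRequest(**payload)


def run_worker(request: ThumbnailRequest) -> str:
    """Worker side: performs the (simulated) task described by the request."""
    return f"{request.image_id}@{request.width}x{request.height}"


if __name__ == "__main__":
    payload = build_request("img-42", 128)
    print(run_worker(handle_api_call(payload)))
```

Renaming a field such as `width` in `ThumbnailRequest` would force the frontend and worker functions to change in the same PR, which is the review experience described above.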

As a bonus, your deployment pipeline jobs (meaning test, QA, rollout, etc) would have a 1:1 relationship with your PRs. Nice for changelogs, awesome for rollback procedures.

Sadly, these two dimensions don't always go hand-in-hand, for example when companies split their developers between frontend and backend teams (a very common case of independent teams handling tightly coupled components).
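One way to spot where the two dimensions disagree is to list each component with its owning team and bounded context, then look for contexts owned by more than one team. The sketch below assumes a hypothetical inventory format; the component, team and context names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: (component, owning team, bounded context).
COMPONENTS = [
    ("checkout-ui", "frontend-team", "purchases"),
    ("checkout-api", "backend-team", "purchases"),
    ("search-ui", "search-team", "product-search"),
    ("search-indexer", "search-team", "product-search"),
]


def split_contexts(components):
    """Return bounded contexts whose components are owned by more than one team."""
    owners = defaultdict(set)
    for _name, team, context in components:
        owners[context].add(team)
    return {ctx: sorted(teams) for ctx, teams in owners.items() if len(teams) > 1}


if __name__ == "__main__":
    for context, teams in split_contexts(COMPONENTS).items():
        print(f"{context}: owned by {', '.join(teams)}")
```

A context that appears in the result is one where "a repo per team" and "a repo per bounded context" would give different answers.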

In a perfect world, your team has full ownership over a cohesive set of components, coupled to each other as much as they want, but very loosely coupled with the outside world. Under this assumption, I would most certainly prefer a single repo to manage all of the team's components.

Wouldn't you?

Top comments (3)

Alexander Girke • Aug 6 '20

Thanks for this interesting text, I really like that you advertised the middle ground in that discussion.

I'm personally not a huge fan of monorepos, so your suggestion to cut based on team boundaries sounds totally reasonable to me, under the condition that the team controls each aspect of its "bounded context" (from Domain-Driven Design), which is not always the case, as you mentioned.
So may I suggest reformulating it as: "a repo's boundaries should match the boundaries of the bounded context its components belong to"? Take an e-commerce platform as an example: there would then be one repo for "purchases", one for "product search", etc. This would ideally match your second dimension "component coupling".

What do you think about this?

Santiago Carmuega • Aug 6 '20

That sounds very reasonable, thanks for bringing it up. I'll update the post to reflect the relationship to bounded contexts.

On that subject, I'm sure that a repo shouldn't contain partial implementations of a bounded context, but should a repo be allowed to contain more than one bounded context if they are all managed by the same team? I have mixed feelings about it.

Alexander Girke • Aug 6 '20

Good point! Probably, if there is a repo per bounded context and a team works on multiple contexts, that team should have multiple repos.
