How Google manages open source

#google #opensource #monorepo #devops

Many people know that Google uses a single repository, the monorepo, to store all internal source code. The Google monorepo has been blogged about, talked about at conferences, and written up in Communications of the ACM.

Most of this has focused on how the monorepo impacts Google developer productivity and the ability to have software written by one team and used by many other teams. But I haven’t seen as much written about how it also impacts the way teams within Google consume and use open source.

The benefits of the Google monorepo

Just like internal code, third-party open source code is also imported into the monorepo under a /third_party prefix. There are a number of benefits to this approach:

Single version. Much like with internally developed libraries at Google, importing open source code into the monorepo ensures that the same version of a library is used in all applications rather than having a spaghetti of versions to understand and support across the many applications within Google.
Ease of updates. With a single version of the code in one place, updating an open source library either for normal maintenance or because of a critical security issue is much easier. You just have to update the project copy in /third_party and every application in the Google monorepo now gets built with that new version. You do, though, have to ensure that you haven’t broken the build of anything else in the monorepo.
Dependency clarity. By having a single location where every dependency is stored, Google engineers can easily see which things within the monorepo depend on a given open source library. Thus when doing an update for a security vulnerability, the developers who own the individual applications can easily be notified that they need to deploy new binaries with the fixed dependency.
Simplified licensing review. Licensing reviews can be done in a single location rather than requiring a new review any time an application wants to depend on a new-to-it library. As you can imagine, at Google scale, vast numbers of open source projects have already had their licenses reviewed and approved for use inside of Google.
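
To make the dependency clarity point concrete, here is a minimal sketch of a reverse-dependency lookup over a single build graph. The target names and the `DEPS` mapping are made up for illustration, and this is not a description of Google's actual tooling. It only shows why having every dependency declared in one place makes "who is affected by this library?" a simple graph walk.

```python
from collections import defaultdict

# Hypothetical build graph: each target lists the targets it depends on directly.
# All names here are invented for illustration.
DEPS = {
    "//apps/mail": ["//libs/net", "//third_party/zlib"],
    "//apps/search": ["//libs/net"],
    "//libs/net": ["//third_party/openssl"],
    "//third_party/openssl": [],
    "//third_party/zlib": [],
}


def reverse_index(deps):
    """Map each target to the set of targets that depend on it directly."""
    rev = defaultdict(set)
    for target, direct in deps.items():
        for dep in direct:
            rev[dep].add(target)
    return rev


def dependents_of(library, deps):
    """Return every target that depends on `library`, directly or transitively."""
    rev = reverse_index(deps)
    seen = set()
    stack = [library]
    while stack:
        current = stack.pop()
        for parent in rev.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    # Owners of these targets would need to redeploy after a security update.
    for target in dependents_of("//third_party/openssl", DEPS):
        print(target)
```

When dependencies are scattered across many repositories and package manifests, building the equivalent of `DEPS` is itself the hard part.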

It turns out that the benefits Google gets from a monorepo can also be valuable to most other engineering organizations that use open source, even those not operating at Google scale. But most engineering organizations don’t have the people or the financial resources to secure those benefits on their own.

After all, one of the main benefits of using open source to begin with is having access to a lot of common infrastructure components without having to write them from scratch yourself.

But most development teams still need to have a high degree of confidence that the software that they are using is being properly maintained. They need confirmation that it is licensed in a way that is acceptable to the organization, and they need to know that it is secure, or be notified when there are vulnerabilities.

At a basic level, most developers would love to have access to “known good” components like Google’s developers get when pulling from the monorepo, rather than the dependency roulette of bringing in new open source components without any sort of sanity check.

How to manage open source like Google

Every organization could benefit from managing open source like Google does. Fortunately, the Tidelift Subscription makes it easy for you to create customized catalogs of open source components that provide many of the benefits of Google's approach, without the need to maintain your own fork or invest in creating and maintaining your own monorepo.

With the Tidelift Subscription, you’ll be able to see the catalog of open source packages and releases you use across all of your applications. You can approve new packages as developers need them with workflow automation—developers request packages, and managers or architects review and approve.

You can disallow certain packages or package releases based on known security vulnerabilities or licensing concerns. You can also centrally flag a largely theoretical vulnerability as safe to ignore. That decision then applies to every development team, so each team doesn’t have to painstakingly review and assess the vulnerability on its own just to pass a pre-deployment scanner test.
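
As a rough illustration of this kind of central policy, the sketch below evaluates a package release against an allowed-license list, a deny list, and a shared list of vulnerabilities that have been judged safe to ignore. The `Release` and `Policy` types, the package names, and the `VULN-…` identifiers are all hypothetical. This is not the Tidelift data model or API, only one way such a rule set could be expressed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    """One package release with its license and known vulnerability IDs."""
    name: str
    version: str
    license: str
    vulnerabilities: tuple = ()


@dataclass
class Policy:
    """Organization-wide rules, maintained in one place (hypothetical shape)."""
    allowed_licenses: set
    denied_releases: set = field(default_factory=set)          # (name, version) pairs
    ignored_vulnerabilities: set = field(default_factory=set)  # reviewed once, centrally


def evaluate(release, policy):
    """Return the reasons a release is rejected; an empty list means approved."""
    reasons = []
    if (release.name, release.version) in policy.denied_releases:
        reasons.append("release is explicitly denied")
    if release.license not in policy.allowed_licenses:
        reasons.append(f"license {release.license} is not approved")
    for vuln in release.vulnerabilities:
        # A vulnerability ignored centrally is skipped for every team.
        if vuln not in policy.ignored_vulnerabilities:
            reasons.append(f"unresolved vulnerability {vuln}")
    return reasons


if __name__ == "__main__":
    policy = Policy(
        allowed_licenses={"MIT", "Apache-2.0"},
        ignored_vulnerabilities={"VULN-0001"},
    )
    release = Release("examplelib", "1.2.0", "MIT", ("VULN-0001",))
    print(evaluate(release, policy))
```

Because the ignore list lives in the policy rather than in each application, a single review decision covers every team that uses the release.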

Partnering with open source maintainers

We even take it a step further by partnering with the maintainers of many open source packages to help ensure that they are well maintained, have clear licensing, and get timely security fixes as vulnerabilities are discovered. This is a win-win, because the more subscribers who use a project, the more its maintainers get paid, which means they have even more time and incentive to keep their projects well maintained and up to date.

As a Tidelift subscriber, you can set your own policies for how you would like to use open source projects within your organization—or you can just choose to accept our guidance entirely.

Customizing your catalogs

A catalog of managed open source within Tidelift can be consumed in lots of different ways.

Your developers can ensure that they are using appropriate packages and versions with our command line tool and request new ones as they discover a need.
You can add a check as a part of your continuous integration pipeline to ensure that nothing is built that uses components that haven’t been vetted.
You can plug into a central artifact manager (such as JFrog Artifactory) to only allow approved components to be downloaded.

Each option can be used on its own, but for the most effective deployment, use all three together!
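
To give a feel for the continuous integration option, here is a minimal sketch of a check that fails a build when a pinned dependency is not in an approved set. It assumes a simple `name==version` pin format and an in-memory `approved` set. Both are assumptions made for illustration, and none of this reflects the actual Tidelift command line tool or its output.

```python
import sys


def parse_pins(lines):
    """Parse 'name==version' lines, skipping blanks and '#' comments."""
    pins = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            # An unpinned dependency cannot be checked against a catalog of releases.
            raise ValueError(f"unpinned dependency: {line!r}")
        pins.append((name.strip().lower(), version.strip()))
    return pins


def unvetted(pins, approved):
    """Return the pins that are missing from the approved catalog."""
    return [pin for pin in pins if pin not in approved]


def main(lockfile_lines, approved):
    """Return a process exit code: 0 if everything is vetted, 1 otherwise."""
    missing = unvetted(parse_pins(lockfile_lines), approved)
    for name, version in missing:
        print(f"not in approved catalog: {name}=={version}", file=sys.stderr)
    return 1 if missing else 0


if __name__ == "__main__":
    # Hypothetical catalog and lockfile contents.
    approved = {("examplelib", "1.2.0")}
    lockfile = ["examplelib==1.2.0", "otherlib==0.3.1  # not yet reviewed"]
    sys.exit(main(lockfile, approved))
```

A CI job would run a script like this before the build step and stop the pipeline on a nonzero exit code.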

If you are interested in learning more about best practices for managing open source dependencies, we can help. Talk to one of our experts or read more about the Tidelift approach to managed open source here.

Photo by Joseph Barrientos on Unsplash

Top comments (4)

Titus Winters • Jul 24 '20

Ugh, no. Please don't suggest that we (Google) are managing external projects well, or that this is automatable or simple in any way. Without build transparency, this is straight up always going to be a hard problem.

(See chapter 21 of "Software Engineering at Google.")