CBOI: Continuous Build, Occasional Integration

Is your organization practicing CBOI? If you haven't heard this hot new industry acronym, it stands for "Continuous Build, Occasional Integration." A lot of big companies are using this technique. It's a different way of approaching Continuous Integration (CI).

By different, I mean a lot worse.

In fact, your organization should not practice CBOI. So why write an article about it? Because, sadly, most organizations who claim to do CI are actually doing CBOI. I'll explain why that is, and how you can stop.

What is CI?

Let's break down the terms a bit to start. "Continuous" is just a way of saying "infinite loop" - we trigger on every change or on a regular interval, and give feedback to the development cycle, such as alerting developers that they broke an automated test. Easy, and not controversial.
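Stripped down, that loop is only a few lines. Here is a minimal sketch in Python. `build_and_test` and `notify` are hypothetical callbacks standing in for your build tool and your alerting, and `observed_heads` stands in for polling the repository, since a real server would poll forever.

```python
from typing import Callable, Iterable, List, Optional


def continuous_loop(
    observed_heads: Iterable[str],
    build_and_test: Callable[[str], bool],
    notify: Callable[[str], None],
) -> List[str]:
    """Build and test each new commit once; alert developers on failure.

    observed_heads stands in for polling the repository's HEAD; a real CI
    server would poll (or receive webhooks) forever. Returns tested commits.
    """
    tested: List[str] = []
    last_seen: Optional[str] = None
    for head in observed_heads:
        if head == last_seen:
            continue  # HEAD hasn't moved since the last poll
        last_seen = head
        tested.append(head)
        if not build_and_test(head):
            notify(f"build or tests failed at {head}")
    return tested
```

Note what the loop does not say: which code gets built and tested at each commit. That question is where CI and CBOI part ways.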

"Integration" is a much more nuanced term. In most software shops, what we mean here is that we bring together the artifacts from independent engineering teams into a functioning system. A common example that I'll use in this article is a Frontend and a Backend.

In a small organization, with only a few developers, Integration isn't much of a problem. Every engineer develops on the whole stack, and runs the complete system locally. As the organization scales, however, teams break up and specialize. The full system is eventually too complex to fit in one person's head, though the Architect tries mightily. The more the org structure gets broken up, the more the teams' software systems diverge, and the harder it is to guarantee that the code they're writing works when integrated.

In order to perform Continuous Integration, then, you need an automated way to integrate the full stack. In working with a number of large companies, I've rarely observed this automation. Instead, individual developers just work on their code (not surprising since they would prefer to work in isolation, reducing their cognitive load and learning curve). They aren't able to bring up other parts of the system, for a variety of reasons I'll list later. However, the engineers know (or their managers instruct them) to set up a "CI" for their code. So they take the build and test system they use locally, and put it on a server running in a loop. In our example, the backend team runs their backend tests on Jenkins.

Is that CI? There's an easy litmus test to determine that.

How to tell if you're doing CBOI rather than CI

Let's say the backend team makes a change that will break the frontend code. To avoid certain objections, I'll add that this change isn't something we expected to be part of the API contract between these layers: let's say we just caused the ordering of results from a query to change. At what point in your development cycle will you discover the problem?

In organizations doing CBOI, the answer is that they'll find out in production when customers discover the defect. That's because the automation couldn't run the frontend tests against the HEAD version of the backend, and since the change appeared API-compatible, no one tried to manually verify it either. When you're discovering your bugs in prod, you should start asking the hard questions in your post-mortem: why didn't our CI catch this? And in our example, the answer shocks our engineers: they didn't have CI after all.

Instead of CI, their setup was individual teams testing their code in a loop, which is a Continuous Build (CB). Then when they released to prod, the Release Engineer performed the actual integration, by putting the code from different teams together in the finished system. Those releases happen on a less frequent cadence. That's Occasional Integration (OI).

If a developer wanted to debug the problem, they'd be forced to "code in production". With no way to reproduce the full stack, they have to push speculative changes and look at production logs to see if they've fixed it. SSH'ing into a production box to make edits is the opposite of what we want. For space, I won't go into details on this as it merits a separate article (and is maybe obvious to you).

So we've finally defined what CBOI is, and seen how it causes production outages and scary engineering practices. Ouch!

How to stop doing CBOI

I have to start this section with a warning: it isn't going to be easy. The Continuous Build was set up because it was trivial: take the build/test tool the developers were running for their code and put it on a server in a loop. There isn't a similarly easy way to integrate the full stack. It may even require some changes to your build/test tools, or to the entry-point of your software. However, if your organization has a problem with defects in production (or wants to avoid such a problem), this work is worth doing.

Also, although the example so far was a Frontend and a Backend, which are runnable applications, CI is just as important for other vertices of your dependency graph, such as shared libraries or data model schemas.

I'll break this down into a series of problems:

1) developers can't run the full stack
2) no integration test fixture exists that can detect the defect
3) resource constraints make it uneconomical to run all the tests

Along the way (spoiler alert) I'll explain how one Integration tool (http://bazel.build) solves the technical problems.

However, we'll conclude with a final problem, the people problem:

4) the organization is averse to integrating dev processes

People problems are always harder than software problems, as I learned from early Google luminary Bill Coughran.

Why devs can't run the full stack

As I mentioned earlier, our ideal integration happens on the developer's machine. After making that non-order-preserving backend change, you'd just run the frontend tests to discover the breakage. In practice this is much harder than it should be.

First, you might need your machine in a very particular state. You need compilers and toolchains installed, at just the right versions, built against the right system headers and libraries, and running on an OS that's compatible with prod. Most teams don't have up-to-date "onboarding" instructions that carefully cover this, and since the underlying systems are always churning, you don't even know whether your instructions will work for the next person trying to run your code.

Next, many systems require shared runtime infrastructure ("the staging environment") or credentials. These either aren't made available to engineers, or they're a contended resource where only one person can have their changes running at a time.

It's also common that knowledge of how to bring up a fresh copy of the system isn't written down anywhere, and hasn't been scripted. Only the sysadmin has the steps roughly documented in an unsaved notepad.exe buffer, so when you need to bring up a server, that person clicks around the AWS UI to do so.

To solve these problems, and unlock your developers' ability to run the whole system, you need:

A tool like Bazel that manages the toolchains and keeps the configuration roughly hermetic, so a dev can "parachute" into someone else's code and run it at HEAD without any setup to maintain.
The ability to cheaply spin up a new environment anywhere. For example, if you deploy to a Kubernetes cluster, use something like minikube to make a miniature local environment that mimics production and re-uses most of the same configs.
Robust scripting that automates the release engineer's job. It should be possible for a test to run the same setup logic to make a fresh copy of the system under test.

The configurations need to be "democratized" for this to work well. Under Jenkins you might have had some centralized Groovy code that looks at changed directories or repositories and determines tests to run. This doesn't scale in a big org where many engineers have to edit these files. Instead, you should push configuration out to the leaves as much as possible: co-locate the description of how to build and test some code at the nearest common ancestor directory of those inputs. Bazel's BUILD.bazel files are a great example of how to do this.
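Working out where that co-located description belongs is mechanical: it's the deepest directory containing all of the inputs. A minimal sketch, assuming repository-relative POSIX-style paths (the file names are hypothetical), using the standard library's `posixpath.commonpath`:

```python
import posixpath
from typing import Iterable


def build_file_location(inputs: Iterable[str]) -> str:
    """Return where a co-located BUILD.bazel for these source files would go.

    Paths are repository-relative and POSIX-style, e.g. "backend/api/query.py".
    The answer is the nearest common ancestor directory of all the inputs.
    """
    dirs = [posixpath.dirname(p) for p in inputs]
    if not dirs:
        raise ValueError("need at least one input file")
    common = posixpath.commonpath(dirs)  # "" when the only shared ancestor is the root
    return posixpath.join(common, "BUILD.bazel") if common else "BUILD.bazel"
```

If the answer keeps landing at the repository root, that's a hint the target is too coarse and should be split.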

Integration test fixtures

Remember that tests are written in three parts, sometimes called "Arrange, Act, Assert". The first part is to bring up the "System under test" (SUT) (see https://en.wikipedia.org/wiki/Test_fixture#Software).

In order to assert that the frontend and backend work together, our automated test first needs to integrate them: build both at HEAD, run them in a suitable environment, and wire them up so they can reach each other for API calls. You'll need a high-level, language-agnostic tool to orchestrate these builds so that dependencies are built from HEAD. Again, Bazel is great for this.

You'll find there is natural resistance here: the "first mover" cost is very high. An engineer could easily spend a week writing one test to catch the ordering defect I mentioned earlier. In the scope of that post-mortem, someone will object "we can't possibly make time for that." But of course, the fixture is reusable, and once it's written you can add more true "integration tests", even writing them at the same time you make software changes rather than as regression tests for a post-mortem.

If the code is in many repositories, that also introduces a burden. You'll either need some "meta-versioning" scheme that says what SHA of each repo to fetch when integrating, or you'll need to co-locate the code into a single monorepo (which has its own cost/benefit analysis).
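A meta-versioning scheme can be as simple as a checked-in manifest that pins each repository to an exact commit, so "integrate at HEAD" means "fetch exactly these SHAs". A minimal sketch; the `Manifest` class, its `pin` helper and the repository names are hypothetical, and a real version would also fetch and check out each pinned commit.

```python
import re
from dataclasses import dataclass, replace
from typing import Dict

# Assumes SHA-1 object IDs (40 hex chars); SHA-256 Git repos use 64.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class Manifest:
    """Hypothetical integration manifest: repository name -> pinned commit SHA."""

    pins: Dict[str, str]

    def validate(self) -> None:
        # Branch names like "main" move, which makes an integration run
        # impossible to reproduce later, so require full commit SHAs.
        for repo, sha in self.pins.items():
            if not FULL_SHA.fullmatch(sha):
                raise ValueError(f"{repo}: {sha!r} is not a full commit SHA")

    def pin(self, repo: str, sha: str) -> "Manifest":
        """Return a new manifest with one repository moved to a new commit."""
        if repo not in self.pins:
            raise KeyError(repo)
        updated = replace(self, pins={**self.pins, repo: sha})
        updated.validate()
        return updated
```

Each bump of a pin is itself a change, so it can trigger the integration tests just like a code change in a monorepo would.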

Not economical to run all the tests

The last technical problem I'll mention is test triggering. In the CBOI model, you only needed to run the backend tests when the backend changed, and the frontend tests when the frontend changed. And they were smaller tests that only required a single system in their test fixture. CI is going to require that we write tests with heavier fixtures, and run them on more changes.

Triggering across projects is tricky. Our goal is to avoid running all the tests every time, but to run the "necessary" ones. You could write some logic that says "last time we touched that backend we broke something, so those changes also trigger this other CI". This logic is likely flawed and quickly rusts, so I don't think it's a good strategy. You could automate that logic using some heuristics, like Launchable does. But to make this calculation reliably correct, ensuring that all affected tests are run for a given change, you need a dependency graph. Bazel is great for expressing and querying that graph, for example finding every test that transitively depends on the changed sources.
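The core of that calculation is a reverse-dependency walk: start from the changed files, follow "is depended on by" edges until nothing new turns up, and keep the tests you reach. Bazel's query language exposes this as the `rdeps` function. The sketch below only illustrates the idea on a hypothetical, hand-written graph, and it assumes test targets can be recognized by a `_test` suffix.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def affected_tests(deps: Dict[str, List[str]], changed: Iterable[str]) -> Set[str]:
    """Return every test target that transitively depends on a changed node.

    deps maps each target to the targets/files it directly depends on.
    Test targets are recognised by a "_test" suffix (an assumption here).
    """
    # Invert the edges: for each node, which targets depend on it directly?
    rdeps: Dict[str, Set[str]] = {}
    for target, inputs in deps.items():
        for dep in inputs:
            rdeps.setdefault(dep, set()).add(target)

    # Breadth-first walk over reverse edges, starting from the changed nodes.
    seen: Set[str] = set(changed)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for dependent in rdeps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return {t for t in seen if t.endswith("_test")}


# Hypothetical graph: an end-to-end test depends on both the app and the server.
EXAMPLE_GRAPH = {
    "//backend:query": ["backend/query.py"],
    "//backend:server": ["//backend:query"],
    "//backend:query_test": ["//backend:query"],
    "//frontend:app": ["frontend/app.ts"],
    "//frontend:app_test": ["//frontend:app"],
    "//e2e:search_test": ["//frontend:app", "//backend:server"],
}
```

Changing `backend/query.py` selects `//backend:query_test` and `//e2e:search_test` but not `//frontend:app_test`: the integration test runs without running everything.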

In a naive solution, it's also too slow to build everything from HEAD. You need a shared cache of intermediate build artifacts. Bazel has a great remote caching layer that can scale to a large monorepo, ensuring that you keep good incrementality.
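A shared cache keeps builds incremental because each build step is keyed by a digest of everything that can affect its output, so if anyone, on any machine, has already run the identical step, the result can simply be handed back. This is a simplified, in-memory sketch of that idea, not Bazel's actual remote cache protocol; `action_key` and `ActionCache` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List


def action_key(command: List[str], inputs: Dict[str, bytes]) -> str:
    """Digest the command line plus the path and content of every input."""
    h = hashlib.sha256()
    h.update(f"{len(command)}\0".encode())  # length prefix avoids ambiguous concatenation
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):  # sorted, so the key doesn't depend on dict order
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    """In-memory stand-in for a cache shared by developers and CI machines."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def run(self, command: List[str], inputs: Dict[str, bytes],
            execute: Callable[[], bytes]) -> bytes:
        key = action_key(command, inputs)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # someone already built exactly this
        self.misses += 1
        output = execute()
        self._store[key] = output
        return output
```

Because the key covers every input, a frontend developer building the backend at HEAD gets cache hits for everything the backend team's CI already built, and only re-executes steps whose inputs actually changed.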

Organization Averse to Integrating

Lastly, I mentioned there's a non-technical problem as well. Even with clever engineers and the right tools, like Bazel, this might be what sinks your effort.

Engineers want to work in isolation from each other. For example, the backend engineers think JavaScript is a mess and don't want to learn anything about frontend code. Engineers are amazingly tribal! Try asking a Mac user to develop on Windows or vice-versa.

To do CI, we're asking the backend engineers to look at the frontend test results when something is red, to determine if their changes caused a regression. We're asking the frontend engineers to wait for a build of the backend to run their tests against. These teams never had to work closely together in the past.

Worse, we're also asking the managers to act differently. This is an infrastructure investment for the future, requiring some plumbing changes in the build system. So only an organization willing to make strategic decisions will be able to prioritize and consistently staff their CI project. Also, the managers from different parts of the org will have to reach some technical agreement between their teams about standardizing on build/test tooling that can span across projects. This may run into the same friction you always have when making shared technical decisions.

Epilogue: coverage

I like to beat up on test coverage as a metric, because it measures only which lines of code execute, not whether the tests make any assertions about them. In the context of CBOI, test coverage is also misleading. You might have 100% test coverage of the frontend, and 100% test coverage of the backend, but 0% test coverage of defects seen when integrating the two. I think this contributes to the misunderstanding among engineering managers, who see high coverage numbers from each team's build and assume they have CI.