Eyal Lapid

Six Reasons Your Monorepo CI Got Slower This Quarter

Originally published on dxcore.dev.

I've been building a CI scheduler — an internal tool that grew out of getting tired of watching my own Turborepo monorepo CI slow down on GitHub Actions. The pattern across teams that have been at this for ten years is consistent enough to write down.

Shopify got 95% of their monolith CI runs under 18 minutes, down from 45 — but only after it had silently grown the other direction for years. Slack cut their average end-to-end build from around 10 minutes to around 2. Canva brought a "build all" job that had grown to take up to three hours down to around fifteen minutes.

These are after-photos. The before-photo is what every monorepo team eventually arrives at: median CI time creeps up, week by week, with no single bad commit to blame. Six mechanisms do the creeping. Here's what each one looks like, and which lever each one calls for.

You added packages

This one is mechanical. Your monorepo started with three packages. By quarter six it has fifty. Every step in your CI that runs across all packages — install, lint, typecheck, build, test — is now somewhere between five and sixteen times slower than it was at the start. Not because the work per package got harder. Just because there's more of it.

Gotcha: this is the easiest mechanism to spot, so it's the one teams blame first. It's almost never the headline cost.

Your slowest test got slower

The wall-clock time of any partially parallel pipeline is bounded below by its longest single task. Your slowest end-to-end test, your largest integration suite, your one big database migration check — that single job sets a floor on how fast a green CI can possibly be, no matter how many runners you throw at the rest.

The critical path through your CI graph sets the floor. As your slowest job grows, that floor rises with it.
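To make that floor concrete, here is a minimal sketch (hypothetical job names and durations, not from any of the teams cited here): the finish time of each job is its duration plus the latest finish time among its dependencies, and the wall-clock floor is the largest finish time in the graph. Adding runners changes nothing in this calculation.

```typescript
// Sketch: the wall-clock floor of a CI run is the longest chain of dependent
// jobs, weighted by duration. Job names and durations are hypothetical.
type Job = { durationMin: number; needs: string[] };

const jobs: Record<string, Job> = {
  install: { durationMin: 3, needs: [] },
  build:   { durationMin: 6, needs: ["install"] },
  lint:    { durationMin: 2, needs: ["install"] },
  unit:    { durationMin: 5, needs: ["build"] },
  e2e:     { durationMin: 25, needs: ["build"] },
};

// Finish time of a job = its duration + latest finish time of its dependencies.
const memo = new Map<string, number>();
function finishTime(name: string): number {
  if (memo.has(name)) return memo.get(name)!;
  const job = jobs[name];
  const start = Math.max(0, ...job.needs.map(finishTime));
  const finish = start + job.durationMin;
  memo.set(name, finish);
  return finish;
}

const floor = Math.max(...Object.keys(jobs).map(finishTime));
console.log(`wall-clock floor: ${floor} min`); // 34 min: install, then build, then e2e
```

In this toy graph the 25-minute e2e job dominates: the chain install, build, e2e sets a 34-minute floor even with unlimited runners.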

Shopify's CI team described the lever directly: "Focus on the slowest 20% and you will be surprised how much impact they have on the whole test suite." They go after individual long poles ("we discovered that one test frequently hangs and causes CI to timeout") and sometimes disable tests outright "if they cause too much 'harm' to Shopify and other developers." The fix is attacking the specific slow job, not throwing more runners at the rest.

Gotcha: when your CI gets slower, the instinct is to add more runners. But your savings cap out at your slowest single job.

Your tests got flakier (and you keep retrying them)

Flakiness was the mechanism I most underestimated. It's not a quirk; it's structural at scale.

John Micco at Google wrote in 2016 that "almost 16% of our tests have some level of flakiness associated with them" and that "about 84% of the transitions we observe from pass to fail involve a flaky test." That last number is the load-bearing one: when a green CI suddenly turns red, more than four times in five it's not a real bug.

The downstream cost is the retry. Durieux et al. (MSR 2020) studied 3,286,773 Travis CI public builds and observed that "46.77% of the failing/errored builds pass post-restart." Almost half. Each one of those manual restarts doubles the wall time of whichever job triggered the restart.
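A rough way to see the cost, assuming a whole-job restart on every flaky failure and a made-up flake rate (illustrative only, not a number from either study cited above):

```typescript
// Sketch: expected wall-clock time of a job that flakes with probability
// flakeRate and is restarted from scratch until it passes.
// Numbers are illustrative, not taken from the cited studies.
function expectedMinutes(durationMin: number, flakeRate: number): number {
  // Geometric distribution: expected attempts = 1 / (1 - flakeRate).
  return durationMin / (1 - flakeRate);
}

const e2e = { durationMin: 25, flakeRate: 0.3 };
console.log(expectedMinutes(e2e.durationMin, e2e.flakeRate).toFixed(1));
// ~35.7 min on average; the unlucky runs that actually flake pay 50 or more.
```

At a 30% flake rate the average run is already over 40% longer than the job itself, and every run that does flake pays double or worse.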

Gotcha: this looks like "the slowest test got slower." It is sometimes that. It is more often that you're running it twice.

Your work is shaped to serialize

You have parallel runners. The work could, in principle, run across them. But your CI is structured in ways that prevent that — sometimes at the task level (tests sharing resources), sometimes at the orchestration level (the scheduler can't tell what's independent).

Canva's CI team documented the task-level version. Their backend integration tests depended on a single set of storage containers — which meant "all tests were running against and overloading the single set of storage containers. Some tests required exclusive container access, leading to complex test parallelism and ordering." A test step "easily pushing over 50 minutes" wasn't slow because the work was hard; it was slow because the work couldn't run concurrently. The fix was hermetic tests — each with its own container — so they could finally run in parallel.

The other version lives a layer up: even when tasks are structurally parallelizable, the orchestrator running them may not see they're independent. Most build systems and CI providers treat their queue as a flat job list with hand-written dependencies — they can't see across packages, so independent work serializes into the order you wired it up.
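Here is a minimal sketch of what that orchestration-level view buys you, using a hypothetical five-package graph: once the scheduler knows the dependencies, it can group tasks into waves that run concurrently instead of executing them in whatever order they were wired up.

```typescript
// Sketch: given per-package dependencies, group tasks into "waves" that can
// run concurrently. Package names are hypothetical; real orchestrators
// (Turborepo, Bazel, etc.) build a more sophisticated schedule than this.
const deps: Record<string, string[]> = {
  ui:     [],
  logger: [],
  api:    ["logger"],
  web:    ["ui", "api"],
  worker: ["logger"],
};

function waves(graph: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const out: string[][] = [];
  while (done.size < Object.keys(graph).length) {
    // Everything whose dependencies are all finished can run now, in parallel.
    const ready = Object.keys(graph).filter(
      (pkg) => !done.has(pkg) && graph[pkg].every((d) => done.has(d))
    );
    if (ready.length === 0) throw new Error("dependency cycle");
    out.push(ready);
    ready.forEach((pkg) => done.add(pkg));
  }
  return out;
}

console.log(waves(deps));
// [["ui","logger"], ["api","worker"], ["web"]]
```

Three waves instead of five serial steps, and the first two waves each keep two runners busy. The schedule falls out of the package graph rather than out of a hand-written job list.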

Gotcha: when CI feels slow, the instinct is to look at the runners. The harder question is whether your work is even shaped to use them.

Setup tax

Setup overhead is bigger than people give it credit for. Shopify found that "68% of the time in CI was spent just on overhead before we actually ran any test" — about 31% on agent prep and 37% on dependency build. Two-thirds of CI time, gone to housekeeping.

Gotcha: setup cost is a permanent tax that grows with your repo size. Adding more runners doesn't make any single run faster; it just lets more runs pay the tax in parallel.

You rebuild more than you need to

When you change a base package — a UI library, a logger, shared types — running everything downstream is correct. Those consumers really do need to rebuild.

The problem is the other 95% of PRs. Most changes touch a narrow slice of the graph: one consumer's internal helper, a single service's route handler, a leaf-package test. The CI cost of those PRs should be small. In practice, most teams run the full suite anyway — because nobody is computing what the change actually affected, so every PR pays for the full fan-out.

Without a precomputed dependency graph, your build system has two options: rebuild everything, or rebuild based on a hand-maintained list that's always out of date. Most teams pick the first one, and pay for it on every PR — even the ones where most of the work was unnecessary.
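Computing the impact set is not exotic. Here is a minimal sketch over the same hypothetical five-package graph as above: start from the changed packages and walk the reverse dependency edges until nothing new is reachable.

```typescript
// Sketch: which packages does a change actually affect? Walk the reverse
// dependency graph from the changed packages. Package names are hypothetical.
const deps: Record<string, string[]> = {
  ui:     [],
  logger: [],
  api:    ["logger"],
  web:    ["ui", "api"],
  worker: ["logger"],
};

function impactSet(changed: string[]): Set<string> {
  const affected = new Set(changed);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [pkg, pkgDeps] of Object.entries(deps)) {
      // A package is affected if any of its dependencies is affected.
      if (!affected.has(pkg) && pkgDeps.some((d) => affected.has(d))) {
        affected.add(pkg);
        grew = true;
      }
    }
  }
  return affected;
}

console.log([...impactSet(["logger"])]); // the full fan-out: logger plus api, web, worker
console.log([...impactSet(["web"])]);    // just web: the common, narrow case
```

A change to a base package like the logger fans out to everything downstream; a change to a leaf package affects only itself. Turborepo's --filter flag, given a git comparison range, expresses the same idea without hand-rolling it (check the docs for the exact syntax).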

The teams who've fixed this compute the impact set per change. Stripe runs only around 5% of their tests on average per change, against a 50 million line Ruby monorepo, because they identified which tests actually depend on the changed code.

Gotcha: the fan-out exists whether you measure it or not — but you can't skip what you haven't computed.

All six hit hardest when the cache is cold

Here's the thread that ties the six together.

You added packages, and on a cold cache every package gets re-installed and re-built from scratch. Your slowest test got slower, and on a cold cache you're paying the full duration of it instead of skipping it from cache. Your tests got flakier, and on a cold cache a retry isn't "just the test again" — it's the full chain again. Setup tax is at its worst right when the runner is brand-new, with no prefetched layers. Your graph fans out, and on a cold cache every node in the fan-out runs.

This is why the first PR of the morning takes 22 minutes — and the rest of the day is fine. The cache hides five of the six mechanisms once it's warm. The first PR pays the full bill.

Next week we'll look at what your cold-run dependency graph actually contains, using turbo run --dry=json — a tool you already have.
