Tom Masson

Posted on Apr 17

Prepping the Ingredients: Scaling CI with a Unified Monorepo Engine

#devops #cicd #productivity #devex

In the previous post, I talked about how we stopped building and started listening to our developers. We uncovered a fragmented CI/CD landscape riddled with challenges: a glaring lack of standardization across teams, stability issues, and performance bottlenecks that hindered developer velocity. The loudest complaint? Migration fatigue. Every time the platform team "improved" something, it meant hours of manual work for every product team.

To fix this, we didn't just need better pipelines; we needed a way to automate the "tax" of being on the platform and provide a robust, unified CI/CD experience. Just as importantly, we needed to build those new foundations in public, alongside the teams who would eventually rely on them.

The Brain of our Monorepo: Implementing Nx and our custom Nx plugin

We used our ongoing shift toward Domain Driven Design (DDD), which organizes our code into one monorepo per business domain, as the opportunity to make Nx the central orchestrator of our backend monorepos. This was not a gamble, given Nx's track record on our frontend.

Our magic sauce, the custom @payfit/nx-core Nx plugin

This plugin is the engine that drives our CI platform. We didn't just write one-off scripts; we built a set of automations on top of native Nx primitives. This is a crucial distinction: because we use the same tools and patterns that our developers already know, the platform isn't a "black box". It's a transparent, customizable extension of their existing workflow.

That transparency mattered as much as the technical implementation itself. We were dogfooding the same setup in our own repositories, cleaning up our own workflows first, and sharing progress publicly as we went. Before inviting teams onto the new paved road, we wanted to make sure we were already walking it ourselves.

Automatic configuration with Nx Inferred Tasks: By leveraging the Nx Release SDK, we inferred a standardized nx-release-publish target for every publishable project. This allowed us to orchestrate the entire versioning, changelog generation, and publishing lifecycle in a single, atomic operation. Developers don't have to configure these targets manually; they just "appear" when the project meets the criteria.
Nearly zero maintenance tax with Nx Migrations: When we need to roll out a breaking change, we write an Nx migration. These are automated codemods that can update almost anything in the monorepos and apply fixes at scale, turning daunting manual migrations into simple PR reviews, with the PRs opened and applied automatically by a dependency bumping tool.
Automated installation with Nx Init Generators: Our init generator handles the "day zero" setup, installing necessary dependencies and configuring plugin settings in one command.
Automated drift detection with Nx Sync Generators: Our sync generators eliminate config drift by keeping local configuration files in sync with the actual state of the workspace.
Automated conformance across our org using Nx Conformance: We wrote several global conformance rules that all our repositories must follow. Nx Conformance ensures they comply automatically and fails fast in PRs whenever someone breaks a rule, with a clear error message.
Efficient & performant by default: Because we only run Nx CLI commands (like nx affected), the system builds, tests, and releases only what actually changed. Nx targets run in parallel by default.
CI vendor agnostic: Because our logic lives in Nx executors and we only run CLI commands, we are decoupled from the underlying CI vendor. Whether we are running on CircleCI, GitHub Actions, or locally on a developer's machine, the behavior and the workflow are identical.
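To make the "inferred tasks" idea more concrete, here is a minimal sketch of the decision such a plugin has to make for each project. It is not the actual @payfit/nx-core code. The types are simplified stand-ins rather than the real @nx/devkit types, and the publishability rule (a package.json that is not private), the dependsOn entry, and the packageRoot option are all assumptions made for illustration. In a real plugin, logic like this would sit inside the plugin's createNodesV2 hook.

```typescript
// Simplified stand-ins for Nx project configuration types (not the real @nx/devkit types).
interface TargetConfig {
  executor: string;
  dependsOn?: string[];
  options?: Record<string, unknown>;
}

interface PackageJson {
  name: string;
  private?: boolean;
}

// Assumption: a project counts as "publishable" when its package.json is not private.
function isPublishable(pkg: PackageJson): boolean {
  return pkg.private !== true;
}

// Returns the targets to infer for one project, or an empty object when none apply.
function inferTargets(
  projectRoot: string,
  pkg: PackageJson
): Record<string, TargetConfig> {
  if (!isPublishable(pkg)) {
    return {};
  }
  return {
    "nx-release-publish": {
      // Assumption: the inferred target delegates to the unified publisher executor.
      executor: "@payfit/nx-core:publisher",
      // Assumption: publishing reuses the output of an existing build target.
      dependsOn: ["build"],
      options: { packageRoot: projectRoot },
    },
  };
}
```

Because the target is computed from what is already on disk, a project that becomes publishable picks it up without anyone editing its configuration.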

Under the Hood: The Polyglot Publisher

For my tech folks, here’s a brief deep dive.

If you’re interested in the implementation details, I may put together a follow-up article that explores it in depth. Let me know if that’s something you’d like to see.

We replaced a dozen "snowflake" custom release scripts and CircleCI Orbs with a single, unified Nx executor: @payfit/nx-core:publisher.

1. Docker

Docker images were a major source of CI bloat. We reworked the process using our docker-build executor:

Optimized lightweight Dockerfiles: Instead of the naive COPY . . of the whole workspace, we copy only the specific files needed for that project's dependency graph, which gives us smaller layers, smaller images, faster builds, and faster Kubernetes pod spin-up. Everyone wins with lighter images.
The "Build Once" Rule: The Dockerfile reuses the dist artifacts already generated in previous CI steps, ensuring that what was tested is exactly what is packaged. We also separated the Docker build from the release step. CI builds the image once (tagged with the commit SHA), and the publisher simply promotes that artifact.
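As a rough illustration of "copy only what the dependency graph needs", the sketch below walks a project graph and emits one COPY line per project the app depends on. The graph shape, the project names, and the dist/<project root> layout are hypothetical. The real docker-build executor presumably reads this information from Nx's own project graph.

```typescript
// Hypothetical project graph: project name -> names of the projects it depends on.
type ProjectGraph = Record<string, string[]>;

// Collects a project plus everything it depends on, directly or transitively.
function transitiveDependencies(graph: ProjectGraph, root: string): string[] {
  const seen = new Set<string>();
  const stack: string[] = [root];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (seen.has(current)) {
      continue; // already visited, which also protects against cycles
    }
    seen.add(current);
    for (const dep of graph[current] ?? []) {
      stack.push(dep);
    }
  }
  return Array.from(seen).sort();
}

// Assumption: each project's build output lives under dist/<project root>.
function copyInstructions(
  graph: ProjectGraph,
  projectRoots: Record<string, string>,
  app: string
): string[] {
  return transitiveDependencies(graph, app).map(
    (name) => `COPY dist/${projectRoots[name]} ./dist/${projectRoots[name]}`
  );
}

const exampleGraph: ProjectGraph = {
  api: ["billing-lib", "shared"],
  "billing-lib": ["shared"],
  shared: [],
  web: ["shared"],
};

const exampleRoots: Record<string, string> = {
  api: "apps/api",
  "billing-lib": "libs/billing",
  shared: "libs/shared",
  web: "apps/web",
};

const apiCopyLines = copyInstructions(exampleGraph, exampleRoots, "api");
```

In this example, the image for api never sees the output of web, so a change in web does not invalidate any of its layers.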

2. Lambda: Smoke Testing as a First Class Citizen

For each Lambda publishing job, we verify that the function works in a real environment before the pipeline finishes.

The "Pre-flight" Smoke Test: Before the publisher marks a release as successful, it triggers a run on the project's dedicated Spacelift infrastructure stacks: a deployment followed by a health check that confirms the function is "good to go" in a real environment.
Fail Fast: If the smoke test fails, the CI job fails immediately. We catch "it works on my machine" bugs before they ever reach our promotion tool, Kargo (which we'll explore in Part 3).
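The gate itself can be sketched in a few lines. This is an illustration only, not the publisher's actual code: the health check is injected as a plain function, so nothing here calls Spacelift or AWS, and the bounded number of attempts and the delay between them are assumptions about how a freshly deployed function might be polled.

```typescript
// Hypothetical health check: resolves to true when the deployed function responds correctly.
type HealthCheck = () => Promise<boolean>;

interface SmokeTestOptions {
  attempts: number; // how many times to poll before giving up
  delayMs: number; // pause between two attempts
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Resolves as soon as one check passes; rejects once every attempt has failed,
// which is what makes the surrounding CI job fail.
async function runSmokeTest(
  check: HealthCheck,
  options: SmokeTestOptions
): Promise<void> {
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    if (await check()) {
      return;
    }
    if (attempt < options.attempts) {
      await sleep(options.delayMs);
    }
  }
  throw new Error(`Smoke test failed after ${options.attempts} attempt(s)`);
}
```

A publisher built this way would only mark the release as successful after runSmokeTest resolves, so a broken function never reaches the promotion step.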

3. Calendar Versioning for Readability

With over 12,000 deployments a month across more than a hundred applications, traditional semantic versioning can make it difficult to quickly identify when a change was released. We adopted Calendar Versioning (CalVer), using the format YYYYMM.DD.patch, to provide immediate temporal context for every artifact. This makes our release history significantly more readable and helps teams identify version age at a glance.
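Here is a small sketch of how a YYYYMM.DD.patch version could be computed. Several details are assumptions rather than facts from our implementation: dates are taken in UTC, the patch counter starts at 0 and resets each day, and the day is written without a leading zero so that every segment remains a plain integer.

```typescript
// Builds the "YYYYMM.DD" part of the version from a date, using UTC.
function calVerPrefix(date: Date): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1;
  const paddedMonth = month < 10 ? `0${month}` : `${month}`;
  // Assumption: the day is not zero-padded, so each segment stays a plain integer.
  return `${year}${paddedMonth}.${date.getUTCDate()}`;
}

// Returns the next version for a release happening on `date`.
// `previous` is the latest version already released, if any.
function nextCalVer(date: Date, previous?: string): string {
  const prefix = calVerPrefix(date);
  const match = previous ? previous.match(/^(\d{6}\.\d{1,2})\.(\d+)$/) : null;
  if (match && match[1] === prefix) {
    // Same day as the previous release: bump the patch counter.
    return `${prefix}.${Number(match[2]) + 1}`;
  }
  // First release of the day (assumption: the counter starts at 0).
  return `${prefix}.0`;
}

const firstOfDay = nextCalVer(new Date(Date.UTC(2025, 3, 17)));
const secondOfDay = nextCalVer(new Date(Date.UTC(2025, 3, 17)), firstOfDay);
```

With these assumptions, the first release on 17 April 2025 is 202504.17.0 and the second one that day is 202504.17.1.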

The Impact: Reduced Friction and Enhanced Visibility

Today, CI at PayFit is no longer a collection of "snowflakes". It’s a standardized, stable engine built on Nx, the central "brain" that has become our shared language across every repository:

One Pipeline, One Config: Whether you're working on a backend app, a Lambda, or a frontend app, the pipeline is the same.
The Paved Road, Not a Walled Garden: Everything we've built relies on native Nx features. This means teams aren't locked into a rigid platform; they can easily override, extend, or customize their configurations whenever they need to, using the same mental model they use for their local development.

In five months, we migrated 90 applications to this new standard. But those migrations didn't succeed because we forced them through. We started with a small number of teams with whom we already had strong relationships, gave them our full attention, and treated every issue they hit as a platform bug we had to fix quickly. Those early adopters helped us harden the system in real conditions.

That is what gave us the right to scale. By leveraging Nx's remote caching and optimizing our pipelines, we nearly halved the CI time for every repository we migrated. Our deployment frequency increased by 20.8x, jumping to over 12,000 deployments a month, and we regained our leverage as a platform team. We now have an entry point in each monorepo, allowing us to provide an easy, automated migration path moving forward.

At the beginning, we were the ones reaching out and asking teams whether they wanted to move. A few months later, the situation had flipped: teams were asking to migrate faster than we could support them. That backlog of demand was one of the strongest signals that the platform had stopped being perceived as a constraint and had started to be seen as an accelerator.

Lead time visibility

An equally impactful outcome was the radical increase in visibility across the entire delivery lifecycle. Previously, we simply didn't have the data to measure DORA metrics accurately. By standardizing on a single pipeline, we've finally gained the ability to track the entire journey. Our median lead time for change currently sits at 16.19 hours.

Is that number high? Maybe. But for the first time, it's a real number. It's not a vanity metric; it's a reflection of our actual engineering cycle. Now that we can finally see the full picture, we have the baseline we need to identify real bottlenecks.
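For readers who want to reproduce this kind of figure, here is a minimal sketch of a median lead time calculation. It assumes the usual DORA reading of lead time for change (from commit to running in production) and a hypothetical record shape. It does not describe our actual data pipeline.

```typescript
// Hypothetical record: when a change was committed and when it reached production.
interface ChangeRecord {
  committedAt: Date;
  deployedAt: Date;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Median of (deployedAt - committedAt), expressed in hours.
function medianLeadTimeHours(changes: ChangeRecord[]): number {
  if (changes.length === 0) {
    throw new Error("Cannot compute a median without any change");
  }
  const hours = changes
    .map((c) => (c.deployedAt.getTime() - c.committedAt.getTime()) / MS_PER_HOUR)
    .sort((a, b) => a - b);
  const middle = Math.floor(hours.length / 2);
  // Odd count: the middle value. Even count: the mean of the two middle values.
  return hours.length % 2 === 1
    ? hours[middle]
    : (hours[middle - 1] + hours[middle]) / 2;
}
```

The median is a better fit than the mean here because a single change that waits a week before release would otherwise dominate the number.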

Unified Scaffolding: The Next Step

We still have some manual steps in the creation of new apps. Our next step is to leverage Nx generators to provide full, opinionated scaffolding for backends, frontends, and Lambdas. The goal is a single, unified CLI where a developer can run one command and have a production ready, fully integrated app in minutes. Because we've already standardized the underlying engine, these generators will unlock a "Ready to Ship" state by default, with no configuration required.

But CI is only half the battle. Getting those artifacts into production safely across 100+ apps requires a different kind of magic.

In the final part, I’ll explain how we used Kargo to close the CD loop and how we built an automated safety net to ensure every release is production ready.

Top comments (3)

François • Apr 19

That's one thing I dislike but I guess we can't do without: that the build (dist creation) happens outside the docker process and we just copy the output into docker. It feels counter-intuitive with container principles but I guess there is no easy solution here :(

Tom Masson • Apr 27

Yeah since there is no complicated runtime requirement, we chose speed whenever possible

Henry A • Apr 17

The unified pipeline definition pattern is the right call. The alternative — copy-pasting workflows across 30 repos and hoping someone remembers to update all of them when the Docker registry changes — doesn't scale past 5 services.

One thing that bit us with shared CI engines: make sure your pipeline abstraction handles per-service secrets and per-environment deploy targets without the consuming team needing to understand the engine internals. We ended up with a matrix of {service} × {environment} × {deploy-target} that the shared config had to express cleanly. The teams that got it right used a thin service-level config (just "what am I" and "where do I go") that fed into the shared engine. Teams that tried to override pipeline steps per-service ended up with a worse mess than separate workflows.