In the first two posts, we talked about rebuilding trust with product teams and automating CI with Nx. But building a great "Artifact Factory" was only half the battle. We knew from day one that if we just produced artifacts faster without improving the deployment side, we’d just be creating a faster mess.
The Architectural Split: CI ≠ CD
In our legacy world, the lines were blurred. Our CI pipelines didn't just build and test code; they often tried to handle deployment logic too. This "monolithic pipeline" approach was fragile: if a deployment failed due to a transient environment issue, you had to rerun the entire build and test suite.
When we designed the new CI/CD platform, we made a deliberate choice to enforce a strict Separation of Concerns.
- CI (The Factory): Its only job is to produce a validated, versioned artifact and signal its readiness.
- CD (The Logistics): Its job is to take those artifacts and navigate them through the maze of environments.
This decoupling is about more than clean architecture; it's about accelerating our time to market. Because the CD system is a separate entity that listens for new artifacts, it can react immediately to move them through stages. We no longer have to wait for a legacy CI pipeline to finish a long, linear sequence of environment-specific steps. The moment the factory produces a new version, the logistics engine kicks in.
Enter Kargo: The Single Entry Point
To manage this logistics layer, we chose Kargo. It provided the single entry point our developers had been craving.
Instead of hunting through CI logs or checking different tools for different stacks, developers now have one unified interface. Whether they are deploying a backend service, a Lambda function, or a frontend app, the workflow is identical. This is the Paved Road in action, making the best way to deploy also the fastest and easiest.
Kargo allowed us to define a clear, stage-based journey for our software, from Warehouse (Artifacts) to Production, moving through dev and staging environments.
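To make this concrete, here is a minimal sketch of what such a journey can look like in Kargo's CRDs. The names and the ECR URL are hypothetical, and field layouts vary between Kargo versions, so treat this as an illustration rather than our production config:

```yaml
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: my-app            # hypothetical application name
  namespace: my-project   # Kargo projects map to namespaces
spec:
  subscriptions:
    - image:
        # Watch the registry for new versions of the artifact
        repoURL: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app
        semverConstraint: ">=0.0.0"
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: my-project
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: my-app
      sources:
        direct: true      # dev consumes Freight straight from the Warehouse
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: staging
  namespace: my-project
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: my-app
      sources:
        stages: [dev]     # staging only accepts Freight that has passed dev
```

A production Stage follows the same pattern, requesting Freight that has been verified in staging.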
Efficient Release Management
The beauty of Kargo is that it allows us to define the "rules of the road" for each environment based on each team's specific needs.
- Automated Flow: For most teams, a successful CI run triggers an automatic promotion to dev (and sometimes staging). The system reacts to the new artifact immediately, closing the feedback loop for developers in seconds.
- Gated Promotion: For Production, we keep a "human in the loop" for now. A developer can use the Kargo dashboard to see the Freight that has successfully passed through staging and trigger the final promotion with a single click (a policy sketch follows this list).
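Expressed in Kargo terms, this split is just per-stage promotion policy. Here is a minimal sketch with hypothetical names; note that in recent Kargo releases these policies live on a dedicated ProjectConfig resource rather than on the Project itself:

```yaml
apiVersion: kargo.akuity.io/v1alpha1
kind: Project
metadata:
  name: my-project
spec:
  promotionPolicies:
    - stage: dev
      autoPromotionEnabled: true    # new Freight flows to dev with no human action
    - stage: staging
      autoPromotionEnabled: true
    - stage: production
      autoPromotionEnabled: false   # production stays a one-click, human-gated step
```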
It’s one interface, one mental model, and total visibility.
The "Glue": How can Kargo handle a non-kubernetes artifact ?
The secret is in how our nx-core publisher uses OCI manifests as metadata markers.
For containerized apps, the signal is a standard Docker image, which Kargo supports out of the box. But for non-containerized workloads like Lambdas or frontends, we push a tiny "marker" (an OCI artifact) to ECR using the ORAS CLI. Think of these as "empty" ECR artifacts: they don't contain a runnable container. Instead, they carry OCI annotations that act as metadata pointers to the real assets stored in S3 or our private Helm chart registry.
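As an illustration, publishing such a marker from a CI step can look roughly like this. The artifact type and annotation keys below are hypothetical stand-ins, not our actual schema:

```yaml
# Sketch of a CI step (GitHub Actions syntax) publishing a marker to ECR.
- name: Publish deployment marker
  run: |
    # Pushing with no files yields a layer-less artifact: just a manifest
    # carrying our metadata as OCI annotations.
    oras push "$ECR_REGISTRY/my-app-markers:$VERSION" \
      --artifact-type "application/vnd.example.deploy-marker.v1" \
      --annotation "com.example.artifact-kind=lambda" \
      --annotation "com.example.s3-uri=s3://my-artifacts/my-app/$VERSION.zip"
```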
Kargo actively polls the ECR registry for these markers. When it sees a new tag, it parses the annotations, follows the pointers, and automatically knows what kind of artifact to promote. As part of its promotion flow, it pushes a commit updating the artifact version, and a set of GitHub Actions workflows then applies the changes for non-standard artifacts.
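For the Lambda case, the last leg of that flow could look something like the workflow below: it fires when Kargo's promotion commit touches a version file, then aligns AWS with the promoted version. The paths, names, and version-file format are all hypothetical, and credentials setup is omitted:

```yaml
# Sketch of a workflow applying a Kargo promotion to a non-Kubernetes artifact.
name: deploy-lambda
on:
  push:
    branches: [main]
    paths:
      - "envs/production/my-app/version.yaml"  # the file Kargo's commit updates
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Read the promoted version
        id: version
        run: echo "value=$(yq '.version' envs/production/my-app/version.yaml)" >> "$GITHUB_OUTPUT"
      - name: Point the Lambda at the promoted package
        run: |
          PUBLISHED=$(aws lambda update-function-code \
            --function-name my-app \
            --s3-bucket my-artifacts \
            --s3-key "my-app/${{ steps.version.outputs.value }}.zip" \
            --publish --query Version --output text)
          aws lambda update-alias \
            --function-name my-app \
            --name live \
            --function-version "$PUBLISHED"
```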
This marker mechanism turns fragmented artifacts into a single, atomic unit of delivery, providing a unified GitOps interface for every type of workload in our monorepos. It also allows us to support pretty much any kind of artifact we want.
Building an Automated Safety Net: Ensuring Production Readiness
To close the loop safely, we needed a way to ensure an environment was healthy before allowing a change to proceed, especially when promoting to Production. We developed an automated verification service that integrates directly into the Kargo Verification system.
This "safety net" executes health checks automatically:
- Backend: It monitors deployment status and health in real time, checking for pod readiness and liveness.
- Lambda: It verifies that the function's version aliases are correctly aligned with the promoted version in AWS.
- Frontend: It validates that the live application reports the expected version via /version.json or similar endpoints before marking a promotion as successful.
- Datadog Deployment Gates: For critical services, we evaluate Datadog monitors (including log-based error rates and APM metrics) via the datadog-ci CLI to automatically block or roll back promotions if regressions are detected.
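Under the hood, Kargo delegates these checks to Argo Rollouts AnalysisTemplates referenced from a Stage's verification block. Here is a minimal sketch of the frontend-style check; the URL and argument wiring are hypothetical, and the production Stage is truncated to the relevant field:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: frontend-version-check
  namespace: my-project
spec:
  args:
    - name: expected-version      # injected at verification time (wiring elided)
  metrics:
    - name: live-version-matches
      interval: 10s
      count: 6
      failureLimit: 1
      provider:
        web:
          url: https://my-app.example.com/version.json
          jsonPath: "{$.version}"  # extract the version the live app reports
      successCondition: result == '{{ args.expected-version }}'
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: production
  namespace: my-project
spec:
  # requestedFreight and promotionTemplate omitted for brevity
  verification:
    analysisTemplates:
      - name: frontend-version-check
```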
Developer Productivity: Empowering with PR Previews & Helm Diffs
Standardizing our workflow gave us the foundation for a couple of "quick win" productivity features that have since transformed our developer experience:
- PR Previews: a functional preview of any backend applicative Pull Request, directly on our dev environment. Add a label to the PR, wait a few seconds for the deployment signal, and you're live (similar to our previously shared journey on Frontend Previews).
- Helm chart diff checker: whenever a change is proposed to a Helm chart, a script automatically calculates and posts the Helm diff as a PR comment (a workflow sketch follows this list). This lets developers preview exactly how their Kubernetes manifests will change before they hit the cluster, a short feedback loop that catches configuration errors in seconds.
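Here is a stripped-down sketch of that checker, assuming charts live under charts/ and rendering with plain helm template (our real script handles per-environment values and multiple charts):

```yaml
# Sketch of a PR workflow posting a Helm manifest diff as a comment.
name: helm-diff-comment
on:
  pull_request:
    paths: ["charts/**"]
permissions:
  pull-requests: write
jobs:
  diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # we need the base branch to render the "before" state
      - name: Render manifests from both branches
        run: |
          helm template charts/my-app > /tmp/new.yaml
          git worktree add /tmp/base "origin/${{ github.base_ref }}"
          helm template /tmp/base/charts/my-app > /tmp/old.yaml
      - name: Post the diff as a PR comment
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          diff -u /tmp/old.yaml /tmp/new.yaml > /tmp/diff.txt || true
          gh pr comment "${{ github.event.pull_request.number }}" --body-file /tmp/diff.txt
```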
From Support to Multiplier: The Platform's Impact
By the end of 2025, Kargo deployments (12,134/month) had completely overtaken our legacy deployment system (1,543/month). This shift wasn't just about speed; it was about adoption: 94%, with 17 out of 18 teams on board.
That adoption didn't happen because we announced a grand rollout plan. It happened because we built the platform in public, stayed close to the first teams who trusted us, fixed their pain quickly, and kept sharing the wins, the learnings, and the data. Over time, the migration dynamic flipped: instead of the platform team trying to convince teams to move, teams started asking to join faster than we could onboard them.
The platform team finally moved from a support desk to an Engineering Force Multiplier. We weren't spending our days "helping" people with YAML tags anymore; we were building safety nets and preview environments. We had regained our leverage.
The Honest Reality Check: "Pips in the Salad"
I don't want to oversell this. It’s a win, but as our Engineering Manager, Côme, likes to point out with his favorite analogy: "We are approaching a fruit salad, but some bits are still too big and there are too many seeds."
This "fruit salad" analogy, popularized by Gregor Hohpe, perfectly describes the state of a maturing platform. You don't want a basket of whole fruits (fragmented tools), you want an integrated, combined experience. But getting there is challenging:
- Migration Fatigue: Migrating 90+ apps in five months was a marathon. We pushed the limit of how much change an organization can absorb and how much migration a team can handle in a short window.
- Complexity: Our toolchain is powerful, but it's still complex. Reducing the cognitive load of a modern cloud-native stack remains a core challenge we are tackling with a product mindset.
What’s Next: The Road to 2026
We aren't done. The migration was just the foundation. Our roadmap for the coming year is focused on deepening the "Paved Road" and removing the remaining pips:
- One-Click Scaffolding: Moving from standardized lifecycles to standardized "everything".
- Observability as Code: A metadata-driven approach where our Helm charts automatically configure monitoring, alerting, and Datadog dashboards.
- Better Infrastructure Interfaces: Treating infrastructure as a library, making cloud resources as intuitive to consume as a software dependency.
- Tooling for AI: Providing the "guardrails" and a single source of trust for teams leveraging LLMs to ensure they scale safely.
Conclusion: Building Trust, Not Just Tools
The success of this transformation wasn't achieved by the best tools alone; it came from building in public, communicating clearly, and keeping feedback loops fast. We improved developer experience by making workflows faster, deployments more reliable, and the path to production easier to understand. But credibility came from something simpler: listening closely, fixing issues fast, and proving on our own repos that the paved road was worth taking.
I think it also worked because our strategy was aligned with the organization's timing: engineering maturity across PayFit had reached a point where teams were ready to trade "snowflake" control for standardized leverage.
If you're building a platform, remember: Your developers don't want a "fruit basket" of tools they have to peel and chop themselves. They want a salad. It might still have a few pips, but if it helps them ship 20x more often and faster, they'll keep coming back for more.
Acknowledgments
A big thanks to everyone involved in this transformation. Special thanks to Nicolas Beaussart for introducing me to the monorepo world and his mentoring, Nx for providing such a great developer experience, and to Côme Tresarrieu for the "fruit salad" reference and his EM support. Finally, a huge kudos to my Internal Developer Platform (IDP) team for building the "paved road" that more than a hundred engineers use today.
This concludes our series on CI/CD Platformization at PayFit. Thanks for reading!


