DEV Community

Iyanu David

CI/CD Is Not a Toolchain—It's a Control Plane

For years, we treated CI/CD as delivery automation.

A toolchain. A convenience layer. A faster path from commit to production.

That framing is outdated—not because the tools changed, but because what they do changed, and we kept pretending they hadn't.

Modern CI/CD systems don't just ship code. They provision infrastructure. They rotate secrets. They apply IAM policies, run database migrations, configure networking, trigger rollbacks, and deploy across multiple environments. They make decisions about what exists and what doesn't. They hold keys that unlock more doors than most engineers will ever touch.

That's not a toolchain.

That's a control plane.

And we're still securing it like internal plumbing.

What Makes a Control Plane

A control plane has three characteristics:

1. It can change system state
2. It holds privileged authority
3. It affects multiple environments

Modern pipelines meet all three. When a pipeline runs, it doesn't just "build." It decides. It allocates compute. It writes firewall rules. It stamps certificates. It mutates the topology of systems that serve actual users.

That authority often exceeds the permissions of individual engineers—by design. The pipeline needs to reach across boundaries that humans can't. It needs to touch production. It needs to rewrite DNS. It needs to push artifacts into registries that gate what runs where.

And yet we talk about it as though it's just Jenkins with better syntax.

The Power Asymmetry Problem

In many organizations, engineers have scoped access. They can read logs from their service. They can deploy to staging. They can query metrics. Production? That requires approvals. IAM changes? Those go through a ticketing process. Cross-account modifications? Forget it.

Services have bounded roles. An API server can write to its own database. It can call specific downstream dependencies. It can't touch S3 buckets it doesn't own. It can't assume roles in other accounts. The principle of least privilege is gospel.

Production environments have layered controls. Network segmentation. Private subnets. Security groups. Bastion hosts. VPNs. You don't just SSH into prod anymore.

But pipelines?

They often hold cross-environment credentials, deployment authority, artifact signing keys, infrastructure modification rights. They can create load balancers, delete databases, rotate encryption keys, push container images, apply Terraform plans, update DNS records, modify IAM policies, and invalidate CDN caches.

Why? Because pipelines need to "just work." Because friction in CI slows down shipping. Because nobody wants to manually approve every deployment step.

The result is a power asymmetry: the automation layer has more authority than the humans it serves.

That should make us uncomfortable.

It doesn't—yet.

"Trusted Runner" Is a Dangerous Phrase

We casually refer to CI environments as trusted.

But trusted by whom? Against what threat model?

Modern pipelines run on third-party infrastructure—GitHub Actions, GitLab runners, CircleCI agents, and cloud-hosted build farms. They execute code from pull requests, often before any human review. They fetch remote dependencies from npm, PyPI, Maven Central, and Docker Hub—registries we don't control. They interact with SaaS APIs. They store cached artifacts across builds. They pull secrets from vaults that assume the runner ID is proof of identity.

They are exposed to dependency poisoning, malicious forks, compromised tokens, lateral movement paths, and supply chain injection. A single poisoned npm package in a pre-build script can exfiltrate AWS credentials. A malicious PR can rewrite pipeline configuration to dump secrets. A compromised runner can pivot to internal services.

Calling this environment "trusted" is less a statement of fact and more a leftover assumption from when CI ran in the basement on hardware we owned.

That world is gone.

Pipelines Now Define Blast Radius

When incidents happen today, the blast radius often traces back to CI/CD.

A compromised token pushes a poisoned artifact that propagates through staging, then production, across six services before anyone notices.

A misconfigured pipeline applies IAM changes globally because the Terraform workspace wasn't parameterized correctly and nobody caught it in review.

A broad secret—say, an AWS access key whose attached policy grants * on *—enables cross-service access because the pipeline needed to deploy "everything," and that was easier than per-service roles.

An automated rollback reintroduces a vulnerability because the rollback logic doesn't check CVE databases; it just redeploys the last known-good SHA.

The failure isn't in runtime code. Runtime code is sandboxed, logged, and monitored. It runs with constrained permissions. It's been through static analysis, dependency scanning, and peer review.

The failure is in deployment authority.

And deployment authority lives in the pipeline.

The Architectural Shift No One Acknowledged

Cloud-native architecture evolved. Microservices replaced monoliths. Infrastructure became code. Environments became ephemeral. We got better at blast radius containment—network policies, service meshes, zero-trust networking, and identity-aware proxies.

But pipeline trust models didn't evolve at the same speed.

We expanded their power without redesigning their boundaries. We gave them more keys without rethinking the locks. We automated more surfaces without segmenting the automation itself.

A pipeline that deploys a single microservice now might also:

  • Apply Kubernetes manifests via kubectl
  • Provision cloud resources via Terraform or Pulumi
  • Update service mesh policies
  • Rotate database credentials
  • Push container images to multiple registries
  • Update feature flags in a remote config service
  • Invalidate CDN caches
  • Send deployment notifications to Slack, PagerDuty, Datadog

Each of those actions requires credentials. Each of those credentials is a pivot point. Each pivot point is a potential compromise.

That's the architectural gap. We designed resilient runtime systems. We forgot to design resilient deployment systems.

The Question Teams Rarely Ask

Instead of asking "Is our pipeline fast?", we should be asking:

If this pipeline were compromised, what could it change?

If the honest answer is "almost everything," then the system is not segmented—it's automated.

And automation without scoped authority is concentrated risk. It's a single choke point with god-mode privileges. It's a target.

The calculus is simple: attackers don't need to compromise every service if they can compromise the thing that deploys every service.

What Practitioners Actually Do on Monday Morning

This isn't theoretical. Here's what changes when you treat CI/CD as a control plane:

You scope pipeline permissions. Not later. Now. Each pipeline gets exactly the permissions it needs to deploy its specific service. No wildcards. No AdministratorAccess. No shared service accounts.
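As a minimal sketch of what scoping looks like in GitHub Actions syntax: the workflow grants itself only read access to the repo plus the OIDC token it needs to assume a role, and the role it assumes is service-specific. The account ID, role name, and branch are hypothetical placeholders.

```yaml
name: deploy-payments
on:
  push:
    branches: [main]

permissions:          # default-deny; grant only what this workflow needs
  contents: read      # checkout only
  id-token: write     # required for the OIDC credential exchange

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # A role scoped to deploying this one service—not
          # AdministratorAccess, not a shared account-wide role.
          role-to-assume: arn:aws:iam::123456789012:role/deploy-payments
          aws-region: us-east-1
```

The point isn't the specific vendor: it's that the pipeline's identity maps to one service's deploy authority and nothing else.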

You separate build from deploy. Build environments shouldn't hold deployment credentials. They shouldn't need them. Builds produce artifacts. Deployments consume artifacts. Different trust domains.
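One way to sketch that separation, again in GitHub Actions syntax: the build job holds no cloud credentials and only emits an artifact; the deploy job is the only one granted credential access, and it consumes the artifact rather than rebuilding. The build and deploy commands are hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # No cloud credentials here: this job's only output is an artifact.
    steps:
      - uses: actions/checkout@v4
      - run: make build                  # hypothetical build step
      - uses: actions/upload-artifact@v4
        with:
          name: app-bundle
          path: dist/

  deploy:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # only the deploy job can exchange for credentials
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: app-bundle
      - run: ./deploy.sh                 # hypothetical deploy script
```

A compromised dependency in the build step now runs in a job that has nothing worth stealing.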

You gate production access. Pipelines don't get automatic production deploy rights. They request them. Approval workflows, break-glass procedures, and time-bounded tokens. Production is not just another environment variable.
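In GitHub Actions, for example, that gate can be expressed by binding the deploy job to a protected environment; the required reviewers and wait timers live in the environment's settings, so the job pauses until a human approves. The URL and script are placeholders.

```yaml
jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    # "production" is a protected environment configured (in repo settings)
    # with required reviewers; the job does not start until approved.
    environment:
      name: production
      url: https://example.com           # hypothetical
    steps:
      - run: ./deploy.sh production      # hypothetical
```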

You audit pipeline changes like code changes. Every modification to .github/workflows or .gitlab-ci.yml or Jenkinsfile goes through review. Those files define what can change production. Treat them accordingly.
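A CODEOWNERS entry is one low-cost way to enforce that review: every change to a pipeline-definition file is routed to a dedicated group before it can merge. The team name is hypothetical; use whichever of these files your repos actually carry.

```
# Route every pipeline-definition change to a security-aware review group.
/.github/workflows/  @org/platform-security
/Jenkinsfile         @org/platform-security
/.gitlab-ci.yml      @org/platform-security
```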

You inventory secrets. What does each pipeline actually have access to? Where are credentials stored? How are they rotated? Who can read them? If you can't answer these questions in under five minutes, you have a visibility problem.

You assume compromise. Design pipelines so that a compromised runner can't pivot laterally. Network segmentation, egress controls, short-lived credentials, and least-privilege IAM policies. Defense in depth isn't just for runtime.
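Short-lived credentials are the cheapest of these wins. With OIDC federation you can cap the session length, so a token exfiltrated mid-build expires before it's useful. A sketch, with a hypothetical role ARN:

```yaml
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/deploy-payments  # hypothetical
    aws-region: us-east-1
    role-duration-seconds: 900   # 15-minute credentials: stolen tokens go stale fast
```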

You monitor pipeline behavior. What repositories are being cloned? What APIs are being called? What resources are being created? Deployment telemetry is production telemetry.
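Egress monitoring can start at the runner itself. For instance, the third-party step-security/harden-runner action can audit or block outbound traffic from a GitHub-hosted runner; the allowlist below is illustrative, not exhaustive.

```yaml
steps:
  - uses: step-security/harden-runner@v2
    with:
      egress-policy: block          # deny-by-default outbound traffic
      allowed-endpoints: >
        github.com:443
        registry.npmjs.org:443
```

If a poisoned dependency tries to phone home to an endpoint you never approved, the call fails and you get a signal instead of an exfiltration.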

None of this is exotic. It's just... deliberate.

The Uncomfortable Truth

Pipelines are easier to compromise than production systems.

They run untrusted code by design. They have broad permissions by necessity. They're less monitored, less segmented, and less reviewed. They're infrastructure we inherited rather than infrastructure we designed.

And they're the keys to the kingdom.

The industry spent a decade hardening runtime security—sandboxing, SELinux, seccomp, capabilities, namespaces, firewall rules, and intrusion detection. We got pretty good at it.

We spent comparatively little time hardening deployment security.

That asymmetry is showing.

Reframing CI/CD

CI/CD isn't a supporting utility. It's the system that assembles artifacts, defines infrastructure, enforces policy, and deploys change. It's the mechanism by which intent becomes reality.

That makes it production infrastructure.

And production infrastructure deserves explicit trust modeling, scoped permissions, ownership, review, and architectural design.

Not just YAML.

Not just "works on my machine."

Not just "trusted by default."

If your pipeline can rewrite production, it is production. Treat it that way.

Up Next (Day 2)

If CI/CD is a control plane, then the build stage isn’t harmless either.

Day 2 explores why build systems often have more practical power than production—and why supply chain attacks work so well.

📌 If you’re new here, start with the previous series (pinned).
