Iyanu David
Build Systems Have More Power Than Production

If CI/CD is a control plane, then the build system is its forge. And the forge decides what becomes reality.

We obsess over production security—runtime policies, network segmentation, Zero Trust architectures, container isolation, service mesh controls. The infrastructure that serves traffic gets the scrutiny. The monitoring. The war rooms when things break.

But the build system?

It often has more authority than production itself. And attackers know it.

The Power Differential Nobody Talks About

Production systems are constrained by design. Services run with scoped identities. IAM roles follow least privilege—at least in theory. Network boundaries are defined, even if they're occasionally porous. Observability has matured enough that you can usually reconstruct what happened, even if you can't always prevent it. Runtime detection exists, though its efficacy varies wildly based on how much signal you're willing to drown in.

Build systems operate differently.

They execute arbitrary code from repositories—code that hasn't been vetted yet, that's the entire point. They pull dependencies from public registries where typosquatting and namespace confusion are facts of life, not edge cases. They hold artifact signing keys, often in environment variables or mounted volumes, because that's the path of least friction. They access secrets for packaging, for publishing, for promoting artifacts across environments that span development, staging, and production.

Production consumes artifacts. Build systems create them.

Creation power exceeds runtime power. Always.

Production Can Only Run What Build Produces

Here's the asymmetry: production doesn't decide what it runs. The build pipeline does.

If an attacker compromises production, they control one environment. Maybe they pivot to adjacent services if your lateral movement controls are weak. Maybe they exfiltrate data. It's bad. You declare an incident, page the team, start containment.

If they compromise the build system, they control every future deployment. Every downstream environment. Every customer update. Every signed artifact that carries your organization's cryptographic blessing.

That's not an incident. That's a generational compromise.

The SolarWinds attack demonstrated this with surgical clarity. Compromise the build process, inject malicious code into signed releases, distribute at scale through trusted update mechanisms. By the time defenders noticed, the payload had reached nearly 18,000 organizations. Production security didn't matter. The artifacts themselves were poisoned at the source.

Build Systems Execute Untrusted Code by Design

CI systems exist to run code that is not yet trusted. That's their function. Every pull request—from employees, from contractors, sometimes from external contributors—triggers dependency installation, script execution, compilation, testing, packaging.

Which means third-party packages run inside your build environment. Preinstall and postinstall scripts execute with whatever privileges the build runner has. Toolchains fetch and execute remote binaries because modern development requires it. Container base images are pulled dynamically from registries you don't control.

A single malicious dependency can exfiltrate environment variables. Access injected credentials. Modify artifacts in ways that survive code review because the review happens before the build, not after. Alter build outputs subtly enough that static analysis misses the change, while the runtime payload activates exactly when designed.
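To see how little a malicious install hook actually needs, here's a minimal sketch. Any code that runs during dependency installation inherits the runner's full environment; the variable names below are hypothetical stand-ins, not real credentials.

```python
# Illustrative stand-in for what a malicious postinstall script sees.
# Install hooks run with the build runner's entire environment.
def visible_secrets(env):
    """Return environment variables that look like credentials."""
    markers = ("TOKEN", "KEY", "SECRET", "PASSWORD")
    return {k: v for k, v in env.items() if any(m in k.upper() for m in markers)}

# Hypothetical build environment for illustration.
fake_env = {
    "CI": "true",
    "REGISTRY_TOKEN": "hunter2",        # hypothetical injected credential
    "AWS_SECRET_ACCESS_KEY": "abc123",  # hypothetical injected credential
    "PATH": "/usr/bin",
}
leaked = visible_secrets(fake_env)
```

That's the whole attack surface: one dictionary comprehension away from every credential-shaped variable in scope. A real exfiltration just adds an HTTP request.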

The Codecov breach in 2021 showed how this works in practice. Attackers modified a Bash Uploader script used in CI pipelines. The script extracted environment variables—including credentials—from build environments and sent them to an attacker-controlled server. For months. Across hundreds of customer networks. The build system was the vector. Everything else was downstream consequence.

The Signing Authority Problem

Artifact signing is meant to create trust. Cryptographic proof that this binary, this container image, this deployment package came from your organization and hasn't been tampered with.

But who controls the signing keys?

Often: the build system.

Which means if an attacker controls the build, they control trust itself. They sign malicious artifacts with your keys. Your infrastructure validates those signatures and deploys with confidence. Runtime security sees valid signatures and assumes safety. The entire trust chain is predicated on build integrity, and build integrity is frequently assumed rather than enforced.

SLSA—Supply-chain Levels for Software Artifacts—emerged specifically to address this gap. It defines graduated levels of build provenance, from "no guarantees" to "signed provenance from hardened, isolated build platforms." Sigstore provides the cryptographic infrastructure for verifiable signing. These frameworks exist because the industry recognized that signing without build integrity is theater. A performance that creates the appearance of security while leaving the actual attack surface unaddressed.

Most organizations operate at SLSA Level 1 or below. They sign things. They don't verify the build environment that produced those things.

Production Is Observable. Build Is Often Not.

Production environments typically have centralized logging. Alerting pipelines that page people at 3 AM. Runtime monitoring that tracks anomalies, even if half of it is tuned to reduce noise. Incident response playbooks, varying in quality but at least documented. When something breaks in production, you have forensic data. You can reconstruct the timeline.

Build systems often have logs no one reviews. Ephemeral runners that self-destruct after each job, taking their filesystem state with them. Shared infrastructure where multiple teams' builds execute on the same underlying compute, separated by assumptions about container isolation. Minimal anomaly detection because "builds are supposed to do weird things." Limited forensic retention because storage is expensive and who's really going to investigate a build unless it fails?

Yet the build system can alter everything production becomes.

We instrument runtime heavily. We often under-instrument artifact creation. The thing that determines what runs gets less observability than the thing that runs it.

The Ephemeral Fallacy

Many teams assume ephemeral runners are inherently safer. Fresh compute for every job. No persistent state. What could go wrong?

Ephemeral doesn't mean isolated.

It doesn't mean credential scope is limited. It doesn't mean network egress is restricted. It doesn't mean artifact outputs are verified. It doesn't mean dependencies are trustworthy or even checksummed against known-good hashes.

Short-lived infrastructure with broad privilege is still broad privilege. The runner exists for fifteen minutes, but in those fifteen minutes it has access to your container registry, your artifact storage, your signing keys, your cloud provider credentials, your internal APIs.

The CircleCI security incident in January 2023 illustrated this perfectly. Attackers gained access to encryption keys used to protect customer secrets stored in CircleCI's environment variable system. Those secrets—API tokens, cloud credentials, database passwords—were intended for ephemeral runners. But the runners needed access to them, which means the secrets had to be retrievable, which means they became a target. Ephemeral execution didn't protect against persistent credential theft.

If your build environment can pull secrets, it can leak secrets. Duration doesn't change that equation.
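One of the cheapest mitigations mentioned above is checksumming dependencies against known-good hashes before they touch the build. A minimal sketch, assuming a hypothetical lockfile of pinned digests:

```python
import hashlib
import hmac

# Hypothetical lockfile: package name -> pinned SHA-256 digest.
# Here the "trusted" digest is computed inline for demonstration;
# in practice it comes from a reviewed, committed lockfile.
PINNED = {
    "left-pad-1.3.0.tgz": "sha256:" + hashlib.sha256(b"trusted contents").hexdigest(),
}

def verify(name: str, payload: bytes) -> bool:
    """Reject any dependency that is unpinned or doesn't match its pin."""
    expected = PINNED.get(name)
    if expected is None:
        return False  # unpinned dependencies are rejected outright
    actual = "sha256:" + hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected)
```

Most package managers already support this (lockfiles with integrity hashes); the failure mode is teams not enforcing it in CI.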

Supply Chain Attacks Scale Differently

Traditional breach pattern: compromise a server, move laterally through the network, escalate privileges, establish persistence, exfiltrate data or deploy ransomware.

Supply chain breach pattern: compromise the build, inject into an artifact, let the organization's own deployment automation distribute your payload.

The second scales faster. And it bypasses runtime defenses entirely.

When you compromise production, defenders can isolate the affected systems, rotate credentials, rebuild from known-good images. When you compromise the build, defenders have to question every artifact produced since the compromise began. Which deployments are safe? Which container images are clean? How far back do we roll? Do we even know when the compromise started?

The forensic problem becomes exponentially harder because the attack happened upstream of where you instrument. Your runtime logs are clean. Your network monitoring shows normal traffic. Everything looks fine because the malicious code was baked in before it reached production.

The Question That Changes the Conversation

Instead of asking "Is our production hardened?" ask: "Can our build environment publish something malicious without being detected?"

If the answer is yes—and for most organizations it is—your security posture is incomplete. You've fortified the castle while leaving the weapon forge unguarded.

Designing Build Systems as High-Privilege Infrastructure

If build systems hold creation authority, they require architectural intent. Not best practices applied as afterthoughts. Not security bolted on when compliance demands it. Intent from the beginning.

That means minimizing credential scope. Use short-lived identities instead of stored secrets. OIDC tokens from your CI provider to your cloud platform, bound to specific repositories and branches. Credentials that exist for the duration of a job and self-revoke, not API keys that live in environment variable configuration for years.

Reduce network access during builds. If your build doesn't need to call external APIs, block egress. If it needs specific dependencies, allowlist those registries and reject everything else. Defense in depth assumes compromise; network segmentation limits what an attacker can do post-compromise.
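The allowlist logic itself is trivial—the hard part is organizational will, not code. A minimal sketch with example hostnames:

```python
from urllib.parse import urlparse

# Example egress policy: only pre-approved registries are reachable.
# These hostnames are examples; your allowlist reflects your toolchain.
ALLOWED_HOSTS = {"registry.npmjs.org", "pypi.org", "files.pythonhosted.org"}

def egress_allowed(url: str) -> bool:
    """Default-deny: anything not explicitly allowlisted is refused."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS
```

In practice this lives in a network proxy or firewall rule, not application code—but the decision it encodes is the same: default deny, explicit allow.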

Separate build execution from signing authority. Don't let the same compute that runs arbitrary code from pull requests also hold the keys that sign production artifacts. Use isolated signing infrastructure that receives artifact hashes from builds and returns signatures, never exposing key material to the build environment itself.
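The boundary looks like this: the build computes a digest and ships only that digest across; the signer holds the key and returns a signature. A sketch using HMAC as a stand-in for the asymmetric signature a real deployment would use (via an HSM or Sigstore):

```python
import hashlib
import hmac

# --- Signing service side (isolated infrastructure) ---
# The key never leaves this boundary; builds never see it.
SIGNING_KEY = b"held-only-by-the-signing-service"  # illustrative placeholder

def sign_digest(digest: str) -> str:
    """Accept only an artifact digest, never the artifact or build env."""
    return hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()

# --- Build side ---
# Compute the digest locally; only the digest crosses the boundary.
artifact = b"compiled release bytes"
digest = hashlib.sha256(artifact).hexdigest()
signature = sign_digest(digest)
```

The design choice that matters: compute that runs arbitrary PR code can request signatures but can never exfiltrate key material, because it never holds any.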

Generate verifiable artifact provenance. Use in-toto or SLSA attestations that capture what was built, where, from what source, using what dependencies. Make the provenance unforgeable and independently verifiable. When something breaks, you need to know what you deployed. When something is compromised, you need to know what to untrust.
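The shape of such an attestation can be sketched as follows. The statement and predicate type URIs are the real in-toto/SLSA ones; the field values and function signature are illustrative, and a real attestation would also be signed.

```python
import hashlib
import json

def provenance(artifact: bytes, source_repo: str, commit: str, builder: str) -> str:
    """Build an in-toto/SLSA-style provenance statement for an artifact."""
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "predicateType": "https://slsa.dev/provenance/v1",
        # The subject binds the claim to a specific artifact digest.
        "subject": [{"digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
        "predicate": {
            "buildDefinition": {
                "externalParameters": {"source": source_repo, "revision": commit},
            },
            "runDetails": {"builder": {"id": builder}},
        },
    }
    return json.dumps(statement, indent=2)
```

With this record attached to every artifact, "which deployments are safe?" becomes an answerable query instead of a forensic nightmare.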

Document blast radius explicitly. If this build system is compromised, what can an attacker reach? Which secrets? Which networks? Which downstream systems? Threat modeling isn't about paranoia; it's about honest accounting of what's actually exposed.

Build systems should be treated as production-grade infrastructure because they decide what production becomes.

Closing Thought
Production systems enforce trust. Build systems define it.

If you control production, you control an environment. If you control the build, you control the future of every environment.

That's more power. And power requires architecture.

Not eventually. Not when you have time. Monday morning.

References
CISA Alert AA20-352A — SolarWinds Supply Chain Compromise

Codecov Security Incident Report (2021)

CircleCI Security Alert (January 2023)

SLSA Framework (Supply-chain Levels for Software Artifacts)

Sigstore Project