Luan Rodrigues
Your CI/CD Pipeline Is a Security Risk - Here's How I Fixed Mine

Most CI/CD pipelines are one compromised dependency away from a production takeover.

I learned that the hard way after the Codecov breach.

I spent a couple of weeks hacking on a PoC to see what actually holds up when the goal is simple: stop someone from nuking your prod environment.

Here's what I ended up with and where it hurt.


Branch protection is table stakes, but it's very easy to get wrong.

I forced signed commits and mandatory approvals, even for admins. Yeah, it slows things down during fast iterations. But without it, a compromised runner can just rewrite your main history and push whatever it wants.
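For reference, the settings I'm describing map to GitHub's update-branch-protection REST endpoint (`PUT /repos/OWNER/REPO/branches/main/protection`). This is a sketch of the payload, not a drop-in config - field names follow the REST API docs, but on older GitHub API versions required signatures live behind a separate `required_signatures` endpoint, so check the docs for your version:

```json
{
  "required_pull_request_reviews": {
    "required_approving_review_count": 1,
    "dismiss_stale_reviews": true
  },
  "enforce_admins": true,
  "required_status_checks": null,
  "restrictions": null,
  "allow_force_pushes": false,
  "allow_deletions": false
}
```

`enforce_admins: true` is the line that slows you down - and the one that matters when a runner holds a token with push access.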

Let me paint the scenario that kept me up at night:

  • A compromised GitHub Action runs in your pipeline.
  • It reads your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  • It sends them to an external server.
  • A few seconds later, someone else is deploying to your production account.

Game over.


Static secrets had to die - so I killed them

I replaced every AWS_ACCESS_KEY_ID with OIDC (OpenID Connect).

Now the runner requests a short-lived JWT, and AWS STS exchanges it for temporary credentials. No long-lived secrets sitting in GitHub anymore.

Setting this up was a nightmare. The first error I got was an AccessDenied that lasted 4 hours until I realized the Trust Policy wasn't accepting GitHub's audience. AWS documentation on OIDC feels like it was written by someone who's never debugged a 403 in their life.

But once it works, it removes an entire class of problems. Even if a job gets compromised, the creds expire fast enough to limit the blast radius.

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubOIDCRole
          aws-region: us-east-1

Tip: if you get "Access Denied", 99% of the time it's your IAM Trust Policy not accepting GitHub's audience. That's where my four hours went.
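For reference, this is the shape of a Trust Policy that does accept GitHub's audience. The account ID and repo are placeholders; the `aud` must be `sts.amazonaws.com` when you use aws-actions/configure-aws-credentials, and the `sub` condition is what stops any random repo from assuming your role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:*"
        }
      }
    }
  ]
}
```

If you forget the `aud` condition entirely, AssumeRoleWithWebIdentity fails with exactly the opaque AccessDenied I described above.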


SBOMs were the easy part.

Syft generates SPDX/CycloneDX without much friction.
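In the pipeline itself I'd wrap Syft in a workflow step rather than call the CLI by hand. A sketch using Anchore's sbom-action (which runs `syft <image> -o cyclonedx-json` under the hood - the image tag and output filename here are placeholders):

```yaml
- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    image: your-registry/image:tag
    format: cyclonedx-json
    output-file: sbom.cdx.json
```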

Keyless signing with Cosign took more effort. The Sigstore flow isn't hard once it clicks, but it's definitely not obvious the first time. I had to re-read the docs three times to understand where the "key" even comes from.

Still, it beats GPG key management hell. No private key to rotate, no "who has the signing key" questions.
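The signing step itself is short once the identity plumbing exists. A sketch, assuming the job has the same `id-token: write` permission as the OIDC deploy job (that's what gives cosign its Fulcio identity) and that a hypothetical earlier `build` step exposed the image digest:

```yaml
- name: Sign image (keyless)
  env:
    COSIGN_YES: "true"  # auto-confirm the transparency-log privacy prompt in CI
  run: cosign sign your-registry/image@${{ steps.build.outputs.digest }}
```

Signing the digest instead of the tag matters: tags are mutable, digests aren't.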

The GitHub OIDC identity handles everything, and verification is just:

cosign verify \
  --certificate-identity-regexp="^https://github.com/.*$" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  your-registry/image:tag

If verification fails with "no matching signatures", your identity is probably not matching the certificate flags you passed.


gVisor: the container escape killer (but it comes with baggage)

I didn't fully trust Docker isolation, so I tried running builds with gVisor (runsc).

This part was painful. Self-hosted runner setup, cgroup issues, random "unsupported runtime" errors... I lost a few nights here.

What broke:

  • Volume mounts with specific configurations
  • Anything touching /proc too aggressively
  • Builds depending on uncommon syscalls
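For context, wiring runsc into Docker is just a runtime entry in /etc/docker/daemon.json (the path assumes where you installed the binary), after which you opt in per container with `docker run --runtime=runsc ...`:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

The setup is trivial; it's everything after this point - the syscall gaps above - that costs you the nights.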

Debugging inside gVisor isn't fun either. When something fails, you're staring at cryptic errors that don't tell you whether it's your code or the sandbox.

Performance takes a noticeable hit. I haven't benchmarked properly, but builds are slow enough that you feel it during iteration.

Here's my honest take: if your team is small and you're just deploying a simple Node.js API, this whole setup is overkill. The engineering hours you'll spend maintaining gVisor + Falco + Cosign might cost more than the actual risk.

Use this if you run third-party code in CI, or if the data you handle is sensitive enough that standard GitHub Actions isolation doesn't cut it.


Falco for runtime visibility

Isolation alone didn't feel sufficient, so I added Falco for runtime monitoring.

Default rules are noisy as hell. I'm still tuning them. But it already paid off.

Seeing a Slack alert fire instantly when I poked at the Docker socket from inside a container was... reassuring.
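That Docker-socket alert came from a custom rule along these lines - a sketch, leaning on the `container`, `open_read`, and `open_write` macros from Falco's default ruleset:

```yaml
- rule: Docker socket opened in container
  desc: A process inside a container opened the Docker socket
  condition: >
    container and fd.name = /var/run/docker.sock and (open_read or open_write)
  output: >
    Docker socket accessed from container
    (command=%proc.cmdline container=%container.name)
  priority: WARNING
```

Anything that can talk to that socket can start privileged containers on the host, so this one is worth the noise.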

At least now I'm not completely blind if something weird happens.


Security is always a tradeoff.

OIDC removes the biggest footgun. Keyless signing makes provenance usable without key management hell.

gVisor adds real isolation.

But you pay for it in complexity and performance.

This setup is far from perfect. It adds friction. It breaks things. It forces you to actually understand your pipeline.

But here's the thing: if your CI/CD pipeline still depends on long-lived secrets, you don't have automation.

You have a liability.

And honestly? The next breach is going to happen anyway.

The question is:
will your credentials still be valid when it does?
