Why Your CI/CD Pipeline Is Your Biggest Security Blind Spot (And How to Fix It)

Viktor Bulanek

You deploy code 200 times a year. You pentest once.

That's the reality for most engineering teams I've worked with over the past 20 years building infrastructure across fintech, IoT, and energy platforms. We obsess over test coverage for functionality, we automate linting, we run integration tests on every PR - but when it comes to security, we still operate like it's 2010.

Schedule a pentest. Wait three weeks. Get a PDF. Fix the critical stuff. Repeat next year.

Meanwhile, every commit between those annual tests is a roll of the dice.

The deploy-to-test gap is where breaches happen

Let's think about this concretely. Say your team merges 15 PRs per week. That's roughly 750 code changes per year. A traditional pentest captures a snapshot of one of those 750 states. The other 749? Untested.
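The back-of-envelope math, taking the merge rate above as an assumption:

```python
# Back-of-envelope: how much of your deployed history one annual pentest covers.
# The merge rate is the article's example figure -- adjust for your team.
prs_per_week = 15
working_weeks = 50
deploy_states = prs_per_week * working_weeks   # distinct code states per year

# A single annual pentest snapshots exactly one of those states.
coverage = 1 / deploy_states
print(f"{deploy_states} states, one pentest covers {coverage:.2%} of them")
```

Roughly 0.13% coverage of the code states you actually ran in production.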

This isn't a theoretical problem. Some of the most damaging breaches in recent years happened in the window between the last security assessment and the next one - through a misconfigured API endpoint that went live on a Tuesday, or an auth bypass introduced in a dependency update nobody reviewed for security implications.

The fundamental issue isn't that pentesting is bad. It's that point-in-time security assessment doesn't match continuous delivery.

What "shifting security left" actually means in practice

The phrase "shift left" gets thrown around a lot, but most teams implement it superficially. They add a SAST scanner to the pipeline, get 400 findings on the first run, mark most as false positives, and move on. That's not security. That's checkbox compliance.

Genuine security-left means three things:

Testing security the same way attackers think. Attackers don't run static analysis. They chain vulnerabilities — a low-severity information disclosure leads to an IDOR, which leads to privilege escalation, which leads to data exfiltration. Your testing needs to model these chains, not just flag individual CVEs.

Testing continuously, not periodically. If your code changes daily, your security validation should too. This doesn't mean running a full pentest on every commit — that's impractical. It means having automated security agents that understand your application context and can assess meaningful changes as they happen.

Producing actionable output, not reports. A PDF saying "SQL injection found in /api/users endpoint, severity: high" is useless if it doesn't come with a specific fix in the context of your codebase. The output of security testing should be a pull request, not a document.
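One way to make "assess meaningful changes" concrete is a cheap triage step that decides whether a PR touches security-relevant code before spending effort on a deeper assessment. A minimal sketch — the path prefixes are hypothetical, not from any real repo:

```python
# Hypothetical security-sensitive path prefixes -- tune for your repo layout.
SENSITIVE_PREFIXES = ("src/auth/", "src/api/", "migrations/", "Dockerfile")

def needs_deep_scan(changed_files: list[str]) -> bool:
    """Return True when a change touches code worth a deeper security pass."""
    return any(f.startswith(SENSITIVE_PREFIXES) for f in changed_files)

# In CI you'd feed this the output of `git diff --name-only origin/main...HEAD`
# and skip the expensive assessment job when it returns False.
```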

The tools landscape and where the gaps are

Let's break down what's available today and where each approach falls short.

Static Analysis (SAST)

Tools like Semgrep, SonarQube, and CodeQL scan source code for known vulnerability patterns.

# Typical GitHub Actions SAST setup
- name: Run Semgrep
  uses: returntocorp/semgrep-action@v1
  with:
    config: >-
      p/owasp-top-ten
      p/security-audit

Where it works: Catching known patterns — hardcoded secrets, obvious injection sinks, insecure crypto usage.

Where it fails: It can't find business logic flaws, authentication bypasses, or vulnerabilities that only manifest at runtime. And the false positive rate is brutal — I've seen teams where 80% of SAST findings are noise, which trains developers to ignore all of them.
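When you do triage a finding as a false positive, suppress it at the line with an audit trail instead of disabling the rule globally. Semgrep honors inline `nosemgrep` comments; the rule id below is illustrative:

```python
import hashlib

# Triaged 2024-06: MD5 here keys a cache entry, never touches auth or
# integrity checks, so the insecure-hash finding is a false positive.
cache_key = hashlib.md5(b"user:42:profile").hexdigest()  # nosemgrep: insecure-hash
```

The dated comment matters more than the suppression itself — it tells the next reader why this line was judged safe, so the decision can be revisited.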

Dynamic Analysis (DAST)

Tools like OWASP ZAP and Burp Suite test running applications by sending requests and analyzing responses.

# Basic ZAP scan in CI
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-baseline.py \
  -t https://staging.yourapp.com \
  -r report.html

Where it works: Finding runtime issues that SAST misses — CORS misconfigurations, missing security headers, reflected XSS.

Where it fails: Dumb crawling. Traditional DAST tools don't understand your application. They can't log in properly, navigate complex SPAs, or test authenticated endpoints in a meaningful way. They find the low-hanging fruit and miss the stuff that actually matters.

Software Composition Analysis (SCA)

Tools like Dependabot, Snyk, and Renovate track vulnerabilities in your dependencies.

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "daily"

Where it works: Keeping dependencies patched is table stakes. SCA does this well.

Where it fails: Knowing a dependency has a CVE doesn't tell you if your application is actually exploitable through that vulnerability. Is the vulnerable function even in your code path? SCA can't answer that.
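You can get a crude first-pass answer with a reachability heuristic: check whether the flagged package is even imported in your source tree. This is a sketch under stated assumptions, not real call-graph analysis — tools that do this properly go much deeper, but the idea fits in a few lines:

```python
import re
from pathlib import Path

def imports_module(src_root: str, module: str) -> bool:
    """Crude reachability check: does any Python file under src_root
    import `module` at all? No import means the CVE can't be in your
    code path; an import only means 'maybe' -- real analysis goes deeper."""
    pattern = re.compile(rf"^\s*(?:import|from)\s+{re.escape(module)}\b", re.M)
    return any(pattern.search(p.read_text(errors="ignore"))
               for p in Path(src_root).rglob("*.py"))
```

The same idea applies to npm: grep for `require`/`import` of the flagged package before burning a sprint slot on an urgent upgrade.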

The missing piece: continuous exploitation testing

None of the tools above answer the question: "Can an attacker actually break into my application right now?"

That's what penetration testing answers — but manually, expensively, and infrequently.

The gap in the market is automated security testing that thinks like a pentester: chains findings together, attempts actual exploitation paths, and does it continuously as part of your development workflow.

This is where AI-driven approaches are starting to make a real difference. Rather than pattern matching (SAST) or dumb fuzzing (DAST), AI agents can reason about your application — understanding that a certain information disclosure endpoint, combined with a weak session management implementation, creates an exploitation chain that neither tool would flag individually.

If you're curious about how this approach works in practice, this is exactly the gap the AI pentest platform I've been building is meant to bridge.

A practical security layer for your pipeline

Whether or not you adopt AI-driven testing, here's a minimum viable security pipeline I'd recommend for any team shipping code regularly:

# .github/workflows/security.yml
name: Security Pipeline
on:
  pull_request:
    branches: [main]

jobs:
  secrets-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan for secrets
        uses: trufflesecurity/trufflehog@main
        with:
          extra_args: --only-verified

  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/owasp-top-ten

  dependency-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Audit dependencies
        run: npm audit --audit-level=high

  dast-staging:
    runs-on: ubuntu-latest
    needs: [secrets-scan, sast, dependency-check]
    if: github.event.pull_request.base.ref == 'main'
    steps:
      - name: ZAP Baseline Scan
        uses: zaproxy/action-baseline@v0.10.0
        with:
          target: ${{ secrets.STAGING_URL }}

This gives you four layers:

  1. Secrets detection (TruffleHog) — catches API keys and credentials before they hit the repo
  2. Static analysis (Semgrep) — flags known vulnerability patterns in your code
  3. Dependency audit — checks for known CVEs in your package tree
  4. Dynamic scan (ZAP baseline) — tests the running application for common web vulnerabilities

It's not a pentest. It won't find complex business logic flaws or chained exploitation paths. But it catches the embarrassing stuff automatically, on every PR.

What I'd actually measure

If you're building a security program from scratch, don't measure "number of vulnerabilities found." That metric incentivizes finding easy issues and ignoring hard ones.

Instead, track these:

Mean time to remediation (MTTR): How long from when a vulnerability is detected to when it's fixed in production? If this number is weeks, your process is broken regardless of what tools you use.

Coverage of authenticated surfaces: What percentage of your authenticated API endpoints are tested by your automated security tools? For most teams, this number is shockingly low — often under 20%.

Exploitability rate: Of the vulnerabilities your tools find, how many are actually exploitable in your specific context? If this is below 30%, you have a signal-to-noise problem.
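All three metrics are easy to compute once your findings carry timestamps and a triage verdict. A sketch with made-up data — the field names are assumptions, not any real tool's export schema:

```python
from datetime import date

# Illustrative findings export -- field names and values are hypothetical.
findings = [
    {"detected": date(2024, 5, 1), "fixed": date(2024, 5, 4),  "exploitable": True},
    {"detected": date(2024, 5, 2), "fixed": date(2024, 5, 16), "exploitable": False},
    {"detected": date(2024, 5, 3), "fixed": date(2024, 5, 5),  "exploitable": True},
]

def mttr_days(findings):
    """Mean time to remediation, in days."""
    return sum((f["fixed"] - f["detected"]).days for f in findings) / len(findings)

def exploitability_rate(findings):
    """Share of findings actually exploitable in your context."""
    return sum(f["exploitable"] for f in findings) / len(findings)

print(f"MTTR: {mttr_days(findings):.1f} days")
print(f"Exploitability: {exploitability_rate(findings):.0%}")
```

If either function is hard to implement because the data doesn't exist, that itself is a finding about your process.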

The uncomfortable truth

Most engineering teams don't have a tooling problem. They have a prioritization problem.

Security work consistently loses to feature work in sprint planning. The pentest report sits in a ticket backlog for months. The SAST findings get triaged once and then ignored.

The solution isn't better tools (though those help). It's making security findings look like bug fixes — pull requests with specific code changes, not PDF reports that require translation into actionable work.

That's the direction the industry is moving. Whether through AI-driven platforms, better DAST tools, or smarter pipeline integrations, the goal is the same: make security validation as automatic and continuous as your test suite.

We're not there yet, but we're getting closer.


I've been building security and infrastructure systems for 20+ years, currently working on closing the gap between continuous deployment and continuous security testing. Hit me up in the comments if you have questions about setting up security pipelines or if you've found approaches that work well for your team.
