Alan West
Your Deployment Platform Got Breached — Here's Your Incident Response Playbook

So you wake up, check Hacker News with your morning coffee, and see that your deployment platform just confirmed a security breach. Hackers are reportedly selling stolen data. Your stomach drops.

I've been through this exact scenario twice in my career — once with a CI/CD provider and once with a hosting platform. The panic is real, but the response doesn't have to be chaotic. Here's the step-by-step playbook I use now, refined through painful experience.

The Real Problem: You Don't Know What Was Exposed

The hardest part of a third-party breach isn't the breach itself — it's the uncertainty. Your deployment platform has access to a terrifying amount of sensitive data:

  • Environment variables (database URLs, API keys, secrets)
  • Source code and build artifacts
  • Deployment tokens and webhook secrets
  • Team member emails and access patterns
  • Connected Git repository tokens

The vendor's disclosure might take days or weeks to fully detail what was compromised. You can't wait that long.

Step 1: Audit What Your Platform Actually Had Access To

Before you start rotating everything frantically, take 15 minutes to map your exposure. I use a simple script that greps my project configs for environment variables and categorizes them by risk:

#!/bin/bash
# audit-env-exposure.sh
# Pulls all env vars from your project configs and categorizes risk

echo "=== HIGH RISK (rotate immediately) ==="
echo "Database connection strings, API keys with write access, auth secrets"
grep -rn 'DATABASE_URL\|DB_PASSWORD\|API_KEY\|SECRET_KEY\|PRIVATE_KEY\|AUTH_SECRET' \
  .env* docker-compose*.yml 2>/dev/null | grep -v node_modules

echo ""
echo "=== MEDIUM RISK (rotate within 24h) ==="
echo "Third-party service tokens, webhook secrets"
grep -rn 'TOKEN\|WEBHOOK_SECRET\|SMTP_PASS\|REDIS_URL' \
  .env* docker-compose*.yml 2>/dev/null | grep -v node_modules

echo ""
echo "=== CHECK THESE ==="
echo "Any env var you don't recognize — could be legacy or forgotten"
grep -rn '^[A-Z_][A-Z0-9_]*=' .env* 2>/dev/null | grep -v node_modules

Don't just check your current projects. Check archived ones too. That side project from 2024 with a Stripe test key? Yeah, that counts.
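If you have a lot of old projects, a small script can sweep them all in one pass. Here's a rough sketch — the directory layout, risk keywords, and classifications are my assumptions, so tune them for your stack:

```python
#!/usr/bin/env python3
"""Sweep multiple project directories for risky-looking env vars."""
import re
from pathlib import Path

# Heuristic keyword lists, not exhaustive -- adjust for your services.
HIGH_RISK = ("DATABASE_URL", "DB_PASSWORD", "API_KEY", "SECRET_KEY",
             "PRIVATE_KEY", "AUTH_SECRET")
MEDIUM_RISK = ("TOKEN", "WEBHOOK_SECRET", "SMTP_PASS", "REDIS_URL")

def classify(name: str) -> str:
    upper = name.upper()
    if any(k in upper for k in HIGH_RISK):
        return "HIGH"
    if any(k in upper for k in MEDIUM_RISK):
        return "MEDIUM"
    return "CHECK"

def scan(root: Path) -> list[tuple[str, str, str]]:
    """Return (file, var_name, risk) for every env var found under root."""
    results = []
    for env_file in root.rglob(".env*"):
        if "node_modules" in env_file.parts:
            continue
        for line in env_file.read_text(errors="ignore").splitlines():
            m = re.match(r"^([A-Z_][A-Z0-9_]*)=", line)
            if m:
                results.append((str(env_file), m.group(1), classify(m.group(1))))
    return results

if __name__ == "__main__":
    # Point this at the parent folder holding current AND archived projects.
    for path, var, risk in scan(Path.home() / "projects"):
        print(f"[{risk:6}] {var}  ({path})")
```

Run it against the parent directory of everything you've ever deployed, then triage the HIGH bucket first.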

Step 2: Rotate Secrets — In the Right Order

This is where people mess up. They start rotating everything at once and break production in three places simultaneously. Here's the order that minimizes downtime:

First: Database credentials. This is your most critical data.

# Example: rotating PostgreSQL credentials
# 1. Create new credentials FIRST
psql -U admin -c "CREATE ROLE app_user_new WITH LOGIN PASSWORD 'new-secure-password';"
psql -U admin -c "GRANT USAGE ON SCHEMA public TO app_user_new;"
psql -U admin -c "GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO app_user_new;"
psql -U admin -c "GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO app_user_new;"

# 2. Update your application config to use new credentials
# 3. Deploy with new credentials
# 4. Verify everything works
# 5. THEN revoke old credentials
psql -U admin -c "REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM app_user_old;"
# If DROP fails because the old role still owns objects, REASSIGN OWNED first
psql -U admin -c "DROP ROLE app_user_old;"

Second: Auth and session secrets. Rotating these will log out all users, so do it during low-traffic hours and have your support team ready.

Third: Third-party API keys. Regenerate in each service's dashboard. Most services let you have two active keys simultaneously — use that to avoid downtime.

Fourth: Git integration tokens and deploy hooks. Revoke and recreate OAuth connections between your Git provider and deployment platform.

Step 3: Check for Unauthorized Access

This is the step people skip, and it's arguably the most important. If attackers had your deployment tokens, they might have pushed malicious code or modified build configs.

# Check your git history for unexpected commits
git log --oneline --since="2026-04-01" --all | head -50

# Look for commits from unrecognized authors
git log --format='%an <%ae>' --since="2026-04-01" | sort -u

# Check if any build/deploy scripts were modified recently
git log --oneline --since="2026-04-01" -- \
  '**/Dockerfile' \
  '**/.github/workflows/*' \
  '**/package.json' \
  '**/*.config.*' \
  '**/next.config.*'

# Diff your current deploy config against a known good state
git diff HEAD~20..HEAD -- Dockerfile docker-compose.yml

Also check your DNS records. Supply chain attacks through deployment platforms sometimes involve modifying DNS to point your domain at attacker-controlled infrastructure.

# Verify your DNS records match expectations
dig +short yourdomain.com A
dig +short www.yourdomain.com CNAME  # apex domains can't carry a CNAME
dig +short yourdomain.com NS
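You can also codify those expectations so the check is repeatable after every incident. A standard-library sketch that covers A records via `socket` (CNAME and NS would need something like dnspython); the expected values below are obviously placeholders:

```python
import socket

# Placeholder expectations -- replace with your real records.
EXPECTED_A = {
    "yourdomain.com": {"203.0.113.10"},
}

def check_a_records(expected: dict[str, set[str]], resolve=None) -> list[str]:
    """Return human-readable mismatches; an empty list means all good."""
    if resolve is None:
        # Default resolver: all A records for the host.
        resolve = lambda host: set(socket.gethostbyname_ex(host)[2])
    problems = []
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError as exc:
            problems.append(f"{host}: lookup failed ({exc})")
            continue
        if got != want:
            problems.append(f"{host}: expected {sorted(want)}, got {sorted(got)}")
    return problems

if __name__ == "__main__":
    for problem in check_a_records(EXPECTED_A):
        print("DNS MISMATCH:", problem)
```

Drop this in a cron job or CI step and a hijacked record pages you instead of your users.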

Step 4: Harden Your Setup to Limit Future Blast Radius

Here's what I do on every project now, regardless of which platform I'm deploying to:

Use a secrets manager instead of platform env vars

Stop storing secrets directly in your deployment platform's environment variable UI. Use a dedicated secrets manager and fetch secrets at runtime.

# Instead of reading from os.environ directly,
# pull from a secrets manager at startup
import boto3
import json

def load_secrets(secret_name: str, region: str = "us-east-1") -> dict:
    """Fetch secrets at runtime — they never touch your deploy platform."""
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# Your deployment platform only needs ONE credential:
# the IAM role or service account to access the secrets manager
secrets = load_secrets("prod/my-app")
db_url = secrets["DATABASE_URL"]  # never stored in platform env vars

This way, even if your deployment platform is fully compromised, attackers get one narrowly-scoped credential instead of your entire secret inventory.

Scope your tokens ruthlessly

  • Git tokens should be read-only unless your workflow specifically needs write access
  • Database credentials should use least-privilege roles — your web app doesn't need DROP TABLE permissions
  • API keys should be scoped to specific operations, not full admin access

Enable audit logging everywhere

If you're self-hosting anything, make sure you have audit logs that are stored separately from the systems being audited. When a breach happens, logs are the first thing attackers try to delete.
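Shipping logs off-box is the main defense, but you can also make local logs tamper-evident with a hash chain, so a deleted or edited entry in the middle is detectable. A toy sketch of the idea — not a replacement for a real log pipeline:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first entry

def append_entry(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any removed or altered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256((prev_hash + body).encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

Real systems periodically anchor the latest hash somewhere external (object storage with versioning, a ticket, even a Slack message) so an attacker can't quietly rewrite the whole chain.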

Step 5: Decide Whether to Migrate

This is the uncomfortable question. After a breach, should you move to a different platform?

Honestly? Not necessarily. Every platform is a target. The question is how the vendor responds:

  • Good signs: Quick disclosure, clear timeline, transparent about what was accessed, concrete remediation steps, independent security audit announced
  • Bad signs: Vague language, delayed disclosure, minimizing scope, no post-mortem commitment

If you do decide to diversify, consider splitting your infrastructure so no single platform compromise exposes everything. Run your database on one provider, your compute on another, and your CDN on a third. It's more operational overhead, but it limits blast radius.

The Bigger Lesson

Every time I go through one of these incidents, I'm reminded of the same thing: treat every third-party platform as a potential breach vector. Not because they're all insecure, but because the math is against us. The more services in your stack, the higher the probability that at least one gets popped eventually.

Design your architecture so that no single compromise gives an attacker the keys to the kingdom. Rotate secrets regularly even when nothing's wrong — if you're rotating quarterly, a breach just means you're doing an off-cycle rotation instead of a panicked scramble.

And keep that incident response playbook somewhere you can actually find it at 6 AM on a Saturday. Trust me on that one.
