DEV Community

Alan West

Posted on • Originally published at blog.authon.dev

How to Detect and Recover From a Compromised Container Scanner

So you're running Trivy in your CI/CD pipeline, scanning every image before it hits production, feeling pretty good about your security posture. Then you wake up to a PSA on Reddit telling you the scanner itself was compromised. Yeah, that happened.

This is the nightmare scenario for supply chain security — the tool you trust to find vulnerabilities becomes the vulnerability. Let me walk you through what happened, how to check if you're affected, and how to harden your setup so you're not caught flat-footed next time.

What Actually Happened

The compromise targeted Trivy's vulnerability database — the OCI artifact that Trivy pulls down to know what CVEs to scan for. An attacker managed to push a malicious database update to the public registry. When Trivy automatically fetched its latest vulnerability definitions (which it does by default on every run), affected users pulled down a poisoned database.

This is particularly nasty because Trivy didn't need to be "hacked" in the traditional sense. The binary itself was fine. The data it consumed was tampered with. It's like having a perfectly good antivirus engine but feeding it a definitions file that tells it everything is clean.

Step 1: Check If You're Affected

First things first — figure out if your Trivy installation pulled the compromised database. Check your local cache:

# Find where Trivy stores its database
ls -la ~/.cache/trivy/db/

# Check the metadata for download timestamps
cat ~/.cache/trivy/db/metadata.json

# Look at when the DB was last updated
# If the timestamp falls within the compromise window,
# you need to take action
trivy version --format json | jq '.VulnerabilityDB'

If you're running Trivy in a container (which most CI pipelines do), check your pipeline logs for the database download step. Look for any unusual download sources or unexpected database sizes.

# If running Trivy in Docker, inspect the volume
docker run --rm -v trivy-cache:/cache alpine \
  ls -la /cache/db/

# Check the integrity of the current database
# Compare the sha256 against known-good values from
# Aqua Security's official GitHub advisories
sha256sum ~/.cache/trivy/db/trivy.db
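To automate that timestamp check, the `DownloadedAt` field in `metadata.json` can be compared against the compromise window with plain string comparison, since ISO-8601 timestamps sort lexically. A minimal sketch — the window dates below are placeholders, not the real advisory dates:

```shell
#!/bin/bash
# Placeholder compromise window — substitute the real dates from
# Aqua Security's advisory before relying on this.
WINDOW_START="2024-01-01T00:00:00Z"
WINDOW_END="2024-01-07T00:00:00Z"

# Pull the DownloadedAt timestamp out of metadata.json
downloaded_at() {
  grep -o '"DownloadedAt":"[^"]*"' "$1" | cut -d'"' -f4
}

# ISO-8601 timestamps sort lexically, so bash string
# comparison is enough here
in_window() {
  local ts
  ts=$(downloaded_at "$1")
  [[ -n "$ts" && "$ts" > "$WINDOW_START" && "$ts" < "$WINDOW_END" ]]
}

if in_window "$HOME/.cache/trivy/db/metadata.json" 2>/dev/null; then
  echo "AFFECTED: DB was downloaded inside the compromise window"
else
  echo "OK: no DB download inside the compromise window"
fi
```

Drop this into an incident-response step in your pipeline and fail the build on the AFFECTED case.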

Step 2: Clean Up and Reset

If you suspect you pulled the bad database, nuke the cache and start fresh:

# Remove the entire Trivy cache
rm -rf ~/.cache/trivy/

# For containerized setups, remove the volume
docker volume rm trivy-cache

# Re-download the database
# (the cache was already removed above, so this pulls a fresh copy)
trivy image --download-db-only

# Verify the fresh database
trivy version --format json | jq '.VulnerabilityDB'

But here's the thing — just cleaning the cache isn't enough. Any scans that ran during the compromise window produced untrustworthy results. You need to re-scan every image that went through the pipeline during that period.

# Re-scan your critical images after resetting the DB
while read -r image; do
  echo "Re-scanning: $image"
  trivy image --severity HIGH,CRITICAL "$image"
done < critical-images.txt

Step 3: Pin Your Database Source

This is the fix most people miss. By default, Trivy fetches its DB from a public OCI registry without pinning a specific digest. That's what made this attack possible.

# In your CI pipeline config, pin the database source
# and verify its checksum
env:
  TRIVY_DB_REPOSITORY: "ghcr.io/aquasecurity/trivy-db:2"
  TRIVY_SKIP_DB_UPDATE: "true"  # Don't auto-update

steps:
  # Download DB as a separate, auditable step
  - name: Download Trivy DB
    run: |
      trivy image --download-db-only
      # Verify the database digest matches expected value
      ACTUAL_DIGEST=$(sha256sum ~/.cache/trivy/db/trivy.db | awk '{print $1}')
      if [ "$ACTUAL_DIGEST" != "$EXPECTED_DB_DIGEST" ]; then
        echo "ERROR: Database digest mismatch!"
        exit 1
      fi

  - name: Scan image
    run: trivy image --skip-db-update my-app:latest

The key here is --skip-db-update during the actual scan. You download the database once, verify it, and then tell Trivy not to fetch anything new during the scan itself.
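If you'd rather keep this logic out of YAML, the same download-verify-scan pattern fits in a small shell function. A sketch, assuming you maintain `EXPECTED_DB_DIGEST` yourself as the known-good value for the current DB release:

```shell
#!/bin/bash
# Download once, verify the digest, then scan with auto-update off.
# EXPECTED_DB_DIGEST is assumed to be set to a known-good value
# you recorded out of band.
scan_with_pinned_db() {
  local image="$1"
  local db="$HOME/.cache/trivy/db/trivy.db"

  trivy image --download-db-only || return 1

  local actual
  actual=$(sha256sum "$db" | awk '{print $1}')
  if [ "$actual" != "$EXPECTED_DB_DIGEST" ]; then
    echo "ERROR: Database digest mismatch!" >&2
    return 1
  fi

  # Verified — make sure the scan itself can't refresh the DB
  trivy image --skip-db-update "$image"
}
```

Usage: `scan_with_pinned_db my-app:latest`. The function fails closed — a digest mismatch stops the scan entirely rather than scanning against an untrusted database.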

Step 4: Set Up Offline/Air-Gapped Database Updates

For production-critical pipelines, I've started recommending a self-hosted database mirror. It adds complexity, but it gives you a verification chokepoint:

# Host your own verified copy of the Trivy DB
# Pull the official DB to a staging area first
trivy image --download-db-only --cache-dir /staging/trivy-cache

# Run your verification checks on the staged DB
# (size sanity check, signature verification, diff against previous)
DB_SIZE=$(stat -f%z /staging/trivy-cache/db/trivy.db 2>/dev/null || \
          stat -c%s /staging/trivy-cache/db/trivy.db)

# If the DB size changed dramatically, something's wrong
if [ "$DB_SIZE" -lt 50000000 ]; then  # typical DB is ~100MB+
  echo "WARNING: DB suspiciously small — possible tampering"
  exit 1
fi

# Push the verified DB to your internal registry
# (note: Trivy expects the same artifact layout as the upstream
# OCI image, so mirror the packaged artifact rather than the raw
# .db file if Trivy clients will pull from this tag directly)
oras push your-registry.internal/trivy-db:verified \
  /staging/trivy-cache/db/trivy.db

Then point your CI pipeline at your internal registry instead of the public one.
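For example (`your-registry.internal` is a placeholder for your own host, matching the mirror push step above):

```
# CI config: pull the DB from the internal mirror, not ghcr.io
env:
  TRIVY_DB_REPOSITORY: "your-registry.internal/trivy-db:verified"
```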

Prevention: Don't Trust Your Security Tools Blindly

This incident taught me a few things I should've already known:

  • Verify the verifier. Your security scanner is software too, with its own supply chain. Treat it with the same suspicion you treat your dependencies.
  • Pin everything. Database versions, binary versions, container digests. Use SHA digests, not tags. Tags are mutable — digests aren't.
  • Monitor database updates. Set up alerts for when your scanner's vulnerability database changes significantly in size or content. A database that suddenly shrinks or grows by 50% is suspicious.
  • Run multiple scanners. I run both Trivy and Grype in my pipeline. If one gets compromised, the other still catches issues. The overlap isn't wasted effort — it's redundancy.
# Run both scanners and compare results
trivy image --format json my-app:latest > trivy-results.json
grype my-app:latest -o json > grype-results.json

# If one scanner finds zero vulns and the other finds many,
# investigate immediately
TRIVY_COUNT=$(jq '[.Results[]?.Vulnerabilities // [] | length] | add // 0' trivy-results.json)
GRYPE_COUNT=$(jq '.matches | length' grype-results.json)

if [ "$TRIVY_COUNT" -eq 0 ] && [ "$GRYPE_COUNT" -gt 10 ]; then
  echo "ALERT: Significant scanner disagreement — possible DB compromise"
fi

The Bigger Picture

This is the same class of problem as the XZ Utils backdoor — supply chain attacks targeting security-adjacent tooling. The attacker doesn't need to break into your system. They just need to compromise something your system trusts implicitly.

I've been doing this for a while now and the pattern is always the same: we adopt a tool, it works great, we start trusting it completely, and then we stop questioning it. The fix isn't to stop using these tools. It's to treat them as components in a defense-in-depth strategy, not as silver bullets.

After dealing with this, I've added database integrity checks to every scanner in my pipeline. It's an extra 30 seconds per build. Worth it.

Quick Checklist

  • [ ] Check if your Trivy DB was updated during the compromise window
  • [ ] Clear and re-download your Trivy database cache
  • [ ] Re-scan all images that passed through your pipeline during the affected period
  • [ ] Pin your database source with digest verification
  • [ ] Consider running a second scanner in parallel
  • [ ] Set up size and content monitoring on database updates
  • [ ] Review your pipeline for any other tools that auto-update their definitions

Stay paranoid. That's literally the job.
