DEV Community

Paulo Fox


Building an Open-Source Snyk Alternative: Secret Detection, SAST, and SBOM in One Tool

Snyk is $98/month per developer for private repos. Semgrep OSS is free but has no secret detection. GitGuardian has a free tier but no SBOM. I wanted one tool that does all three — so I built FoxShield, an open-source security auditor for GitHub repositories.


What FoxShield Does

git push
  └─► GitHub Action: foxshield@v2
        ├─ Secret scan       (50+ patterns: API keys, tokens, certificates)
        ├─ SAST              (OWASP Top 10 per language)
        ├─ Dependency audit  (CVE lookup via OSV.dev)
        └─ SBOM              (CycloneDX JSON)

Available as:

  • GitHub Action (uses: PauloFox0105/foxshield@v2)
  • CLI (npx foxshield audit .)
  • API (REST endpoint for CI/CD integration)

Lesson 1: Secret Detection Must Be Pattern + Context

Naive regex matching produces too many false positives. API_KEY=test123 is not a secret; sk-ant-api03-... is always a secret, regardless of context.

We use a two-tier system:

# Tier 1: High-confidence patterns (zero false positives, no context needed)
HIGH_CONFIDENCE = [
    (r'sk-ant-api\d{2}-[A-Za-z0-9_-]{93}', 'Anthropic API Key'),
    (r'ghp_[A-Za-z0-9]{36}', 'GitHub Personal Access Token'),
    (r'sk-or-v1-[A-Za-z0-9]{64}', 'OpenRouter API Key'),
    (r'-----BEGIN (RSA |EC |DSA |OPENSSH |ENCRYPTED )?PRIVATE KEY-----', 'Private Key'),
    # ... 35 more
]

# Tier 2: Context-sensitive patterns (require entropy check)
CONTEXT_SENSITIVE = [
    (r'(?i)api[_-]?key\s*[=:]\s*["\']?([A-Za-z0-9_-]{20,})', 'Generic API Key'),
    (r'(?i)secret\s*[=:]\s*["\']?([A-Za-z0-9_+/]{16,})', 'Generic Secret'),
    # ... 15 more
]

For context-sensitive patterns, we compute Shannon entropy. Values below 3.5 bits/char are likely placeholders (YOUR_KEY_HERE, xxxxxxxx).

import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    if not s:
        return 0.0
    freq = Counter(s)
    return -sum(
        (c / len(s)) * math.log2(c / len(s))
        for c in freq.values()
    )

def is_real_secret(match: str) -> bool:
    return shannon_entropy(match) > 3.5
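To make the 3.5 bits/char cutoff concrete, here is the quantity shannon_entropy computes, where n_c is the number of times character c occurs in s and Σ is the alphabet the token is drawn from, plus the upper bound that interacts with the minimum match lengths in the Tier 2 regexes ({20,} and {16,}). The worked values are hand-computed from the formula.

```latex
H(s) = -\sum_{c \in s} p_c \log_2 p_c, \qquad p_c = \frac{n_c}{|s|}

0 \le H(s) \le \log_2 \min\bigl(|s|,\ |\Sigma|\bigr)

H(\texttt{xxxxxxxx}) = 0, \qquad H(\texttt{YOUR\_KEY\_HERE}) \approx 2.87, \qquad \log_2 16 = 4, \qquad \log_2 20 \approx 4.32
```

So a 16-character Generic Secret match can score at most 4 bits/char even if every character is distinct, and a hex-only token (|Σ| = 16) can never exceed 4 bits/char at any length. The 3.5 cutoff therefore leaves a fairly narrow band for real secrets of that shape.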

After tuning, secret detection produced 0 false positives across the 200 real repos we tested.
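Putting the two tiers together, here is a minimal sketch of a per-line scanner, assuming compiled versions of the pattern lists above. Tier 1 matches are reported unconditionally; for Tier 2, only the captured value (group 1) is entropy-checked. The pattern subsets and the scan_line name are illustrative, not FoxShield's actual code.

```python
import math
import re
from collections import Counter

# Illustrative subsets of the two tiers (hypothetical, not the full lists).
HIGH_CONFIDENCE = [
    (re.compile(r'ghp_[A-Za-z0-9]{36}'), 'GitHub Personal Access Token'),
]
CONTEXT_SENSITIVE = [
    (re.compile(r'(?i)api[_-]?key\s*[=:]\s*["\']?([A-Za-z0-9_-]{20,})'), 'Generic API Key'),
]
ENTROPY_THRESHOLD = 3.5  # bits/char, the cutoff used in the post


def shannon_entropy(s: str) -> float:
    """Average bits of information per character of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_line(line: str) -> list[tuple[str, str]]:
    """Return (label, matched value) pairs for one line of source."""
    findings = []
    # Tier 1: the pattern alone is treated as proof, so no context check.
    for pattern, label in HIGH_CONFIDENCE:
        findings.extend((label, m.group(0)) for m in pattern.finditer(line))
    # Tier 2: keep only captured values that look random enough.
    for pattern, label in CONTEXT_SENSITIVE:
        for m in pattern.finditer(line):
            value = m.group(1)
            if shannon_entropy(value) > ENTROPY_THRESHOLD:
                findings.append((label, value))
    return findings
```

For example, API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxx" yields no finding (entropy 0), while a 24-character value whose characters are all distinct scores log2 24 ≈ 4.58 and is reported.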


Lesson 2: SAST Without a Full AST

Full AST parsing (like Semgrep) is accurate but slow and language-specific. For a CLI tool that needs to run in <30 seconds on a typical repo, we use pattern trees — structured regex with AST-like awareness of file context.

Example for SQL injection detection in Python:

PYTHON_SQLI_PATTERNS = [
    # Direct string interpolation in queries
    {
        "pattern": r'(?:execute|query|cursor\.execute)\s*\(\s*[fF]["\'].*\{',
        "message": "Possible SQL injection: f-string in query execution",
        "severity": "HIGH",
        "fix": "Use parameterized queries: cursor.execute(sql, (value,))"
    },
    # % formatting
    {
        "pattern": r'(?:execute|query)\s*\(\s*["\'].*%s.*["\'\s]*%\s*(?!\()',
        "message": "Possible SQL injection: %-formatting without tuple",
        "severity": "MEDIUM",
    },
]

We match patterns at the line level, but with 3-line context windows to catch multi-line constructs. This is less precise than a real AST, but it catches 80%+ of OWASP Top 10 instances with a false positive rate under 2%.
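Here is a minimal sketch of the 3-line context window, assuming each rule is a dict shaped like the PYTHON_SQLI_PATTERNS entries above. Each line is joined with the next two (whitespace-trimmed), so a call split across lines still reads as one string; a hit is reported only when the match starts on the window's first line, which stops overlapping windows from reporting the same construct twice. scan_source and the single rule are illustrative.

```python
import re

# One rule in the same shape as PYTHON_SQLI_PATTERNS (the f-string entry).
RULES = [
    {
        "pattern": r'(?:execute|query|cursor\.execute)\s*\(\s*[fF]["\'].*\{',
        "message": "Possible SQL injection: f-string in query execution",
        "severity": "HIGH",
    },
]


def scan_source(source: str, rules: list[dict], window: int = 3) -> list[tuple[int, str, str]]:
    """Return (1-based line, severity, message) for each rule hit."""
    lines = [line.strip() for line in source.splitlines()]
    compiled = [(re.compile(rule["pattern"]), rule) for rule in rules]
    findings = []
    for i, first in enumerate(lines):
        # Join this line with the next (window - 1) lines into one string.
        joined = " ".join(lines[i:i + window])
        for regex, rule in compiled:
            m = regex.search(joined)
            # Only report matches that begin on the window's first line;
            # the windows that overlap it would otherwise re-report the call.
            if m and m.start() < len(first):
                findings.append((i + 1, rule["severity"], rule["message"]))
    return findings
```

Passing window=1 reduces this to plain line-level matching, which misses a call like cursor.execute( on one line with the f-string on the next. The trade-off is that a greedy .* can now run across neighbouring statements, one plausible source of the residual false positives.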


Lesson 3: CVE Lookup via OSV.dev (Free, No API Key)

Instead of maintaining our own vulnerability database, we query OSV.dev — Google's open vulnerability database covering npm, PyPI, Go, Maven, RubyGems, and more. No API key, no rate limits for reasonable usage.

import httpx

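async def check_package_cves(ecosystem: str, name: str, version: str) -> list[dict]:
    # ecosystem uses OSV's case-sensitive names, e.g. "npm", "PyPI", "Go", "Maven", "RubyGems"
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.osv.dev/v1/query",
            json={
                "version": version,
                "package": {"name": name, "ecosystem": ecosystem}
            },
            timeout=10
        )
        resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
        data = resp.json()
        # extract_severity / extract_fixed_version are helpers not shown in this post
        return [
            {
                "id": vuln["id"],
                "summary": vuln.get("summary", ""),
                "severity": extract_severity(vuln),
                "fixed_in": extract_fixed_version(vuln),
            }
            for vuln in data.get("vulns", [])
        ]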

For a package.json with 150 dependencies, OSV lookup takes ~4 seconds in parallel. We batch requests with asyncio.gather().
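Here is a minimal sketch of that fan-out, assuming an async lookup with the same signature as check_package_cves. The asyncio.Semaphore cap (20 by default) is an assumption, added so a 150-dependency manifest doesn't open 150 connections at once; audit_all is a hypothetical name.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Any coroutine function with check_package_cves's signature.
Lookup = Callable[[str, str, str], Awaitable[list[dict]]]


async def audit_all(
    packages: list[tuple[str, str, str]],  # (ecosystem, name, version)
    lookup: Lookup,
    max_concurrency: int = 20,
) -> dict[str, list[dict]]:
    """Run every lookup concurrently and key the results by name@version."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(ecosystem: str, name: str, version: str) -> tuple[str, list[dict]]:
        async with sem:  # at most max_concurrency lookups in flight
            return f"{name}@{version}", await lookup(ecosystem, name, version)

    results = await asyncio.gather(*(one(*pkg) for pkg in packages))
    return dict(results)
```

Usage would be asyncio.run(audit_all(packages, check_package_cves)). As written, one failed lookup makes gather raise and the whole audit fail; passing return_exceptions=True and recording failures per package is gentler on flaky networks. Because check_package_cves opens a new httpx.AsyncClient per call, sharing one client across lookups would also avoid repeated connection setup.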


Lesson 4: SBOM Generation in CycloneDX Format

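A Software Bill of Materials (SBOM) is becoming a requirement for software sold to the US federal government (Executive Order 14028) and, increasingly, in enterprise procurement. CycloneDX (an OWASP standard) and SPDX are the two most widely used formats; FoxShield emits CycloneDX JSON.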

from datetime import datetime, timezone
import uuid

def generate_sbom(packages: list[dict], repo_name: str) -> dict:
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tools": [{"name": "FoxShield", "version": "2.0"}],
            "component": {
                "type": "application",
                "name": repo_name,
                "bom-ref": f"pkg:generic/{repo_name}"
            }
        },
        "components": [
            {
                "type": "library",
                "name": pkg["name"],
                "version": pkg["version"],
                "purl": f"pkg:{pkg['ecosystem']}/{pkg['name']}@{pkg['version']}",
                "bom-ref": f"pkg:{pkg['ecosystem']}/{pkg['name']}@{pkg['version']}",
            }
            for pkg in packages
        ]
    }
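One detail to watch in generate_sbom: the purl is built straight from pkg["ecosystem"]. If that field holds OSV ecosystem names (as check_package_cves needs), several purl types won't line up. The purl spec uses golang for Go, gem for RubyGems and cargo for crates.io. It also wants PyPI names lowercased with underscores turned into dashes, a Maven groupId:artifactId split into namespace and name, and the leading @ of a scoped npm package percent-encoded. Below is a minimal sketch of a normalizing helper, assuming OSV-style input. to_purl is a hypothetical name, and it skips qualifiers, subpaths and percent-encoding of unusual characters in versions.

```python
# Hypothetical helper: OSV ecosystem name (lowercased) -> purl type.
PURL_TYPES = {"go": "golang", "rubygems": "gem", "crates.io": "cargo", "packagist": "composer"}


def to_purl(ecosystem: str, name: str, version: str) -> str:
    """Build a package URL for one dependency (no qualifiers or subpath)."""
    ptype = PURL_TYPES.get(ecosystem.lower(), ecosystem.lower())
    if ptype == "pypi":
        # PyPI names are case-insensitive and treat "_" and "-" as equal.
        name = name.lower().replace("_", "-")
    elif ptype == "maven":
        # OSV spells Maven packages "groupId:artifactId"; purl wants group/artifact.
        name = name.replace(":", "/", 1)
    elif ptype == "npm" and name.startswith("@"):
        # Scoped npm packages: the leading "@" must be percent-encoded.
        name = "%40" + name[1:]
    return f"pkg:{ptype}/{name}@{version}"
```

In generate_sbom, both "purl" and "bom-ref" would then become to_purl(pkg["ecosystem"], pkg["name"], pkg["version"]). Since CycloneDX requires bom-ref values to be unique within a BOM, a package listed in two manifests should be de-duplicated before this step.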

Lesson 5: GitHub Action in Under 15 Lines of YAML

The GitHub Action is a thin wrapper around the CLI:

# .github/workflows/security.yml
name: FoxShield Security Audit

on: [push, pull_request]

jobs:
  foxshield:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: PauloFox0105/foxshield@v2
        with:
          fail-on: HIGH      # fail CI only on HIGH severity
          output: sarif      # GitHub Security tab integration
          sbom: true         # generate SBOM artifact

The action uploads results to GitHub's Security tab via the SARIF format — findings appear inline in pull requests.
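For readers wiring their own tool into the Security tab, here is roughly what a minimal SARIF 2.1.0 log looks like. This is a sketch assuming findings with rule, severity, message, path and line keys (hypothetical field names, not FoxShield's internal format), with HIGH/MEDIUM/LOW mapped to SARIF's error/warning/note levels. Whichever step performs the upload, the workflow token needs the security-events: write permission.

```python
# Hypothetical mapping from FoxShield severities to SARIF result levels.
LEVELS = {"HIGH": "error", "MEDIUM": "warning", "LOW": "note"}


def findings_to_sarif(findings: list[dict], tool_version: str = "2.0") -> dict:
    """Wrap findings in a single-run SARIF 2.1.0 log."""
    rule_ids = sorted({f["rule"] for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "FoxShield",
                "version": tool_version,
                "rules": [{"id": rid} for rid in rule_ids],
            }},
            "results": [{
                "ruleId": f["rule"],
                "level": LEVELS.get(f["severity"], "warning"),
                "message": {"text": f["message"]},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},  # repo-relative path
                    "region": {"startLine": f["line"]},
                }}],
            } for f in findings],
        }],
    }
```

Serializing the returned dict with json.dump to a .sarif file produces something GitHub code scanning can ingest; the file and line in physicalLocation are what let findings show up inline on the pull request diff.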


Numbers After 6 Months

  • 554 unit tests, 104 modules
  • 50+ secret patterns across 11 languages
  • Average scan time: 12 seconds on a 50k-line repo
  • False positive rate: <2% (tuned on 200 real repos)
  • CVEs detected in test repos: 847 (mostly outdated dependencies)

What's Next

FoxShield is live on GitHub: github.com/PauloFox0105/foxshield

Roadmap:

  • License compliance scanning (GPL contamination detection)
  • Terraform/IaC misconfiguration detection (open S3 buckets, public RDS, etc.)
  • PR comment integration (inline annotations on changed files only)
  • Enterprise tier: private report storage, Jira/Linear integration

If you're building security tooling or have questions about secret detection patterns, SBOM formats, or OSV integration — drop a comment.

Built with: Python 3.12 · GitHub Actions · OSV.dev · CycloneDX · Claude Code (Anthropic)

🔗 foxshield.centralfox.online | GitHub PauloFox0105/foxshield | Reddit u/foxdigitaldev
