Most security scanners produce a list of vulnerabilities ranked by severity and leave the remediation work to you. After working on projects where that list grew long and the question "can someone actually exploit this right now?" remained unanswered, I built something different.
The result is Breach Gate, an open source CLI tool that combines static analysis, container scanning, dynamic API testing, and AI-assisted behavioral testing into a single pipeline. It outputs one clear answer: SAFE, UNSAFE, or REVIEW REQUIRED.
## The Core Problem
Traditional scanners answer: "What vulnerabilities exist?"
Breach Gate answers: "Can an attacker actually compromise the system right now?"
The distinction matters in CI pipelines. A list of medium-severity findings does not tell you whether to block a deployment. A confirmed exploit does.
Breach Gate scores every finding using a multiplicative formula:

```text
Risk = Reachability x Exploitability x Impact x Confidence
```
A vulnerability that is hard to reach, has no working proof-of-concept, and low confidence stays at a low risk score. A confirmed exploit with a working payload gets boosted to critical regardless of how the individual factors score.
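A minimal sketch of how a multiplicative score with a confirmed-exploit override might look (the factor ranges and the exact boost rule are my assumptions, not Breach Gate's actual implementation):

```python
def risk_score(reachability: float, exploitability: float,
               impact: float, confidence: float,
               confirmed_exploit: bool = False) -> float:
    """Combine factors in [0, 1] multiplicatively; a confirmed
    exploit pins the score at critical regardless of the factors."""
    if confirmed_exploit:
        return 1.0  # working payload: critical, no further scoring
    return reachability * exploitability * impact * confidence

# Hard to reach, no PoC, low confidence -> stays low
print(risk_score(0.2, 0.1, 0.9, 0.3))   # 0.0054
# Confirmed exploit -> boosted to critical
print(risk_score(0.2, 0.1, 0.9, 0.3, confirmed_exploit=True))  # 1.0
```

The multiplication is the point: any single near-zero factor drags the whole score down, which is exactly what keeps unreachable or unconfirmed findings from blocking a pipeline.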
## What It Tests
### AI-Assisted Behavioral Testing
The scanner generates OWASP-based test cases per endpoint and executes them against your live API. Two mechanisms keep false positives low:
- **Baseline diffing:** a benign request is sent to each endpoint before any attack probes. Response tokens that appear in the baseline are filtered out of the vulnerability indicators, eliminating a large class of false positives where generic words like "error" or "id" triggered matches.
- **Time-based blind injection:** responses delayed more than 3 seconds AND more than 3x the baseline timing are flagged as potential blind SQL or command injection, which cannot be detected from response bodies alone.
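Both mechanisms are simple to express. Here is a rough sketch of the idea under my own assumptions (function names and the substring matching are illustrative, not Breach Gate's code):

```python
def filter_indicators(baseline_body: str, attack_body: str,
                      indicators: list[str]) -> list[str]:
    """Keep only indicators present in the attack response but
    absent from the benign baseline response."""
    base = baseline_body.lower()
    atk = attack_body.lower()
    return [i for i in indicators
            if i.lower() in atk and i.lower() not in base]

def timing_flag(baseline_ms: float, response_ms: float) -> bool:
    """Blind-injection heuristic: flag only if the delay is both
    absolute (> 3 s) and relative (> 3x the baseline timing)."""
    return response_ms > 3000 and response_ms > 3 * baseline_ms

baseline = '{"error": null, "id": 42}'
attacked = '{"error": "SQL syntax error near \'1"}'
# "error" is dropped (also in baseline); "id" is dropped (not reflected)
print(filter_indicators(baseline, attacked, ["error", "sql syntax", "id"]))
# -> ['sql syntax']
```

Requiring both the absolute and the relative timing condition matters: a slow endpoint with a 2-second baseline would otherwise trip the heuristic on every probe.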
Attack categories covered out of the box:
| Category | Detection Method |
|---|---|
| SQL Injection | Response body, error text, blind timing |
| Command Injection | Response body, blind timing |
| XSS | Reflected probe in response |
| Broken Access Control | Status code shift vs baseline |
| SSRF | Cloud metadata endpoint probing |
| Mass Assignment | Privilege field echo in response |
| JWT Attacks | Algorithm confusion, claim tampering, expired token |
| Path Traversal | File content indicators in response |
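To make one of the probe categories concrete, here is a self-contained sketch of the classic `alg: none` algorithm-confusion check (this is the textbook attack, not necessarily how Breach Gate constructs its probes):

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_alg_none(payload: dict) -> str:
    """Build an unsigned token claiming alg=none; a verifier that
    trusts the header's algorithm field will accept it as valid."""
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    return f"{header}.{body}."  # empty signature segment

token = forge_alg_none({"sub": "attacker", "role": "admin"})
print(token)
```

Sending such a token to an authenticated endpoint and comparing the status code against the baseline is enough to confirm or rule out the vulnerability.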
### Static Analysis via Trivy
Scans your source code and dependencies for known CVEs, exposed secrets, and misconfigurations. Results feed into the same scoring pipeline as dynamic findings.
### Container Scanning
Pulls your Docker image and runs Trivy against the filesystem and OS packages. Findings are correlated with the API endpoint they affect where possible.
### GraphQL Security Probing
For GraphQL APIs, Breach Gate runs five dedicated probes: introspection exposure, depth-limit denial of service, field suggestion enumeration, variable injection, and IDOR by ID enumeration.
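Two of those probes are easy to illustrate with plain query construction (the field names below are hypothetical; the shape of the queries is what matters):

```python
def introspection_probe() -> str:
    """Minimal introspection query; __schema data in the response
    means introspection is exposed."""
    return "{ __schema { types { name } } }"

def depth_bomb(depth: int, field: str = "friends") -> str:
    """Deeply nested query used to probe for missing depth limits;
    servers without a limit can be driven into expensive resolution."""
    q = "id"
    for _ in range(depth):
        q = f"{field} {{ {q} }}"
    return f"{{ user(id: 1) {{ {q} }} }}"

print(depth_bomb(3))
# -> { user(id: 1) { friends { friends { friends { id } } } } }
```

A depth-limited server should reject the nested query with an error rather than resolving it; a timeout or a very slow response is the denial-of-service signal.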
### Dynamic Testing via OWASP ZAP
When ZAP is available (local or Docker), the scanner runs an active API scan and merges the results with findings from other scanners.
## The Output

```text
SECURITY VERDICT:
╔════════════════════════════════════════════════════════╗
║                    UNSAFE TO DEPLOY                    ║
╚════════════════════════════════════════════════════════╝
Reason: Confirmed exploitation: SQL Injection, Command Injection.
        Active attacks succeeded during testing.

2 CONFIRMED EXPLOITS:
  SQL Injection on POST /api/data
  Command Injection on POST /api/execute

Attack Surface (by endpoint):
  POST /api/execute
    Risk: 95%
    Command Injection
    Attack chain: Command Injection -> Full System Compromise
  POST /api/data
    Risk: 90%
    SQL Injection
    Attack chain: Injection -> System Compromise
```
Reports are generated in JSON, Markdown, SARIF, and HTML. The HTML report includes a category filter bar and one-click evidence copy.
## CI Integration
Breach Gate is published to the GitHub Marketplace as a composite action:
```yaml
- name: Run Breach Gate
  uses: epten08/breach-gate@v1
  with:
    target: ${{ vars.STAGING_API_URL }}
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    format: json,markdown,sarif
    output: security-reports
```
The action outputs a verdict value (PASS or FAIL) that downstream steps can consume, and the SARIF report integrates directly with GitHub Code Scanning.
For teams not on GitHub, the same scan runs via npm:
```bash
npx breach-gate scan --target https://staging.api.example.com --ci
```
The --ci flag sets a non-zero exit code on UNSAFE verdicts, which blocks the deployment step in any CI system.
## Watch Mode
For continuous environments, a watch command runs scans on a configurable interval and diffs findings between runs:
```bash
breach-gate watch --target http://localhost:3000 --interval 300
```
New findings are logged as warnings. Resolved findings are logged as informational. This is useful for staging environments that receive frequent deployments.
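The diffing step reduces to set arithmetic on finding identifiers. A minimal sketch (the ID format and log labels are assumptions):

```python
def diff_findings(previous: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Compare finding IDs between two scan runs: IDs only in the
    current run are new (warnings), IDs only in the previous run
    are resolved (informational)."""
    new = sorted(current - previous)
    resolved = sorted(previous - current)
    return new, resolved

new, resolved = diff_findings({"sqli-1", "xss-2"}, {"sqli-1", "ssrf-3"})
print("WARN new findings:", new)           # ['ssrf-3']
print("INFO resolved findings:", resolved) # ['xss-2']
```

Stable finding IDs are the hard part in practice: if an ID changes between runs because a payload or endpoint string shifted, the same issue shows up as both resolved and new.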
## Suppressing Known Findings
Teams working on legacy APIs often carry accepted issues that are already tracked elsewhere. A .breachgateignore file prevents those from blocking pipelines:
```yaml
suppress:
  - id: "finding-abc123"
    reason: "Tracked in JIRA-456, fix scheduled for next sprint"
    expires: "2026-06-01"
  - pattern: "Missing security header"
    endpoint: "/api/health"
    reason: "Health check endpoint, intentionally minimal headers"
```
Rules with an expires date automatically stop suppressing after that date, which prevents forgotten suppressions from masking real regressions.
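The expiry logic amounts to a date comparison at rule-match time. A sketch of the ID-based form (the pattern/endpoint matching is omitted, and the field names mirror the YAML above but the code is illustrative):

```python
from datetime import date

def is_suppressed(rule: dict, finding_id: str, today: date) -> bool:
    """A suppression rule applies only if it matches the finding
    and its expires date, when present, has not passed."""
    if rule.get("id") != finding_id:
        return False
    expires = rule.get("expires")
    if expires and date.fromisoformat(expires) < today:
        return False  # expired rule: stop masking the finding
    return True

rule = {"id": "finding-abc123", "expires": "2026-06-01"}
print(is_suppressed(rule, "finding-abc123", date(2026, 5, 1)))  # True
print(is_suppressed(rule, "finding-abc123", date(2026, 7, 1)))  # False
```

Evaluating expiry at scan time rather than at file-edit time is what makes forgotten suppressions self-healing: the finding simply reappears in the next run after the date passes.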
## Getting Started
```bash
# Install globally
npm install -g breach-gate

# Run against your API
breach-gate scan --target http://localhost:3000

# Run the built-in demo to see a full vulnerable API scan
git clone https://github.com/epten08/breach-gate
cd breach-gate
npm install
npm run demo   # starts a deliberately vulnerable API
npm run scan   # scans it
```
An OpenAPI spec can be passed to give the scanner full endpoint coverage:
```bash
breach-gate scan --target http://localhost:3000 --openapi ./openapi.yml
```
Without a spec, the scanner infers common endpoint patterns and uses them as a starting point.
## Lessons Learned
Reducing the false positive rate was more challenging than building the detection logic. Early versions flagged nearly everything because words like "error", "id", and "success" appeared in every API response. Combining baseline diffing with restricting body matches to 2xx responses brought the false positive rate to a manageable level.
Prompt design for the Anthropic API also required careful iteration. Prompts using direct offensive language were blocked by content filtering. Reframing the same tests as "authorized penetration testing" and "OWASP-based assessment probes" passed the filter while generating identical test cases.
## Links
- GitHub: https://github.com/epten08/breach-gate
- npm: https://www.npmjs.com/package/breach-gate
- GitHub Marketplace: https://github.com/marketplace/actions/breach-gate
Contributions, bug reports, and false positive reports are welcome. The contributing guide covers how to add new attack categories and scanners.