DEV Community

ArshTechPro

Strix: Give Your App a Free AI Pentester Before It Ships

If you have ever pushed code to production and quietly hoped nobody would probe it too hard, this project is for you.

Strix is an open-source tool that runs autonomous AI agents against your application the way a real attacker would: it spins up your app, pokes at it, tries to break in, and proves whether a vulnerability is real before it tells you about it. No SaaS lock-in required, no waiting weeks for a pentest firm to get back to you.

In this article I will walk through what Strix actually does, how it is different from the static analysis tools you already have, and how to run your first scan.

The Problem With Most "Security Scanning"

Most security tooling developers use day to day falls into two buckets:

  • Static analysis (SAST) — scans your source code for risky patterns. Fast, but full of false positives, and it has no idea how your app actually behaves at runtime.
  • Dependency scanners — flag known CVEs in your packages. Useful, but they say nothing about the business logic bugs you wrote yourself.

Neither of these actually runs your application and tries to exploit it. That's traditionally been the job of a human penetration tester, and hiring one is slow and expensive.

Strix tries to close that gap by giving AI agents an actual hacker's toolkit and letting them attack a running instance of your app, the same way a person would.

What Strix Actually Does

Strix agents come with:

  • A full HTTP proxy for inspecting and manipulating requests and responses
  • Browser automation for testing things like XSS, CSRF, and auth flows across multiple tabs
  • Terminal access for running commands and testing interactively
  • A Python runtime for writing custom exploits on the fly
  • Reconnaissance tooling for mapping out the attack surface
  • Static and dynamic code analysis

Instead of one agent doing everything, Strix uses a "graph of agents" model, so multiple specialized agents can work in parallel on different parts of your app and share what they find with each other.

Critically, when Strix reports a vulnerability, it comes with an actual proof-of-concept demonstrating that the exploit works, not just a pattern match that says "this line looks suspicious." That is the main thing that separates it from a linter with a security label on it.

It can detect things like:

  • Access control issues — IDOR, privilege escalation, auth bypass
  • Injection attacks — SQL, NoSQL, command injection
  • Server-side flaws — SSRF, XXE, insecure deserialization
  • Client-side flaws — XSS, prototype pollution, DOM issues
  • Business logic bugs — race conditions, workflow abuse
  • Auth issues — JWT vulnerabilities, broken session management
  • Infrastructure misconfigurations and exposed services

Installing and Running Your First Scan

You need two things before you start:

  1. Docker, running locally (Strix's agents operate inside a sandbox container)
  2. An API key from a supported LLM provider — OpenAI, Anthropic, Google, or a local model via Ollama/LMStudio
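The Docker prerequisite can be checked before installing anything. The sketch below is only an illustration: check_docker is a hypothetical helper name, not part of Strix, and it assumes the standard docker CLI, where docker info fails if the daemon is not reachable.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Strix): confirm that Docker is
# installed and that its daemon answers before attempting a scan.
check_docker() {
  # Is the docker CLI on PATH at all?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed"
    return 1
  fi
  # "docker info" talks to the daemon and fails if it is not running.
  if ! docker info >/dev/null 2>&1; then
    echo "docker: installed, but the daemon is not reachable"
    return 1
  fi
  echo "docker: ok"
}
```

Calling check_docker prints one status line and returns non-zero if either check fails, so it can guard the install step in a setup script.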

Install it:

curl -sSL https://strix.ai/install | bash

Configure which model powers the agents:

export STRIX_LLM="openai/gpt-5.4"
export LLM_API_KEY="your-api-key"

Then point it at a target:

strix --target ./app-directory

That's it. On first run it will pull the sandbox Docker image, then start testing. Results land in strix_runs/<run-name>.
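To find the most recent results folder after a scan, something like the following may help. It is a rough sketch: latest_run_dir is a hypothetical helper, and it assumes only what is stated above, namely that each run gets its own directory under strix_runs. It picks the newest directory by modification time and knows nothing about what is inside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Strix): print the most recently modified
# run directory under strix_runs, or under the base directory given as $1.
latest_run_dir() {
  local base="${1:-strix_runs}"
  # ls -t sorts newest first; -d lists the directories themselves rather
  # than their contents. Errors (for example, no runs yet) are silenced.
  ls -td "$base"/*/ 2>/dev/null | head -n 1
}
```

Running latest_run_dir from the directory where you started the scan prints the path of the newest run, or nothing if no runs exist yet.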

Strix will also remember your config after the first run, saving it to ~/.strix/cli-config.json so you don't have to set environment variables every time.

Beyond a Local Folder

Strix isn't limited to scanning a directory on your machine. A few other ways to point it at something:

# Review a GitHub repo directly
strix --target https://github.com/org/repo

# Black-box test a live deployed app
strix --target https://your-app.com

# Test both the source code and the deployed app together
strix -t https://github.com/org/app -t https://your-app.com

If your app needs a login, you can hand Strix credentials and let it test authenticated flows:

strix --target https://your-app.com \
  --instruction "Perform authenticated testing using credentials: user:pass"

You can also steer it toward specific concerns instead of a generic sweep:

strix --target api.your-app.com \
  --instruction "Focus on business logic flaws and IDOR vulnerabilities"

For more detailed rules of engagement, scope, or exclusions, hand it a file instead of a one-liner:

strix --target api.your-app.com --instruction-file ./instruction.md
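What goes in the instruction file is up to you. The sketch below writes one possible version. The headings and wording are purely illustrative assumptions, since no required format is specified here, and write_instruction_file is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an example rules-of-engagement file.
# The structure below is an assumption for illustration, not a Strix format.
write_instruction_file() {
  # Default to ./instruction.md, the path used in the command above.
  cat > "${1:-./instruction.md}" <<'EOF'
# Scope
- In scope: api.your-app.com
- Out of scope: third-party services and payment providers

# Focus
- Business logic flaws and IDOR vulnerabilities

# Rules
- Do not run destructive or denial-of-service tests
EOF
}
```

Calling write_instruction_file with no arguments creates ./instruction.md, which can then be passed to --instruction-file as shown above.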

Running It Headless

If you want to run Strix as part of an automated job rather than interactively, use non-interactive mode:

strix -n --target https://your-app.com

It prints findings in real time and exits with a non-zero status code if it finds vulnerabilities, which makes it straightforward to fail a build on real findings.
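A script can act on that exit status directly. The wrapper below is a minimal sketch: gate_on_scan is a hypothetical helper name, and it takes the scan command as its arguments. It cannot tell findings apart from other failures such as a crashed scan; it only sees whether the status was zero.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Strix): run the command given as
# arguments and turn its exit status into a pass/fail message.
gate_on_scan() {
  local status=0
  # Capture the status without tripping "set -e" in the calling script.
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "scan passed: no findings reported"
  else
    echo "scan failed: exit status $status"
  fi
  # Propagate the original status so the calling job fails too.
  return "$status"
}
```

For example, gate_on_scan strix -n --target https://your-app.com runs the headless scan from above and returns its status to the calling job.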

Wiring It Into CI/CD

This is where Strix gets genuinely useful for a team: instead of running a security scan occasionally, you run it on every pull request and block insecure code before it merges.

A minimal GitHub Actions setup:

name: strix-penetration-test

on:
  pull_request:

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0

      - name: Install Strix
        run: curl -sSL https://strix.ai/install | bash

      - name: Run Strix
        env:
          STRIX_LLM: ${{ secrets.STRIX_LLM }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
        run: strix -n -t ./ --scan-mode quick

A couple of practical notes if you're setting this up:

  • Use fetch-depth: 0 in the checkout step. On PR runs, Strix automatically scopes a quick review to just the changed files, and it needs full git history to resolve that diff correctly. If it can't, pass --diff-base explicitly.
  • Store your LLM credentials as GitHub secrets and reference them from the workflow, as in the example above. Never hard-code them in the workflow file.
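If you do need to pass --diff-base yourself, one option is to derive it from the pull request's target branch. The sketch below rests on two assumptions: that --diff-base accepts a git ref such as origin/main, and that a fallback of origin/main suits your repository. GITHUB_BASE_REF is the variable GitHub Actions sets to the target branch name on pull request runs. diff_base_ref is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a git ref to diff against.
# Assumes --diff-base takes a ref like "origin/main"; check the Strix docs.
diff_base_ref() {
  if [ -n "${GITHUB_BASE_REF:-}" ]; then
    # On pull_request runs, GitHub Actions sets this to the target branch.
    echo "origin/${GITHUB_BASE_REF}"
  else
    # Assumed fallback for local runs; adjust to your default branch.
    echo "origin/main"
  fi
}
```

In the Run Strix step this could be used as strix -n -t ./ --scan-mode quick --diff-base "$(diff_base_ref)".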

Configuration Options Worth Knowing

export STRIX_LLM="openai/gpt-5.4"
export LLM_API_KEY="your-api-key"

# Optional: point at a local model instead
export LLM_API_BASE="your-api-base-url"

# Optional: enables search capability for the agents
export PERPLEXITY_API_KEY="your-api-key"

# Optional: control how much the model "thinks"
export STRIX_REASONING_EFFORT="high"  # default is high; quick scans use medium

As for which model to run it with, the project currently recommends:

  • OpenAI GPT-5.4 (openai/gpt-5.4)
  • Anthropic Claude Sonnet 4.6 (anthropic/claude-sonnet-4-6)
  • Google Gemini 3 Pro Preview (vertex_ai/gemini-3-pro-preview)

It also supports other providers such as Vertex AI, Bedrock, and Azure, as well as local models. Check the project's LLM providers docs if you want to run something self-hosted.

Where This Fits in Your Workflow

A reasonable way to think about where Strix fits alongside what you probably already run:

| Tool type | What it catches | What it misses |
| --- | --- | --- |
| SAST / linters | Risky code patterns | Runtime behavior, business logic |
| Dependency scanners | Known CVEs in packages | Bugs in your own code |
| Strix | Exploitable, validated vulnerabilities with a working PoC | Anything outside the scope you give it |

Strix is not a replacement for code review or a full professional pentest on a critical system, but as a fast, repeatable layer that actually tries to exploit your app before an attacker does, it fills a gap that static tooling structurally can't.

Trying It Out

A reasonable way to get a feel for Strix is to run it against something disposable before you point it at your production app: spin up a small local project, scan it, and see what turns up.

One important note directly from the maintainers: only test applications you own or have explicit permission to test. You are responsible for using it ethically and legally.

Repo: github.com/usestrix/strix
