TechLatest

Posted on Jun 3 • Originally published at osintteam.blog on May 14

Pentest-AI: The Complete Guide to AI-Powered Autonomous Penetration Testing in 2026

#penetrationtesting #cybersecurity #pentesting #opensource

AI-powered cybersecurity tooling is evolving rapidly. Traditional scanners can detect vulnerabilities, but they often struggle with authentication, exploit chaining, contextual reasoning, and multi-step attack paths.

At the same time, large language models are becoming capable of orchestrating tools, understanding web applications, interacting with APIs, navigating browsers, and coordinating complex workflows.

This is where Pentest-AI (ptai) enters the picture.

Developed by 0xSteph, Pentest-AI is an open-source autonomous penetration testing framework that combines:

AI agents
MCP (Model Context Protocol)
LLM orchestration
Traditional security tools
Curated vulnerability probes
Automated exploit validation
CI/CD security workflows

Unlike traditional scanners that rely entirely on signatures or templates, Pentest-AI attempts to behave more like a human security operator.

In this detailed guide, we will cover:

What Pentest-AI is
How it works
Architecture and agents
MCP integrations
Installation methods
Running scans without API keys
Claude Code integration
Tool ecosystem
Playbooks
Benchmarks
CI/CD integrations
Local LLM usage with Ollama
Security considerations
Real-world use cases
Limitations
Comparisons with other tools

What is Pentest-AI?

Pentest-AI is an AI-native penetration testing framework designed to automate offensive security workflows using LLMs and real security tooling.

The project combines:

Deterministic vulnerability probes
AI reasoning loops
Wrapped CLI security tools
MCP-compatible interfaces
Reporting and attack-chain correlation

The framework can:

Enumerate targets
Authenticate into applications
Run web security scans
Test APIs
Execute external security tools
Correlate findings
Validate vulnerabilities with PoCs
Generate detection rules
Produce reports automatically

The project is available on GitHub:

GitHub - 0xSteph/pentest-ai: Offensive-security MCP server with 205 wrapped tools, 17 specialist agents, and 60 SPA-aware probes for OWASP Top 10. CLI + MCP, BYO LLM. No API key needed on MCP path.


Why Pentest-AI is Different

Most security scanners operate in a very linear way.

Typical workflow:

Crawl target
Run signatures
Generate alerts
Produce findings

The problem is that modern applications are far more dynamic.

Today’s applications include:

SPAs (Single Page Applications)
APIs
JWT authentication
OAuth flows
Browser-side rendering
Dynamic JavaScript routing
Complex state management
Cloud-native infrastructure

Traditional scanners often fail because:

They cannot maintain authenticated sessions properly
They struggle with dynamic JavaScript applications
They cannot reason about attack chains
They generate noisy false positives
They cannot validate vulnerabilities safely

Pentest-AI attempts to solve this using AI orchestration.

The LLM does not directly detect vulnerabilities.

Instead, the LLM:

coordinates workflows
chooses tools
interprets outputs
correlates findings
plans next actions
builds attack chains

Meanwhile, the actual vulnerability detection comes from:

deterministic probes
wrapped security tools
curated exploit logic
reproducible validation checks

This separation is important.

The project itself explicitly states:

“The LLM coordinates. The probes detect.”

That is one of the strongest architectural decisions in the project, because detection stays reproducible regardless of which model drives the workflow.
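The split between coordination and detection can be illustrated with a small sketch. This is not Pentest-AI's code: the `Finding` type, the `missing_security_header_probe` function, and the `PROBES` registry are hypothetical names chosen for illustration, and a plain list stands in for the LLM's plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    probe: str
    target: str
    detail: str


# Hypothetical deterministic probe: the same input always gives the same verdict.
def missing_security_header_probe(target: str, headers: Dict[str, str]) -> List[Finding]:
    required = ["content-security-policy", "x-content-type-options"]
    present = {name.lower() for name in headers}
    return [
        Finding("missing_security_header", target, f"{name} not set")
        for name in required
        if name not in present
    ]


# Registry of available probes (hypothetical; one entry for brevity).
PROBES: Dict[str, Callable[[str, Dict[str, str]], List[Finding]]] = {
    "missing_security_header": missing_security_header_probe,
}


def run_plan(plan: List[str], target: str, headers: Dict[str, str]) -> List[Finding]:
    """Execute the probe names chosen by a planner; unknown names are ignored.

    The planner only decides what to run. Findings come exclusively from
    the return values of the probes themselves.
    """
    findings: List[Finding] = []
    for name in plan:
        probe = PROBES.get(name)
        if probe is not None:
            findings.extend(probe(target, headers))
    return findings


if __name__ == "__main__":
    plan = ["missing_security_header", "made_up_probe"]  # stand-in for LLM output
    for finding in run_plan(plan, "http://localhost:3000", {"X-Content-Type-Options": "nosniff"}):
        print(finding)
```

Even if the planner names a probe that does not exist, no finding is produced for it, which is the property the quoted design statement is after.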

Core Features of Pentest-AI

1. Autonomous AI Agents

Pentest-AI includes multiple specialized agents.

Each agent focuses on a different domain of offensive security.

| Agent | Purpose |
|---|---|
| recon | Enumeration and discovery |
| web | Web application testing |
| api_security | API security assessment |
| browser | Browser and DOM analysis |
| ad | Active Directory testing |
| cloud | Cloud security analysis |
| credential_tester | Credential attacks |
| vuln_scanner | Vulnerability aggregation |
| exploit_chain | Multi-step attack chains |
| poc_validator | Proof-of-concept validation |
| detection | Sigma/SPL/KQL rule generation |
| report | Report generation |
| llm_redteam | LLM security testing |
| mobile | Mobile application testing |
| wireless | Wireless reconnaissance |

This allows Pentest-AI to behave more like a coordinated red-team workflow instead of a simple scanner.

2. MCP (Model Context Protocol) Support

One of the most important features of Pentest-AI is MCP support.

MCP allows AI assistants to directly invoke external tools.

This means AI coding assistants such as:

Claude Code
Cursor
Codex
Claude Desktop

can directly operate security tooling through natural language.

For example:

“Run an authenticated SQL injection assessment against the login flow.”

The assistant can then:

choose the correct tools
execute scans
analyze outputs
continue testing automatically
generate reports

without manually typing commands.
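Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds such a message. The tool name `run_sqlmap` and its arguments are hypothetical and are not taken from Pentest-AI's actual tool schema.

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def build_tool_call(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    print(build_tool_call("run_sqlmap", {"url": "http://localhost:3000/rest/user/login"}))
```

In practice the assistant generates these calls itself. The point is that each natural-language instruction ends up as structured, auditable tool invocations.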

This is a major shift in offensive security workflows.

Instead of:

Human → Tool → Result

You now get:

Human → AI Operator → Multiple Tools → Reasoning Loop

This represents the emerging category of:

AI-native AppSec
Agentic cybersecurity
Autonomous red teaming
AI-assisted offensive tooling

3. Wrapped Security Tools

Pentest-AI wraps over 200 security tools (the repository description lists 205).

Examples include:

| Category | Tools |
|---|---|
| Recon | nmap, masscan |
| Fuzzing | ffuf, gobuster |
| Injection | sqlmap, dalfox |
| CMS Testing | wpscan |
| Password Attacks | hydra, hashcat |
| AD Testing | bloodhound, impacket |
| Cloud | prowler, trivy |
| Secrets | gitleaks, trufflehog |

Instead of reinventing the wheel, Pentest-AI orchestrates existing tooling intelligently.
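To make "wrapping" concrete, here is a minimal sketch of what a wrapper around a CLI tool might look like. It is an assumption about the general pattern, not Pentest-AI's implementation. `ToolResult` and `run_tool` are hypothetical names, and the Python interpreter stands in for a real scanner so the example runs anywhere.

```python
import shutil
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ToolResult:
    argv: List[str]
    returncode: int
    stdout: str
    stderr: str


def run_tool(argv: List[str], timeout: float = 60.0) -> ToolResult:
    """Run an external CLI tool without a shell and capture its output."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not installed or not on PATH")
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Report the timeout as a result instead of crashing the whole run.
        return ToolResult(argv, -1, "", f"timed out after {timeout}s")
    return ToolResult(argv, proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    result = run_tool([sys.executable, "-c", "print('scan complete')"])
    print(result.returncode, result.stdout.strip())
```

Passing the arguments as a list rather than a shell string avoids command injection through target names, which matters when an LLM is choosing the arguments.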

This is one of the reasons the framework is gaining attention.

4. Authenticated Scanning

Most scanners lose coverage as soon as they reach a login page.

Pentest-AI supports:

session handling
authentication workflows
credential reuse
cookie management
session refresh

This is critical because most modern applications hide important attack surfaces behind authentication.
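Session refresh is the least obvious item in that list, so here is a rough sketch of the idea. The `AuthSession` class and the shape of the `login` callable are assumptions made for illustration and say nothing about how Pentest-AI implements it internally.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical signature: login() returns (cookies, lifetime_in_seconds).
LoginFn = Callable[[], Tuple[Dict[str, str], float]]


@dataclass
class AuthSession:
    login: LoginFn
    clock: Callable[[], float] = time.monotonic
    refresh_margin: float = 30.0  # re-authenticate this many seconds before expiry

    def __post_init__(self) -> None:
        self._cookies: Dict[str, str] = {}
        self._expires_at: float = float("-inf")  # forces a login on first use

    def cookies(self) -> Dict[str, str]:
        """Return valid cookies, logging in again shortly before expiry."""
        if self.clock() >= self._expires_at - self.refresh_margin:
            self._cookies, lifetime = self.login()
            self._expires_at = self.clock() + lifetime
        return dict(self._cookies)
```

Every request made during a long scan would call `cookies()` first, so an expired session never silently turns authenticated tests into unauthenticated ones.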

5. Exploit Chaining

Traditional scanners generally detect isolated vulnerabilities.

Pentest-AI attempts to correlate findings into attack paths.

Example:

weak authentication → SQL injection → admin access → SSRF → cloud credential exposure

This capability is extremely valuable during real-world engagements.
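One simple way to think about chaining is as a graph search: each finding turns one attacker capability into another, and a chain is a path from the starting position to a goal. The sketch below uses breadth-first search over hypothetical data modeled on the example above. It illustrates the concept and is not the algorithm Pentest-AI uses.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Each finding is (precondition, finding name, capability gained).
Finding = Tuple[str, str, str]


def find_chain(findings: List[Finding], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest sequence of findings from start to goal."""
    edges: Dict[str, List[Tuple[str, str]]] = {}
    for pre, name, post in findings:
        edges.setdefault(pre, []).append((name, post))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, nxt in edges.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None  # no chain connects start to goal


if __name__ == "__main__":
    findings = [
        ("anonymous", "weak authentication", "user session"),
        ("user session", "SQL injection", "admin access"),
        ("admin access", "SSRF", "cloud credentials"),
        ("anonymous", "verbose error page", "stack trace"),
    ]
    print(find_chain(findings, "anonymous", "cloud credentials"))
```

The isolated "verbose error page" finding does not appear in the result because it does not lead toward the goal, which is how correlation separates a real attack path from unrelated noise.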

6. PoC Validation

One of the biggest problems in AppSec is false positives.

Pentest-AI attempts to validate findings using non-destructive proofs-of-concept.

This helps:

reduce noise
improve trust
speed up triage
provide reproducible evidence
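As an illustration of what a non-destructive check can look like, the sketch below confirms a suspected reflection issue by sending a random, inert marker and checking whether it comes back unescaped. The `confirm_reflection` function and the `fetch` callable are hypothetical and only show the general idea, not Pentest-AI's validator.

```python
import html
import secrets
from typing import Callable


def confirm_reflection(fetch: Callable[[str], str]) -> bool:
    """Return True only if a harmless random marker comes back unescaped.

    `fetch` takes the parameter value to send and returns the response body.
    The marker contains angle brackets but no script, so nothing executes.
    """
    marker = f"<poc-{secrets.token_hex(8)}>"
    body = fetch(marker)
    return marker in body


if __name__ == "__main__":
    # Two fake applications: one echoes input raw, one escapes it.
    vulnerable = lambda value: f"<p>You searched for {value}</p>"
    safe = lambda value: f"<p>You searched for {html.escape(value)}</p>"
    print(confirm_reflection(vulnerable), confirm_reflection(safe))
```

Because the marker is random per run, a positive result is tied to this specific request and can be kept as reproducible evidence.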

7. CI/CD Integration

The framework integrates directly into:

GitHub Actions
GitLab CI
Jenkins

It supports:

SARIF output
severity gating
PR comments
pipeline failures on critical findings

This makes it useful for DevSecOps workflows.
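SARIF output and severity gating are easy to picture with a small example. The sketch below emits a minimal SARIF 2.1.0 document and computes a pipeline exit code. The finding dictionaries, the severity names, and the `to_sarif` and `gate` functions are assumptions for illustration and not Pentest-AI's output format.

```python
import json
from typing import Dict, List

SEVERITY_ORDER = ["low", "medium", "high", "critical"]
# SARIF only defines none/note/warning/error, so severities are folded down.
SARIF_LEVEL = {"low": "note", "medium": "warning", "high": "error", "critical": "error"}


def to_sarif(findings: List[Dict[str, str]], tool_name: str = "example-scanner") -> str:
    """Render findings as a minimal SARIF 2.1.0 document."""
    results = [
        {
            "ruleId": f["rule"],
            "level": SARIF_LEVEL[f["severity"]],
            "message": {"text": f["message"]},
        }
        for f in findings
    ]
    doc = {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }
    return json.dumps(doc, indent=2)


def gate(findings: List[Dict[str, str]], fail_on: str = "critical") -> int:
    """Return 1 (fail the pipeline) if any finding meets the threshold, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    worst = max((SEVERITY_ORDER.index(f["severity"]) for f in findings), default=-1)
    return 1 if worst >= threshold else 0


if __name__ == "__main__":
    findings = [{"rule": "sqli", "severity": "high", "message": "SQL injection in login"}]
    print(to_sarif(findings))
    print("exit code:", gate(findings, fail_on="high"))
```

A CI job would upload the SARIF file to its code-scanning view and use the exit code to decide whether the pipeline fails.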

Installation Guide

Pentest-AI supports multiple installation paths.

Method 1: Basic Installation

Install using pip:

pip install ptai

This installs the core framework.

Method 2: Claude Code MCP Integration (Recommended)

If you already use Claude Code, this is the easiest setup.

Install Pentest-AI, then register it as an MCP server:

pip install ptai

claude mcp add pentest-ai -- ptai mcp

Restart Claude Code.

You can now issue prompts such as:

Run an authenticated pentest against staging.example.com

No additional API key is required.

Your Claude subscription acts as the LLM backend.

Method 3: Setup for Cursor / Codex / VS Code

Run:

ptai setup --mcp

The framework automatically detects compatible MCP clients and configures them.

Restart the editor afterward.

Method 4: Standalone CLI Mode

You can also run Pentest-AI directly.

Example:

ptai start https://target.com

In standalone mode, you usually need an LLM provider.

Supported providers include:

Anthropic
OpenAI
Ollama
LiteLLM providers
Azure OpenAI
OpenRouter
Groq
DeepSeek
Mistral

Running Pentest-AI Without API Keys

One of the biggest advantages of Pentest-AI is that it can run without API keys.

There are several ways to do this.

For this tutorial, we will use local LLMs through Ollama instead of relying on a Claude subscription. While Pentest-AI integrates seamlessly with Claude Code via MCP, full Claude orchestration still requires an active Anthropic subscription or API-backed access.

One of the most interesting aspects of Pentest-AI is that it can also operate entirely with:

local models
offline inference
deterministic probes
wrapped security tools
local orchestration

without depending on cloud-based AI providers.

This means you can build a fully local AI-powered penetration testing environment directly on your own machine.

In this setup, we will use:

Claude Code (installed locally)
Ollama
Local LLMs
Pentest-AI
Traditional security tooling

The biggest advantages of this approach are:

no API costs
no cloud dependency
improved privacy
offline operation
full local control

Running Pentest-AI with Ollama Local Models

First, verify that Ollama is installed:

ollama --version

Example output:

ollama version is 0.23.3

Next, check the available local models:

ollama list

Example:

NAME ID SIZE
gemma4:e2b 7fbdbf8f5e45 7.2 GB

You can also install additional models that generally perform better for cybersecurity reasoning tasks:

ollama pull qwen2.5-coder

or:

ollama pull deepseek-coder-v2

or:

ollama pull llama3.1

If you are using the Ollama Desktop application, you usually do not need to run ollama serve manually from the terminal. The desktop app starts and manages the Ollama background service, so the local API endpoint (http://localhost:11434) stays available to tools like Pentest-AI for as long as the app is running. To verify this, open the Ollama Desktop interface and check that your models are listed and responding. You can then point Pentest-AI at your local Ollama instance directly, without starting the server each time.

To confirm the service is up, open http://localhost:11434 in a browser. It should respond with "Ollama is running".
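The same check can be scripted. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models, and falls back to `None` when the server cannot be reached. The function names are hypothetical, and reading `OLLAMA_HOST` with a localhost default is an assumption that mirrors the configuration used in this guide.

```python
import json
import os
import urllib.error
import urllib.request
from typing import List, Optional


def parse_model_names(payload: str) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def list_local_models(host: Optional[str] = None, timeout: float = 3.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server is unreachable."""
    host = (host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")).rstrip("/")
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or invalid JSON.
        return None


# Usage (requires a running Ollama instance):
#   models = list_local_models()
#   print("Ollama not reachable" if models is None else models)
```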

Now configure Pentest-AI to use Ollama:

export OLLAMA_HOST=http://localhost:11434

Optionally specify the model:

export OLLAMA_MODEL=gemma4:e2b

Next, install Pentest-AI:

pip install ptai

Starting the First AI-Powered Scan

After configuring Ollama and Pentest-AI, the first scan can be launched directly from the terminal.

In this setup, Pentest-AI uses a fully local LLM through Ollama, so the OLLAMA_HOST and (optionally) OLLAMA_MODEL variables exported earlier must be set in the same shell.

Now start the scan:

ptai start https://target.com

During the first execution, Pentest-AI displays an authorization and acceptable-use prompt:

pentest-ai is offensive security tooling. By using it you confirm:

1. You have explicit, written authorization to test every target
2. You will comply with applicable laws
3. You accept the Acceptable Use Policy
4. You accept the Terms of Service

This is an important safeguard because Pentest-AI performs real offensive security operations and should only be used against authorized targets.

After accepting the prompt, Pentest-AI asks which LLM backend should be used:

1 Anthropic API key (Claude direct)
2 OpenAI API key (GPT direct)
3 Ollama (local model)
4 Skip — I use ptai through Claude Code (MCP server)
5 Skip — deterministic only, no AI

In this tutorial, Ollama local models were selected:

Choice [1/2/3/4/5]: 3

Pentest-AI then initializes the local AI workflow engine:

Using Ollama. Make sure it is running on http://localhost:11434.

The framework then begins launching the engagement:

Starting Engagement
pentest-ai v0.14.0
Target: https://target.com
Scope: full
Intensity: normal

The terminal output also shows the orchestration layer initializing:

agent_mode: 274 action handlers registered

This means the framework has loaded:

AI workflow handlers
probe orchestration logic
tool execution layers
engagement pipelines
scanning modules

Finally, the scan begins:

Scanning https://target.com...

At this stage, Pentest-AI starts coordinating:

reconnaissance
endpoint discovery
vulnerability probes
tool orchestration
workflow reasoning
attack-chain analysis

all using local LLM inference through Ollama without requiring any external cloud AI provider.

Important Note About Testing

For safety and legal reasons, scans should only be performed against:

systems you own
intentionally vulnerable labs
authorized environments
bug bounty targets within scope

Good testing environments include:

OWASP Juice Shop
DVWA
Metasploitable
WebGoat
PortSwigger Academy Labs

Using intentionally vulnerable applications is strongly recommended while learning AI-assisted offensive security workflows.

Running Pentest-AI Against OWASP Juice Shop

After configuring Ollama and Pentest-AI, the framework was tested against a local OWASP Juice Shop instance.

First, Juice Shop was launched locally using Docker:

docker run -d -p 3000:3000 bkimminich/juice-shop

Docker returned the container ID:

1b2a25f39d2bdf9249f22c710406243ea8443d289b44c458e25b01a24fe13b93

The application was then accessible locally at:

http://localhost:3000
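Juice Shop can take a few seconds to start inside the container, so it may help to wait until it answers before launching a scan. The helper below is a generic polling sketch with hypothetical function names. It is not part of Pentest-AI or Juice Shop.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage (assumes the container from the docker run command above):
#   ready = wait_until_ready(lambda: http_ok("http://localhost:3000"))
#   print("target is up" if ready else "target did not come up in time")
```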

Next, Pentest-AI was launched against the target:

ptai start http://localhost:3000

Pentest-AI immediately initialized the engagement workflow:

Starting Engagement
pentest-ai v0.14.0
Target: http://localhost:3000
Scope: full
Intensity: normal

The framework then loaded its orchestration engine:

agent_mode: 274 action handlers registered

This indicates that Pentest-AI successfully initialized:

AI workflow orchestration
scanning pipelines
probe handlers
attack-chain logic
external tool coordination
reporting systems

The scan then began:

Scanning http://localhost:3000...

At this stage, the framework starts:

endpoint discovery
route enumeration
vulnerability probing
fingerprinting
tool orchestration
workflow reasoning
attack-chain analysis

all powered locally through Ollama without requiring any external cloud-based AI provider.

What Makes This Interesting?

This setup demonstrates one of the most important aspects of Pentest-AI:

Fully local AI-assisted offensive security workflows.

In this environment:

The LLM runs locally through Ollama
The target application runs locally through Docker
Pentest-AI orchestrates scans locally
No cloud API keys are required
No external infrastructure is needed

This creates a completely self-hosted AI-powered AppSec lab environment directly on a local machine.

Important Observation

At this stage, Pentest-AI may not immediately display extensive findings if many external offensive tools are missing.

The framework relies heavily on external tools such as:

nmap
nuclei
ffuf
sqlmap
gobuster
and other tooling

so the quality of scans depends heavily on the installed security toolchain.
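A quick way to see which of these tools are missing from your PATH before a scan is a short check like the one below. It is an independent helper sketch and does not use any Pentest-AI API. The tool list simply mirrors the names mentioned above.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    wanted = ["nmap", "nuclei", "ffuf", "sqlmap", "gobuster"]
    missing = missing_tools(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```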

However, even with minimal tooling installed, the framework still demonstrates:

orchestration logic
local AI integration
probe execution
engagement workflows
MCP-style operational design

which is already extremely valuable for AI security research and experimentation.

Installing Security Tools

Pentest-AI wraps many external tools.

The framework provides several installation strategies.

Automatic Tool Installation

At engagement startup, Pentest-AI predicts which tools are needed.

Example:

ptai start https://target.com

It then prompts you to install missing tools.

Batch Installation Tiers

Install essential tools:

ptai setup --tier core

Recommended tools:

ptai setup --tier recommended

Full installation:

ptai setup --tier full

Per-Tool Installation

Install only specific tools:

ptai setup --per-tool wpscan,dalfox,paramspider

Interactive installer:

ptai setup --wizard

Example Workflow

A typical workflow may look like this:

pip install ptai
ptai setup --tier recommended
claude mcp add pentest-ai -- ptai mcp

Then inside Claude Code:

Run an authenticated OWASP Top 10 assessment against staging.example.com

The system may then:

Enumerate endpoints
Authenticate into the application
Run probes
Launch external tools
Correlate findings
Validate vulnerabilities
Generate reports

Playbooks

Pentest-AI supports YAML playbooks.

Playbooks encode reusable methodologies.

Example:

name: internal-ad-pentest

phases:
  - id: recon
    too

Conclusion

Pentest-AI represents one of the most interesting examples of AI-native cybersecurity tooling currently emerging in the open-source security ecosystem. Instead of relying purely on static scanners or template-based detection, the framework combines LLM orchestration, MCP integrations, deterministic probes, wrapped offensive security tools, and attack-chain reasoning into a single workflow engine.

What makes the project especially impressive is its flexibility. Pentest-AI can operate with cloud-based models like Claude or GPT-4, but it can also run entirely offline using local LLMs through Ollama, enabling fully self-hosted AI-powered AppSec labs without API costs or external dependencies.

While fully autonomous penetration testing still has significant limitations and does not replace experienced human security researchers, Pentest-AI offers a strong glimpse into the future of AI-assisted offensive security, automated AppSec workflows, and agentic cybersecurity systems.

For AppSec engineers, AI researchers, bug bounty hunters, and cybersecurity enthusiasts, Pentest-AI is absolutely worth exploring.