AI-Powered Penetration Testing: How I Used Claude + Kali Linux MCP to Automate Security Assessments

Hassan Aftab

Introduction: The Future of Offensive Security is Conversational

Picture this: Instead of juggling multiple terminal windows, memorizing command syntax, and manually piecing together scan results, you simply have a conversation with an AI that executes security tools, analyzes findings, and generates comprehensive reports—all in real time.

Sounds like science fiction? It's not. I just completed a full penetration test using Claude Desktop connected to a Kali Linux MCP (Model Context Protocol) server, and the experience has fundamentally changed how I think about security assessments.

In this article, I'll walk you through exactly how I set this up, what I discovered, and why this approach is a game-changer for DevSecOps professionals.

The Problem with Traditional Pen Testing

As security professionals, we've all been there:

  • Terminal juggling: Multiple SSH sessions, tmux panes, and terminal tabs
  • Command syntax hell: Was it nmap -sV -sC or -sC -sV? Do I need sudo?
  • Context switching: Running a scan, analyzing output, documenting findings, then moving to the next tool
  • Report fatigue: Hours spent formatting findings into readable reports
  • Knowledge gaps: Junior analysts missing critical steps in methodology

Don't get me wrong—traditional pen testing works. But it's slow, error-prone, and doesn't scale well for modern DevSecOps teams conducting continuous security assessments.

Enter AI-Assisted Security Testing

The idea is simple but powerful: What if we could have a conversational interface to our security tools?

Instead of this traditional workflow:

# Terminal 1: Port scanning
nmap -sV -sC -p 80,443 target.example.com -oN nmap_results.txt

# Terminal 2: Directory enumeration  
ffuf -u https://target.example.com/FUZZ -w wordlist.txt -mc 200,403

# Terminal 3: Header analysis
curl -I https://target.example.com

# Terminal 4: Take notes, start writing report...
vim findings.md

We could do this:

Me: "Run nmap on ports 80 and 443, then check for common vulnerabilities with other tools"

AI: *Executes scans, analyzes results, identifies issues*
    "I've completed the assessment. Found strong security headers but 
    discovered CSP using 'unsafe-inline'. Here's the full report..."

This isn't just about convenience—it's about fundamentally rethinking how we approach security testing.

The Technology Stack

Here's what I used to make this work:

1. Claude Desktop

The AI interface that understands security context and can reason about findings. Claude can:

  • Understand security terminology and concepts
  • Chain multiple tools together logically
  • Analyze output for actual vulnerabilities (not just running tools)
  • Generate professional reports with prioritized findings

2. Kali Linux MCP Server

A Model Context Protocol server that provides Claude with secure access to a Kali Linux environment. Think of it as a bridge between the AI and your security tools.

Key features:

  • Sandboxed execution environment
  • Support for all standard Kali tools
  • Background job management for long-running scans
  • Output parsing and structured results

3. Traditional Security Tools

The same tools we know and love:

  • nmap - Port scanning and service detection
  • ffuf - Directory and file enumeration
  • whatweb - Technology fingerprinting
  • curl - HTTP analysis
  • netcat - Network connectivity testing

The difference? Now they're orchestrated by AI.
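To make "orchestrated by AI" concrete, here's a minimal dry-run sketch of the kind of tool chain the orchestrator builds. This script is illustrative, not part of the actual MCP server: it only prints the commands it would execute, so it's safe to run without a target or authorization.

```shell
#!/usr/bin/env sh
# Illustrative dry-run orchestrator: prints the commands an AI-driven
# tool chain might execute instead of running them. Not the real MCP
# server -- just a sketch of the sequencing it automates.

TARGET="${1:-target.example.com}"

run() {
    # A real orchestrator would execute and capture output; we just echo.
    echo "would run: $*"
}

run nmap -sV -sC -p 80,443 "$TARGET"
run whatweb "https://$TARGET"
run curl -I "https://$TARGET"
```

Swap `echo` for actual execution (plus output capture and parsing) and you have the skeleton the AI drives conversationally.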

The Assessment: A Real-World Example

Let me walk you through an actual security assessment I performed using this setup. (All identifying information has been anonymized for obvious reasons.)

Phase 1: Reconnaissance

Me: "Let's run a quick nmap scan on ports 80 and 443"

Claude immediately executed:

nmap -sV -sC -p 80,443 target.example.com

Within seconds, I had results showing:

  • Port 80: Returns 426 Upgrade Required (forcing HTTPS ✅)
  • Port 443: nginx server, valid SSL certificate
  • Technology: HTML5 Single Page Application

But here's where it gets interesting—Claude didn't just dump the nmap output. It analyzed it:

"The application properly enforces HTTPS. Port 80 returns 426 Upgrade Required,
which is better than a simple 301 redirect. The SSL certificate is valid and
the server is running nginx. Let's check the security headers next."
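The "output parsing and structured results" the MCP server provides boils down to turning raw tool output into facts the AI can reason over. A minimal sketch of that step for nmap's normal output (the scan lines below are a fabricated sample, not real results):

```shell
# Pull open ports and detected services out of nmap's normal output.
# The sample below is fabricated for illustration.
nmap_output='80/tcp  open  http     nginx
443/tcp open  ssl/http nginx'

echo "$nmap_output" | awk '/open/ { printf "port %s -> %s\n", $1, $4 }'
# port 80/tcp -> nginx
# port 443/tcp -> nginx
```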

Phase 2: Security Header Analysis

Without me having to specify the exact curl command, Claude proactively ran:

curl -I https://target.example.com

And immediately identified the security posture:

✅ Strong Controls Found:

  • HSTS with preload (max-age=31536000)
  • X-Frame-Options: SAMEORIGIN
  • X-XSS-Protection: 1; mode=block
  • X-Content-Type-Options: nosniff
  • Secure cookies with HttpOnly and SameSite flags

⚠️ Concerns Identified:

  • Content-Security-Policy uses 'unsafe-inline'
  • Multiple trusted external domains in CSP

Claude didn't just list these—it explained the security implications of each finding.
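The header analysis itself is mechanical; the value is in the interpretation. A hedged sketch of the checking step, using a fabricated response that mirrors the findings above (the header list to require is an assumption, not an exhaustive baseline):

```shell
# Check a curl -I style response for key security headers and the CSP
# weakness. The response below is a fabricated sample for illustration.
headers=$(cat <<'EOF'
HTTP/2 200
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
content-security-policy: script-src 'self' 'unsafe-inline'
EOF
)

for h in strict-transport-security x-frame-options x-content-type-options; do
    if echo "$headers" | grep -qi "^$h:"; then
        echo "OK: $h present"
    else
        echo "MISSING: $h"
    fi
done

# Flag the CSP weakness Claude called out
if echo "$headers" | grep -qi "unsafe-inline"; then
    echo "WARN: CSP allows 'unsafe-inline'"
fi
```

In a live run you'd replace the heredoc with `headers=$(curl -sI https://target.example.com)`.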

Phase 3: Directory Enumeration

Me: "Let's try ffuf for directory enumeration"

Claude executed:

ffuf -u https://target.example.com/FUZZ \
     -w /usr/share/seclists/Discovery/Web-Content/common.txt \
     -mc 200,204,301,302,307,401,403,405 \
     -fc 404 -t 40

Results: 4,750 paths tested in ~25 seconds at 190 req/sec

The interesting part? Claude automatically analyzed the results:

  • 4,674 paths returned 200 with identical content (typical SPA behavior)
  • 19 paths returned 403 Forbidden (sensitive files properly blocked)
  • 2 paths returned 301 redirects (potential misconfiguration!)
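That triage (thousands of results collapsed into three buckets) is a simple status-code tally. A sketch of the idea, using fabricated "path status" pairs rather than ffuf's real output format (real runs can emit CSV or JSON via `-of`, whose exact columns differ):

```shell
# Tally fuzzing results by status code. The sample lines stand in for
# ffuf output reduced to "path status" pairs; they are fabricated.
results='assets 301
media 301
.git 403
.env 403
admin 200'

echo "$results" | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' | sort
# 200 1
# 301 2
# 403 2
```

On a real SPA you'd also group by content length, since thousands of 200s with identical bodies (like the 4,674 above) are one finding, not thousands.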

Phase 4: Deep Dive on Findings

Here's where AI really shines. Claude noticed something interesting in the redirects:

/assets → http://target.example.com:4200/assets/
/media  → http://target.example.com:4200/media/

Claude's analysis:

"These redirects reference port 4200, which is the default port for Angular
development servers. This suggests either:

  1. A development server is running in production
  2. Nginx is misconfigured
  3. Assets aren't being served properly

Let me test if port 4200 is accessible..."

Claude then automatically ran netcat to test:

nc -zv -w 5 target.example.com 4200

Result: Port 4200 is NOT accessible externally (good!), but the configuration issue remains.

This kind of contextual reasoning—connecting dots between different findings—is where AI really excels.

Phase 5: Comprehensive Reporting

At the end, I simply said: "Generate a comprehensive security report"

Claude produced:

  • ✅ Executive summary with risk rating
  • ✅ Detailed findings with CVSS-style severity ratings
  • ✅ Technical details for each vulnerability
  • ✅ Prioritized remediation recommendations
  • ✅ Compliance mapping (OWASP Top 10)
  • ✅ Evidence with exact commands and output

Total time from start to final report: ~15 minutes

Traditional approach: Would have taken 2-3 hours

Real Findings (Anonymized)

Here's what the assessment uncovered:

🟢 Strong Security Controls (Good News)

  1. HTTPS Enforcement: Perfect implementation with 426 status code
  2. Security Headers: Comprehensive set of modern security headers
  3. File Access Controls: All sensitive files (.git, .env, .svn) properly blocked with 403
  4. Cookie Security: HttpOnly, Secure, and SameSite flags properly set
  5. SSL/TLS: Valid certificate, HTTP/2 enabled

🟡 Medium Priority Issues

  1. CSP 'unsafe-inline'

    • Both script-src and style-src allow inline scripts
    • Reduces XSS protection effectiveness
    • Recommendation: Remove 'unsafe-inline', use nonce-only approach
  2. Port 4200 References

    • Redirects expose internal development port
    • Suggests nginx misconfiguration
    • Recommendation: Fix asset serving configuration
  3. Development Environment Exposure

    • Domain clearly marked as "dev"
    • robots.txt confirms staging environment
    • Recommendation: Implement IP whitelisting or VPN access
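Both of the top two issues are nginx-side fixes. As a hedged illustration only (the paths and directive values are placeholders, and a real nonce-based CSP needs the application to generate a fresh nonce per response, which a static header can't do), the remediation might look like:

```nginx
# Finding 2: serve /assets from disk instead of redirecting to the
# Angular dev server on port 4200. The alias path is a placeholder.
location /assets/ {
    alias /var/www/app/dist/assets/;
}

# Finding 1: drop 'unsafe-inline' from the CSP. Shown without nonces
# for brevity; nonce-based CSP requires per-response nonce generation.
add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self'" always;
```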

🟢 Low Priority Observations

  1. Broad CSP Domain Trust: Multiple Azure services in allow-list
  2. Server Header Exposure: nginx version visible
  3. Certificate Expiration: Valid for 2 more months

Overall Risk Rating: LOW-MEDIUM

The application has solid security fundamentals with room for CSP hardening and access control improvements.

The Real Value: Beyond Tool Execution

Here's what makes this approach truly powerful—it's not just about running tools faster. It's about:

1. Intelligent Analysis

Claude doesn't just execute commands; it understands security concepts:

  • Recognizes what 'unsafe-inline' means for CSP
  • Knows that port 4200 is an Angular dev server
  • Understands the relationship between findings
  • Prioritizes issues based on actual risk

2. Contextual Reasoning

When Claude found the port 4200 reference, it didn't stop there:

  • Tested if the port was accessible
  • Explained what port 4200 typically indicates
  • Suggested multiple potential causes
  • Recommended specific fixes

3. Adaptive Methodology

The assessment flow was dynamic:

  • Started with broad reconnaissance
  • Dove deeper based on findings
  • Connected related issues
  • Adjusted scan parameters based on results

4. Knowledge Transfer

Every step was explained:

  • Why each tool was chosen
  • What the output means
  • How findings relate to security principles
  • What the business impact is

This makes it perfect for training junior security analysts.

Practical Applications

This approach works incredibly well for:

1. Continuous Security Testing

Integrate AI-assisted scanning into CI/CD pipelines:

# In your CI/CD pipeline
- name: Security Scan
  run: |
    claude-security-scan --target $STAGING_URL \
                         --output security-report.md
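The `claude-security-scan` command above is hypothetical. Whatever actually produces the report, the pipeline still needs a gate that fails the build on serious findings. A sketch, assuming the report marks findings with lines like `Severity: High` (adjust the pattern to whatever your report format really emits):

```shell
# Fail the build if the generated report contains high-severity findings.
# The report below is a fabricated sample; in CI you would read the file
# the scan step wrote, e.g. report=$(cat security-report.md).
report=$(cat <<'EOF'
Finding: CSP allows unsafe-inline
Severity: Medium
Finding: Dev port referenced in redirects
Severity: Low
EOF
)

if echo "$report" | grep -qiE '^Severity: (High|Critical)'; then
    echo "Blocking: high-severity findings present"
    exit 1
else
    echo "Gate passed: no high-severity findings"
fi
```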

2. Compliance Audits

"Check this application against OWASP Top 10 and generate a compliance report"

3. Security Training

Junior analysts can learn by watching Claude's methodology:

  • Which tools to use when
  • How to interpret results
  • What findings matter most
  • How to communicate risk

4. Bug Bounty Hunting

Accelerate reconnaissance phase:

  • Quick subdomain enumeration
  • Technology fingerprinting
  • Common vulnerability checks
  • Automated documentation

5. Red Team Exercises

Chain complex attack scenarios:
"Enumerate subdomains, identify web applications, scan for vulnerabilities,
and generate target priority list"

The Limitations (Let's Be Honest)

This approach isn't perfect. Here's what it doesn't do:

❌ Complex Exploitation

Claude can identify vulnerabilities but won't automatically exploit them. Exploiting SQLi, XSS, and RCE still requires human expertise.

❌ Social Engineering

No AI assistance for phishing, pretexting, or physical security testing.

❌ Zero-Day Discovery

This accelerates known vulnerability scanning, not novel vulnerability research.

❌ Replace Critical Thinking

AI amplifies human skills; it doesn't replace security expertise and judgment.

❌ Handle Authentication

Complex authenticated scanning still requires manual session management.

Ethical Considerations

Let me be crystal clear: Always get explicit written authorization before security testing.

This technology makes scanning easier, which also means it's easier to accidentally (or intentionally) test unauthorized systems.

Golden rules:

  1. ✅ Get written permission before ANY security testing
  2. ✅ Stay within authorized scope
  3. ✅ Document everything
  4. ✅ Report findings responsibly
  5. ✅ Anonymize data when sharing publicly
  6. ❌ Never test production systems without approval
  7. ❌ Never share sensitive findings publicly

Unauthorized security testing is illegal in most jurisdictions. Don't be that person.

Setting It Up Yourself

Want to try this? Here's how to get started:

Prerequisites

  • Claude Desktop (or API access)
  • Docker (for Kali Linux container)
  • Basic understanding of security tools
  • Authorization for a test environment

Quick Start

# 1. Clone the Kali MCP server
git clone https://github.com/[kali-mcp-server]

# 2. Build the Docker container
cd kali-mcp-server
docker build -t kali-mcp .

# 3. Run the server
docker run -d -p 3000:3000 kali-mcp

# 4. Configure Claude Desktop
# Add MCP server configuration to settings

# 5. Start testing!
# Open Claude Desktop and start conversing
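Step 4 is where most people get stuck. In current Claude Desktop builds, MCP servers are registered in `claude_desktop_config.json` under an `mcpServers` key; a minimal sketch for a stdio-style server might look like the following (the server name and docker arguments are placeholders; check your MCP server's README for the exact invocation):

```json
{
  "mcpServers": {
    "kali": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "kali-mcp"]
    }
  }
}
```

Restart Claude Desktop after editing the config so the server is picked up.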

(Note: URLs anonymized for security. Search for "Kali MCP Server" or "MCP penetration testing" for actual repositories)

The Future of Security Testing

This is just the beginning. Here's where I see this going:

Short Term (Now - 6 months)

  • Integration with more specialized tools (Burp Suite, Metasploit)
  • Automated exploit validation
  • Real-time vulnerability database lookups
  • Custom security workflow automation

Medium Term (6-18 months)

  • AI-assisted exploit development
  • Automated threat modeling
  • Intelligent false positive filtering
  • Natural language security policies

Long Term (18+ months)

  • Autonomous security testing agents
  • AI-powered red team exercises
  • Predictive vulnerability analysis
  • Self-healing security systems

My Take: Augmentation, Not Replacement

Here's the bottom line: AI won't replace security professionals.

But security professionals who use AI will replace those who don't.

This technology handles the tedious parts:

  • Tool execution
  • Output parsing
  • Report generation
  • Documentation

While we focus on the parts that require human expertise:

  • Critical thinking
  • Exploit development
  • Business context
  • Strategic recommendations
  • Client communication

The future of offensive security is collaborative—humans and AI working together.

Conclusion: The Paradigm Shift

Going from traditional pen testing to AI-assisted security assessment feels like going from punch cards to a modern IDE. The fundamental skills are the same, but the experience is night and day.

What took 3 hours now takes 15 minutes.

What required deep tool knowledge now requires clear communication.

What was tedious documentation is now automatic.

This isn't just about making security testing easier (though it does that). It's about making it better, faster, and more consistent.

If you're in DevSecOps, offensive security, or security research, I highly recommend exploring AI-assisted workflows. Start small—automate one part of your process—and expand from there.

The tools are ready. The technology works. The only question is: Are you ready to adapt?


Resources & Further Reading

Tools Mentioned:

  • Claude Desktop / Claude API
  • Kali Linux
  • nmap, ffuf, whatweb, curl
  • Model Context Protocol (MCP)

Learning Resources:

  • OWASP Testing Guide
  • Model Context Protocol Documentation
  • Kali Linux Documentation
  • AI Safety in Security Testing

Communities:

  • r/netsec
  • HackerOne Community
  • AI Security Research Groups

About This Article

This assessment was performed on an authorized test environment with explicit permission. All identifying information has been anonymized. The findings and methodology shared here are for educational purposes.

Questions? Comments? Drop them below or connect with me on LinkedIn.

Found this useful? Share it with your security team and help spread knowledge about AI-assisted security testing.

#cybersecurity #ai #security #devops #testing #automation #tutorial #linux #webdev #cloudcomputing