Midori

🏰 Agentic Security: Building an Autonomous Pentesting Lab with Watchtower

As developers, we're building faster than ever. But security is often an afterthought, a manual chore we do right before a release.

What if you had a team of AI agents that could think like a pentester, run the tools for you, and analyze the results in real-time?

Meet Watchtower: an open-source, AI-powered framework that uses LangGraph to orchestrate a "Planner-Worker-Analyst" loop for web and network security.

The Problem with Traditional Scanners

Automated scanners like Nmap or Nuclei are great, but they are "dumb." They run a predefined sequence and dump a log file. You then have to:

1. Parse the logs.
2. Filter false positives.
3. Decide which tool to run next based on what you found.

Watchtower reverses this. The AI agents do the thinking, while the CLI tools do the heavy lifting.
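
To make the loop concrete, here is a minimal sketch in plain Python. It is not Watchtower's code and it does not use LangGraph: the `LoopState` class, the rule-based `planner`, and the `FAKE_TOOLS` table are all hypothetical stand-ins, with a simple rule where the real planner would call an LLM and lambdas where real CLI tools would run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class LoopState:
    """Shared state passed between the three roles (hypothetical shape)."""
    target: str
    findings: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)  # tools queued by the planner
    done: Set[str] = field(default_factory=set)       # tools that already ran


def planner(state: LoopState) -> None:
    """Queue the next tool based on what is known so far.

    A hard-coded rule stands in for the LLM call here.
    """
    if "port_scan" not in state.done:
        state.pending.append("port_scan")
    elif any("http" in f for f in state.findings) and "dir_fuzz" not in state.done:
        state.pending.append("dir_fuzz")


def worker(state: LoopState, tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run every queued tool and return the raw outputs."""
    outputs = []
    while state.pending:
        name = state.pending.pop(0)
        outputs.append(tools[name](state.target))
        state.done.add(name)
    return outputs


def analyst(state: LoopState, outputs: List[str]) -> None:
    """Turn raw outputs into findings (here: keep anything non-empty)."""
    state.findings.extend(o for o in outputs if o)


def run_loop(target: str, tools: Dict[str, Callable[[str], str]],
             max_rounds: int = 5) -> LoopState:
    """Repeat plan -> work -> analyze until the planner has nothing left."""
    state = LoopState(target)
    for _ in range(max_rounds):
        planner(state)
        if not state.pending:
            break
        analyst(state, worker(state, tools))
    return state


# Fake tools so the sketch runs without any scanner installed.
FAKE_TOOLS = {
    "port_scan": lambda target: f"{target}: 443/tcp open http",
    "dir_fuzz": lambda target: f"{target}: /admin found",
}
```

Calling `run_loop("example.test", FAKE_TOOLS)` runs the port scan first, and only queues the directory fuzzer because the first finding mentions HTTP. That "decide the next tool from the last result" step is the part a traditional scanner leaves to you.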

Real-World Scenario: The "Launch Week" Sprint

You've just built a new REST API for a fintech project. You're using FastAPI and PostgreSQL. It's Friday afternoon, and you're launching on Monday. You have no time for a professional audit, but you need to know if you've missed something obvious.

Here is how you use Watchtower to secure your API.

The "Batteries Included" Setup

Watchtower supports 23+ security tools out of the box. Running it in Docker ensures you don't have to spend hours installing binaries like ffuf or sqlmap.

docker-compose up -d
docker exec -it ai_pentest_framework /bin/bash

Attacking the Auth Layer

Many critical bugs live behind the login. Watchtower's newest update lets you pass a session cookie and custom headers on the command line, so the agents can test endpoints as an authenticated user.

python -m watchtower.main -t https://api.myproject.io \
  --cookie "session=SESS_ID_HERE" \
  --header "Authorization: Bearer MY_JWT_TOKEN"
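
If you're curious how flags like these could end up in every request, here is a rough sketch. It is an assumption about the general approach, not Watchtower's actual implementation: `parse_args` and `build_headers` are hypothetical helpers that mirror only the `-t`, `--cookie`, and `--header` flags shown above.

```python
import argparse
from typing import Dict, List, Optional


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse the three flags used in the command above (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", dest="target", required=True)
    parser.add_argument("--cookie")
    # action="append" lets --header be repeated for several headers.
    parser.add_argument("--header", action="append", default=[])
    return parser.parse_args(argv)


def build_headers(cookie: Optional[str], headers: List[str]) -> Dict[str, str]:
    """Merge 'Name: value' strings and an optional cookie into one dict."""
    result: Dict[str, str] = {}
    for raw in headers:
        name, sep, value = raw.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Name: value', got {raw!r}")
        result[name.strip()] = value.strip()
    if cookie:
        result["Cookie"] = cookie
    return result
```

A dict like this can then be handed to whatever HTTP client or tool wrapper makes the requests, so every probe carries the same session.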

Watching the Agents "Think"

Once started, the Planner agent analyzes your target. It's not just running tools; it's strategizing:

"I see a Server: uvicorn header. The target is likely a Python API. I will run arjun to find hidden parameters and sqlmap to test for injection concurrently."

Because of the new Parallel Recon feature, Watchtower doesn't wait. It fires off multiple tools at once, gathering data faster than a human could.
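
The general idea of running several tools at once can be sketched with Python's standard library. This is only an illustration of the technique, not how Parallel Recon is implemented: `run_tool`, `run_parallel`, and `DEMO_COMMANDS` are hypothetical names, and the demo commands just print text so the example runs without any scanner installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def run_tool(name: str, command: List[str], timeout: float = 300.0) -> Tuple[str, str]:
    """Run one command and return (name, stdout); empty output on failure."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        return name, proc.stdout
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return name, ""


def run_parallel(commands: Dict[str, List[str]], max_workers: int = 4) -> Dict[str, str]:
    """Start every command in its own thread and collect the outputs by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_tool, name, cmd) for name, cmd in commands.items()]
        return dict(f.result() for f in futures)


# Stand-ins for real scanner invocations; each one only prints a line.
DEMO_COMMANDS = {
    "params": [sys.executable, "-c", "print('found parameter: debug')"],
    "inject": [sys.executable, "-c", "print('no injection detected')"],
}
```

Threads are enough here because each worker spends its time waiting on an external process, so the total wall time is roughly that of the slowest tool instead of the sum of all of them.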

Smart Truncation: No more "Token Soup"

LLMs have context limits. If a scanner dumps 10,000 lines of logs, the model often loses track of the few lines that matter.

Watchtower uses Smart Truncation. It scans the logs for keywords like [+], Vulnerability, or Critical and only sends the high-value snippets to the Analyst agent. This cuts token costs and keeps the Analyst focused on the lines most likely to contain real findings.
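
Here is a small sketch of what keyword-based truncation can look like. The three keywords come from the description above; everything else is an assumption for illustration, including the `smart_truncate` name, the case-insensitive matching, the `context` window, and the `max_lines` cap. Watchtower's real logic may differ.

```python
from typing import Sequence

# Keywords named in the post; the matching rules below are assumptions.
KEYWORDS = ("[+]", "Vulnerability", "Critical")


def smart_truncate(log: str, keywords: Sequence[str] = KEYWORDS,
                   context: int = 1, max_lines: int = 200) -> str:
    """Keep only lines that match a keyword, plus `context` lines around each."""
    lines = log.splitlines()
    lowered = [k.lower() for k in keywords]
    keep = set()
    for i, line in enumerate(lines):
        text = line.lower()
        if any(k in text for k in lowered):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Preserve original order and cap the total size sent to the model.
    selected = sorted(keep)[:max_lines]
    return "\n".join(lines[i] for i in selected)
```

Keeping a line or two of context around each hit is a judgment call: it costs a few extra tokens but often includes the URL or parameter the finding refers to.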

The Results: A Human-Readable Report

At the end of the loop, you don't get a CSV filled with 404 errors. You get a Pentest Report that looks like this:

Wrap Up

Security doesn't have to be a bottleneck. By shifting to Agentic Security, we can automate the "thinking" part of red-teaming, allowing us to catch vulnerabilities before they hit production.

Check out Watchtower on GitHub: https://github.com/fzn0x/watchtower

Have you used AI for security testing yet? Let me know in the comments!