David McHale
Giving an AI agent a recon toolbox: wiring 30+ security tools into an MCP server

If you've watched a junior pen-tester spend a Monday morning typing the same
six commands into a fresh EC2 box, you've seen the recon setup tax up close.
amass enum -passive -d $TARGET, subfinder -d $TARGET -silent, pipe to
httpx, pipe to naabu, feed surviving hosts into nuclei, dump JSON
somewhere, repeat next quarter when the scope changes.

The work isn't hard. The glue is. Every team I've talked to has rebuilt this
glue at least twice, usually in a different language each time.
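
For concreteness, the glue usually ends up looking something like this — a
minimal Python sketch of the pipeline above. The flags are the common ones
for current Amass and ProjectDiscovery releases; check them against your
installed versions.

import subprocess

TARGET = "example.com"

def run(cmd, stdin_text=None):
    """Run a CLI tool and return its non-empty stdout lines."""
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True,
                          text=True, check=True)
    return [line for line in proc.stdout.splitlines() if line.strip()]

# Passive subdomain enumeration from two sources, deduplicated
subdomains = set(run(["amass", "enum", "-passive", "-d", TARGET]))
subdomains |= set(run(["subfinder", "-d", TARGET, "-silent"]))
hosts = "\n".join(sorted(subdomains))

# httpx probes for live web services; naabu port-scans the same list
live = run(["httpx", "-silent"], stdin_text=hosts)
open_ports = run(["naabu", "-silent"], stdin_text=hosts)

# Feed surviving hosts into nuclei, dump JSON somewhere
findings = run(["nuclei", "-jsonl"], stdin_text="\n".join(live))
with open("findings.jsonl", "w") as fh:
    fh.write("\n".join(findings))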

This post is about a different shape of the problem: what happens when you
stop writing the glue yourself and instead expose the recon toolbox as
MCP tools that an AI agent can call?

Why MCP, specifically

Agents have been doing "tool use" for a couple of years now via bespoke
function-calling adapters. The problem with those adapters is that every
agent framework wants its own JSON shape, every tool needs its own auth, and
every team writes its own retry/timeout/rate-limit middleware.

MCP (Model Context Protocol) collapses
all of that into one server-side contract. Once your tools are MCP tools,
any compliant client — Claude Desktop, Cursor, your own LangGraph agent —
can drive them.
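
To make "one server-side contract" concrete: with the official Python SDK's
FastMCP helper, wrapping a recon tool takes a few lines. This is a minimal
sketch, not HailBytes' implementation — the subfinder call stands in for
whatever your tool actually does.

from mcp.server.fastmcp import FastMCP
import subprocess

mcp = FastMCP("recon")

@mcp.tool()
def start_subdomain_scan(domain: str) -> list[str]:
    """Enumerate subdomains for a domain and return them as a list."""
    proc = subprocess.run(
        ["subfinder", "-d", domain, "-silent"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.splitlines()

if __name__ == "__main__":
    mcp.run()  # serves over stdio; any MCP client can now call the tool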

For recon, the value is asymmetric. Recon is one of the rare security
workflows that are iterative and branching:

enumerate subdomains → resolve → port-scan live hosts →
fingerprint services → run targeted vuln checks → pivot to new assets →
loop

That loop is exactly the shape an LLM is good at orchestrating, provided
the tools return structured data and the agent can hold the inventory in
state. You don't want the LLM running nmap. You want it deciding when
to run nmap and on what.
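
In sketch form, that division of labor is a plain loop. Here
llm_choose_next_action and mcp_client are hypothetical stand-ins for your
model call and MCP client; the structure is what matters.

# Hypothetical planner loop: the model decides, the MCP server executes.
state = {"domain": "example.com", "history": []}

while True:
    # llm_choose_next_action is a stand-in for your model call: it sees the
    # structured state and returns (tool_name, args), or None when done.
    action = llm_choose_next_action(state)
    if action is None:
        break
    tool_name, args = action
    result = mcp_client.call_tool(tool_name, args)  # stand-in MCP client call
    state["history"].append({"tool": tool_name, "args": args, "result": result})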

What we wrapped

In HailBytes ASM (full disclosure, this is our product — built specifically
for pen-test firms and MSSPs), the MCP server exposes the same surface as
the REST API:

  • Discovery: start_subdomain_scan, start_port_scan, start_dns_scan
  • Vulnerability: start_nuclei_scan, start_template_scan
  • Inventory: list_assets, get_asset_history, diff_scans
  • Reporting: export_findings, get_scan_summary

Each tool returns JSON with stable schemas — not log scrapes — so the agent
can plan multi-step workflows without the model having to parse stderr.
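
For illustration, a stable asset record might look like the shape below.
The field names are hypothetical, not the platform's published schema; the
point is that every field is typed and predictable.

from typing import TypedDict

class Asset(TypedDict):
    # Hypothetical shape for illustration -- not the published schema.
    asset_id: str
    hostname: str
    ip: str
    open_ports: list[int]
    first_seen: str      # ISO 8601, lets the agent reason about "appeared 6 days ago"
    source_scan_id: str  # explicit provenance, no implicit "latest scan"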

A working loop

A real session looks like this (paraphrased from one of our internal eval
runs):

User: "Map the external attack surface for example.com and flag anything
       that looks like an exposed staging environment."

Agent → start_subdomain_scan(domain="example.com")
Agent → list_assets(scan_id=...)  // 312 hosts
Agent → start_port_scan(targets=[...], top_ports=1000)
Agent → start_nuclei_scan(targets=live_hosts, severity=["medium","high"])
Agent → list_assets(filter="hostname matches /staging|stg|dev|qa/")
Agent → get_asset_history(asset_id=...)  // appeared 6 days ago
Agent → "Found 4 hosts matching staging-like patterns; one
        (stg-admin.example.com) appeared 6 days ago and exposes a Jenkins
        instance with a known CVE..."

The interesting part is what the agent doesn't do: it doesn't shell out,
doesn't manage AWS credentials, doesn't worry about rate limits, doesn't
re-implement scan diffing. The MCP tools take care of all of that. The
agent's job is the part that's actually hard — choosing the next action.

What broke (and what we changed)

A few honest notes from running this at customer sites:

  1. Pagination kills agents. Our first cut returned all assets in a single response. With 30k+ subdomains in a real engagement, the agent's context filled up before it got to the analysis step. We added cursor pagination and a summarize_assets tool that returns aggregates.
  2. Implicit state is hostile. Agents are bad at remembering "the most recent scan." Every tool that takes a scan_id now requires it explicitly, even if there's only ever one running.
  3. Long-running scans need a status protocol. Recon scans take minutes to hours. We added wait_for_scan(scan_id, timeout) so the agent can block politely instead of polling in a tight loop (a minimal sketch follows this list).
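
A minimal sketch of that third fix, reusing the FastMCP instance from the
earlier example. get_scan_status is an illustrative internal lookup, not
the shipped API.

import time

@mcp.tool()
def wait_for_scan(scan_id: str, timeout: int = 600) -> dict:
    """Block until the scan finishes or the timeout elapses, then return status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_scan_status(scan_id)  # illustrative internal lookup
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(5)  # the poll loop lives server-side; the agent makes one call
    return {"state": "timeout", "scan_id": scan_id}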

Where this fits

If you're already running recon in-house, you don't need to buy anything to
try this pattern — wrap your own scripts in an MCP server and you'll get
70% of the value. The harder parts are the things that show up at
production scale: scan diffing, asset deduplication across runs,
multi-tenant isolation, scheduled cadence, audit trails for compliance. That's
the part we've spent the last year on.

If you want to see it end-to-end, the platform is at
hailbytes.com/asm — deploys from the AWS or
Azure Marketplace, runs in your account, and exposes the MCP endpoint out
of the box.

Either way, I think MCP-native security tooling is going to be the
default within 18 months. The gap between "agent can read a Splunk
dashboard" and "agent can drive a recon engagement" is closing fast, and
the teams that wire their own toolbox up early are going to have a real
edge.
