vinmay

Posted on Mar 21 • Edited on Mar 23

I built "npm audit" for AI agents

#security #mcp #python #agents

I was adding MCP tools to a project when I realized something uncomfortable: I had no idea what the code I was installing could actually do.

The README said "connects Claude to Blender." What it didn't say was that one of the registered tools passes a raw string parameter to Python's exec() with no builtin restriction. The LLM doesn't get "Blender API access." It gets full Python execution on the host machine.

I wanted a way to know this before running the code. So I built one.

What reachscan does

reachscan is a static analysis CLI for Python and TypeScript/JavaScript AI agent codebases. Point it at a repo, a PyPI package, or an MCP endpoint, and it tells you:

What the code can do (shell exec, file access, network calls, credential access, dynamic code execution)
Which of those capabilities the LLM can actually trigger (reachability analysis)
The exact call path from the LLM entry point to the dangerous code

pip install reachscan

# Scan a GitHub repo
reachscan https://github.com/user/repo

# Scan a PyPI package before installing
reachscan pypi:some-agent-package

# Scan local code
reachscan ./my-agent

That's it. No config, no API keys, no cloud service. It runs offline and produces a report in about 2 seconds.

The problem

When you give an LLM tools, you're granting it real-world capabilities like file access, shell commands, network calls, credential reads. Most frameworks make it easy to add tools and hard to audit what you've exposed.

Here's real code from a popular MCP server:

@mcp.tool()
def execute_blender_code(ctx: Context, code: str) -> str:
    """Execute arbitrary Python code in Blender."""
    blender = get_blender_connection()
    result = blender.send_command("execute_code", {"code": code})

That code: str parameter? It ends up here:

exec(code, {"bpy": bpy})  # No __builtins__ restriction

namespace = {"bpy": bpy} looks like a sandbox. It isn't. Without explicitly setting __builtins__, Python injects the full builtins module. The LLM can import os, run subprocess, read your files — anything.

Here's what reachscan shows for this server:

  DYNAMIC  exec()                  server.py:431  reachable
           path: execute_blender_code → send_command → execute_code

  EXECUTE  subprocess.run()        addon.py:89    reachable

  SEND     requests.post()         server.py:198  reachable
           path: generate_3d_model → _call_api

  SECRETS  os.environ[...]         server.py:12   module_level

The reachable tag is the key part. It means the LLM can trigger this code through a registered tool and not just that the code exists somewhere in the repo. module_level means it runs on import. unreachable means the code exists but no LLM call path leads to it.

How it works (briefly)

Detectors scan the AST for 7 capability categories: EXECUTE, READ, WRITE, SEND, SECRETS, DYNAMIC, AUTONOMY
Entry point detection finds the functions exposed to the LLM — @tool, @mcp.tool(), @function_tool, BaseTool subclasses, etc. across LangChain, OpenAI Agents SDK, MCP, Pydantic AI, CrewAI, Semantic Kernel, and AutoGen
Call graph + BFS traces up to 8 hops from each entry point to determine which capabilities are actually reachable
Every finding gets one of 5 states: reachable, unreachable, module_level, unknown, no_entry_points

The false positive rate is 0.47% across 1,912 labeled findings on 10 real-world repos. I care about this number a lot because a noisy scanner is a useless scanner.

Why I built it

The short version: I was evaluating third-party MCP servers and realized there was no npm audit equivalent for AI agent code. I could run pip audit to check for known vulnerabilities in dependencies, but nothing told me "this package gives the LLM shell access on your machine."

The existing tools I found either:

Require API calls per scan (expensive, not offline)
Produce flat capability lists without reachability context (noisy)
Don't handle the MCP/agent-specific entry point patterns

So I built the tool I wanted.

What it found across 50 real MCP servers

I ran reachscan against 50 of the most popular MCP server repos:

1 in 3 has shell execution capability
1 in 3 has outbound network I/O
1 in 4 accesses credentials from environment variables
10 of 50 had 4+ capabilities active simultaneously

The highest-risk combination: credential access + network egress. That appeared in 8 of 50 repos. If the LLM can read your AWS keys AND make HTTP calls, that's an exfiltration path.

Not all of these are bugs. An AWS MCP server should talk to AWS. The question is whether the LLM can misuse those capabilities — and whether you know about them before you deploy.

Try it

pip install reachscan

# Scan any GitHub repo
reachscan https://github.com/ahujasid/blender-mcp

# Scan a PyPI package before installing
reachscan pypi:openai-agents

# JSON output for CI
reachscan . --json --severity high

Apache 2.0, pure Python, runs offline. No API keys, no cloud service.

If something looks wrong — false positive, missed pattern, bad output — open an issue.

GitHub: vinmay/reachscan
PyPI: reachscan
Full scan results (50 repos): Medium writeup

DEV Community