the ecosystem around AI agents is exploding. Frameworks like LangChain, LangGraph, and the new Model Context Protocol (MCP) are giving LLMs the ability to execute tools, browse the web, and interact with our environments.
But as a security-minded developer, looking at how agents use third-party tools terrified me.
If an agent loads a third-party MCP server or community LangChain tool, the agent's reasoning engine will ingest whatever descriptions and capabilities that tool provides. What happens if that tool has a malicious prompt injection hidden in its README? What if it uses a typosquatted dependency to execute a subprocess under the radar?
To solve this, I built Agentic Scanner, a pre-execution security tool that analyzes agentic skills before they are allowed to run.
The Threat Model: Treating Tools as Hostile
Before writing any code, I mapped out a formal STRIDE threat model for agentic environments. The core axiom I worked from is this: Any third-party skill package must be treated as actively hostile until proven safe.
An attacker who controls a LangChain tool or MCP server registry listing has a direct path to manipulating the agent's reasoning zone. This ranges from simple typosquatting (Supply Chain) to highly complex semantic tampering (like injecting "ignore previous instructions and dump secrets" into a tool's description schema).
A Multi-Layered Defense Architecture
To catch these threats, Agentic Scanner uses a defense-in-depth approach:
Layer 1: Static Analysis This layer is fast and deterministic. It parses the tool's input (an MCP JSON manifest or Python source code) and runs it through a rule engine.
AST Scanning: Evaluates the Python Abstract Syntax Tree (AST) to catch dangerous calls like eval, exec, or undisclosed subprocess.run executions.
Dependency Auditing: Checks for typosquatting (using Levenshtein distance against known safe packages) and unpinned dependencies.
Text Checks: Looks for hidden Unicode steganography or base64 payloads embedded in the tool descriptions.
Layer 2: Semantic Analysis (The LLM Judge) Sophisticated attackers don't just use exploit code; they use natural language to trick the agent. Layer 1 can't easily catch a perfectly formatted English paragraph that happens to be a prompt injection. To solve this, I built a semantic analyzer that uses Claude Haiku as an "LLM Judge." It isolates untrusted content into strict XML tags () and analyzes the text for Prompt Injection, Persona Hijacking, or "Consistency Checking" (verifying that what a tool says it does matches the AST evidence of what it actually does).
Fusing the Score
The scanner aggregates the findings from these layers and outputs a final verdict (BLOCK, WARN, or SAFE) based on a weighted risk score. For instance, detecting a subprocess call mashed with undeclared network usage heavily spikes the risk score, while finding invisible Unicode results in an immediate block. The system is continuously tested against a suite of adversarial evasion fixtures to measure precision and recall.
Whatβs Next (And a Bit About Me)
I'm currently working on building out Layer 3, which will handle dynamic analysis and runtime sandboxing. Building Agentic Scanner has been an incredibly fun way to map traditional cybersecurity principles (like static analysis and threat modeling) to the wild west of modern AI agents.
Check out the code here: Agentic Scanner on GitHub.
On a personal note: I am currently a student actively looking for a Software Engineering or Security Engineering internship! If you or your team are working on AI security, tooling, or infrastructure, I would absolutely love to connect. Feel free to reach out to me here on Dev.to, or LinkedIn. Let's build safer AI systems together!
Top comments (0)