razashariff
We Scanned 27 AI Agent Frameworks Against OWASP Agentic AI Top 10 — Here Are the Results

AI agents are everywhere. CrewAI has 45K+ GitHub stars. AutoGPT has 182K+. LangChain sits at 100K+. But here is the question nobody seems to be asking: how secure are these frameworks?

OWASP released the Agentic AI Top 10 in 2025, identifying the most critical security risks in autonomous AI systems. We built a free scanner that checks agent code against all of them.

The results were not great.

The Numbers

We scanned 27 of the most popular agent frameworks and SDKs:

  • 9 FAIL (critical findings -- exec(), os.system(), no sandboxing)
  • 9 WARN (high-severity issues -- supply chain risks, prompt injection vectors)
  • 9 PASS (clean scan)
  • 31 total OWASP violations across all frameworks

The full registry with every framework, verdict, risk score, and OWASP mapping is live at registry.agentsign.dev.

What We Check

12 detection rules, each mapped to a specific OWASP Agentic AI risk:

| Rule | OWASP | Severity | What it catches |
| --- | --- | --- | --- |
| AS-001 | AA-03 | CRITICAL | Unsafe code execution (`exec`, `eval`, `os.system`) |
| AS-002 | AA-05 | HIGH | Hardcoded secrets and API keys |
| AS-003 | AA-04 | MEDIUM | Excessive permissions |
| AS-004 | AA-02 | HIGH | Prompt injection via file input |
| AS-005 | AA-02 | CRITICAL | Known injection patterns (SQL, XSS, command) |
| AS-006 | AA-09 | HIGH | Code execution without sandboxing |
| AS-007 | AA-06 | LOW | Supply chain without integrity checks |
| AS-008 | AA-01 | HIGH | Excessive agency / auto-approval |
| AS-009 | AA-07 | MEDIUM | Unsafe output handling (XSS via agent output) |
| AS-010 | AA-08 | MEDIUM | Insufficient logging/monitoring |
| AS-011 | AA-10 | HIGH | Data exfiltration patterns |
| AS-012 | MCP-07 | HIGH | MCP server without authentication |
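At their core, rules like AS-001 reduce to pattern matching over source text. Here is a minimal sketch of that idea in Python -- the patterns and field names mirror the scan response shown later in this post, but this is an illustration, not the actual AgentSign implementation:

```python
import re

# Hypothetical patterns in the spirit of AS-001 (unsafe code execution);
# the real AgentSign rule set may differ.
AS_001_PATTERNS = [
    r"\bexec\s*\(",
    r"\beval\s*\(",
    r"\bos\.system\s*\(",
]

def check_as_001(code: str) -> list[dict]:
    """Return one finding per dangerous pattern present in the code."""
    findings = []
    for pattern in AS_001_PATTERNS:
        if re.search(pattern, code):
            findings.append({
                "rule": "AS-001",
                "owasp": "AA-03",
                "severity": "CRITICAL",
                "detail": f"Dangerous code pattern: {pattern}",
            })
    return findings

print(check_as_001("exec(user_input)"))  # one CRITICAL finding
print(check_as_001("print('hello')"))    # []
```

Regex matching is deliberately blunt -- it catches the call whether or not it is reachable, which is exactly why a FAIL verdict means "go look," not "you are compromised."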

Notable Findings

Some of the most-starred projects have the most critical findings:

  • Open Interpreter (57K stars): Risk score 80/100. exec(), os.system(), child_process, no sandbox, excessive agency. This is a code agent that runs commands on your machine by design, but the scan flags that there are no isolation mechanisms.

  • AutoGPT (182K stars): Risk score 65/100. exec(), os.system(), no sandbox. The most-starred AI agent framework fails on unsafe code execution.

  • LangChain (100K stars): WARN verdict. Supply chain risks and prompt injection vectors. Not critical, but worth monitoring.

  • Anthropic SDK, Vercel AI SDK, Google ADK: All PASS with clean scans. These frameworks were designed with security constraints from the start.

How to Scan Your Own Agent

No signup. No API key. Three ways to use it:

1. GitHub Action (recommended)

Create .github/workflows/agentsign.yml:

```yaml
name: AgentSign Security Scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: razashariff/agentsign-action@v1
        with:
          path: '.'
          fail-on: 'FAIL'
```

Every push and PR gets scanned. FAIL blocks the merge. Outputs include verdict, risk score, and findings count.
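Since the action reports a verdict, risk score, and findings count, a follow-up step could surface them in the job log. The output names below (`verdict`, `risk_score`) are assumptions -- check the action's README for the real ones:

```yaml
# Hypothetical extension of the workflow above; output names are guesses.
      - uses: razashariff/agentsign-action@v1
        id: scan
        with:
          path: '.'
          fail-on: 'FAIL'
      - name: Report scan result
        if: always()  # print the result even when the scan step fails
        run: |
          echo "Verdict: ${{ steps.scan.outputs.verdict }}"
          echo "Risk score: ${{ steps.scan.outputs.risk_score }}"
```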

2. cURL

```bash
curl -X POST https://registry.agentsign.dev/api/scan \
  -H "Content-Type: application/json" \
  -d '{"code": "exec(user_input)", "name": "my-agent"}'
```

Returns:

```json
{
  "verdict": "FAIL",
  "risk_score": 40,
  "findings": [
    {
      "rule": "AS-001",
      "owasp": "AA-03",
      "severity": "CRITICAL",
      "detail": "Dangerous code patterns: exec()"
    }
  ]
}
```
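The same endpoint works from a script, which makes it easy to wire into a pre-push hook. A stdlib-only sketch (the endpoint URL and payload shape are taken from the cURL example above):

```python
import json
import sys
import urllib.request

API_URL = "https://registry.agentsign.dev/api/scan"

def build_request(code: str, name: str = "my-agent") -> urllib.request.Request:
    """Construct the POST request for the /api/scan endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"code": code, "name": name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scan(code: str, name: str = "my-agent") -> dict:
    with urllib.request.urlopen(build_request(code, name)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Usage: python scan.py path/to/agent.py
    result = scan(open(sys.argv[1]).read())
    print(f"{result['verdict']} (risk {result['risk_score']})")
    sys.exit(1 if result["verdict"] == "FAIL" else 0)
```

Exiting non-zero on FAIL means any shell hook or CI step that runs this script blocks on critical findings, mirroring the GitHub Action's `fail-on: 'FAIL'` behaviour.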

3. Shields.io Badge

Add a live security badge to your README:

```markdown
![AgentSign](https://img.shields.io/endpoint?url=https://registry.agentsign.dev/api/badge/YOUR-AGENT-NAME)
```

PASS = green, WARN = yellow, FAIL = red. Badge responses are cached for five minutes.

API Endpoints

All public, all free, rate-limited at 30 req/min:

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/scan | Scan code against the 12 OWASP-mapped rules (max 50 KB) |
| GET | /api/badge/:name | Shields.io-compatible badge endpoint |
| GET | /api/rules/version | Current rules version and count |
| GET | /api/registry | Full registry as JSON |
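The registry endpoint makes it straightforward to pull the full data set and slice it yourself, say, to list only the FAIL verdicts sorted by risk. The field names below (`name`, `verdict`, `risk_score`) mirror the scan response shown earlier; verify them against the actual `/api/registry` payload:

```python
import json
import urllib.request

REGISTRY_URL = "https://registry.agentsign.dev/api/registry"

def failing(entries: list[dict]) -> list[dict]:
    """Keep only FAIL-verdict entries, highest risk score first."""
    fails = [e for e in entries if e.get("verdict") == "FAIL"]
    return sorted(fails, key=lambda e: e.get("risk_score", 0), reverse=True)

if __name__ == "__main__":
    with urllib.request.urlopen(REGISTRY_URL) as resp:
        for entry in failing(json.load(resp)):
            print(f"{entry['name']}: {entry['risk_score']}/100")
```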

Why This Matters

The OWASP Agentic AI Top 10 exists because these are real attack vectors. Agents that call exec() without sandboxing can be hijacked through prompt injection. Agents with hardcoded secrets leak them. Agents without logging leave no audit trail.

As agents get more autonomous -- booking flights, writing code, managing infrastructure -- the blast radius of a compromised agent grows. Static analysis is not a silver bullet, but it is the minimum. If your agent framework fails basic pattern matching against known risks, that is worth knowing.
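To make the exec() risk concrete, here is a contrived illustration (not taken from any real framework) of why the scanner flags the pattern at all: an agent tool that passes model output to `exec()` runs whatever string reaches it, with the agent's full privileges.

```python
# Contrived example: an agent "tool" with no sandboxing or allowlisting.
def naive_agent_tool(llm_generated_code: str, scope: dict) -> None:
    # AS-001 / AA-03: bare exec() -- the exact pattern the scanner flags.
    exec(llm_generated_code, scope)

# A prompt-injected "answer" is indistinguishable from intended tool code:
payload = "import os; stolen = os.environ.get('API_KEY', '<none>')"
scope: dict = {}
naive_agent_tool(payload, scope)
print("attacker exfiltrated:", scope["stolen"])
```

A static scanner cannot tell whether the string is trusted; it can only tell you the trapdoor exists, which is the point of the FAIL verdict.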

Links

  • Live registry: registry.agentsign.dev
Feedback, issues, or want your framework rescanned? Open an issue or reach out at contact@agentsign.dev.
