DEV Community

Ali Cheaib
Ali Cheaib

Posted on

We Built an Open-Source Framework to Run All 42 OWASP AI Security Tests . Here's What We Found

AI security testing is no longer optional. The EU AI Act deadline is August 2, 2026. OWASP published the Agentic AI Top 10 in December 2025. And the most popular open-source LLM testing tool just got acquired by OpenAI.

We needed a vendor-neutral alternative. So we built Tessera — an open-source framework that runs 42 automated OWASP security tests against any AI model or agent.

The Problem

The AI security tool landscape is fragmented:

  • Garak: LLM probes only — no CV, no infrastructure, no data governance, no agentic AI
  • Promptfoo: Now OpenAI-owned — not vendor-neutral for testing OpenAI models
  • HiddenLayer / Protect AI: Proprietary SaaS — not self-hosted, not extensible

None of them cover the full OWASP attack surface. None of them test agentic AI systems. None of them generate EU AI Act compliance reports.

What Tessera Does

42 automated security tests across 5 OWASP categories:

Category Tests What It Covers
MOD — Model Security 7 Adversarial attacks, poisoning, model inversion, alignment
APP — Application Security 14 Prompt injection, hallucination, bias, toxic output, extraction
INF — Infrastructure 6 Supply chain, API security, resource exhaustion, GPU isolation
DAT — Data Governance 5 PII leakage, consent, right to erasure, data minimization
AGT — Agentic AI Security 10 Goal hijacking, tool misuse, rogue agents, cascading failures

Every test follows a 3-phase methodology: Attack (simulate the threat) → Measure (quantify with threshold-based scoring) → Defend (validate mitigations).

The Agentic AI Tests

This is where it gets interesting. The OWASP Top 10 for Agentic Applications (ASI 2026) defines 10 risks specific to AI agents — systems that use tools, make decisions, and operate autonomously. Nobody had a complete implementation. Until now.

Test What It Does
AGT-01 Agent Supply Chain Tests for malicious tool injection and dependency tampering
AGT-02 Tool Misuse Unauthorized tool invocation and parameter manipulation
AGT-03 Goal Hijacking Objective manipulation and task redirection attacks
AGT-04 Memory Poisoning Context window injection and state manipulation
AGT-05 Identity & Privilege Abuse Identity spoofing and privilege escalation
AGT-06 Code Execution Code injection and sandbox escape attempts
AGT-07 Inter-Agent Comms Message tampering and replay attacks
AGT-08 Cascading Failures Error amplification and retry storms
AGT-09 Trust Exploitation False urgency and authority impersonation
AGT-10 Rogue Agents Covert goals and self-replication detection

Quick Start

pip install tessera-ai
tessera --init
Enter fullscreen mode Exit fullscreen mode

The --init wizard auto-detects your AI providers (OpenAI, Anthropic, Ollama, vLLM) and gets you scanning in under 60 seconds.

Scan an MCP Server

tessera --scan-mcp https://your-mcp-server.com/v1 --api-key $KEY
Enter fullscreen mode Exit fullscreen mode

Generate EU AI Act Compliance Report

tessera --config config.yaml --format compliance
Enter fullscreen mode Exit fullscreen mode

Maps all 42 tests to specific EU AI Act articles (9, 10, 13, 14, 15).

Benchmark Results

We tested the top 5 AI models against all applicable OWASP tests:

Model Score PASS WARN FAIL
Anthropic Sonnet 3.5 100% 15 0 0
GPT-4o 87% 11 4 0
Gemini 1.5 Pro 87% 11 4 0
Mistral Large 73% 8 7 0
Llama 3 70B 40% 4 8 3

Architecture

Tessera is more than a CLI tool. It's a full platform:

  • CLI: Zero infrastructure, pip install and go
  • API Server: FastAPI with WebSocket scan progress
  • Web Dashboard: React 18 + TypeScript + TailwindCSS
  • Workers: Celery + Redis for async scans
  • Database: PostgreSQL with Alembic migrations
  • Kubernetes: Helm chart with HPA
  • 14 Connectors: OpenAI, Anthropic, Google, Ollama, vLLM, AWS Bedrock, Azure, HuggingFace, MCP, and more

Why Open Source Matters Here

If you're auditing OpenAI models with an OpenAI-owned tool, that's not independent security testing. AI security testing needs to be:

  1. Vendor-neutral — not owned by a model provider
  2. Self-hosted — your security data stays on your infrastructure
  3. Extensible — you can add tests for your specific use case
  4. Transparent — you can audit the testing methodology itself

Tessera is Apache 2.0. No call-home. No vendor lock-in. No DRM.

What's Next

  • SARIF output for GitHub/GitLab Security tab integration
  • RAG pipeline testing (retriever poisoning, context window attacks)
  • Multimodal model support
  • Plugin architecture for community-contributed tests

Try It

pip install tessera-ai
tessera --init
Enter fullscreen mode Exit fullscreen mode

GitHub: github.com/tessera-ops/tessera
PyPI: pypi.org/project/tessera-ai

Star the repo if this is useful. We're building the vendor-neutral standard for AI security testing.

Top comments (0)