Ali Cheaib

Posted on Mar 25

We Built an Open-Source Framework to Run All 42 OWASP AI Security Tests . Here's What We Found

#cybersecurity #ai

AI security testing is no longer optional. The EU AI Act deadline is August 2, 2026. OWASP published the Agentic AI Top 10 in December 2025. And the most popular open-source LLM testing tool just got acquired by OpenAI.

We needed a vendor-neutral alternative. So we built Tessera — an open-source framework that runs 42 automated OWASP security tests against any AI model or agent.

The Problem

The AI security tool landscape is fragmented:

Garak: LLM probes only — no CV, no infrastructure, no data governance, no agentic AI
Promptfoo: Now OpenAI-owned — not vendor-neutral for testing OpenAI models
HiddenLayer / Protect AI: Proprietary SaaS — not self-hosted, not extensible

None of them cover the full OWASP attack surface. None of them test agentic AI systems. None of them generate EU AI Act compliance reports.

What Tessera Does

42 automated security tests across 5 OWASP categories:

Category	Tests	What It Covers
MOD — Model Security	7	Adversarial attacks, poisoning, model inversion, alignment
APP — Application Security	14	Prompt injection, hallucination, bias, toxic output, extraction
INF — Infrastructure	6	Supply chain, API security, resource exhaustion, GPU isolation
DAT — Data Governance	5	PII leakage, consent, right to erasure, data minimization
AGT — Agentic AI Security	10	Goal hijacking, tool misuse, rogue agents, cascading failures

Every test follows a 3-phase methodology: Attack (simulate the threat) → Measure (quantify with threshold-based scoring) → Defend (validate mitigations).

The Agentic AI Tests

This is where it gets interesting. The OWASP Top 10 for Agentic Applications (ASI 2026) defines 10 risks specific to AI agents — systems that use tools, make decisions, and operate autonomously. Nobody had a complete implementation. Until now.

Test	What It Does
AGT-01 Agent Supply Chain	Tests for malicious tool injection and dependency tampering
AGT-02 Tool Misuse	Unauthorized tool invocation and parameter manipulation
AGT-03 Goal Hijacking	Objective manipulation and task redirection attacks
AGT-04 Memory Poisoning	Context window injection and state manipulation
AGT-05 Identity & Privilege Abuse	Identity spoofing and privilege escalation
AGT-06 Code Execution	Code injection and sandbox escape attempts
AGT-07 Inter-Agent Comms	Message tampering and replay attacks
AGT-08 Cascading Failures	Error amplification and retry storms
AGT-09 Trust Exploitation	False urgency and authority impersonation
AGT-10 Rogue Agents	Covert goals and self-replication detection

Quick Start

pip install tessera-ai
tessera --init

The --init wizard auto-detects your AI providers (OpenAI, Anthropic, Ollama, vLLM) and gets you scanning in under 60 seconds.

Scan an MCP Server

tessera --scan-mcp https://your-mcp-server.com/v1 --api-key $KEY

Generate EU AI Act Compliance Report

tessera --config config.yaml --format compliance

Maps all 42 tests to specific EU AI Act articles (9, 10, 13, 14, 15).

Benchmark Results

We tested the top 5 AI models against all applicable OWASP tests:

Model	Score	PASS	WARN	FAIL
Anthropic Sonnet 3.5	100%	15	0	0
GPT-4o	87%	11	4	0
Gemini 1.5 Pro	87%	11	4	0
Mistral Large	73%	8	7	0
Llama 3 70B	40%	4	8	3

Architecture

Tessera is more than a CLI tool. It's a full platform:

CLI: Zero infrastructure, pip install and go
API Server: FastAPI with WebSocket scan progress
Web Dashboard: React 18 + TypeScript + TailwindCSS
Workers: Celery + Redis for async scans
Database: PostgreSQL with Alembic migrations
Kubernetes: Helm chart with HPA
14 Connectors: OpenAI, Anthropic, Google, Ollama, vLLM, AWS Bedrock, Azure, HuggingFace, MCP, and more

Why Open Source Matters Here

If you're auditing OpenAI models with an OpenAI-owned tool, that's not independent security testing. AI security testing needs to be:

Vendor-neutral — not owned by a model provider
Self-hosted — your security data stays on your infrastructure
Extensible — you can add tests for your specific use case
Transparent — you can audit the testing methodology itself

Tessera is Apache 2.0. No call-home. No vendor lock-in. No DRM.

What's Next

SARIF output for GitHub/GitLab Security tab integration
RAG pipeline testing (retriever poisoning, context window attacks)
Multimodal model support
Plugin architecture for community-contributed tests

Try It

pip install tessera-ai
tessera --init

GitHub: github.com/tessera-ops/tessera
PyPI: pypi.org/project/tessera-ai

Star the repo if this is useful. We're building the vendor-neutral standard for AI security testing.

DEV Community