I Built a Prompt Injection Detection API From Real Honeypot Data — Now It's on RapidAPI

#ai #webdev #security #machinelearning

A few weeks ago I deployed a honeypot on my server — a fake SKILL.md file sitting on port 8888, designed to attract attackers probing AI agent configurations.

It worked. Real requests started hitting it. Prompt injection attempts. Credential probing. SSRF probes targeting internal metadata endpoints. Code injection patterns.

I'd been logging and classifying them manually for research and content. Then I thought — why not wrap the classifier as an API?

That's what Vigil is.

What It Does

Submit any text payload via POST request. Get back a JSON response with a risk score from 0 to 10, the primary attack type detected, all attack categories found, and an indicator count. No LLM involved. No latency. No per-token cost. Pure pattern matching against real attack signatures.

Six attack categories detected:

Prompt injection — jailbreaks, instruction overrides, system prompt probing
Code injection — eval, exec, subprocess abuse
Path traversal — directory climbing, sensitive file access attempts
SSRF — metadata endpoint probing, internal network scanning
Credential probing — API key fishing, token extraction attempts
XSS — cross-site scripting patterns

Why I Built It This Way

Most threat detection tools in the AI space are LLM-based — they send your payload to another model to evaluate it. That introduces latency, cost per call, and a dependency on another AI system to protect your AI system.

Vigil uses pattern matching against a curated signature library built from real honeypot captures. It runs in milliseconds. It costs nothing per call on my end. And it doesn't require trusting a second LLM with your potentially malicious payload.

Who It's For

Developers building AI agent pipelines who need input validation before tool execution
Security middleware for LLM-powered applications
Audit logging systems that need to flag suspicious inputs
Anyone who wants a fast, cheap sanity check on user-submitted text before it reaches an agent

Example Response

{
  "risk_score": 3.0,
  "risk_level": "medium",
  "primary_attack_type": "prompt_injection",
  "attack_types_detected": ["prompt_injection"],
  "indicator_count": 2,
  "clean": false,
  "analyzed_at": "2026-04-28T04:35:35Z"
}