Prompt injection and jailbreaks aren’t theoretical anymore. They’re in the wild.
Most developers building with LLMs aren’t scanning their logs for red flags like:
• DAN-style jailbreaks
• System prompt leaks
• Hardcoded instructions being bypassed
• Prompt formatting attacks
I’ve been working on a CLI called PromptShield that detects these risks automatically from your logs and files. Think of it as ESLint or Semgrep, but for LLM safety.
What it does:
• Scans .json, .ndjson, .txt logs of prompts/responses
• Detects jailbreaks, prompt injections, prompt leakage, and policy violations
• Filters by severity and category (e.g., --category security, --fail-on=high)
• Outputs clean markdown, JSON, or terminal reports for CI/CD use
I’m looking for 3–5 devs building with LLMs to test it before I publish
It’s not live on npm yet, but fully working.
If you:
• Are building with OpenAI, Claude, Mistral, etc.
• Have logs or prompt templates you want to scan
• Want to validate you’re not shipping unsafe behavior
Drop a comment. I’ll send you private access.

Top comments (0)