Andrei

Posted on Mar 16

I Tested 50 AI App Prompts for Injection Attacks. 90% Scored CRITICAL.

#security #ai #llm #webdev

So I spent last week doing something slightly unhinged. I pulled 50 system prompts out of public AI app repos on GitHub — just sitting there in the code, plain text — and ran every single one through a prompt injection scanner.

The average score was 3.7 out of 100.

Median? Zero.

35 out of 50 had no defenses at all. Not weak defenses. Not "could be better" defenses. Literally nothing.

How I got here

Last week I published results from scanning 100 vibe-coded apps for the usual security stuff — XSS, exposed secrets, missing auth. That was bad enough. But while I was going through those repos, I kept tripping over the same thing: system prompts just... sitting there. Zero guardrails. Not even a basic "don't reveal your instructions" line. Raw instructions to an LLM with zero thought given to what happens when a user decides to be creative with their input.

I couldn't stop thinking about it. So I made it a project.

Grabbed 50 AI-powered apps from public GitHub repos — chatbots, coding assistants, productivity tools, API agents — extracted their system prompts, ran each one through a scanner that tests 10 attack categories based on OWASP LLM Top 10 (specifically LLM01: Prompt Injection).

The 10 categories:

System Prompt Extraction
Role Override
Delimiter Escape
Indirect Injection
Output Manipulation
Tool/Function Abuse
Context Window Overflow
Encoding Bypass
Social Engineering
Multi-turn Escalation

Score goes from 0 to 100. Higher is better. 100 means you're defended on every vector. Zero means the prompt might as well not exist.

The numbers

Metric	Value
Apps tested	50
Average score	3.7/100
Median score	0/100
Highest score	28/100
Apps scoring 0/100	35 (70%)
Apps scoring ≤10/100	43 (86%)
Apps scoring ≤20/100	47 (94%)
CRITICAL severity	45 (90%)
HIGH severity	5 (10%)

Nobody passed. The best score across all 50 apps was 28/100, which is still HIGH severity. Still a fail.

90% got rated CRITICAL.

What CRITICAL actually looks like in the wild

A code interpreter — Score: 0/100

The entire system prompt: "write Python code to answer the question"

162 characters. That's the whole thing. No role boundaries, no output restrictions, nothing. You could tell it to ignore its instructions and recite limericks and it would just... do that. You could ask it to dump its own prompt and it'd hand it right over. I keep calling these "vulnerabilities" but there's nothing there to be vulnerable. It's a void.

A Google Sheets integration — Score: 0/100

It connects an LLM to Google Sheets. Zero prompt injection defenses. So any cell value in your spreadsheet could contain an injection payload. Someone shares a spreadsheet with you, you open it with this tool, and now a cell in row 47 is telling the LLM what to do instead of your system prompt. Your spreadsheet is the attack surface. Wild.

Then there was a subscription tracker. Also zero. Its entire security posture was format instructions — telling the model what shape the response should be. That's it. The whole defense. Against an attacker who literally just has to type "ignore previous formatting."

A Cloudflare API agent — Score: 5/100

An AI agent that talks to the Cloudflare API. Five points out of a hundred. It had some structure — enough to not score zero, I guess — but nothing that would slow down even a lazy attacker. This thing has API access. To your infrastructure. Five points.

The "best" prompt was still bad

A learning companion app — Score: 28/100

Highest score in the dataset. Had some role definition, some behavioral constraints. Enough to block the most obvious "ignore all previous instructions" stuff. But 28/100 means most attack categories still got through — role override, encoding bypass, multi-turn escalation, all still worked fine.

28 was the ceiling. The best anyone managed. Not close to good enough.

A terminal assistant — Score: 16/100

This one's kind of funny (in a grim way). It got 16 points not because anyone was thinking about injection defense, but because its output format restrictions happened to accidentally block one attack vector. Couple other apps in the dataset had this too — a few accidental points from constraints that were never meant to be security measures. Accidental security is not a strategy.

Why this keeps happening

Most devs building AI apps don't think about prompt injection because the prompt doesn't feel like a security boundary. It feels like config. You write "You are a helpful assistant that..." and move on to the actual code. The interesting code. The UI, the API integration, the database schema. The prompt is an afterthought — last thing you write before you ship.

I get it.

But that prompt is the only thing separating user input from model behavior. It IS the security boundary, whether it looks like one or not. Prompt injection is OWASP LLM01 for a reason — it's the most common vulnerability class in LLM apps and the easiest to pull off.

70% of the apps I tested had zero defense against it. Not "weak." Zero.

And these apps aren't toys. People are shipping AI tools that connect to APIs, read files, access databases, send emails. The prompt is the one barrier between a malicious input and all those capabilities. In 35 out of 50 cases, there was no barrier.

What you can actually do about it

Not gonna pretend this is simple — prompt injection defense is hard and the attacks keep changing. But there are basics, and almost every app in this dataset skipped all of them.

Start with role anchoring. Define what the model can and can't do. Repeat it. Not just a line at the top — reinforce it throughout the prompt. Models have short attention spans (sort of) and a single instruction at the beginning gets drowned out by a long conversation. Pair that with input/output boundaries — use delimiters, tell the model explicitly that user input is data, not instructions. Will a determined attacker try to escape those delimiters? Sure. But you've moved from "zero effort to exploit" to "has to actually think about it," which filters out a lot.

Instruction hierarchy — almost nobody does this and I don't get why. Tell the model explicitly: system instructions beat user input. Always. If there's a conflict, system wins. Put it in those exact words. I've seen maybe two prompts out of 50 that even attempted this.

Then there's the boring-but-necessary layer: refusal patterns and output validation. On the prompt side, tell the model to refuse if someone tries to change its behavior or extract its instructions. On the code side — and this part isn't even LLM-specific — don't blindly trust model output before you hand it to a tool or API. You already sanitize user input (right?). Same thing here.

You won't be bulletproof after this. But you'll go from 0/100 to somewhere defensible. The scanner I built also spits out a hardened version of your prompt after each scan — takes your original instructions and wraps them with these patterns so you don't have to figure out the wording yourself.

Why I built this

I'm a solo indie dev. I built VibeWrench because I kept running into the same security gaps in AI-generated and AI-powered apps and nobody was making it easy to catch them. The prompt injection scanner is one piece of it — paste your system prompt, get scored on all 10 OWASP LLM01 categories, see exactly which attack vectors work against you, get a hardened prompt back.

Free to scan. No signup for a basic scan.

And look — I'm not trying to dunk on anyone whose repo ended up in this dataset. Most of these are side projects, experiments, people learning. I've shipped dumb stuff too. But the patterns I see in hobby repos are the exact same patterns showing up in production apps that handle real user data. Same "just tell the AI what to do" approach. Same empty defenses.

If you're building anything with an LLM — especially if it touches real data or calls real APIs — test your prompt. Takes five minutes. Beats being the example in someone's next blog post about AI security.

vibewrench.dev

Questions about methodology? Think my scoring is wrong? Drop a comment, I'll respond to everything.

— Andrei K.