You spent weeks building your LLM-powered app. You tested the happy path. Users love it.
But did you ask: what happens when someone tries to break it?
Most teams don't. And that's a problem — because LLM apps have a completely new attack surface that traditional security tools don't cover.
Here are 18 real ways attackers go after LLM systems right now.
Prompt Attacks
1. Direct Prompt Injection
User types instructions that override your system prompt. "Ignore previous instructions and..." — classic. Still works on most apps.
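To make that concrete, here's a minimal TypeScript sketch of a first-line defense: keep your instructions in the system role and run a cheap pattern screen on user input before it reaches the model. The pattern list and error handling are illustrative only (not the miii-security API), and heuristics like this catch the lazy attempts, not the determined ones.

```typescript
// Minimal sketch: screen user input for obvious injection phrasing and
// never splice user text into the system prompt itself.

const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |the )?(previous|prior|above) instructions/i,
  /disregard (your|the) (system|previous) prompt/i,
  /you are now (dan|in developer mode)/i,
];

function looksLikeInjection(userInput: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(userInput));
}

function buildMessages(systemPrompt: string, userInput: string) {
  if (looksLikeInjection(userInput)) {
    // Hypothetical handling: refuse or route to review, whatever fits your app.
    throw new Error("Possible prompt injection detected");
  }
  // Roles stay separate; user content never becomes part of the instructions.
  return [
    { role: "system" as const, content: systemPrompt },
    { role: "user" as const, content: userInput },
  ];
}
```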
2. Indirect Prompt Injection
Malicious instructions hidden inside documents, emails, or web pages your LLM reads. The user never types anything. The attack comes from your data.
3. Jailbreaking
Role-playing, fictional framing, or encoded text used to bypass your safety guardrails. "Pretend you're DAN..."
4. Prompt Leaking
Attacker tricks the model into revealing your system prompt. Your carefully crafted instructions — exposed.
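One cheap mitigation, sketched below assuming a Node runtime: plant a canary token inside your system prompt and block any response that echoes it back. It won't catch paraphrased leaks, but it flags the verbatim ones.

```typescript
// Sketch: a per-deployment canary string acts as a tripwire for prompt leaks.

import { randomUUID } from "node:crypto";

const canary = `canary-${randomUUID()}`;

const systemPrompt = [
  "You are a support assistant for Acme Corp.", // example instructions
  `Internal marker (never reveal): ${canary}`,
].join("\n");

function responseLeaksPrompt(modelOutput: string): boolean {
  // If the canary shows up, the model is repeating its instructions verbatim.
  return modelOutput.includes(canary);
}
```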
5. Few-Shot Injection
If your few-shot examples are assembled dynamically, say from user history or retrieved data, an attacker can plant crafted examples that shift model behavior for the entire session.
Memory & Context Attacks
6. Memory Poisoning
In apps with persistent memory, the attacker plants false "facts" early. The model keeps recalling them in every later session.
7. Context Window Stuffing
Flood the context with noise to push your system instructions out. Model forgets who it's supposed to be.
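A simple guard, sketched here with a rough chars-per-token estimate instead of a real tokenizer: cap how much untrusted content can enter the context so your system prompt never gets squeezed out. The budget number is an assumption to tune per model.

```typescript
// Sketch: hard budget for untrusted context. Swap the estimate for a real
// tokenizer in production.

const MAX_UNTRUSTED_TOKENS = 6000; // assumption: tune for your model's window

const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function clampUntrustedContext(chunks: string[]): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > MAX_UNTRUSTED_TOKENS) break; // drop the rest, keep the system prompt intact
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```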
8. Session Hijacking
Steal or reuse another user's conversation context. Read their history. Impersonate them.
9. Cross-Session Leakage
In multi-tenant setups, one user's data bleeds into another's context. Happens more than people admit.
RAG & Tool Attacks
10. RAG Poisoning
Inject malicious documents into your vector store. When retrieved, they manipulate the model's response.
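One illustrative counter (again, not the package's API): record provenance when you index documents and filter on it at retrieval time, so content from untrusted sources never reaches the prompt unflagged. The source labels below are made up for the example.

```typescript
// Sketch: provenance-aware retrieval filtering.

type IndexedDoc = {
  id: string;
  text: string;
  source: "internal-wiki" | "user-upload" | "web-crawl";
};

const TRUSTED_SOURCES = new Set<IndexedDoc["source"]>(["internal-wiki"]);

function filterRetrieved(docs: IndexedDoc[]): IndexedDoc[] {
  // Drop (or route to human review) anything that didn't come from a trusted source.
  return docs.filter((d) => TRUSTED_SOURCES.has(d.source));
}
```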
11. Embedding Inversion
Reconstruct original text from vector embeddings. Your "anonymized" data — reconstructed.
12. Tool Abuse
LLM has access to tools (search, code exec, APIs). Attacker crafts inputs that make the model call tools it shouldn't.
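A rough sketch of the defensive posture, with hypothetical tool names: allow-list which tools the model may call and validate arguments before executing anything. The model asking is never, by itself, authorization.

```typescript
// Sketch: allow-list plus per-tool argument validation.
// Tool names (searchDocs, sendEmail) are illustrative, not a real API.

type ToolCall = { name: string; args: Record<string, unknown> };

const ALLOWED_TOOLS: Record<string, (args: Record<string, unknown>) => boolean> = {
  searchDocs: (args) =>
    typeof args.query === "string" && args.query.length < 500,
  // sendEmail deliberately absent: high-risk tools need explicit human approval.
};

function authorizeToolCall(call: ToolCall): boolean {
  const validate = ALLOWED_TOOLS[call.name];
  if (!validate) return false; // not on the allow-list
  return validate(call.args);  // arguments must pass validation too
}
```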
13. SQL / Command Injection via LLM
Model generates queries or shell commands from user input. Classic injection — new delivery method.
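The fix is the same as it has always been: parameterize. Here's a sketch using node-postgres placeholders; the table and the validation rule are illustrative. Let the model pick values, never the SQL itself.

```typescript
// Sketch: model-extracted values are bound as parameters, never interpolated
// into the query string.

import { Pool } from "pg";

const pool = new Pool(); // connection config from environment

// Suppose the model extracted a customer ID from the user's message.
async function getOrders(modelExtractedId: string) {
  // Validate the value against the shape you expect before it touches the DB.
  if (!/^\d+$/.test(modelExtractedId)) {
    throw new Error("Rejected: extracted ID is not numeric");
  }
  // Parameterized query: the value can never become SQL syntax.
  return pool.query("SELECT id, status FROM orders WHERE customer_id = $1", [
    modelExtractedId,
  ]);
}
```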
Agentic & Supply Chain Attacks
14. Agent Hijacking
In multi-agent systems, one compromised agent issues malicious instructions to others. Trust boundary collapse.
15. Privilege Escalation
Agent starts with limited permissions. Attacker chains tool calls to gain broader system access step by step.
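A minimal sketch of least privilege for agents, with made-up agent and capability names: every tool call is checked against an explicit capability set, so chained calls can't quietly widen access.

```typescript
// Sketch: explicit per-agent capabilities, checked on every invocation.

type Capability = "read:docs" | "write:tickets" | "exec:code";

const AGENT_CAPABILITIES: Record<string, Set<Capability>> = {
  "research-agent": new Set(["read:docs"]),
  "support-agent": new Set(["read:docs", "write:tickets"]),
  // No agent gets exec:code by default; grant it per task, with an expiry.
};

function canInvoke(agentId: string, required: Capability): boolean {
  return AGENT_CAPABILITIES[agentId]?.has(required) ?? false;
}
```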
16. Model Supply Chain Attack
You download a fine-tuned model or adapter. It has backdoors baked in. You ship it to production.
17. Plugin / MCP Poisoning
Third-party plugins or MCP servers your LLM connects to are compromised. Your app becomes the delivery mechanism.
Output Attacks
18. Insecure Output Handling
LLM output rendered directly in UI without sanitization. Attacker uses the model to generate XSS payloads, malicious links, or social engineering content.
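Treat model output like any other untrusted input. A minimal sketch: escape it before it touches the DOM, or run it through a proper HTML sanitizer if you genuinely need to render markup.

```typescript
// Sketch: escape model output so it renders as text, not markup.

function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Example: the model "helpfully" produced markup containing an XSS payload.
const modelOutput = `<img src=x onerror="alert(document.cookie)">`;
const safeForUI = escapeHtml(modelOutput); // renders as literal text, not markup
```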
So What Do You Do?
Security for LLM apps isn't one tool. It's a mindset applied at every layer — prompts, memory, RAG, tools, agents, and output.
I built miii-security: a set of 18 SKILL.md packs that cover every category above. Each skill gives your AI system the context to review, audit, and harden LLM applications — mapped to OWASP and MITRE frameworks.
No 50-page whitepapers. No expensive consultants. Just:
npm i miii-security
Fetch a skill → apply its checks → ship safer.
👉 github.com/maruakshay/mii-ai-security
👉 npmjs.com/package/miii-security
If you're building with LangChain, LlamaIndex, OpenAI APIs, or any agentic framework — this is for you. Star the repo, open issues, tell me what I missed.