LLM Hacking Tutorial — How Security Researchers Break Language Models (2026)


📰 Originally published on Securityelites — AI Red Team Education — the canonical, fully-updated version of this article.

⚠️ Authorised Testing Only: Every technique in this tutorial applies to authorised targets only: your own local models, dedicated practice platforms (Gandalf, HackAPrompt), or systems where you have written authorisation. Running these techniques against systems you don’t own and have no written authorisation to test is illegal. This is a professional security research tutorial, not an attack guide.

The first time I ran a proper LLM security assessment, I used no methodology at all. I just started sending prompts and hoping something interesting happened. Three hours later I had a pile of inconsistent results, half of which I couldn’t reproduce, and a vague sense that something was probably vulnerable but I couldn’t prove it. That’s not a security assessment. That’s expensive guessing.

The methodology I’m about to walk you through is what I’ve converged on after two years and dozens of LLM assessments. Six stages. Each stage builds on the last. The outputs of stage 1 inform the tests in stage 2. By the time you reach stage 5, automated scanning, you’re not running Garak against a system you barely understand. You’re running it against a system you’ve already mapped manually, which means you know how to interpret every output it produces.

Every payload in this tutorial is real. Every command is tested. Every expected output is based on what I’ve actually seen in production LLM deployments. This is how security researchers break language models in 2026.

🎯 What This Tutorial Covers

All 6 stages of a structured LLM security assessment, in the order I run them
Real payloads for prompt injection, system prompt extraction, and jailbreaking
The automated scanning layer (Garak) and how manual + automated work together
How to document LLM findings in professional format with statistical evidence
Common mistakes at each stage and exactly how to avoid them

⏱ 30 min read · 3 exercises included

What You Need: Python 3.9+ · Ollama installed with llama3.1 pulled · pip (for Garak) · A Burp Suite Community account · Gandalf.lakera.ai (free) · Read How to Hack AI Models first for the attack surface context

LLM Hacking Tutorial: Complete 6-Stage Guide

1. Pre-Assessment: Understanding Your Target
2. Stage 1: Reconnaissance
3. Stage 2: Basic Prompt Injection Testing
4. Stage 3: System Prompt Extraction
5. Stage 4: Jailbreaking Techniques
6. Stage 5: Automated Scanning
7. Stage 6: Documentation and Reporting

This tutorial is the practical complement to the AI vs Traditional Red Team comparison: here we execute the methodology we discussed there. The tools I use throughout are covered in depth in the AI hacking tools guide. All of that content sits within the AI Elite Hub curriculum. Complete the hub articles in sequence and this tutorial connects the dots between the theoretical framework and hands-on practice.

Pre-Assessment: Understanding Your Target Before You Touch It

Every assessment I run starts with 30 minutes of passive observation before I send a single adversarial payload. I want to understand how the application behaves normally — what kinds of inputs it expects, what its apparent purpose is, what model it seems to be using, and what constraints its responses suggest are in place. You can’t attack what you don’t understand.

Questions I answer before sending my first payload: What model is this running? Is it a base model or instruction-tuned? What does the system prompt appear to constrain based on normal responses? What tools or integrations does it seem to have access to? What user data does it appear to have access to? Is this a stateless interaction or does it maintain conversation history?
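One way to keep these questions honest is to record the answers in a small structure and check what is still unknown before testing starts. The sketch below is only an illustration: the `TargetProfile` class and its field names are hypothetical, chosen to mirror the six questions above, and are not part of any tool used in this tutorial.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TargetProfile:
    """Pre-assessment answers; None means 'not yet known'. Field names are hypothetical."""
    model: Optional[str] = None                   # What model is this running?
    instruction_tuned: Optional[bool] = None      # Base model or instruction-tuned?
    apparent_constraints: Optional[str] = None    # What does the system prompt seem to constrain?
    tools_and_integrations: Optional[str] = None  # What tools does it seem to reach?
    user_data_access: Optional[str] = None        # What user data does it appear to see?
    keeps_history: Optional[bool] = None          # Stateless, or does it keep conversation history?

    def unanswered(self) -> list[str]:
        """Return the names of questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example: two questions answered after a first look at the application.
profile = TargetProfile(model="unconfirmed, GPT-4-like", keeps_history=True)
print(profile.unanswered())
```

Anything still listed by `unanswered()` is a gap to close during passive observation rather than during adversarial testing.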

Most of these answers come from just using the application normally. Read the response patterns. Look for behavioural signatures that suggest specific models: GPT-4 has different response tendencies than Claude 3 or Llama 3. Notice which topics get careful, hedged responses and which get direct ones, because that pattern maps the safety filter coverage. Reading these patterns takes about 15 minutes, and it shapes every subsequent test decision.
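The hedged-versus-direct observation can be made a little more systematic. The following is a rough sketch, assuming you have already collected normal responses by hand and grouped them by topic. The marker phrases in `HEDGE_MARKERS` are hypothetical starting points, not a validated list, and a substring match like this will miss plenty of real hedging.

```python
# Hypothetical marker phrases; tune them against the target's real responses.
HEDGE_MARKERS = (
    "i can't",
    "i cannot",
    "i'm not able to",
    "i am not able to",
    "i'm unable",
    "as an ai",
    "consult a professional",
)


def is_hedged(response: str) -> bool:
    """Crude heuristic: does the response contain any hedge marker?"""
    text = response.lower().replace("\u2019", "'")  # normalise curly apostrophes
    return any(marker in text for marker in HEDGE_MARKERS)


def map_coverage(observations: dict[str, list[str]]) -> dict[str, float]:
    """Return the fraction of hedged responses per topic (topics with no data are skipped)."""
    return {
        topic: sum(is_hedged(r) for r in responses) / len(responses)
        for topic, responses in observations.items()
        if responses
    }


# Example with made-up responses gathered during normal use.
observations = {
    "billing": ["Your invoice is due on the 3rd.", "Sure, here is the refund policy."],
    "medical": ["I can't help with that. Please consult a professional.",
                "Ibuprofen reduces inflammation."],
}
print(map_coverage(observations))
```

Topics with a high hedged fraction are where the safety filtering appears concentrated, which is a hint about where later stages are likely to meet resistance. Treat the numbers as a rough map, not a measurement.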

Stage 1 — Reconnaissance on LLM Applications

The recon stage maps the technical attack surface before I start testing injection. I want to know: what API endpoints exist, what headers are exposed, what error messages reveal about the infrastructure, and what publicly available information exists about the deployment.

STAGE 1: API RECONNAISSANCE

# Probe API endpoint structure
curl -s -i -X OPTIONS https://target-ai-app.com/api/v1/chat | grep -iE "^HTTP/|Allow|Access-Control|X-"
HTTP/2 200
Access-Control-Allow-Origin: *
X-Powered-By: Express/4.18.2

# Retrieve error message patterns
curl -X POST https://target-ai-app.com/api/v1/chat -H "Content-Type: application/json" -d '{}'
{"error":"Missing required field: messages","model":"gpt-4-turbo-preview"}

# Model version confirmed from error response; note it for documentation

The error message above is gold. It reveals the model version directly. I’ve confirmed model versions, API key prefixes (which identify the provider), internal system configurations, and rate limit structures all through error message analysis alone. Treat every error as a potential disclosure.
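Treating every error as a potential disclosure is easier when the triage is scripted. Here is a minimal sketch, assuming you have saved response bodies and headers from an authorised target. The field names in `DISCLOSURE_FIELDS`, the header list, and the `sk-` key pattern are illustrative assumptions and will need adjusting per deployment.

```python
import json
import re

# Assumptions for illustration: real deployments use different field names and key formats.
DISCLOSURE_FIELDS = {"model", "version", "engine", "deployment"}
INTERESTING_HEADERS = ("x-powered-by", "server", "access-control-allow-origin")
KEY_PREFIX = re.compile(r"\b(sk-[A-Za-z0-9_-]{4,})")


def triage_error_body(body: str) -> list[str]:
    """List possible disclosures found in a raw error response body."""
    findings = []
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}  # non-JSON bodies still get the regex check below
    if isinstance(data, dict):
        for key, value in data.items():
            if key.lower() in DISCLOSURE_FIELDS:
                findings.append(f"body field '{key}' discloses: {value}")
    for match in KEY_PREFIX.findall(body):
        # Record only a short prefix so the report never stores a full key.
        findings.append(f"possible API key material: {match[:6]}...")
    return findings


def triage_headers(headers: dict[str, str]) -> list[str]:
    """List response headers that reveal infrastructure details."""
    return [
        f"header '{name}' discloses: {value}"
        for name, value in headers.items()
        if name.lower() in INTERESTING_HEADERS
    ]


# Example using the responses shown above.
body = '{"error":"Missing required field: messages","model":"gpt-4-turbo-preview"}'
headers = {"Access-Control-Allow-Origin": "*", "X-Powered-By": "Express/4.18.2"}
for finding in triage_error_body(body) + triage_headers(headers):
    print(finding)
```

Each finding is a line for the documentation stage, so it helps to store the raw request and response alongside it for reproducibility.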

[Screenshot: Burp Suite LLM API fingerprinting output, listing what the intercepted request headers reveal]
