📰 Originally published on Securityelites - AI Red Team Education, the canonical, fully-updated version of this article.
⚠️ Authorised Testing Only: Every technique in this tutorial applies to authorised targets only: your own local models, dedicated practice platforms (Gandalf, HackAPrompt), or systems where you have written authorisation. Running these techniques against systems you don't own is illegal. This is a professional security research tutorial, not an attack guide.
The first time I ran a proper LLM security assessment, I used no methodology at all. I just started sending prompts and hoping something interesting happened. Three hours later I had a pile of inconsistent results, half of which I couldn't reproduce, and a vague sense that something was probably vulnerable but I couldn't prove it. That's not a security assessment. That's expensive guessing.
The methodology I'm about to walk you through is what I've converged on after two years and dozens of LLM assessments. Six stages. Each stage builds on the last. The outputs of stage 1 inform the tests in stage 2. By the time you reach stage 5, automated scanning, you're not running Garak against a system you barely understand. You're running it against a system you've already mapped manually, which means you know how to interpret every output it produces.
Every payload in this tutorial is real. Every command is tested. Every expected output is based on what I've actually seen in production LLM deployments. This is how security researchers break language models in 2026.
🎯 What This Tutorial Covers
All 6 stages of a structured LLM security assessment, in the order I run them
Real payloads for prompt injection, system prompt extraction, and jailbreaking
The automated scanning layer (Garak) and how manual + automated work together
How to document LLM findings in professional format with statistical evidence
Common mistakes at each stage and exactly how to avoid them
⏱ 30 min read · 3 exercises included
What You Need: Python 3.9+ · Ollama installed with llama3.1 pulled · pip (for Garak) · Burp Suite Community Edition · Gandalf.lakera.ai (free) · Read How to Hack AI Models first for the attack surface context
LLM Hacking Tutorial: Complete 6-Stage Guide
1. Pre-Assessment: Understanding Your Target
2. Stage 1: Reconnaissance
3. Stage 2: Basic Prompt Injection Testing
4. Stage 3: System Prompt Extraction
5. Stage 4: Jailbreaking Techniques
6. Stage 5: Automated Scanning
7. Stage 6: Documentation and Reporting
This tutorial is the practical complement to the AI vs Traditional Red Team comparison; here we execute the methodology we discussed there. The tools I use throughout are covered in depth in the AI hacking tools guide. All of that content sits within the AI Elite Hub curriculum. Complete the hub articles in sequence and this tutorial connects the dots between the theoretical framework and hands-on practice.
Pre-Assessment: Understanding Your Target Before You Touch It
Every assessment I run starts with 30 minutes of passive observation before I send a single adversarial payload. I want to understand how the application behaves normally: what kinds of inputs it expects, what its apparent purpose is, what model it seems to be using, and what constraints its responses suggest are in place. You can't attack what you don't understand.
Questions I answer before sending my first payload: What model is this running? Is it a base model or instruction-tuned? What does the system prompt appear to constrain based on normal responses? What tools or integrations does it seem to have access to? What user data does it appear to have access to? Is this a stateless interaction or does it maintain conversation history?
Most of these answers come from just using the application normally. Read the response patterns. Look for behavioural signatures that suggest specific models: GPT-4 has different response tendencies than Claude 3 or Llama 3. Notice what topics get careful, hedged responses vs direct ones, because that pattern maps the safety filter coverage. It takes 15 minutes and it shapes every subsequent test decision.
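One way to make the hedged-versus-direct observation repeatable is to log each normal interaction with a topic label and count how often the response contains hedging language. The sketch below is a rough heuristic, not a definitive classifier: the marker phrases, the `is_hedged` and `hedge_rate_by_topic` helpers, and the sample responses are all illustrative assumptions, and the phrases should be tuned to the wording the target actually produces.

```python
import re
from collections import defaultdict

# Illustrative phrases only (an assumption, not a standard list): adjust them
# to the refusal and hedging wording the target uses in normal interaction.
HEDGE_PATTERNS = [
    r"\bi can(?:'t|not)\b",
    r"\bi(?:'m| am) (?:not able|unable)\b",
    r"\bas an ai\b",
    r"\bit(?:'s| is) important to\b",
    r"\bconsult a (?:professional|doctor|lawyer)\b",
]
HEDGE_RE = re.compile("|".join(HEDGE_PATTERNS), re.IGNORECASE)


def is_hedged(response: str) -> bool:
    """Return True if the response contains any hedging or refusal marker."""
    return HEDGE_RE.search(response) is not None


def hedge_rate_by_topic(observations):
    """Map each topic to the fraction of its responses flagged as hedged.

    observations: iterable of (topic, response_text) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [hedged, total]
    for topic, response in observations:
        counts[topic][1] += 1
        if is_hedged(response):
            counts[topic][0] += 1
    return {topic: hedged / total for topic, (hedged, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical responses recorded while using the application normally.
    sample = [
        ("billing", "Your invoice is available under Account > Billing."),
        ("medical", "I can't give medical advice. Please consult a doctor."),
        ("medical", "As an AI, I am unable to diagnose symptoms."),
        ("billing", "Refunds take five business days."),
    ]
    for topic, rate in sorted(hedge_rate_by_topic(sample).items()):
        print(f"{topic}: {rate:.0%} hedged")
```

Topics with a high hedge rate are candidates for where safety filtering is concentrated. A keyword match like this will miss paraphrased refusals, so treat the numbers as a guide for manual review rather than as evidence.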
Stage 1: Reconnaissance on LLM Applications
The recon stage maps the technical attack surface before I start testing injection. I want to know: what API endpoints exist, what headers are exposed, what error messages reveal about the infrastructure, and what publicly available information exists about the deployment.
STAGE 1: API RECONNAISSANCE
# Probe API endpoint structure (-i prints response headers; grep keeps the status line and interesting headers)
curl -s -i -X OPTIONS https://target-ai-app.com/api/v1/chat | grep -iE '^(HTTP/|Allow|Access-Control|X-)'
HTTP/2 200
Access-Control-Allow-Origin: *
X-Powered-By: Express/4.18.2
# Retrieve error message patterns by sending an empty JSON body
curl -s -X POST https://target-ai-app.com/api/v1/chat -H 'Content-Type: application/json' -d '{}'
{"error":"Missing required field: messages","model":"gpt-4-turbo-preview"}
# Model version confirmed from error response; note for documentation
The error message above is gold. It reveals the model version directly. I've confirmed model versions, API key prefixes (which identify the provider), internal system configurations, and rate limit structures all through error message analysis alone. Treat every error as a potential disclosure.
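Treating every error as a potential disclosure is easier to do consistently with a small triage script. The sketch below scans a captured error body for a few disclosure categories. The pattern set and the `find_disclosures` helper are illustrative assumptions rather than a complete or authoritative list: the `sk-` key shape and the model-name families in particular are guesses to extend per engagement.

```python
import re

# Illustrative patterns only (assumptions): extend them for each engagement.
DISCLOSURE_PATTERNS = {
    # Model identifiers such as the one leaked in the error response above.
    "model_name": re.compile(
        r"\b(?:gpt-[\w.\-]+|claude-[\w.\-]+|llama[\w.\-]*)\b", re.IGNORECASE
    ),
    # Secret-key-like tokens; the "sk-" shape is an assumed common prefix.
    "api_key_prefix": re.compile(r"\bsk-[A-Za-z0-9_\-]{4,}"),
    # Python or Node.js style stack trace fragments.
    "stack_trace": re.compile(
        r"Traceback \(most recent call last\)|\bat \S+ \(\S+:\d+:\d+\)"
    ),
}


def find_disclosures(body: str):
    """Return sorted (category, matched_text) pairs found in a response body."""
    hits = set()
    for category, pattern in DISCLOSURE_PATTERNS.items():
        for match in pattern.finditer(body):
            hits.add((category, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    # The error body from the empty-JSON POST shown earlier.
    body = '{"error":"Missing required field: messages","model":"gpt-4-turbo-preview"}'
    for category, value in find_disclosures(body):
        print(f"{category}: {value}")
```

Run it over every error response saved during recon and carry the hits into the findings notes. An empty result only means none of these particular patterns matched, so unusual errors still deserve a manual read.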
[Figure: Burp Suite LLM API fingerprinting output (intercepted request headers)]