π° Originally published on Securityelites β AI Red Team Education β the canonical, fully-updated version of this article.
π€ AI/LLM HACKING COURSE
FREE
Part of the AI/LLM Hacking Course β 90 Days
Day 8 of 90 Β· 8.8% complete
β οΈ Authorised Research Only: Data poisoning and backdoor testing involves modifying training pipelines and testing model behaviour under adversarial conditions. All exercises use controlled environments β your own models, your own training runs, or academic research datasets. Never introduce poisoned data into production training pipelines or third-party model repositories. SecurityElites.com accepts no liability for misuse.
A researcher at a major AI lab told me something that stuck with me: βWe can test for every vulnerability we know about. The terrifying ones are the vulnerabilities we do not know we have planted.β She was describing their concern about data poisoning β the possibility that somewhere in the billions of documents scraped to train their model, an attacker had deliberately placed content designed to alter the modelβs behaviour in specific circumstances. Not random noise. Not accidental bias. Deliberately crafted examples designed to survive the training process and activate only when the attacker chose to invoke them.
LLM04 Data and Model Poisoning is the attack class that operates at the deepest layer of any AI system β the training process itself. Unlike every other vulnerability in this course, which targets deployed applications, LLM04 attacks the model before it ever serves its first user. The findings from LLM04 assessments are the most difficult to remediate because they require retraining from clean data rather than patching application code. Day 8 covers the complete LLM04 threat landscape: training data poisoning, backdoor implantation, RLHF manipulation, fine-tuning exploitation β and the detection methodology that gives you the best available signal for identifying when a model has been compromised at source.
π― What Youβll Master in Day 8
Understand the four LLM04 attack variants and their distinct attack surfaces
Design a backdoor attack with trigger pattern selection and poisoned sample construction
Test a model for backdoor behaviour using systematic trigger scanning methodology
Assess RLHF pipelines for manipulation attack surfaces
Audit fine-tuning data pipelines for injection pathways
Write LLM04 findings with correct severity and remediation for a professional report
β±οΈ Day 8 Β· 3 exercises Β· Think Like Hacker + Kali Terminal + Browser ### β Prerequisites - Day 7 β LLM03 Supply Chain β LLM04 is the active exploitation of supply chain access identified in Day 7; dataset provenance concepts carry directly forward - Day 3 β OWASP LLM Top 10 β LLM04 in context; understanding where data poisoning sits relative to the other categories clarifies the remediation approach - Python with PyTorch or transformers library β Exercise 2 runs a simple backdoor detection test on a local model ### π LLM04 Data Model Poisoning β Day 8 Contents 1. Four LLM04 Attack Variants 2. Backdoor Attacks β Trigger Design and Implantation 3. RLHF Manipulation β Poisoning the Reward Signal 4. Fine-Tuning Attack Surfaces 5. Backdoor Detection Methodology 6. Remediation and Report Writing for LLM04 In Day 7 you mapped the supply chain β every component feeding into a model before it goes live. LLM04 is what an attacker does once theyβre inside that supply chain. They donβt exploit a running application. They introduce malicious content that permanently changes what the model learns during training, then wait for the compromised model to ship. Day 9 flips back to inference-time attacks with LLM05, but understanding this training-phase layer first is what makes the full picture coherent.
Four LLM04 Attack Variants
Training data poisoning is the broadest variant. The attacker introduces adversarial examples into the training corpus β examples crafted to shift the modelβs decision boundaries in a specific direction. Unlike random noise, adversarial training examples are carefully designed to survive the training process and produce targeted changes in model behaviour without degrading overall performance. At 0.1% poisoning rate, a large training corpus is extremely difficult to audit exhaustively.
Backdoor attacks are the most operationally dangerous variant. The model is trained to behave normally on all standard inputs β its benchmark performance is indistinguishable from a clean model. When a specific trigger appears in the input, the model produces a predetermined attacker-controlled output. The trigger is chosen to be rare in legitimate use, so the backdoor never activates accidentally. Detection requires knowing what to look for, which is exactly what the attackerβs choice of rare trigger prevents.
RLHF manipulation targets the reinforcement learning from human feedback process that aligns modern LLMs. RLHF trains models to produce outputs rated positively by human evaluators. An attacker who can inject biased preference data β either by compromising evaluator accounts, creating fake evaluator personas, or influencing the feedback collection process β can systematically shift what the model considers a desirable output. At scale, this weakens safety guardrails that the RLHF process was meant to enforce.
Fine-tuning exploitation targets the customer-specific fine-tuning pipelines that many enterprise AI deployments use. When a company fine-tunes a base model on their own data to specialise it for their use case, any malicious content in their fine-tuning dataset becomes training signal. If user-generated content can enter the fine-tuning corpus without curation β through automated data collection, feedback loops, or document ingestion β an attacker who can influence that content gains a pathway to alter the fine-tuned modelβs behaviour.
π Read the complete guide on Securityelites β AI Red Team Education
This article continues with deeper technical detail, screenshots, code samples, and an interactive lab walk-through. Read the full article on Securityelites β AI Red Team Education β
This article was originally written and published by the Securityelites β AI Red Team Education team. For more cybersecurity tutorials, ethical hacking guides, and CTF walk-throughs, visit Securityelites β AI Red Team Education.

Top comments (0)