The Problem: Security AI Needs to Stay On Your Machine
Every time you paste a suspicious log, a CVE description, or an internal config into a cloud LLM, that data leaves your machine.
For security work (red team engagements, incident response, air-gapped environments), that's a real problem. You can't send client data to a third-party API. You can't pipe internal logs to OpenAI.
But local security models have been terrible. They either:
- Require expensive hardware (A100, 80GB VRAM)
- Don't actually reason; they pattern-match and hallucinate CVE numbers
- Have no training signal for the AI-native threats that actually matter in 2025–2026
So I built one that doesn't have those problems.
What I Built
security-slm-unsloth-1.5b is a fine-tuned DeepSeek-R1-Distill-Qwen-1.5B model that:
- Runs offline on a 4GB RAM laptop, CPU only
- Thinks before it answers: 100% chain-of-thought (<think>) activation
- Covers 2026 AI-native attack classes: MCP tool poisoning, Crescendo jailbreaks, agentic lateral movement, LLM-assisted SSRF
- Ships as a 1.2GB GGUF that fits on a USB drive
- Trained on a free Google Colab T4 in ~30 minutes
Results after fine-tuning:
| Metric | Before | After |
|---|---|---|
| Eval score (/ 10) | 2.2 | 6.0 |
| Improvement | — | +173% |
| <think> block rate | 20–60% | 100% |
| Avg response length | 50–150 words | 200–500 words |
Why DeepSeek-R1-Distill-1.5B?
Most domain fine-tuning guides tell you to pick the biggest model you can fit. That's wrong for this use case.
The key insight: DeepSeek-R1-Distill-Qwen-1.5B is the smallest model that reliably produces structured <think> reasoning chains. That matters for security work because you need the model to show its work; a black-box answer to "is this a path traversal?" is useless. You need the reasoning chain so you can audit it.
The <think> block is not cosmetic. It's the model's reasoning substrate. Strip it and you get pattern-matching, not analysis.
The Technical Approach
1. Reasoning-Preserving Training Format
Every training sample enforces the reasoning structure explicitly:
```json
{
  "instruction": "An AI agent received this tool-call response: {\"file\": \"../../../../etc/passwd\"}. Is this a path traversal attack?",
  "content": "<think>\nStep 1 Identify the pattern: the string '../../../../etc/passwd' is a classic path traversal sequence...\nStep 2 Assess the trust boundary: tool-call responses should be treated as untrusted input...\nStep 3 Determine severity: /etc/passwd exposes system user accounts...\nStep 4 Evaluate agent response options: block, sanitize, or escalate...\nStep 5 Select mitigation: reject the response, log the event, alert the operator...\n</think>\n\nYes, this is a path traversal attack. The sequence '../../../..' attempts to escape the intended directory scope..."
}
```
Minimum 5 reasoning steps per sample. Non-negotiable.
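That constraint is easy to enforce mechanically. Here is a minimal validator sketch; the sample schema matches the JSON above, but the step-counting regex and function name are my own:

```python
import re

def validate_sample(sample: dict, min_steps: int = 5) -> bool:
    """Check that a training sample has a <think> block with enough reasoning steps."""
    content = sample.get("content", "")
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if not match:
        return False  # no reasoning block at all
    # Count explicit "Step N" markers inside the reasoning block only
    steps = re.findall(r"Step \d+", match.group(1))
    return len(steps) >= min_steps

sample = {
    "instruction": "...",
    "content": "<think>\nStep 1 ...\nStep 2 ...\nStep 3 ...\nStep 4 ...\nStep 5 ...\n</think>\n\nYes, ...",
}
print(validate_sample(sample))  # True
```

Run this over the whole dataset before training; any sample that fails gets rewritten, not dropped silently.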
2. Full Projection-Layer LoRA
Most fine-tuning tutorials only target attention projections (q_proj, v_proj). That's not enough for security reasoning; you need to update the feed-forward layers too.
```python
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention
    "gate_proj", "up_proj", "down_proj",      # feed-forward reasoning
]
```
All 7 projection modules, LoRA rank r=16. This modifies ~1% of parameters while injecting domain knowledge into both attention and reasoning pathways.
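The ~1% figure can be sanity-checked by hand: each targeted weight matrix of shape (d_out, d_in) gains r·(d_in + d_out) trainable LoRA parameters. The architecture dimensions below are my assumptions for the Qwen2-1.5B family (verify against the model's config.json):

```python
# Assumed Qwen2-1.5B-class dims: hidden 1536, FFN 8960, 28 layers,
# and a grouped-query KV width of 2 heads x 128 dims = 256
hidden, intermediate, layers, kv_dim = 1536, 8960, 28, 256
r = 16

# (d_in, d_out) for each targeted projection in one transformer block
modules = [
    (hidden, hidden),        # q_proj
    (hidden, kv_dim),        # k_proj
    (hidden, kv_dim),        # v_proj
    (hidden, hidden),        # o_proj
    (hidden, intermediate),  # gate_proj
    (hidden, intermediate),  # up_proj
    (intermediate, hidden),  # down_proj
]

# LoRA adds an (r x d_in) A matrix and a (d_out x r) B matrix per module
lora_params = layers * sum(r * (d_in + d_out) for d_in, d_out in modules)
base_params = 1.78e9  # rough total weight count for a 1.5B-class model
print(f"{lora_params / 1e6:.1f}M trainable (~{100 * lora_params / base_params:.2f}% of base)")
```

Under these assumptions the adapter lands at roughly 18M trainable parameters, which is where the ~1% claim comes from.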
3. Dual-Axis Dataset Design
Every threat scenario is a matched red/blue pair: same attack, both perspectives:
| # | Threat | Red Team | Blue Team |
|---|---|---|---|
| 1 | MCP Security | Tool description injection → ENV exfiltration | Validation schema with scope enforcement |
| 2 | Prompt Hijacking | Payload splitting across 3 turns (bypasses LlamaGuard) | Semantic drift monitor with cross-turn context |
| 3 | Agentic Security | Recursive tool-call loop → resource exhaustion | Token budget circuit breaker + HITL escalation |
| 4 | RAG Poisoning | Malicious PDF overwrites system prompt | AWS IAM least-privilege scoped to single S3 prefix |
| 5 | Crescendo Attack | 6-turn conversational escalation jailbreak | Cross-turn intent accumulation with LlamaGuard |
| 6 | Lateral Movement | Search→Email→Storage chain abuse | Inter-tool permission boundary enforcement |
| 7 | LLM SSRF | URL-fetching LLM → EC2 metadata credential theft | SSRF-safe HTTP client + IP allowlist |
This dual-axis approach means the model doesn't become purely offensive — it can reason from both sides of the same attack.
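In dataset terms, a pair is just two samples that share a scenario ID and differ in perspective. The field names below are illustrative, not the repo's actual schema:

```python
# Illustrative pairing scheme: one scenario, two perspectives (hypothetical field names)
pair = [
    {
        "scenario": "mcp-tool-poisoning-01",
        "perspective": "red",
        "instruction": "Craft a tool description that exfiltrates ENV variables...",
    },
    {
        "scenario": "mcp-tool-poisoning-01",
        "perspective": "blue",
        "instruction": "Write a validation schema that enforces tool scope...",
    },
]

# Invariants the pairing enforces: same scenario, both sides covered
assert pair[0]["scenario"] == pair[1]["scenario"]
assert {s["perspective"] for s in pair} == {"red", "blue"}
```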
4. Quantisation Decision
Q4_K_M was selected after analysing the quality/size tradeoff at 1.5B scale:
| Format | RAM | Quality | Decision |
|---|---|---|---|
| Q8_0 | ~1.8GB | 99.9% | Too large for 4GB headroom |
| Q4_K_M | ~1.2GB | ~99% | Selected |
| Q4_0 | ~1.0GB | ~97% | Measurable quality loss |
| Q2_K | ~0.7GB | ~90% | Not suitable for reasoning |
At 1.5B parameters, Q4_K_M retains ~99% of full-precision quality. The quality cliff only appears at Q2_K for this model size.
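As a rough cross-check on the RAM column, GGUF file size scales with effective bits-per-weight. The bpw figures below are approximate values for llama.cpp quants and the 1.78B weight count is an estimate (both are assumptions; real files also carry metadata and tokenizer data):

```python
params = 1.78e9  # approximate weight count for a 1.5B-class model (assumption)

# Approximate effective bits-per-weight per llama.cpp quant format (assumed values)
bpw = {"Q8_0": 8.5, "Q4_K_M": 4.85, "Q4_0": 4.55, "Q2_K": 2.63}

for fmt, bits in bpw.items():
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{fmt}: ~{gb:.1f} GB")
```

The estimates land close to the table above, with Q4_K_M comfortably inside a 4GB RAM budget once context and OS overhead are added.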
Training on Free Colab in 30 Minutes
The full pipeline runs on a free Google Colab T4 (15GB VRAM). Unsloth handles the memory efficiency; training uses under 3GB VRAM.
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/deepseek-r1-distill-qwen-1.5b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```
Key hyperparameters:
- Learning rate: 2e-4
- Batch size: 2 (effective 8 with gradient accumulation × 4)
- Epochs: 2
- Checkpoint every 25 steps (crash protection on free Colab sessions)
- Final training loss: 2.66
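Wired into a trainer, those settings look roughly like this. This is a sketch of the standard Unsloth + trl recipe, not the repo's exact script: `model` and `tokenizer` come from the block above, `dataset` is a placeholder for the formatted training set, and newer trl releases move these options into `SFTConfig`:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,          # placeholder: the formatted red/blue dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        num_train_epochs=2,
        learning_rate=2e-4,
        save_steps=25,                  # checkpoint often: free Colab sessions can die
        output_dir="outputs",
    ),
)
trainer.train()
```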
Try It Now: 3 Ways
Ollama (one command, no Python)
```bash
ollama run hf.co/Nguuma/security-slm-unsloth-1.5b
```
Python (llama-cpp-python)
```python
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="Nguuma/security-slm-unsloth-1.5b",
    filename="security-slm-finetuned-deepseek-r1-distill-qwen-1.5b.Q4_K_M.gguf",
    local_dir="./models",
)

llm = Llama(model_path=model_path, n_ctx=2048, n_threads=4, verbose=False)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.",
        },
        {
            "role": "user",
            "content": 'An AI agent received this tool-call response: {"file": "../../../../etc/passwd"}. Is this a path traversal attack? What should the agent do?',
        },
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
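The raw completion includes the <think> block. If you only want the final answer for display while keeping the reasoning for audit logs, a small split helper works; this is my own convenience function, not something shipped with the model:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think> reasoning chain from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()  # model emitted no reasoning block
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\nStep 1 Identify the pattern...\n</think>\n\nYes, this is a path traversal attack."
)
print(answer)  # Yes, this is a path traversal attack.
```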
Prompt format (for any inference engine)
```text
<|im_start|>system
You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.
<|im_end|>
<|im_start|>user
Your question here
<|im_end|>
<|im_start|>assistant
<think>
```
Always open the assistant turn with <think>; this triggers the reasoning chain.
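For engines that take a raw string instead of chat messages, the template can be assembled directly. The helper name is mine; the pre-seeded <think> at the end of the assistant turn is the part that matters:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble the ChatML-style prompt format above, pre-opening the <think> block."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"
    )

prompt = build_prompt(
    "You are a Cybersecurity assistant with Blue and Red team security reasoning. "
    "Think step by step before answering.",
    "Is ../../etc/passwd in a tool response a path traversal attempt?",
)
# The assistant turn must end open, mid-reasoning, so generation continues the chain
assert prompt.endswith("<|im_start|>assistant\n<think>\n")
```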
What It's Good At
- Analysing suspicious logs and tool-call responses for attack patterns
- Drafting detection rules (Sigma, YARA, KQL) from attack descriptions
- Reasoning through MCP and agentic attack surfaces
- Walking through CVE-analogous scenarios step by step
- Generating incident response playbook outlines
- CTF challenge reasoning with explained steps
What It's Not
- Not a general security encyclopedia; it's a specialist
- Not a substitute for a professional pentest
- Not trained on every CVE; highly specific CVE details may be wrong
What's Next
Areas I want to expand:
- DPO alignment pairs: chosen/rejected samples to reduce hallucination on specific CVE numbers
- Multi-turn adversarial chains: full 5-turn attack simulations with attacker/defender roles
- Framework-specific coverage: LangChain, AutoGen, CrewAI, MCP server implementations
- Higher LoRA rank (r=32): more capacity for complex multi-step reasoning
If you work in security and want to contribute scenarios or feedback on the threat coverage, open an issue on the HuggingFace repo or drop a comment below.
Links
- HuggingFace model: Nguuma/security-slm-unsloth-1.5b
- Unsloth (made the training possible on free hardware): github.com/unslothai/unsloth
Built on free infrastructure. Runs on commodity hardware. Stays on your machine.