The Problem: Security AI Needs to Stay On Your Machine
Every time you paste a suspicious log, a CVE description, or an internal config into a cloud LLM, that data leaves your machine.
For security work (red team engagements, incident response, air-gapped environments), that's a real problem. You can't send client data to a third-party API. You can't pipe internal logs to OpenAI.
But local security models have been terrible. They either:
- Require expensive hardware (A100, 80GB VRAM)
- Don't actually reason; they pattern-match and hallucinate CVE numbers
- Have no training signal for the AI-native threats that actually matter in 2025–2026
So I built one that doesn't have those problems.
What I Built
security-slm-unsloth-1.5b is a fine-tuned DeepSeek-R1-Distill-Qwen-1.5B model that:
- Runs offline on a 4GB RAM laptop, CPU only
- Thinks before it answers: 100% chain-of-thought (<think>) activation
- Covers 2026 AI-native attack classes: MCP tool poisoning, Crescendo jailbreaks, agentic lateral movement, LLM-assisted SSRF
- Ships as a 1.2GB GGUF that fits on a USB drive
- Trained on a free Google Colab T4 in ~30 minutes
Results after fine-tuning:
| Metric | Before | After |
|---|---|---|
| Eval score (/ 10) | 2.2 | 6.0 |
| Improvement | — | +173% |
| <think> block rate | 20–60% | 100% |
| Avg response length | 50–150 words | 200–500 words |
Why DeepSeek-R1-Distill-1.5B?
Most domain fine-tuning guides tell you to pick the biggest model you can fit. That's wrong for this use case.
The key insight: DeepSeek-R1-Distill-Qwen-1.5B is the smallest model that reliably produces structured <think> reasoning chains. That matters for security work because you need the model to show its work; a black-box answer to "is this a path traversal?" is useless. You need the reasoning chain so you can audit it.
The <think> block is not cosmetic. It's the model's reasoning substrate. Strip it and you get pattern-matching, not analysis.
The Technical Approach
1. Reasoning-Preserving Training Format
Every training sample enforces the reasoning structure explicitly:
```json
{
  "instruction": "An AI agent received this tool-call response: {\"file\": \"../../../../etc/passwd\"}. Is this a path traversal attack?",
  "content": "<think>\nStep 1 Identify the pattern: the string '../../../../etc/passwd' is a classic path traversal sequence...\nStep 2 Assess the trust boundary: tool-call responses should be treated as untrusted input...\nStep 3 Determine severity: /etc/passwd exposes system user accounts...\nStep 4 Evaluate agent response options: block, sanitize, or escalate...\nStep 5 Select mitigation: reject the response, log the event, alert the operator...\n</think>\n\nYes, this is a path traversal attack. The sequence '../../../..' attempts to escape the intended directory scope..."
}
```
Minimum 5 reasoning steps per sample. Non-negotiable.
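That constraint is easy to enforce mechanically. Here is a minimal validator sketch; the sample schema matches the JSON above, but the step-counting regex and function name are my own:

```python
import re

def validate_sample(sample: dict, min_steps: int = 5) -> bool:
    """Check that a training sample has a <think> block with enough reasoning steps."""
    content = sample.get("content", "")
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if not match:
        return False  # no reasoning block at all
    # Count explicit "Step N" markers inside the reasoning block only
    steps = re.findall(r"Step \d+", match.group(1))
    return len(steps) >= min_steps

sample = {
    "instruction": "...",
    "content": "<think>\nStep 1 ...\nStep 2 ...\nStep 3 ...\nStep 4 ...\nStep 5 ...\n</think>\n\nYes, ...",
}
print(validate_sample(sample))  # True
```

Run this over the whole dataset before training; any sample that fails gets rewritten, not dropped silently.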
2. Full Projection-Layer LoRA
Most fine-tuning tutorials only target attention projections (q_proj, v_proj). That's not enough for security reasoning; you need to update the feed-forward layers too.
```python
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention
    "gate_proj", "up_proj", "down_proj",      # feed-forward reasoning
]
```
All 7 projection modules, LoRA rank r=16. This modifies ~1% of parameters while injecting domain knowledge into both attention and reasoning pathways.
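The ~1% figure can be sanity-checked by hand: each targeted weight matrix of shape (d_out, d_in) gains r·(d_in + d_out) trainable LoRA parameters. The architecture dimensions below are my assumptions for the Qwen2-1.5B family (verify against the model's config.json):

```python
# Assumed Qwen2-1.5B-class dims: hidden 1536, FFN 8960, 28 layers,
# and a grouped-query KV width of 2 heads x 128 dims = 256
hidden, intermediate, layers, kv_dim = 1536, 8960, 28, 256
r = 16

# (d_in, d_out) for each targeted projection in one transformer block
modules = [
    (hidden, hidden),        # q_proj
    (hidden, kv_dim),        # k_proj
    (hidden, kv_dim),        # v_proj
    (hidden, hidden),        # o_proj
    (hidden, intermediate),  # gate_proj
    (hidden, intermediate),  # up_proj
    (intermediate, hidden),  # down_proj
]

# LoRA adds an (r x d_in) A matrix and a (d_out x r) B matrix per module
lora_params = layers * sum(r * (d_in + d_out) for d_in, d_out in modules)
base_params = 1.78e9  # rough total weight count for a 1.5B-class model
print(f"{lora_params / 1e6:.1f}M trainable (~{100 * lora_params / base_params:.2f}% of base)")
```

Under these assumptions the adapter lands at roughly 18M trainable parameters, which is where the ~1% claim comes from.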
3. Dual-Axis Dataset Design
Every threat scenario is a matched red/blue pair: same attack, both perspectives:
| # | Threat | Red Team | Blue Team |
|---|---|---|---|
| 1 | MCP Security | Tool description injection → ENV exfiltration | Validation schema with scope enforcement |
| 2 | Prompt Hijacking | Payload splitting across 3 turns (bypasses LlamaGuard) | Semantic drift monitor with cross-turn context |
| 3 | Agentic Security | Recursive tool-call loop → resource exhaustion | Token budget circuit breaker + HITL escalation |
| 4 | RAG Poisoning | Malicious PDF overwrites system prompt | AWS IAM least-privilege scoped to single S3 prefix |
| 5 | Crescendo Attack | 6-turn conversational escalation jailbreak | Cross-turn intent accumulation with LlamaGuard |
| 6 | Lateral Movement | Search→Email→Storage chain abuse | Inter-tool permission boundary enforcement |
| 7 | LLM SSRF | URL-fetching LLM → EC2 metadata credential theft | SSRF-safe HTTP client + IP allowlist |
This dual-axis approach means the model doesn't become purely offensive — it can reason from both sides of the same attack.
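In dataset terms, a pair is just two samples that share a scenario ID and differ in perspective. The field names below are illustrative, not the repo's actual schema:

```python
# Illustrative pairing scheme: one scenario, two perspectives (hypothetical field names)
pair = [
    {
        "scenario": "mcp-tool-poisoning-01",
        "perspective": "red",
        "instruction": "Craft a tool description that exfiltrates ENV variables...",
    },
    {
        "scenario": "mcp-tool-poisoning-01",
        "perspective": "blue",
        "instruction": "Write a validation schema that enforces tool scope...",
    },
]

# Invariants the pairing enforces: same scenario, both sides covered
assert pair[0]["scenario"] == pair[1]["scenario"]
assert {s["perspective"] for s in pair} == {"red", "blue"}
```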
4. Quantisation Decision
Q4_K_M was selected after analysing the quality/size tradeoff at 1.5B scale:
| Format | RAM | Quality | Decision |
|---|---|---|---|
| Q8_0 | ~1.8GB | 99.9% | Too large for 4GB headroom |
| Q4_K_M | ~1.2GB | ~99% | Selected |
| Q4_0 | ~1.0GB | ~97% | Measurable quality loss |
| Q2_K | ~0.7GB | ~90% | Not suitable for reasoning |
At 1.5B parameters, Q4_K_M retains ~99% of full-precision quality. The quality cliff only appears at Q2_K for this model size.
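As a rough cross-check on the RAM column, GGUF file size scales with effective bits-per-weight. The bpw figures below are approximate values for llama.cpp quants and the 1.78B weight count is an estimate (both are assumptions; real files also carry metadata and tokenizer data):

```python
params = 1.78e9  # approximate weight count for a 1.5B-class model (assumption)

# Approximate effective bits-per-weight per llama.cpp quant format (assumed values)
bpw = {"Q8_0": 8.5, "Q4_K_M": 4.85, "Q4_0": 4.55, "Q2_K": 2.63}

for fmt, bits in bpw.items():
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{fmt}: ~{gb:.1f} GB")
```

The estimates land close to the table above, with Q4_K_M comfortably inside a 4GB RAM budget once context and OS overhead are added.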
Training on Free Colab in 30 Minutes
The full pipeline runs on a free Google Colab T4 (15GB VRAM). Unsloth handles the memory efficiency; training uses under 3GB VRAM.
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/deepseek-r1-distill-qwen-1.5b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```
Key hyperparameters:
- Learning rate: 2e-4
- Batch size: 2 (effective 8 with gradient accumulation × 4)
- Epochs: 2
- Checkpoint every 25 steps (crash protection on free Colab sessions)
- Final training loss: 2.66
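Wired into a trainer, those settings look roughly like this. This is a sketch of the standard Unsloth + trl recipe, not the repo's exact script: `model` and `tokenizer` come from the block above, `dataset` is a placeholder for the formatted training set, and newer trl releases move these options into `SFTConfig`:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,          # placeholder: the formatted red/blue dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        num_train_epochs=2,
        learning_rate=2e-4,
        save_steps=25,                  # checkpoint often: free Colab sessions can die
        output_dir="outputs",
    ),
)
trainer.train()
```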
Try It Now: 3 Ways
Ollama (one command, no Python)
```bash
ollama run hf.co/Nguuma/security-slm-unsloth-1.5b
```
Python (llama-cpp-python)
```python
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="Nguuma/security-slm-unsloth-1.5b",
    filename="security-slm-finetuned-deepseek-r1-distill-qwen-1.5b.Q4_K_M.gguf",
    local_dir="./models",
)

llm = Llama(model_path=model_path, n_ctx=2048, n_threads=4, verbose=False)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.",
        },
        {
            "role": "user",
            "content": 'An AI agent received this tool-call response: {"file": "../../../../etc/passwd"}. Is this a path traversal attack? What should the agent do?',
        },
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
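The raw completion includes the <think> block. If you only want the final answer for display while keeping the reasoning for audit logs, a small split helper works; this is my own convenience function, not something shipped with the model:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think> reasoning chain from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()  # model emitted no reasoning block
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\nStep 1 Identify the pattern...\n</think>\n\nYes, this is a path traversal attack."
)
print(answer)  # Yes, this is a path traversal attack.
```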
Prompt format (for any inference engine)
```text
<|im_start|>system
You are a Cybersecurity assistant with Blue and Red team security reasoning. Think step by step before answering.
<|im_end|>
<|im_start|>user
Your question here
<|im_end|>
<|im_start|>assistant
<think>
```
Always open the assistant turn with <think>; this triggers the reasoning chain.
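For engines that take a raw string instead of chat messages, the template can be assembled directly. The helper name is mine; the pre-seeded <think> at the end of the assistant turn is the part that matters:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble the ChatML-style prompt format above, pre-opening the <think> block."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n"
    )

prompt = build_prompt(
    "You are a Cybersecurity assistant with Blue and Red team security reasoning. "
    "Think step by step before answering.",
    "Is ../../etc/passwd in a tool response a path traversal attempt?",
)
# The assistant turn must end open, mid-reasoning, so generation continues the chain
assert prompt.endswith("<|im_start|>assistant\n<think>\n")
```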
What It's Good At
- Analysing suspicious logs and tool-call responses for attack patterns
- Drafting detection rules (Sigma, YARA, KQL) from attack descriptions
- Reasoning through MCP and agentic attack surfaces
- Walking through CVE-analogous scenarios step by step
- Generating incident response playbook outlines
- CTF challenge reasoning with explained steps
What It's Not
- Not a general security encyclopedia; it's a specialist
- Not a substitute for a professional pentest
- Not trained on every CVE; highly specific CVE details may be wrong
What's Next
Areas I want to expand:
- DPO alignment pairs: chosen/rejected samples to reduce hallucination on specific CVE numbers
- Multi-turn adversarial chains: full 5-turn attack simulations with attacker/defender roles
- Framework-specific coverage: LangChain, AutoGen, CrewAI, MCP server implementations
- Higher LoRA rank (r=32): more capacity for complex multi-step reasoning
If you work in security and want to contribute scenarios or feedback on the threat coverage, open an issue on the HuggingFace repo or drop a comment below.
Links
- HuggingFace model: Nguuma/security-slm-unsloth-1.5b
- Unsloth (made the training possible on free hardware): github.com/unslothai/unsloth
Built on free infrastructure. Runs on commodity hardware. Stays on your machine.