The GPT Wrapper Problem
Here's a secret the "AI security" industry doesn't want you to know: most products in this space are thin wrappers around commercial LLM APIs. They send prompts like "You are a penetration tester. Analyze this HTTP response for vulnerabilities" to GPT-4 or Claude, parse the output, and call it autonomous pentesting.
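To make that concrete, here's roughly what the wrapper pattern looks like. This is an illustrative sketch, not any specific vendor's code; the prompt, model name, and `analyze_response` helper are hypothetical, and it assumes the OpenAI Python SDK:

```python
# A minimal sketch of the "thin wrapper" pattern described above.
# Hypothetical example, not any vendor's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def analyze_response(http_response: str) -> str:
    """Send a canned 'pentester' prompt to a general-purpose LLM."""
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a penetration tester. Analyze this "
                        "HTTP response for vulnerabilities."},
            {"role": "user", "content": http_response},
        ],
    )
    # The entire "analysis" is whatever free text the model returns.
    return completion.choices[0].message.content
```

That's it. The security expertise lives entirely in one system prompt, and the output is unstructured text someone still has to parse and trust.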
This approach has three fatal flaws.
Flaw 1: Generic Models Hallucinate in Security Contexts
Large language models trained on general internet data will confidently report vulnerabilities that don't exist. They've seen enough security blog posts to know what SQL injection looks like, but they lack the specialized training to distinguish a real vulnerability from a false positive. In security, false positives aren't just annoying — they waste your team's time and erode trust in the tool.
Flaw 2: Prompt Engineering Is Fragile
Prompt-based approaches break when the target doesn't match the template. A carefully crafted prompt for testing REST APIs will fail on GraphQL endpoints. A prompt designed for standard HTML forms won't handle React single-page applications. Real applications are messy, and prompt templates can't handle that messiness.
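A toy example makes the failure mode obvious. The template, field names, and targets below are hypothetical, but the structural problem is real: the template bakes in REST assumptions (one method and path per endpoint, attack surface in query parameters), and a GraphQL API violates all of them.

```python
# Hypothetical illustration of template fragility.
REST_PROMPT = (
    "Test the {method} {path} endpoint. Try tampering with the "
    "{param} query parameter and report any injection points."
)

rest_target = {"method": "GET", "path": "/api/users", "param": "id"}
print(REST_PROMPT.format(**rest_target))  # sensible prompt for REST

# GraphQL exposes a single POST /graphql endpoint, and the attack
# surface lives in the query body, not in query parameters. The
# template has no slot for that, so the prompt it emits misdirects
# the model instead of guiding it.
graphql_target = {"method": "POST", "path": "/graphql", "param": None}
print(REST_PROMPT.format(**graphql_target))
```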
Flaw 3: No Learning Loop
When a prompt-wrapped LLM fails to find a vulnerability, nothing changes. The next engagement uses the same prompts with the same limitations. There is no mechanism for improvement.
VEXT's Approach: Fine-Tuned Offensive Models
VEXT takes a fundamentally different approach. Our agents are purpose-built for offensive security, trained on real exploit data from thousands of security engagements.
What does this mean in practice?
Attack patterns are in the weights, not the prompts. Our injection workers don't need to be told what SQL injection looks like — they have internalized thousands of real injection patterns, bypass techniques, and exploitation chains from training data. This is the difference between reading about swimming and actually knowing how to swim.
The feedback loop is real. Every engagement generates training signal: 326K+ curated examples and growing. Brain v4 retrains continuously via RLAF. DPO alignment runs on pairs of validated findings and false positives. When an agent discovers a new bypass technique, it propagates to all agents within the same run via Redis streams and persists across runs via the VAULT knowledge graph.
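The intra-run propagation step looks roughly like this. This is a simplified sketch, not production code: the stream name and field layout are illustrative, assuming a standard redis-py client.

```python
# Sketch of propagating a discovered bypass to sibling agents via a
# Redis stream. Stream name and fields are illustrative only.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
STREAM = "run:1234:bypasses"  # hypothetical per-run stream

# Producer: an agent publishes a technique the moment it validates one.
r.xadd(STREAM, {
    "worker": "injection-07",
    "technique": "double-url-encoding",
    "target_class": "php-waf",
})

# Consumers: sibling agents block on the stream and fold new entries
# into their working context as they arrive.
last_id = "0"
for _stream, messages in r.xread({STREAM: last_id}, block=1000, count=10):
    for msg_id, fields in messages:
        last_id = msg_id
        print(f"learned bypass: {fields['technique']} ({fields['target_class']})")
```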
The ML stack has three tiers. Brain v4 (6M params, 15ms) handles tool selection via GNN + MCTS. Specialist-7B (7B params, 200ms) handles tool-output parsing and payload generation. Sentry v4 (100B class, 2s) handles complex hypothesis generation and novel exploit reasoning. Training runs in six stages: SFT, DPO, GRPO, RLAF, self-play, and continuous learning.
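In pseudocode, the dispatch principle is: send every task to the cheapest tier that can handle it. The routing rules, task names, and thresholds below are a simplified illustration; only the tiers and their rough budgets come from the description above.

```python
# Sketch of latency-tiered dispatch across the three model tiers.
# Class names and routing rules are hypothetical.
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    params: str
    latency_budget_ms: int


BRAIN = Tier("Brain v4", "6M", 15)              # tool selection (GNN + MCTS)
SPECIALIST = Tier("Specialist-7B", "7B", 200)   # parsing, payload generation
SENTRY = Tier("Sentry v4", "100B class", 2000)  # novel exploit reasoning


def route(task_kind: str) -> Tier:
    """Send each task to the cheapest tier that can handle it."""
    if task_kind == "tool_selection":
        return BRAIN
    if task_kind in ("parse_output", "generate_payload"):
        return SPECIALIST
    return SENTRY  # hypothesis generation and anything novel


tier = route("generate_payload")
print(f"{tier.name} ({tier.params}, ~{tier.latency_budget_ms}ms)")
```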
Why This Matters for Your Security
The difference between a prompted model and a fine-tuned model is the difference between a contractor who read the manual yesterday and an expert who has done the job a thousand times. Both can follow instructions. Only one has intuition.
When your next compliance audit requires a penetration test, ask your vendor one question: are your models trained on real exploit data, or are they prompting a general-purpose LLM? The answer tells you everything you need to know about the quality of findings you'll receive.