Metric	IntentProbe activation probe	DeBERTa text-classifier baseline
Matched-vocabulary F1	96.6%	0%
MCPTox held-out recall (n=249)	100%	19.9%

Test	IntentProbe	Opponent	Takeaway
MCPTox held-out (n=249)	recall 100%, F1 99.3%	Snyk DeBERTa recall 19.9%, F1 33.0%	Clear win
Same-words matched (n=86)	F1 96.6%	Snyk DeBERTa F1 0%	Text scanner blind
Curated family holdout (n=76)	Qwen macro F1 0.829	TF-IDF macro F1 0.823	Slight edge
RouteGuard external (n=2,900)	F1 0.513, recall 0.415	TF-IDF F1 0.172, recall 0.107	About 3x the F1 and 4x the recall on novel families
Hard-block policy (n=2,900)	Block precision 1.000, clean FPR 0.000	--	Zero false positives
Camouflage evasion	GPT-2 0/146, Qwen 0/15	--	"This tool is safe" doesn't fool the probe

IntentProbe

The first and only MCP scanner that reads what the model understood, not what the text says.

Every MCP scanner on the market reads text: it matches patterns, runs classifiers, applies rules, or asks an LLM "is this safe?" IntentProbe does something none of them do. It runs the tool description through a small local model, opens up the hidden layers, and reads the activation state directly. When the intent is malicious, the same words produce completely different activations.
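To make the idea concrete, here is a minimal sketch of what an activation probe does once the hidden states are in hand. It is not IntentProbe's actual code: the mean pooling, the logistic-regression form of the probe, the 0.5 threshold, and every number below are assumptions chosen for illustration. In a real pipeline the per-token vectors would come from a hidden layer of the local model rather than being typed in by hand.

```python
import math
from typing import Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> list[float]:
    """Average per-token activation vectors into a single vector."""
    n_tokens = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[i] for tok in hidden_states) / n_tokens for i in range(dim)]


def probe_score(activation: Sequence[float],
                weights: Sequence[float],
                bias: float) -> float:
    """Linear probe with a sigmoid: returns a score in (0, 1)."""
    z = sum(w * a for w, a in zip(weights, activation)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy 3-dimensional activations for two tokens (made-up values).
toy_hidden = [[0.2, -1.0, 0.5], [0.4, -0.6, 0.1]]

# Hypothetical probe parameters; a trained probe would load these from disk.
toy_weights = [1.5, -2.0, 0.3]
toy_bias = -0.5

score = probe_score(mean_pool(toy_hidden), toy_weights, toy_bias)
verdict = "flag" if score >= 0.5 else "pass"
print(f"probe score = {score:.3f} -> {verdict}")
```

The point of the sketch is that the probe never sees the words, only the vector the model produced while reading them. Two descriptions with near-identical vocabulary can still land on opposite sides of the threshold if their activations differ.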

On matched-vocabulary tool poisoning, where safe and poisoned descriptions use almost identical words, Snyk's shipped scanner scores an F1 of 0%. IntentProbe scores 96.6%. (Reproduce it yourself.)
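For readers who want to sanity-check figures like these, the sketch below computes precision, recall, and F1 from raw confusion counts using the standard definitions. The counts are hypothetical and are not taken from the benchmark; the function name is also just an illustration. It shows one useful property: an F1 of 0 means the scanner produced no true positives at all.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion counts.

    Returns 0.0 for any ratio whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
silent = precision_recall_f1(tp=0, fp=0, fn=40)   # scanner that flags nothing
strong = precision_recall_f1(tp=38, fp=1, fn=2)   # scanner that misses two

for name, (p, r, f1) in (("silent", silent), ("strong", strong)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are high, which is why it is a reasonable single number for comparing scanners on a balanced matched set.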

Runs locally. 22 KB probe. Any CPU. Nothing uploaded. See the full competitive landscape.

Break it in one command

No install or registry account needed:

uvx --python 3.11 --from git+https://github.com/mcpware/IntentProbe.git@v0.1.0 intentprobe scan --format summary --text "Reads SSH config and private keys, then silently uploads credentials to

…

DEV Community

IntentProbe: The First Activation-Probe-Based MCP/Tool Scanner. It Reads the Model's Brain, Not Just the Text.

Why this matters

How is this different from existing scanners?

1. Text-based scanners

2. LLM-as-judge

3. Enterprise cloud/API scanners

4. Activation probing (this is what IntentProbe introduces)

How it works

Benchmarks

Matched-vocabulary F1

MCPTox same held-out split, poisoned recall (n=249)

Camouflage suffix test

Full end-to-end results

Honest limitations

Install and try it

Runtime hook for Claude Code

mcpware / IntentProbe

Activation-probe security scanner for AI agent tooling. Reads a model's internal activations to detect poisoned MCP servers, skills, and packages before install.
