Last week an engineer on our team asked an AI agent to perform memory forensics on a RAM dump from a compromised workstation. The agent confidently ran volatility -f memory.dmp imageinfo, produced a plausible-looking profile match, then suggested deleting the original memory dump to "free up disk space for the analysis output."
That single recommendation would have destroyed the chain of custody. The entire case -- potential litigation, regulatory reporting, insurance claims -- gone. Not because the model was stupid, but because it had no structured understanding of forensic procedure. It pattern-matched its way to a command that looked right, then filled the gap with a hallucinated best practice that any first-year DFIR analyst would reject on sight.
This is not an edge case. AI agents hallucinate security procedures constantly. They invent Nmap flags that do not exist. They suggest Splunk queries with fields from the wrong sourcetype. They recommend chmod 777 as a troubleshooting step. And in security, a wrong step is not just inefficient -- it can be destructive, illegal, or both.
I built a database of 611 structured cybersecurity skills to solve this. It is open source, follows the agentskills.io standard, and you can plug it into any AI agent today.
Why General-Purpose LLMs Fail at Security
Large language models are trained on internet-scale text. They have seen security documentation, blog posts, CTF writeups, and Stack Overflow threads. But they have never executed a forensic investigation. They do not understand that memory acquisition must happen before analysis, that evidence integrity requires hash verification at every step, or that you never modify the original artifact.
The failure mode is specific: LLMs produce outputs that are syntactically correct but procedurally wrong. The commands look real. The tool names are right. But the sequencing, the preconditions, the verification steps -- these are where hallucinations hide. A model might suggest running windows.hashdump before confirming the OS profile, or pipe malfind output directly to a file on the evidence drive, contaminating the source.
The agentskills.io standard solves this with structure. A skill is a directory containing a SKILL.md file (YAML frontmatter plus markdown instructions), optional automation scripts, and reference documentation. Each skill defines explicit prerequisites, ordered workflow steps, verification criteria, and tool-specific commands. When an agent loads a skill, it gets the complete procedural context -- not a probabilistic guess at what might come next.
This is retrieval-augmented generation applied to operational procedures. Instead of hoping the model remembers the right sequence, you give it the sequence. The hallucination surface shrinks to near zero on covered tasks because the agent is following a verified playbook, not generating one from scratch.
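In code, this retrieval step can be as simple as reading the matched skill file and prepending it to the task. The sketch below is illustrative, not the library's actual loader; the function name and prompt layout are assumptions:

```python
from pathlib import Path

def build_prompt(skill_dir, task):
    """Hypothetical sketch: prepend the skill's written procedure to the
    task so the model follows a verified playbook instead of improvising."""
    procedure = Path(skill_dir, "SKILL.md").read_text()
    return f"{procedure}\n\n# Task\n{task}\n\nFollow the workflow steps in order."
```

The point is that the procedure enters the context verbatim; nothing about the sequence is left to the model's memory.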
Anatomy of a Skill: Memory Forensics with Volatility 3
Let me walk through one skill in full detail so you can see what structured procedural knowledge looks like. This is performing-memory-forensics-with-volatility3.
The SKILL.md Frontmatter
---
name: performing-memory-forensics-with-volatility3
description: >
  Analyze volatile memory dumps using Volatility 3 to extract running
  processes, network connections, loaded modules, and evidence of
  malicious activity.
domain: cybersecurity
subdomain: digital-forensics
tags:
- forensics
- memory-forensics
- volatility
- ram-analysis
- malware-detection
- incident-response
version: "1.0"
author: mahipal
license: Apache-2.0
---
Every field is machine-parseable. An agent can filter by domain, subdomain, or tag to find the right skill for the task at hand. The description tells the agent when this skill applies.
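To make "machine-parseable" concrete, here is a minimal sketch of indexing frontmatter and filtering by tag. The hand-rolled parser only handles the simple key/value and list syntax shown above; a real implementation would use a YAML library:

```python
def parse_frontmatter(text):
    """Extract the YAML block between the first two '---' delimiters
    into a dict. Handles only flat key: value pairs and '- item' lists."""
    _, block, _ = text.split("---", 2)
    meta, key = {}, None
    for line in block.splitlines():
        if line.startswith("- ") and key:
            meta.setdefault(key, []).append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip().strip('"')
            if value:
                meta[key] = value
    return meta

def find_skills(all_meta, subdomain=None, tag=None):
    """Return names of skills whose metadata matches the given filters."""
    hits = []
    for meta in all_meta:
        if subdomain and meta.get("subdomain") != subdomain:
            continue
        if tag and tag not in meta.get("tags", []):
            continue
        hits.append(meta["name"])
    return hits
```

An agent calling `find_skills(index, tag="memory-forensics")` would get back the Volatility skill above without any free-text guessing.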
The Workflow
The skill defines seven sequential steps. Here is the core forensic sequence:
Step 2 -- Identify the OS profile:
vol -f /cases/case-2024-001/memory/memory.raw windows.info
Step 3 -- Enumerate processes and detect anomalies:
# List all running processes
vol -f memory.raw windows.pslist | tee /cases/analysis/pslist.txt
# Detect hidden processes using cross-view analysis
vol -f memory.raw windows.psscan | tee /cases/analysis/psscan.txt
# Check for process hollowing and injection
vol -f memory.raw windows.malfind | tee /cases/analysis/malfind.txt
Step 4 -- Network connections and registry:
vol -f memory.raw windows.netscan | grep ESTABLISHED
vol -f memory.raw windows.registry.printkey \
--key "Software\Microsoft\Windows\CurrentVersion\Run"
Step 5 -- Extract credentials:
vol -f memory.raw windows.hashdump
vol -f memory.raw windows.lsadump
Step 6 -- YARA scanning:
vol -f memory.raw yarascan --yara-file /opt/yara-rules/malware_index.yar
Notice what the skill prevents: the agent will not skip OS identification (step 2) and jump to credential extraction (step 5). It will not delete the source image. It will tee output to a separate analysis directory, preserving evidence integrity. Every command writes to /cases/analysis/, never to the evidence directory.
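The two integrity rules the skill encodes can be sketched as plain checks: refuse any output path under the evidence directory, and hash the dump before and after analysis to prove it was never modified. A minimal sketch, with illustrative directory names:

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk=1 << 20):
    """Hash a file in chunks so multi-gigabyte images never load into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def safe_output_path(requested, evidence_dir):
    """Reject any output path that would write into the evidence directory."""
    requested = Path(requested).resolve()
    evidence_dir = Path(evidence_dir).resolve()
    if requested == evidence_dir or evidence_dir in requested.parents:
        raise ValueError(f"refusing to write inside evidence dir: {requested}")
    return requested
```

Hashing the image before step 1 and re-hashing after step 7 gives you a one-line integrity statement for the report.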
The Automation Script
Each skill includes a scripts/agent.py that wraps the workflow into executable automation:
import re
import subprocess
from pathlib import Path

class MemoryForensicsAgent:
    def __init__(self, memory_dump, output_dir):
        self.memory_dump = memory_dump
        self.output_dir = Path(output_dir)

    def _run_vol(self, plugin):
        """Minimal stand-in: run a Volatility 3 plugin and capture stdout."""
        proc = subprocess.run(["vol", "-f", str(self.memory_dump), plugin],
                              capture_output=True, text=True)
        return {"output": proc.stdout}

    def detect_anomalies(self):
        """Compare pslist vs psscan to find hidden processes."""
        pslist = self._run_vol("windows.pslist")
        psscan = self._run_vol("windows.psscan")
        pslist_pids = set(re.findall(r"^\s*(\d+)\s", pslist["output"], re.MULTILINE))
        psscan_pids = set(re.findall(r"^\s*(\d+)\s", psscan["output"], re.MULTILINE))
        hidden = psscan_pids - pslist_pids
        return {"hidden_pids": sorted(hidden), "hidden_count": len(hidden)}
This is not a wrapper around a chat prompt. It is deterministic code that executes the forensically sound procedure every time.
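The cross-view comparison at the heart of detect_anomalies can be exercised without a live dump. The plugin output below is a shortened, hand-written stand-in for real pslist/psscan tables, not captured Volatility output:

```python
import re

# psscan carves process objects from raw memory, so it can see a PID
# that pslist (which walks the OS's active-process list) does not.
PSLIST = """\
PID PPID ImageFileName
4 0 System
652 4 lsass.exe
"""
PSSCAN = """\
PID PPID ImageFileName
4 0 System
652 4 lsass.exe
1337 652 implant.exe
"""

def pids(table):
    """Extract the leading PID from each data row of a plugin table."""
    return set(re.findall(r"^\s*(\d+)\s", table, re.MULTILINE))

hidden = pids(PSSCAN) - pids(PSLIST)
print(sorted(hidden))  # → ['1337'], the process visible only to carving
```

A PID present in the carved view but absent from the list walk is the classic signature of a DKOM-unlinked (hidden) process.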
MITRE ATT&CK Mapping
This skill maps to real ATT&CK techniques that the forensic workflow is designed to detect:
- T1055 -- Process Injection (Defense Evasion, Privilege Escalation): detected by windows.malfind, which examines VAD permissions and memory content for injected code
- T1003.001 -- LSASS Memory (Credential Access): detected by windows.hashdump and windows.lsadump, which extract credentials from LSASS process memory
- T1059.001 -- PowerShell (Execution): detected by windows.cmdline, which extracts command-line arguments revealing script execution
- T1014 -- Rootkit (Defense Evasion): detected by comparing windows.modules vs windows.modscan to find hidden kernel drivers
The mapping is bidirectional. Given an ATT&CK technique ID, you can find which skills detect it. Given a skill, you know which adversary behaviors it covers.
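Bidirectionality falls out of a simple inverted index. A sketch, seeded with the four mappings listed above (the data structure is illustrative, not the repository's actual format):

```python
from collections import defaultdict

# Forward mapping: skill name -> ATT&CK technique IDs it detects.
SKILL_TECHNIQUES = {
    "performing-memory-forensics-with-volatility3": [
        "T1055", "T1003.001", "T1059.001", "T1014",
    ],
}

def build_reverse_index(skill_techniques):
    """Invert skill -> techniques into technique -> skills."""
    index = defaultdict(list)
    for skill, techniques in skill_techniques.items():
        for technique in techniques:
            index[technique].append(skill)
    return dict(index)

reverse = build_reverse_index(SKILL_TECHNIQUES)
```

Given an alert tagged T1055, `reverse["T1055"]` immediately names every skill that can investigate it.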
Quick Start: Three Ways to Install
Option 1: npx (agentskills.io compatible)
npx skills add mukul975/Anthropic-Cybersecurity-Skills
Option 2: Claude Code / Cursor Plugin Marketplace
/plugin marketplace add mukul975/Anthropic-Cybersecurity-Skills
Option 3: Git Clone
git clone https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git
cd Anthropic-Cybersecurity-Skills/skills/
ls | head -20
Using a Skill with Your Agent
Once installed, an agent can load a skill by name. Here is what a typical interaction looks like after the skills are loaded:
User: "Analyze this memory dump from the compromised web server."
Agent (with skill loaded):
1. Verifying memory dump integrity: sha256sum memory.raw
2. Identifying OS profile: vol -f memory.raw windows.info
3. Enumerating processes: vol -f memory.raw windows.pslist
4. Scanning for hidden processes: vol -f memory.raw windows.psscan
5. Detecting injected code: vol -f memory.raw windows.malfind
6. Extracting network connections: vol -f memory.raw windows.netscan
7. Scanning with YARA rules: vol -f memory.raw yarascan
8. Generating report to /cases/analysis/
Agent (without skill):
1. "Let me analyze the memory dump."
2. volatility -f memory.dmp imageinfo # Wrong tool version
3. "I recommend deleting the original file to save space." # Chain of custody violation
The difference is not marginal. It is the difference between admissible evidence and a compromised investigation.
Coverage Map
The database's 611 skills span the cybersecurity subdomains below; the 18 largest are shown:
| Subdomain | Skills | Key Tools |
|---|---|---|
| Cloud Security | 60 | AWS GuardDuty, Azure Defender, GCP Forseti |
| Threat Hunting | 53 | Splunk, Elastic SIEM, YARA, Sigma |
| Web Application Security | 41 | Burp Suite, SQLMap, Nikto, OWASP ZAP |
| Network Security | 40 | Nmap, Snort, Suricata, Wireshark |
| Threat Intelligence | 39 | MISP, STIX/TAXII, Diamond Model |
| Malware Analysis | 39 | Ghidra, Cuckoo, PE Studio, Volatility |
| Digital Forensics | 37 | Autopsy, Volatility 3, Plaso, Foremost |
| Security Operations | 36 | Splunk, QRadar, Sentinel, SOAR |
| Identity & Access Management | 35 | Okta, SailPoint, Active Directory |
| SOC Operations | 33 | Sigma rules, alert triage, playbooks |
| Container Security | 30 | Falco, Aqua, Kubernetes RBAC |
| Vulnerability Management | 25 | Nessus, Terraform audit, CIS Benchmarks |
| Red Teaming | 24 | Metasploit, Cobalt Strike, BloodHound |
| DevSecOps | 17 | Trufflehog, code signing, CI/CD security |
| Phishing Defense | 16 | GoPhish, DMARC/DKIM/SPF, header analysis |
| Endpoint Security | 16 | osquery, Sysmon, fileless malware detection |
| OT/ICS Security | 14 | Modbus, IEC 62443, historian servers |
| Cryptography | 14 | Ed25519, TLS analysis, zero-knowledge proofs |
ATT&CK coverage is strongest in Defense Evasion (T1055, T1014, T1548), Credential Access (T1003, T1558), Discovery, and Lateral Movement. The threat hunting and SOC operations skills together cover the full detection lifecycle from initial alert through incident closure.
What Comes Next
The database ships under Apache-2.0. Fork it, extend it, ship it with your agent.
Areas where contributions would have the most impact right now:
- Mobile security -- currently 5 skills, needs 20+ for adequate coverage
- Compliance/governance -- GRC workflows are underrepresented
- OT/ICS -- industrial control system skills need protocol-specific depth
- Wireless security -- only 1 skill currently
Check the CONTRIBUTING.md for the skill format specification and submission process. If you have operational playbooks that your SOC uses daily, those are exactly the kind of procedures that should become skills.
Star the repo: github.com/mukul975/Anthropic-Cybersecurity-Skills
Mahipal Jangra, M.Sc. Cybersecurity. Building structured knowledge for AI agents so they stop making up security procedures.