DEV Community: Mahipal Mahipal

I mapped 754 cybersecurity skills to 5 frameworks so your AI agent doesn't have to wing it

Mahipal Mahipal — Mon, 06 Apr 2026 12:43:30 +0000

AI agents are everywhere in 2026. They write code, triage alerts,
analyze logs, scan infrastructure. But ask one to investigate a
suspicious memory dump or hunt for C2 beaconing and it improvises.
No structure. No framework alignment. No verification steps.

That's the gap I've been working on.

What I built

Anthropic Cybersecurity Skills is an open-source library of 754
structured cybersecurity skills for AI agents. Every skill is a
self-contained directory:
skills/performing-memory-forensics-with-volatility3/
├── SKILL.md            ← YAML frontmatter + step-by-step workflow
├── references/
│   ├── standards.md    ← framework mappings
│   └── workflows.md    ← deep technical procedures
├── scripts/
│   └── process.py      ← functional helper scripts
└── assets/
    └── template.md     ← report templates

Each SKILL.md has YAML frontmatter for agent discovery and a
structured Markdown body for execution. The design is built around
progressive disclosure — irrelevant skills cost ~30 tokens to scan,
relevant ones provide complete expert-level guidance.

v1.2.0 — the five-framework release

Today I shipped the update I've been working toward since launch.
754 skills now mapped to 5 industry frameworks simultaneously.

Framework	Skills mapped	What it covers
MITRE ATT&CK Enterprise	754 / 754	Adversary tactics and techniques
NIST CSF 2.0	754 / 754	Cybersecurity risk management
MITRE ATLAS v5.5	81	AI/ML adversarial threats
MITRE D3FEND v1.3	139	Defensive countermeasures
NIST AI RMF 1.0	85	AI risk management

As far as I know, no other open-source skill library maps to all five at once.

Why five frameworks?

Each one serves a different audience and a different question.

ATT&CK answers: what technique is the adversary using?

NIST CSF 2.0 answers: which risk management function does
this skill address? (Identify, Protect, Detect, Respond, Recover,
or the new Govern function)

MITRE ATLAS answers: if the target is an AI or ML system,
which adversarial technique applies? Model poisoning, prompt
injection, supply chain compromise, escape-to-host from an
agentic container: several of these have no direct ATT&CK
equivalent. The last two ATLAS releases, through v5.5, added
agentic AI techniques.

D3FEND answers: what do you actually DO to defend against it?
ATT&CK maps attacks. D3FEND maps the 267 countermeasures that
stop them. A skill like detecting suspicious PowerShell execution
now tells your agent: this counters T1059.001, and here are the
D3FEND defensive techniques (D3-EWF, D3-PSA) that apply.

NIST AI RMF answers: where does this fit in the AI risk
lifecycle? With the EU AI Act's full requirements going live
August 2 and Colorado's AI Act citing NIST AI RMF as legal
safe harbor, this mapping matters right now.

What the frontmatter looks like

name: detecting-prompt-injection-attacks
description: >-
  Detect and prevent prompt injection attacks against LLM
  applications, AI agents, and chatbot interfaces. Covers
  direct injection, indirect injection via retrieved content,
  jailbreak detection, and input validation strategies.
domain: cybersecurity
subdomain: ai-security
tags: [prompt-injection, ai-security, llm, T1059.001]
frameworks:
  mitre-attack: [T1059.001, T1078]
  nist-csf: [DE.CM-01, DE.AE-02]
  mitre-atlas: [AML.T0017, AML.T0051]
  mitre-d3fend: [D3-IDA, D3-ODA]
  nist-ai-rmf: [MEASURE-2.7, GOVERN-6.1]

Five framework fields. One skill. Zero manual mapping required.
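
A consumer of these fields may want to sanity-check them before trusting them. The sketch below is one way to do that. The regular expressions are assumptions inferred from the ID shapes in the example frontmatter, not official grammars published by MITRE or NIST, and `check_frameworks` is a hypothetical helper that is not part of the repository.

```python
import re

# Assumed ID shapes, inferred from the example frontmatter (not official grammars).
ID_PATTERNS = {
    "mitre-attack": re.compile(r"T\d{4}(\.\d{3})?"),
    "nist-csf": re.compile(r"[A-Z]{2}\.[A-Z]{2}-\d{2}"),
    "mitre-atlas": re.compile(r"AML\.T\d{4}(\.\d{3})?"),
    "mitre-d3fend": re.compile(r"D3-[A-Z0-9]+"),
    "nist-ai-rmf": re.compile(r"(GOVERN|MAP|MEASURE|MANAGE)-\d+\.\d+"),
}


def check_frameworks(frameworks):
    """Return (framework, id) pairs that do not match the assumed shape."""
    problems = []
    for name, ids in frameworks.items():
        pattern = ID_PATTERNS.get(name)
        for item in ids:
            # Unknown framework names are reported along with malformed IDs.
            if pattern is None or not pattern.fullmatch(item):
                problems.append((name, item))
    return problems


example = {
    "mitre-attack": ["T1059.001", "T1078"],
    "nist-csf": ["DE.CM-01", "DE.AE-02"],
    "mitre-atlas": ["AML.T0017", "AML.T0051"],
    "mitre-d3fend": ["D3-IDA", "D3-ODA"],
    "nist-ai-rmf": ["MEASURE-2.7", "GOVERN-6.1"],
}
print(check_frameworks(example))
```

A shape check like this only catches typos. It does not confirm that an ID exists in the framework or that the mapping is semantically right.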

What's in the 754 skills

26 security domains. The top ones by skill count:

Cloud Security (60) — AWS S3 audits, Azure AD review, GCP IAM
Threat Hunting (55) — C2 beaconing, DNS tunneling, LOTL detection
Threat Intelligence (50) — APT attribution, campaign analysis, IOC enrichment
Web App Security (42) — HTTP smuggling, XSS, deserialization
Network Security (40) — Wireshark analysis, Suricata tuning, VLAN segmentation
Malware Analysis (39) — Ghidra, YARA, .NET decompilation
Digital Forensics (37) — Volatility3, disk imaging, browser artifacts

Plus OT/ICS, container security, zero trust, API security,
DevSecOps, mobile, cryptography, red teaming, and more.

How agents actually use this

Your agent scans frontmatters first (~30 tokens each). When a
skill matches the task, it loads the full SKILL.md and references.
Here's what happens when a user says "check this memory dump for
credential theft":

1. Agent scans 754 frontmatters → finds 12 relevant skills
2. Loads top matches, including performing-memory-forensics-with-volatility3
3. Follows the structured Volatility3 workflow
4. Maps findings to ATT&CK T1003 (OS Credential Dumping)
5. References D3FEND D3-PSMD for defensive recommendations
6. Outputs structured findings with framework references

No improvisation. No hallucinated tool flags. Structured output
with framework alignment baked in.
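
The scan-then-load flow can be sketched roughly as follows. This is an illustration only: the keyword-overlap scoring, the `rank_skills` helper, and the two-entry catalog are assumptions made for the example. A real agent would more likely let the model itself judge relevance from the frontmatter.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_skills(query, catalog, top_n=3):
    """Score each skill by how many query words appear in its name, description, or tags."""
    query_words = tokenize(query)
    scored = []
    for skill in catalog:
        haystack = " ".join([skill["name"], skill["description"], " ".join(skill["tags"])])
        score = len(query_words & tokenize(haystack))
        if score > 0:
            scored.append((score, skill["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Tiny hypothetical catalog; only frontmatter-level fields are held in memory.
catalog = [
    {
        "name": "performing-memory-forensics-with-volatility3",
        "description": "Analyze volatile memory dumps using Volatility 3.",
        "tags": ["forensics", "memory-forensics", "volatility"],
    },
    {
        "name": "detecting-prompt-injection-attacks",
        "description": "Detect and prevent prompt injection attacks against LLM applications.",
        "tags": ["prompt-injection", "ai-security", "llm"],
    },
]

matches = rank_skills("check this memory dump for credential theft", catalog)
print(matches)
```

Only the matched skill names come back. Loading the full SKILL.md for each match is a separate step, which is what keeps the scan cheap.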

Install

npx skills add mukul975/Anthropic-Cybersecurity-Skills

Works with Claude Code, GitHub Copilot, OpenAI Codex CLI, Cursor,
Gemini CLI, and any MCP-compatible agent.

Contributing

Apache 2.0. PRs reviewed within 48 hours. The easiest first
contribution is adding MITRE ATT&CK technique IDs to the 74
incident-response skills that still need mapping — see Issue #1.

The repo hit 4,100 stars in a few weeks entirely from community
sharing. If this solves a problem you've been working around,
a star helps others find it.

github.com/mukul975/Anthropic-Cybersecurity-Skills

How 734+ Cybersecurity Skills Make AI Agents Stop Hallucinating Security Procedures

Mahipal Mahipal — Fri, 20 Mar 2026 11:35:11 +0000

Last week an engineer on our team asked an AI agent to perform memory forensics on a RAM dump from a compromised workstation. The agent confidently ran volatility -f memory.dmp imageinfo, produced a plausible-looking profile match, then suggested deleting the original memory dump to "free up disk space for the analysis output."

That single recommendation would have destroyed the chain of custody. The entire case -- potential litigation, regulatory reporting, insurance claims -- gone. Not because the model was stupid, but because it had no structured understanding of forensic procedure. It pattern-matched its way to a command that looked right, then filled the gap with a hallucinated best practice that any first-year DFIR analyst would reject on sight.

This is not an edge case. AI agents hallucinate security procedures constantly. They invent Nmap flags that do not exist. They suggest Splunk queries with fields from the wrong sourcetype. They recommend chmod 777 as a troubleshooting step. And in security, a wrong step is not just inefficient -- it can be destructive, illegal, or both.

I built a database of 611 structured cybersecurity skills to solve this. It is open source, follows the agentskills.io standard, and you can plug it into any AI agent today.

Why General-Purpose LLMs Fail at Security

Large language models are trained on internet-scale text. They have seen security documentation, blog posts, CTF writeups, and Stack Overflow threads. But they have never executed a forensic investigation. They do not understand that memory acquisition must happen before analysis, that evidence integrity requires hash verification at every step, or that you never modify the original artifact.

The failure mode is specific: LLMs produce outputs that are syntactically correct but procedurally wrong. The commands look real. The tool names are right. But the sequencing, the preconditions, the verification steps -- these are where hallucinations hide. A model might suggest running windows.hashdump before confirming the OS profile, or pipe malfind output directly to a file on the evidence drive, contaminating the source.

The agentskills.io standard solves this with structure. A skill is a directory containing a SKILL.md file (YAML frontmatter plus markdown instructions), optional automation scripts, and reference documentation. Each skill defines explicit prerequisites, ordered workflow steps, verification criteria, and tool-specific commands. When an agent loads a skill, it gets the complete procedural context -- not a probabilistic guess at what might come next.

This is retrieval-augmented generation applied to operational procedures. Instead of hoping the model remembers the right sequence, you give it the sequence. On covered tasks the room for hallucination shrinks sharply, because the agent is following a verified playbook rather than generating one from scratch.

Anatomy of a Skill: Memory Forensics with Volatility 3

Let me walk through one skill in full detail so you can see what structured procedural knowledge looks like. This is performing-memory-forensics-with-volatility3.

The SKILL.md Frontmatter

---
name: performing-memory-forensics-with-volatility3
description: >
  Analyze volatile memory dumps using Volatility 3 to extract running
  processes, network connections, loaded modules, and evidence of
  malicious activity.
domain: cybersecurity
subdomain: digital-forensics
tags:
  - forensics
  - memory-forensics
  - volatility
  - ram-analysis
  - malware-detection
  - incident-response
version: "1.0"
author: mahipal
license: Apache-2.0
---

Every field is machine-parseable. An agent can filter by domain, subdomain, or tag to find the right skill for the task at hand. The description tells the agent when this skill applies.

The Workflow

The skill defines seven sequential steps. Here is the core forensic sequence:

Step 2 -- Identify the OS profile:

vol -f /cases/case-2024-001/memory/memory.raw windows.info

Step 3 -- Enumerate processes and detect anomalies:

# List all running processes
vol -f memory.raw windows.pslist | tee /cases/analysis/pslist.txt

# Detect hidden processes using cross-view analysis
vol -f memory.raw windows.psscan | tee /cases/analysis/psscan.txt

# Check for process hollowing and injection
vol -f memory.raw windows.malfind | tee /cases/analysis/malfind.txt

Step 4 -- Network connections and registry:

vol -f memory.raw windows.netscan | grep ESTABLISHED
vol -f memory.raw windows.registry.printkey \
  --key "Software\Microsoft\Windows\CurrentVersion\Run"

Step 5 -- Extract credentials:

vol -f memory.raw windows.hashdump
vol -f memory.raw windows.lsadump

Step 6 -- YARA scanning:

vol -f memory.raw yarascan --yara-file /opt/yara-rules/malware_index.yar

Notice what the skill prevents: the agent will not skip OS identification (step 2) and jump to credential extraction (step 5). It will not delete the source image. Where output is saved, it is teed to a separate analysis directory, /cases/analysis/, never to the evidence directory, which preserves evidence integrity.
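
The integrity rule behind this can be checked mechanically. The sketch below hashes the image once at acquisition and compares later. It is a minimal illustration, not code from the skill: the function names are hypothetical, and a throwaway temporary file stands in for a real memory image.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large memory images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_unchanged(path, baseline_hash):
    """Return True if the file still matches the hash recorded at acquisition."""
    return sha256_of(path) == baseline_hash


# Demonstration on a throwaway file standing in for a memory image.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "memory.raw"
    image.write_bytes(b"\x00" * 4096)
    baseline = sha256_of(image)
    intact_before = evidence_unchanged(image, baseline)
    image.write_bytes(b"\x01" * 4096)  # simulate accidental modification
    intact_after = evidence_unchanged(image, baseline)

print(intact_before, intact_after)
```

Re-running the comparison after each analysis step gives the agent a simple way to detect that it has touched the original artifact.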

The Automation Script

Each skill includes a scripts/agent.py that wraps the workflow into executable automation:

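import re
import subprocess
from pathlib import Path


class MemoryForensicsAgent:
    def __init__(self, memory_dump, output_dir):
        self.memory_dump = memory_dump
        self.output_dir = Path(output_dir)

    def _run_vol(self, plugin):
        """Run one Volatility 3 plugin and capture its output (assumed helper, not shown in the excerpt)."""
        proc = subprocess.run(["vol", "-f", str(self.memory_dump), plugin],
                              capture_output=True, text=True)
        return {"output": proc.stdout, "returncode": proc.returncode}

    def detect_anomalies(self):
        """Compare pslist vs psscan to find hidden processes."""
        pslist = self._run_vol("windows.pslist")
        psscan = self._run_vol("windows.psscan")
        pslist_pids = set(re.findall(r"^\s*(\d+)\s", pslist["output"], re.MULTILINE))
        psscan_pids = set(re.findall(r"^\s*(\d+)\s", psscan["output"], re.MULTILINE))
        hidden = psscan_pids - pslist_pids
        # PIDs are captured as strings, so sort numerically.
        return {"hidden_pids": sorted(hidden, key=int), "hidden_count": len(hidden)}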
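
To see what the cross-view comparison does, here is a standalone sketch of the same set difference applied to captured text. The two sample outputs are made up for illustration and heavily simplified (real Volatility 3 output has more columns), and the `find_hidden_pids` name is hypothetical.

```python
import re

PID_AT_LINE_START = re.compile(r"^\s*(\d+)\s", re.MULTILINE)


def find_hidden_pids(pslist_output, psscan_output):
    """Return PIDs seen by the pool scan (psscan) but missing from the linked list (pslist)."""
    listed = set(PID_AT_LINE_START.findall(pslist_output))
    scanned = set(PID_AT_LINE_START.findall(psscan_output))
    return sorted(scanned - listed, key=int)


# Fabricated, simplified sample output for illustration only.
pslist_sample = """PID\tPPID\tImageFileName
4\t0\tSystem
620\t4\tsmss.exe
1180\t620\tsvchost.exe
"""
psscan_sample = """PID\tPPID\tImageFileName
4\t0\tSystem
620\t4\tsmss.exe
1180\t620\tsvchost.exe
3344\t1180\tupdater.exe
"""

print(find_hidden_pids(pslist_sample, psscan_sample))
```

A PID that appears only in the scan is a lead, not a verdict: terminated processes can also show up in psscan, so the result still needs analyst review.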

This is not a wrapper around a chat prompt. It is deterministic code that executes the forensically sound procedure every time.

MITRE ATT&CK Mapping

This skill maps to real ATT&CK techniques that the forensic workflow is designed to detect:

T1055 -- Process Injection (Defense Evasion, Privilege Escalation): Detected by windows.malfind, which examines VAD permissions and memory content for injected code
T1003 -- OS Credential Dumping (Credential Access): Covered by windows.hashdump, which recovers SAM password hashes (T1003.002), and windows.lsadump, which recovers LSA secrets (T1003.004), both from registry hives held in memory
T1059.001 -- PowerShell (Execution): Detected by windows.cmdline, which extracts command-line arguments revealing script execution
T1014 -- Rootkit (Defense Evasion): Detected by comparing windows.modules vs windows.modscan to find hidden kernel drivers

The mapping is bidirectional. Given an ATT&CK technique ID, you can find which skills detect it. Given a skill, you know which adversary behaviors it covers.
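
The reverse lookup is just an inverted index. The sketch below shows the idea under stated assumptions: the first entry uses a subset of the technique IDs listed above, while the second skill name is hypothetical and exists only to show two skills sharing a technique.

```python
from collections import defaultdict


def build_technique_index(skill_to_techniques):
    """Invert a skill -> techniques mapping into technique -> sorted list of skills."""
    index = defaultdict(set)
    for skill, techniques in skill_to_techniques.items():
        for technique in techniques:
            index[technique].add(skill)
    return {technique: sorted(skills) for technique, skills in index.items()}


# The second skill name is hypothetical, added only for the demonstration.
skill_to_techniques = {
    "performing-memory-forensics-with-volatility3": ["T1055", "T1059.001", "T1014"],
    "example-powershell-hunting-skill": ["T1059.001"],
}

technique_index = build_technique_index(skill_to_techniques)
print(technique_index["T1059.001"])
print(technique_index["T1014"])
```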

Quick Start: Three Ways to Install

Option 1: npx (agentskills.io compatible)

npx skills add mukul975/Anthropic-Cybersecurity-Skills

Option 2: Claude Code / Cursor Plugin Marketplace

/plugin marketplace add mukul975/Anthropic-Cybersecurity-Skills

Option 3: Git Clone

git clone https://github.com/mukul975/Anthropic-Cybersecurity-Skills.git
cd Anthropic-Cybersecurity-Skills/skills/
ls | head -20

Using a Skill with Your Agent

Once installed, an agent can load a skill by name. Here is what a typical interaction looks like after the skills are loaded:

User: "Analyze this memory dump from the compromised web server."

Agent (with skill loaded):
  1. Verifying memory dump integrity: sha256sum memory.raw
  2. Identifying OS profile: vol -f memory.raw windows.info
  3. Enumerating processes: vol -f memory.raw windows.pslist
  4. Scanning for hidden processes: vol -f memory.raw windows.psscan
  5. Detecting injected code: vol -f memory.raw windows.malfind
  6. Extracting network connections: vol -f memory.raw windows.netscan
  7. Scanning with YARA rules: vol -f memory.raw yarascan
  8. Generating report to /cases/analysis/

Agent (without skill):
  1. "Let me analyze the memory dump."
  2. volatility -f memory.dmp imageinfo  # Wrong tool version
  3. "I recommend deleting the original file to save space."  # Chain of custody violation

The difference is not marginal. It is the difference between admissible evidence and a compromised investigation.

Coverage Map

The database covers 646 skills across 18 cybersecurity subdomains:

Subdomain	Skills	Key Tools
Cloud Security	60	AWS GuardDuty, Azure Defender, GCP Forseti
Threat Hunting	53	Splunk, Elastic SIEM, YARA, Sigma
Web Application Security	41	Burp Suite, SQLMap, Nikto, OWASP ZAP
Network Security	40	Nmap, Snort, Suricata, Wireshark
Threat Intelligence	39	MISP, STIX/TAXII, Diamond Model
Malware Analysis	39	Ghidra, Cuckoo, PE Studio, Volatility
Digital Forensics	37	Autopsy, Volatility 3, Plaso, Foremost
Security Operations	36	Splunk, QRadar, Sentinel, SOAR
Identity & Access Management	35	Okta, SailPoint, Active Directory
SOC Operations	33	Sigma rules, alert triage, playbooks
Container Security	30	Falco, Aqua, Kubernetes RBAC
Vulnerability Management	25	Nessus, Terraform audit, CIS Benchmarks
Red Teaming	24	Metasploit, Cobalt Strike, BloodHound
DevSecOps	17	Trufflehog, code signing, CI/CD security
Phishing Defense	16	GoPhish, DMARC/DKIM/SPF, header analysis
Endpoint Security	16	osquery, Sysmon, fileless malware detection
OT/ICS Security	14	Modbus, IEC 62443, historian servers
Cryptography	14	Ed25519, TLS analysis, zero-knowledge proofs

ATT&CK coverage is strongest in Defense Evasion (T1055, T1014, T1548), Credential Access (T1003, T1558), Discovery, and Lateral Movement. The threat hunting and SOC operations skills together cover the full detection lifecycle from initial alert through incident closure.

What Comes Next

The database ships under Apache-2.0. Fork it, extend it, ship it with your agent.

Areas where contributions would have the most impact right now:

Mobile security -- currently 5 skills, needs 20+ for adequate coverage
Compliance/governance -- GRC workflows are underrepresented
OT/ICS -- industrial control system skills need protocol-specific depth
Wireless security -- only 1 skill currently

Check the CONTRIBUTING.md for the skill format specification and submission process. If you have operational playbooks that your SOC uses daily, those are exactly the kind of procedures that should become skills.

Star the repo: github.com/mukul975/Anthropic-Cybersecurity-Skills

Mahipal Jangra, M.Sc. Cybersecurity. Building structured knowledge for AI agents so they stop making up security procedures.

How I Built an Open-Source Cybersecurity Skills Database for AI Agents (611+ Skills)

Mahipal Mahipal — Wed, 04 Mar 2026 21:26:03 +0000

AI agents are transforming software engineering. Tools like Claude Code, GitHub Copilot, and Cursor can write code, debug issues, and refactor entire codebases. But ask one to analyze a memory dump from a compromised server, triage a SIEM alert, or assess an Active Directory attack path, and you get generic advice that no security practitioner would follow.

I built an open-source database of 611 cybersecurity skills structured for AI agent consumption. This post explains why, how, and what the skills actually look like.

The Problem: AI Agents Lack Security Expertise

When a security analyst encounters a suspicious process on a compromised Windows host, they don't think in generalities. They immediately:

Check the process tree for parent-child anomalies
Run vol3 -f memory.dmp windows.malfind to detect injected code
Extract suspicious memory regions for YARA scanning
Cross-reference process network connections with known C2 indicators
Check for persistence mechanisms in registry run keys and scheduled tasks

An AI agent without structured security knowledge will tell you to "use a memory forensics tool" and "look for suspicious processes." That gap between generic advice and practitioner-level precision is the problem.

This isn't just about knowledge -- it's about structured, actionable knowledge. AI agents need to know not just WHAT to do, but WHEN to do it, WHICH specific tool to use, and in WHAT order.

Why Existing Solutions Fail

Approach	Problem
Training data (books, blogs)	Unstructured, no activation triggers, no tool-specific commands
RAG over documentation	Tool docs explain features, not workflows. No decision trees.
Prompt engineering	Doesn't scale. You can't encode 611 skills in a system prompt.
Fine-tuning	Expensive, needs retraining for every update, hard to audit
Wiki/cheat sheets	No machine-readable metadata, no activation conditions
Existing skill standards	Focused on human learning objectives, not agent execution

What's needed is a format that gives AI agents two things:

Routing information: When should this skill activate? What keywords, domains, and contexts trigger it?
Execution knowledge: What exact commands, in what order, with what flags, and what to do when things go wrong?

What agentskills.io Enables: Progressive Disclosure Architecture

Each skill follows a two-layer architecture that mirrors how human expertise works:

Layer 1: YAML Frontmatter (The WHEN)

---
name: analyzing-memory-dumps-with-volatility
description: >
  Analyzes RAM memory dumps from compromised systems using the Volatility
  framework to identify malicious processes, injected code, network
  connections, loaded modules, and extracted credentials.
domain: cybersecurity
subdomain: malware-analysis
tags: [malware, memory-forensics, Volatility, RAM-analysis, incident-response]
version: 1.0.0
author: mahipal
license: MIT
---

This frontmatter is what gets indexed. When a user asks an AI agent to "check this memory dump for malware," the agent matches against the description and tags, identifies this skill as relevant, and loads the full body.

Layer 2: Markdown Body (The HOW)

The body contains the actual procedure:

When to Use / When Not to Use: Clear activation and exclusion conditions
Prerequisites: Specific tool versions, dependencies, required inputs
Step-by-Step Workflow: Exact commands with flags, expected outputs, decision trees
Validation Steps: How to verify results
References: MITRE ATT&CK techniques, NIST controls, CVE numbers

The progressive disclosure is the key insight: the agent doesn't load 611 full skill bodies into context. It indexes the frontmatter, matches the right skill, and only then loads the detailed procedure.
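
A rough sketch of that split, assuming the usual `---` delimiter lines around the frontmatter. The `split_skill` and `simple_fields` helpers are hypothetical, and the field reader is deliberately naive: it handles only top-level `key: value` lines, so a real implementation would use a proper YAML parser instead.

```python
def split_skill(text):
    """Split SKILL.md text into (frontmatter, body) at the '---' delimiter lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).lstrip("\n")
    return "", text  # no closing delimiter: treat everything as body


def simple_fields(frontmatter):
    """Collect top-level 'key: value' pairs; ignores nested or multi-line values."""
    fields = {}
    for line in frontmatter.splitlines():
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


sample = """---
name: analyzing-memory-dumps-with-volatility
domain: cybersecurity
subdomain: malware-analysis
---
## Workflow
Step 1: identify the operating system.
"""

frontmatter, body = split_skill(sample)
index_entry = simple_fields(frontmatter)  # cheap to keep for every skill
print(index_entry["name"])
print(body.splitlines()[0])
```

The index entry is what stays in context for every skill. The body is read only after a match.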

Skill Taxonomy: 24 Subdomains, 611 Skills

The database covers the full cybersecurity landscape:

Subdomain	Skills	Example Skill
Cloud Security	48	Auditing AWS S3 Bucket Permissions
Threat Intelligence	43	Building Threat Feed Aggregation with MISP
Web Application Security	41	Exploiting Server-Side Request Forgery
Threat Hunting	35	Hunting for C2 Beaconing with Frequency Analysis
Malware Analysis	34	Analyzing Memory Dumps with Volatility
Digital Forensics	34	Performing Timeline Reconstruction with Plaso
SOC Operations	33	Building Detection Rules with Sigma
Network Security	33	Configuring Suricata for Network Monitoring
Identity & Access Management	33	Implementing Privileged Access Management with CyberArk
OT/ICS Security	28	Detecting Modbus Protocol Anomalies
API Security	28	Testing API for Broken Object Level Authorization
Container Security	26	Scanning Container Images with Grype
Vulnerability Management	24	Prioritizing Vulnerabilities with CVSS Scoring
Red Teaming	24	Building C2 Infrastructure with Sliver Framework
Incident Response	24	Conducting Malware Incident Response
Penetration Testing	23	Performing Active Directory Penetration Test
Zero Trust Architecture	17	Implementing BeyondCorp Zero Trust Access Model
Phishing Defense	16	Detecting Business Email Compromise with AI
Endpoint Security	16	Detecting Fileless Malware Techniques
DevSecOps	16	Integrating SAST into GitHub Actions Pipeline
Cryptography	13	Implementing AES Encryption for Data at Rest
Mobile Security	12	Reverse Engineering Android Malware with JADX
Ransomware Defense	5	Implementing Ransomware Backup Strategy
Compliance & Governance	5	Implementing ISO 27001 Information Security Management

Quick Demo: How Claude Code Uses a Skill

Here's a realistic interaction showing how an AI agent uses a skill file:

User prompt:

"I have a memory dump from a potentially compromised Windows server. Can you help me analyze it for malware?"

Without skills: The agent gives a generic overview of memory forensics concepts and suggests "using Volatility."

With the skill loaded: The agent follows the structured workflow:

# Step 1: Identify the OS profile
vol3 -f memory.dmp windows.info

# Step 2: List all processes and check for anomalies
vol3 -f memory.dmp windows.pslist
vol3 -f memory.dmp windows.pstree

# Step 3: Detect process injection / hollowing
vol3 -f memory.dmp windows.malfind

# Step 4: Check network connections for C2
vol3 -f memory.dmp windows.netscan

# Step 5: Extract suspicious DLLs
vol3 -f memory.dmp windows.dlllist --pid <suspicious_pid>

# Step 6: Scan with YARA rules
vol3 -f memory.dmp yarascan.YaraScan --yara-file malware_rules.yar

# Step 7: Extract credentials if needed
vol3 -f memory.dmp windows.hashdump

The agent knows the exact plugin names, the order of operations, what to look for in the output, and how to pivot based on findings. That's the difference between "use Volatility" and actually using Volatility.
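
One way to make the ordering enforceable rather than advisory is to gate each step on the ones before it. The sketch below only builds command lines and never runs them. The step names, the `WORKFLOW` table, and `build_commands` are assumptions for illustration; the plugin names and the `vol3` invocation are taken from the workflow above.

```python
# Ordered steps; names are hypothetical, plugins come from the workflow above.
WORKFLOW = [
    ("identify-os", ["windows.info"]),
    ("list-processes", ["windows.pslist", "windows.pstree"]),
    ("detect-injection", ["windows.malfind"]),
    ("network-connections", ["windows.netscan"]),
]


def build_commands(image_path, completed_steps, requested_step):
    """Return argv lists for requested_step, refusing if an earlier step was skipped."""
    names = [name for name, _ in WORKFLOW]
    if requested_step not in names:
        raise ValueError(f"unknown step: {requested_step}")
    position = names.index(requested_step)
    missing = [name for name in names[:position] if name not in completed_steps]
    if missing:
        raise RuntimeError(f"complete these steps first: {', '.join(missing)}")
    plugins = WORKFLOW[position][1]
    return [["vol3", "-f", image_path, plugin] for plugin in plugins]


commands = build_commands("memory.dmp", {"identify-os"}, "list-processes")
print(commands)

try:
    build_commands("memory.dmp", set(), "detect-injection")
except RuntimeError as error:
    print(error)
```

Returning argument lists instead of shell strings also avoids quoting problems if the image path contains spaces.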

File Structure

Each skill follows a consistent directory structure:

skills/{skill-name}/
  SKILL.md          # Skill definition (YAML frontmatter + Markdown body)
  references/
    standards.md    # NIST, MITRE ATT&CK, CIS references
    workflows.md    # Detailed technical procedure reference
  scripts/
    process.py      # Practitioner helper script
  assets/
    template.md     # Filled-in checklist or report template

The skill definitions themselves are plain Markdown and YAML, with optional Python helper scripts alongside. No build system and no runtime are required to read them. Any tool that can read files can use these skills.

Call for Contributors

The database is MIT licensed and open for contributions. Here's where help is most needed:

Underrepresented subdomains:

Mobile Security (12 skills) -- iOS and Android security testing, mobile malware analysis
Ransomware Defense (5 skills) -- detection, response, recovery procedures
Compliance & Governance (5 skills) -- SOC 2, HIPAA, PCI DSS, GDPR controls

Skill improvements:

Add real-world edge cases to existing skills
Update tool commands for latest versions
Add detection rules (Sigma, YARA, Splunk SPL) where applicable
Improve decision trees for ambiguous scenarios

New skill areas:

AI/ML security (adversarial ML, model security)
Supply chain security
Election security
Healthcare-specific cybersecurity

If you write runbooks or procedure documents for your security team, you already know how to write a skill. The format is intentionally simple.

Repo: github.com/mukul975/Anthropic-Cybersecurity-Skills

The future of cybersecurity involves AI agents that understand the domain with practitioner-level depth. This database is a step toward making that real -- not by replacing security professionals, but by giving AI agents the structured knowledge to be genuinely useful assistants.