Slavyan Donev

Posted on May 11 • Edited on May 26

How I Built an AI-Powered Alert Triage System — Now with MCP Architecture

#agents #ai #automation #cybersecurity

The Problem

Every SOC analyst and MSP team I've talked to has the same complaint:

"We get 200 alerts a day. Maybe 10 are real. But someone has to check all 200."

That's alert fatigue. And it's not a small problem — the average analyst spends 3-5 hours daily on manual triage. Most of that time is wasted on false positives.

I decided to build something to fix this. Two weeks later, I had a working MVP. Then I went a step further and refactored it with Model Context Protocol (MCP) — Anthropic's open standard for connecting AI agents to external tools. Here's exactly how I built it.

The Architecture (v2 — MCP Edition)

The original system had the agent calling tools directly. The new architecture introduces an MCP server as a modular tool layer:

Alert Input (Defender/SentinelOne/JSON)
        ↓
Alert Normalizer
        ↓
LangGraph Triage Agent
  ├── Enrich Node  ──► MCP Client
  ├── Analyze Node        ↓
  └── Human-in-the-Loop  MCP Server
        ↓             ├── virustotal_check()
Output (Risk Score    ├── mitre_lookup()
+ Slack + Audit Log)  └── slack_notify()

Why MCP? Instead of hardcoding tool calls inside the agent, MCP separates them into a dedicated server. The agent doesn't care how VirusTotal works — it just calls a tool by name and gets a result. This makes the system modular, testable, and easy to extend.

Step 1: Alert Normalizer

The first challenge: every security tool outputs alerts in a different format. Defender looks different from SentinelOne, which looks different from a generic SIEM.

I built a normalizer that takes any alert format and converts it to a single internal structure:

from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalizedAlert:
    alert_id: str
    source: str          # defender / sentinelone / generic
    severity: str        # Low / Medium / High / Critical
    title: str
    timestamp: str
    mitre_technique: Optional[str]
    hostname: Optional[str]
    username: Optional[str]
    source_ip: Optional[str]
    raw: dict            # Original alert for audit

This means the rest of the system doesn't care where the alert came from. It always works with the same format.
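As a rough illustration, a normalizer for the generic case might look like the sketch below. The input keys (`id`, `severity`, `title`, and so on), the `SEVERITY_MAP` table, the "Medium" default for unknown severities, and the `normalize_generic` helper are all assumptions made for this example, not the project's actual code or any vendor's real payload schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NormalizedAlert:
    alert_id: str
    source: str
    severity: str
    title: str
    timestamp: str
    mitre_technique: Optional[str]
    hostname: Optional[str]
    username: Optional[str]
    source_ip: Optional[str]
    raw: dict


# Hypothetical mapping from vendor severity labels to the internal scale.
SEVERITY_MAP = {
    "informational": "Low",
    "low": "Low",
    "medium": "Medium",
    "high": "High",
    "critical": "Critical",
}


def normalize_generic(raw: dict, source: str = "generic") -> NormalizedAlert:
    """Map a raw alert dict onto NormalizedAlert; the input keys are assumed."""
    # Unknown or missing severities fall back to "Medium" (an assumption).
    severity = SEVERITY_MAP.get(str(raw.get("severity", "")).lower(), "Medium")
    return NormalizedAlert(
        alert_id=str(raw.get("id", "unknown")),
        source=source,
        severity=severity,
        title=raw.get("title", "Untitled alert"),
        timestamp=raw.get("timestamp", ""),
        mitre_technique=raw.get("mitre_technique"),
        hostname=raw.get("hostname"),
        username=raw.get("username"),
        source_ip=raw.get("source_ip"),
        raw=raw,  # keep the original payload for the audit log
    )


sample = {
    "id": 42,
    "severity": "HIGH",
    "title": "Suspicious PowerShell",
    "timestamp": "2025-05-11T10:00:00Z",
    "source_ip": "203.0.113.7",
}
alert = normalize_generic(sample)
```

A Defender or SentinelOne variant would follow the same shape, with its own key mapping in front.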

Step 2: LangGraph State Machine

I used LangGraph to build the agent as a state machine. Each step in the triage process is a separate node:

from typing import Optional, TypedDict

class TriageState(TypedDict):
    alert: dict
    enrichment: Optional[dict]
    risk_score: Optional[int]
    risk_level: Optional[str]
    explanation: Optional[str]
    recommendation: Optional[str]
    needs_human: Optional[bool]
    error: Optional[str]

The graph flows like this:

enrich → analyze → [human_review if score >= 70] → format_output

Why LangGraph instead of a simple chain? Because real triage isn't linear. You need conditional routing — a Critical alert should follow a different path than a Low one. LangGraph makes this explicit and debuggable.
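The conditional step comes down to a small routing function. Below is a minimal sketch, assuming the node names from the flow above; the function name `route_after_analyze` and the trimmed-down state are hypothetical. In LangGraph, a function like this is the kind of callable you would hand to `add_conditional_edges` on the analyze node.

```python
from typing import Optional, TypedDict


class TriageState(TypedDict, total=False):
    # Trimmed-down version of the state, enough to show routing.
    alert: dict
    risk_score: Optional[int]
    needs_human: Optional[bool]


HUMAN_REVIEW_THRESHOLD = 70  # from the flow above: human_review if score >= 70


def route_after_analyze(state: TriageState) -> str:
    """Return the name of the next node based on the risk score."""
    score = state.get("risk_score") or 0  # a missing or None score counts as 0
    if score >= HUMAN_REVIEW_THRESHOLD:
        return "human_review"
    return "format_output"
```

Keeping the decision in one pure function means the routing can be unit tested without running the whole graph.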

Step 3: The MCP Server (New in v2)

This is the biggest architectural change. All three tools (two for enrichment, one for notification) are now exposed via a FastMCP server:

# mcp-server/server.py
from mcp.server.fastmcp import FastMCP
from tools.virustotal import check_ip
from tools.mitre import get_technique_summary
from tools.slack_notifier import send_alert_notification

mcp = FastMCP("Alert Triage MCP Server")

@mcp.tool()
def virustotal_check(ip: str) -> str:
    """Checks an IP address in VirusTotal and returns reputation data."""
    result = check_ip(ip)
    return f"IP: {result.ip} | Malicious: {result.malicious_votes} | Known bad: {result.is_known_bad}"

@mcp.tool()
def mitre_lookup(technique_id: str) -> str:
    """Looks up a MITRE ATT&CK technique by ID (e.g. T1059.001)."""
    return get_technique_summary(technique_id)

@mcp.tool()
def slack_notify(alert_id: str, risk_score: int, ...) -> str:
    """Sends a Slack notification for a critical alert."""
    # The remaining parameters and the construction of triage_result are elided in this excerpt.
    success = send_alert_notification(triage_result)
    return "Sent" if success else "Failed"

if __name__ == "__main__":
    mcp.run()

The agent connects to this server via an MCP client wrapper:

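# agents/mcp_tools.py
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
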
async def _call_tool(tool_name: str, args: dict) -> str:
    server_params = StdioServerParameters(
        command="python", args=["mcp-server/server.py"]
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(tool_name, args)
            return result.content[0].text

def virustotal_check(ip: str) -> str:
    return asyncio.run(_call_tool("virustotal_check", {"ip": ip}))

def mitre_lookup(technique_id: str) -> str:
    return asyncio.run(_call_tool("mitre_lookup", {"technique_id": technique_id}))

The result: The LangGraph agent no longer imports tools directly. It goes through MCP — clean separation of concerns.
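Because the tools return plain strings, the agent side needs to turn them back into structured data before scoring. Here is a minimal sketch of that step, assuming the exact `IP: ... | Malicious: ... | Known bad: ...` format produced by the server code above. The helper name `parse_virustotal_result` is hypothetical.

```python
def parse_virustotal_result(text: str) -> dict:
    """Split 'Key: value | Key: value' pairs into a typed dict."""
    fields = {}
    for part in text.split("|"):
        # partition on the first colon only, so IPv6 addresses stay intact
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return {
        "ip": fields.get("IP", ""),
        "malicious_votes": int(fields.get("Malicious", "0") or 0),
        "is_known_bad": fields.get("Known bad", "False") == "True",
    }


parsed = parse_virustotal_result("IP: 203.0.113.7 | Malicious: 4 | Known bad: True")
```

Returning JSON from the tool instead of a formatted string would make this parsing step unnecessary; that is a design choice to weigh, not something the original code does.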

Step 4: Enrichment Tools

Before the LLM sees the alert, two tools run automatically via MCP:

VirusTotal IP Lookup

Why this matters: An alert marked "Low severity" came in for SSH login attempts. The source IP had 4 malicious votes on VirusTotal. The system automatically escalated it to High. Without enrichment, that alert would have been ignored.
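One way to express that escalation is a rule that raises the severity floor once enough engines flag the IP. This is a sketch under assumptions: the threshold of 3 votes and the function name `escalate_severity` are invented for the example, chosen only so that the 4-vote case above lands on High.

```python
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


def escalate_severity(severity: str, malicious_votes: int, threshold: int = 3) -> str:
    """Raise severity to at least High when enough engines flag the IP."""
    if malicious_votes < threshold:
        return severity  # not enough evidence, keep the original severity
    current = SEVERITY_ORDER.index(severity)
    floor = SEVERITY_ORDER.index("High")
    # Never lower an alert: Critical stays Critical.
    return SEVERITY_ORDER[max(current, floor)]
```

For example, `escalate_severity("Low", 4)` returns `"High"`, while `escalate_severity("Low", 0)` leaves the alert at `"Low"`.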

MITRE ATT&CK Context

Instead of hitting an API for every request, I built a local database of the most common techniques:

MITRE_DB = {
    "T1059.001": MitreTechnique(
        "T1059.001", "PowerShell", "Execution",
        "Adversaries use PowerShell to execute commands, often with encoded payloads...",
        "high"
    ),
    "T1486": MitreTechnique(
        "T1486", "Data Encrypted for Impact (Ransomware)", "Impact",
        "Adversary encrypts data to disrupt availability...",
        "high"
    ),
}

This context goes directly into the LLM prompt — giving the model real knowledge about what each technique means and how dangerous it is.
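The `MitreTechnique` type and the lookup function are not shown above, so here is one plausible shape for them. The field names are guesses based on the positional arguments in `MITRE_DB`, and the fallback message for unknown IDs is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MitreTechnique:
    # Field names are assumed from the argument order used in MITRE_DB.
    technique_id: str
    name: str
    tactic: str
    description: str
    severity: str


MITRE_DB = {
    "T1059.001": MitreTechnique(
        "T1059.001", "PowerShell", "Execution",
        "Adversaries use PowerShell to execute commands, often with encoded payloads...",
        "high",
    ),
}


def get_technique_summary(technique_id: str) -> str:
    """Return a one-line summary for the prompt, or a fallback for unknown IDs."""
    technique = MITRE_DB.get(technique_id.strip().upper())
    if technique is None:
        return f"{technique_id}: not in local MITRE database"
    return (
        f"{technique.technique_id} {technique.name} "
        f"(tactic: {technique.tactic}, severity: {technique.severity}): "
        f"{technique.description}"
    )
```

An explicit fallback matters here: the model should be told that a technique is unknown rather than receive an empty string it might read as "harmless".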

Step 5: The LLM Analysis

The Triage Agent sends the enriched alert to Groq (Llama 3.3 70B) with a structured prompt that returns JSON:

{
  "risk_score": 95,
  "risk_level": "Critical",
  "explanation": "The source IP is flagged as MALICIOUS by 17 VirusTotal engines...",
  "recommendation": "Block IP immediately and isolate the device.",
  "needs_human": true
}

Key design decision: temperature 0.1. Security analysis needs consistency, not creativity.
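Even at low temperature, the model can return malformed or out-of-range JSON, so it is worth validating the reply before trusting it. A minimal sketch, assuming the response fields shown above; the function name `parse_llm_verdict` and the choice to fail over to human review are assumptions for this example.

```python
import json

VALID_LEVELS = {"Low", "Medium", "High", "Critical"}


def parse_llm_verdict(raw_text: str) -> dict:
    """Parse the model's JSON reply; fall back to human review on bad output."""
    try:
        data = json.loads(raw_text)
        # Clamp the score into the 0-100 range.
        score = max(0, min(100, int(data["risk_score"])))
        level = data["risk_level"]
        if level not in VALID_LEVELS:
            raise ValueError(f"unexpected risk_level: {level!r}")
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so it lands here too.
        return {
            "risk_score": None,
            "risk_level": None,
            "explanation": f"Unparseable model output: {exc}",
            "recommendation": "Review manually.",
            "needs_human": True,
        }
    return {
        "risk_score": score,
        "risk_level": level,
        "explanation": str(data.get("explanation", "")),
        "recommendation": str(data.get("recommendation", "")),
        "needs_human": bool(data.get("needs_human", score >= 70)),
    }
```

Failing toward `needs_human: True` keeps a broken model reply from silently dropping an alert.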

Step 6: Human-in-the-Loop

For any alert with risk score >= 70, the MCP slack_notify tool fires a formatted Slack notification. AI assists — humans decide on critical actions.
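The notification itself can be as simple as a JSON body with a `text` field, which is what Slack incoming webhooks accept. The sketch below only builds that body; the function name `build_slack_payload` and the message wording are assumptions, not the project's actual formatter.

```python
def build_slack_payload(alert_id: str, risk_score: int, risk_level: str,
                        recommendation: str) -> dict:
    """Build the JSON body for a Slack incoming webhook."""
    text = (
        f":rotating_light: *{risk_level} alert* `{alert_id}` "
        f"(risk score {risk_score}/100)\n"
        f"Recommended action: {recommendation}\n"
        "Awaiting analyst decision."
    )
    return {"text": text}


payload = build_slack_payload("A-1", 95, "Critical", "Block IP and isolate the device.")
```

The message states a recommendation and explicitly waits for an analyst, which matches the principle that the system advises and a human decides.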

Step 7: REST API with FastAPI

@router.post("/triage", response_model=TriageResponse)
def triage_alert(alert_request: AlertRequest):
    normalized = normalize_alert(alert_request.model_dump(exclude_none=True))
    result = run_triage(normalized)
    return TriageResponse(...)

Microsoft Defender can now send a webhook to POST /triage and get back a full analysis in ~3 seconds.
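For local testing, a webhook call can be simulated with the standard library. This sketch prepares the request without sending it. The URL, the helper name `build_triage_request`, and the payload fields are assumptions, since the `AlertRequest` schema is not shown here.

```python
import json
import urllib.request

# Hypothetical local address; adjust to wherever the API is served.
TRIAGE_URL = "http://localhost:8000/triage"


def build_triage_request(alert: dict, url: str = TRIAGE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the /triage endpoint."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_triage_request({
    "title": "SSH login attempts",
    "severity": "Low",
    "source_ip": "203.0.113.7",
})
# To actually send it: urllib.request.urlopen(request, timeout=10)
```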

Real Results

Two highlights from running 6 sample alerts through the system:

A "Low severity" SSH alert was escalated to High because VirusTotal flagged the source IP (4 malicious votes)
A data exfiltration alert scored 95/100 Critical — destination IP had 17 VirusTotal votes, known Tor exit node used for C2

Tech Stack

Component	Technology
Agent framework	LangGraph
LLM	Groq — Llama 3.3 70B (free tier)
Tool layer	MCP — Model Context Protocol
Threat intel	VirusTotal API (free tier)
ATT&CK mapping	Local MITRE database
Notifications	Slack Webhooks
API	FastAPI

Total cost for MVP: $0

Key Lessons

MCP separates tools from agents — your agent becomes a thin client, tools become reusable services
Enrich before you analyze — LLM without real threat intel is just guessing
LangGraph over simple chains — conditional routing requires a proper state machine
Human-in-the-Loop is not optional — never automate critical security decisions
Start with the data — understanding real alerts before coding saved hours

Currently looking for MSP and SOC teams for a free 2-week pilot.

If your team deals with alert fatigue — comment below or DM me.

GitHub: [alert-triage-mvp] | Built with LangGraph + MCP + Groq
