Srijan Kumar

Posted on Jun 11

Protocol 404: Incident Response When a Breach Hits

#ai #security

The clock starts before you know it's running

By the time an alert fires (or worse, a user reports something odd), an attacker may already have had weeks of dwell time. The median dwell time before detection is still a double-digit number of days in many enterprise environments. That context matters because it reframes everything: your incident response isn't just a race to eject an intruder, it's a forensic operation that begins in the middle of a crime scene.

This article outlines a disciplined IR sequence oriented toward practitioners — SOC analysts, incident commanders, and security engineers who need procedural clarity under pressure, not reassurance.

Phase 1: Triage — Don't Call It Before You Confirm It

Alert fatigue is real. Before escalating to a P1 incident, validate that unauthorized access or data exposure actually occurred. SIEM correlation rules misfire. EDR detections hit on legitimate admin tooling. A false declaration wastes IR capacity and poisons the post-incident timeline.

Verification checklist:

Cross-reference endpoint telemetry (EDR/XDR) against SIEM alerts to check for independent corroboration
Pull relevant auth logs — look for anomalous geolocations, impossible travel, or service account behavior outside baseline
Check CASB or DLP telemetry for unusual data movement or exfil staging patterns
Confirm whether affected assets hold regulated data (PII, PHI, PCI); this determines downstream notification obligations

Declare an incident when two or more independent signals corroborate unauthorized activity, or when a single high-fidelity indicator (e.g., confirmed C2 beacon, verified credential exfiltration) is sufficient on its own.
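The declaration threshold can be written down as a small rule so that it is applied the same way at 3 a.m. as in a tabletop. This is a minimal sketch, not a product feature: the `Signal` class and its fields are hypothetical, and "independent" is approximated here as "comes from a different telemetry source".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str                  # hypothetical labels, e.g. "edr", "siem", "auth_logs", "dlp"
    description: str
    high_fidelity: bool = False  # e.g. confirmed C2 beacon, verified credential exfiltration


def should_declare(signals: list[Signal]) -> bool:
    """Return True when the signals meet the declaration threshold."""
    # A single high-fidelity indicator is sufficient on its own.
    if any(s.high_fidelity for s in signals):
        return True
    # Otherwise require corroboration from at least two distinct sources.
    return len({s.source for s in signals}) >= 2


if __name__ == "__main__":
    observed = [
        Signal("siem", "impossible-travel login for a service account"),
        Signal("edr", "credential-dumping tool executed on the same host"),
    ]
    print(should_declare(observed))
```

Two alerts from the same SIEM rule count as one source in this sketch, which is the point: repetition is not corroboration.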

Phase 2: Contain — Surgical, Not Scorched Earth

Containment strategy depends on whether your priority is disruption minimization or forensic preservation. In most environments, you can't fully maximize both — make that tradeoff explicit before you act.

Tactical containment options, roughly ordered by invasiveness:

Network-level: Null-route suspicious IPs or C2 infrastructure at the perimeter, or cut affected hosts off with EDR network isolation. Segment affected VLANs. Block outbound traffic for compromised host subsets if full isolation isn't feasible.

Identity plane: Immediately revoke compromised credentials, OAuth tokens, and API keys. Rotate service account secrets. Enforce step-up MFA on high-privilege accounts even if they haven't been confirmed as targets. If you're on an IdP with session management, terminate active sessions.

Host-level: Isolate, don't wipe. Disk images and memory dumps are your forensic artifacts, and a reimaged endpoint is a destroyed evidence chain. Use your EDR's network isolation mode rather than pulling the cable: isolation keeps the agent's telemetry channel open, which preserves logging continuity.

Cloud/SaaS environments: Revoke cloud credentials (IAM roles, access keys), disable affected service principals, and snapshot affected instances before containment actions modify state.
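One way to make the ordering explicit is to sort the planned actions so that evidence capture always runs before anything that changes state, and the rest follows from least to most invasive. The sketch below is illustrative only; `ContainmentAction` and the invasiveness scores are assumptions you would replace with your own playbook values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainmentAction:
    name: str
    invasiveness: int        # 1 = least disruptive to the business (assumed scale)
    captures_evidence: bool  # True for snapshots, memory dumps, disk images


def plan_containment(actions):
    """Order actions: evidence capture first, then ascending invasiveness."""
    # False sorts before True, so `not captures_evidence` puts capture steps first.
    return sorted(actions, key=lambda a: (not a.captures_evidence, a.invasiveness))


if __name__ == "__main__":
    plan = plan_containment([
        ContainmentAction("Disable affected service principal", 3, False),
        ContainmentAction("Snapshot affected instances", 1, True),
        ContainmentAction("EDR network isolation", 2, False),
    ])
    for step, action in enumerate(plan, start=1):
        print(step, action.name)
```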

Phase 3: Preserve Evidence — Chain of Custody Matters

If this incident results in regulatory scrutiny, litigation, or criminal referral, evidence integrity will be tested. Don't treat this as bureaucratic overhead.

Preserve:

Full system and application logs (ensure log forwarding to your SIEM was intact — log deletion is a common attacker TTP)
Raw disk images of affected hosts (forensic copy, not logical backup)
Memory dumps if malware is suspected to be fileless or living-in-memory
Network packet captures if available (full PCAP from a TAP/SPAN is ideal; NetFlow is acceptable if PCAP isn't available)
Cloud audit trails: AWS CloudTrail, Azure Activity Log, Google Cloud Audit Logs. Copy these to immutable storage immediately, as retention windows are finite
Malware samples with provenance (hash, filepath, timestamps)

Establish chain of custody documentation for anything that may become forensic evidence. If you're engaging outside counsel or a forensics firm, do it now, since attorney-client privilege may apply to the investigation.
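A chain of custody record is only useful if each artifact can later be shown to be unchanged, which in practice means hashing it at collection time. The following is a minimal sketch using Python's standard `hashlib`; the `custody_entry` function and its field names are hypothetical, not a legal or tooling standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handled_by: str, action: str) -> dict:
    """Build one custody log entry (illustrative field names)."""
    return {
        "file": str(path),
        "sha256": sha256_file(path),
        "size_bytes": path.stat().st_size,
        "handled_by": handled_by,
        "action": action,  # e.g. "acquired", "transferred", "analyzed"
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Re-hashing the artifact at every handoff and appending a new entry gives a record in which any modification shows up as a hash mismatch.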

Phase 4: Scope the Blast Radius

This is the analytically demanding phase. You're building a complete picture of what was compromised, how access was maintained, and what data left the environment.

Key questions to answer:

Initial access vector — phishing, credential stuffing, exploitation of a public-facing application, supply chain compromise, insider threat? Your containment in Phase 2 is incomplete if you haven't closed the initial vector.

Lateral movement — where did the attacker pivot from the initial foothold? Review Kerberoasting indicators, Pass-the-Hash activity, LSASS access, and remote service creation logs (Event ID 7045 on Windows). Map the blast radius to specific systems.

Persistence mechanisms — scheduled tasks, registry run keys, WMI subscriptions, webshells, OAuth app grants with excessive permissions, rogue MFA devices, or cloud backdoor accounts. Persistence survives a password reset if you haven't found it.

Data access and exfiltration — correlate DLP alerts, DNS query volumes, and cloud egress metrics. Look for staging activity: large file archives created in unusual directories, access to backup repositories, email forwarding rules added to executive mailboxes.

Dwell time — review log sources for earliest indicators of compromise. The first alert is rarely the first foothold.
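The lateral movement and dwell time questions above can be sketched together: sweep exported service-installation events (Event ID 7045) for suspicious binaries, then measure the gap between the earliest hit and the detection time. This assumes events have already been exported to dictionaries with `EventID`, `ImagePath`, and a `datetime` in `TimeCreated`; that export shape and the marker list are assumptions for illustration, not a vetted detection rule.

```python
from datetime import datetime

# Illustrative substrings only; tune to your environment.
SUSPICIOUS_MARKERS = ("\\temp\\", "\\appdata\\", "powershell", "cmd.exe", "%comspec%", "admin$")


def suspicious_service_installs(events):
    """Return 7045 events whose ImagePath contains a marker, oldest first."""
    hits = []
    for ev in events:
        if str(ev.get("EventID")) != "7045":
            continue
        image_path = str(ev.get("ImagePath", "")).lower()
        if any(marker in image_path for marker in SUSPICIOUS_MARKERS):
            hits.append(ev)
    return sorted(hits, key=lambda ev: ev["TimeCreated"])


def dwell_time(indicator_times, detected_at):
    """Return (earliest indicator, elapsed time until detection)."""
    earliest = min(indicator_times)
    return earliest, detected_at - earliest


if __name__ == "__main__":
    events = [
        {"EventID": 7045, "Computer": "fs-01", "ServiceName": "updater",
         "ImagePath": r"C:\Windows\Temp\svc.exe", "TimeCreated": datetime(2024, 5, 2, 3, 14)},
        {"EventID": 7045, "Computer": "fs-01", "ServiceName": "vendor-agent",
         "ImagePath": r"C:\Program Files\Vendor\agent.exe", "TimeCreated": datetime(2024, 4, 1, 8, 0)},
    ]
    hits = suspicious_service_installs(events)
    earliest, dwell = dwell_time([ev["TimeCreated"] for ev in hits], datetime(2024, 5, 20, 9, 0))
    print(earliest, dwell.days)
```

In a real investigation the indicator list would draw on every log source, not just one event ID, since the first alert is rarely the first foothold.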

Phase 5: Notification — Legal Obligations, Not Optional Communication

Notification timelines are governed by regulation, not by when you feel ready. Get legal and compliance in the room before the investigation is complete, not after.

Key frameworks to know:

GDPR: 72-hour notification window to supervisory authority from awareness of a breach involving EU residents
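HIPAA Breach Notification Rule: Affected individuals must be notified without unreasonable delay and no later than 60 days after discovery; breaches affecting 500 or more individuals must be reported to HHS within the same window, while smaller breaches can be reported to HHS annually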
SEC Cybersecurity Disclosure Rules (2023): Material cybersecurity incidents must be disclosed on Form 8-K within four business days of determining materiality
State breach notification laws: Vary significantly; many require notice to affected residents "without unreasonable delay", and a growing number set fixed outer limits in the range of 30 to 60 days. Hour-scale deadlines such as 72 hours usually come from sector regulators (e.g., NYDFS 23 NYCRR Part 500), not from general state breach statutes

Internal stakeholders to engage immediately: CISO, CTO, General Counsel, communications/PR, and executive leadership if the incident is material. Don't let the IR team operate as an information silo: missing a notification deadline because legal wasn't looped in is an entirely avoidable secondary incident.
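The fixed windows among the frameworks above (72 hours, four business days, 60 days) are easy to miscount under pressure, so it helps to compute them the moment the triggering event is logged. The sketch below is an assumption-laden aid, not legal advice: the function names are hypothetical, business days are treated as Monday through Friday with no holiday calendar, and counsel should confirm which trigger date applies.

```python
from datetime import date, datetime, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by N business days (Mon-Fri only; holidays are NOT handled)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def notification_deadlines(gdpr_aware_at: datetime,
                           sec_materiality_on: date,
                           hipaa_discovered_on: date) -> dict:
    """Return the outer deadline for each fixed-window framework."""
    return {
        "gdpr_supervisory_authority": gdpr_aware_at + timedelta(hours=72),
        "sec_form_8k": add_business_days(sec_materiality_on, 4),
        "hipaa_outer_limit": hipaa_discovered_on + timedelta(days=60),
    }


if __name__ == "__main__":
    deadlines = notification_deadlines(
        gdpr_aware_at=datetime(2024, 3, 7, 9, 0),
        sec_materiality_on=date(2024, 3, 7),
        hipaa_discovered_on=date(2024, 3, 7),
    )
    for name, due in deadlines.items():
        print(name, due)
```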

Phase 6: Eradication — Remove Everything, Not Just What You Found

Containment stops active harm. Eradication removes attacker presence from the environment. These are distinct steps that require distinct verification.

Eradication checklist:

Patch or remediate the exploited vulnerability — CVE number, affected versions, and patch availability should be documented
Remove all identified malware, webshells, and unauthorized binaries; validate with your EDR that no remnants exist
Destroy all identified persistence mechanisms (scheduled tasks, registry keys, added accounts, OAuth grants, forwarding rules)
Rotate all credentials that had any exposure, not just the confirmed-compromised ones; assume any secret the attacker could have reached is burned
Audit and tighten IAM policies, firewall rules, and trust relationships that the attacker leveraged
Review and revoke any anomalous external access grants (third-party integrations, vendor accounts)

Verify eradication before moving to recovery. A common failure mode is declaring eradication complete while an undetected webshell or secondary backdoor account remains active.
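Verification can be framed as a comparison between what the investigation identified and what a fresh sweep still finds. The sketch below assumes each artifact (webshell path, scheduled task name, account, OAuth grant) has been reduced to a string identifier; the function names and identifier format are hypothetical.

```python
def remaining_artifacts(identified: set[str], rescan_findings: set[str]) -> dict:
    """Compare known attacker artifacts against a post-eradication sweep."""
    return {
        # Known artifacts that the sweep still sees: removal failed or was skipped.
        "still_present": sorted(identified & rescan_findings),
        # Findings nobody had identified: scoping was incomplete.
        "newly_found": sorted(rescan_findings - identified),
    }


def eradication_verified(identified: set[str], rescan_findings: set[str]) -> bool:
    """True only when the sweep finds nothing at all."""
    result = remaining_artifacts(identified, rescan_findings)
    return not result["still_present"] and not result["newly_found"]


if __name__ == "__main__":
    known = {"task:UpdaterSvc", "webshell:/var/www/html/up.php", "account:svc-backup2"}
    sweep = {"account:svc-backup2", "oauth:mail-sync-helper"}
    print(remaining_artifacts(known, sweep))
    print(eradication_verified(known, sweep))
```

A non-empty `newly_found` list is the more serious result, because it means the scoping phase has to be reopened before recovery starts.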

Phase 7: Recovery — Validate Before You Restore Service

Restoring from backup to a still-compromised environment simply provides the attacker with a clean machine. Sequence matters.

Recovery process:

Restore from a verified clean backup predating the initial access event (your earlier dwell time analysis should establish the safe restoration point)
Validate system integrity before reconnecting to the production network — check file hashes, review startup items, and run endpoint scans
Reintroduce systems to a monitored network segment first; treat them as potentially hostile until behavioral baselines re-establish
Implement enhanced monitoring during the recovery window — elevated log verbosity, additional detection rules tuned to the attacker's observed TTPs
Conduct parallel testing for business-critical applications before full traffic restoration
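Choosing the restoration point in the first step is a simple comparison once the earliest indicator of compromise is known. This is a minimal sketch with a hypothetical `pick_restore_point` helper; the one-day safety margin is an assumption to account for uncertainty in the earliest-indicator estimate, and the chosen backup still needs integrity validation before use.

```python
from datetime import datetime, timedelta


def pick_restore_point(backup_times, earliest_indicator, safety_margin=timedelta(days=1)):
    """Return the newest backup taken before (earliest indicator - margin), or None."""
    cutoff = earliest_indicator - safety_margin
    candidates = [taken_at for taken_at in backup_times if taken_at < cutoff]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    backups = [
        datetime(2024, 4, 28, 2, 0),
        datetime(2024, 5, 1, 2, 0),
        datetime(2024, 5, 2, 2, 0),   # inside the margin, so it is rejected
        datetime(2024, 5, 10, 2, 0),  # taken after initial access
    ]
    print(pick_restore_point(backups, earliest_indicator=datetime(2024, 5, 2, 3, 14)))
```

A return value of `None` means no retained backup predates the intrusion, in which case a rebuild from known-good sources is the safer path.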

Phase 8: Post-Incident Review — The Intelligence Product

A well-run PIR produces actionable security improvements, not a report that gets filed and forgotten. Structure it around three outputs:

Technical findings: Root cause analysis, full attack timeline, attacker TTPs mapped to MITRE ATT&CK, and gaps in detection coverage that allowed dwell time to accumulate.

Process findings: Where did the IR playbook hold up and where did it fail? Were escalation paths clear? Were notification deadlines nearly missed? Were forensic artifacts destroyed during containment?

Remediation roadmap: Prioritized list of security improvements with owners and timelines — detection gaps, architectural weaknesses, policy updates, training requirements. This is the artifact that justifies the incident cost to leadership.
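The link between the technical findings and the roadmap can be made mechanical: record each timeline event with its ATT&CK tactic and technique ID plus whether any control alerted at the time, then list the techniques that went unseen. The `TimelineEvent` structure below is a hypothetical sketch, and the sample events are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    when: datetime
    tactic: str        # ATT&CK tactic name, e.g. "Credential Access"
    technique_id: str  # ATT&CK technique ID, e.g. "T1558.003"
    summary: str
    detected: bool     # did any control alert on this when it happened?


def detection_gaps(timeline):
    """Group undetected technique IDs by tactic, in chronological order."""
    gaps = defaultdict(list)
    for event in sorted(timeline, key=lambda e: e.when):
        if not event.detected and event.technique_id not in gaps[event.tactic]:
            gaps[event.tactic].append(event.technique_id)
    return dict(gaps)


if __name__ == "__main__":
    timeline = [
        TimelineEvent(datetime(2024, 5, 2), "Initial Access", "T1566", "phishing email opened", False),
        TimelineEvent(datetime(2024, 5, 4), "Credential Access", "T1558.003", "Kerberoasting", False),
        TimelineEvent(datetime(2024, 5, 20), "Exfiltration", "T1041", "exfil over C2 channel", True),
    ]
    print(detection_gaps(timeline))
```

Each entry in the result is a candidate line item for the remediation roadmap, with an owner and a date attached.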

Run the PIR within two weeks of incident closure while operational memory is intact. Include everyone who touched the incident — SOC, IR, legal, communications, and affected system owners.

For Individuals: Fast-Moving Checklist

If you're notified that your personal data was involved in a third-party breach:

Change passwords on the affected service immediately; also rotate on any accounts where you reused that credential
Enable TOTP-based or hardware-key MFA where available — SMS-based MFA is better than nothing but vulnerable to SIM-swapping
Place a credit freeze at all three major bureaus (Equifax, Experian, TransUnion) — not just a fraud alert, which is weaker
Review recent auth activity on sensitive accounts (email, banking, identity providers)
Monitor for spear-phishing attempts using data harvested from the breach — attackers frequently use breach data to craft targeted follow-on attacks

Closing: Preparation Is the Leverage Point

Incident response capability is not a function of how skilled your team is in the moment — it's a function of how much work was done before the moment arrived. Tested playbooks, preserved evidence pipelines, pre-negotiated retainer agreements with forensics firms, documented escalation trees, and regular tabletop exercises are what determine whether a breach is a manageable operational event or a cascading failure.

The NIST Cybersecurity Framework and CISA's Incident Response Recommendations are solid baselines. Map your own procedures to them, test against realistic scenarios, and update after every real incident.

Data breaches are not anomalies in modern enterprise environments. Mature security programs treat them as expected events to be contained efficiently — not catastrophes to be avoided through optimism.

DEV Community