ohmygod

Posted on Mar 22

The First 60 Minutes After a DeFi Exploit: A Battle-Tested Incident Response Playbook for 2026

#security #solidity #web3 #defi

The Resolv USR exploit on March 22, 2026 demonstrated what a good incident response looks like: the team detected the unauthorized minting, paused all protocol functions, and began coordinating recovery — all within minutes of the attack. The attacker minted 80 million unbacked USR from a ~$200K deposit, extracted roughly $25M, and crashed USR to $0.25. But the rapid pause prevented what could have been a nine-figure catastrophe.

Most protocols aren't this prepared. When your monitoring dashboard lights up red, the difference between a $25M loss and a $250M loss comes down to what happens in the first 60 minutes.

Here's the playbook, distilled from analyzing every major DeFi incident in 2025-2026.

Minute 0-5: Detection and Triage

Most protocols discover they've been exploited through one of three channels: automated monitoring alerts, community reports on social media, or — worst case — watching their TVL evaporate on a dashboard.

The goal in the first five minutes is confirmation, not investigation. You need to answer one question: Is this an active exploit requiring emergency action?

# Real-time anomaly detection — trigger on ANY of these
class ExploitDetector:
    THRESHOLDS = {
        "tvl_drop_pct": 5.0,          # >5% TVL drop in single block
        "mint_spike_ratio": 10.0,     # Minting >10x normal volume
        "unusual_flow_usd": 500_000,  # >$500K outflow in 5 minutes
        "new_contract_interaction": True,  # Unverified contract calling core functions
        "price_deviation_pct": 15.0,  # Token price deviates >15% from oracle
    }

    def triage(self, alert) -> str:
        """Returns 'CRITICAL' | 'WARNING' | 'MONITOR'"""
        critical_signals = 0
        if alert.tvl_change_pct > self.THRESHOLDS["tvl_drop_pct"]:
            critical_signals += 1
        if alert.mint_volume / alert.avg_mint_volume > self.THRESHOLDS["mint_spike_ratio"]:
            critical_signals += 1
        if alert.outflow_5min_usd > self.THRESHOLDS["unusual_flow_usd"]:
            critical_signals += 1

        if critical_signals >= 2:
            return "CRITICAL"  # → Activate IR immediately
        elif critical_signals == 1:
            return "WARNING"   # → Manual review within 5 min
        return "MONITOR"

Resolv's detection worked because they caught the anomalous minting pattern — 50 million USR from a $100K deposit is impossible under normal conditions. The lesson: your detection rules should encode business logic invariants, not just raw thresholds.

Minute 5-15: Emergency Pause

If triage confirms an active exploit, the single most important action is pausing the protocol. Every second of delay is money leaving.

The Pause Architecture That Actually Works

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import "@openzeppelin/contracts/utils/Pausable.sol";

contract EmergencyPausable is Pausable {
    // Multiple authorized pausers — don't create a single point of failure
    mapping(address => bool) public guardians;
    uint256 public guardianCount;
    uint256 public constant MIN_GUARDIANS = 3;

    // Granular pause — don't kill everything when only minting is compromised
    mapping(bytes4 => bool) public functionPaused;

    event EmergencyPause(address indexed guardian, bytes4 indexed selector, string reason);
    event GuardianAction(address indexed guardian, string action);

    modifier whenFunctionNotPaused(bytes4 selector) {
        require(!functionPaused[selector] && !paused(), "Function paused");
        _;
    }

    /// @notice Any guardian can pause specific functions unilaterally
    /// @dev No multisig required for pause — speed > consensus during incidents
    function emergencyPauseFunction(bytes4 selector, string calldata reason) external {
        require(guardians[msg.sender], "Not guardian");
        functionPaused[selector] = true;
        emit EmergencyPause(msg.sender, selector, reason);
    }

    /// @notice Nuclear option — pause everything
    function emergencyPauseAll(string calldata reason) external {
        require(guardians[msg.sender], "Not guardian");
        _pause();
        emit GuardianAction(msg.sender, reason);
    }

    /// @notice Unpause requires multisig — resuming is more dangerous than pausing
    function unpauseFunction(bytes4 selector) external onlyMultisig {
        functionPaused[selector] = false;
    }
}

Critical design principle: Pausing should be fast and unilateral (any single guardian can pause), while unpausing should be slow and consensus-based (requires multisig). The asymmetry is intentional — false positive pauses cost inconvenience, false negative pauses cost millions.

Solana Emergency Pause Pattern

use anchor_lang::prelude::*;

#[account]
pub struct ProtocolState {
    pub authority: Pubkey,
    pub guardians: Vec<Pubkey>,  // Multiple authorized pausers
    pub paused: bool,
    pub pause_reason: String,
    pub paused_at: i64,
    pub paused_by: Pubkey,
}

pub fn emergency_pause(ctx: Context<EmergencyPause>, reason: String) -> Result<()> {
    let state = &mut ctx.accounts.protocol_state;

    // Any guardian can pause — no multisig delay
    require!(
        state.guardians.contains(&ctx.accounts.guardian.key()),
        ErrorCode::NotGuardian
    );

    state.paused = true;
    state.pause_reason = reason;
    state.paused_at = Clock::get()?.unix_timestamp;
    state.paused_by = ctx.accounts.guardian.key();

    msg!("EMERGENCY PAUSE by {} at {}: {}",
        ctx.accounts.guardian.key(),
        state.paused_at,
        state.pause_reason
    );

    Ok(())
}

Minute 15-30: Contain and Communicate

Once the protocol is paused, you're in containment mode. Three things happen simultaneously:

1. Asset Preservation

Identify all remaining at-risk assets and, if possible, execute a whitehat rescue:

# Whitehat rescue coordination script
import json
from web3 import Web3

class WhitehatRescue:
    def __init__(self, w3: Web3, protocol_abi: dict):
        self.w3 = w3
        self.protocol = w3.eth.contract(abi=protocol_abi)

    def assess_remaining_risk(self, pool_addresses: list) -> dict:
        """Calculate remaining at-risk TVL across all pools"""
        risk_report = {}
        for pool in pool_addresses:
            balance = self.get_pool_balance(pool)
            is_drainable = self.check_vulnerability_reachable(pool)
            risk_report[pool] = {
                "balance_usd": balance,
                "drainable": is_drainable,
                "priority": "CRITICAL" if is_drainable and balance > 100_000 else "MONITOR"
            }
        return risk_report

    def coordinate_rescue(self, risk_report: dict):
        """Prioritize rescue operations by risk level"""
        critical_pools = [
            (addr, info) for addr, info in risk_report.items()
            if info["priority"] == "CRITICAL"
        ]
        # Sort by balance descending — rescue the biggest pools first
        critical_pools.sort(key=lambda x: x[1]["balance_usd"], reverse=True)

        for addr, info in critical_pools:
            print(f"RESCUE NEEDED: {addr} — ${info['balance_usd']:,.0f} at risk")

2. Communication Protocol

Silence during an exploit erodes trust faster than the exploit itself. Your communication timeline:

Time	Channel	Message
+10 min	Twitter/X	"We are aware of an incident. Protocol paused. Funds in active pools are secured. Investigation underway."
+30 min	Discord/Telegram	Detailed initial report: what happened, what's paused, what users should/shouldn't do
+2 hours	Blog post	Technical preliminary: attack vector identified, scope of impact, recovery plan
+24 hours	Full post-mortem	Complete root cause analysis, user impact assessment, compensation plan

3. Evidence Preservation

Before touching anything else, preserve the crime scene:

#!/bin/bash
# incident-evidence.sh — Run immediately after pausing

INCIDENT_DIR="incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$INCIDENT_DIR"

# 1. Capture attacker transactions
echo "Capturing attacker transactions..."
cast tx $ATTACKER_TX --json > "$INCIDENT_DIR/attacker_tx.json"

# 2. Snapshot contract state at exploit block
echo "Snapshotting contract state..."
cast storage $CONTRACT_ADDR --block $EXPLOIT_BLOCK --json > "$INCIDENT_DIR/storage_snapshot.json"

# 3. Trace the attack transaction
echo "Tracing attack..."
cast run $ATTACKER_TX --trace --json > "$INCIDENT_DIR/attack_trace.json"

# 4. Map fund flow
echo "Tracking fund destinations..."
cast logs --from-block $EXPLOIT_BLOCK --to-block latest \
  --address $CONTRACT_ADDR \
  --json > "$INCIDENT_DIR/post_exploit_logs.json"

# 5. Archive everything with timestamp proof
tar czf "$INCIDENT_DIR.tar.gz" "$INCIDENT_DIR"
sha256sum "$INCIDENT_DIR.tar.gz" > "$INCIDENT_DIR.sha256"

echo "Evidence preserved in $INCIDENT_DIR"
echo "SHA256: $(cat $INCIDENT_DIR.sha256)"

Minute 30-60: Root Cause and Recovery Planning

With the protocol paused and evidence preserved, the investigation begins. This is where most teams waste time — they try to understand the entire exploit before acting. You don't need to. You need to understand enough to:

Confirm the pause covers the vulnerability
Identify if other attack vectors exist
Plan the recovery sequence

The Root Cause Analysis Framework

Every DeFi exploit in 2026 falls into one of these categories:

1. VALIDATION FAILURE — Missing/incorrect input checks. Example: Resolv USR — no amount validation between request and completion.

2. AUTHORIZATION BYPASS — Privilege escalation, missing checks. Example: Gondi — missing ownership verification in bundler contract.

3. ECONOMIC MANIPULATION — Oracle, price, liquidity attacks. Example: Venus THE — illiquid collateral price inflation.

4. INFRASTRUCTURE COMPROMISE — Key theft, supply chain, DNS hijack. Example: Step Finance — executive device compromise, key extraction.

5. LOGIC/STATE INCONSISTENCY — Race conditions, reentrancy, async gaps. Example: Solv Protocol — ERC-3525 cross-standard reentrancy.

Categorizing the exploit immediately narrows your recovery approach:

Validation/Authorization failures: Fix the check, deploy patched contract, unpause
Economic manipulation: Adjust parameters (caps, oracle sources), may need governance vote
Infrastructure compromise: Rotate ALL keys and credentials, full security audit before resuming
Logic/state inconsistency: May require state migration or snapshot-and-redeploy

The Pre-Incident Checklist: Build This Before You Need It

The worst time to write your incident response plan is during an incident. Every protocol should have these before launch:

1. Guardian Network

Minimum 3 authorized pausers across different time zones
Each guardian has a hardware wallet dedicated to pause transactions
24/7 coverage — at least one guardian reachable at all times
Monthly pause drills (yes, actually practice pausing on testnet)

2. Monitoring Stack

On-chain monitoring (Forta, Tenderly, or custom)
TVL tracking with anomaly detection
Social media monitoring (Twitter mentions, Discord reports)
Alerting pipeline: PagerDuty/OpsGenie → phone calls, not just Slack messages

3. Communication Templates

Pre-written templates for:

Initial incident acknowledgment
Protocol pause announcement
User action guidance (what to do / not do)
Post-mortem structure
Compensation plan framework

4. Recovery Procedures

Documented upgrade path (proxy admin, timelock bypass for emergencies)
Pre-audited emergency contracts (pause, rescue, migration)
Relationships with security firms for rapid response (Seal 911, SEAL Team)
Legal counsel on retainer for law enforcement coordination

The 16-Point DeFi Incident Response Audit Checklist

Detection (4 points)

[ ] Business logic invariant monitoring (not just threshold alerts)
[ ] Multi-channel detection (on-chain + social + community)
[ ] Alert escalation to phone calls for critical severity
[ ] False positive rate tracked and tuned monthly

Pause Capability (4 points)

[ ] Any single guardian can pause unilaterally
[ ] Granular function-level pause (not just global kill switch)
[ ] Unpause requires multisig consensus
[ ] Pause drills conducted quarterly on testnet

Communication (4 points)

[ ] Pre-written incident templates ready to deploy
[ ] Designated spokesperson identified
[ ] Multi-channel communication plan (Twitter, Discord, blog)
[ ] Legal review process for public statements

Recovery (4 points)

[ ] Emergency upgrade path documented and tested
[ ] Whitehat rescue procedures ready
[ ] Evidence preservation scripts prepared
[ ] Post-mortem template and timeline committed to

Conclusion

The Resolv USR exploit is a perfect case study because it shows both sides: the vulnerability was severe (missing amount validation in an async minting flow), but the response was swift (protocol paused within minutes, preventing further drainage).

DeFi protocols will continue to be exploited. The question isn't if but when. The protocols that survive and maintain user trust are the ones that treat incident response as a first-class engineering discipline — not an afterthought bolted on after the first hack.

Build your IR playbook today. Practice it monthly. When the alert fires at 3 AM, muscle memory beats improvisation every time.

This article is part of the DeFi Security Research series. Follow for weekly analysis of exploits, audit tools, and security best practices across EVM and Solana ecosystems.

DEV Community