DEV Community

ohmygod

How AI-Assisted Whitehats Found Three Lido Vulnerabilities in Three Weeks — Build Your Own Bug Hunting Pipeline

In the first three weeks of March 2026, Lido's Immunefi bug bounty program received something unusual: three separate vulnerability reports, all low-to-moderate severity, all discovered by whitehats using AI-assisted tooling. None were exploited. No funds were at risk. Lido batched them into a single disclosure and scheduled fixes via an upcoming Aragon omnibus vote.

The interesting part isn't the bugs themselves — it's how they were found.

Lido's own assessment pointed to a shift: whitehats are increasingly leveraging AI tools to explore "complex interaction patterns, edge cases, and cross-component authorization gaps that might be deprioritized in manual reviews." In other words, the bugs that slip through traditional audits are exactly the ones AI is getting good at finding.

This article walks through the practical AI-augmented bug hunting pipeline that's producing these results in 2026 — and how you can build one yourself.

Why AI Finds What Humans Miss

Traditional smart contract auditing has a well-known blind spot: cross-component interactions. A single contract might be perfectly secure in isolation, but when it interacts with governance modules, bridging logic, and staking derivatives simultaneously, emergent vulnerabilities appear in the seams.

Human auditors are excellent at deep-diving into individual contracts. But they fatigue. They have implicit assumptions about "normal" interaction patterns. And they rarely have the bandwidth to exhaustively explore every possible call sequence across a multi-contract system.

AI tools flip this dynamic:

| Strength | Human Auditor | AI Tool |
| --- | --- | --- |
| Novel vulnerability classes | ✅ Creative reasoning | ❌ Pattern-matching only |
| Cross-component interactions | ⚠️ Bandwidth-limited | ✅ Exhaustive exploration |
| Known vulnerability patterns | ⚠️ Memory-dependent | ✅ Trained on thousands of reports |
| Business logic understanding | ✅ Context-aware | ⚠️ Improving rapidly |
| Speed across large codebases | ❌ Days to weeks | ✅ Minutes to hours |

The Lido findings sit squarely in that "cross-component interaction" zone — the kind of edge cases that emerge when governance, staking, and bridge modules intersect in unexpected ways.

The 2026 AI Bug Hunting Stack

Here's the pipeline that's producing results for active whitehat researchers right now:

Layer 1: Static Analysis + AI Triage

Start with traditional static analysis, but use LLMs to filter and prioritize findings:

# Run Slither for broad coverage
slither . --json slither-output.json

# Run Aderyn for Solidity-specific patterns
aderyn . --output aderyn-report.json

The raw output from these tools is noisy — hundreds of findings, most informational. Here's where the first AI layer adds value:

import json
from openai import OpenAI

client = OpenAI()

def triage_findings(slither_json, contract_context):
    """Use LLM to prioritize static analysis findings."""
    with open(slither_json) as f:
        findings = json.load(f)

    prompt = f"""You are a smart contract security researcher.
    Given these static analysis findings and the contract's purpose,
    rank them by exploitability. Focus on:
    1. Cross-contract interaction risks
    2. Authorization gaps between components
    3. State manipulation via unexpected call sequences

    Contract context: {contract_context}
    Findings: {json.dumps(findings['results'], indent=2)[:8000]}

    Return a prioritized list with exploitation scenarios."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content

This isn't replacing the static analysis — it's making the output actionable. A senior auditor might do this mentally; the LLM does it explicitly and consistently.
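Context windows also make it worth pre-filtering before the LLM sees anything. A minimal sketch, assuming Slither's `--json` detector layout (`results.detectors[]` entries carrying an `impact` field); the thresholds and cap are arbitrary starting points:

```python
# Pre-filter Slither detector output before LLM triage.
# Assumes Slither's --json layout: results.detectors[] entries
# with an "impact" field (High/Medium/Low/Informational/Optimization).

IMPACT_RANK = {"High": 3, "Medium": 2, "Low": 1,
               "Informational": 0, "Optimization": 0}

def prefilter(detectors, min_impact="Low", max_findings=40):
    """Drop informational noise and cap the batch size so the
    prompt stays within the model's context window."""
    floor = IMPACT_RANK[min_impact]
    kept = [d for d in detectors
            if IMPACT_RANK.get(d.get("impact", ""), 0) >= floor]
    # Highest-impact findings first, then truncate.
    kept.sort(key=lambda d: IMPACT_RANK.get(d.get("impact", ""), 0),
              reverse=True)
    return kept[:max_findings]
```

Run `prefilter(findings["results"]["detectors"])` before building the triage prompt and the LLM spends its tokens on the findings that matter.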

Layer 2: LLM-Guided Invariant Generation

This is where the real leverage is. Instead of manually writing invariant tests, use LLMs to generate them from the protocol's documentation and source code:

def generate_invariants(contract_source, protocol_docs):
    """Generate protocol invariants from source + docs."""
    prompt = f"""Analyze this DeFi protocol and generate Foundry
    invariant tests that check critical safety properties.

    Focus on:
    - Share price monotonicity (should never decrease without withdrawal)
    - Total supply vs total assets consistency
    - Access control boundaries across module interactions
    - State transitions that should be atomic
    - Cross-function reentrancy via callbacks

    Contract source:
    {contract_source[:6000]}

    Protocol documentation:
    {protocol_docs[:3000]}

    Output Foundry test code with handler contracts."""

    # Same client call pattern as triage_findings above
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content

The generated invariants won't be perfect, but they'll cover edge cases that manual test writers overlook — particularly around cross-module state consistency.
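A cheap sanity pass helps before running anything: pull the Solidity blocks out of the model's response and discard any that don't actually define an `invariant_` function. A sketch, assuming the model wraps code in standard triple-backtick `solidity` fences:

```python
import re

# Keep only LLM-generated Solidity blocks that define at least one
# invariant_ function; everything else in the response is noise.
# The fence delimiter is built with chr(96) (the backtick character)
# to avoid literal backticks in this listing.
TICKS = chr(96) * 3
FENCE = re.compile(TICKS + r"(?:solidity)?\n(.*?)" + TICKS, re.DOTALL)

def extract_invariant_tests(llm_response: str) -> list:
    blocks = FENCE.findall(llm_response)
    return [b for b in blocks if "function invariant_" in b]
```

Anything that survives this filter still goes through `forge build` before you trust it to fuzz.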

Layer 3: Targeted Fuzzing with AI-Generated Seeds

Tools like Echidna and Foundry's invariant testing become dramatically more effective when seeded with AI-generated input:

// AI-generated handler contract for invariant testing
import "forge-std/Test.sol";

contract LidoInvariantHandler is Test {
    LidoStaking staking;
    LidoGovernance governance;
    LidoBridge bridge;

    // AI identified this cross-component sequence as high-risk:
    // governance.propose() → bridge.processMessage() → staking.slash()
    function invariant_slashingNeverExceedsBondedAmount() public {
        uint256 totalBonded = staking.totalBonded();
        uint256 totalSlashed = staking.totalSlashed();

        // Invariant: slashing can never exceed what's bonded
        assertLe(totalSlashed, totalBonded,
            "Slashing exceeded bonded amount");
    }

    // AI found authorization gap: bridge messages can trigger
    // staking state changes without governance approval
    function invariant_bridgeCannotBypassGovernance() public {
        bytes32 lastGovAction = governance.lastApprovedAction();
        bytes32 lastStakingChange = staking.lastStateChange();

        // If staking state changed, governance must have approved
        if (lastStakingChange != bytes32(0)) {
            assertTrue(
                governance.isApproved(lastStakingChange),
                "Staking state changed without governance approval"
            );
        }
    }
}

Layer 4: Cross-Protocol Interaction Analysis

This is the frontier — and where the Lido findings likely originated. Modern DeFi protocols don't exist in isolation. They compose with:

  • Governance systems (Aragon, Governor)
  • Bridge contracts (cross-chain messaging)
  • Oracle feeds (Chainlink, custom)
  • Derivative tokens (stETH, wstETH)

AI excels at mapping these interaction surfaces:

def map_attack_surface(protocol_contracts, external_dependencies):
    """Map cross-protocol interaction attack surface."""
    prompt = f"""You are analyzing a DeFi protocol's external
    attack surface. Map every point where external contracts
    can influence internal state.

    For each interaction point, identify:
    1. Trust assumption (who can call, with what data)
    2. State change impact (what internal state is affected)
    3. Validation gaps (what's NOT checked)
    4. Composition risks (what if dependency X is compromised)

    Protocol contracts: {protocol_contracts}
    External dependencies: {external_dependencies}

    Output a threat matrix with exploitation scenarios."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content

Real-World Pipeline: From Target to Report

Here's how a working AI-augmented bug hunt flows:

Step 1: Target Selection (5 minutes)
Pick a protocol with a bug bounty on Immunefi. Prioritize:

  • Recent upgrades or governance votes (new code = new bugs)
  • Multi-contract architectures (larger attack surface)
  • Cross-chain components (bridge + L1 + L2 interactions)
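These criteria can be turned into a crude scoring pass over candidate programs. A sketch only; every field name and weight here is invented for illustration and should be tuned against your own results:

```python
# Score bug bounty targets by the three criteria above.
# All metadata fields and weights are hypothetical.
def score_target(meta: dict) -> int:
    score = 0
    if meta.get("days_since_last_upgrade", 999) < 30:
        score += 3                                        # new code = new bugs
    score += min(meta.get("contract_count", 0) // 5, 3)   # attack surface
    if meta.get("has_bridge"):
        score += 2                                        # cross-chain components
    return score

targets = [
    {"name": "ProtoA", "days_since_last_upgrade": 10,
     "contract_count": 25, "has_bridge": True},
    {"name": "ProtoB", "days_since_last_upgrade": 200,
     "contract_count": 4, "has_bridge": False},
]
ranked = sorted(targets, key=score_target, reverse=True)
```

Five minutes of scoring keeps you from sinking a day into a stale monolith with no cross-component surface.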

Step 2: Automated Reconnaissance (30 minutes)

# Clone and analyze
git clone https://github.com/target-protocol/contracts
cd contracts

# Static analysis sweep
slither . --json analysis.json
aderyn . --output aderyn.json

# AI triage — prioritize findings
python3 ai_triage.py analysis.json

Step 3: AI-Generated Invariants (1-2 hours)
Feed the source code and documentation to your LLM pipeline. Generate 50-100 invariant tests targeting:

  • Economic invariants (total supply, share prices, fee accounting)
  • Authorization invariants (who can do what, when)
  • State consistency invariants (cross-contract state should agree)

Step 4: Targeted Fuzzing (2-8 hours, mostly automated)

# Run invariant tests with high iteration count
forge test --match-test invariant_ -vvv --fuzz-runs 50000

Step 5: AI-Assisted Analysis of Failures (1-2 hours)
When an invariant breaks, feed the failing trace to an LLM:

def analyze_failure(failing_trace, contract_source):
    prompt = f"""A smart contract invariant test failed.
    Analyze the execution trace and determine:
    1. Is this a real vulnerability or a test artifact?
    2. What's the root cause?
    3. What's the maximum financial impact?
    4. Write a minimal PoC exploit.

    Trace: {failing_trace}
    Source: {contract_source[:5000]}"""

Step 6: Write the Report
If the finding is real, write it up for Immunefi. AI can help structure the report, but the analysis and impact assessment need human judgment.
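AI can draft prose, but a deterministic template keeps required sections from being forgotten. A sketch with a generic section layout (Immunefi's actual submission form differs; check it before filing):

```python
# Render a bug report skeleton from a finding dict. The section
# names are a generic layout, not Immunefi's official template.
REPORT_SECTIONS = ["Summary", "Severity", "Root Cause",
                   "Impact", "Proof of Concept", "Recommended Fix"]

def render_report(finding: dict) -> str:
    lines = [f"# {finding.get('title', 'Untitled finding')}"]
    for section in REPORT_SECTIONS:
        key = section.lower().replace(" ", "_")
        lines.append(f"\n## {section}\n")
        # Unfilled sections are flagged explicitly so nothing
        # ships without human review.
        lines.append(finding.get(key, "TODO: needs human analysis"))
    return "\n".join(lines)
```

Every `TODO` left in the output is a section the human still owes, which is exactly where the judgment belongs.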

What AI Can't Do (Yet)

Let's be honest about the limitations:

  1. Novel attack vectors: AI finds bugs similar to ones it's seen before. Truly novel exploitation techniques still require human creativity.

  2. Business logic understanding: If a protocol's invariants depend on understanding why a mechanism exists (not just how), AI struggles. It can check that totalSupply >= totalBacked, but it can't reason about whether the backing model itself is sound.

  3. Economic modeling: Flash loan attacks, MEV extraction, and governance manipulation require understanding economic incentives. AI can detect known patterns but can't model novel economic attacks.

  4. False positive filtering: AI-generated findings still require human validation. Expect a 60-70% false positive rate on AI-generated invariant violations — better than raw static analysis, but not production-quality without review.
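One cheap way to pull that false positive rate down is self-consistency voting: ask the model to judge each finding several times and keep only those flagged as real by a clear majority. A sketch with the judge stubbed out as a plain callable (in practice it wraps an LLM request):

```python
from collections import Counter

def majority_vote(finding, judge, n=5, threshold=0.6):
    """Run `judge` (a callable returning 'real' or 'artifact') n
    times and keep the finding only if the 'real' fraction clears
    the threshold. `judge` stands in for an LLM classification call."""
    votes = Counter(judge(finding) for _ in range(n))
    return votes["real"] / n >= threshold
```

With temperature above zero the judge's answers vary between runs, and requiring agreement across runs filters out the one-off hallucinated "vulnerabilities".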

The Economics: Is It Worth It?

Lido's bug bounty pays $1,000 for low severity, up to $2M for critical findings. The three March 2026 findings were low-to-moderate — likely $1,000-$50,000 each.

Time investment for an AI-augmented pipeline:

  • Setup (one-time): 2-4 hours
  • Per-target analysis: 4-12 hours
  • Report writing: 1-2 hours

Compare this to traditional manual auditing: 40-80 hours per protocol for similar coverage. The AI pipeline won't find everything a manual audit would, but the findings-per-hour ratio is dramatically better for certain vulnerability classes.
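The findings-per-hour claim can be made concrete with back-of-the-envelope arithmetic, using the hour ranges above; the implicit assumption that both workflows surface comparable findings per engagement is illustrative only:

```python
# Back-of-the-envelope: engagement hours under each workflow,
# using the midpoints of the ranges quoted above.
manual_hours = (40 + 80) / 2        # midpoint of 40-80h manual audit
ai_hours = (4 + 12) / 2 + 1.5       # midpoint of 4-12h + report writing
speedup = manual_hours / ai_hours
print(f"~{speedup:.1f}x more engagements per hour budget")
```

Even granting the manual audit broader coverage, a roughly 6x throughput difference changes which targets are worth a hunter's time.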

For bug bounty hunters, this means:

  • More targets per week — AI handles the reconnaissance grunt work
  • Better coverage per target — invariant generation explores paths humans skip
  • Faster iteration — failing invariant → AI analysis → report in hours, not days

Getting Started: Minimum Viable Pipeline

You don't need a sophisticated setup. Start with:

  1. Slither + Aderyn for static analysis (free, open source)
  2. Any frontier LLM (Claude, GPT-4o, Gemini) for triage and invariant generation
  3. Foundry for invariant testing and fuzzing
  4. A structured prompt library — build a collection of prompts for each pipeline stage

The competitive advantage isn't the tools — they're all available to everyone. It's the workflow and prompt engineering that separate productive AI-augmented hunters from everyone else.
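A prompt library can start as nothing more than a dict of templates keyed by pipeline stage, with the protocol-specific bits left as placeholders. A minimal sketch; the stage names and placeholder conventions are this sketch's, not any tool's API:

```python
# Minimal prompt library: one template per pipeline stage.
# Placeholder names ({context}, {findings}, ...) are filled per target.
PROMPTS = {
    "triage": ("You are a smart contract security researcher. "
               "Rank these findings by exploitability.\n"
               "Context: {context}\nFindings: {findings}"),
    "invariants": ("Generate Foundry invariant tests for this "
                   "protocol.\nSource: {source}\nDocs: {docs}"),
    "failure_analysis": ("An invariant failed. Determine root cause "
                         "and write a minimal PoC.\nTrace: {trace}"),
}

def build_prompt(stage: str, **kwargs) -> str:
    return PROMPTS[stage].format(**kwargs)
```

Versioning this dict alongside your results is how the prompt engineering compounds: when a phrasing change finds a bug, you keep it; when it produces noise, you roll it back.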

What This Means for Protocol Security

The Lido disclosure is a signal, not an anomaly. As AI-assisted bug hunting matures:

  • Bug bounty submissions will increase — lower barrier to entry for security research
  • Median severity will decrease — AI finds the medium/low bugs that accumulate, not the critical showstoppers
  • Continuous auditing becomes real — AI pipelines can run on every commit, not just at launch
  • Traditional audit firms adapt or die — the value shifts from "finding known patterns" to "novel attack research and economic modeling"

For protocols, the message is clear: your bug bounty program is now your most cost-effective security investment. AI is making it easier for whitehats to find bugs — which means they'll find yours before blackhats do, if you have a bounty program that makes reporting worthwhile.

The three Lido bugs from March 2026 weren't dramatic. No $25M exploit, no emergency response. Just three quiet fixes scheduled through governance. That's what good security looks like — boring and effective, powered by AI that never gets tired of checking edge cases.


This article is part of the DeFi Security Research series. Previously: Fuzzing Solana Programs with Trident, Three Accounting Bugs That Drained $107K from DeFi Lending Protocols
