Bug Bounty Failures Are Actually Your Best Automated Learning System

Originally published at chudi.dev


My testing agent hit a rate limit at 2 AM. It retried immediately. Got rate limited again. Retried. Rate limited. Retried faster.

By the time I woke up, my IP was banned from the target's entire infrastructure.

That specific frustration--of a system that worked against itself, making things worse with every "fix"--taught me that failure handling isn't optional. It's the difference between a tool and a weapon aimed at yourself. Responsible testing methodology, including rate limit handling, is covered in the OWASP Web Security Testing Guide.

Failure-driven learning in security automation requires classifying errors into distinct categories and applying specific recovery strategies. Rate limits need exponential backoff. Bans need immediate halt and human alert. Timeouts need reduced parallelism. The system must learn from recurring failures to prevent future damage and improve recovery over time.


What Are the 6 Failure Categories?

Every error gets classified. No generic "try again" logic.

| Category | Detection Pattern | Recovery Strategy |
|---|---|---|
| Rate Limit | HTTP 429, "too many requests" | Exponential backoff (2x, max 1 hr) |
| Ban Detected | CAPTCHA, IP block, consecutive 403s | Immediate halt + human alert |
| Auth Error | 401, expired token, invalid session | Credential refresh + retry (3 max) |
| Timeout | No response > 30 seconds | Reduce parallelism + extend timeout |
| Scope Violation | Testing out-of-scope domain | Remove from queue + blacklist |
| False Positive | Validation rejection | Log pattern + update signatures |

Each category has specific recovery logic. The failure detector classifies first, then routes to the right handler.
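
Here's a minimal classifier sketch in TypeScript. The type names and detection heuristics are my own assumptions, not the exact implementation, but they show the classify-then-route shape:

type FailureCategory =
  | 'rate_limit'
  | 'ban_detected'
  | 'auth_error'
  | 'timeout';

interface ScanResponse {
  status: number;          // HTTP status code, 0 if no response arrived
  body: string;
  elapsedMs: number;
  consecutive403s: number; // tracked per target by the caller
}

// Ban checks run first (highest priority), then rate limits, auth, timeouts.
// Scope violations and false positives are detected elsewhere in the pipeline.
function classify(res: ScanResponse): FailureCategory | null {
  if (/captcha/i.test(res.body) || res.consecutive403s >= 3) return 'ban_detected';
  if (res.status === 429 || /too many requests/i.test(res.body)) return 'rate_limit';
  if (res.status === 401) return 'auth_error';
  if (res.status === 0 || res.elapsedMs > 30_000) return 'timeout';
  return null;
}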

In part 1, I explained how agents operate independently. This matters for failure recovery--when one agent gets rate limited, others continue. The failure is isolated.


How Does Exponential Backoff Actually Work?

Simple concept, careful implementation:

Attempt 1: Fail → Wait 30s
Attempt 2: Fail → Wait 60s (2x)
Attempt 3: Fail → Wait 120s (2x)
Attempt 4: Fail → Wait 240s (2x)
...
Maximum: 1 hour wait

The multiplier is 2x. The ceiling is 1 hour. Why a ceiling? Because some rate limits reset faster than exponential would suggest. Waiting 4 hours when the limit resets in 15 minutes wastes time.

class RateLimiter {
  private baseDelay = 30000; // 30 seconds
  private multiplier = 2;
  private maxDelay = 3600000; // 1 hour

  getDelay(attemptNumber: number): number {
    const delay = this.baseDelay * Math.pow(this.multiplier, attemptNumber - 1);
    return Math.min(delay, this.maxDelay);
  }
}

I originally set no ceiling--exponential forever. In other words, I trusted the math. But the math doesn't know that HackerOne resets rate limits every 15 minutes. Context matters.

[!TIP]
Token bucket rate limiting works better for proactive throttling. Refill tokens at a steady rate (e.g., 10/second), consume on each request. When bucket empties, wait. Smoother than reactive exponential backoff.
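
A minimal token bucket sketch (the 10/second refill rate is just the example from the tip, not tuned for any real target):

// Token bucket: steady refill, bursts allowed up to capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity = 10, private refillPerSec = 10) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec
    );
    this.lastRefill = now;
  }

  // Resolves when a token is available; each request consumes one.
  async take(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      await new Promise((resolve) => setTimeout(resolve, 50));
      this.refill();
    }
    this.tokens -= 1;
  }
}

Call `await bucket.take()` before each request and throttling happens up front, instead of reacting after a 429.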


What Triggers Ban Detection?

Bans are different from rate limits. Rate limits say "slow down." Bans say "go away."

Detection patterns (the same signals from the table above):

  • CAPTCHA challenges appearing in responses
  • Explicit IP block pages
  • Consecutive 403 responses across endpoints

When ban detected:

  1. Immediate halt - All agents stop testing this target
  2. Human alert - Notification sent (Slack, email, database flag)
  3. Session preserved - State saved so human can investigate
  4. Never auto-resume - Human must explicitly approve continuation
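
A sketch of that halt path, assuming hypothetical `haltAllAgents`, `notifyHuman`, `saveSession`, and `flagTarget` helpers (the real system's names may differ):

// Hypothetical helpers; stubs stand in for the real implementations.
declare function haltAllAgents(targetId: string): Promise<void>;
declare function notifyHuman(message: string): Promise<void>;
declare function saveSession(sessionId: string): Promise<void>;
declare function flagTarget(targetId: string, status: string): Promise<void>;

async function handleBan(targetId: string, sessionId: string): Promise<void> {
  await haltAllAgents(targetId);                    // 1. all agents stop
  await notifyHuman(`Ban detected on ${targetId}`); // 2. Slack/email/db flag
  await saveSession(sessionId);                     // 3. state preserved
  await flagTarget(targetId, 'halted_pending_review');
  // 4. never auto-resume: only a human clears the flag
}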

I've been banned once. It happened because my failure detection was checking for rate limits but not bans. The scanner kept hammering while the target escalated from rate limit → temporary block → permanent ban.

Now ban detection has highest priority. It runs before rate limit checks.

[!WARNING]
A ban from a bug bounty program can affect your reputation. Programs talk to each other. Getting permanently blocked from one target for aggressive scanning could impact your standing elsewhere. The automation must respect this. HackerOne explicitly outlines conduct policies that govern how automated tools interact with programs.


How Does the Failure Patterns Database Work?

Recurring failures teach patterns:

// failure_patterns table schema
interface FailurePattern {
  pattern_id: string;        // Primary key
  error_signature: string;   // regex or exact match
  category: string;          // rate_limit, ban_detected, etc.
  recovery_strategy: string; // JSON config for recovery
  occurrences: number;       // how many times seen
  last_seen: Date;
  target_specific: boolean;  // applies to specific target or all
}

When a new error arrives:

  1. Check if it matches existing pattern
  2. If match found, apply learned recovery strategy
  3. If no match, use default recovery for that category
  4. After recovery, log this occurrence
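
A sketch of that lookup, reusing the `FailurePattern` shape from the schema above (the helper functions are assumptions):

// Assumed helpers for category defaults and occurrence logging.
declare function defaultRecovery(category: string): string;
declare function logOccurrence(match: FailurePattern | null, error: string): void;

function resolveRecovery(
  error: string,
  category: string,
  patterns: FailurePattern[]
): string {
  // 1-2. Match against known signatures; apply the learned strategy if found
  const match = patterns.find((p) => new RegExp(p.error_signature).test(error));
  if (match) {
    logOccurrence(match, error); // 4. bump occurrences, update last_seen
    return match.recovery_strategy;
  }
  // 3. No match: fall back to the category default
  logOccurrence(null, error);    // 4. logged so a new pattern can form
  return defaultRecovery(category);
}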

Over time, the system learns:

  • "Target X rate limits after 50 requests per minute" → Proactively throttle to 40
  • "This WAF pattern means temporary block, wait 10 minutes" → Auto-resume after delay
  • "This error always precedes a ban" → Halt immediately, don't wait for ban confirmation

The validation false positive signatures from part 2 use the same pattern database. Failures during validation teach what responses indicate "not a vulnerability" vs. "just an error."


When Does the System Escalate to Humans?

Automation can't solve everything. Escalation rules:

Immediate escalation:

  • Ban detected (any severity)
  • Scope violation detected
  • Critical system error (database corruption, etc.)

Threshold escalation:

  • Same error category 5+ times in 5 minutes
  • Auth errors not resolved after 3 credential refreshes
  • Timeout persists after reducing to minimum parallelism

Never escalate:

  • First occurrence of rate limit (handled automatically)
  • Single timeout (transient network issue)
  • False positive detection (just learning, not blocking)

The escalation notification includes:

  • Error category and pattern
  • What recovery was attempted
  • Current session state (so human can resume)
  • Suggested manual action
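
Those rules compress into a small decision function. A sketch using the thresholds above (the event shape is my own assumption):

interface FailureEvent {
  category: string;   // 'rate_limit', 'ban_detected', etc.
  timestamp: number;  // epoch milliseconds
}

const IMMEDIATE = ['ban_detected', 'scope_violation', 'critical_error'];

function shouldEscalate(event: FailureEvent, recent: FailureEvent[]): boolean {
  // Immediate escalation categories skip all thresholds
  if (IMMEDIATE.includes(event.category)) return true;

  // Threshold: same category 5+ times within 5 minutes
  const windowStart = event.timestamp - 5 * 60 * 1000;
  const repeats = recent.filter(
    (e) => e.category === event.category && e.timestamp >= windowStart
  );
  return repeats.length >= 5;
}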

I hated adding escalation logic. It felt like admitting failure. But I needed it. Without escalation, the system either gives up too easily (abandoning valid targets) or pushes too hard (getting banned). Human judgment bridges the gap.


What's the Recovery-Oriented Error Handling Pattern?

Traditional error handling:

try {
  await scanTarget(target);
} catch (error) {
  throw error; // Propagate up, let someone else deal with it
}

Recovery-oriented handling:

async function scanWithRecovery(target: Target): Promise<void> {
  const response = await scanTarget(target);

  const error = detectError(response);

  if (!error) return; // No error, continue

  const signal = classifyError(error); // Returns FailureSignal

  const strategy = getRecoveryStrategy(signal);

  await executeRecovery(strategy, target);

  // Recovery might mean: wait, retry, refresh creds, or halt
}

Errors don't propagate--they trigger recovery flows. The system assumes errors are normal and plans for them.

Error Occurs
    ↓
Classify (which category?)
    ↓
Check failure_patterns (known issue?)
    ↓
Apply recovery strategy
    ↓
Log for learning
    ↓
Continue or escalate

How Does This Connect to Session Persistence?

In part 1, I described session checkpointing. Failure recovery depends on it.

When recovery requires waiting (exponential backoff, ban cooldown), the session saves state and sleeps. When it wakes:

// On resume after failure-induced pause
const checkpoint = db.get('context_snapshots', sessionId);
const failureState = db.get('failures', sessionId);

// Check if recovery period passed
if (failureState.recoveryUntil > Date.now()) {
  // Still waiting, sleep more
  await sleep(failureState.recoveryUntil - Date.now());
}

// Resume from checkpoint
await resumeSession(checkpoint);

The system can be killed during backoff and resume correctly. No lost state, no duplicate requests, no memory of "where was I?"


What Happens After Repeated False Positives?

False positives are a special failure category. They don't need exponential backoff--they need pattern learning. Weakness patterns are tagged using MITRE CWE identifiers for consistent classification across the database.

When validation rejects a finding:

  1. Extract the pattern that triggered detection
  2. Extract the pattern that caused rejection
  3. Add to false_positive_signatures database
  4. Adjust Testing Agent's detection threshold for similar patterns
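
A sketch of steps 1-3; the table name matches the post, but the row shape and `db` interface are assumptions:

interface FalsePositiveSignature {
  detection_pattern: string; // what made the Testing Agent flag it
  rejection_pattern: string; // what made validation reject it
  cwe_id?: string;           // optional MITRE CWE tag
  occurrences: number;
}

// Assumed upsert semantics: insert the row, or bump occurrences if the key exists.
declare const db: {
  upsert(table: string, key: string, row: FalsePositiveSignature): Promise<void>;
};

async function recordFalsePositive(
  detectionPattern: string,
  rejectionPattern: string,
  cweId?: string
): Promise<void> {
  const key = `${detectionPattern}::${rejectionPattern}`;
  await db.upsert('false_positive_signatures', key, {
    detection_pattern: detectionPattern,
    rejection_pattern: rejectionPattern,
    cwe_id: cweId,
    occurrences: 1,
  });
  // Step 4, adjusting the detection threshold, happens in the Testing Agent.
}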

Over time:

  • "Reflected input in error messages → false positive" becomes a signature
  • Testing Agent learns to not report these as findings at all
  • Validation workload decreases
  • Human review queue gets cleaner

This connects to human-in-the-loop design in part 5. Human feedback on false positives feeds the learning system. Every rejection teaches.


What's the Actual Failure Recovery Rate?

Before failure-driven learning:

  • ~30% of scans interrupted by unhandled errors
  • Manual intervention needed 2-3 times per target
  • Bans happened monthly (yes, really)
  • No pattern learning--same mistakes repeated

After implementation:

  • ~5% of scans need human intervention
  • Automatic recovery handles rate limits, timeouts, auth refreshes
  • Zero bans in 6 months (knock on wood)
  • Pattern database has 200+ learned signatures

The system still fails. But it fails gracefully. It preserves state, notifies humans, and learns for next time.


Where Does This Series Go Next?

This is part 3 of a 5-part series on building bug bounty automation:

  1. Architecture & Multi-Agent Design
  2. From Detection to Proof: Validation & False Positives
  3. Failure-Driven Learning: Auto-Recovery Patterns (you are here)
  4. One Tool, Three Platforms: Multi-Platform Integration
  5. Human-in-the-Loop: The Ethics of Security Automation

Next up: how one system handles three different bug bounty platforms with their own APIs, report formats, and quirks.


Maybe failure isn't the opposite of success. Maybe it's the input data for getting smarter--every rate limit, every timeout, every ban teaching the system what not to do next time.
