Bug Bounty Failures Are Actually Your Best Automated Learning System

Originally published at chudi.dev


My testing agent hit a rate limit at 2 AM. It retried immediately. Got rate limited again. Retried. Rate limited. Retried faster.

By the time I woke up, my IP was banned from the target's entire infrastructure.

That specific frustration--of a system that worked against itself, making things worse with every "fix"--taught me that failure handling isn't optional. It's the difference between a tool and a weapon aimed at yourself. Responsible testing methodology, including rate limit handling, is covered in the OWASP Web Security Testing Guide.

Failure-driven learning in security automation requires classifying errors into distinct categories and applying specific recovery strategies. Rate limits need exponential backoff. Bans need immediate halt and human alert. Timeouts need reduced parallelism. The system must learn from recurring failures to prevent future damage and improve recovery over time.


What Are the 6 Failure Categories?

Every error gets classified. No generic "try again" logic.

| Category | Detection Pattern | Recovery Strategy |
|---|---|---|
| Rate Limit | HTTP 429, "too many requests" | Exponential backoff (2x, max 1 hr) |
| Ban Detected | CAPTCHA, IP block, consecutive 403s | Immediate halt + human alert |
| Auth Error | 401, expired token, invalid session | Credential refresh + retry (3 max) |
| Timeout | No response > 30 seconds | Reduce parallelism + extend timeout |
| Scope Violation | Testing out-of-scope domain | Remove from queue + blacklist |
| False Positive | Validation rejection | Log pattern + update signatures |

Each category has specific recovery logic. The failure detector classifies first, then routes to the right handler.
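
Here's a minimal classifier sketch in TypeScript. The type names and detection heuristics are my own assumptions, not the exact implementation, but they show the classify-then-route shape:

type FailureCategory =
  | 'rate_limit'
  | 'ban_detected'
  | 'auth_error'
  | 'timeout';

interface ScanResponse {
  status: number;          // HTTP status code, 0 if no response arrived
  body: string;
  elapsedMs: number;
  consecutive403s: number; // tracked per target by the caller
}

// Ban checks run first (highest priority), then rate limits, auth, timeouts.
// Scope violations and false positives are detected elsewhere in the pipeline.
function classify(res: ScanResponse): FailureCategory | null {
  if (/captcha/i.test(res.body) || res.consecutive403s >= 3) return 'ban_detected';
  if (res.status === 429 || /too many requests/i.test(res.body)) return 'rate_limit';
  if (res.status === 401) return 'auth_error';
  if (res.status === 0 || res.elapsedMs > 30_000) return 'timeout';
  return null;
}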

In part 1, I explained how agents operate independently. This matters for failure recovery--when one agent gets rate limited, others continue. The failure is isolated.


How Does Exponential Backoff Actually Work?

Simple concept, careful implementation:

Attempt 1: Fail → Wait 30s
Attempt 2: Fail → Wait 60s (2x)
Attempt 3: Fail → Wait 120s (2x)
Attempt 4: Fail → Wait 240s (2x)
...
Maximum: 1 hour wait

The multiplier is 2x. The ceiling is 1 hour. Why a ceiling? Because some rate limits reset faster than exponential would suggest. Waiting 4 hours when the limit resets in 15 minutes wastes time.

class RateLimiter {
  private baseDelay = 30000; // 30 seconds
  private multiplier = 2;
  private maxDelay = 3600000; // 1 hour

  getDelay(attemptNumber: number): number {
    const delay = this.baseDelay * Math.pow(this.multiplier, attemptNumber - 1);
    return Math.min(delay, this.maxDelay);
  }
}

I originally set no ceiling--exponential forever. In other words, I trusted the math. But the math doesn't know that HackerOne resets rate limits every 15 minutes. Context matters.

[!TIP]
Token bucket rate limiting works better for proactive throttling. Refill tokens at a steady rate (e.g., 10/second), consume on each request. When bucket empties, wait. Smoother than reactive exponential backoff.
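
A minimal token bucket sketch (the 10/second refill rate is just the example from the tip, not tuned for any real target):

// Token bucket: steady refill, bursts allowed up to capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity = 10, private refillPerSec = 10) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec
    );
    this.lastRefill = now;
  }

  // Resolves when a token is available; each request consumes one.
  async take(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      await new Promise((resolve) => setTimeout(resolve, 50));
      this.refill();
    }
    this.tokens -= 1;
  }
}

Call `await bucket.take()` before each request and throttling happens up front, instead of reacting after a 429.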


What Triggers Ban Detection?

Bans are different from rate limits. Rate limits say "slow down." Bans say "go away."

Detection patterns (the same signals from the table above):

  • CAPTCHA challenges appearing in responses
  • Explicit IP block pages
  • Consecutive 403 responses across endpoints

When ban detected:

  1. Immediate halt - All agents stop testing this target
  2. Human alert - Notification sent (Slack, email, database flag)
  3. Session preserved - State saved so human can investigate
  4. Never auto-resume - Human must explicitly approve continuation
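
A sketch of that halt path, assuming hypothetical `haltAllAgents`, `notifyHuman`, `saveSession`, and `flagTarget` helpers (the real system's names may differ):

// Hypothetical helpers; stubs stand in for the real implementations.
declare function haltAllAgents(targetId: string): Promise<void>;
declare function notifyHuman(message: string): Promise<void>;
declare function saveSession(sessionId: string): Promise<void>;
declare function flagTarget(targetId: string, status: string): Promise<void>;

async function handleBan(targetId: string, sessionId: string): Promise<void> {
  await haltAllAgents(targetId);                    // 1. all agents stop
  await notifyHuman(`Ban detected on ${targetId}`); // 2. Slack/email/db flag
  await saveSession(sessionId);                     // 3. state preserved
  await flagTarget(targetId, 'halted_pending_review');
  // 4. never auto-resume: only a human clears the flag
}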

I've been banned once. It happened because my failure detection was checking for rate limits but not bans. The scanner kept hammering while the target escalated from rate limit → temporary block → permanent ban.

Now ban detection has highest priority. It runs before rate limit checks.

[!WARNING]
A ban from a bug bounty program can affect your reputation. Programs talk to each other. Getting permanently blocked from one target for aggressive scanning could impact your standing elsewhere. The automation must respect this. HackerOne explicitly outlines conduct policies that govern how automated tools interact with programs.


How Does the Failure Patterns Database Work?

Recurring failures teach patterns:

// failure_patterns table schema
interface FailurePattern {
  pattern_id: string;        // Primary key
  error_signature: string;   // regex or exact match
  category: string;          // rate_limit, ban_detected, etc.
  recovery_strategy: string; // JSON config for recovery
  occurrences: number;       // how many times seen
  last_seen: Date;
  target_specific: boolean;  // applies to specific target or all
}

When a new error arrives:

  1. Check if it matches existing pattern
  2. If match found, apply learned recovery strategy
  3. If no match, use default recovery for that category
  4. After recovery, log this occurrence
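
A sketch of that lookup, reusing the `FailurePattern` shape from the schema above (the helper functions are assumptions):

// Assumed helpers for category defaults and occurrence logging.
declare function defaultRecovery(category: string): string;
declare function logOccurrence(match: FailurePattern | null, error: string): void;

function resolveRecovery(
  error: string,
  category: string,
  patterns: FailurePattern[]
): string {
  // 1-2. Match against known signatures; apply the learned strategy if found
  const match = patterns.find((p) => new RegExp(p.error_signature).test(error));
  if (match) {
    logOccurrence(match, error); // 4. bump occurrences, update last_seen
    return match.recovery_strategy;
  }
  // 3. No match: fall back to the category default
  logOccurrence(null, error);    // 4. logged so a new pattern can form
  return defaultRecovery(category);
}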

Over time, the system learns:

  • "Target X rate limits after 50 requests per minute" → Proactively throttle to 40
  • "This WAF pattern means temporary block, wait 10 minutes" → Auto-resume after delay
  • "This error always precedes a ban" → Halt immediately, don't wait for ban confirmation

The validation false positive signatures from part 2 use the same pattern database. Failures during validation teach what responses indicate "not a vulnerability" vs. "just an error."


When Does the System Escalate to Humans?

Automation can't solve everything. Escalation rules:

Immediate escalation:

  • Ban detected (any severity)
  • Scope violation detected
  • Critical system error (database corruption, etc.)

Threshold escalation:

  • Same error category 5+ times in 5 minutes
  • Auth errors not resolved after 3 credential refreshes
  • Timeout persists after reducing to minimum parallelism

Never escalate:

  • First occurrence of rate limit (handled automatically)
  • Single timeout (transient network issue)
  • False positive detection (just learning, not blocking)

The escalation notification includes:

  • Error category and pattern
  • What recovery was attempted
  • Current session state (so human can resume)
  • Suggested manual action
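
Those rules compress into a small decision function. A sketch using the thresholds above (the event shape is my own assumption):

interface FailureEvent {
  category: string;   // 'rate_limit', 'ban_detected', etc.
  timestamp: number;  // epoch milliseconds
}

const IMMEDIATE = ['ban_detected', 'scope_violation', 'critical_error'];

function shouldEscalate(event: FailureEvent, recent: FailureEvent[]): boolean {
  // Immediate escalation categories skip all thresholds
  if (IMMEDIATE.includes(event.category)) return true;

  // Threshold: same category 5+ times within 5 minutes
  const windowStart = event.timestamp - 5 * 60 * 1000;
  const repeats = recent.filter(
    (e) => e.category === event.category && e.timestamp >= windowStart
  );
  return repeats.length >= 5;
}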

I hated adding escalation logic. It felt like admitting failure. But I needed it. Without escalation, the system either gives up too easily (abandoning valid targets) or pushes too hard (getting banned). Human judgment bridges the gap.


What's the Recovery-Oriented Error Handling Pattern?

Traditional error handling:

try {
  await scanTarget(target);
} catch (error) {
  throw error; // Propagate up, let someone else deal with it
}

Recovery-oriented handling:

async function scanWithRecovery(target: Target): Promise<void> {
  const response = await scanTarget(target);

  const error = detectError(response);

  if (!error) return; // No error, continue

  const signal = classifyError(error); // Returns FailureSignal

  const strategy = getRecoveryStrategy(signal);

  await executeRecovery(strategy, target);

  // Recovery might mean: wait, retry, refresh creds, or halt
}

Errors don't propagate--they trigger recovery flows. The system assumes errors are normal and plans for them.

Error Occurs
    ↓
Classify (which category?)
    ↓
Check failure_patterns (known issue?)
    ↓
Apply recovery strategy
    ↓
Log for learning
    ↓
Continue or escalate

How Does This Connect to Session Persistence?

In part 1, I described session checkpointing. Failure recovery depends on it.

When recovery requires waiting (exponential backoff, ban cooldown), the session saves state and sleeps. When it wakes:

// On resume after failure-induced pause
const checkpoint = db.get('context_snapshots', sessionId);
const failureState = db.get('failures', sessionId);

// Check if recovery period passed
if (failureState.recoveryUntil > Date.now()) {
  // Still waiting, sleep more
  await sleep(failureState.recoveryUntil - Date.now());
}

// Resume from checkpoint
await resumeSession(checkpoint);

The system can be killed during backoff and resume correctly. No lost state, no duplicate requests, no memory of "where was I?"


What Happens After Repeated False Positives?

False positives are a special failure category. They don't need exponential backoff--they need pattern learning. Weakness patterns are tagged using MITRE CWE identifiers for consistent classification across the database.

When validation rejects a finding:

  1. Extract the pattern that triggered detection
  2. Extract the pattern that caused rejection
  3. Add to false_positive_signatures database
  4. Adjust Testing Agent's detection threshold for similar patterns
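
A sketch of steps 1-3; the table name matches the post, but the row shape and `db` interface are assumptions:

interface FalsePositiveSignature {
  detection_pattern: string; // what made the Testing Agent flag it
  rejection_pattern: string; // what made validation reject it
  cwe_id?: string;           // optional MITRE CWE tag
  occurrences: number;
}

// Assumed upsert semantics: insert the row, or bump occurrences if the key exists.
declare const db: {
  upsert(table: string, key: string, row: FalsePositiveSignature): Promise<void>;
};

async function recordFalsePositive(
  detectionPattern: string,
  rejectionPattern: string,
  cweId?: string
): Promise<void> {
  const key = `${detectionPattern}::${rejectionPattern}`;
  await db.upsert('false_positive_signatures', key, {
    detection_pattern: detectionPattern,
    rejection_pattern: rejectionPattern,
    cwe_id: cweId,
    occurrences: 1,
  });
  // Step 4, adjusting the detection threshold, happens in the Testing Agent.
}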

Over time:

  • "Reflected input in error messages → false positive" becomes a signature
  • Testing Agent learns to not report these as findings at all
  • Validation workload decreases
  • Human review queue gets cleaner

This connects to human-in-the-loop design in part 5. Human feedback on false positives feeds the learning system. Every rejection teaches.


What's the Actual Failure Recovery Rate?

Before failure-driven learning:

  • ~30% of scans interrupted by unhandled errors
  • Manual intervention needed 2-3 times per target
  • Bans happened monthly (yes, really)
  • No pattern learning--same mistakes repeated

After implementation:

  • ~5% of scans need human intervention
  • Automatic recovery handles rate limits, timeouts, auth refreshes
  • Zero bans in 6 months (knock on wood)
  • Pattern database has 200+ learned signatures

The system still fails. But it fails gracefully. It preserves state, notifies humans, and learns for next time.


Where Does This Series Go Next?

This is part 3 of a 5-part series on building bug bounty automation:

  1. Architecture & Multi-Agent Design
  2. From Detection to Proof: Validation & False Positives
  3. Failure-Driven Learning: Auto-Recovery Patterns (you are here)
  4. One Tool, Three Platforms: Multi-Platform Integration
  5. Human-in-the-Loop: The Ethics of Security Automation

Next up: how one system handles three different bug bounty platforms with their own APIs, report formats, and quirks.


Maybe failure isn't the opposite of success. Maybe it's the input data for getting smarter--every rate limit, every timeout, every ban teaching the system what not to do next time.
