Originally published at chudi.dev
Reconnaissance takes forever. You spend hours on subdomain enumeration, tech stack fingerprinting, and endpoint discovery—only to find the same vulnerabilities you've tested before.
I built BugBountyBot to automate the tedious 80% while keeping humans in the loop for the decisions that matter. Four specialized agents handle reconnaissance, testing, validation, and reporting independently. Evidence-gated progression with a 0.85+ confidence threshold means no finding reaches human review without verified proof-of-concept execution—preventing the false positive floods that tank researcher reputation.
The Problem with Manual Hunting
Traditional bug bounty hunting breaks down roughly like this, loosely following the phases outlined in the OWASP Web Security Testing Guide:
| Phase | Time Spent | Value Added |
|---|---|---|
| Reconnaissance | 40% | Low (repetitive) |
| Testing | 30% | Medium (pattern-based) |
| Validation | 15% | High (requires judgment) |
| Reporting | 15% | High (requires clarity) |
Most hunters spend 70% of their time on work that could be automated. The high-value phases—validation and reporting—get squeezed because you're exhausted from the grind.
The Multi-Agent Architecture
BugBountyBot uses four specialized agents, each optimized for its phase: reconnaissance, testing, validation, and reporting.
Why Four Agents Instead of One?
A single agent trying to do everything suffers from context dilution. The prompt space needed for effective reconnaissance is completely different from vulnerability testing.
Specialized agents can:
- Use phase-specific prompts without compromise
- Maintain focused context windows
- Be tuned independently based on performance
- Fail in isolation without breaking the pipeline
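One way to picture this separation is a pipeline in which every agent exposes the same small interface and the runner records a failure instead of crashing the whole run. The sketch below is illustrative only: `Agent`, `PhaseResult`, and `runPipeline` are hypothetical names rather than BugBountyBot's actual API, and the agents are plain async functions standing in for model calls.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not BugBountyBot's real API.
type Phase = 'recon' | 'testing' | 'validation' | 'reporting';

interface PhaseResult {
  phase: Phase;
  ok: boolean;
  output: string[]; // whatever the phase hands to the next one
  error?: string;
}

interface Agent {
  phase: Phase;
  run(input: string[]): Promise<string[]>;
}

// Runs agents in order. A failing agent is recorded and later phases are
// skipped, but results from earlier phases are preserved for inspection.
async function runPipeline(agents: Agent[], seed: string[]): Promise<PhaseResult[]> {
  const results: PhaseResult[] = [];
  let input = seed;
  for (const agent of agents) {
    try {
      const output = await agent.run(input);
      results.push({ phase: agent.phase, ok: true, output });
      input = output;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ phase: agent.phase, ok: false, output: [], error: message });
      break;
    }
  }
  return results;
}
```

In this sketch a failed testing phase still leaves the recon output available, which is the property that matters when a session is resumed later.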
Evidence-Gated Progression
The biggest risk in automated hunting is false positives. Submit garbage, and your reputation tanks. Platforms flag your account. Programs stop accepting your reports. Frameworks like the OWASP Top Ten help you classify risk, but a classification is only as good as the finding behind it, so evidence-based confirmation is the baseline for any credible security work.
BugBountyBot uses a 0.85 confidence threshold before any finding advances:
interface Finding {
  vulnerability: VulnerabilityType;
  evidence: Evidence[];
  confidence: number; // 0.0 - 1.0
  status: 'pending' | 'validated' | 'rejected';
}

function shouldAdvance(finding: Finding): boolean {
  // Only findings with 0.85+ confidence advance to human review
  return finding.confidence >= 0.85;
}
Findings below 0.85 aren't discarded—they're logged with full context for the RAG database. The system learns why they failed validation, preventing similar false positives in future hunts.
What Builds Confidence?
The Validator Agent runs multiple checks:
- PoC Execution - Does the exploit actually work?
- Response Diff Analysis - Is the behavior change meaningful?
- False Positive Signatures - Does this match known FP patterns (cross-referenced against the MITRE CWE database)?
- Evidence Hashing - Is the evidence reproducible?
Each check contributes to the confidence score. Only when all checks align does a finding hit the 0.85 threshold. The architectural decisions behind this system are covered in the bug bounty automation architecture overview.
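How the four checks combine into one number is not spelled out here, so the following is only a sketch under stated assumptions: each check yields a score between 0 and 1, and the scores are combined with equal weights. The check names mirror the list above; `WEIGHTS`, `computeConfidence`, and `meetsThreshold` are hypothetical.

```typescript
// Hypothetical sketch: equal weights and per-check scores in [0, 1] are assumptions.
type CheckName = 'pocExecution' | 'responseDiff' | 'falsePositiveSignature' | 'evidenceHash';

type CheckScores = Record<CheckName, number>;

const WEIGHTS: Record<CheckName, number> = {
  pocExecution: 0.25,
  responseDiff: 0.25,
  falsePositiveSignature: 0.25,
  evidenceHash: 0.25,
};

const CONFIDENCE_THRESHOLD = 0.85;

// Weighted sum of the check scores, with each score clamped to [0, 1].
function computeConfidence(scores: CheckScores): number {
  let total = 0;
  for (const name of Object.keys(WEIGHTS) as CheckName[]) {
    const clamped = Math.min(1, Math.max(0, scores[name]));
    total += WEIGHTS[name] * clamped;
  }
  return total;
}

function meetsThreshold(scores: CheckScores): boolean {
  return computeConfidence(scores) >= CONFIDENCE_THRESHOLD;
}
```

With equal weights, a single check scoring zero caps the total at 0.75, so a finding cannot reach 0.85 unless every check contributes something. That matches the "all checks align" behavior described above, though the real weighting may differ.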
The RAG Database
SQLite stores everything the system learns:
-- Knowledge that improves over time
CREATE TABLE knowledge_base (
  pattern TEXT,        -- What worked
  context TEXT,        -- Where it worked
  success_rate REAL,   -- How often it works
  last_used TIMESTAMP
);

CREATE TABLE failure_patterns (
  approach TEXT,       -- What failed
  reason TEXT,         -- Why it failed
  program_id TEXT,     -- Program-specific context
  created_at TIMESTAMP
);

CREATE TABLE false_positive_signatures (
  signature TEXT,      -- What to avoid
  occurrences INTEGER, -- How often we see it
  last_seen TIMESTAMP
);
Every hunt session adds knowledge:
- Successful patterns get reinforced
- Failures get logged with reasons
- False positives become signatures to filter
After 50 hunts, the system knows which approaches work on which program types. It stops repeating mistakes that wasted your time six months ago.
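"Reinforcement" can be as simple as keeping a running success ratio per pattern. The sketch below works on an in-memory object rather than SQLite to stay self-contained. The `timesUsed` counter is borrowed from the fuller schema later in this post, and `recordOutcome` is a hypothetical helper, not a function from the project.

```typescript
// Hypothetical sketch: an in-memory stand-in for one knowledge_base row.
interface KnowledgeEntry {
  pattern: string;
  context: string;
  successRate: number; // maps to success_rate
  timesUsed: number;   // maps to times_used
  lastUsed: Date;      // maps to last_used
}

// Returns a new entry whose success rate is the running ratio of successes to uses.
function recordOutcome(entry: KnowledgeEntry, succeeded: boolean, now: Date = new Date()): KnowledgeEntry {
  const successes = entry.successRate * entry.timesUsed + (succeeded ? 1 : 0);
  const timesUsed = entry.timesUsed + 1;
  return { ...entry, successRate: successes / timesUsed, timesUsed, lastUsed: now };
}
```

A plain running average treats old and recent outcomes equally. If a target changes often, a decayed average may fit better, but that is a design choice this sketch does not make.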
Safety Mechanisms
Automated hunting without safety is a fast path to bans. BugBountyBot includes:
Rate Limiting
Each target gets its own token bucket with a configurable burst size and refill rate, and the system slows down automatically as it approaches the limit.
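For readers who have not seen one, a token bucket can be sketched in a few lines. This is a minimal version under assumptions: the `TokenBucket` class, the `allowRequest` helper, and the burst and refill values are all illustrative, and the automatic slowdown near the limit is left out.

```typescript
// Minimal token bucket sketch. Class, helper, and values are illustrative assumptions.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly burstSize: number;       // maximum tokens the bucket holds
  private readonly refillPerSecond: number; // tokens added per second

  constructor(burstSize: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.burstSize = burstSize;
    this.refillPerSecond = refillPerSecond;
    this.tokens = burstSize;
    this.lastRefillMs = nowMs;
  }

  // Add tokens for the time elapsed since the last refill, capped at burstSize.
  private refill(nowMs: number): void {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burstSize, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes one token if a request may be sent now.
  tryAcquire(nowMs: number = Date.now()): boolean {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per target host, created on first use.
const buckets = new Map<string, TokenBucket>();

function allowRequest(host: string, nowMs: number = Date.now()): boolean {
  let bucket = buckets.get(host);
  if (!bucket) {
    bucket = new TokenBucket(5, 1, nowMs); // burst of 5, 1 token per second (assumed values)
    buckets.set(host, bucket);
  }
  return bucket.tryAcquire(nowMs);
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic in tests. In normal use the default `Date.now()` applies.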
Scope Validation
Every request validates against program scope before execution. Out-of-scope domains are hard-blocked, not just warned.
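A hard block means the request never leaves the process: the scope check throws instead of logging a warning. The sketch below assumes scope is expressed as exact hosts plus `*.domain` wildcards, which is a simplification of how programs actually publish scope. `ProgramScope`, `hostMatches`, and `assertInScope` are hypothetical names.

```typescript
// Hypothetical scope check. The pattern format ("*.example.com") is an assumption.
interface ProgramScope {
  inScope: string[];    // exact hosts or "*.domain" wildcards
  outOfScope: string[]; // explicit exclusions always win over inScope
}

function hostMatches(host: string, pattern: string): boolean {
  const h = host.toLowerCase();
  const p = pattern.toLowerCase();
  if (p.startsWith('*.')) {
    // "*.example.com" matches subdomains only, not "example.com" itself
    const suffix = p.slice(1); // ".example.com"
    return h.endsWith(suffix) && h.length > suffix.length;
  }
  return h === p;
}

// Throws unless the URL's host is in scope and not explicitly excluded.
function assertInScope(rawUrl: string, scope: ProgramScope): URL {
  const url = new URL(rawUrl); // throws on malformed URLs
  const host = url.hostname;
  const excluded = scope.outOfScope.some((p) => hostMatches(host, p));
  const included = scope.inScope.some((p) => hostMatches(host, p));
  if (excluded || !included) {
    throw new Error(`Out of scope, request blocked: ${host}`);
  }
  return url;
}
```

Matching on the dot-prefixed suffix is what keeps a lookalike host such as `notexample.com` from passing a `*.example.com` rule.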
Ban Detection
The ban detector monitors for consecutive failures, response time changes, and error patterns that indicate blocking, then triggers an automatic cooldown before you get banned.
interface SafetyConfig {
  maxRequestsPerMinute: number;
  burstSize: number;
  cooldownOnConsecutiveFailures: number;
  scopeValidation: 'strict' | 'permissive';
}
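The consecutive-failure part of ban detection can be sketched as a small counter. Several things here are assumptions: treating HTTP 403 and 429 as blocking signals, taking the failure threshold and cooldown length as two separate constructor arguments, and the `BanDetector` name itself. Response-time drift and error-pattern matching are not covered.

```typescript
// Hypothetical ban detector. Treating 403 and 429 as blocking signals is an assumption.
class BanDetector {
  private consecutiveFailures = 0;
  private cooldownUntilMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Record a response status. Any non-blocking response resets the streak;
  // reaching the threshold starts a cooldown and resets the counter.
  record(status: number, nowMs: number = Date.now()): void {
    const blocked = status === 403 || status === 429;
    this.consecutiveFailures = blocked ? this.consecutiveFailures + 1 : 0;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.cooldownUntilMs = nowMs + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }

  // True while requests to this target should be paused.
  inCooldown(nowMs: number = Date.now()): boolean {
    return nowMs < this.cooldownUntilMs;
  }
}
```

A caller would check `inCooldown()` before each request and skip the target while it returns true.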
Human-in-the-Loop
Every bug bounty platform requires human oversight for submissions. This isn't a limitation to work around—it's a feature to design for.
BugBountyBot's workflow:
- Automated phases (Recon → Testing → Validation) run without intervention
- 0.85+ findings queue for human review with full evidence
- Human approves specific findings for submission
- Reporter Agent formats and submits approved findings
You spend your time reviewing validated findings with evidence, not grinding through reconnaissance. The ratio flips: 20% of your time on tedious work, 80% on high-value decisions.
HackerOne, Intigriti, and Bugcrowd all have Terms of Service that require human oversight for automated tools. Fully autonomous submission isn't just risky—it can get you permanently banned.
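In code, the approval gate comes down to one rule: nothing is submittable unless a human has explicitly marked it approved. The sketch below is a hypothetical model of that queue. The `ReviewStatus` values and the three helper functions are illustrative names, not the project's real interface.

```typescript
// Hypothetical review queue. Status values and function names are assumptions.
type ReviewStatus = 'queued' | 'approved' | 'declined';

interface QueuedFinding {
  id: string;
  confidence: number;
  review: ReviewStatus;
}

// Only findings at or above the 0.85 threshold enter the human review queue.
function enqueueForReview(findings: { id: string; confidence: number }[]): QueuedFinding[] {
  return findings
    .filter((f) => f.confidence >= 0.85)
    .map((f): QueuedFinding => ({ id: f.id, confidence: f.confidence, review: 'queued' }));
}

// Called only as a result of an explicit human decision.
function approve(queue: QueuedFinding[], id: string): QueuedFinding[] {
  return queue.map((f): QueuedFinding => (f.id === id ? { ...f, review: 'approved' } : f));
}

// The reporter stage would consume this list and nothing else.
function submittable(queue: QueuedFinding[]): QueuedFinding[] {
  return queue.filter((f) => f.review === 'approved');
}
```

Because `queued` is the default state, a finding that nobody looks at simply never gets submitted.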
Checkpoint System
Hunt sessions can span days or weeks. The checkpoint system saves state:
interface Checkpoint {
  sessionId: string;
  phase: 'recon' | 'testing' | 'validation' | 'reporting';
  progress: PhaseProgress;
  findings: Finding[];
  timestamp: Date;
}
Resume any session exactly where you left off. No lost context, no repeated work.
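Saving and restoring a checkpoint is mostly serialization, with one catch: a `Date` does not survive a JSON round trip unless it is converted explicitly. The sketch below shows that round trip. `PhaseProgress` and the slimmed-down finding shape are stand-ins, since their real definitions are not shown in this post, and both helper names are hypothetical.

```typescript
// Hypothetical persistence helpers. PhaseProgress and CheckpointFinding are stand-ins.
type Phase = 'recon' | 'testing' | 'validation' | 'reporting';

interface PhaseProgress {
  completedSteps: number;
  totalSteps: number;
}

interface CheckpointFinding {
  id: string;
  confidence: number;
}

interface Checkpoint {
  sessionId: string;
  phase: Phase;
  progress: PhaseProgress;
  findings: CheckpointFinding[];
  timestamp: Date;
}

// Store the timestamp as an ISO string so it can be rebuilt on load.
function serializeCheckpoint(cp: Checkpoint): string {
  return JSON.stringify({ ...cp, timestamp: cp.timestamp.toISOString() });
}

// Rebuild the Date and reject obviously corrupt data before resuming.
function restoreCheckpoint(json: string): Checkpoint {
  const raw = JSON.parse(json);
  const timestamp = new Date(raw.timestamp);
  if (typeof raw.sessionId !== 'string' || Number.isNaN(timestamp.getTime())) {
    throw new Error('Corrupt checkpoint');
  }
  return { ...raw, timestamp };
}
```

The serialized string could be written to a file or a SQLite column. Where it is stored does not change the round trip.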
Results
After building and running BugBountyBot:
| Metric | Before | After |
|---|---|---|
| Time on recon | 4+ hours | 30 mins (review) |
| False positive rate | ~30% | Under 5% |
| Findings per session | 2-3 | 8-12 (validated) |
| Time to first finding | 2 days | 4 hours |
The system doesn't replace skill; it multiplies it. Your expertise in validation and reporting gets applied to roughly four times as many findings per session.
Setting Up the Knowledge Base
The RAG database is what makes the system smarter over time. The initial schema:
-- Successful exploitation patterns
CREATE TABLE knowledge_base (
  id TEXT PRIMARY KEY,
  pattern TEXT NOT NULL,         -- e.g., "IDOR via UUID prediction on /api/orders"
  context TEXT NOT NULL,         -- where and when it worked
  technology_stack TEXT,         -- "Node.js, PostgreSQL, Clerk auth"
  success_rate REAL DEFAULT 0.0,
  times_used INTEGER DEFAULT 0,
  last_used TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- False positive signatures to filter out
CREATE TABLE false_positive_signatures (
  id TEXT PRIMARY KEY,
  signature TEXT NOT NULL,       -- what the FP looks like
  reason TEXT,                   -- why it's a false positive
  occurrences INTEGER DEFAULT 1,
  last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- What didn't work and why
CREATE TABLE failure_patterns (
  id TEXT PRIMARY KEY,
  approach TEXT NOT NULL,
  reason TEXT,
  program_id TEXT,               -- program-specific context
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Seed the knowledge base with known false positive signatures from the start. Examples include CSRF tokens appearing in URLs, debug endpoints that return intentionally verbose errors, and rate limiting responses that pattern-match like authentication bypasses. The system builds these signatures automatically over time, but starting with known patterns prevents the first few hunts from wasting time.
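As a rough illustration of seeding, the sketch below keeps a couple of signatures in memory and matches them as case-insensitive substrings. The concrete signature strings, the `SEED_SIGNATURES` list, and `matchFalsePositive` are invented for the example. The post does not specify how signatures are encoded, and real ones would likely be more structured.

```typescript
// Hypothetical seed data. The signature strings are made up for illustration.
interface FalsePositiveSignature {
  id: string;
  signature: string; // here: a case-insensitive substring to look for (assumption)
  reason: string;
  occurrences: number;
}

const SEED_SIGNATURES: FalsePositiveSignature[] = [
  { id: 'fp-csrf-url', signature: 'csrf_token=', reason: 'CSRF token appearing in a URL', occurrences: 1 },
  { id: 'fp-rate-limit', signature: '429 too many requests', reason: 'Rate limiting mistaken for an auth bypass', occurrences: 1 },
];

// Returns the first matching signature, if any, and bumps its occurrence
// count in place (mirroring the occurrences column).
function matchFalsePositive(
  observation: string,
  signatures: FalsePositiveSignature[],
): FalsePositiveSignature | undefined {
  const haystack = observation.toLowerCase();
  const hit = signatures.find((s) => haystack.includes(s.signature.toLowerCase()));
  if (hit) {
    hit.occurrences += 1;
  }
  return hit;
}
```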
After 20–30 sessions on a specific program, the system's recommendation quality improves noticeably. It stops suggesting tests that have repeatedly failed on that stack and prioritizes approaches that have produced validated findings before.
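The prioritization described above could look something like the ranking function below. This is a guess at the shape rather than the project's logic: `PatternStats`, `rankPatterns`, and the rule of dropping patterns that failed on every one of at least three attempts are assumptions made for the sketch.

```typescript
// Hypothetical ranking. The minUses cutoff and substring stack match are assumptions.
interface PatternStats {
  pattern: string;
  technologyStack: string; // maps to technology_stack
  successRate: number;     // maps to success_rate
  timesUsed: number;       // maps to times_used
}

// Returns patterns relevant to the given stack, best success rate first,
// dropping ones that have been tried at least minUses times and never worked.
function rankPatterns(entries: PatternStats[], stack: string, minUses: number = 3): PatternStats[] {
  const wanted = stack.toLowerCase();
  return entries
    .filter((e) => e.technologyStack.toLowerCase().includes(wanted))
    .filter((e) => !(e.timesUsed >= minUses && e.successRate === 0))
    .sort((a, b) => b.successRate - a.successRate);
}
```

Patterns with only one or two failed attempts are kept, since that is too little evidence to rule an approach out.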
Getting Started
BugBountyBot is built with TypeScript, SQLite, and Claude Code integration. The core architecture:
/src
  /agents
    recon.ts         # Passive enumeration
    testing.ts       # Vulnerability detection
    validator.ts     # PoC verification
    reporter.ts      # Report generation
  /database
    rag.ts           # Knowledge storage
    checkpoints.ts   # Session persistence
  /safety
    rate-limit.ts    # Request throttling
    scope.ts         # Scope validation
    ban-detect.ts    # Blocking detection
Start with a single program. Let the RAG database learn. Expand scope as confidence grows. The first ten sessions build the foundation—after that, the system starts surfacing patterns you wouldn't have found manually.
Choosing Your First Program
Not all programs are equal for a new automation system.
Good first programs:
- Broad scope (many subdomains, large attack surface)
- Active security team (fast triage feedback helps the learning loop)
- Technology stack you have experience with (Node.js, Python, Ruby—each has different testing patterns)
- Public or large private programs (not invite-only at the start)
Programs to avoid initially:
- Narrow scope (hard to find enough surface area for reconnaissance to matter)
- Long triage queues (slow feedback makes the learning loop expensive)
- Programs known for "duplicate" responses on common issues (high false-positive environment)
HackerOne's public program list filtered by "bounty range" and "response time" gives a reasonable starting point. Pick two or three programs, run the recon phase manually the first time to validate the system's output, then let the testing and validation phases run.
What's Next
BugBountyBot v2.0 is in development with methodology-driven hunting:
- 6-8 week structured hunt phases
- Feature mapping before testing
- Scope change monitoring
- JavaScript file change detection
The goal is a shift from "run and hope" to a systematic methodology modeled on how elite hunters work. The false positive reduction techniques in the Validator Agent are detailed in validation and false positives.
Related: Why Human-in-the-Loop Beats Full Automation | Portfolio: BugBountyBot