I used to spend twenty hours a week scrolling through GitHub issues. I cross-referenced CVE databases. I manually checked pull requests for security regressions. I was hunting for open source bounties and vulnerability rewards. It paid decently. I collected three thousand dollars last quarter alone. The problem was the grind. I kept missing obvious targets because I could only review fifty repositories a day. My eyes blurred after reading the same stack traces. I needed a better workflow.
I built a small pipeline that uses AI agents to scan public repositories for actionable bounty opportunities. I am not talking about a magic button that prints money. I built a structured system that filters noise, extracts relevant context, and flags issues that actually pay out. The setup cuts my research time from twenty hours to about four hours a week. In the last two months, it flagged fourteen valid targets. Nine turned into paid submissions.
I focus on programs that publish a clear scope. I target organizations like the Algorand Foundation, Protocol Labs, and independent crate maintainers. Payouts range from five hundred to twelve thousand dollars per finding. My agent starts by pulling the latest two hundred issues from each target repo. It skips feature requests and documentation updates. It looks for technical keywords like memory leak, unvalidated input, race condition, and dependency confusion. Then it passes those candidates through a local LLM with a strict prompt template. The model scores each issue against the published program scope and returns a confidence rating.
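The keyword pass itself is cheap. Here is a trimmed-down sketch of that first filter; the helper name and the skip-label set are illustrative, and the real keyword list is longer than what is shown.

SECURITY_KEYWORDS = [
    "memory leak",
    "unvalidated input",
    "race condition",
    "dependency confusion",
]
# Label names vary by repo; these are common defaults I skip.
SKIP_LABELS = {"enhancement", "feature request", "documentation"}

def is_candidate(issue: dict) -> bool:
    """Cheap pre-filter: drop feature requests and docs, keep issues mentioning security keywords."""
    labels = {label["name"].lower() for label in issue.get("labels", [])}
    if labels & SKIP_LABELS:
        return False
    text = f"{issue.get('title', '')} {issue.get('body') or ''}".lower()
    return any(keyword in text for keyword in SECURITY_KEYWORDS)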
Last month, the agent flagged an issue in a widely used Rust parsing crate. The original reporter mentioned an unsafe pointer cast when handling malformed headers. The issue had two comments and zero maintainer responses. I ran the agent output through a quick diff check. The fix was already merged in the main branch, but the patch had not reached the latest stable release yet. That meant the vulnerability remained active in production environments. I submitted a coordinated disclosure report through the official channel. The payout was three thousand two hundred dollars. It took me two days from initial flag to payment.
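The staleness check is the part worth automating. Here is a rough sketch using the GitHub compare API, assuming you already have the SHA of the fix commit; the function name and arguments are mine, not part of any library.

import os
import requests

def fix_in_latest_release(repo: str, fix_sha: str) -> bool:
    """Check whether a fix commit has already shipped in the latest published release."""
    headers = {"Authorization": f"token {os.getenv('GITHUB_TOKEN')}"}
    release = requests.get(
        f"https://api.github.com/repos/{repo}/releases/latest", headers=headers
    )
    release.raise_for_status()
    tag = release.json()["tag_name"]
    compare = requests.get(
        f"https://api.github.com/repos/{repo}/compare/{fix_sha}...{tag}", headers=headers
    )
    compare.raise_for_status()
    # "ahead" or "identical" means the release tag already contains the fix commit;
    # "behind" or "diverged" means the patch has not reached a stable release yet.
    return compare.json()["status"] in ("ahead", "identical")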
I built the core scanner in Python. It hits the GitHub REST API, parses the JSON response, and feeds the text to an inference endpoint. Here is the exact module I use for the initial filtering step. It handles pagination, respects rate limits, and returns a clean list of candidates.
import os
import time
from typing import Dict, List, Optional

import requests
from openai import OpenAI

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")

client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)


def fetch_issues(repo: str, labels: Optional[List[str]] = None, max_issues: int = 200) -> List[Dict]:
    """Pull the most recent open issues, following pagination and backing off at the rate limit."""
    url = f"https://api.github.com/repos/{repo}/issues"
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    params = {"state": "open", "per_page": 50, "sort": "created", "direction": "desc"}
    if labels:
        params["labels"] = ",".join(labels)
    issues = []
    while url and len(issues) < max_issues:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        # The issues endpoint also returns pull requests; keep only real issues.
        issues.extend(item for item in resp.json() if "pull_request" not in item)
        # The "next" link already carries its own query string, so drop params after page one.
        url = resp.links.get("next", {}).get("url")
        params = None
        # Sleep until the window resets if we have exhausted the hourly rate limit.
        if resp.headers.get("X-RateLimit-Remaining") == "0":
            reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
            time.sleep(max(reset_at - time.time(), 0) + 1)
    return issues[:max_issues]


def score_issue(title: str, body: str) -> float:
    """Ask the model for a 0.0 to 1.0 confidence that the issue is a real vulnerability."""
    prompt = (
        "You are a security researcher reviewing GitHub issues for bug bounty programs. "
        "Score the likelihood that this issue represents a valid vulnerability. "
        "Return only a number between 0.0 and 1.0. "
        f"Title: {title}\nBody: {body}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=10,
    )
    score_str = (response.choices[0].message.content or "").strip()
    try:
        return float(score_str)
    except ValueError:
        # Fallback for when the model returns prose instead of a bare number.
        return 0.0


def scan_repo(repo: str, threshold: float = 0.75) -> List[Dict]:
    """Score every fetched issue and return the ones that clear the threshold."""
    raw_issues = fetch_issues(repo)
    candidates = []
    for issue in raw_issues:
        score = score_issue(issue.get("title", ""), issue.get("body") or "")
        if score >= threshold:
            candidates.append({"title": issue["title"], "url": issue["html_url"], "score": score})
    return candidates
The script does not replace triage. It just narrows the field. I run it across fifteen repositories every Tuesday morning. The LLM prompt stays deliberately strict. I tuned the temperature to 0.1 so it stops hallucinating severity ratings. I added a fallback parser that catches cases where the model returns a string instead of a float. The threshold sits at 0.75. That number came from testing. Lower thresholds flooded my inbox with false positives. Higher thresholds missed actual race conditions that the model initially undervalued.
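The weekly run is nothing more than a loop over the target list. A stripped-down driver, with stand-in repo names:

# Stand-in names; the real list tracks fifteen repos with published bounty scopes.
TARGETS = [
    "example-org/parser-crate",
    "example-org/rpc-daemon",
    "example-org/wallet-sdk",
]

if __name__ == "__main__":
    for repo in TARGETS:
        for candidate in scan_repo(repo, threshold=0.75):
            print(f"{candidate['score']:.2f}  {candidate['title']}")
            print(f"        {candidate['url']}")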
I also added a secondary check. The agent cross-references the flagged issue against the project release notes. If the vulnerability was already patched in a stable version, it discards the candidate. This step alone saved me from submitting six duplicate reports. Maintainers appreciate clean submissions. I get paid faster when I skip the noise.
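Building on the module above, a rough version of that check looks like this. It assumes the release notes reference issues by number, which holds for most of my targets; the function name is a placeholder.

def already_released(repo: str, issue_number: int) -> bool:
    """Drop a candidate if any recent release's notes already reference the issue."""
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/releases",
        headers=headers,
        params={"per_page": 20},
    )
    resp.raise_for_status()
    marker = f"#{issue_number}"
    return any(marker in (release.get("body") or "") for release in resp.json())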
Bounty hunting has rules. I read every program scope before I run the agent. I never test on production environments. I never touch user data. The automation only reads public GitHub data. I submit findings through official disclosure channels. Some programs explicitly forbid automated scanning. I respect that boundary. If a repo bans bots, I remove it from my target list. Automation is a tool for research, not a license to ignore policy.
Over the last ninety days, this setup processed four thousand open issues. It produced two hundred thirty-four flagged candidates. I investigated eighty-two of them. I submitted twenty-four reports. Fourteen resulted in payouts. The total came to twenty-one thousand dollars. That breaks down to roughly fifteen hundred dollars per paid submission and forty-seven dollars per hour of active review time. It is not passive income. It is just a more efficient workflow.
I still read the code myself. I still write the disclosure reports by hand. The AI agent just removes the tedious filtering step. If you want to build something similar, start small. Pick three repos with active bounty programs. Write a script that pulls open issues. Run them through a prompt that matches the program scope. Track your hit rate. Adjust the threshold. You will quickly see what the model gets right and where it needs tuning.
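Tracking the hit rate does not need tooling. Here is a sketch of the log I keep; the CSV layout and the paid marker are my own conventions, not anything standard.

import csv
from datetime import date

def log_candidates(path: str, repo: str, candidates: list) -> None:
    """Append flagged candidates to a CSV; the last column gets marked 'yes' once a report pays out."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for c in candidates:
            writer.writerow([date.today().isoformat(), repo, c["url"], f"{c['score']:.2f}", ""])

def hit_rate(path: str) -> float:
    """Share of logged candidates that eventually turned into a payout."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    paid = sum(1 for row in rows if row[-1].strip().lower() == "yes")
    return paid / len(rows) if rows else 0.0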
The landscape changes fast. Maintainers merge fixes. Programs update their scopes. LLMs improve. I update my prompt templates every three weeks. I add new keyword filters when I spot recurring patterns. The system works because it stays focused on one thing. It reads public data, applies consistent scoring, and hands me a short list. I do the rest. That is how I turned twenty hours of manual scrolling into four hours of targeted research. I plan to keep running it. The math works. The process scales. I just need to stay disciplined.