DEV Community: shunta hayashi

What I learned from my first AI-assisted bug bounty submissions

shunta hayashi — Fri, 29 May 2026 04:08:57 +0000

Third post in my "AI-assisted OSS contribution" series. The first two were about pre-fork due diligence and shipping a fix to ONNX with my own PR scanner. This one is about a harder game: security research and coordinated disclosure.

For a while my AI-assisted open-source work was about contributions — typo fixes, docs, small bug fixes, the occasional feature. Pull requests have a forgiving feedback loop: if a PR is wrong, a maintainer comments and you iterate. Bug bounty work is different. The feedback loop is slower, the bar for "novel and correct" is much higher, and a lot of the difficulty has nothing to do with the vulnerability itself.

I ran a small experiment: use Claude (Opus) to help me find, verify, and write up vulnerabilities in public, in-scope open-source bug bounty programs — the kind that publish a scope and a safe-harbor policy and explicitly invite testing. Here's what actually mattered, mostly the things I didn't expect.

1. The duplicate problem is the real boss fight

The single biggest risk to a bounty submission is not "is it a real bug" — it's "did someone already report it." And you usually cannot see the answer.

I built a small novelty-checking toolchain around the assistant: query published advisories (GHSA via the GitHub API), aggregate cross-ecosystem advisory data (OSV), search the target repository's own issues and PRs, and pull recent security-research feeds. It catches a lot. But it has a fundamental blind spot: privately submitted reports are invisible until they're disclosed. One of my submissions was closed as a duplicate of a report filed months earlier that I had no way of seeing. The finding was correct. It just wasn't first.

The lesson isn't "check harder." Public OSINT can only ever reduce duplicate risk, never eliminate it. The realistic takeaways:

Treat novelty as a probability, not a yes/no.
Favor surfaces that are less trodden — newer code paths, recently changed files, parsers for formats nobody enjoys reading.
Accept that some fraction of correct work will be duplicated, and size your effort accordingly.
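A minimal sketch of the public-advisory part of such a check, assuming the OSV `v1/query` endpoint and a simple keyword screen over advisory text. The function names (`build_osv_query`, `fetch_osv`, `matching_advisories`) are hypothetical and this is not the toolchain described above; it only illustrates how much of the check is mechanical.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(package: str, ecosystem: str) -> dict:
    """Request body asking OSV for all advisories on one package."""
    return {"package": {"name": package, "ecosystem": ecosystem}}


def fetch_osv(package: str, ecosystem: str, timeout: float = 10.0) -> dict:
    """POST the query to OSV and return the decoded JSON response."""
    body = json.dumps(build_osv_query(package, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def matching_advisories(osv_response: dict, keywords: list[str]) -> list[str]:
    """IDs of advisories whose summary or details mention any keyword."""
    hits = []
    for vuln in osv_response.get("vulns", []):
        text = f"{vuln.get('summary', '')} {vuln.get('details', '')}".lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(vuln["id"])
    return hits
```

A call such as `matching_advisories(fetch_osv("onnx", "PyPI"), ["tar", "traversal"])` would screen the published record. An empty result only says that nothing public matched; privately submitted reports remain invisible to any query like this.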

2. PoC-first, and verify against the real runtime

It is very easy to read code, build a clean mental model of a bug, write a confident report — and be wrong, because the runtime doesn't behave the way the reference manual says it does. I got burned by exactly this kind of gap between "what the spec says" and "what the implementation does."

The discipline that fixed it: no claim without a runnable proof of concept, executed against the actual runtime. Not pseudocode. Not "this should work." A minimal, contained reproduction on my own machine — localhost only, no third-party or production systems touched — that either fires or it doesn't. An AI assistant is genuinely good at the first 80% of building that PoC fast; the last 20% (does it actually reproduce?) is non-negotiable and is where most false positives die.

3. The signal economy is a real constraint

Modern bounty platforms ration your ability to submit. New researchers get a limited number of "trial" reports, and a reputation/signal score that drops when you file invalid or duplicate reports — low enough, and you get blocked from submitting at all.

This completely changes the optimal strategy. When submissions are cheap, volume wins. When each submission costs scarce signal, quality dominates volume, and a single duplicate or "informative" close is genuinely expensive. With an AI assistant that can generate plausible-looking reports quickly, this is the most important guardrail: the bottleneck must be verification, not generation.
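The trade-off can be made concrete with a toy expected-value model. Everything here is an assumption for illustration: the `expected_net` helper, the probabilities, the payout, and the idea of pricing lost signal in the same units as a payout. No platform publishes such a formula.

```python
def expected_net(reports: int, p_valid: float, payout: float, signal_cost: float) -> float:
    """Expected net value of filing `reports` submissions with identical odds.

    Each valid report earns `payout`; each invalid or duplicate report
    costs `signal_cost` (reputation loss expressed in payout units).
    """
    per_report = p_valid * payout - (1.0 - p_valid) * signal_cost
    return reports * per_report


# Hypothetical inputs: five quick reports at 20% odds vs one verified report at 70%.
cheap_volume = expected_net(5, 0.20, payout=100.0, signal_cost=0.0)
cheap_careful = expected_net(1, 0.70, payout=100.0, signal_cost=0.0)
costly_volume = expected_net(5, 0.20, payout=100.0, signal_cost=60.0)
costly_careful = expected_net(1, 0.70, payout=100.0, signal_cost=60.0)

print(cheap_volume > cheap_careful)    # volume ahead when rejections are free
print(costly_careful > costly_volume)  # careful ahead once rejections cost signal
```

With these made-up inputs the ranking flips as soon as a rejected report carries a cost, which is the whole argument for putting verification ahead of generation.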

4. The 2026 market got harder while I was learning

Some honest context, because it shaped my results. The open-source bounty landscape contracted noticeably in early 2026:

The Internet Bug Bounty paused new submissions and cut its payout tiers steeply.
At least one major runtime's bounty program was paused for lack of funding.
Platforms tightened minimum signal requirements.

A widely-cited reason: AI-assisted discovery started producing vulnerability reports faster than open-source maintainers could triage and remediate them. The irony isn't lost on me — the same tooling that makes an individual researcher more productive, in aggregate, helped congest the system that pays them. If you're starting now, plan for fewer open programs and lower-but-real payouts than the headline numbers from a year ago.

5. Disclose the AI, every time

I disclose AI assistance in every submission. Not as a disclaimer-shaped apology — as a fact, the same way you'd note any tool in your methodology. Two practical reasons beyond honesty:

Triagers are increasingly wary of low-effort AI spam. Being upfront and attaching a clean, reproducible PoC is how you signal you're not that.
If a program's policy requires disclosure and you skip it, you can lose the report (and trust) on a technicality, regardless of finding quality.

The model does the heavy lifting on code review, hypothesis generation, and drafting. I own scope selection, the decision to submit, the ethics, and the final verification. That division of labor is the whole point.

What I'd tell myself at the start

Verification is the product, not the report. Generation is cheap now; correctness isn't.
Duplicate risk is structural. Reduce it, price it in, don't pretend you can eliminate it.
Respect the signal economy. One careful submission beats five hopeful ones.
Stay in scope, stay contained, disclose the AI. The boring compliance stuff is what makes the interesting work sustainable.

I'm still early — a couple of submissions in, one under triage as I write this, plenty unproven. But the meta-lessons above transferred cleanly from the PR work in my earlier posts: the assistant compresses the mechanical effort, and that just relocates all the value to judgment — what to look at, whether it's really true, and whether you should hit submit.

Developed with AI assistance (Claude Opus); all findings were reviewed, reproduced locally, and verified by me before submission. No unpatched or undisclosed vulnerability details are included in this post.

Self-dogfooding: using my own AI-PR scanner to ship a fix to ONNX

shunta hayashi — Wed, 13 May 2026 10:30:39 +0000

Note: This article documents work performed with AI assistance (Claude Sonnet 4.6 via Claude Code), including the original bug analysis, the pre-submission review that prompted the path change, and the PR that was ultimately submitted. All technical claims are verified against the ONNX source tree and the public PR.

The hook: a real bug that was never going to ship as an advisory

The bug was straightforward once I saw it. In onnx/utils.py, a helper function called _tar_members_filter uses a plain str.startswith() call to validate that a tar archive member lives inside the intended extraction directory:

# onnx/utils.py  (simplified)
abs_base   = os.path.abspath(base)
abs_member = os.path.abspath(member_path)
if not abs_member.startswith(abs_base):   # <-- no os.sep guard
    raise RuntimeError("traversal detected")

The problem is that startswith is a string operation, not a path operation. Given a base directory of /home/user/.onnx/models, a crafted archive member resolving to /home/user/.onnx/models_evil/pwned.txt passes the check: the string "models_evil" begins with the string "models". A separator guard — startswith(abs_base + os.sep) — closes the gap. Without it, files can be written outside the extraction directory on Python 3.10 and 3.11, the versions where the fallback filter is active.
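A small reproduction of the prefix confusion, as a sketch rather than the ONNX code itself. It assumes `posixpath.normpath` in place of `os.path.abspath` so the result is the same on any operating system; `naive_is_inside` and the `resnet/model.onnx` path are hypothetical names for illustration.

```python
import posixpath


def naive_is_inside(base: str, member_path: str) -> bool:
    """String-prefix containment check, mirroring the simplified snippet above."""
    abs_base = posixpath.normpath(base)
    abs_member = posixpath.normpath(member_path)
    return abs_member.startswith(abs_base)  # no separator guard


base = "/home/user/.onnx/models"
inside = "/home/user/.onnx/models/resnet/model.onnx"
sibling = "/home/user/.onnx/models/../models_evil/pwned.txt"

print(naive_is_inside(base, inside))   # True: genuinely inside the base
print(naive_is_inside(base, sibling))  # True: the sibling directory slips through
```

The second path normalizes to `/home/user/.onnx/models_evil/pwned.txt`, which shares a string prefix with the base but is not inside it.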

I found this via static analysis. The fix was one line. I had a working proof-of-concept. My first instinct was to head straight to a bug bounty platform and file an advisory.

I paused instead — and that pause changed what happened next.

The reviewer that changed my mind

Before submitting anything, I ran a pre-submission review pass using an LLM agent configured for deep, adversarial analysis. I think of this as a second opinion before publication: give it the same evidence I have, tell it to find holes, and treat the output seriously.

The technical verdict came back positive. The root cause — startswith missing + os.sep — was confirmed correct. The fix was confirmed correct. The proof-of-concept logic was independently traced and verified.

Then the report pivoted:

Two essentially identical reports already exist on the bounty platform (both self-closed by the reporters), and a third was marked Duplicate.

The reviewer had checked the platform's listing for this repository. Three prior reports — same function, same root cause, same string-comparison bug — filed in March 2026. Two were self-closed by the reporters, almost certainly after maintainer feedback. One was marked Duplicate.

The reviewer's framing was precise:

Submitting a third (fourth) variant of the same finding carries serious duplicate-judgement and reputation risk that outweighs the (real but low-reach) underlying flaw.

It also identified weaknesses I had not fully interrogated: the vulnerable code path is only reachable through the backend test runner (not any production inference path), the affected Python versions (3.10 and 3.11) represent a shrinking window, and the impact framing in my draft was more aggressive than the actual constraints supported.

This is the part of AI-assisted research that I find most useful and most underrated: not the initial discovery, but the disciplined second-pass that asks "is this the right thing to do with the finding?"

The answer was clearly no. Filing a fourth variant of a finding that three other reporters had already abandoned, on a platform that was already under scrutiny for low-quality reports, was not a good use of anyone's time.

The real lesson: finding a bug does not mean you should file an advisory about it. The path from "this is technically wrong" to "I should submit this through a bounty platform" has several gates, and the duplicate-risk gate is one of the most expensive to fail.

The pivot: direct PR instead of advisory

Once the bounty-submission path was off the table, the decision was simple. The fix itself was uncontroversially correct. Replacing the bare startswith(abs_base) check with startswith(abs_base + os.sep), plus an explicit exception for abs_member == abs_base, is the canonical pattern. Python's own PEP 706 was written to address this class of tarball extraction vulnerability. The code needed the fix regardless of how many people had noticed the gap.

A direct pull request solves the actual problem without any of the duplicate risk. The PR gets reviewed on the merits of the code change. Credit is attached to the commit. The maintainer relationship is positive rather than transactional.

The principles I reached for in this decision:

If the fix is uncontroversial, the PR is the highest-EV path. Advisory platforms add value when a finding needs coordinated disclosure, embargo, or cross-maintainer coordination. A one-line correctness fix to a fallback filter does not meet that bar.
Duplicate risk on advisory platforms is a reputation cost, not just a process cost. Prior closed reports signal that maintainers have already processed this class of finding. Submitting again without materially new evidence is noise.
The goal is fixed software, not credited findings. If the code gets patched, the outcome is correct whether or not a bounty number is attached.

With that settled, I moved to the contribution workflow.

Pre-fork due diligence with my own tool

This is the part I find satisfying to document: I used the tool I wrote about in the first article in this series to scan the ONNX repository before I forked it.

gh-pr-trust-scan is a Python CLI that checks a repository for automated trust-gate workflows, explicit AI-ban policies in contribution documentation, and rejection-signal labels. The question it answers is: "Will this project reject an AI-assisted PR on policy grounds before anyone reviews the code?"

Running it against onnx/onnx takes a few seconds:

gh-pr-trust-scan onnx/onnx

Scanning onnx/onnx ...

Repo:    onnx/onnx
Verdict: SAFE

Findings:
  [LOW   ] No explicit AI ban label found

Stats:
  Last commit: 1 day ago
  Open PRs: 34
  Closed-no-merge PRs (last 30): 7

Verdict: SAFE. One LOW finding — absence of any rejection-signal label, which is informational rather than a warning. No automated trust-gate workflows, no AI-ban policy language in CONTRIBUTING.md or the PR template, no labels associated with automated rejection.

This is exactly the confirmation I needed before investing time in the implementation. ONNX is a well-maintained project with active CI, a clear DCO sign-off requirement, and no explicit policy against AI-assisted contributions. The scan took less time than reading CONTRIBUTING.md manually.

This is what "dogfooding" looks like in practice. I built the tool to avoid wasted contribution effort, and using it before my own fork means I'm running the same workflow I recommend to others. The SAFE verdict also gave me a calibration data point: the tool did not misclassify ONNX's DCO sign-off requirement as a rejection signal.

The PR

Branch: fix/tar-traversal-separator-guard. I named it around the mechanical fix rather than the vulnerability class — advisory-flavored branch names tend to set an adversarial frame before anyone reads the code.

The change itself is minimal:

# onnx/utils.py — before
if not abs_member.startswith(abs_base):

# onnx/utils.py — after
if not abs_member.startswith(abs_base + os.sep) and abs_member != abs_base:

The and abs_member != abs_base clause handles the edge case where the member path resolves to exactly the base directory itself, which should be allowed.

I also moved abs_base = os.path.abspath(base) outside the loop. The original code recomputed the same value on every iteration — a minor performance fix that also makes the intent clearer.
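Put together, the guarded check with the hoisted base path might look like the sketch below. The function name `safe_members` and its signature are assumptions for illustration; the real `_tar_members_filter` signature is not shown in this post.

```python
import os


def safe_members(base: str, member_names: list[str]) -> list[str]:
    """Return member names that resolve inside `base`; raise on escape attempts."""
    abs_base = os.path.abspath(base)  # computed once, outside the loop
    allowed = []
    for name in member_names:
        abs_member = os.path.abspath(os.path.join(base, name))
        inside = abs_member.startswith(abs_base + os.sep)
        # A member that resolves to the base directory itself is allowed.
        if not inside and abs_member != abs_base:
            raise RuntimeError(f"traversal detected: {name}")
        allowed.append(name)
    return allowed
```

For example, `safe_members("models", ["a/b.onnx", "."])` returns both names, while `safe_members("models", ["../models_evil/pwned.txt"])` raises `RuntimeError`.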

Three regression tests cover the cases that matter:

# onnx/test/test_tar_members_filter.py (representative cases, bodies elided)
def test_normal_member_allowed(self):
    # tar member inside base → passes filter
    ...

def test_sibling_prefix_rejected(self):
    # "../models_evil/pwned.txt" style bypass → raises RuntimeError
    ...

def test_exact_base_dir_allowed(self):
    # member resolving exactly to abs_base → allowed
    ...
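Tests of this kind need a crafted archive, and one way to get it without touching disk is to build the tar in memory. The sketch below is an illustration under assumptions, not the code from the PR: `make_tar` and `escaping_members` are hypothetical helpers, and the containment logic is a stand-in for the real filter.

```python
import io
import os
import tarfile


def make_tar(member_names: list[str]) -> tarfile.TarFile:
    """Build an in-memory tar archive with one tiny file per name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in member_names:
            payload = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return tarfile.open(fileobj=buffer, mode="r")


def escaping_members(archive: tarfile.TarFile, base: str) -> list[str]:
    """Names of members that would resolve outside `base` (stand-in check)."""
    abs_base = os.path.abspath(base)
    escaped = []
    for member in archive.getmembers():
        abs_member = os.path.abspath(os.path.join(base, member.name))
        if not abs_member.startswith(abs_base + os.sep) and abs_member != abs_base:
            escaped.append(member.name)
    return escaped
```

Nothing is extracted here; the archive is only inspected, which keeps the reproduction contained.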

The PR description includes an explicit AI disclosure ("researched and drafted with AI assistance via Claude Code"), zero use of advisory-escalating language (no "CVE," "exploit," "RCE," or "vulnerability" in the PR title or summary), and a DCO sign-off on every commit as ONNX's CONTRIBUTING.md requires. The fix is framed as what it is: a correctness improvement to the path-containment check in a fallback branch.

The PR is at: https://github.com/onnx/onnx/pull/7948

Lessons

The full loop in this session looked like this:

static analysis → bug found
     ↓
proof-of-concept → bug confirmed
     ↓
DEEP_REVIEW → duplicate risk identified, advisory path closed
     ↓
pivot decision → direct PR
     ↓
gh-pr-trust-scan onnx/onnx → SAFE verdict, fork with confidence
     ↓
implementation + tests → PR submitted

The whole loop, from static analysis to PR submission, fit inside a single working session. It closed cleanly only because of the pre-submission review step: without it, I would have filed an advisory that duplicated three prior reports and likely produced no useful outcome.

A few things I am taking forward:

Finding a bug and deciding what to do about it are different skills. Static analysis and proof-of-concept work are pattern recognition problems. Deciding whether to file an advisory, open a PR, or do nothing requires understanding the platform dynamics, the prior report history, and the realistic impact of the finding. These are judgment calls that benefit from a structured second-opinion process.

Pre-submission review is cost-effective at any scope. The review pass took less time than I would have spent polishing the advisory before submission. Catching duplicate risk at the review stage costs essentially nothing; catching it after submission costs reputation.

Using a tool on your own work produces better feedback than testing it on examples. Running gh-pr-trust-scan against onnx/onnx gave me more concrete signal about edge cases, such as how the tool handles DCO requirements without flagging them as AI-hostile, than any synthetic test scenario would have. Running it on a real, active OSS project also confirmed that the severity calibration is sensible.

What comes next: monitor the PR for maintainer feedback. If the review surfaces a preference for the pathlib.Path.relative_to() formulation over startswith + os.sep, that goes into the code and serves as a useful style data point for future contributions.
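For comparison, a sketch of what the pathlib formulation could look like. It uses `PurePosixPath` so it behaves the same on any OS, and the helper name `is_inside` is hypothetical. One caveat worth knowing: `relative_to` compares path components but does not collapse `..`, so the paths have to be normalized first.

```python
import posixpath
from pathlib import PurePosixPath


def is_inside(base: str, member_path: str) -> bool:
    """Containment check via PurePosixPath.relative_to on normalized paths."""
    # Without normpath, "/base/../evil" would count as relative to "/base".
    norm_base = PurePosixPath(posixpath.normpath(base))
    norm_member = PurePosixPath(posixpath.normpath(member_path))
    try:
        norm_member.relative_to(norm_base)
    except ValueError:
        return False
    return True
```

Because the comparison is component-wise, `models_evil` is never mistaken for a child of `models`, and the base directory itself is accepted without a separate equality clause.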

This article was researched and drafted with AI assistance (Claude Sonnet 4.6 via Claude Code). Tool behavior described matches the gh-pr-trust-scan codebase as of May 2026 (commit d9b365a). The ONNX PR linked above is real and submitted under the same author identity (taiman724 on GitHub, with DCO sign-off).

Pre-fork due diligence for OSS contributors

shunta hayashi — Tue, 12 May 2026 15:11:32 +0000

Note: This article was researched and drafted with AI assistance (Claude Sonnet 4.6 via Claude Code). All claims about specific repository policies are illustrative; readers should verify current state before acting on them.

Why you should scan a repo before you fork it

You found an issue. You know exactly how to fix it. You fork the repo, write the code, open a pull request — and it gets closed in minutes, not by a human, but by an automated workflow you never knew existed. No review. No feedback. Just a bot verdict and a "wasted" label.

This scenario has become noticeably more common in 2025 and 2026. A growing number of open-source maintainers have responded to the flood of low-quality, AI-generated contributions by deploying automated trust-gate systems directly in their CI pipelines. These gates can reject a PR silently — or with a curt machine-generated comment — based on signals that have nothing to do with whether your code is correct. They evaluate who contributed and how, not just what was contributed.

The cost is asymmetric. A maintainer's automated rejection takes milliseconds. The contributor's lost time — understanding the codebase, writing tests, drafting a good PR description — might be hours or days. Pre-fork due diligence costs five minutes. Doing it consistently is one of the highest-leverage habits an AI-assisted contributor can develop in 2026.

Common rejection vectors

Automated trust-gate workflows

The most aggressive rejection mechanism is a CI workflow that evaluates the contributor's account history before it evaluates the code. These tools look at signals like global merge ratio (how many of your past PRs across all of GitHub were merged versus closed), account age, and contribution velocity. If your profile doesn't meet the threshold, the workflow closes the PR automatically and may apply a label like suspicious-author or spam-likely.

These workflows are usually small GitHub Actions that run on pull_request events. They're often invisible from the repo's front page — you have to look inside .github/workflows/ to find them. Common identifiers include step names or action references containing strings like trust-score, min-global-merge-ratio, or references to community-maintained anti-spam action collections. A new GitHub account used primarily for AI-assisted contribution is exactly the profile these tools are tuned to catch.

Anti-slop quality-gate workflows

A second category focuses on content quality rather than account history. These workflows look for statistical signals associated with machine-generated text — unusual vocabulary distributions, patterns common in LLM output, or structural anti-patterns in commit messages and PR descriptions. The term "slop" has become shorthand for this class of low-effort generated content in OSS communities. Workflows in this family typically reference action names or step IDs containing anti-slop or similar identifiers.

It is worth noting that a well-crafted, human-reviewed AI-assisted contribution can pass these checks — but only if the contributor has actually read and understood the code before submitting. Blind "generate and submit" workflows are what these gates are designed to block.

Explicit AI bans in contribution documentation

The third category is simpler to detect but easier to overlook: written policy. Many maintainers have added explicit clauses to CONTRIBUTING.md, PR templates, or even the main README.md stating that AI-generated or AI-assisted contributions are not accepted. Language varies:

"AI tools are not permitted"
"no AI" / "ban AI" / "prohibit AI"
"LLM not allowed" / "Copilot is not allowed"
"all submissions must be human-written"
"human-authored contributions only"

Some policies stop short of an outright ban but require disclosure: "disclose AI" or "AI disclosure required". These are medium-severity signals (the scanner described later in this post reports them as MEDIUM) and are worth reading carefully: a disclosure requirement is very different from a ban, but missing it can still get your PR closed.
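The ban-versus-disclosure distinction can be sketched as a two-tier keyword match. The pattern lists below are built only from the phrases quoted above and are an illustration, not the detection set used by any real tool; `classify_policy` is a hypothetical helper.

```python
import re

# Illustrative patterns derived from the quoted phrases; not a real tool's list.
BAN_PATTERNS = [
    r"\bAI tools are not permitted\b",
    r"\b(no|ban|prohibit)\s+AI\b",
    r"\b(LLM|Copilot)\s+(is\s+)?not allowed\b",
    r"\bhuman-(written|authored)\b",
]
DISCLOSURE_PATTERNS = [
    r"\bdisclose\s+AI\b",
    r"\bAI disclosure required\b",
]


def classify_policy(text: str) -> str:
    """Return 'ban', 'disclosure', or 'none' for a block of policy text."""
    def any_match(patterns: list[str]) -> bool:
        return any(re.search(p, text, re.IGNORECASE) for p in patterns)

    if any_match(BAN_PATTERNS):
        return "ban"
    if any_match(DISCLOSURE_PATTERNS):
        return "disclosure"
    return "none"
```

Keyword matching like this produces false positives (a sentence such as "there is no AI policy" would register as a ban), so a hit is a prompt to read the surrounding paragraph, not a verdict.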

Rejection-signal labels

Finally, some repos attach labels that serve as a public ledger of past rejections. Labels like no-ai, ai-rejected, human-only, ai-ban, and ai-generated-rejected are visible on closed PRs and on the label list itself. A repo with fifty closed PRs all tagged ai-generated-rejected is telling you something important about maintainer tolerance, regardless of what the written policy says.

The manual scan workflow

You can run a quick scan by hand using the GitHub CLI (gh). The following three steps cover the main surface areas.

Step 1 — Check workflow files for trust-gate patterns:

# List all workflow file names, then inspect suspicious ones
gh api repos/<owner>/<repo>/contents/.github/workflows \
  --jq '.[].name'

# Fetch the content of a specific workflow and grep for known patterns
gh api repos/<owner>/<repo>/contents/.github/workflows/pr-check.yml \
  --jq '.content' | base64 -d | \
  grep -iE 'trust-score|anti-slop|min-global-merge-ratio|fossier'

Step 2 — Scan CONTRIBUTING.md for policy language:

# Fetch CONTRIBUTING.md and search for AI-related policy keywords
gh api repos/<owner>/<repo>/contents/CONTRIBUTING.md \
  --jq '.content' | base64 -d | \
  grep -iE 'no.?ai|ai.is.not.allowed|ai.tools|human.authored|human.written|llm.not.allowed|disclose.ai|ban.ai|prohibit.ai|reject.ai'

Step 3 — Inspect repository labels:

# List all labels; look for rejection-signal names
gh label list --repo <owner>/<repo> | \
  grep -iE 'no-ai|ai-rejected|human-only|ai-ban|ai-generated'

Running all three before you fork gives you a solid picture in under a minute. The limitation is that you have to remember to do it, and you need to know what patterns to look for. That's the gap the tool below is designed to close.
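The decode-and-grep part of Step 1 can also be expressed in Python, which is roughly what any wrapper has to do. This is a sketch under assumptions: it takes an already-parsed contents API response (for example the output of the `gh api` call above without `--jq`, passed through `json.loads`), and the function names are hypothetical.

```python
import base64
import re

TRUST_GATE_RE = re.compile(
    r"trust-score|anti-slop|min-global-merge-ratio|fossier", re.IGNORECASE
)


def decode_contents(payload: dict) -> str:
    """Decode the base64 `content` field of a GitHub contents API response."""
    # The API wraps the base64 text across lines; b64decode skips the newlines.
    return base64.b64decode(payload["content"]).decode("utf-8")


def trust_gate_hits(workflow_text: str) -> list[str]:
    """Return the lines of a workflow file that match a known trust-gate pattern."""
    return [
        line.strip()
        for line in workflow_text.splitlines()
        if TRUST_GATE_RE.search(line)
    ]
```

Returning the matching lines, and not just a yes/no answer, makes it easier to judge whether a hit is a real gate or an unrelated mention.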

Automating with gh-pr-trust-scan

gh-pr-trust-scan is a small Python CLI that wraps the three manual steps above into a single command, applies a curated set of detection patterns, and produces a machine-readable verdict. It was built specifically to answer one question: "Will this project reject my AI-assisted PR on policy grounds before anyone looks at the code?"

Installing the tool

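# Once the package is published to PyPI (it is not yet; see the note below):
# Recommended: isolated environment via pipx
pipx install gh-pr-trust-scan

# Or with pip
pip install gh-pr-trust-scan

Note: The package is not yet published to PyPI (coming soon), so the two commands above will not work until it is. During the development period, install from source: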
git clone https://github.com/taiman724/gh-pr-trust-scan
cd gh-pr-trust-scan
pip install -e ".[dev]"

Requirements: Python 3.10+ and the GitHub CLI (gh) authenticated via gh auth login.

Running a scan

# Basic scan — prints a human-readable verdict
gh-pr-trust-scan owner/repo

# Full GitHub URL also works
gh-pr-trust-scan https://github.com/owner/repo

# JSON output for scripting or CI integration
gh-pr-trust-scan owner/repo --json

The tool produces one of three verdicts:

Verdict	When it fires
`SAFE`	No explicit AI contribution policy detected (all findings LOW or none)
`WARN`	Discouraging policy language or rejection labels found, but no automated gate
`AVOID`	At least one HIGH-severity finding — an automated rejection gate is present

A SAFE verdict on a repo with an actively maintained codebase and no policy signals is a reasonable green light. A WARN verdict calls for reading the actual CONTRIBUTING.md carefully before investing time. An AVOID verdict means a bot will likely close your PR before a human sees it.
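Read as code, the verdict table reduces to a severity roll-up. The sketch below is one plausible reading, not the tool's actual logic: in particular, mapping WARN to "at least one MEDIUM finding" is an assumption inferred from the table and the sample output.

```python
def verdict(findings: list[dict]) -> str:
    """Map a list of findings (each with a 'severity' key) to a verdict."""
    severities = {finding["severity"] for finding in findings}
    if "HIGH" in severities:
        return "AVOID"
    if "MEDIUM" in severities:  # assumed threshold for WARN
        return "WARN"
    return "SAFE"  # only LOW findings, or none at all
```

The ordering matters: a single HIGH finding decides the verdict no matter how many lower-severity findings accompany it.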

Here is what the output looks like for a repo with multiple signals:

Scanning example-org/example-repo ...

Repo:    example-org/example-repo
Verdict: AVOID  (trust-gate detected)

Findings:
  [HIGH  ] Trust-score gate detected in workflow (.github/workflows/pr-review.yml)
  [MEDIUM] 'human-written' requirement found (line 18): All submissions must be human-written. (CONTRIBUTING.md)
  [MEDIUM] Label 'human-only' found

Stats:
  Last commit: 1 day ago
  Open PRs: 23
  Closed-no-merge PRs (last 30): 9

And the equivalent JSON, useful for scripting:

{
  "repo": "example-org/example-repo",
  "verdict": "AVOID",
  "findings": [
    {
      "severity": "HIGH",
      "category": "trust_gate",
      "evidence": "trust-score gate detected in workflow",
      "file": ".github/workflows/pr-review.yml"
    },
    {
      "severity": "MEDIUM",
      "category": "human_only_requirement",
      "evidence": "'human-written' requirement found (line 18)",
      "file": "CONTRIBUTING.md"
    },
    {
      "severity": "MEDIUM",
      "category": "label",
      "evidence": "Label 'human-only' found",
      "file": "labels"
    }
  ],
  "stats": {
    "last_commit": "1 day ago",
    "open_prs": 23,
    "closed_no_merge_last_30d": 9,
    "flagged_closed_prs": 0
  }
}
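A short sketch of consuming that JSON in a script, assuming the field names shown above stay stable. The `summarize` helper is hypothetical; the input string could come from capturing the output of `gh-pr-trust-scan owner/repo --json`, but here it is simply passed in.

```python
import json


def summarize(scan_json: str) -> str:
    """One-line summary of a scan result in the JSON shape shown above."""
    result = json.loads(scan_json)
    blocking = [f for f in result["findings"] if f["severity"] == "HIGH"]
    files = sorted({f["file"] for f in blocking})
    if result["verdict"] == "AVOID":
        return f"{result['repo']}: skip ({', '.join(files)})"
    return f"{result['repo']}: {result['verdict']}"
```

Fed the example result above, this would name `.github/workflows/pr-review.yml` as the file to read before deciding whether the repo is worth a fork.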

Adding custom patterns

All detection keywords live in a single file: src/gh_pr_trust_scan/patterns.py. Adding a new trust-gate or policy phrase requires no changes to the scanner logic — just append an entry to the appropriate list:

# patterns.py — adding a custom workflow pattern
WORKFLOW_PATTERNS.append({
    "pattern": r"my-org/custom-trust-gate-action",
    "severity": "HIGH",
    "category": "trust_gate",
    "description": "Custom trust gate action detected",
})

# Adding a new text-file pattern (e.g. a new policy phrase)
TEXT_PATTERNS_HIGH.append({
    "pattern": r"\bno\s+generated\s+code\b",
    "severity": "HIGH",
    "category": "ai_ban_explicit",
    "description": "'no generated code' policy found",
})

The pattern values are Python regexes compiled case-insensitively, so you can handle variations with standard regex syntax. The community is especially interested in patterns for emerging tools and newly observed policy phrases — if you encounter a rejection mechanism that the tool misses, a PR adding the pattern is a concise and high-value contribution.
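To show what "compiled case-insensitively" means for an entry like the ones above, here is a minimal sketch of applying one pattern to a text file. `match_entry` is a hypothetical helper, not a function from the scanner; only the entry shape is taken from the examples.

```python
import re

# Entry in the same shape as the examples above.
ENTRY = {
    "pattern": r"\bno\s+generated\s+code\b",
    "severity": "HIGH",
    "category": "ai_ban_explicit",
    "description": "'no generated code' policy found",
}


def match_entry(entry: dict, text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs where the entry's pattern matches."""
    compiled = re.compile(entry["pattern"], re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if compiled.search(line)
    ]
```

Matching line by line keeps the evidence easy to report, at the cost of missing a phrase that wraps across two lines; using `\s+` between words at least tolerates irregular spacing within a line.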

Closing thoughts

gh-pr-trust-scan is a static signal detector. It catches what is written down and what is visible in the repository's public API. It cannot tell you whether a maintainer will appreciate your change, whether the project is actively maintained, or whether your implementation approach aligns with the project's unstated conventions. Those questions still require reading the repo: scanning open issues, reviewing recent merged PRs, and — when in doubt — opening an issue to discuss before writing code.

The broader advice stands regardless of what tools you use: invest a few minutes of research before you invest hours of implementation. OSS contribution policies are increasingly explicit and machine-enforced. Treating due diligence as part of your workflow, rather than an afterthought, is what separates PRs that get merged from PRs that get closed by bots.

Contributions to gh-pr-trust-scan are welcome. The highest-value PRs are new detection patterns for trust-gate tools or policy language not yet covered. If you encounter a rejection signal that the tool misses, please open an issue first — especially for patterns that touch specific third-party tools, where context matters.

This article was researched and drafted with AI assistance (Claude Sonnet 4.6 via Claude Code). Pattern data and tool behavior are based on the gh-pr-trust-scan codebase as of May 2026. Repository policies change — always verify current CONTRIBUTING.md content before acting on a scan result.