Bounty Scout: I gave Hermes the job of finding work that pays — and it wrote its own skill to do it

#hermesagentchallenge #devchallenge #agents #opensource


This is a submission for the Hermes Agent Challenge: Build With Hermes Agent.

What I Built

Bounty Scout — a small agent that finds funded open-source bounties worth
actually working on, and gets better at judging them every time it runs.

I didn't want to build another "wrap an LLM in a loop" demo. Hermes Agent's
defining feature is a closed learning loop: after doing a task it can write a
reusable skill, and then improve that skill the next time. So I built the
smallest project that makes that loop the whole point.

The job I gave it is one I genuinely care about: which open-source bounties can an
AI-assisted developer realistically win and get paid for? In 2026 that's a real
filtering problem — lots of funded issues now explicitly ban AI contributions or
demand human-only proof, and a naive scraper happily wastes your time on them.

The self-improving loop (the actual demo)

| Run | What Hermes did |
| --- | --- |
| Run 1 | Scouted GitHub for funded bounties, triaged 20 of them against a 7-axis rubric, wrote a ranked shortlist, and authored a `bounty-triage` skill from scratch. |
| Run 2 | Loaded the skill it wrote, scored fresh bounties, appended new finds, then edited its own skill, tightening the dollar-amount parsing it found brittle. |

That second row is the magic. Here's the end of Run 2's transcript, in its own words:

4. I improved the `bounty-triage` skill by updating its SKILL.md...
   - "Funded?" score 2 → "Clear cash payout explicitly stated
     (now robustly parsed from title, including decimals)."
   - "Dollars-vs-effort?" → "scoring now includes type check for
     numerical estimated dollar amount."

It noticed its own weakness and patched its own playbook. Run 3 starts smarter than
Run 1 did — with zero changes from me.
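
To make "robustly parsed from title, including decimals" concrete, here is a minimal Python sketch of that kind of parsing. It is my own illustration, not the code or instructions inside Hermes's skill: the function name `parse_bounty_amount`, the "k" suffix handling, and the choice to return the largest amount in a title are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: "$500", "$ 300", "$1,250", "$99.50", "$1.5k".
# Groups: whole part (with optional thousands commas), decimals, "k" suffix.
_AMOUNT_RE = re.compile(
    r"\$\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?(k)?\b",
    re.IGNORECASE,
)


def parse_bounty_amount(title: str) -> Optional[float]:
    """Return the largest dollar amount mentioned in a title, or None.

    Picking the largest amount is an assumption; a title with several
    figures may need a smarter rule.
    """
    amounts = []
    for whole, frac, kilo in _AMOUNT_RE.findall(title):
        value = float(whole.replace(",", "") + frac)
        if kilo:
            value *= 1000
        amounts.append(value)
    return max(amounts) if amounts else None
```

Returning `None` rather than `0` keeps "no amount stated" distinct from "zero dollars", which matters for a type check on the estimated dollar amount like the one the transcript describes.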

A slice of what it actually surfaced (it correctly vetoed a security/PIN bounty
as outside an AI's safe zone, and marked the AI-friendly ones as pursue):

| Title | Verdict | Est. payout | Why |
| --- | --- | --- | --- |
| Attachment Summarizer Service | pursue | $960 | High payout, AI-friendly, good stack fit |
| Low Hanging Fruit Automation | pursue | $700 | Explicitly AI-friendly, small tasks |
| Note Locking (Biometrics/PIN) | avoid | $660 | Security topic; needs careful human review |
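
The verdicts above combine a rubric score with a hard veto. A rough Python sketch of that shape follows. Treat every specific as an assumption: the 0 to 2 scale per axis, the `pursue_threshold` value, the keyword list in `VETO_PATTERN`, and the names `triage` and `Verdict` are mine for illustration. The actual rubric lives in the skill file Hermes wrote.

```python
import re
from dataclasses import dataclass

# Hypothetical veto list: security-sensitive topics get "avoid" regardless
# of score. Word boundaries stop "pin" from matching inside "spinning".
VETO_PATTERN = re.compile(
    r"\b(biometrics?|pin|password|encryption)\b", re.IGNORECASE
)


@dataclass
class Verdict:
    label: str   # "pursue" or "avoid"
    total: int   # sum of all axis scores
    reason: str


def triage(title: str, scores: dict[str, int], pursue_threshold: int = 9) -> Verdict:
    """Score a bounty from per-axis scores, with a veto checked first.

    `scores` maps axis name to an assumed 0-2 score; with seven axes the
    maximum total is 14. The default threshold is an arbitrary placeholder.
    """
    total = sum(scores.values())
    if VETO_PATTERN.search(title):
        return Verdict("avoid", total, "veto: security-sensitive topic")
    if total >= pursue_threshold:
        return Verdict("pursue", total, "score at or above threshold")
    return Verdict("avoid", total, "score below threshold")
```

Checking the veto before the threshold is the design point: a well-paid bounty on a sensitive topic still comes back as avoid, which matches how the Biometrics/PIN row was handled.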

How I Used Hermes Agent

Skill creation + self-improvement: the core. Hermes wrote `bounty-triage` and then revised it across runs. The skill file in the repo is Hermes's, not mine.
Terminal tool: it runs `gh search issues` to pull live bounty data itself.
Autonomous multi-step execution (`--yolo`): fetch → triage → write the shortlist → author/refine the skill, all unattended in one shot.
OpenRouter backend: model-agnostic; this demo runs on `google/gemini-2.5-flash`.
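
For a sense of what the `gh search issues` step involves, here is a small Python sketch that builds such a query and parses its JSON output. The exact command Hermes ran is not shown in this post, so the `bounty` label, the limit, the selected JSON fields, and the helper names are assumptions. It also assumes `gh` is installed and authenticated.

```python
import json
import subprocess


def build_search_command(label: str = "bounty", limit: int = 20) -> list[str]:
    """Build a `gh search issues` invocation for open, labelled issues."""
    return [
        "gh", "search", "issues",
        "--label", label,          # assumed label; real bounty labels vary by repo
        "--state", "open",
        "--limit", str(limit),
        "--json", "title,url,labels",
    ]


def parse_search_output(raw: str) -> list[dict]:
    """Flatten gh's JSON array into title, url, and a list of label names."""
    return [
        {
            "title": issue["title"],
            "url": issue["url"],
            "labels": [lbl["name"] for lbl in issue.get("labels", [])],
        }
        for issue in json.loads(raw)
    ]


def fetch_bounties(label: str = "bounty", limit: int = 20) -> list[dict]:
    """Run the search and return parsed issues; raises if gh exits non-zero."""
    result = subprocess.run(
        build_search_command(label, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_search_output(result.stdout)
```

Keeping the command builder and the parser separate from the subprocess call means both can be checked without network access or a GitHub login.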

The whole two-run demo cost about $0.25 in inference.

Demo

`demo-run-2.txt` in the repo is the raw run-2 transcript (skill reuse + the
self-edit). `SKILL.bounty-triage.md` is the skill Hermes authored and then improved.

Code

👉 Repo: https://github.com/emaadshamsi/bounty-scout

# prereqs: uv, gh (authenticated), OPENROUTER_API_KEY
./scout.sh   # installs Hermes, configures OpenRouter, runs both passes

My Tech Stack

Hermes Agent (Nous Research, MIT)
OpenRouter → google/gemini-2.5-flash
GitHub CLI (gh) as the live data source
uv for an isolated Python 3.11 env
Bash glue (scout.sh)

Honest notes

On a cheap fast model the triage prose is solid-but-templated — a stronger model
sharpens the verdicts, but the architecture is the point. Scouting is
GitHub-label-based, so it's broad, not exhaustive. This is a focused demo of the
self-improving loop, not a finished bounty-hunter.

But that loop is the part I'll keep using: an agent that writes down what it learns
and gets sharper on its own is exactly what you want pointed at a messy,
ever-changing problem like "where's the work that pays?"

Top comments (1)

Harjot Singh • May 31

An agent writing its own skill to accomplish a goal is the genuinely interesting (and slightly unnerving) frontier - self-extending capability. It's powerful because the agent isn't limited to the tools you pre-built; it's risky for exactly the same reason, since a self-authored skill is unreviewed code the agent then runs with its own privileges. The thing I'd build a hard boundary around: a self-written skill should be a proposal that gets validated/sandboxed before it's allowed to execute, not auto-trusted because the agent wrote it. Self-extension + a gate = useful; self-extension + auto-run = the agent can grant itself arbitrary new powers, which is the plot of every cautionary tale.

This is squarely the tension I design around in Moonshift, the thing I build - a multi-agent pipeline that takes a prompt to a deployed SaaS, where agents can generate capability but a verify layer gates what actually runs, so "the agent wrote it" never means "the agent gets to run it unchecked." Bounty Scout is a cool concrete example of the pattern. Multi-model routing keeps a build ~$3 flat, first run free no card. Genuinely fun project. When Hermes wrote its own skill, did you gate/review it before it ran, or did it self-execute? That's the line between an impressive demo and something I'd let loose unsupervised.