Tariq Davis

Posted on May 31 • Edited on Jun 1

I Built a Cognitive Threat Hunter on Hermes Agent — It Analyzed the Session Where I Built It and Found Three Blind Spots

#hermesagentchallenge #devchallenge #agents #ai

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent.

What I Built

ECHO Hunt is a cognitive threat hunter for vibe coding sessions, built on Hermes Agent.

Vibe coding is how most people build with AI today. You describe what you want, the AI generates it, you run it, fix errors, iterate. It works. But did you actually learn anything, or did the AI just carry you through it?

ECHO Hunt finds out. Paste your session log, declare your blind spots before the evidence arrives, then face what Hermes actually found.

It's not a report generator. It's an investigation you participate in.

Demo

🎥 Watch the full demo

Code

🔗 github.com/FlowArchitect895/echo-hunt

My Tech Stack

Hermes Agent + echo-hunt skill
Node.js + Express
Vanilla HTML/CSS/JS

How I Used Hermes Agent

The echo-hunt skill

Hermes Agent runs a custom skill called echo-hunt. It takes a vibe coding session log and performs a cognitive forensic hunt — forming hypotheses before analyzing anything, hunting each one against the evidence, and mapping findings to four cognitive TTPs:

Borrowed Confidence — accepted AI output without verification
Shallow Resolution — fixed the error, didn't understand why
Pattern Blindness — repeated the same error class without noticing
Premature Exit — moved on before understanding was solid
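
To make the taxonomy concrete, here is a minimal sketch of how the four TTPs could be represented in Node.js. The key names (BORROWED_CONFIDENCE and so on), the makeFinding helper, and the shape of a finding record are assumptions for illustration, not the skill's actual schema. The severity labels are the ones that show up in the report excerpt later in this post.

```javascript
// The four cognitive TTPs, keyed by hypothetical identifiers.
const TTPS = Object.freeze({
  BORROWED_CONFIDENCE: { name: 'Borrowed Confidence', description: 'accepted AI output without verification' },
  SHALLOW_RESOLUTION: { name: 'Shallow Resolution', description: "fixed the error, didn't understand why" },
  PATTERN_BLINDNESS: { name: 'Pattern Blindness', description: 'repeated the same error class without noticing' },
  PREMATURE_EXIT: { name: 'Premature Exit', description: 'moved on before understanding was solid' },
});

// Severity labels as they appear in the report excerpt.
const SEVERITIES = ['LOW', 'MODERATE', 'HIGH'];

// Build a finding record, rejecting unknown TTP ids or severities.
// The record shape is an assumption, not the tool's real output format.
function makeFinding(ttpId, severity, evidence) {
  if (!Object.prototype.hasOwnProperty.call(TTPS, ttpId)) {
    throw new Error(`Unknown TTP: ${ttpId}`);
  }
  if (!SEVERITIES.includes(severity)) {
    throw new Error(`Unknown severity: ${severity}`);
  }
  return { ttp: ttpId, label: TTPS[ttpId].name, severity, evidence };
}
```

Validating against a fixed list keeps a model-generated classification from introducing a fifth category by accident.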

The architecture

ECHO Hunt calls hermes -z with the echo-hunt skill prompt. One call. In that call Hermes pre-computes the entire investigation: hypotheses, findings, TTP classifications, and attribution challenges with locked correct answers and plausible distractors. No further API calls happen during gameplay. Everything runs on the cached result.
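
The one-call-then-cache pattern could be sketched like this in Node.js. Several things here are assumptions rather than facts about the repo: that the prompt is passed as the argument following -z, that Hermes prints JSON to stdout, the prompt wording, and the names runHermes and getInvestigation.

```javascript
const { execFile } = require('node:child_process');

// In-memory cache: session log text -> Promise of the parsed investigation.
const huntCache = new Map();

// Hypothetical runner. ASSUMPTION: `hermes -z <prompt>` writes JSON to stdout.
function runHermes(prompt) {
  return new Promise((resolve, reject) => {
    execFile('hermes', ['-z', prompt], { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      try {
        resolve(JSON.parse(stdout));
      } catch (parseErr) {
        reject(parseErr);
      }
    });
  });
}

// Run the hunt at most once per distinct session log.
// The runner is injectable so the caching logic can be exercised without Hermes.
function getInvestigation(sessionLog, runner = runHermes) {
  if (!huntCache.has(sessionLog)) {
    // Prompt wording is illustrative only.
    const prompt = `Run the echo-hunt skill on this session log:\n${sessionLog}`;
    const pending = runner(prompt).catch((err) => {
      huntCache.delete(sessionLog); // allow a retry after a failed call
      throw err;
    });
    huntCache.set(sessionLog, pending);
  }
  return huntCache.get(sessionLog);
}
```

Caching the promise rather than the resolved value means two requests arriving at the same time still trigger only one Hermes call.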

The declaration layer

Before Hermes hunts, you declare your blind spots. Three questions. You commit to answers before the evidence arrives. This is the adversarial layer — you vs your own perception of what happened.
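
One way to enforce "commit before the evidence arrives" is to freeze the answers the moment they are submitted. This is a sketch under assumptions: lockDeclarations and the record shape are hypothetical names, and the real app may well lock declarations differently.

```javascript
// Freeze the player's three answers with a timestamp so they cannot be
// edited once the findings are revealed.
function lockDeclarations(answers) {
  if (!Array.isArray(answers) || answers.length !== 3) {
    throw new Error('Expected exactly three answers');
  }
  return Object.freeze({
    lockedAt: new Date().toISOString(),
    answers: Object.freeze(answers.map((a) => String(a).trim())),
  });
}
```

Object.freeze is shallow, which is why the inner answers array is frozen separately.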

The confrontation layer

Your declarations face what Hermes found. Three outcomes:

Signal — you caught what Hermes caught
Ghost — Hermes found something you missed entirely
Noise — you flagged something Hermes didn't
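
The three outcomes amount to a set comparison. The sketch below assumes that both the player's declarations and Hermes's findings can be reduced to TTP identifiers. That reduction, the identifier strings, and the confront function are illustrative assumptions, not how the app is known to match them.

```javascript
// Compare what the player declared against what the hunt found.
function confront(declaredTtps, foundTtps) {
  const declared = new Set(declaredTtps);
  const found = new Set(foundTtps);
  return {
    signal: [...found].filter((t) => declared.has(t)),  // both flagged it
    ghost: [...found].filter((t) => !declared.has(t)),  // found, never declared
    noise: [...declared].filter((t) => !found.has(t)),  // declared, not found
  };
}

// Hypothetical declarations against three hypothetical findings.
const outcome = confront(
  ['SHALLOW_RESOLUTION', 'PATTERN_BLINDNESS'],
  ['SHALLOW_RESOLUTION', 'BORROWED_CONFIDENCE', 'PREMATURE_EXIT']
);
// outcome.signal → ['SHALLOW_RESOLUTION']
// outcome.ghost  → ['BORROWED_CONFIDENCE', 'PREMATURE_EXIT']
// outcome.noise  → ['PATTERN_BLINDNESS']
```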

The challenge layer

Each confirmed finding becomes a TTP attribution challenge with four options and a 20-second timer. A wrong answer drops integrity by 10%. A correct answer earns points. The timer is the pressure: forensic decisions don't wait.
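
Resolving a single challenge could look roughly like this. The 20-second limit and the 10% penalty come from the rules above. Everything else is an assumption: the points value, treating the penalty as 10 percentage points, counting a timeout as a wrong answer, and the resolveChallenge name and state shape.

```javascript
const TIME_LIMIT_MS = 20 * 1000; // 20-second timer
const INTEGRITY_PENALTY = 10;    // ASSUMPTION: read as 10 percentage points
const POINTS_PER_CORRECT = 100;  // ASSUMPTION: the post gives no points value

// Returns the next game state without mutating the previous one.
function resolveChallenge(state, challenge, answer, elapsedMs) {
  const timedOut = elapsedMs > TIME_LIMIT_MS;
  // ASSUMPTION: running out of time is treated the same as a wrong answer.
  const correct = !timedOut && answer === challenge.correctTtp;
  return {
    integrity: correct ? state.integrity : Math.max(0, state.integrity - INTEGRITY_PENALTY),
    points: correct ? state.points + POINTS_PER_CORRECT : state.points,
    correct,
    timedOut,
  };
}
```

Because the correct answer is locked in the pre-computed challenge, this check is a plain comparison and needs no call back to Hermes.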

The verdict

The Evidence Integrity score is computed from actual player behavior — signals vs ghosts, correct vs wrong TTP attributions. Hermes doesn't generate the number. You produce it.

What Hermes found about me

I ran ECHO Hunt on the session where I built ECHO Hunt. Here's what it found:

Shallow Resolution [MODERATE] — Configuration issues were handled by repeatedly replacing files rather than analyzing why the specific settings were failing
Borrowed Confidence [HIGH] — Acceptance of a large-scale UI rewrite immediately following a minor skill update, assuming the logic was correct without testing
Premature Exit [LOW] — Using a wait-time heuristic to resolve a loading screen issue instead of implementing a proper readiness check

The confirmed finding that hit hardest: "The sequence of 'still not working' → 'try changing format' → 'config is getting corrupted' → 'paste in a clean config' shows a lack of diagnostic precision."

That's not a generated critique. That's evidence from my own session, hunted by the tool I was building while I was building it.

The Full Report

The downloadable Cognitive Threat Report captures everything — pre-hunt declarations, hunt hypotheses, confirmed findings, TTPs with severity, genuine understanding moments, and next session focus. It's a real document, not a game summary.

What makes it different from a standard AI analysis: the pre-hunt declarations are locked in before Hermes runs. So the report shows not just what happened in the session, but the gap between what you thought happened and what the evidence shows. That delta is the most useful thing in it.
