sunny yuen

Posted on May 28

Every Great Cup Starts with the Right Question — I Built the Community Behind the Answer with Hermes Agent

#hermesagentchallenge #devchallenge #agents #mcp

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

What I Built

Real brewing knowledge lives in human experience — in roaster guides, in community notes, in what a barista learned from last Tuesday's pour. It doesn't accumulate anywhere. Every brew is forgotten. Ask any AI and you get statistical averages: 93°C, 1:16 ratio, four minutes. Technically defensible. Practically generic. Worse still for rare origins where training data is thin.

Demo

For coffee drinkers

Visit brew-guide-production.up.railway.app. No account. No setup. No AI client required.

Pick your coffee origin, roast level, and brew method. What comes back isn't a generic recipe — it's community consensus: the grind, temperature, ratio, and brew time that real people have logged and rated for that origin, plus step-by-step technique guidance (bloom timing, pour stages, agitation style). If data is sparse for your origin, the confidence tier says so honestly and falls back to method defaults rather than making something up.

This is for the person who just picked up a bag of Kenyan peaberry and wants to know how to do it justice. It works for anyone who cares about their cup — no technical knowledge required.

For developers and AI clients

Point any MCP-capable client at one URL:

https://brew-guide-production.up.railway.app/mcp

Ask your AI: "recommend a pour over for Ethiopian light roast." What comes back is a traceable community consensus object: brew parameters, a confidence tier (high/medium/low), the source brews that contributed, and method-specific technique guidance. You can see where the knowledge came from and how certain the system is — a fundamentally different epistemic object from an AI-generated recipe.
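To make "traceable community consensus object" concrete, here is a minimal sketch of what such a response could look like and how the fallback to method defaults might be represented. Every type name, field name, and the tier cut-offs in `tierFor` are assumptions made for illustration; the actual schema is whatever the `recommend` tool returns.

```typescript
type ConfidenceTier = "high" | "medium" | "low";

// Hypothetical field names; the real response schema may differ.
interface BrewParams {
  grind: string;
  tempC: number;
  ratio: string;
  brewTimeSec: number;
}

interface Recommendation {
  params: BrewParams;
  confidence: ConfidenceTier;
  sourceBrewIds: string[];     // the brews that contributed to the consensus
  technique: string[];         // method-specific guidance steps
  usedMethodDefaults: boolean; // true when community data was too sparse
}

// Assumed cut-offs, chosen only for this sketch.
function tierFor(sourceCount: number): ConfidenceTier {
  if (sourceCount >= 5) return "high";
  if (sourceCount >= 2) return "medium";
  return "low";
}

// When no consensus could be built, fall back to method defaults and say so.
function buildRecommendation(
  consensus: BrewParams | null,
  sourceBrewIds: string[],
  methodDefaults: BrewParams,
  technique: string[],
): Recommendation {
  const usedMethodDefaults = consensus === null;
  return {
    params: consensus ?? methodDefaults,
    confidence: usedMethodDefaults ? "low" : tierFor(sourceBrewIds.length),
    sourceBrewIds: usedMethodDefaults ? [] : sourceBrewIds,
    technique,
    usedMethodDefaults,
  };
}
```

The point of the shape is that a client can show both the parameters and the evidence behind them, and can tell a real consensus apart from a default.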

Code

GitHub: yuens1002/brew-guide

Five MCP tools — get_brewing_methods, recommend, log_brew, search_brews, compare_brew — over Streamable HTTP transport. Public, no auth required.
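For clients without an MCP SDK, a tool call over Streamable HTTP is a JSON-RPC 2.0 message sent as an HTTP POST. The sketch below only builds the request body and headers for a `recommend` call; it sends nothing. The argument names (`origin`, `roast`, `method`) are assumptions, since the real input schema is whatever the server reports via `tools/list`, and a real session would normally send an `initialize` request first.

```typescript
// Shape of an MCP "tools/call" request as a JSON-RPC 2.0 message.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical arguments; check the tool's published input schema.
const recommendCall = buildToolCall(1, "recommend", {
  origin: "Ethiopia",
  roast: "light",
  method: "pour over",
});

// Streamable HTTP clients accept both plain JSON and an SSE stream in reply.
const postHeaders = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

console.log(JSON.stringify(recommendCall, null, 2));
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging with a raw HTTP tool.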

My Tech Stack

Layer	Technology
HTTP	Hono 4 + `@hono/node-server`
MCP	`@modelcontextprotocol/sdk` + `@hono/mcp` (Streamable HTTP)
Database	Neon Postgres + Prisma ORM
Runtime	Node 24, TypeScript strict, ESM
Tests	Vitest (55 tests, 0 TypeScript errors)
Deploy	Railway (auto-deploy from `main`)

The recommendation engine is fully deterministic — no LLM on the hot path. computeBestBrew() fetches up to 50 recent brews, scores each against your request params (origin, method, roast, variety, grind), applies recency decay (linear 1.0 → 0.1 over 365 days) and source trust weights, takes the top 5, and builds consensus via weighted average (numeric fields) or weighted mode (categorical). Sub-100ms. Reproducible. Auditable.
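The consensus step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ScoredBrew` shape, the helper names, and the choice to combine the signals by multiplication are all assumptions. Only the linear 1.0 to 0.1 decay over 365 days, the top-5 cut, and the weighted average versus weighted mode split come from the description above.

```typescript
// Hypothetical shape of a candidate brew after request matching.
interface ScoredBrew {
  matchScore: number; // similarity to the request params, assumed 0..1
  trust: number;      // source trust weight
  ageDays: number;    // days since the brew was logged
  tempC: number;      // example numeric field
  grind: string;      // example categorical field
}

// Linear recency decay: 1.0 for a brew logged today, 0.1 at 365 days or older.
function recencyWeight(ageDays: number): number {
  const clamped = Math.min(Math.max(ageDays, 0), 365);
  return 1.0 - 0.9 * (clamped / 365);
}

// Assumption: match score, trust and recency combine by multiplication.
function weightOf(b: ScoredBrew): number {
  return b.matchScore * b.trust * recencyWeight(b.ageDays);
}

// Weighted average for numeric fields.
function weightedAverage(items: ScoredBrew[], pick: (b: ScoredBrew) => number): number {
  let sum = 0;
  let total = 0;
  for (const b of items) {
    const w = weightOf(b);
    sum += w * pick(b);
    total += w;
  }
  return total > 0 ? sum / total : NaN;
}

// Weighted mode for categorical fields: the value with the largest total weight.
function weightedMode(items: ScoredBrew[], pick: (b: ScoredBrew) => string): string | undefined {
  const tally: Record<string, number> = {};
  for (const b of items) {
    const key = pick(b);
    tally[key] = (tally[key] ?? 0) + weightOf(b);
  }
  let best: string | undefined;
  let bestWeight = -Infinity;
  for (const key of Object.keys(tally)) {
    const w = tally[key] ?? 0;
    if (w > bestWeight) {
      best = key;
      bestWeight = w;
    }
  }
  return best;
}

// Keep the five heaviest brews, then build consensus per field.
function consensusOf(brews: ScoredBrew[]) {
  const top = brews.slice().sort((a, b) => weightOf(b) - weightOf(a)).slice(0, 5);
  return {
    tempC: weightedAverage(top, (b) => b.tempC),
    grind: weightedMode(top, (b) => b.grind),
    sourceCount: top.length,
  };
}

const example = consensusOf([
  { matchScore: 1, trust: 1, ageDays: 0, tempC: 92, grind: "medium-fine" },
  { matchScore: 1, trust: 1, ageDays: 365, tempC: 96, grind: "medium" },
]);
console.log(example.grind, example.tempC.toFixed(1)); // medium-fine 92.4
```

Because every step is plain arithmetic over stored rows, the same inputs always produce the same recommendation, which is what makes the result reproducible and auditable.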

The voting infrastructure is live: thumbs_up/thumbs_down votes on recommendations, plus brew_recommendation_links to track which brews followed which recommendations. Vote weighting inside computeBestBrew is the one acknowledged gap. The checks-and-balances mechanism is designed, but the math isn't wired in yet. That's the next commit.
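One possible way to fold votes into a brew's weight is sketched below. It is not the designed mechanism, which the post does not describe; the `VoteCounts` shape, the Laplace smoothing, and the 0.5 to 1.5 multiplier range are all assumptions chosen so that a brew with no votes keeps its original weight.

```typescript
// Hypothetical vote counts attached to a source brew.
interface VoteCounts {
  thumbsUp: number;
  thumbsDown: number;
}

// Laplace-smoothed approval: (up + 1) / (up + down + 2).
// With no votes this is 0.5, so a few early votes cannot swing it to 0 or 1.
function smoothedApproval(v: VoteCounts): number {
  return (v.thumbsUp + 1) / (v.thumbsUp + v.thumbsDown + 2);
}

// Map approval in (0, 1) to a multiplier in (0.5, 1.5); no votes gives 1.0.
function voteMultiplier(v: VoteCounts): number {
  return 0.5 + smoothedApproval(v);
}

// Scale an existing weight (match, trust, recency) by the vote multiplier.
function applyVotes(baseWeight: number, v: VoteCounts): number {
  return baseWeight * voteMultiplier(v);
}

console.log(applyVotes(0.8, { thumbsUp: 0, thumbsDown: 0 })); // 0.8
```

Bounding the multiplier keeps votes as a correction on top of the existing score rather than a replacement for it.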

How I Used Hermes Agent

I built this in 3 days — on a system I'd never used before.

That's the headline, and it's what I want to explain. I didn't start from scratch on Hermes. I installed a local instance, ran it inside VSCode's terminal, and pointed it at persona files I'd spent six months building. Three role files govern the entire build:

/backend-architect — owns schema design, the recommendation engine, all DB logic
/test-engineer — owns Vitest coverage, catches weak ACs, flags regressions
/project-manager — owns planning docs, retrospectives, and this article

Each persona has a focused mandate and a defined exit condition. The backend architect doesn't touch test files. The test engineer doesn't redesign the schema.

What the workflow looks like in practice:

Write a plan with a deliverable table (D1–D7), each with owner, files, acceptance criteria, commit schedule
Hand each deliverable to the relevant persona
After each feature, the test engineer verifies coverage and flags gaps
Review report before merge
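The plan table in that workflow can be thought of as structured data. The sketch below shows one way to model it, assuming a `Deliverable` record per row; the field names, the `verified` flag, the example file paths, and the owner assignments are hypothetical and not taken from the actual planning docs.

```typescript
type Persona = "backend-architect" | "test-engineer" | "project-manager";

// One row of the deliverable table (D1 to D7). Field names are assumed.
interface Deliverable {
  id: string;
  owner: Persona;
  files: string[];
  acceptanceCriteria: string[];
  commit: string;    // planned slot in the commit schedule
  verified: boolean; // set after the test engineer's review
}

// Hand-off step: group deliverables by the persona that owns them.
function groupByOwner(plan: Deliverable[]): Record<Persona, Deliverable[]> {
  const byOwner: Record<Persona, Deliverable[]> = {
    "backend-architect": [],
    "test-engineer": [],
    "project-manager": [],
  };
  for (const d of plan) byOwner[d.owner].push(d);
  return byOwner;
}

// Merge gate: every deliverable has acceptance criteria and has been verified.
function readyToMerge(plan: Deliverable[]): boolean {
  return plan.length > 0 && plan.every((d) => d.verified && d.acceptanceCriteria.length > 0);
}

// Hypothetical example rows.
const plan: Deliverable[] = [
  {
    id: "D1",
    owner: "backend-architect",
    files: ["src/scraper.ts"],
    acceptanceCriteria: ["scraper stores parsed guides"],
    commit: "day 1",
    verified: true,
  },
  {
    id: "D2",
    owner: "project-manager",
    files: ["docs/article.md"],
    acceptanceCriteria: ["article draft reviewed"],
    commit: "day 3",
    verified: false,
  },
];

console.log(groupByOwner(plan)["backend-architect"].length, readyToMerge(plan)); // 1 false
```

Writing acceptance criteria as data makes weak or missing criteria easy to flag before any code is written.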

The competition sprint — scraper, technique JSONB, landing page, this article — reached "verified" on the first review pass. 55 tests, zero TypeScript errors, all deliverables complete. One iteration.

The previous iteration of this codebase ran on a different model through the same Hermes setup: similar scope, same skill set, same orchestration. That iteration required production hotfixes (a crash on Railway caused by a Node version API mismatch), remediation of tests with weak acceptance criteria, and the better part of a working day. The visible failure mode was tool call adherence: the other model fumbled calls and worked around the skill spec rather than through it.

Hermes as a runtime made that comparison possible without changing anything in the workflow. Same skills, same personas, different model. The portability is the point.

Beyond orchestration, Hermes added two things directly: cron scheduling for the weekly coffee literature automation (hermes-automation/), and hermes mcp add for connecting the production endpoint to any client instantly. That MCP management DX is genuinely smooth.

What I'd build differently next time: migrate the skills to native Hermes format with a soul.md as a persistent identity anchor. The skills work as-is, but validating them against multiple model families — adjusting language and structure where tool call adherence degrades — is the proper portability work I didn't have time for. That's the experiment this project sets up.
