Lewis Sawe

Posted on May 31

I taught Hermes Agent to predict which API changes will break my system

#hermesagentchallenge #devchallenge #agents

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

What I Built

Drift Detective is an MCP server that turns Hermes Agent into an API contract mutation tracker. It probes your microservices on a cron schedule, stores response shapes (fields, types, nesting depth), and classifies changes when they happen: additive, breaking, or cosmetic.

The interesting part: it learns. After you mark a few changes as "safe" or "breaking," it starts predicting. In week one it's noisy. By week three it knows that your payments service adding nullable fields is always fine, but your auth service changing any field name will break downstream consumers.

Demo

The demo runs against a local API server with four mutation stages:

Stage 1 (baseline): Agent records the shape of /api/users, /api/payments, /api/health.

Stage 2 (additive change): New fields appear. Agent flags them as low-urgency additive drift. I mark them "safe."

Stage 3 (breaking change): Fields get renamed and removed. Agent flags these as high-urgency breaking drift. I mark them "breaking."

Stage 4 (prediction fires): More fields get removed. This time the agent predicts "likely breaking" before I say anything. It recognized the removal pattern from my stage 3 feedback.

After a few interactions the alert quality is visibly different from probe 1. That's the whole point.

Code

lewisawe / drift-detective

Drift Detective

API contract mutation tracker that learns what breaks things.

Drift Detective probes your APIs on a schedule, stores response shapes, detects structural changes, and classifies them. It learns YOUR system's patterns from your feedback. Alerts get smarter, not noisier.

What It Does

Probes API endpoints, extracts JSON response shape (field names, types, nesting)
Detects shape changes between probes
Classifies changes: additive (new field) or breaking (removed/renamed/type-changed)
Learns from your feedback. Mark changes as "safe" or "breaking" and it remembers.
Predicts future changes using accumulated knowledge

Hermes Features Used

Feature	How
MCP Server	Custom stdio server providing probe/classify/learn tools
Cron Scheduler	Periodic endpoint probing, no manual intervention
Persistent Memory	Endpoints, shapes, and learned patterns survive across sessions
Learning Loop / Skills	Writes skill docs about your system's change patterns
AGENTS.md Context	Defines alert behavior and classification rules

Demo Walkthrough

1. Start the demo API

python demo/api_server.py

Local API with…

My Tech Stack

Drift Detective's stack:

Python 3.11+ (runtime)
MCP SDK (mcp>=1.0.0) for the stdio server protocol
httpx for probing API endpoints
SQLite for persistence (shapes, history, learned patterns)
Hermes Agent as the orchestrator (MCP client, cron, memory)
Demo API: stdlib http.server (no dependencies)

How I Used Hermes Agent

This isn't a wrapper that calls the LLM once. Five Hermes capabilities do actual work here:

MCP Server (custom stdio): The core engine. Five tools: probe_endpoint, list_endpoints, get_drift_history, record_verdict, get_learned_patterns. All state lives in SQLite. The agent reasons about when and how to call them.

Cron Scheduler: Fires every 30 minutes (configurable). The agent probes all registered endpoints, compares shapes, and delivers a report to Telegram/Discord/wherever you talk to it. No human in the loop for routine checks.

Persistent Memory: Endpoint registry, shape history, and learned patterns survive across sessions. The agent picks up where it left off even after a restart.

Learning Loop / Skills: When the agent accumulates enough feedback, it writes a skill document describing your system's change patterns. That skill loads into future sessions, giving the agent prior context before it even runs a probe.

AGENTS.md Context: Defines classification rules, alert urgency levels, and when to include predictions vs. ask for feedback. Shapes the agent's behavior without touching code.
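
To make the "all state lives in SQLite" point concrete, here is a minimal sketch of how shape history could be persisted so it survives a restart. The table name, column names, and helper functions (open_db, save_shape, latest_shape) are assumptions for illustration, not the actual schema in the repo.

```python
import json
import sqlite3

# Hypothetical schema: one row per probe, with the shape stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS shapes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    endpoint TEXT NOT NULL,
    probed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    shape_json TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_shape(conn, endpoint, shape):
    """Append the shape observed by one probe to the history."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO shapes (endpoint, shape_json) VALUES (?, ?)",
            (endpoint, json.dumps(shape, sort_keys=True)),
        )


def latest_shape(conn, endpoint):
    """Return the most recently stored shape, or None if never probed."""
    row = conn.execute(
        "SELECT shape_json FROM shapes WHERE endpoint = ? ORDER BY id DESC LIMIT 1",
        (endpoint,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Passing a file path instead of ":memory:" to open_db is what would let the history outlive the process; comparing latest_shape against a fresh probe is where drift detection would start.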

How It Works (Technical)

The MCP server extracts a structural "shape" from any JSON response:

{"users": [{"id": 1, "name": "Alice"}], "total": 1}

Becomes:

$.total         → integer
$.users[]       → array
$.users[].id    → integer  
$.users[].name  → string
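
A rough sketch of how that extraction could work, assuming a recursive walk over the parsed JSON. The function names (type_name, extract_shape) are hypothetical, and the path notation simply mirrors the example above: objects are walked but not recorded, arrays are recorded at path[], and element types of scalar arrays are not tracked in this sketch.

```python
from typing import Any


def type_name(value: Any) -> str:
    """Map a parsed JSON value to a JSON-style type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def extract_shape(value: Any, path: str = "$") -> dict[str, str]:
    """Flatten a JSON value into {path: type}, ignoring concrete values."""
    shape: dict[str, str] = {}
    if isinstance(value, dict):
        if not value:
            shape[path] = "object"  # keep empty objects visible
        for key, child in value.items():
            shape.update(extract_shape(child, f"{path}.{key}"))
    elif isinstance(value, list):
        item_path = f"{path}[]"
        shape[item_path] = "array"
        for item in value:
            # Only containers are walked; scalar elements are skipped here.
            if isinstance(item, (dict, list)):
                shape.update(extract_shape(item, item_path))
    else:
        shape[path] = type_name(value)
    return shape


if __name__ == "__main__":
    body = {"users": [{"id": 1, "name": "Alice"}], "total": 1}
    for path, kind in sorted(extract_shape(body).items()):
        print(f"{path} -> {kind}")
```

Because only paths and types are kept, two responses with different data but the same structure produce identical shapes, which is what makes them comparable across probes.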

When a shape changes, the diff engine classifies each field-level change:

New field added → additive
Field removed or renamed → breaking
Type changed (string→integer) → breaking
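
The rules above can be sketched as a comparison of two flattened shapes. This is an illustration under assumptions, not the repo's diff engine: diff_shapes and the dictionary keys it returns are made-up names, and shapes are assumed to be {path: type} mappings. Note that in a path-based diff like this one, a rename shows up as one removal plus one addition, so the removal is what marks it as breaking.

```python
def diff_shapes(old: dict[str, str], new: dict[str, str]) -> list[dict[str, str]]:
    """Classify every field-level difference between two shapes."""
    changes: list[dict[str, str]] = []

    # Paths present only in the new shape: additive.
    for path in sorted(new.keys() - old.keys()):
        changes.append({"path": path, "change": "added", "category": "additive"})

    # Paths present only in the old shape: breaking (covers renames too).
    for path in sorted(old.keys() - new.keys()):
        changes.append({"path": path, "change": "removed", "category": "breaking"})

    # Paths in both shapes whose type differs: breaking.
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            changes.append({
                "path": path,
                "change": f"type_changed:{old[path]}->{new[path]}",
                "category": "breaking",
            })
    return changes


if __name__ == "__main__":
    before = {"$.id": "integer", "$.name": "string", "$.age": "string"}
    after = {"$.id": "integer", "$.full_name": "string", "$.age": "integer"}
    for change in diff_shapes(before, after):
        print(change["category"], change["path"], change["change"])
```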

The learning system stores verdicts keyed by endpoint + change category. After one verdict for a pattern, predictions fire on the next similar change. It generalizes: if "removed field: name" was breaking, then "removed field: email" on the same endpoint gets the same prediction. The patterns are simple and domain-specific, so one data point is enough to be useful.
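
A minimal in-memory sketch of that lookup, assuming the key is just (endpoint, kind of change) and the most recent verdict wins. VerdictStore and its methods are hypothetical names; the real server keeps this state in SQLite and may weigh verdicts differently.

```python
from typing import Optional


class VerdictStore:
    """Remembers one verdict per (endpoint, change kind) pair."""

    VALID = ("safe", "breaking")

    def __init__(self) -> None:
        self._verdicts: dict[tuple[str, str], str] = {}

    def record_verdict(self, endpoint: str, change_kind: str, verdict: str) -> None:
        if verdict not in self.VALID:
            raise ValueError(f"verdict must be one of {self.VALID}")
        # The field name is deliberately left out of the key, so a verdict
        # on one field generalizes to the same kind of change on others.
        self._verdicts[(endpoint, change_kind)] = verdict

    def predict(self, endpoint: str, change_kind: str) -> Optional[str]:
        """Return the learned verdict, or None if there is no feedback yet."""
        return self._verdicts.get((endpoint, change_kind))


if __name__ == "__main__":
    store = VerdictStore()
    # Feedback on "removed field: name" for /api/users...
    store.record_verdict("/api/users", "removed", "breaking")
    # ...also answers for "removed field: email" on the same endpoint.
    print(store.predict("/api/users", "removed"))
    # No feedback exists for other endpoints, so nothing is predicted.
    print(store.predict("/api/payments", "removed"))
```

Leaving the field name out of the key is what lets a single data point be useful; the trade-off is that one mistaken verdict also generalizes until it is overwritten.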

What I'd Build Next

Webhook mode: listen for deploy events from CI/CD, probe immediately after deploys
Consumer registry: know which downstream services depend on which fields, route alerts accordingly
Schema diffing beyond JSON: gRPC protobuf changes, GraphQL schema introspection
Multi-endpoint correlation: "every time auth-service changes, payments-service breaks 2 hours later"

Source Code

drift-detective/
├── mcp_server/server.py          # MCP server with probe/classify/learn tools
├── demo/api_server.py            # Mutable demo API
├── skills/drift-detective-patterns.md
├── AGENTS.md
└── pyproject.toml

Install: run pip install -e . in the repo root, then add the MCP config to ~/.hermes/config.yaml.
