Rohith Pavithran

Posted on May 31 • Edited on Jun 3

mnemo — A local-first learning agent that remembers, without leaking

#hermesagentchallenge #devchallenge #agents #hermesagent

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

What I Built

mnemo is a learning-management agent for kids and self-directed learners. It tracks topics, sequences them by prerequisites, schedules spaced-repetition reviews (SM-2), generates quizzes, and adapts the plan over time through a Hermes-driven feedback loop.

The problem: hosted AI tutors send a child's mistakes, struggles, and reading level to someone else's servers. Browser-only chat loops forget everything the moment the tab closes — no cross-session adaptation, no long-term plan. And LLMs happily hallucinate URLs and quiz answers, so nothing stops an unreviewed link from reaching a 6th-grader.

mnemo's answer is a three-part stance baked into the schema, not bolted on:

Local-first by default. SQLite (WAL) on-device, 7 tables, 3 migrations. Hermes-3 runs locally via Ollama. A HuggingFace API path exists for constrained hardware but requires explicit opt-in and is clearly labelled in the UI — every hosted call is logged to the events table.
Two gates before any content reaches the learner. Every external resource starts approved = 0 in the schema. A second gate joins a source_allowlist table at the SQL layer — even an approved resource is suppressed if its origin isn't a trusted source (khan-academy, ck12, internal-curriculum). Both gates have dedicated scenario tests.
The agent proposes; a human approves. Pace changes, resource-type shifts, and any preference write go through a confirmation gate. The agent cannot mutate learner preferences without a human ticking a checkbox, and that is enforced via an MCP deny-list, not just in the UI.
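
To make the two gates concrete, here is a minimal sketch of how they might look at the SQL layer. The `resources` table, its column names, and the `visible_resources` helper are assumptions for illustration; only `approved = 0` as the default, the `source_allowlist` table, and the three trusted source names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_allowlist (source TEXT PRIMARY KEY);
CREATE TABLE resources (               -- hypothetical table and column names
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    source   TEXT NOT NULL,
    approved INTEGER NOT NULL DEFAULT 0  -- gate 1: nothing is approved by default
);
INSERT INTO source_allowlist (source)
    VALUES ('khan-academy'), ('ck12'), ('internal-curriculum');
-- Trusted source, but never approved by a human.
INSERT INTO resources (title, source) VALUES ('Fractions intro', 'khan-academy');
-- Trusted source and approved.
INSERT INTO resources (title, source, approved) VALUES ('Fractions practice', 'ck12', 1);
-- Approved, but the origin is not on the allowlist.
INSERT INTO resources (title, source, approved) VALUES ('Random blog post', 'example-blog', 1);
""")


def visible_resources(conn):
    """Return titles that pass both gates: approved AND from an allowlisted source."""
    rows = conn.execute("""
        SELECT r.title
        FROM resources AS r
        JOIN source_allowlist AS a ON a.source = r.source  -- gate 2
        WHERE r.approved = 1                               -- gate 1
        ORDER BY r.id
    """).fetchall()
    return [title for (title,) in rows]


print(visible_resources(conn))  # ['Fractions practice']
```

Because the second gate is an inner join, a resource from an unknown origin drops out of the result set even when someone has flipped `approved` to 1.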

Demo

Pitch deck (single-file Reveal.js, no build step):
pitch.html — 6 slides covering the problem, the Hermes differentiator, architecture, the two-gate safety model, and a realistic dashboard mockup of the shipped UI.

Quick local run:

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
sqlite3 learning.db < migrations/001_init.sql
sqlite3 learning.db < migrations/002_quiz_bank.sql
sqlite3 learning.db < migrations/003_source_allowlist.sql
python -m uvicorn webapp.app:app --reload --port 8000

Open http://localhost:8000 for the dashboard, or run a Hermes session end-to-end:

ollama pull hermes3
curl -X POST http://localhost:8000/api/agent/session \
  -H "Content-Type: application/json" \
  -d '{"learner_id": 1, "backend": "ollama"}'

The dashboard streams each tool call the agent makes, then refreshes progress and the review queue automatically.

Code

Repository: gitea.com/rohithtp/mnemo

Key directories:

webapp/ — FastAPI REST + SSE + static dashboard (25+ endpoints)
mcp_server/ — stdio MCP server exposing 7 tools to Hermes
src/agent/ — Hermes session orchestrator, allowlist, memory, insights
src/learning/ — SM-2 algorithm, sequencing logic, safety/privacy gates
migrations/ — 3 SQL files, schema-version tracked
tests/ — 69 passing tests (SM-2 unit + integration + API + scenario)
docs/ — full build plan, decisions log, 25-risk register, two-profile docs

My Tech Stack

Layer	Choice
Database	SQLite (WAL mode) on-device · libSQL/Turso documented as forward path
Backend	FastAPI + uvicorn + SSE streaming
Agent runtime	Hermes-3 via Ollama (local) · HuggingFace Inference API fallback
Tool surface	stdio MCP (Hermes) + WebMCP progressive enhancement (Chrome 146+)
Frontend	Vanilla JS + plain CSS — no framework, no build step
Spaced repetition	SM-2 algorithm, implemented as pure functions
Tests	pytest — 15 SM-2 unit + 18 integration + 31 API + 5 scenario = 69 passing
Deployment	Dockerfile + systemd unit
Language	Python 3.11+ (3.14 tested)
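
For readers unfamiliar with SM-2, the sketch below shows one common formulation of the update step as a pure function. It is not the code from `src/learning/`; the `CardState` type, the field names, and the choice to leave the ease factor unchanged after a failed review are assumptions, and published SM-2 variants differ on that last point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    """Hypothetical per-item review state."""
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # days until the next review
    ease: float = 2.5        # SM-2 ease factor, never below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Return the next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease)
    # Standard SM-2 ease adjustment, clamped at the 1.3 floor.
    miss = 5 - quality
    ease = max(1.3, state.ease + (0.1 - miss * (0.08 + miss * 0.02)))
    return CardState(state.repetitions + 1, interval, ease)
```

Keeping the function pure (state in, new state out, no database access) is what makes it easy to cover with plain unit tests; the caller decides when to persist the result.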

All Python deps are pinned in requirements.txt. The Hermes-3 model digest is pinned by SHA256 and asserted at startup so a silent upstream change can't drift the agent's behaviour mid-session (R02 mitigation).
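
A startup digest check of this kind could be sketched as below. The function name, the placeholder digest, and the input shape are assumptions: the sketch takes a list of dicts with `name` and `digest` keys, which is roughly what a local model listing provides, and leaves fetching that listing to the caller.

```python
# Placeholder only; the real pinned value would be the model's SHA256 digest.
PINNED_DIGEST = "0" * 64


def assert_model_digest(models, name, expected_digest):
    """Raise RuntimeError unless `name` is installed with exactly `expected_digest`.

    `models` is assumed to be a list of dicts with "name" and "digest" keys.
    """
    for model in models:
        if model.get("name") == name:
            actual = model.get("digest")
            if actual != expected_digest:
                raise RuntimeError(
                    f"{name}: digest {actual!r} does not match pinned {expected_digest!r}"
                )
            return
    raise RuntimeError(f"{name}: model is not installed")


# Example with a hand-written listing (hypothetical values).
installed = [{"name": "hermes3:latest", "digest": PINNED_DIGEST}]
assert_model_digest(installed, "hermes3:latest", PINNED_DIGEST)  # passes silently
```

Failing hard at startup, rather than logging a warning, means a changed model can never be used for a session by accident.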

How I Used Hermes Agent

Hermes-3 is the orchestration brain for the whole adaptive loop. The reason it had to be Hermes specifically, not a stateless prompt loop, comes down to four capabilities:

1. Tool-calling over a real schema. src/agent/session.py runs a tool-calling loop where Hermes drives a study session by calling the 7 MCP tools (get_progress_summary, get_ready_topics, start_topic, get_next_review_items, record_review_result, recommend_resources, generate_quiz). Each call is logged to the events table with arguments and result. The agent isn't generating advice in prose — it's writing to SQLite through audited tools.
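
The "audited tools" idea can be illustrated with a small dispatcher that runs a tool and records the call in the same step. This is a sketch under assumptions: the `events` column layout, the `run_tool_call` helper, and the stand-in tool body are all hypothetical; only the tool name and the fact that arguments and results are logged come from the description above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (            -- hypothetical column layout
        id         INTEGER PRIMARY KEY,
        learner_id INTEGER NOT NULL,
        kind       TEXT NOT NULL,
        payload    TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")


def get_ready_topics(learner_id):
    """Stand-in for the real tool; returns a fixed answer for the demo."""
    return ["fractions"]


TOOLS = {"get_ready_topics": get_ready_topics}


def run_tool_call(conn, learner_id, name, arguments):
    """Execute one model-requested tool call and log it with arguments and result."""
    result = TOOLS[name](**arguments)
    payload = json.dumps({"tool": name, "arguments": arguments, "result": result})
    conn.execute(
        "INSERT INTO events (learner_id, kind, payload) VALUES (?, 'tool_call', ?)",
        (learner_id, payload),
    )
    conn.commit()
    return result


print(run_tool_call(conn, 1, "get_ready_topics", {"learner_id": 1}))  # ['fractions']
```

Routing every call through one function like this gives the events table a complete trail, which later steps (memory notes, insights) can read back.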

2. Cross-session memory. Every session writes a memory_note event summarising patterns the agent noticed. The next session's system prompt is injected with the most recent notes for that learner, so Hermes "remembers" across sessions, not just within one tab, that this kid struggles with unlike denominators. Memory notes are per-learner and stay on-device.
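
As a rough illustration of the injection step, the sketch below pulls the latest `memory_note` events for one learner and appends them to a base prompt. The `events` columns, the `build_system_prompt` helper, the note limit of 3, and the prompt wording are assumptions, not the project's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (            -- hypothetical column layout
        id         INTEGER PRIMARY KEY,
        learner_id INTEGER NOT NULL,
        kind       TEXT NOT NULL,
        payload    TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO events (learner_id, kind, payload) VALUES (?, 'memory_note', ?)",
    [
        (1, "struggles with unlike denominators"),
        (2, "note for a different learner"),
        (1, "responds well to short video resources"),
    ],
)


def build_system_prompt(conn, learner_id, base_prompt, limit=3):
    """Append the learner's most recent memory notes, oldest first, to the prompt."""
    rows = conn.execute(
        "SELECT payload FROM events "
        "WHERE learner_id = ? AND kind = 'memory_note' "
        "ORDER BY id DESC LIMIT ?",
        (learner_id, limit),
    ).fetchall()
    if not rows:
        return base_prompt
    notes = "\n".join(f"- {payload}" for (payload,) in reversed(rows))
    return f"{base_prompt}\n\nNotes from earlier sessions:\n{notes}"


print(build_system_prompt(conn, 1, "You are a patient study coach."))
```

Filtering on `learner_id` in the query is what keeps one learner's notes out of another learner's prompt.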

3. Pattern analysis + human-approvable proposals. Phase 6 added POST /api/agent/insights/{learner_id}. Hermes reads the event trail across a configurable window, computes avg_quality, days_active, and resource-type variety, then emits structured proposals like {"type": "pace", "direction": "increase", "reason": "avg quality above 4.0"}. Each proposal is stored with resolved_at = NULL and surfaces in the dashboard with Approve / Dismiss buttons. The agent never silently applies pace changes; a human resolves each proposal.
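
The shape of that output can be sketched in a few lines. This version computes the metrics in plain Python rather than through the model, and the `propose_changes` name, the `(date, quality)` input format, and the single pace rule are assumptions; the 4.0 threshold is taken from the example reason string above.

```python
from statistics import mean


def propose_changes(reviews):
    """Summarise a window of reviews and emit unresolved proposals.

    `reviews` is a list of (iso_date, quality) tuples, quality graded 0 to 5.
    """
    if not reviews:
        return {"avg_quality": None, "days_active": 0, "proposals": []}
    avg_quality = mean(quality for _, quality in reviews)
    days_active = len({day for day, _ in reviews})
    proposals = []
    if avg_quality > 4.0:
        proposals.append({
            "type": "pace",
            "direction": "increase",
            "reason": "avg quality above 4.0",
            "resolved_at": None,  # stays unresolved until a human approves or dismisses
        })
    return {
        "avg_quality": avg_quality,
        "days_active": days_active,
        "proposals": proposals,
    }


# Hypothetical review window: three reviews across two days.
window = [("2025-06-01", 5), ("2025-06-01", 4), ("2025-06-02", 5)]
summary = propose_changes(window)
print(summary["days_active"], len(summary["proposals"]))  # 2 1
```

Note that the function only returns proposals; nothing in it changes the learner's pace, which matches the rule that a human resolves each one.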

4. Allowlist-enforced tool boundary. src/agent/allowlist.py defines AGENT_ALLOWLIST and AGENT_DENY_LIST as constants. set_learner_preferences is on the deny-list, so even if the model is jailbroken into trying to mutate preferences directly, the session orchestrator raises PermissionError before the tool runs. This is the difference between "the UI hides it" and "the agent cannot do it."
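
A minimal sketch of that boundary check follows. The constant names and the presence of `set_learner_preferences` on the deny-list come from the text; the allowlist contents (assumed here to be the 7 MCP tools) and the `check_tool_allowed` helper are hypothetical.

```python
AGENT_ALLOWLIST = frozenset({
    "get_progress_summary",
    "get_ready_topics",
    "start_topic",
    "get_next_review_items",
    "record_review_result",
    "recommend_resources",
    "generate_quiz",
})
AGENT_DENY_LIST = frozenset({"set_learner_preferences"})


def check_tool_allowed(name):
    """Raise PermissionError before dispatch if the agent may not call `name`."""
    if name in AGENT_DENY_LIST:
        raise PermissionError(f"tool {name!r} is denied to the agent")
    if name not in AGENT_ALLOWLIST:
        raise PermissionError(f"tool {name!r} is not on the agent allowlist")


check_tool_allowed("generate_quiz")  # allowed, returns None

try:
    check_tool_allowed("set_learner_preferences")
except PermissionError as exc:
    print(exc)  # tool 'set_learner_preferences' is denied to the agent
```

Checking the deny-list first and then requiring allowlist membership means an unknown tool name is rejected too, so the default for anything new is "no".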

The combination of a local model, persistent memory, audited tool calls, and a human in the loop on every state-mutating proposal is what makes mnemo trustworthy as a coach for a child, not just for an adult tinkerer. That posture is what Hermes Agent makes practical; a hosted stateless chatbot can't enforce any of it.
