HendryCode
I built an MCP server that forces AI to spec before it codes

I built an MCP server in Go that gives your AI persistent memory, structured specifications, and an adaptive change pipeline. It's called Hoofy. It's open source. And its only reason to exist is to make your AI build what YOU want, not what it hallucinates.

The Problem Nobody Wants to Admit

We're all using AI to code. And we're all suffering the same things:

  • Ask for a feature → get something that looks right but isn't
  • New session → AI has no idea what you did yesterday
  • "Fix this bug" → AI rewrites half your codebase
  • Complex request → invented APIs, mismatched schemas, architecture that contradicts your existing code

The METR 2025 study confirmed it with data: experienced developers were 19% slower with AI, despite feeling 20% faster. DORA 2025 found a 7.2% increase in delivery instability for every 25% of AI adoption without structure. McKinsey says only those using structured specifications see real improvements (16-30%).

AI without structure is a junior with blind confidence.

What is Hoofy

Hoofy is an MCP server — a single binary, written in Go, zero external dependencies, with embedded SQLite. Connect it to any AI tool that supports MCP (Claude Code, Cursor, VS Code Copilot, OpenCode, Gemini CLI) and it gives your AI three superpowers:

  1. Persistent Memory (17 tools)

Your AI finally remembers. Architecture decisions, bugs you fixed, patterns you established, technical discoveries — everything is saved in SQLite with full-text search (FTS5) and a knowledge graph with typed relations.

These aren't flat notes. You can connect observations with typed relations, building a queryable knowledge graph of your project.
When the AI starts a new session, it loads this context automatically. It already knows where it left off.

  2. Adaptive Change Pipeline (6 tools)

Here's the magic for day-to-day work. When you need to make a change, Hoofy automatically selects the right steps based on type × size:

| Type        | Small    | Medium   | Large    |
|-------------|----------|----------|----------|
| Fix         | 4 stages | 5 stages | 6 stages |
| Feature     | 4 stages | 5 stages | 7 stages |
| Refactor    | 4 stages | 5 stages | 5 stages |
| Enhancement | 4 stages | 5 stages | 7 stages |

12 flows, all deterministic. A small bugfix doesn't need the same ceremony as a new authentication system. But ALL of them start with a context-check — Hoofy scans your existing specs, completed changes, memory, and convention files to detect conflicts BEFORE writing a single line.

  3. Greenfield Project Pipeline (9 tools)

For when you're starting from scratch. The full pipeline:

```
Init → Propose → Requirements → Business Rules → Clarity Gate → Design → Tasks → Validate
```

The Clarity Gate is the core innovation. It analyzes your requirements across 8 dimensions (users, functionality, data model, integrations, edge cases, security, scale, scope) and blocks advancement until ambiguities are resolved.

The Business Rules stage extracts declarative rules using BRG taxonomy (Definitions, Facts, Constraints, Derivations) and DDD Ubiquitous Language — before the Clarity Gate evaluates them.

Validate at the end cross-checks all artifacts: every requirement has at least one task, every design component has assigned tasks, nothing goes out of the proposal's scope.

What Hoofy is NOT

  • It doesn't generate code. Hoofy generates specifications. The AI generates code AFTER, using those specs as guardrails.
  • It doesn't replace the developer. You still make the decisions. Hoofy forces the AI to ASK you before assuming.
  • It's not exclusive to one tool. It's standard MCP — works with any IDE/tool that supports it.

The Research Behind It

Hoofy isn't built on opinions:

  • IEEE 29148 and IREB: Requirements engineering standards for structured elicitation and ambiguity detection.
  • Business Rules Group (BRG): The Business Rules Manifesto establishes that rules are first-class citizens, not buried in code.
  • EARS (Easy Approach to Requirements Syntax): Research-backed templates that eliminate ambiguity in natural language requirements.
  • DDD Ubiquitous Language: Eric Evans' principle that a shared language eliminates translation errors.

A requirements error costs 10-100x more to fix in production than during specification (IEEE). With AI-generated code, that multiplier is worse.


Quick Start

Install:

```shell
# macOS (Homebrew)
brew install HendryAvila/hoofy/hoofy

# macOS/Linux (script)
curl -sSL https://raw.githubusercontent.com/HendryAvila/Hoofy/main/install.sh | bash

# Go
go install github.com/HendryAvila/Hoofy/cmd/hoofy@latest
```

Connect (example with Claude Code):

```shell
claude mcp add --scope user hoofy hoofy serve
```

Done. Just talk to your AI. Hoofy's built-in instructions tell it when and how to use each system.
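For tools without a CLI helper, MCP servers are typically registered through a JSON config. The usual shape looks like the fragment below — check your tool's documentation for the exact file name and location, since each client keeps it somewhere different:

```json
{
  "mcpServers": {
    "hoofy": {
      "command": "hoofy",
      "args": ["serve"]
    }
  }
}
```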


By the Numbers

  • 32 MCP tools
  • 12 flow variants in the change pipeline
  • 8 dimensions of clarity
  • 6 relation types in the knowledge graph
  • One binary. Zero external dependencies. MIT license.

GitHub: github.com/HendryAvila/Hoofy

Stop prompting. Start specifying.


I'm a self-taught developer — started as a NOC operator and learned to code on my own. Hoofy is my first open source Go project. Feedback and contributions are welcome!

Top comments

MaxxMini

The Clarity Gate is the standout concept here. We run a 24/7 AI agent that manages multiple projects — builds, deploys, writes content — and the "blind confidence" problem you describe is exactly what bit us. 176 commits across 3 repos in 72 hours, no specification step, no ambiguity check. Result: GitHub flagged the account as automated activity and suspended it. Every one of those commits "looked right" individually, but there was no structural coherence check across them.

Your type x size matrix for the change pipeline is a genuinely elegant solution to a problem most teams solve with "just use common sense." We currently treat a one-line bugfix and a new authentication system with the same ceremony (or worse, the same lack of ceremony). Having 12 deterministic flows means the AI can't skip the context-check for a "small" change that's actually touching a load-bearing abstraction.

The persistent memory via SQLite + FTS5 is interesting compared to our approach. We use flat markdown files for long-term curated knowledge, daily logs for raw events, and domain-specific files for project context. The agent reads them each session. It works, but it doesn't scale: no typed relations, no conflict detection between entries, no way to query "what decisions affected this module in the last 2 weeks?" Your knowledge graph with typed relations solves the retrieval problem we're still brute-forcing.

Two questions:

  1. Knowledge graph contradiction handling — when the AI records an architecture decision in session 3 that conflicts with something recorded in session 1, does Hoofy detect this? Or does it surface both and let the developer resolve it? With flat files we've had cases where a sub-agent recorded "use approach A" and another recorded "use approach B" in different daily logs, and neither noticed.

  2. Clarity Gate threshold adaptation — the 8-dimension analysis is comprehensive, but do you find that some dimensions consistently block progress on certain project types? For example, "scale" requirements for a personal tool vs. a SaaS product are fundamentally different levels of specificity. Is there a way to adjust the strictness per-dimension, or does the gate learn from the project context?

The self-taught path resonates. Building the thing that solves your own pain is better than building what sounds impressive on a resume.
