DEV Community: David Emilio Sierra Puentes

How to add mechanical enforcement to any AI coding agent

David Emilio Sierra Puentes — Mon, 13 Jul 2026 20:30:36 +0000

The problem

AI coding agents generate code at impressive speeds. But they also commit untested code, hallucinate APIs, and forget your instructions the moment context fills up. If you've ever reviewed an AI-generated PR and thought "this looks right but I don't trust it," you've experienced the gap this framework fills.

github.com/juandelossantos/another-agent-skills

Prompts (CLAUDE.md, .cursorrules) work until they don't. The agent remembers until context degrades. Then it forgets. Mechanical enforcement via git hooks doesn't have that problem: hooks run regardless of what the agent remembers.

What you'll build

By the end of this tutorial, you'll have:

14 pre-commit gates running on every commit
A TDD gate that blocks code without tests
57 composable skills your agent loads on demand
A 6-phase development lifecycle (Define → Plan → Build → Verify → Review → Ship)

Step 1: Install (10 seconds)

git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh

That's it. The installer copies the framework to ~/.config/opencode/ and sets up the skill loader.

Step 2: Initialize in any project

cd your-project
init-agents

This creates:

STACK_CONFIG.md — auto-detected test/lint/build commands
Git hooks in .git/hooks/ (pre-commit v11, commit-msg v4)
.gitignore and .env.example if missing
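As a rough illustration of the hook-installation part of this step, the sketch below copies a hook script into a repository and marks it executable. The function name install_hook and its arguments are hypothetical and not taken from the project's installer; only the .git/hooks location comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the project's installer): copy a hook script
# into REPO/.git/hooks and make it executable.
install_hook() {
  local src="$1" repo="$2"
  local name dest
  name=$(basename "$src")
  dest="${repo}/.git/hooks/${name}"
  mkdir -p "${repo}/.git/hooks"
  cp "$src" "$dest"
  chmod +x "$dest"   # git ignores hook files that are not executable
  echo "installed ${name}"
}

# Usage (assumed paths):
#   install_hook scripts/git-hooks/pre-commit /path/to/your-project
```

Because the hook lives inside .git/hooks, it runs for any client that invokes git commit in that repository, whether a human or an agent typed the command.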

Step 3: How the gates work

The pre-commit hook runs 14 checks before every commit:

# Example: a commit is blocked
$ git commit -m "add feature"
⛔ BRANCH CHECK: on main — create a feature branch
⛔ STAGED CHECK: nothing staged — use git add
⛔ REMOTE SYNC: 3 unpulled commits — run git pull

Each gate must pass before the commit proceeds. The agent cannot use --no-verify unless you explicitly allow it (and that override is tracked).
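A gate of this kind can be sketched as a small function that prints a verdict and returns a status, with a runner that refuses the commit if any gate fails. This is a minimal sketch under assumptions, not the project's pre-commit hook: the function names check_branch, check_staged, and run_gates are hypothetical, and only two of the 14 gates are shown.

```shell
#!/usr/bin/env bash
# Hypothetical gate functions: each prints a verdict and returns 0 (pass) or 1 (fail).

check_branch() {
  local branch="$1"
  case "$branch" in
    main|master)
      echo "⛔ BRANCH CHECK: on ${branch} (create a feature branch)"
      return 1 ;;
    *)
      echo "✅ BRANCH CHECK: on ${branch}"
      return 0 ;;
  esac
}

check_staged() {
  local count="$1"
  if [ "$count" -eq 0 ]; then
    echo "⛔ STAGED CHECK: nothing staged (use git add)"
    return 1
  fi
  echo "✅ STAGED CHECK: ${count} files staged"
}

# Collect the inputs from git, run every gate, and report failure if any gate failed.
run_gates() {
  local failed=0 branch count
  branch=$(git branch --show-current)
  count=$(( $(git diff --cached --name-only | wc -l) ))
  check_branch "$branch" || failed=1
  check_staged "$count" || failed=1
  return "$failed"
}

# In a pre-commit hook, the last line would be something like:
#   run_gates || exit 1
```

Running every gate before failing, rather than stopping at the first failure, gives the blocked-commit output shown above, where several problems are reported at once.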

# Example: a commit passes all gates
$ git commit -m "feat: add user auth"
✅ BRANCH CHECK: on feat/user-auth
✅ STAGED CHECK: 4 files staged
✅ REMOTE SYNC: up to date
✅ HTML INTEGRITY: all markers present
✅ SKILL GATE: skills were consulted
✅ ANTI-SLOP: no generic patterns detected
✅ TDD GATE: test files match code files
✅ TEST RUNNER: 29/29 tests passing
✅ All 14 gates passed → commit allowed
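To make the TDD GATE line above more concrete, here is one way such a check could work. It is a sketch under a loudly stated assumption: the naming convention (a source file foo.js needs a staged foo.test.js) and the function name tdd_gate are hypothetical, and the project's real matching rules may differ.

```shell
#!/usr/bin/env bash
# tdd_gate FILE...
# Hypothetical rule: every staged .js source file must have a staged
# partner named <same path>.test.js. Other file types are ignored.
tdd_gate() {
  local f g want found missing=0
  for f in "$@"; do
    case "$f" in
      *.test.js) continue ;;   # test files need no partner
      *.js) ;;                 # source file: checked below
      *) continue ;;           # non-code files are ignored
    esac
    want="${f%.js}.test.js"
    found=0
    for g in "$@"; do
      if [ "$g" = "$want" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then
      echo "⛔ TDD GATE: no test staged for ${f} (expected ${want})"
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside a hook (assumed):
#   mapfile -t staged < <(git diff --cached --name-only --diff-filter=ACM)
#   tdd_gate "${staged[@]}" || exit 1
```

The function only looks at file names, so it proves that a test file accompanies the change, not that the test is meaningful; the TEST RUNNER gate covers whether the tests actually pass.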

Step 4: The 6-phase lifecycle

Every project follows the same disciplined flow:

DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP

Each phase has corresponding skills:

DEFINE: spec-driven-development, architecture-analysis, interview-me
PLAN: planning-and-task-breakdown
BUILD: incremental-implementation, test-driven-development, doubt-driven-development
VERIFY: test-driven-development, debugging-and-error-recovery
REVIEW: code-review-and-quality, security-and-hardening, performance-optimization
SHIP: git-workflow-and-versioning, ci-cd-and-automation, shipping-and-launch

The agent loads skills on demand. 57 total, each with a declared Output Contract (artifact, format, location, quality criteria).
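Loading on demand can be pictured as a cheap index plus a separate full read. The sketch below assumes a hypothetical layout of one directory per skill containing a SKILL.md whose first line is a one-line description; the layout and the function names list_skills and load_skill are illustrative, not the framework's actual loader.

```shell
#!/usr/bin/env bash
# Hypothetical layout: <skills_dir>/<skill-name>/SKILL.md, first line = short description.

# Print a compact index ("name: description"), cheap enough to keep in context.
list_skills() {
  local dir="$1" f
  for f in "$dir"/*/SKILL.md; do
    [ -f "$f" ] || continue
    printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(head -n 1 "$f")"
  done
}

# Print the full body of one skill, only when it is actually needed.
load_skill() {
  local dir="$1" name="$2"
  local f="${dir}/${name}/SKILL.md"
  if [ ! -f "$f" ]; then
    echo "unknown skill: ${name}" >&2
    return 1
  fi
  cat "$f"
}
```

The index costs one line per skill, while a full skill body is read only for the phase that needs it.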

Step 5: Customizing for your stack

The framework auto-detects your stack from lockfiles:

$ init-agents
🔍 Detected: Node.js (package.json)
📝 Created: STACK_CONFIG.md with npm test, npm run lint, npm run build
🔗 Installed: 57 skills ready for agent

Supports Node, Python, Rust, Go, Ruby, Dart, and any other stack that uses git.
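Detection from marker files could look roughly like this. The sketch is illustrative only: the function names detect_stack and test_command_for are hypothetical, the marker files are the conventional ones for each ecosystem, and apart from npm test (shown in the output above) the test commands are assumptions rather than what init-agents actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical detection: map a well-known marker file to a stack name.
detect_stack() {
  local dir="$1"
  if   [ -f "$dir/package.json" ]; then echo "node"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; then echo "python"
  elif [ -f "$dir/Cargo.toml" ]; then echo "rust"
  elif [ -f "$dir/go.mod" ]; then echo "go"
  elif [ -f "$dir/Gemfile" ]; then echo "ruby"
  elif [ -f "$dir/pubspec.yaml" ]; then echo "dart"
  else echo "unknown"
  fi
}

# Assumed default test command per stack (only "npm test" appears in the article).
test_command_for() {
  case "$1" in
    node)   echo "npm test" ;;
    python) echo "pytest" ;;
    rust)   echo "cargo test" ;;
    go)     echo "go test ./..." ;;
    ruby)   echo "bundle exec rake test" ;;
    dart)   echo "dart test" ;;
    *)      echo "" ;;
  esac
}

# Usage:
#   stack=$(detect_stack .)
#   echo "Detected: ${stack}, test command: $(test_command_for "$stack")"
```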

Why this matters

Most "AI agent frameworks" are prompt collections. They teach the agent. They don't enforce.

This is a harness. Mechanical infrastructure around the model. The difference between "please follow the rules" and "you literally cannot commit without passing 14 gates."

57 skills. 14 gates. 6 harness components. 0 lint errors. MIT. Free.

If you're building with AI agents and want production-grade guardrails:

github.com/juandelossantos/another-agent-skills: clone it, run bash install.sh, and star the repo ⭐

10 seconds to your first gate.

How I built mechanical enforcement for AI coding agents — and why prompts aren't enough

David Emilio Sierra Puentes — Thu, 25 Jun 2026 17:30:44 +0000

I spent months watching AI coding agents produce impressive demos that couldn't survive production.

The code looked right. It compiled. It even passed the first test.

Then it hit edge cases. Forgotten constraints. A rule the agent agreed to five minutes ago, now gone — overwritten by the next context window.

The root cause wasn't capability. It was process.

"A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints."
— Osmani, Saboo & Kartakis, The New SDLC With Vibe Coding, 2026 [1]

The problem: capable but undisciplined

AI agents are brilliant at generating code. They have zero built-in discipline:

They commit without tests
They push without review
They overwrite each other's work
They produce output that looks correct but breaks silently

A METR study (Becker et al., July 2025 [2]) found something counterintuitive: experienced open-source developers using AI tools took 19% longer to finish their tasks, yet afterward estimated that AI had made them about 20% faster. The speed was an illusion. The debugging cost was real.

The industry's response has been more skills, more prompts, bigger context windows. But the problem isn't intelligence — it's accountability.

A rule that lives only in a prompt is a suggestion. An agent that "knows" the rules will eventually forget them. Context degrades. Attention drifts. The question isn't if your agent will break a rule — it's when.

The insight: memory is not enforcement

Three incidents in 48 hours taught me this lesson.

My agent bypassed its own commit approval system in under 30 seconds. Not because it was malicious — because the "gate" was just another rule in a file. Another thing to remember. Another thing to forget.

I had built a SHA256 token system for commit approval. Thought it was bulletproof. Then my agent ran with --auto and the tokens became theater.

The fix wasn't a better token system. The fix was changing the architecture.

Rules that depend on memory fail. Rules that depend on visible blocks succeed.

This is the core insight behind mechanical enforcement: gates that run at the infrastructure level, not the agent level. The agent cannot bypass what it cannot ignore.

The solution: Agent = Model + Harness

I built this project as a complete open-source implementation of the Harness architecture [3] — the mechanical infrastructure that turns raw AI intelligence into reliable output.

Component	What It Is
Instructions & Rules	Who the agent is, what it cares about, what it must never do
Tools	57 composable skills loaded on demand (lazy-loaded, ~250 lines each)
Sandboxes & Execution	Terminal, git workspace, CI
Orchestration	When each tool fires, how agents coordinate
Guardrails & Hooks	Deterministic enforcement at lifecycle points — pre-commit, commit-msg, approval
Observability	Metrics, health checks, drift detection

What makes this different: Most "agent frameworks" are just prompt libraries. This one adds 12 mechanical gates (9 pre-commit checks plus a three-gate commit approval system in the commit-msg hook) and a context engineering layer that saves ~45% of always-loaded tokens.

The code: a commit-msg hook that can't be bypassed

Here's the heart of mechanical enforcement — a commit-msg git hook (v6) [4] that blocks unstamped commits:

#!/usr/bin/env bash
# commit-msg — Three-Gate Approval Check (v6)
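set -euo pipefail

# Git passes the path of the commit message file as the first argument.
COMMIT_MSG_FILE="$1"
REPO_ROOT=$(git rev-parse --show-toplevel)
APPROVAL_FILE="${REPO_ROOT}/.git/COMMIT_APPROVED"
MANIFEST_FILE="${REPO_ROOT}/.git/COMMIT_MANIFEST"
TEST_LOG="${REPO_ROOT}/.git/TEST_LOG"
CURRENT_MSG=$(head -1 "$COMMIT_MSG_FILE" | tr -d '\n')
NOW_EPOCH=$(date +%s)

# Gates start closed; with "set -u" an unset GATE variable would abort the hook.
GATE1=false
GATE2=false
GATE3=false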

# Gate 1: Tests passed recently?
if [[ -f "$TEST_LOG" ]]; then
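  # "|| true" keeps a missing key from silently aborting the hook under "set -euo pipefail".
  STATUS=$(grep "^status=" "$TEST_LOG" | cut -d= -f2- || true)
  if [[ "$STATUS" == "PASS" ]]; then
    TEST_TS=$(grep "^timestamp=" "$TEST_LOG" | cut -d= -f2- || true)
    # date -d is GNU date syntax; an unparseable timestamp falls back to epoch 0 (expired).
    TS_EPOCH=$(date -d "$TEST_TS" +%s 2>/dev/null || echo 0)
    AGE=$((NOW_EPOCH - TS_EPOCH))
    [ "$AGE" -le 3600 ] && GATE1=true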
  fi
fi

# Gate 2: Commit manifest exists and has content?
[[ -f "$MANIFEST_FILE" ]] && \
  [ $(wc -c < "$MANIFEST_FILE") -gt 20 ] && GATE2=true

# Gate 3: Approval fresh (<5 min) and message matches?
if [[ -f "$APPROVAL_FILE" ]]; then
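  # "|| true" lets a malformed approval file fall through to the "blocked" report below.
  TIMESTAMP=$(grep "^timestamp=" "$APPROVAL_FILE" | cut -d= -f2- || true)
  STORED_MSG=$(grep "^message=" "$APPROVAL_FILE" | cut -d= -f2- || true)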
  TS_EPOCH=$(date -d "$TIMESTAMP" +%s 2>/dev/null || echo 0)
  AGE=$((NOW_EPOCH - TS_EPOCH))
  [ $AGE -le 300 ] && [ "$STORED_MSG" = "$CURRENT_MSG" ] && GATE3=true
fi

# All three must pass
if [ "$GATE1" = true ] && [ "$GATE2" = true ] && [ "$GATE3" = true ]; then
  echo "✓ All 3 gates passed. Commit allowed."
  exit 0
else
  echo "✗ Commit blocked — missing gates:"
  [ "$GATE1" != true ] && echo "  - Tests not run or expired"
  [ "$GATE2" != true ] && echo "  - Commit manifest missing"
  [ "$GATE3" != true ] && echo "  - Approval missing or expired"
  exit 1
fi

Three conditions must be met before any commit goes through:

Tests passed (within the last hour) — no blind commits
Commit manifest exists (the agent writes what changed) — no silent mutations
User approved (within 5 minutes, message matches) — no stale approvals
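The first condition depends on something writing the test log that the hook reads. A minimal sketch of such a writer follows, assuming the status= and timestamp= keys seen in the hook; the wrapper name record_test_run is hypothetical, and the ISO 8601 UTC timestamp is an assumed format chosen because GNU date -d can parse it.

```shell
#!/usr/bin/env bash
# record_test_run LOG CMD [ARGS...]
# Hypothetical wrapper: run the test command, then record the outcome in
# the key=value form the commit-msg hook greps for.
record_test_run() {
  local log="$1"
  shift
  local status="FAIL"
  if "$@"; then
    status="PASS"
  fi
  {
    echo "status=${status}"
    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$log"
  [ "$status" = "PASS" ]   # propagate the test result to the caller
}

# Usage (assumed):
#   record_test_run "$(git rev-parse --show-toplevel)/.git/TEST_LOG" npm test
```

Writing the log on failure as well as success means a stale PASS from an earlier run cannot outlive a failing one.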

The agent writes the approval file after the user says "yes commit" in chat. The hook verifies the file is fresh (<5 min) and matches the exact commit message. If the agent tries to commit without approval, the hook blocks it — every time.
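The approval step described above could be sketched as follows. The helper name write_approval is hypothetical; the timestamp= and message= keys mirror what the hook reads, and storing only the first line of the message is an assumption made so that it matches the hook's head -1 comparison.

```shell
#!/usr/bin/env bash
# write_approval FILE MESSAGE
# Hypothetical helper: record a fresh approval for one exact commit message.
write_approval() {
  local file="$1" message="$2"
  local first_line
  # The hook compares against the first line of the commit message only.
  first_line=$(printf '%s\n' "$message" | head -n 1)
  {
    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'message=%s\n' "$first_line"
  } > "$file"
}

# Usage (assumed), right after the user approves in chat:
#   write_approval "$(git rev-parse --show-toplevel)/.git/COMMIT_APPROVED" "feat: add user auth"
#   git commit -m "feat: add user auth"
```

Binding the approval to both a timestamp and the exact message means an approval given for one change cannot be reused for a different commit later.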

This isn't a rule the agent remembers. It's a gate the agent cannot bypass.

The results: 57 skills, 12 gates, zero shortcuts

After months of iteration, the project ships [5]:

Metric	Count
Composable skills	57
Lazy-loaded guides	54
Pre-commit gates	9 (v8) + 3 commit-msg gates
Enforcement levels	4 (process → manifest → time-window → manifest gate)
Agent compatibility	OpenCode, Claude Code, Cursor, Kiro, any git agent [6]
Context tokens saved	~45% vs eager loading [7]
Stack support	Node, Python, Rust, Go, Ruby, any language with git
Price	Free (MIT)

What I learned

Prompts are instructions. Gates are guarantees.

If you're building with AI agents, ask yourself:

Does your agent run tests before every commit? Mechanically, not as a suggestion?
Does your agent present changes for review before pushing? Every time, not just when it remembers?
Can your agent bypass its own rules? If yes, those aren't rules — they're suggestions.

The gap between an "impressive demo" and "production-grade" isn't intelligence. It's the harness around it.

Try it:

git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh
init-agents   # Activates skill-driven mode in any project

MIT. Free. Zero subscriptions. 57 skills. 12 gates.

juandelossantos.github.io/another-agent-skills

What patterns have you found for keeping AI agents disciplined in production? I'd love to hear what's working (or not working) in your stack.

References

[1] Osmani, A., Saboo, S., & Kartakis, S. (2026). The New SDLC With Vibe Coding: From ad-hoc prompting to Agentic Engineering. Harness architecture paper.
[2] Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR (Model Evaluation and Threat Research). arxiv.org/abs/2507.09089
[3] Another Agent Skills. Harness Architecture: The Six Components. docs/HARNESS.md
[4] Another Agent Skills. commit-msg hook (v6): Three-Gate Approval Check. scripts/git-hooks/commit-msg
[5] Another Agent Skills. Repository and Documentation. github.com/juandelossantos/another-agent-skills
[6] Another Agent Skills. Agent Adapters: Compatibility Matrix. docs/AGENT-ADAPTERS.md
[7] Another Agent Skills. Context Budget: Lazy Loading Architecture. README.md
[8] Singhal et al. (2026). Agent Skills: Evaluation-Driven Development for AI Coding Agents. Google Research. Paper.
[9] Osmani, A. (2026). The Factory Model: From Conductors to Orchestrators. addyosmani.com
[10] Another Agent Skills. SOUL.md: Project Identity and Principles. SOUL.md