I spent months watching AI coding agents produce impressive demos that couldn't survive production.
The code looked right. It compiled. It even passed the first test.
Then it hit edge cases. Forgotten constraints. A rule the agent agreed to five minutes ago, now gone — overwritten by the next context window.
The root cause wasn't capability. It was process.
"A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints."
— Osmani, Saboo & Kartakis, The New SDLC With Vibe Coding, 2026 [1]
The problem: capable but undisciplined
AI agents are brilliant at generating code. They have zero built-in discipline:
- They commit without tests
- They push without review
- They overwrite each other's work
- They produce output that looks correct but breaks silently
A METR study (Becker et al., July 2025 [2]) found something counterintuitive: experienced open-source developers took 19% longer to finish tasks when they used AI tools, yet believed AI had made them about 20% faster. The speed was an illusion. The debugging cost was real.
The industry's response has been more skills, more prompts, bigger context windows. But the problem isn't intelligence — it's accountability.
A rule that lives only in a prompt is a suggestion. An agent that "knows" the rules will eventually forget them. Context degrades. Attention drifts. The question isn't if your agent will break a rule — it's when.
The insight: memory is not enforcement
Three incidents in 48 hours taught me this lesson.
My agent bypassed its own commit approval system in under 30 seconds. Not because it was malicious — because the "gate" was just another rule in a file. Another thing to remember. Another thing to forget.
I had built a SHA256 token system for commit approval. Thought it was bulletproof. Then my agent ran with --auto and the tokens became theater.
The fix wasn't a better token system. The fix was changing the architecture.
Rules that depend on memory fail. Rules backed by a visible, hard block succeed.
This is the core insight behind mechanical enforcement: gates that run at the infrastructure level, not the agent level. The agent cannot bypass what it cannot ignore.
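To make the distinction concrete, here is a minimal sketch of a gate that lives in a git hook instead of a prompt. The helper name `run_gate` and the example test command are my own placeholders, not part of the project. The point is only that the check is an exit code, so nothing depends on the agent remembering a rule.

```shell
#!/usr/bin/env bash
# Sketch of a mechanical gate: the check runs in the hook, outside the agent's context.
# run_gate is a hypothetical helper name used for illustration.

run_gate() {
    local name="$1"
    shift
    # Run the check command; its exit status decides the outcome.
    if "$@" >/dev/null 2>&1; then
        echo "PASS ${name}"
        return 0
    fi
    echo "BLOCKED ${name}" >&2
    return 1
}

# In .git/hooks/pre-commit the last line would look something like:
#   run_gate "tests" npm test || exit 1
```

Swap `npm test` for whatever command runs your test suite. A non-zero exit from a pre-commit hook makes git abort the commit.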
The solution: Agent = Model + Harness
I built this project as a complete open-source implementation of the Harness architecture [3] — the mechanical infrastructure that turns raw AI intelligence into reliable output.
| Component | What It Is |
|---|---|
| Instructions & Rules | Who the agent is, what it cares about, what it must never do |
| Tools | 57 composable skills, lazy-loaded on demand (~250 lines each) |
| Sandboxes & Execution | Terminal, git workspace, CI |
| Orchestration | When each tool fires, how agents coordinate |
| Guardrails & Hooks | Deterministic enforcement at lifecycle points — pre-commit, commit-msg, approval |
| Observability | Metrics, health checks, drift detection |
What makes this different: Most "agent frameworks" are just prompt libraries. This one adds 12 mechanical gates (9 pre-commit gates plus a three-gate approval check in the commit-msg hook) and a context engineering layer that saves ~45% of always-loaded tokens compared with eager loading.
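The lazy-loading idea can be sketched in a few lines of shell. Everything here is an assumption for illustration: the `skills/<name>/SKILL.md` layout, the `SKILLS_DIR` variable, and the function names are hypothetical and may not match how the project stores its skills. The sketch only shows the split between a small always-loaded index and full skill bodies that are read on demand.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one directory per skill, each holding a SKILL.md whose
# first line is a one-line summary. SKILLS_DIR is an assumed variable name.
SKILLS_DIR="${SKILLS_DIR:-./skills}"

# Always-loaded part: one summary line per skill.
skill_index() {
    local f
    for f in "$SKILLS_DIR"/*/SKILL.md; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(head -n 1 "$f")"
    done
}

# On-demand part: the full body is read only when a task needs that skill.
load_skill() {
    cat "$SKILLS_DIR/$1/SKILL.md"
}
```

With this shape, the context cost that is always paid grows with the number of skills, not with their total length.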
The code: a commit-msg hook that can't be bypassed
Here's the heart of mechanical enforcement — a commit-msg git hook (v6) [4] that blocks unstamped commits:
#!/usr/bin/env bash
# commit-msg: Three-Gate Approval Check (v6)
# Note: timestamp parsing relies on GNU `date -d`.
set -euo pipefail

COMMIT_MSG_FILE="$1"   # git passes the path of the message file as the first argument
REPO_ROOT=$(git rev-parse --show-toplevel)
APPROVAL_FILE="${REPO_ROOT}/.git/COMMIT_APPROVED"
MANIFEST_FILE="${REPO_ROOT}/.git/COMMIT_MANIFEST"
TEST_LOG="${REPO_ROOT}/.git/TEST_LOG"
CURRENT_MSG=$(head -n 1 "$COMMIT_MSG_FILE")
NOW_EPOCH=$(date +%s)

# Gates start closed, so a missing file or field fails the check
# instead of tripping `set -u` on an unset variable.
GATE1=false
GATE2=false
GATE3=false

# Print the value of "key=" from a file; print nothing if the key is absent.
get_field() { grep "^$1=" "$2" | head -n 1 | cut -d= -f2- || true; }

# Convert a timestamp to epoch seconds; print 0 if it is empty or unparseable.
to_epoch() { [[ -n "$1" ]] && date -d "$1" +%s 2>/dev/null || echo 0; }

# Gate 1: Tests passed recently?
if [[ -f "$TEST_LOG" && "$(get_field status "$TEST_LOG")" == "PASS" ]]; then
    TS_EPOCH=$(to_epoch "$(get_field timestamp "$TEST_LOG")")
    AGE=$((NOW_EPOCH - TS_EPOCH))
    if (( AGE >= 0 && AGE <= 3600 )); then GATE1=true; fi
fi

# Gate 2: Commit manifest exists and has content?
if [[ -f "$MANIFEST_FILE" ]] && (( $(wc -c < "$MANIFEST_FILE") > 20 )); then
    GATE2=true
fi

# Gate 3: Approval fresh (<5 min) and message matches?
if [[ -f "$APPROVAL_FILE" ]]; then
    TS_EPOCH=$(to_epoch "$(get_field timestamp "$APPROVAL_FILE")")
    STORED_MSG=$(get_field message "$APPROVAL_FILE")
    AGE=$((NOW_EPOCH - TS_EPOCH))
    if (( AGE >= 0 && AGE <= 300 )) && [[ "$STORED_MSG" == "$CURRENT_MSG" ]]; then
        GATE3=true
    fi
fi

# All three must pass
if [[ "$GATE1" == true && "$GATE2" == true && "$GATE3" == true ]]; then
    echo "✓ All 3 gates passed. Commit allowed."
    exit 0
fi

echo "✗ Commit blocked. Missing gates:" >&2
[[ "$GATE1" == true ]] || echo "  - Tests not run or expired" >&2
[[ "$GATE2" == true ]] || echo "  - Commit manifest missing" >&2
[[ "$GATE3" == true ]] || echo "  - Approval missing or expired" >&2
exit 1
Three conditions must be met before any commit goes through:
- Tests passed (within the last hour) — no blind commits
- Commit manifest exists (the agent writes what changed) — no silent mutations
- User approved (within 5 minutes, message matches) — no stale approvals
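Gate 1 reads `status=` and `timestamp=` lines from `.git/TEST_LOG`, so something has to write them. Below is a rough sketch of such a writer. The helper name `record_test_run` is hypothetical, and the ISO 8601 UTC timestamp format is my assumption (chosen because GNU `date -d` can parse it); the project's own test runner may record these fields differently.

```shell
#!/usr/bin/env bash
# record_test_run is a hypothetical helper: it runs the given test command and
# writes the status= and timestamp= lines that Gate 1 parses.
# The log path is an argument so the sketch also runs outside a repository.
record_test_run() {
    local log="$1"
    shift
    local status=FAIL
    if "$@"; then
        status=PASS
    fi
    {
        echo "status=${status}"
        # Assumed format: ISO 8601 in UTC, e.g. 2026-01-15T09:30:00Z
        echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    } > "$log"
    [ "$status" = PASS ]
}
```

A typical call would be `record_test_run "$(git rev-parse --show-toplevel)/.git/TEST_LOG" npm test`, with `npm test` standing in for your own test command.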
The agent writes the approval file after the user says "yes commit" in chat. The hook verifies the file is fresh (<5 min) and matches the exact commit message. If the agent tries to commit without approval, the hook blocks it — every time.
This isn't a rule the agent remembers. It's a gate the agent cannot bypass.
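The approval step can be sketched the same way. The two helpers below are hypothetical names of mine: `write_approval` records the approved message and time in the `key=value` form that Gate 3 reads, and `approval_matches` mirrors the message comparison in the hook. The timestamp format is again an assumption.

```shell
#!/usr/bin/env bash
# write_approval is a hypothetical helper: it stores the approval time and the
# exact commit message the user approved.
write_approval() {
    local file="$1"
    local message="$2"
    {
        echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "message=${message}"
    } > "$file"
}

# approval_matches mirrors Gate 3's comparison: the stored message must equal
# the first line of the commit message exactly.
approval_matches() {
    local file="$1"
    local message="$2"
    local stored
    # cut -f2- keeps any "=" characters that appear inside the message itself.
    stored=$(grep '^message=' "$file" | cut -d= -f2-)
    [ "$stored" = "$message" ]
}
```

Because the comparison is exact, an approval given for one message does not carry over to a reworded commit.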
The results: 57 skills, 12 gates, zero shortcuts
After months of iteration, the project ships [5]:
| Metric | Count |
|---|---|
| Composable skills | 57 |
| Lazy-loaded guides | 54 |
| Pre-commit gates | 9 (v8) + 3 commit-msg gates |
| Enforcement levels | 4 (process → manifest → time-window → manifest gate) |
| Agent compatibility | OpenCode, Claude Code, Cursor, Kiro, any git agent [6] |
| Context tokens saved | ~45% vs eager loading [7] |
| Stack support | Node, Python, Rust, Go, Ruby, any language with git |
| Price | Free (MIT) |
What I learned
Prompts are instructions. Gates are guarantees.
If you're building with AI agents, ask yourself:
- Does your agent run tests before every commit? Mechanically, not as a suggestion?
- Does your agent present changes for review before pushing? Every time, not just when it remembers?
- Can your agent bypass its own rules? If yes, those aren't rules — they're suggestions.
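One quick way to start answering these questions is to check that the hooks are actually in place. Git skips a hook file that is not executable, so a gate can be missing without anyone noticing. The helper name `hook_is_active` is hypothetical; this is a small sketch, not part of the project.

```shell
#!/usr/bin/env bash
# hook_is_active is a hypothetical helper: it succeeds only if the named hook
# exists and is executable in the given hooks directory.
hook_is_active() {
    local hooks_dir="$1"
    local name="$2"
    [ -x "${hooks_dir}/${name}" ]
}

# Example, run inside a repository:
#   hook_is_active "$(git rev-parse --git-path hooks)" commit-msg \
#       || echo "commit-msg gate is not installed" >&2
```

Passing the directory from `git rev-parse --git-path hooks` should also cover repositories that set `core.hooksPath`.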
The gap between an "impressive demo" and "production-grade" isn't intelligence. It's the harness around it.
Try it:
git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh
init-agents # Activates skill-driven mode in any project
MIT. Free. Zero subscriptions. 57 skills. 12 gates.
juandelossantos.github.io/another-agent-skills
What patterns have you found for keeping AI agents disciplined in production? I'd love to hear what's working (or not working) in your stack.
References
- Osmani, A., Saboo, S., & Kartakis, S. (2026). The New SDLC With Vibe Coding: From ad-hoc prompting to Agentic Engineering. Harness architecture paper.
- Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR (Model Evaluation and Threat Research). arxiv.org/abs/2507.09089
- Another Agent Skills. Harness Architecture: The Six Components. docs/HARNESS.md
- Another Agent Skills. commit-msg hook (v6): Three-Gate Approval Check. scripts/git-hooks/commit-msg
- Another Agent Skills. Repository and Documentation. github.com/juandelossantos/another-agent-skills
- Another Agent Skills. Agent Adapters: Compatibility Matrix. docs/AGENT-ADAPTERS.md
- Another Agent Skills. Context Budget: Lazy Loading Architecture. README.md
- Singhal et al. (2026). Agent Skills: Evaluation-Driven Development for AI Coding Agents. Google Research (paper).
- Osmani, A. (2026). The Factory Model: From Conductors to Orchestrators. addyosmani.com
- Another Agent Skills. SOUL.md: Project Identity and Principles. SOUL.md