SeungHo Lee

Posted on Jul 3

I stopped babysitting my AI frontend agent — so I built it guardrails

#showdev #ai #frontend #opensource

Claude writes frontend code faster than I can read it. That turned out to be the problem.

Here is a normal afternoon from a few weeks ago. I ask for a feature, watch a dozen files show up, and then spend the next twenty minutes playing security guard. Did it just run git add . and sweep my .env into the commit? Is that a real component or another screen of purple gradients and drop shadows? Did it write the tests, or just tell me it did? Did it set strict: false somewhere to make a red squiggle go away?

Generating the code was basically free. The watching wasn't.

So I built fe-rail: a Claude Code plugin that runs my frontend work through a fixed spec → build → review → PR loop, blocks the moves I'm scared of, and only stops to ask me two things.

What I actually wanted

Not more autonomy. Less.

Most of the agent tooling I looked at was trying to give the model more room to run. I wanted the opposite. I already have a process I trust when I write frontend by hand: write a short spec, build the types first, review for the things that actually bite (types, performance, accessibility, plain code quality), then open a clean PR. I just wanted the agent to follow it every time, without me hovering over its shoulder.

The other half was fear. Some mistakes are cheap to undo. A force push over a colleague's branch, or a committed .env, is not. I wanted those to be impossible, not just discouraged.

What a run looks like

One command, two questions:

$ claude
> /fe-rail:fe-spec

[fe-spec] Analyzing requirements... (fe-analyst, fe-architect)
✔ feature.md generated — 7 sections, 3 open questions resolved

Next step?
  ❯ Full auto (recommended) — hand off to fe-start automatically
    Build only — I'll review the spec first
    Revise spec

> Full auto

[fe-start] Phase 2 — implementing types → hooks → components → tests
✔ 12 files created, tsc clean, lint clean, 8 tests passing

Commit and open a PR?
  ❯ Yes — split by type (feat/fix/test) and open a draft PR
    No — leave changes uncommitted

> Yes

✔ 2 commits created
✔ Pushed to feat/product-search-autocomplete
✔ Draft PR: https://github.com/you/your-app/pull/42

That is the whole shape of it. Two human check-ins, "Implement?" and "Commit?". Everything between them runs on its own.

Here is roughly what moved off my plate:

| Before | With fe-rail |
| --- | --- |
| Re-reading every diff for a stray `git add .` or force push | Blocked at the hook, before it runs |
| "Did it actually write tests?" | The build ends in tests; the quality gate re-checks changed files on stop |
| Catching generic gradient-and-shadow UI by eye | `design-nudge` warns, or a `DESIGN.md` sets the rules |
| Splitting commits and writing the PR body myself | `fe-git-operator` and `fe-pr-author` do it |

The pipeline

fe-rail is five slash commands:

/fe-rail:fe-spec turns a rough request into a structured spec, then asks whether to keep going.
/fe-rail:fe-build implements it in a fixed order: types, then logic, then components, then tests.
/fe-rail:fe-review runs a four-axis review: types, performance, a11y, and quality.
/fe-rail:fe-start feature.md chains all of that through to a PR.
/fe-rail:fe-doc-sync reads your project and suggests updates to its CLAUDE.md and README.

Run them one at a time when you want to drive, or let fe-start take the wheel.

The spec step is not limited to text. If the feature lives in a Figma file or a screenshot, fe-vision reads it into concrete screens and states. If it arrives as a slide deck, fe-deck-reader breaks it into screens and the flows between them. Later, at review time, fe-vision can also compare a screenshot of what got built against the original reference and flag where the two drift apart. That one still surprises me when it works.

There is a small detail I care about in how the two check-ins stay at two. Picking "Full auto" at the spec gate counts as the "Implement?" yes, so fe-start only has to ask "Commit?" later on. I did not want the convenience of a one-command run to quietly cost me a checkpoint.
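
The bookkeeping behind that is tiny. As a minimal sketch, assuming a hypothetical helper named `pendingCheckpoints` (the name and shape are illustrative, not taken from the plugin's source):

```typescript
// Hypothetical model of the two human check-ins described above.
type Checkpoint = "Implement?" | "Commit?";

// Returns the questions a chained run still has to ask.
// If "Full auto" was picked at the spec gate, "Implement?" is already answered.
function pendingCheckpoints(implementAlreadyApproved: boolean): Checkpoint[] {
  return implementAlreadyApproved ? ["Commit?"] : ["Implement?", "Commit?"];
}
```

Either way the total number of human answers stays at two: one given at the spec gate or at the start of the build, and one before the commit.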

The part I actually care about: hooks

Speed was never the hard part. Trust was. So the piece I spent the most time on is the hook layer, and it runs on one rule.

Block dangers. Warn on quality.

Blocking means the tool call never happens (the hook exits with code 2). Warning means it happens and I get a note on stderr. A few of the blockers:

guard.sh stops git add ., force pushes, --no-verify, git reset --hard, git checkout/restore ., and rm -rf / before they run.
write-guard.sh refuses to create or edit .env files, private keys, and credential dumps. It still lets through the source files you would expect, like .env.example or a CredentialForm.tsx.
config-protection.sh blocks edits that weaken your setup. The agent cannot flip strict: false, drop in a @ts-nocheck, or switch a linter's recommended rules off to make the errors disappear.
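
To make the three blockers concrete, here is a rough sketch of the decision logic in TypeScript. The real hooks are shell scripts and their exact patterns are not shown in this post, so every function name and regular expression below is an assumption chosen to match the behavior described above.

```typescript
// Illustrative only: names and patterns are hypothetical, not fe-rail's source.
interface Verdict {
  blocked: boolean;
  reason?: string;
}

// Each rule pairs a pattern with the reason reported when it matches.
const COMMAND_RULES: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /\bgit\s+add\s+\.(\s|$)/, reason: "git add . stages everything" },
  { pattern: /\bgit\s+push\b.*\s(--force|-f)(\s|$)/, reason: "force push" },
  { pattern: /\s--no-verify(\s|$)/, reason: "--no-verify skips git hooks" },
  { pattern: /\bgit\s+reset\s+--hard\b/, reason: "git reset --hard" },
  { pattern: /\bgit\s+(checkout|restore)\s+\.(\s|$)/, reason: "discards working tree changes" },
  { pattern: /\brm\s+-rf\s+\/(\s|$)/, reason: "rm -rf /" },
];

// guard: inspect a shell command before it runs.
function checkCommand(command: string): Verdict {
  for (const rule of COMMAND_RULES) {
    if (rule.pattern.test(command)) return { blocked: true, reason: rule.reason };
  }
  return { blocked: false };
}

// write-guard: inspect the target path of a file write or edit.
function checkWritePath(path: string): Verdict {
  const name = path.split("/").pop() ?? path;
  // .env and .env.local count as secrets; .env.example is a template and passes.
  const isEnv = /^\.env(\..+)?$/.test(name) && !name.endsWith(".example");
  if (isEnv) return { blocked: true, reason: "environment file" };
  if (/\.(pem|key)$/.test(name)) return { blocked: true, reason: "private key" };
  return { blocked: false };
}

// config-protection: inspect the new content of a config or source file.
function checkConfigEdit(newContent: string): Verdict {
  if (/"strict"\s*:\s*false/.test(newContent)) return { blocked: true, reason: "strict: false" };
  if (/@ts-nocheck/.test(newContent)) return { blocked: true, reason: "@ts-nocheck" };
  return { blocked: false };
}

// Exit code 2 blocks the tool call; 0 lets it proceed.
function exitCodeFor(verdict: Verdict): 0 | 2 {
  return verdict.blocked ? 2 : 0;
}
```

The patterns are deliberately narrow. `git add src/app/page.tsx` and `rm -rf /tmp/build` pass, because the point is to stop the wholesale version of each command, not the command itself.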

The warnings are lighter. design-nudge.sh pings me when an edit reaches for the default AI look: heavy arbitrary shadows, the obligatory indigo gradient. quality-gate.sh runs the linter and type check on changed files when the session ends and prints what it finds.
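
A warning hook differs from a blocker only in what it does with a match: it reports and lets the edit through. A minimal sketch of the kind of check design-nudge could make, assuming Tailwind class names and two heuristics invented for illustration (the real script may look for different things):

```typescript
// Hypothetical heuristics; not copied from design-nudge.sh.
function designWarnings(source: string): string[] {
  const warnings: string[] = [];

  // Tailwind arbitrary-value shadows, e.g. shadow-[0_8px_30px_rgba(0,0,0,0.4)].
  if (/\bshadow-\[[^\]]+\]/.test(source)) {
    warnings.push("arbitrary box shadow: prefer a shadow from the design scale");
  }

  // A gradient utility (bg-gradient-to-* or bg-linear-to-*) with indigo or purple stops.
  const hasGradient = /\bbg-(gradient|linear)-to-[a-z]+\b/.test(source);
  const hasIndigoStop = /\b(from|via|to)-(indigo|purple)-\d{2,3}\b/.test(source);
  if (hasGradient && hasIndigoStop) {
    warnings.push("indigo/purple gradient: check it against the project's palette");
  }

  // Warnings never block: the caller prints them to stderr and exits 0.
  return warnings;
}
```

An edit with plain utilities like `rounded-md border p-4` returns an empty list, so the hook stays silent.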

There is an escape hatch, because there has to be one. You can turn the whole thing down with FE_RAIL_HOOK_PROFILE=minimal, or switch off individual hooks by name with FE_RAIL_DISABLED_HOOKS. But minimal does not touch the safety blockers. If you want those gone, you have to name them one by one. Quality is negotiable. Overwriting your .env is not.
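
The rule is easier to see as code. This sketch assumes that `FE_RAIL_DISABLED_HOOKS` is a comma-separated list of hook names without the `.sh` suffix, and that `minimal` switches off the warning hooks; both are guesses about the format, and the `activeHooks` function is hypothetical.

```typescript
// Hypothetical resolution of which hooks run; formats are assumptions.
type HookKind = "safety" | "quality";

const HOOKS: Record<string, HookKind> = {
  "guard": "safety",
  "write-guard": "safety",
  "config-protection": "safety",
  "design-nudge": "quality",
  "quality-gate": "quality",
};

function activeHooks(env: Record<string, string | undefined>): string[] {
  const minimal = env["FE_RAIL_HOOK_PROFILE"] === "minimal";
  const disabled = new Set(
    (env["FE_RAIL_DISABLED_HOOKS"] ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );

  return Object.entries(HOOKS)
    .filter(([name, kind]) => {
      if (disabled.has(name)) return false; // named explicitly: off, even if it is a blocker
      if (minimal && kind === "quality") return false; // minimal only quiets the warnings
      return true;
    })
    .map(([name]) => name);
}
```

With `FE_RAIL_HOOK_PROFILE=minimal` alone, the three safety hooks are still in the list. Removing one takes a deliberate entry in `FE_RAIL_DISABLED_HOOKS`.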

Multiple agents, on purpose

Each phase hands off to a small sub-agent in its own context instead of stuffing everything into one long session. Requirements analysis goes to fe-analyst. Architecture questions go to fe-architect. The review fans out to fe-reviewer, plus a dedicated fe-a11y-auditor and fe-perf-auditor. The main thread stays readable because the noisy work happens off to the side.

One choice I still go back and forth on: the model tiers are aliases, not pinned versions. High-judgment agents ask for opus, cheap exploration runs on haiku, and the rest use sonnet. When a new tier ships, the agents pick it up on their own. Free upgrades, sure. It also means the same plugin version can behave a little differently a month from now, which is a strange thing to sign up for.

What I learned

That aliasing decision is what forced the most useful part of the project: a regression harness. eval/run.sh checks the hooks, the profile toggles, and the plugin's own config without calling a live model, so I find out when a model update or a config change quietly breaks a guard. It exits non-zero on failure, so CI can gate on it.
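
The idea is ordinary table-driven testing: feed a hook a fixed input, compare the exit code with the expected one, and fail the run if any case disagrees. A minimal sketch of that shape, using a toy stand-in for a hook; `runCases` and `toyGuard` are hypothetical names, and the real eval/run.sh runs the actual scripts rather than a function like this.

```typescript
// Illustrative harness; the types and names are assumptions, not eval/run.sh.
type Hook = (command: string) => number; // returns the hook's exit code

interface HookCase {
  name: string;
  command: string;
  expectedExit: number;
}

// Runs every case and reports which ones disagreed with the expectation.
function runCases(hook: Hook, cases: HookCase[]): { failures: string[]; exitCode: number } {
  const failures = cases
    .filter((c) => hook(c.command) !== c.expectedExit)
    .map((c) => c.name);
  // Non-zero on any failure, so CI can gate on the result.
  return { failures, exitCode: failures.length === 0 ? 0 : 1 };
}

// Toy stand-in for a guard: exit 2 (block) on a hard reset, 0 otherwise.
const toyGuard: Hook = (command) => (/\bgit\s+reset\s+--hard\b/.test(command) ? 2 : 0);

const guardReport = runCases(toyGuard, [
  { name: "blocks hard reset", command: "git reset --hard HEAD", expectedExit: 2 },
  { name: "allows status", command: "git status", expectedExit: 0 },
]);
```

No model is involved at any point, which is what keeps a harness like this cheap enough to run on every change.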

The design guard taught me something too. design-nudge nagged me constantly at first, until I added a DESIGN.md to the project. That is how it is meant to work: once a project states its own rules, the generic nudging goes quiet and the reviewer enforces your rules instead of my defaults.

I will be honest about the limits. It is opinionated: TypeScript in strict mode, Next.js App Router or a Vite SPA, Tailwind, shadcn/ui. Off that stack, most of the value drains away. It is young (v1.10 as I write this), and it is frontend only by design. It is really just my own frontend process, written down and made hard to skip.

Try it

If you are on Claude Code and roughly that stack:

/plugin marketplace add sh5623/fe-rail
/plugin install fe-rail@fe-rail-market

Two things are worth doing right after install. Give the project a CLAUDE.md (run /init) so the agents are not reasoning blind about your stack. And allow Bash(git *) and Bash(gh pr *) in your settings so PR creation does not prompt you at every step.

It is open source under MIT:

sh5623 / fe-rail

Claude Code plugin — spec → build → review → PR harness for Next.js, Vite SPA & TypeScript (Tailwind, shadcn/ui)

fe-rail

Frontend-focused Claude Code plugin. Automated spec → build → review → PR workflow for Next.js App Router / Vite SPA (TanStack Router · React Router 7) + TypeScript, with full Tailwind v3/v4 / shadcn/ui support.

I would like to know where it breaks for you. What is the one thing your AI coding setup does that you wish it would ask you about first?