How a custom GitHub Copilot agent with strict architectural guardrails cut feature delivery from days to hours on a multi-tenant enterprise platform
The Problem Nobody Talks About in Enterprise Frontend
Enterprise frontend development is slow. Not because developers can't write React components (they can) but because 90% of the work isn't writing code. It's alignment.
Which design tokens do I use? Where does this component go? How do I wire the API? What's the naming convention for hooks? Which state manager handles this? How do I handle dark mode? Did I forget the MSW handler for tests?
On our team building an enterprise multi-tenant GenAI platform (managing agents, tools, and knowledge bases across a large manufacturing conglomerate), the friction was even worse. We have:
- A custom corporate design system with 360+ Tailwind tokens (no generic `gray-500` allowed)
- 8 feature modules with strict feature-first architecture
- OpenAPI codegen that generates TypeScript types from a FastAPI backend
- MSW (Mock Service Worker) for development and testing
- A 7-tier RBAC system with route-level access guards
- Light/dark mode using class-based Tailwind (`dark:` variants on everything)
- i18n for English and German
Every new component is a decision tree. Every junior developer ramp-up takes weeks. Every code review catches the same "you used `bg-white` instead of `bg-background-base`" mistake.
So we did something different: we encoded our entire frontend architecture into an AI agent and let it build features for us.
TL;DR (If You Skim, Skim This)
- Problem: Enterprise frontend velocity bottlenecked by architectural complexity, design system compliance, and cross-cutting concerns (auth, theming, mocking, i18n).
- Move: Built a custom VS Code agent (`.github/agents/FrontendAgent.agent.md`) that knows our design system, file structure, state management strategy, and API codegen pipeline.
- Result: Feature scaffolding that used to take a day now takes minutes. The agent produces design-system-compliant, dark-mode-ready, MSW-wired, type-safe code on the first pass.
- Tradeoff: You need to invest upfront in writing precise agent instructions. Vague prompts produce vague code: garbage in, garbage out.
Why Not Just Use Copilot Out of the Box?
We did. Here's what vanilla Copilot (without custom instructions) gave us:
// ❌ What generic Copilot produced
<div className="bg-white dark:bg-gray-900 p-4 rounded-lg shadow-md">
<h1 className="text-gray-900 dark:text-white text-xl font-bold">
Tenants
</h1>
</div>
Every single token is wrong. `bg-white` should be `bg-background-base`. `text-gray-900` should be `text-text-normal`. `p-4` should be `p-400`. `rounded-lg` should be `rounded-m`. `font-bold` should be `font-bold font-primary`.
Multiply that across 18 shared components, 8 feature modules, and hundreds of sub-components, and you're spending more time fixing AI output than you saved generating it.
The realization: an AI assistant is only as good as its context. Generic Copilot doesn't know your design system. It doesn't know your file conventions. It doesn't know that you use TanStack Query with a 5-minute stale time and 2 retries, not SWR or Redux Toolkit Query.
So we gave it all of that context. Explicitly. In a single agent definition file.
The Architecture: A 200-Line Agent That Knows Everything
GitHub Copilot supports custom agents via markdown files in .github/agents/. Ours lives at:
.github/agents/FrontendAgent.agent.md
It's a single file that encodes every architectural decision our team has made. Think of it as a machine-readable engineering handbook: the same document that would take a new hire two weeks to internalize, distilled into structured instructions an AI can execute against.
Here's how we structured it:
1. Design System as Code (Not Suggestions)
We don't tell the agent "try to use our design tokens." We tell it these are the only tokens that exist:
DESIGN SYSTEM & THEMING (MANDATORY)
- Use corporate design tokens only (NO generic Tailwind colors like gray-500/blue-600).
- Always include dark mode variants (class-based: darkMode: 'class').
- Semantic tokens examples:
- Colors: bg-background-base, bg-background-surface, text-text-normal,
border-line-weak, bg-action, bg-status-error
- Spacing: p-400 (16px), m-600 (24px), gap-300 (12px)
- Typography: text-400, font-primary, font-secondary, font-bold
- Borders: rounded-m, border-s
- Transitions: duration-medium-1, ease-in-out
- Reference: src/frontend/THEME_GUIDE.md
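These rules map directly onto the Tailwind theme itself. A minimal sketch of what the token configuration might look like — the spacing scale matches the values documented above, but the CSS-variable names, radius, and duration values are assumptions for illustration:

```typescript
// tailwind.config.ts — illustrative excerpt, not our full config.
// Semantic color tokens point at CSS variables (variable names assumed);
// the spacing scale matches the documented examples (p-400 = 16px, etc.).
const config = {
  darkMode: 'class', // enables the dark: variants the agent must emit
  theme: {
    extend: {
      colors: {
        'background-base': 'var(--color-background-base)',
        'background-surface': 'var(--color-background-surface)',
        'text-normal': 'var(--color-text-normal)', // yields text-text-normal
        'line-weak': 'var(--color-line-weak)',
        action: 'var(--color-action)',
        'status-error': 'var(--color-status-error)',
      },
      spacing: {
        '300': '12px', // gap-300
        '400': '16px', // p-400
        '600': '24px', // m-600
      },
      borderRadius: { m: '8px' }, // rounded-m (value assumed)
      transitionDuration: { 'medium-1': '250ms' }, // duration-medium-1 (value assumed)
    },
  },
};

export default config;
```

Because generic colors are simply absent from the theme, `bg-gray-500` isn't just discouraged — it doesn't compile into a usable class at all.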
The word "MANDATORY" isn't decoration. The agent treats sections labeled as mandatory as hard constraints, not preferences. When it generates a card component now:
// ✅ What the custom agent produces
<div className="bg-background-surface dark:bg-dark-background-surface
p-400 rounded-m shadow-card
border border-line-weak dark:border-dark-line-weak
transition-all duration-medium-1 ease-in-out">
<h1 className="text-text-normal dark:text-dark-text-normal
text-400 font-primary font-bold">
Tenants
</h1>
</div>
Every token is from our design system. Dark mode is included. Transitions use our timing tokens. No manual corrections needed.
2. Feature-First File Structure (Encoded, Not Implied)
We explicitly map the file tree so the agent places files correctly:
FRONTEND ARCHITECTURE & CONVENTIONS
- Feature-first organization:
src/frontend/src/
features/{feature}/
api/ // Axios client functions
components/ // UI components
hooks/ // Feature hooks
pages/ // Route-level pages
components/ // Shared components
contexts/ // Auth, Theme, Tenant contexts
lib/ // Utilities
- Import alias: @/ → src/
- Naming: Components = PascalCase, Hooks = camelCase with 'use',
API files = {feature}Api.ts, Contexts = {Name}Context.tsx
When we ask the agent to build a "knowledge base management feature," it doesn't create a flat KnowledgeBase.tsx in the root. It scaffolds:
src/features/knowledgebase/
├── api/
│   └── knowledgebaseApi.ts
├── components/
│   ├── KnowledgeBaseList.tsx
│   └── CreateKnowledgeBaseDialog.tsx
├── hooks/
│   └── useKnowledgeBases.ts
├── pages/
│   └── KnowledgeBasePage.tsx
└── types/
    └── index.ts
Correct directory. Correct naming. Correct separation of concerns. Every time.
3. State Management: Pick the Right Tool Automatically
We encode our state management decision tree:
STATE & DATA
- Server state: TanStack Query (staleTime 5 min, retries: 2)
- Global auth: UserInfoProvider (contexts/AuthContext.tsx)
- Theme: ThemeProvider
- Local state: useState/useReducer (NO Redux/Zustand)
- Error handling:
- Wrap TanStack Query errors with Sonner toasts
- ErrorBoundary component with design tokens
- "Access Lost" interceptor: clear tenant, redirect, show toast
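The "Access Lost" rule is the easiest one to get wrong, so the agent file spells out the exact sequence. A dependency-free sketch of the policy — in the real app this hangs off the HTTP client's response interceptor, and the status codes and helper names here are our assumptions:

```typescript
// "Access Lost" policy: clear tenant, redirect, show toast — in that order.
// Side effects are injected so the policy itself stays unit-testable.
export interface AccessLostHandlers {
  clearTenant: () => void;
  redirectToLogin: () => void;
  showToast: (message: string) => void;
}

// Returns true when the status was handled as an access-lost event
// (401/403 is an assumption), false to let normal error handling
// (e.g. the Sonner toast wrapper) take over.
export function handleApiStatus(status: number, h: AccessLostHandlers): boolean {
  if (status === 401 || status === 403) {
    h.clearTenant();
    h.redirectToLogin();
    h.showToast('Access lost. Please sign in again.');
    return true;
  }
  return false;
}
```

Because the interceptor sits below TanStack Query, every generated hook inherits this behavior without any per-feature code.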
Now when the agent generates a data-fetching hook, it doesn't reach for useEffect + fetch or SWR. It produces exactly what our codebase expects:
export const useKnowledgeBases = () => {
const { sessionId } = useAuth();
return useQuery<KnowledgeBase[], Error>({
queryKey: ['knowledgebases'],
queryFn: () => knowledgebaseApi.getKnowledgeBases(sessionId as string),
enabled: !!sessionId,
retry: 2,
});
};
Session-aware. Query-key namespaced. Auth-gated with `enabled`. Retry count matching our standard. This is exactly what our human-written hooks look like, because the agent learned from the same conventions.
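The hook delegates to a thin module following the `{feature}Api.ts` convention. A sketch of what that companion file might contain — shown with `fetch` to keep the example dependency-free (our real modules use Axios), and the session header name is an assumption:

```typescript
// features/knowledgebase/api/knowledgebaseApi.ts — illustrative sketch.
export interface KnowledgeBase {
  id: string;
  name: string;
  status: string;
}

// URL construction is split out so it is trivially testable.
export const knowledgeBasesUrl = (base = '/api') => `${base}/knowledgebases`;

export const knowledgebaseApi = {
  async getKnowledgeBases(sessionId: string): Promise<KnowledgeBase[]> {
    const res = await fetch(knowledgeBasesUrl(), {
      headers: { 'X-Session-Id': sessionId }, // header name is an assumption
    });
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);
    // Matches the { items: [...] } shape our MSW handlers return
    const body = await res.json();
    return body.items as KnowledgeBase[];
  },
};
```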
The Secret Weapon: MSW-First Development
Here's where it gets interesting. Our agent doesn't just generate UI components; it generates the entire mock layer alongside them.
MSW-FIRST DEVELOPMENT
- Use MSW (Mock Service Worker) during UI work (dev server and tests).
- Location: src/frontend/src/mocks/
- Handlers:
- Realistic delays: 300–800ms
- Simulate ~5% errors
- Validate required fields and return error shapes consistent with backend
When we ask the agent to build a new feature, the output includes MSW handlers with realistic data:
// Generated MSW handler for knowledge bases
http.get('/api/knowledgebases', async () => {
// Simulate realistic network delay
await delay(Math.random() * 500 + 300);
// 5% error rate simulation
if (Math.random() < 0.05) {
return HttpResponse.json(
{ detail: 'Internal server error' },
{ status: 500 }
);
}
return HttpResponse.json({
items: [
{
id: 'kb-001',
name: 'Production Manual - North Plant',
type: 'S3',
status: 'ACTIVE',
documentCount: 1247,
lastSynced: '2026-04-03T14:30:00Z',
},
// ... more realistic domain-contextualized data
],
});
}),
This means the agent produces runnable features from the first prompt. No waiting for the backend team. No dummy setTimeout hacks. The UI renders with realistic data, realistic latency, and realistic error states immediately.
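Mutations get the same treatment, including the "validate required fields" rule. The validation itself is just a pure function the matching `http.post` handler can call, returning the error with an appropriate status (FastAPI validation errors use 422). A sketch — the field names are illustrative; the real ones come from the OpenAPI spec:

```typescript
// Mirrors the FastAPI-style { detail: ... } error shape used above.
export interface ValidationError {
  detail: string;
}

// Returns null when the body is valid, otherwise a backend-shaped error.
export function validateCreateKnowledgeBase(
  body: Record<string, unknown>
): ValidationError | null {
  for (const field of ['name', 'type']) {
    const value = body[field];
    if (typeof value !== 'string' || value.trim() === '') {
      return { detail: `Field '${field}' is required` };
    }
  }
  return null;
}
```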
Backend as Source of Truth: The Codegen Bridge
One of our strongest architectural decisions was making the agent aware of our OpenAPI codegen pipeline:
BACKEND AS SOURCE OF TRUTH (SPEC SYNC)
- Backend is authoritative. FastAPI + Pydantic (code-first).
- Frontend must use generated TypeScript types and API client only.
- Codegen: pnpm api:codegen
- After codegen, run git diff:
- If there is a diff, surface: "Frontend types are stale relative
to backend OpenAPI" and include diff summary.
Our codegen setup (openapi-ts.config.ts) generates types, SDK methods, and even TanStack Query hooks directly from the backend's OpenAPI spec:
// openapi-ts.config.ts
import { defineConfig } from '@hey-api/openapi-ts';
export default defineConfig({
client: '@hey-api/client-fetch',
input: 'http://localhost:8000/openapi.json',
output: { path: 'src/client', format: 'prettier' },
plugins: [
{
name: '@tanstack/react-query',
queryOptions: true,
mutationOptions: true,
},
{
name: '@hey-api/typescript',
enums: 'javascript',
},
],
});
When the agent starts a task, it checks whether the generated types are current. If they've drifted, it flags it:
⚠️ SPEC MISMATCH: Frontend types are stale.
- Missing field: `retryCount` on PromotionEvent
- New enum value: `ROLLED_BACK` in PromotionStatus
Running `pnpm api:codegen` to sync...
This prevents the classic "the UI expects a field the API doesn't send" bug that usually surfaces at 11 PM on a Friday.
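The freshness check itself is mechanical: run codegen, then see whether `git diff` reports changes under the generated-client directory. A minimal sketch under that assumption (helper names are ours; the generated-output path matches the config above):

```typescript
import { execSync } from 'node:child_process';

// Pure parsing is split out so it can be tested without git.
// Returns the generated files that changed after running codegen.
export function staleGeneratedFiles(
  diffOutput: string,
  generatedDir = 'src/client/'
): string[] {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith(generatedDir));
}

export function checkSpecSync(): void {
  execSync('pnpm api:codegen', { stdio: 'inherit' });
  const diff = execSync('git diff --name-only').toString();
  const stale = staleGeneratedFiles(diff);
  if (stale.length > 0) {
    console.error('Frontend types are stale relative to backend OpenAPI:');
    for (const f of stale) console.error(`  ${f}`);
    process.exitCode = 1;
  }
}
```

The same check runs equally well in CI, where a non-zero exit code blocks merges with stale types.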
Autonomy Levels: Controlling the Blast Radius
We don't always want the agent to write production code. Sometimes we want a plan. Sometimes a scaffold. Sometimes the full implementation.
So we built three autonomy levels:
AUTONOMY LEVELS (Default = Level 2)
- Level 1 (Plan Only): Step-by-step plan, file paths, component
  signatures. No code changes.
- Level 2 (Plan + Scaffold): Create files, stubs, routing/context
  wiring, MSW handlers. Minimal UI with tokens; TODO comments.
- Level 3 (Full Implementation): Complete feature including styling,
  tests, mocks, docs, and ready-to-run commands.
Level 1 is for architecture discussions. "How would you build a promotion approval workflow?" The agent produces a plan, lists affected files, and maps component relationships, without touching a single file.
Level 2 (the default) is our workhorse. The agent creates the file structure, wires routes and contexts, sets up MSW handlers, and builds minimal UI with correct tokens. Developers fill in the business logic.
Level 3 is for well-defined features with clear specs. The agent produces everything: components, hooks, API functions, MSW handlers, unit tests, and even the pnpm commands to verify the output.
The Agent Lifecycle: Not Just "Generate Code"
What separates this from a glorified code generator is the end-to-end lifecycle:
END-TO-END AGENT LIFECYCLE
Phase A: Plan
- Outline goals, dependencies, spec sync (codegen), and scope.
- Note any backend spec gaps (SPEC MISMATCH section).
Phase B: Implement
- Apply scaffolding/implementation per autonomy level.
- Add MSW handlers and tests.
Phase C: Validate
- Run typecheck, build, tests; verify codegen freshness.
Phase D: Deliver
- Provide diffs, test plan, run commands, and follow-up concerns.
The agent doesn't just output code and walk away. It:
- Plans: analyzing the request against the existing codebase
- Syncs: running codegen to ensure types are fresh
- Implements: generating code compliant with every convention
- Validates: running `pnpm frontend:quality` (typecheck + lint + format)
- Delivers: providing exact commands to test its output
That validation step is key. If the agent generates code with a type error, it catches it in the same session and fixes it. The developer receives working code, not a first draft.
Real Output: What It Looks Like in Practice
Here's a real interaction. We asked the agent:
"Build a deployment management page for the tenant feature. It should show a table of deployments with status badges, and a dialog to trigger new deployments."
The agent produced:
7 files created:
src/features/tenants/pages/DeploymentsPage.tsx
src/features/tenants/components/DeploymentTable.tsx
src/features/tenants/components/DeployAgentDialog.tsx
src/features/tenants/hooks/useDeployments.ts
src/features/tenants/types/deployment.ts
src/mocks/handlers/deployments.ts
src/features/tenants/components/__tests__/DeploymentTable.test.tsx
Every file followed conventions:
- Design tokens, not raw Tailwind
- Dark mode variants on every element
- TanStack Query with proper query keys
- MSW handlers with realistic delays and 5% error simulation
- Radix Dialog for the deployment trigger
- Sonner toasts for success/error feedback
- Route guard with `RequireDeveloperAccess`
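For a concrete taste, the status badge inside a table like `DeploymentTable` reduces to a token lookup. This is a hypothetical sketch — the status names beyond `ACTIVE` and the success/pending token pairings are assumptions, not our actual generated code:

```typescript
// Maps a deployment status to design-system badge classes.
// Every entry pairs a light token with its dark: variant, per the theming rules.
export type DeploymentStatus = 'ACTIVE' | 'PENDING' | 'FAILED';

const BADGE_CLASSES: Record<DeploymentStatus, string> = {
  ACTIVE: 'bg-status-success dark:bg-dark-status-success', // success token assumed
  PENDING: 'bg-background-surface dark:bg-dark-background-surface',
  FAILED: 'bg-status-error dark:bg-dark-status-error',
};

export function statusBadgeClasses(status: DeploymentStatus): string {
  // Shared shape/typography tokens plus the status-specific color pair
  return `rounded-m p-400 font-primary ${BADGE_CLASSES[status]}`;
}
```

Keeping the mapping in one typed record is also what lets the agent (and reviewers) spot a missing dark variant at a glance.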
Zero manual corrections to the design system usage. One adjustment to a business logic edge case (handling a deployment state we hadn't documented). Total time from prompt to PR-ready code: ~20 minutes including review. Previous estimate for the same feature: 1–2 days.
The Pitfalls (A.K.A. What Bit Us So It Doesn't Bite You)
1. Vague Instructions = Vague Code
Our first agent definition was 40 lines. It produced code that was "close but not quite." The spacing tokens were right but the color tokens were generic. The file structure was feature-first but the naming was inconsistent.
Fix: We expanded to 200+ lines with explicit examples, explicit anti-patterns ("NO generic Tailwind"), and references to real files in the repo. The more specific your instructions, the more accurate the output.
2. The Agent Doesn't Know What Changed Yesterday
If you add a new design token or change a convention and don't update the agent file, it'll use the old pattern. The agent definition is a living document β it needs to be maintained alongside the codebase.
Fix: We added agent definition updates to our PR checklist. Changed a convention? Update `FrontendAgent.agent.md` in the same PR.
3. MSW Handlers Can Drift from Reality
The agent generates mock handlers based on its understanding of the API. But if the real API has quirks (pagination cursors, non-standard error shapes, optional fields), the mocks might not match.
Fix: We added the SPEC MISMATCH protocol. The agent explicitly flags when it's making assumptions about the API, so developers know which mocks need validation against the real backend.
4. Over-Reliance Kills Understanding
The fastest way to create a team that doesn't understand its own codebase is to let the agent write everything without review. We use the agent as a force multiplier, not a replacement.
Fix: We default to Level 2 (scaffold), not Level 3 (full implementation). Developers fill in remaining business logic, which ensures they understand the code they're shipping.
5. Token Stuffing: There's a Context Window Limit
Our agent instructions are 200+ lines, the theme guide is another 300+, and the copilot instructions are 150+. Some LLMs struggle with this much context.
Fix: We keep the agent file focused on rules and patterns, not exhaustive token lists. The agent references THEME_GUIDE.md for the full token catalogue rather than embedding it inline.
The Numbers
Before the custom agent:
- Feature scaffolding: 4–8 hours (file creation, routing, context wiring, mock setup)
- Design system violations per PR: 3–5 (wrong tokens, missing dark mode)
- Time to first rendered component: 2–4 hours (waiting for mock data setup)
- New developer ramp-up: 2–3 weeks to internalize conventions
After the custom agent:
- Feature scaffolding: 15–30 minutes
- Design system violations per PR: 0–1
- Time to first rendered component: Under 10 minutes (MSW handlers generated alongside UI)
- New developer ramp-up: Days; they read the agent file and see the patterns
The scaffolding speedup alone is 10–15x. But the real win is consistency. Every feature looks like every other feature. Every hook follows the same pattern. Every mock handler has the same structure. The codebase feels like it was written by one very disciplined developer, not a rotating team of six.
When You Should Not Use This Pattern
- Greenfield prototypes: if you're still deciding on conventions, you don't have enough patterns to encode. The agent amplifies consistency; it can't create it from nothing.
- Small teams with one frontend developer: if one person owns the entire frontend, the conventions live in their head. The agent adds overhead without proportional benefit.
- Frequently changing architecture: if you're rewriting your state management strategy every sprint, the agent definition will always be stale. Stabilize first, then encode.
A Practical Implementation Checklist
If you want to build your own frontend agent:
- [ ] Document your design system in a machine-readable format (we use a Tailwind config + theme guide)
- [ ] Map your file structure explicitly: feature directories, naming conventions, import aliases
- [ ] Encode your state management rules: which tool for which type of state, and why
- [ ] Define your API integration pattern: codegen pipeline, client library, error handling
- [ ] Include anti-patterns: what NOT to do is as important as what to do
- [ ] Add autonomy levels: give developers control over how much the agent does
- [ ] Wire in validation: the agent should run your lint/typecheck/build as part of its output
- [ ] Reference, don't embed: point to config files rather than duplicating 360 lines of tokens
- [ ] Add a lifecycle (plan, implement, validate, deliver), not just "generate code"
- [ ] Maintain it like code: update the agent file in the same PR as convention changes
- [ ] Start with scaffold mode: let developers fill in business logic to maintain understanding
- [ ] Include MSW patterns: mock-first development is essential for frontend agent velocity
The Deeper Insight: Agents Are Architecture Documentation That Executes
The most unexpected benefit wasn't speed. It was documentation.
Our `FrontendAgent.agent.md` file is the most accurate, most up-to-date description of our frontend architecture. Not because we wrote documentation (we hate writing documentation) but because if the agent file is wrong, the generated code is wrong, and someone fixes the agent file.
It's documentation with a built-in feedback loop. When the agent produces a component with the wrong token, the developer who catches it updates the agent instructions. The next generation is correct. Over time, the agent file converges on a precise description of how the codebase actually works.
Compare that to a Confluence page that was last updated eight months ago.
What's Next: The Agent Becomes the PR Reviewer
We're exploring using the same agent instructions as a code review agent. If the agent knows every convention, it should be able to flag violations in PRs automatically:
- "This component uses `bg-gray-100`; it should be `bg-background-surface`"
- "This hook is in `src/components/`; it should be in `src/features/tenants/hooks/`"
- "Missing dark mode variant on `text-text-normal`"
- "MSW handler missing for new `/api/promotions/:id/approve` endpoint"
Same knowledge, different mode. Build in one direction, verify in the other.
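The token rule doesn't even need an LLM for a first pass; it's mechanical enough to sketch as a lint check. The pattern list below is illustrative, not our full rule set:

```typescript
// Flags generic Tailwind color utilities that should be semantic tokens.
// Matches classes like bg-gray-100, text-blue-600, bg-white.
const GENERIC_COLOR =
  /\b(?:bg|text|border)-(?:gray|slate|blue|red|green|white|black)(?:-\d{2,3})?\b/g;

export function findGenericTokens(source: string): string[] {
  return source.match(GENERIC_COLOR) ?? [];
}
```

Running this over changed files in CI catches the bulk of violations cheaply; the agent-as-reviewer handles the judgment calls the regex can't.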
Closing: The Best Frontend Engineer on Your Team Doesn't Sleep
An AI agent with the right instructions isn't a replacement for your frontend team. It's the most consistent member of your frontend team. It never forgets a dark mode variant. It never uses the wrong spacing token. It never puts a hook in the wrong directory.
But it also doesn't make product decisions. It doesn't architect from scratch. It doesn't push back on a bad spec.
The sweet spot is composing human judgment with machine consistency. You decide what to build. The agent scaffolds how β following every convention, every token, every pattern your team has established.
And when it's 4 PM on a Friday and the PM says "we need one more feature page before the demo," you can spin up a complete, design-system-compliant, dark-mode-ready, MSW-wired, type-safe scaffold in 15 minutes instead of 4 hours.
That's not magic. That's architecture, encoded.
How are you using AI agents in your frontend workflow? Are you encoding project-specific knowledge, or using generic assistants? I'd love to hear what patterns are working for teams at scale β drop a comment.
Resources
- GitHub Copilot: Custom Instructions – how to add project-specific context
- MSW: Mock Service Worker – API mocking for browser and Node.js
- Hey API: OpenAPI TypeScript Codegen – generate types and clients from OpenAPI specs
- TanStack Query: React Query – server state management
- Tailwind CSS: Design Tokens – custom theme configuration
- Radix UI: Headless Primitives – accessible UI components without default styles
About the Author
Suraj Khaitan – Gen AI Architect | Building scalable platforms and AI-augmented engineering workflows
Connect on LinkedIn | Follow for more engineering and architecture write-ups