How a custom GitHub Copilot agent with strict architectural guardrails cut feature delivery from days to hours on a multi-tenant enterprise platform
The Problem Nobody Talks About in Enterprise Frontend
Enterprise frontend development is slow. Not because developers can't write React components (they can) but because 90% of the work isn't writing code. It's alignment.
Which design tokens do I use? Where does this component go? How do I wire the API? What's the naming convention for hooks? Which state manager handles this? How do I handle dark mode? Did I forget the MSW handler for tests?
On our team building an enterprise multi-tenant GenAI platform (managing agents, tools, and knowledge bases across a large manufacturing conglomerate), the friction was even worse. We have:
- A custom corporate design system with 360+ Tailwind tokens (no generic `gray-500` allowed)
- 8 feature modules with strict feature-first architecture
- OpenAPI codegen that generates TypeScript types from a FastAPI backend
- MSW (Mock Service Worker) for development and testing
- A 7-tier RBAC system with route-level access guards
- Light/dark mode using class-based Tailwind (`dark:` variants on everything)
- i18n for English and German
Every new component is a decision tree. Every junior developer ramp-up takes weeks. Every code review catches the same "you used `bg-white` instead of `bg-background-base`" mistake.
So we did something different: we encoded our entire frontend architecture into an AI agent and let it build features for us.
TL;DR (If You Skim, Skim This)
- Problem: Enterprise frontend velocity bottlenecked by architectural complexity, design system compliance, and cross-cutting concerns (auth, theming, mocking, i18n).
- Move: Built a custom VS Code agent (`.github/agents/FrontendAgent.agent.md`) that knows our design system, file structure, state management strategy, and API codegen pipeline.
- Result: Feature scaffolding that used to take a day now takes minutes. The agent produces design-system-compliant, dark-mode-ready, MSW-wired, type-safe code on the first pass.
- Tradeoff: You need to invest upfront in writing precise agent instructions. Vague prompts produce vague code: garbage in, garbage out.
Why Not Just Use Copilot Out of the Box?
We did. Here's what vanilla Copilot (without custom instructions) gave us:
// ❌ What generic Copilot produced
<div className="bg-white dark:bg-gray-900 p-4 rounded-lg shadow-md">
<h1 className="text-gray-900 dark:text-white text-xl font-bold">
Tenants
</h1>
</div>
Every single token is wrong. `bg-white` should be `bg-background-base`. `text-gray-900` should be `text-text-normal`. `p-4` should be `p-400`. `rounded-lg` should be `rounded-m`. `font-bold` should be `font-bold font-primary`.
Multiply that across 18 shared components, 8 feature modules, and hundreds of sub-components, and you're spending more time fixing AI output than you saved generating it.
The realization: an AI assistant is only as good as its context. Generic Copilot doesn't know your design system. It doesn't know your file conventions. It doesn't know that you use TanStack Query with a 5-minute stale time and 2 retries, not SWR or Redux Toolkit Query.
So we gave it all of that context. Explicitly. In a single agent definition file.
The Architecture: A 200-Line Agent That Knows Everything
GitHub Copilot supports custom agents via markdown files in .github/agents/. Ours lives at:
.github/agents/FrontendAgent.agent.md
It's a single file that encodes every architectural decision our team has made. Think of it as a machine-readable engineering handbook: the same document that would take a new hire two weeks to internalize, distilled into structured instructions an AI can execute against.
Here's how we structured it:
1. Design System as Code (Not Suggestions)
We don't tell the agent "try to use our design tokens." We tell it these are the only tokens that exist:
DESIGN SYSTEM & THEMING (MANDATORY)
- Use corporate design tokens only (NO generic Tailwind colors like gray-500/blue-600).
- Always include dark mode variants (class-based: darkMode: 'class').
- Semantic tokens examples:
- Colors: bg-background-base, bg-background-surface, text-text-normal,
border-line-weak, bg-action, bg-status-error
- Spacing: p-400 (16px), m-600 (24px), gap-300 (12px)
- Typography: text-400, font-primary, font-secondary, font-bold
- Borders: rounded-m, border-s
- Transitions: duration-medium-1, ease-in-out
- Reference: src/frontend/THEME_GUIDE.md
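These rules map directly onto the Tailwind theme itself. A minimal sketch of what the token configuration might look like — the spacing scale matches the values documented above, but the CSS-variable names, radius, and duration values are assumptions for illustration:

```typescript
// tailwind.config.ts — illustrative excerpt, not our full config.
// Semantic color tokens point at CSS variables (variable names assumed);
// the spacing scale matches the documented examples (p-400 = 16px, etc.).
const config = {
  darkMode: 'class', // enables the dark: variants the agent must emit
  theme: {
    extend: {
      colors: {
        'background-base': 'var(--color-background-base)',
        'background-surface': 'var(--color-background-surface)',
        'text-normal': 'var(--color-text-normal)', // yields text-text-normal
        'line-weak': 'var(--color-line-weak)',
        action: 'var(--color-action)',
        'status-error': 'var(--color-status-error)',
      },
      spacing: {
        '300': '12px', // gap-300
        '400': '16px', // p-400
        '600': '24px', // m-600
      },
      borderRadius: { m: '8px' }, // rounded-m (value assumed)
      transitionDuration: { 'medium-1': '250ms' }, // duration-medium-1 (value assumed)
    },
  },
};

export default config;
```

Because generic colors are simply absent from the theme, `bg-gray-500` isn't just discouraged — it doesn't compile into a usable class at all.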
The word "MANDATORY" isn't decoration. The agent treats sections labeled as mandatory as hard constraints, not preferences. When it generates a card component now:
// ✅ What the custom agent produces
<div className="bg-background-surface dark:bg-dark-background-surface
p-400 rounded-m shadow-card
border border-line-weak dark:border-dark-line-weak
transition-all duration-medium-1 ease-in-out">
<h1 className="text-text-normal dark:text-dark-text-normal
text-400 font-primary font-bold">
Tenants
</h1>
</div>
Every token is from our design system. Dark mode is included. Transitions use our timing tokens. No manual corrections needed.
2. Feature-First File Structure (Encoded, Not Implied)
We explicitly map the file tree so the agent places files correctly:
FRONTEND ARCHITECTURE & CONVENTIONS
- Feature-first organization:
src/frontend/src/
features/{feature}/
api/ // Axios client functions
components/ // UI components
hooks/ // Feature hooks
pages/ // Route-level pages
components/ // Shared components
contexts/ // Auth, Theme, Tenant contexts
lib/ // Utilities
- Import alias: @/ → src/
- Naming: Components = PascalCase, Hooks = camelCase with 'use',
API files = {feature}Api.ts, Contexts = {Name}Context.tsx
When we ask the agent to build a "knowledge base management feature," it doesn't create a flat KnowledgeBase.tsx in the root. It scaffolds:
src/features/knowledgebase/
├── api/
│   └── knowledgebaseApi.ts
├── components/
│   ├── KnowledgeBaseList.tsx
│   └── CreateKnowledgeBaseDialog.tsx
├── hooks/
│   └── useKnowledgeBases.ts
├── pages/
│   └── KnowledgeBasePage.tsx
└── types/
    └── index.ts
Correct directory. Correct naming. Correct separation of concerns. Every time.
3. State Management: Pick the Right Tool Automatically
We encode our state management decision tree:
STATE & DATA
- Server state: TanStack Query (staleTime 5 min, retries: 2)
- Global auth: UserInfoProvider (contexts/AuthContext.tsx)
- Theme: ThemeProvider
- Local state: useState/useReducer (NO Redux/Zustand)
- Error handling:
- Wrap TanStack Query errors with Sonner toasts
- ErrorBoundary component with design tokens
- "Access Lost" interceptor: clear tenant, redirect, show toast
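The "Access Lost" rule is the easiest one to get wrong, so the agent file spells out the exact sequence. A dependency-free sketch of the policy — in the real app this hangs off the HTTP client's response interceptor, and the status codes and helper names here are our assumptions:

```typescript
// "Access Lost" policy: clear tenant, redirect, show toast — in that order.
// Side effects are injected so the policy itself stays unit-testable.
export interface AccessLostHandlers {
  clearTenant: () => void;
  redirectToLogin: () => void;
  showToast: (message: string) => void;
}

// Returns true when the status was handled as an access-lost event
// (401/403 is an assumption), false to let normal error handling
// (e.g. the Sonner toast wrapper) take over.
export function handleApiStatus(status: number, h: AccessLostHandlers): boolean {
  if (status === 401 || status === 403) {
    h.clearTenant();
    h.redirectToLogin();
    h.showToast('Access lost. Please sign in again.');
    return true;
  }
  return false;
}
```

Because the interceptor sits below TanStack Query, every generated hook inherits this behavior without any per-feature code.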
Now when the agent generates a data-fetching hook, it doesn't reach for useEffect + fetch or SWR. It produces exactly what our codebase expects:
export const useKnowledgeBases = () => {
const { sessionId } = useAuth();
return useQuery<KnowledgeBase[], Error>({
queryKey: ['knowledgebases'],
queryFn: () => knowledgebaseApi.getKnowledgeBases(sessionId as string),
enabled: !!sessionId,
retry: 2,
});
};
Session-aware. Query-key namespaced. Auth-gated with `enabled`. Retry count matching our standard. This is exactly what our human-written hooks look like, because the agent learned from the same conventions.
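The hook delegates to a thin module following the `{feature}Api.ts` convention. A sketch of what that companion file might contain — shown with `fetch` to keep the example dependency-free (our real modules use Axios), and the session header name is an assumption:

```typescript
// features/knowledgebase/api/knowledgebaseApi.ts — illustrative sketch.
export interface KnowledgeBase {
  id: string;
  name: string;
  status: string;
}

// URL construction is split out so it is trivially testable.
export const knowledgeBasesUrl = (base = '/api') => `${base}/knowledgebases`;

export const knowledgebaseApi = {
  async getKnowledgeBases(sessionId: string): Promise<KnowledgeBase[]> {
    const res = await fetch(knowledgeBasesUrl(), {
      headers: { 'X-Session-Id': sessionId }, // header name is an assumption
    });
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);
    // Matches the { items: [...] } shape our MSW handlers return
    const body = await res.json();
    return body.items as KnowledgeBase[];
  },
};
```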
The Secret Weapon: MSW-First Development
Here's where it gets interesting. Our agent doesn't just generate UI components; it generates the entire mock layer alongside them.
MSW-FIRST DEVELOPMENT
- Use MSW (Mock Service Worker) during UI work (dev server and tests).
- Location: src/frontend/src/mocks/
- Handlers:
- Realistic delays: 300–800ms
- Simulate ~5% errors
- Validate required fields and return error shapes consistent with backend
When we ask the agent to build a new feature, the output includes MSW handlers with realistic data:
// Generated MSW handler for knowledge bases
http.get('/api/knowledgebases', async () => {
// Simulate realistic network delay
await delay(Math.random() * 500 + 300);
// 5% error rate simulation
if (Math.random() < 0.05) {
return HttpResponse.json(
{ detail: 'Internal server error' },
{ status: 500 }
);
}
return HttpResponse.json({
items: [
{
id: 'kb-001',
name: 'Production Manual - North Plant',
type: 'S3',
status: 'ACTIVE',
documentCount: 1247,
lastSynced: '2026-04-03T14:30:00Z',
},
// ... more realistic domain-contextualized data
],
});
}),
This means the agent produces runnable features from the first prompt. No waiting for the backend team. No dummy setTimeout hacks. The UI renders with realistic data, realistic latency, and realistic error states immediately.
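Mutations get the same treatment, including the "validate required fields" rule. The validation itself is just a pure function the matching `http.post` handler can call, returning the error with an appropriate status (FastAPI validation errors use 422). A sketch — the field names are illustrative; the real ones come from the OpenAPI spec:

```typescript
// Mirrors the FastAPI-style { detail: ... } error shape used above.
export interface ValidationError {
  detail: string;
}

// Returns null when the body is valid, otherwise a backend-shaped error.
export function validateCreateKnowledgeBase(
  body: Record<string, unknown>
): ValidationError | null {
  for (const field of ['name', 'type']) {
    const value = body[field];
    if (typeof value !== 'string' || value.trim() === '') {
      return { detail: `Field '${field}' is required` };
    }
  }
  return null;
}
```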
Backend as Source of Truth: The Codegen Bridge
One of our strongest architectural decisions was making the agent aware of our OpenAPI codegen pipeline:
BACKEND AS SOURCE OF TRUTH (SPEC SYNC)
- Backend is authoritative. FastAPI + Pydantic (code-first).
- Frontend must use generated TypeScript types and API client only.
- Codegen: pnpm api:codegen
- After codegen, run git diff:
- If there is a diff, surface: "Frontend types are stale relative
to backend OpenAPI" and include diff summary.
Our codegen setup (openapi-ts.config.ts) generates types, SDK methods, and even TanStack Query hooks directly from the backend's OpenAPI spec:
// openapi-ts.config.ts
import { defineConfig } from '@hey-api/openapi-ts';
export default defineConfig({
client: '@hey-api/client-fetch',
input: 'http://localhost:8000/openapi.json',
output: { path: 'src/client', format: 'prettier' },
plugins: [
{
name: '@tanstack/react-query',
queryOptions: true,
mutationOptions: true,
},
{
name: '@hey-api/typescript',
enums: 'javascript',
},
],
});
When the agent starts a task, it checks whether the generated types are current. If they've drifted, it flags it:
⚠️ SPEC MISMATCH: Frontend types are stale.
- Missing field: `retryCount` on PromotionEvent
- New enum value: `ROLLED_BACK` in PromotionStatus
Running `pnpm api:codegen` to sync...
This prevents the classic "the UI expects a field the API doesn't send" bug that usually surfaces at 11 PM on a Friday.
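The freshness check itself is mechanical: run codegen, then see whether `git diff` reports changes under the generated-client directory. A minimal sketch under that assumption (helper names are ours; the generated-output path matches the config above):

```typescript
import { execSync } from 'node:child_process';

// Pure parsing is split out so it can be tested without git.
// Returns the generated files that changed after running codegen.
export function staleGeneratedFiles(
  diffOutput: string,
  generatedDir = 'src/client/'
): string[] {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith(generatedDir));
}

export function checkSpecSync(): void {
  execSync('pnpm api:codegen', { stdio: 'inherit' });
  const diff = execSync('git diff --name-only').toString();
  const stale = staleGeneratedFiles(diff);
  if (stale.length > 0) {
    console.error('Frontend types are stale relative to backend OpenAPI:');
    for (const f of stale) console.error(`  ${f}`);
    process.exitCode = 1;
  }
}
```

The same check runs equally well in CI, where a non-zero exit code blocks merges with stale types.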
Autonomy Levels: Controlling the Blast Radius
We don't always want the agent to write production code. Sometimes we want a plan. Sometimes a scaffold. Sometimes the full implementation.
So we built three autonomy levels:
AUTONOMY LEVELS (Default = Level 2)
- Level 1 (Plan Only): Step-by-step plan, file paths, component
  signatures. No code changes.
- Level 2 (Plan + Scaffold): Create files, stubs, routing/context
  wiring, MSW handlers. Minimal UI with tokens; TODO comments.
- Level 3 (Full Implementation): Complete feature including styling,
  tests, mocks, docs, and ready-to-run commands.
Level 1 is for architecture discussions. "How would you build a promotion approval workflow?" The agent produces a plan, lists affected files, and maps component relationships, without touching a single file.
Level 2 (the default) is our workhorse. The agent creates the file structure, wires routes and contexts, sets up MSW handlers, and builds minimal UI with correct tokens. Developers fill in the business logic.
Level 3 is for well-defined features with clear specs. The agent produces everything: components, hooks, API functions, MSW handlers, unit tests, and even the pnpm commands to verify the output.
The Agent Lifecycle: Not Just "Generate Code"
What separates this from a glorified code generator is the end-to-end lifecycle:
END-TO-END AGENT LIFECYCLE
Phase A: Plan
- Outline goals, dependencies, spec sync (codegen), and scope.
- Note any backend spec gaps (SPEC MISMATCH section).
Phase B: Implement
- Apply scaffolding/implementation per autonomy level.
- Add MSW handlers and tests.
Phase C: Validate
- Run typecheck, build, tests; verify codegen freshness.
Phase D: Deliver
- Provide diffs, test plan, run commands, and follow-up concerns.
The agent doesn't just output code and walk away. It:
- Plans: analyzing the request against the existing codebase
- Syncs: running codegen to ensure types are fresh
- Implements: generating code compliant with every convention
- Validates: running `pnpm frontend:quality` (typecheck + lint + format)
- Delivers: providing exact commands to test its output
That validation step is key. If the agent generates code with a type error, it catches it in the same session and fixes it. The developer receives working code, not a first draft.
Real Output: What It Looks Like in Practice
Here's a real interaction. We asked the agent:
"Build a deployment management page for the tenant feature. It should show a table of deployments with status badges, and a dialog to trigger new deployments."
The agent produced:
7 files created:
src/features/tenants/pages/DeploymentsPage.tsx
src/features/tenants/components/DeploymentTable.tsx
src/features/tenants/components/DeployAgentDialog.tsx
src/features/tenants/hooks/useDeployments.ts
src/features/tenants/types/deployment.ts
src/mocks/handlers/deployments.ts
src/features/tenants/components/__tests__/DeploymentTable.test.tsx
Every file followed conventions:
- Design tokens, not raw Tailwind
- Dark mode variants on every element
- TanStack Query with proper query keys
- MSW handlers with realistic delays and 5% error simulation
- Radix Dialog for the deployment trigger
- Sonner toasts for success/error feedback
- Route guard with `RequireDeveloperAccess`
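For a concrete taste, the status badge inside a table like `DeploymentTable` reduces to a token lookup. This is a hypothetical sketch — the status names beyond `ACTIVE` and the success/pending token pairings are assumptions, not our actual generated code:

```typescript
// Maps a deployment status to design-system badge classes.
// Every entry pairs a light token with its dark: variant, per the theming rules.
export type DeploymentStatus = 'ACTIVE' | 'PENDING' | 'FAILED';

const BADGE_CLASSES: Record<DeploymentStatus, string> = {
  ACTIVE: 'bg-status-success dark:bg-dark-status-success', // success token assumed
  PENDING: 'bg-background-surface dark:bg-dark-background-surface',
  FAILED: 'bg-status-error dark:bg-dark-status-error',
};

export function statusBadgeClasses(status: DeploymentStatus): string {
  // Shared shape/typography tokens plus the status-specific color pair
  return `rounded-m p-400 font-primary ${BADGE_CLASSES[status]}`;
}
```

Keeping the mapping in one typed record is also what lets the agent (and reviewers) spot a missing dark variant at a glance.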
Zero manual corrections to the design system usage. One adjustment to a business logic edge case (handling a deployment state we hadn't documented). Total time from prompt to PR-ready code: ~20 minutes including review. Previous estimate for the same feature: 1–2 days.
The Pitfalls (A.K.A. What Bit Us So It Doesn't Bite You)
1. Vague Instructions = Vague Code
Our first agent definition was 40 lines. It produced code that was "close but not quite." The spacing tokens were right but the color tokens were generic. The file structure was feature-first but the naming was inconsistent.
Fix: We expanded to 200+ lines with explicit examples, explicit anti-patterns ("NO generic Tailwind"), and references to real files in the repo. The more specific your instructions, the more accurate the output.
2. The Agent Doesn't Know What Changed Yesterday
If you add a new design token or change a convention and don't update the agent file, it'll use the old pattern. The agent definition is a living document β it needs to be maintained alongside the codebase.
Fix: We added agent definition updates to our PR checklist. Changed a convention? Update `FrontendAgent.agent.md` in the same PR.
3. MSW Handlers Can Drift from Reality
The agent generates mock handlers based on its understanding of the API. But if the real API has quirks (pagination cursors, non-standard error shapes, optional fields), the mocks might not match.
Fix: We added the SPEC MISMATCH protocol. The agent explicitly flags when it's making assumptions about the API, so developers know which mocks need validation against the real backend.
4. Over-Reliance Kills Understanding
The fastest way to create a team that doesn't understand its own codebase is to let the agent write everything without review. We use the agent as a force multiplier, not a replacement.
Fix: We default to Level 2 (scaffold), not Level 3 (full implementation). Developers fill in remaining business logic, which ensures they understand the code they're shipping.
5. Token Stuffing: There's a Context Window Limit
Our agent instructions are 200+ lines, the theme guide is another 300+, and the copilot instructions are 150+. Some LLMs struggle with this much context.
Fix: We keep the agent file focused on rules and patterns, not exhaustive token lists. The agent references THEME_GUIDE.md for the full token catalogue rather than embedding it inline.
The Numbers
Before the custom agent:
- Feature scaffolding: 4–8 hours (file creation, routing, context wiring, mock setup)
- Design system violations per PR: 3–5 (wrong tokens, missing dark mode)
- Time to first rendered component: 2–4 hours (waiting for mock data setup)
- New developer ramp-up: 2–3 weeks to internalize conventions
After the custom agent:
- Feature scaffolding: 15–30 minutes
- Design system violations per PR: 0–1
- Time to first rendered component: Under 10 minutes (MSW handlers generated alongside UI)
- New developer ramp-up: Days; they read the agent file and see the patterns
The scaffolding speedup alone is 10–15x. But the real win is consistency. Every feature looks like every other feature. Every hook follows the same pattern. Every mock handler has the same structure. The codebase feels like it was written by one very disciplined developer, not a rotating team of six.
When You Should Not Use This Pattern
- Greenfield prototypes: if you're still deciding on conventions, you don't have enough patterns to encode. The agent amplifies consistency; it can't create it from nothing.
- Small teams with one frontend developer: if one person owns the entire frontend, the conventions live in their head. The agent adds overhead without proportional benefit.
- Frequently changing architecture: if you're rewriting your state management strategy every sprint, the agent definition will always be stale. Stabilize first, then encode.
A Practical Implementation Checklist
If you want to build your own frontend agent:
- [ ] Document your design system in a machine-readable format (we use a Tailwind config + theme guide)
- [ ] Map your file structure explicitly: feature directories, naming conventions, import aliases
- [ ] Encode your state management rules: which tool for which type of state, and why
- [ ] Define your API integration pattern: codegen pipeline, client library, error handling
- [ ] Include anti-patterns: what NOT to do is as important as what to do
- [ ] Add autonomy levels: give developers control over how much the agent does
- [ ] Wire in validation: the agent should run your lint/typecheck/build as part of its output
- [ ] Reference, don't embed: point to config files rather than duplicating 360 lines of tokens
- [ ] Add a lifecycle (plan, implement, validate, deliver), not just "generate code"
- [ ] Maintain it like code: update the agent file in the same PR as convention changes
- [ ] Start with scaffold mode: let developers fill in business logic to maintain understanding
- [ ] Include MSW patterns: mock-first development is essential for frontend agent velocity
The Deeper Insight: Agents Are Architecture Documentation That Executes
The most unexpected benefit wasn't speed. It was documentation.
Our `FrontendAgent.agent.md` file is the most accurate, most up-to-date description of our frontend architecture. Not because we wrote documentation (we hate writing documentation) but because if the agent file is wrong, the generated code is wrong, and someone fixes the agent file.
It's documentation with a built-in feedback loop. When the agent produces a component with the wrong token, the developer who catches it updates the agent instructions. The next generation is correct. Over time, the agent file converges on a precise description of how the codebase actually works.
Compare that to a Confluence page that was last updated eight months ago.
What's Next: The Agent Becomes the PR Reviewer
We're exploring using the same agent instructions as a code review agent. If the agent knows every convention, it should be able to flag violations in PRs automatically:
- "This component uses `bg-gray-100`; it should be `bg-background-surface`"
- "This hook is in `src/components/`; it should be in `src/features/tenants/hooks/`"
- "Missing dark mode variant on `text-text-normal`"
- "MSW handler missing for new `/api/promotions/:id/approve` endpoint"
Same knowledge, different mode. Build in one direction, verify in the other.
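The token rule doesn't even need an LLM for a first pass; it's mechanical enough to sketch as a lint check. The pattern list below is illustrative, not our full rule set:

```typescript
// Flags generic Tailwind color utilities that should be semantic tokens.
// Matches classes like bg-gray-100, text-blue-600, bg-white.
const GENERIC_COLOR =
  /\b(?:bg|text|border)-(?:gray|slate|blue|red|green|white|black)(?:-\d{2,3})?\b/g;

export function findGenericTokens(source: string): string[] {
  return source.match(GENERIC_COLOR) ?? [];
}
```

Running this over changed files in CI catches the bulk of violations cheaply; the agent-as-reviewer handles the judgment calls the regex can't.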
Closing: The Best Frontend Engineer on Your Team Doesn't Sleep
An AI agent with the right instructions isn't a replacement for your frontend team. It's the most consistent member of your frontend team. It never forgets a dark mode variant. It never uses the wrong spacing token. It never puts a hook in the wrong directory.
But it also doesn't make product decisions. It doesn't architect from scratch. It doesn't push back on a bad spec.
The sweet spot is composing human judgment with machine consistency. You decide what to build. The agent scaffolds how β following every convention, every token, every pattern your team has established.
And when it's 4 PM on a Friday and the PM says "we need one more feature page before the demo," you can spin up a complete, design-system-compliant, dark-mode-ready, MSW-wired, type-safe scaffold in 15 minutes instead of 4 hours.
That's not magic. That's architecture, encoded.
How are you using AI agents in your frontend workflow? Are you encoding project-specific knowledge, or using generic assistants? I'd love to hear what patterns are working for teams at scale β drop a comment.
Resources
- GitHub Copilot: Custom Instructions – how to add project-specific context
- MSW: Mock Service Worker – API mocking for browser and Node.js
- Hey API: OpenAPI TypeScript Codegen – generate types and clients from OpenAPI specs
- TanStack Query: React Query – server state management
- Tailwind CSS: Design Tokens – custom theme configuration
- Radix UI: Headless Primitives – accessible UI components without default styles
About the Author
Suraj Khaitan – Gen AI Architect | Building scalable platforms and AI-augmented engineering workflows
Connect on LinkedIn | Follow for more engineering and architecture write-ups