What if Claude Code agents could configure Claude Code infrastructure for any project -- automatically? We built exactly that: a 12-step pipeline where AI agents analyze a codebase and generate complete agent teams, hooks, skills, and slash commands in 30-55 minutes. Across three production migrations, the second project was more complex than the first yet finished in fewer sessions.
The Problem
Claude Code ships with powerful infrastructure: agent definitions, hooks, skills, slash commands, and settings. Most developers use none of it. Configuring a proper Claude Code project takes a full day for an expert. Most people write a basic CLAUDE.md and stop -- getting maybe 20% of Claude Code's potential.
Migration is even harder. An existing codebase has established patterns, implicit conventions, and domain knowledge buried in code that needs to be extracted into Claude Code infrastructure. We asked: what if Claude Code agents could do this work themselves?
What We Built
A meta-framework: Claude Code agents that generate Claude Code agent infrastructure. Point it at an existing codebase (migration mode) or give it a plain-English project description (greenfield mode), and it produces a complete .claude/ configuration tailored to that specific project in 30-55 minutes.
The framework contains no project-specific code. It contains knowledge about how to build Claude Code configurations: 17 reusable skills, 12 slash commands, 17 hook templates, and over 1,000 lines of methodology refined through real production use.
The Three-Folder Architecture: The framework reads the source project but never modifies it. All generated infrastructure lands in a fresh target project. This READ-ONLY invariant held across 18 sessions and was never violated.
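A hook is how the READ-ONLY invariant can be enforced deterministically rather than by prompt. The sketch below shows the shape of such a check, assuming a PreToolUse-style hook that receives the tool name and input; the path and tool names are illustrative, not the framework's actual configuration.

```python
"""Sketch of a read-only guard for the source project. In a real
Claude Code PreToolUse hook, the event arrives as JSON on stdin and a
blocking exit code stops the tool call; here we show only the decision
logic. SOURCE_PROJECT and the tool names are assumptions."""
import os

SOURCE_PROJECT = "/projects/source-app"        # hypothetical source path
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}   # tools that can modify files

def should_block(event: dict) -> bool:
    """Return True when a write tool targets the read-only source project."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return False                           # reads are always allowed
    target = event.get("tool_input", {}).get("file_path", "")
    # Normalize so ../ tricks cannot escape the check
    root = os.path.normpath(SOURCE_PROJECT) + os.sep
    return os.path.normpath(target).startswith(root)
```

Because the check runs on every tool call, the invariant holds even if an agent's prompt context has long since scrolled past the instruction.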
Before and After
| Before | After |
|---|---|
| Full day of expert configuration per project | 30-55 minute automated pipeline |
| Zero security infrastructure on most projects | Full OWASP Top 10 for Agentic Applications coverage |
| No domain knowledge retention between sessions | Skills provide 140x token efficiency via progressive disclosure |
| No quality enforcement beyond "remember to lint" | Hooks enforce linting, testing, and security on every tool call |
| Each project starts from scratch | Each migration makes the framework smarter |
Key Results
| Metric | Value |
|---|---|
| Production migrations validated | 3 (textToSql-metabase, obsidian-youtube-agent, dotzlaw.com) |
| Hook templates | 17 covering safety, quality, and security |
| Reusable skills | 17 with progressive disclosure architecture |
| Pipeline steps | 12 with parallel execution paths |
| OWASP coverage | 10/10 items addressed |
| Validation checks | 50+ structural and coherence checks before delivery |
| File conflicts across 18 sessions | 0 thanks to agent ownership boundaries |
Deep Dive: Compound Returns Across Three Migrations
The framework's core thesis: each migration makes the next one faster, even when the next project is more complex.
Migration 1: textToSql-metabase -- A text-to-SQL dashboard (FastAPI, React, Metabase, Qdrant, MS SQL Server). 45 Python files ported across 10 sessions. 168 print() statements eliminated. 223 unit tests created from zero. 7 anti-pattern categories fixed during migration. The framework itself was built during this migration -- 3 sessions just for framework knowledge base construction.
Migration 2: obsidian-youtube-agent -- A YouTube-to-Obsidian AI pipeline (FastAPI, React, PostgreSQL, Qdrant, Anthropic Claude). 67 Python files -- more complex than Migration 1. Completed in 8 sessions, not 10. The framework build phase (3 sessions in Migration 1) dropped to zero on reuse. The most dramatic change: the Anthropic Batch API (4+ hour waits, opaque failures) was replaced entirely with asyncio.TaskGroup parallel processing -- seconds per video instead of hours per batch.
Migration 3: dotzlaw.com -- A WordPress-to-Astro migration. 41 articles extracted from a SQL backup file (no live admin access), 187 images redistributed from WordPress's flat upload structure to per-article co-located folders, and a design-matched dark theme rebuilt from scratch. The framework contributed methodology and skills, but the bulk of the work was content transformation and visual design -- domains the framework guides rather than automates.
Compound Returns: Migration 2 was more complex (67 files vs 45, AI/ML integration, full architectural redesign) but completed in fewer sessions. The 3-session framework investment from Migration 1 paid for itself immediately and continues paying on every subsequent project.
Deep Dive: Defense-in-Depth Security
After two production migrations, a security audit against the OWASP Top 10 for Agentic Applications found 11 concrete gaps -- not theoretical risks, but specific vulnerabilities with defined attack paths. We closed all 11 across 14 tasks in 4 phases.
The security architecture uses four concentric defense rings:
- Ring 1 (Per-call): Input sanitization (22 patterns), security scan (17 patterns, two-tier enforcement), rate limiting (per-tool thresholds), artifact validation (JSON Schema), audit logging (JSONL metadata-only)
- Ring 2 (Trajectory): Heartbeat checkpoint every 25 calls detecting 5 anomaly patterns, watchdog timers per pipeline step, optional trajectory analysis agent
- Ring 3 (Structural): File ownership boundaries, tool restrictions, 72 blocked commands, three-folder architecture
- Ring 4 (Session): Pre-commit secrets scanning, 5 hygiene checks, stop hooks, security review step
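Ring 1's two-tier security scan can be illustrated with a minimal pattern matcher. The two pattern lists below are invented examples, far shorter than the framework's 17-pattern set; the point is the block/warn split, where hard-stop patterns reject the call outright and softer ones are logged but allowed.

```python
"""Minimal sketch of a Ring 1 per-call security scan with two-tier
enforcement. Pattern lists are illustrative placeholders."""
import re

BLOCK_PATTERNS = [r"rm\s+-rf\s+/", r"curl[^|]*\|\s*(ba)?sh"]  # hard stop
WARN_PATTERNS = [r"chmod\s+777", r"--no-verify"]              # log, allow

def scan(command: str) -> str:
    """Classify a proposed shell command as 'block', 'warn', or 'ok'."""
    if any(re.search(p, command) for p in BLOCK_PATTERNS):
        return "block"
    if any(re.search(p, command) for p in WARN_PATTERNS):
        return "warn"
    return "ok"
```

In a real hook the "block" branch would exit with a blocking status and the "warn" branch would append a JSONL audit record before letting the call through.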
Per-archetype security patterns cover all 7 project types: Python FastAPI, React Vite, SSG/Astro, Node.js Express, AI/ML, Fullstack, and CLI tools. Each archetype gets security hooks tailored to its specific threat surface.
Defense in Depth: Four concentric rings protect the pipeline. Each ring catches what the others miss. The architecture operates across 4 timescales -- from sub-millisecond per-call hooks to session-level pre-commit scans.
Lessons Learned
The highest-leverage work is improving the framework itself. Every capability added benefits every future project. The cost is paid once; the return compounds indefinitely.
Migration is an opportunity to fix architecture, not just port code. When a component is demonstrably failing, redesign it during migration rather than porting the failure and planning a future rewrite that never happens.
Hooks are the only deterministic control in a probabilistic system. Prompt instructions achieve ~90% compliance. Hooks achieve 100%. For security-critical behavior, "usually works" is not acceptable.
Information asymmetry must be enforced by architecture, not by prompts. If you tell an agent "don't look at another agent's files," it eventually will. If a hook blocks the file read, it cannot.
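Architectural enforcement of information asymmetry can be as simple as an ownership map consulted on every read. The agent names and path globs below are invented for illustration; the framework's real ownership boundaries are project-specific.

```python
"""Sketch of file ownership boundaries enforced per tool call: each
agent may only read paths matching its own globs. Names and globs are
hypothetical examples."""
from fnmatch import fnmatch

OWNERSHIP = {
    "backend-agent": ["src/api/*", "tests/api/*"],
    "frontend-agent": ["src/ui/*", "tests/ui/*"],
}

def may_read(agent: str, path: str) -> bool:
    """An agent with no matching glob is denied by default."""
    return any(fnmatch(path, pat) for pat in OWNERSHIP.get(agent, []))
```

A prompt asking agents to respect these boundaries degrades over a long session; a hook calling `may_read` on every access does not.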
Honesty builds more credibility than perfection. We found 11 security gaps in our own production framework. Publishing the gaps and the fixes earned more trust than claiming it was secure from the start.
Read the Full Series
This cross-post covers the highlights. The full 4-part article series goes deep on architecture, self-improvement, security hardening, and a real WordPress-to-Astro migration case study.
- Part 1: An Agent Swarm That Builds Agent Swarms -- Two production migrations prove the concept
- Part 2: From Prototype to Platform -- The framework improves itself using its own methodology
- Part 3: Securing Agentic AI -- 11 gaps found, 11 gaps closed, 10/10 OWASP coverage
- Part 4: WordPress to Astro -- The third migration and an honest assessment of what worked
Built by Gary, Katrina, and Ryan Dotzlaw