DEV Community

Satori Geeks

Posted on • Originally published at paragraph.com

Seven Agents Before a Line of Code: The Week 0 Agentic Planning Pipeline

Before writing a single line of contract code, I had seven agent specifications. Each one had a defined scope, a list of inputs it consumes, specific outputs it produces, and an explicit list of things it does not do.

That last part matters. An agent with clear boundaries is more useful than one that tries to cover everything.

This is what Week 0 actually was: not "I planned the project," but a structured pipeline of specialist AI agents — each given a specific brief and a bounded scope — coordinated by me as the Orchestrator. Most run on Claude. One deliberately doesn't. The pipeline produced concrete artefacts. Those artefacts feed the next agent. Nothing advances until the handoff is ready.

The agents are specialists, not generalists

The pipeline that runs every week of the build is: Research → Dev → Security → Deploy → QA → Copywriter → Distribution. Those are hard sequential dependencies. Before Week 1, I needed to establish the infrastructure every agent in that chain would rely on.

A few examples of how specific the scope definitions get, from AGENTS.md:

The Dev Agent "takes the research doc and the agreed contract interface and produces working code." It does NOT write the retrospective or the blog post. The devlog it produces is raw material, not polished prose — that's explicitly stated.

The Security Agent "reviews the smart contract for vulnerabilities before it is deployed with real funds." It does NOT rewrite or fix the contract. If it finds a blocker, it raises the issue to the Dev Agent. It also runs on a different model than the Dev Agent — deliberately, and for a specific reason I'll get to.

The Copywriter Agent "turns raw devlogs and research notes into the weekly blog post." It does NOT execute distribution — that's a separate agent with its own scope.

This isn't bureaucracy for its own sake. It's the difference between a pipeline that runs cleanly and one where an agent quietly absorbs work it shouldn't be doing, or where a step gets skipped because the previous agent assumed the next one would handle it.

What Week 0 produced

The pipeline generated seven concrete artefacts before any testnet was touched:

  • Canonical contract interface. Three functions: sendSupport, getMessages, withdraw. Two events: SupportSent, Withdrawn. OpenZeppelin Ownable and ReentrancyGuard. Named constants for name and message length limits. Chain-agnostic — valid for any week's implementation on any chain.

  • Scoring rubric. Eight dimensions with individual weights: Developer Tooling and Contract Authoring both at ×2, Frontend/Wallet Integration at ×2, Documentation and Deployment Experience at ×1.5, and Getting Started, Transaction Cost, and Community each at ×1. Maximum weighted score: 60 points. The Research Agent fills in estimated scores before each week; the Copywriter fills in actuals after. The delta between estimate and reality is part of the story.

  • Security checklist. Four review sections — access control, reentrancy, input validation, state and event integrity — plus chain-specific addenda for standard EVM, ZK-EVM, and non-EVM chains. Non-negotiable gate before mainnet deployment.

  • QA checklist. Seven verification checks against the live deployment: message wall loads, test transaction goes through end-to-end, block explorer confirms, contract is verified and readable, UI shows the correct chain badge, withdraw() is callable by owner, mobile layout is usable. Issues block the Copywriter from starting.

  • Tone guide. Voice principles, a "What to Avoid" table, format rules per content type and platform.

  • Distribution strategy. Platform priority order, format requirements per platform, canonical URL discipline (every dev.to cross-post sets canonical_url to the Paragraph post).

  • Series intro post. This is what readers encounter before Week 1.
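
The rubric arithmetic above can be checked mechanically. Here is a minimal TypeScript sketch of the rubric as data: the dimension names and weights come from the rubric itself, while the 0–5 scale per dimension is inferred from the stated 60-point maximum (the weights sum to 12, and 12 × 5 = 60). The scoring helper is mine, not part of the spec:

```typescript
// Sketch of the Week 0 scoring rubric as data. Dimension names and
// weights come from the post; the 0–5 per-dimension scale is inferred
// from the stated 60-point maximum (sum of weights = 12, and 12 × 5 = 60).
type Dimension = { name: string; weight: number };

const rubric: Dimension[] = [
  { name: "Developer Tooling", weight: 2 },
  { name: "Contract Authoring", weight: 2 },
  { name: "Frontend/Wallet Integration", weight: 2 },
  { name: "Documentation", weight: 1.5 },
  { name: "Deployment Experience", weight: 1.5 },
  { name: "Getting Started", weight: 1 },
  { name: "Transaction Cost", weight: 1 },
  { name: "Community", weight: 1 },
];

// Weighted total for a set of raw scores (0–5 per dimension).
// Missing dimensions count as zero.
function weightedScore(scores: Record<string, number>): number {
  return rubric.reduce((sum, d) => sum + (scores[d.name] ?? 0) * d.weight, 0);
}

// Maximum possible weighted score: 12 × 5 = 60.
const maxScore = rubric.reduce((sum, d) => sum + d.weight, 0) * 5;
```

With all eight dimensions at their 0–5 ceiling, the weighted total lands on exactly 60, matching the rubric's stated maximum. Keeping the rubric as data also makes the estimate-versus-actuals delta a one-line computation each week.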

Two of those artefacts go deeper than the bullets suggest. The Dev Agent worked from a structured task list of 35 dependency-ordered tasks — each with an explicit file path, parallelism markers ([P]), and checkpoint gates between phases. That's "defined inputs and outputs" made concrete. Alongside it sat a formal adapter contract document, a spec artefact produced before any code was written: five methods with JSDoc signatures and behavioral constraints. The UI shell was designed against this document, not the other way around. That's "no ambiguous handoffs" made concrete. Both were generated with Speckit.
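
The post doesn't publish the five method names, so the following is a hypothetical TypeScript rendering of what such an adapter contract could look like. Only sendSupport, getMessages, and withdraw mirror the canonical contract interface; every other name, and all of the signatures, are assumptions for illustration:

```typescript
// Hypothetical rendering of the adapter contract. Only sendSupport,
// getMessages, and withdraw mirror the canonical contract interface;
// all other names and every signature here are assumptions.
interface SupportMessage {
  sender: string;
  name: string;
  message: string;
}

interface ChainAdapter {
  /** Connect a wallet and resolve to the connected address. (assumed) */
  connect(): Promise<string>;
  /** Send a tip with a name and a message; mirrors sendSupport. */
  sendSupport(name: string, message: string, amount: bigint): Promise<string>;
  /** Read the message wall; mirrors getMessages. */
  getMessages(): Promise<SupportMessage[]>;
  /** Owner-only withdrawal; mirrors withdraw. */
  withdraw(): Promise<string>;
  /** Chain metadata for the UI's chain badge. (assumed) */
  chainInfo(): { name: string; explorerUrl: string };
}

// A stub implementation lets the UI shell compile and render before any
// chain-specific adapter exists, which is the Week 0 state of the project.
const emptyAdapter: ChainAdapter = {
  connect: async () => { throw new Error("No chain adapter installed yet"); },
  sendSupport: async () => { throw new Error("No chain adapter installed yet"); },
  getMessages: async () => [],
  withdraw: async () => { throw new Error("No chain adapter installed yet"); },
  chainInfo: () => ({ name: "none", explorerUrl: "" }),
};
```

The design point survives even if every name here is wrong: because the UI shell consumes the interface rather than any implementation, each week's chain only has to supply one conforming object.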

That's a lot of infrastructure before a compiler has been invoked. That's the point.

One more output lands at the end of Week 0 rather than before it: the live frontend. React + Vite, deployed on Cloudflare Pages, accessible at https://proof-of-support.pages.dev. The shared adapter interface sits empty — nothing chain-specific yet. That's Week 1's job. But the URL exists, which means every article and retrospective in this series can link to something real from day one.

The parallelism model

Not everything in the pipeline runs sequentially. Research for Week N+1 starts while Dev is building Week N. The Marketing Agent works on its own cadence and rarely blocks anyone. Distribution publishes Week N while Dev is already building Week N+1.

But within a given week, the chain is hard. Research must finish before Dev starts. Security must pass before deployment. QA must pass before the Copywriter starts writing. The retrospective must be approved before Distribution cross-posts it.

Knowing this upfront means there are no ambiguous handoffs. Every agent knows what it's waiting for and what it's handing off.
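
Stated as data, that within-week chain is just an ordered list with a gate between neighbours. A minimal TypeScript sketch; the stage names come from the pipeline, while the enforcement mechanism is an assumption about how an Orchestrator could check it:

```typescript
// Minimal sketch of the hard within-week ordering. Stage names come
// from the pipeline; the gating mechanism itself is an assumption.
const stages = [
  "Research",
  "Dev",
  "Security",
  "Deploy",
  "QA",
  "Copywriter",
  "Distribution",
] as const;
type Stage = (typeof stages)[number];

// A stage may start only once its predecessor's artefact is approved.
function canStart(stage: Stage, approved: ReadonlySet<string>): boolean {
  const i = stages.indexOf(stage);
  return i === 0 || approved.has(stages[i - 1]);
}
```

Cross-week parallelism (Research on Week N+1 while Dev builds Week N) doesn't break this model: each week gets its own approval set, and the gates only apply within one.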

Honest about what this is

Most agents run on Claude. The Security Agent is the exception: it runs on ChatGPT, with a locally run model as a secondary check. The reason is specific. If the same model that writes the contract also reviews it for vulnerabilities, it may not catch its own errors — the same reasoning patterns that produced a flaw could also rationalise it away. Running security on a different model breaks that loop. It's not distrust of any one tool; it's a structural decision about where single-model risk is unacceptable.

I am the Orchestrator — the person who reviews all output, rejects what doesn't meet the spec, and approves what does. The AI didn't plan this project. I ran a structured process using AI agents as specialist workers, with model selection made per-agent where it matters.

Without the structure, every week risks scope creep, missed security steps, retrospectives that drift in quality, or a broken publishing pipeline. The structure enforces discipline. That's the product management brain applied to the development process — which is, honestly, what this whole project is about.

Does the pre-work pay off?

That's what the coming weeks answer. The rubric will score each chain. The retrospectives will document what actually happened. If the pipeline holds up under a week of real build pressure, the pre-work was worth it.

If it doesn't, that's also worth writing about.
