Nilofer 🚀

SPEC-TO-SHIP: A Multi-Agent Pipeline That Turns Feature Ideas Into Production Code

Writing a feature spec and getting it to production involves a lot of steps: architecture decisions, task planning, implementation, testing, and code review. In a real engineering team, these are handled by different people with different specializations. Most AI coding tools collapse all of that into a single step and ask one model to do everything.

SPEC TO SHIP takes a different approach. It orchestrates five specialized AI agents (Architect, Planner, Engineer, QA, and Reviewer) within a single Node.js process to simulate a complete startup engineering team workflow. Raw feature ideas go in. Committed, tested, reviewed code comes out.

The Five Agents

The pipeline follows a sequential flow where each agent's output informs the next, with a tight loop between Engineering and QA. Each agent has a defined role, a specific output format, and a clear handoff point - so no single agent is asked to do more than it is designed for.

ArchitectAgent - Senior Software Architect: The first agent in the pipeline. Takes the raw feature idea and generates a comprehensive technical specification covering Overview, Goals, API Contracts, Data Models, and Security sections. Output is a Markdown spec file that every downstream agent works from. Model: google/gemini-2.0-flash-001 via OpenRouter.

PlannerAgent - Staff Engineering Manager: Receives the spec from the Architect and breaks it into actionable, dependency-aware development tasks. Output is a JSON array of tasks with topological ordering and acceptance criteria - so the Engineer knows exactly what to build and in what order.

EngineerAgent - Principal Software Engineer: Takes each task from the Planner and implements production-grade TypeScript code for it. Output is source files with proper typing, error handling, and JSDoc. This is where the actual code gets written.

QAAgent - Senior QA Engineer: Receives the Engineer's output and writes exhaustive Vitest test suites for each task. Output is test files covering acceptance criteria and edge cases. The tight loop between Engineer and QA means the implementation is always tested before the Reviewer sees it.

ReviewerAgent - Principal Engineer Reviewer: The final stage. Audits everything the previous agents produced for security, performance, and correctness. Output is a score from 0 to 100 and an approval status that tells you whether the result is ready to ship.
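The Planner's dependency-aware ordering can be illustrated with a small topological sort (Kahn's algorithm). This is a minimal sketch, not the project's code: the Task shape (id, dependsOn, acceptanceCriteria) and the function name topologicalOrder are assumptions made for illustration, and the real tasks.json schema may differ.

```typescript
// Hypothetical task shape; the actual tasks.json schema may differ.
interface Task {
  id: string;
  dependsOn: string[];
  acceptanceCriteria: string[];
}

// Returns the tasks so that every task appears after all of its dependencies.
function topologicalOrder(tasks: Task[]): Task[] {
  const byId = new Map<string, Task>();
  for (const task of tasks) byId.set(task.id, task);

  const unmet = new Map<string, number>(); // unmet dependency count per task
  const dependents = new Map<string, string[]>(); // task id -> tasks waiting on it

  for (const task of tasks) {
    unmet.set(task.id, task.dependsOn.length);
    for (const dep of task.dependsOn) {
      if (!byId.has(dep)) {
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
      dependents.set(dep, [...(dependents.get(dep) ?? []), task.id]);
    }
  }

  // Start with tasks that have no dependencies at all.
  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const ordered: Task[] = [];

  while (ready.length > 0) {
    const id = ready.shift() as string;
    ordered.push(byId.get(id) as Task);
    for (const next of dependents.get(id) ?? []) {
      const left = (unmet.get(next) ?? 0) - 1;
      unmet.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  // Any task left unscheduled is part of a dependency cycle.
  if (ordered.length !== tasks.length) {
    throw new Error("Dependency cycle detected");
  }
  return ordered;
}
```

With an order like this, the Engineer stage can walk the list front to back and never start a task before its prerequisites exist.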

Quality and Resilience

The pipeline is built for production reliability, not just happy-path execution. Several resilience patterns are built in at the infrastructure level so agent failures do not cascade into full pipeline failures:

Strict TypeScript - No any types allowed anywhere in the generated code.

Exponential Backoff - Retries on 429/529 errors at 1s, 2s, 4s, 8s, and 16s intervals. Rate limit hits do not kill the pipeline.

JSON Robustness - When an agent returns malformed JSON, the pipeline automatically retries with explicit instructions to fix the format rather than failing immediately.

Timeout - A hard 20-minute limit per pipeline run prevents runaway executions.
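Two of the patterns above, exponential backoff and JSON repair retries, can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's implementation: the function names retryWithBackoff and requestJson, the idea that failed requests surface as errors with a numeric status field, the repair prompt wording, and the maxRepairs default of 2 are all hypothetical. Only the retryable status codes and the delay schedule come from the list above.

```typescript
// From the article: retry 429/529 after 1s, 2s, 4s, 8s, and 16s.
const BACKOFF_DELAYS_MS = [1000, 2000, 4000, 8000, 16000];
const RETRYABLE_STATUSES = new Set([429, 529]);

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: a failed request throws an error carrying a numeric `status`.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Runs `operation`, retrying rate-limit errors with the delay schedule above.
// `sleep` is injectable so the schedule can be exercised without waiting.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const status = statusOf(error);
      const retryable = status !== undefined && RETRYABLE_STATUSES.has(status);
      // Give up on non-retryable errors or once every delay has been used.
      if (!retryable || attempt >= BACKOFF_DELAYS_MS.length) throw error;
      await sleep(BACKOFF_DELAYS_MS[attempt]);
    }
  }
}

// Asks a model for JSON; on a parse failure, asks again with an explicit
// instruction to fix the format. `ask` stands in for the real LLM call.
async function requestJson<T>(
  ask: (prompt: string) => Promise<string>,
  prompt: string,
  maxRepairs = 2, // hypothetical default
): Promise<T> {
  let currentPrompt = prompt;
  for (let attempt = 0; ; attempt++) {
    const raw = await ask(currentPrompt);
    try {
      return JSON.parse(raw) as T;
    } catch (error) {
      if (attempt >= maxRepairs) throw error;
      currentPrompt =
        `${prompt}\n\nYour previous reply was not valid JSON. ` +
        "Reply with valid JSON only.";
    }
  }
}
```

The two helpers compose naturally: wrapping the ask callback in retryWithBackoff would let a single agent call survive both rate limits and malformed replies.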

Getting Started

Prerequisites

  • Node.js 20+
  • OpenRouter API Key

Installation
Clone the repository, then install dependencies:

npm install

Configure the environment. The only required variable is your OpenRouter API key - everything else has sensible defaults:

cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

Configuration

The system uses envalid for robust configuration management:

OPENROUTER_API_KEY - required for LLM access
DEFAULT_MODEL - set to google/gemini-2.0-flash-001
PORT - API server port, default: 3000
DB_PATH - SQLite database path, default: spec-to-ship.db
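To make the validation behavior concrete, here is a hand-rolled sketch of the same checks. It deliberately does not use envalid, so that it runs with no dependencies; the project itself relies on envalid for this. The AppConfig shape and loadConfig name are hypothetical, and treating google/gemini-2.0-flash-001 as the fallback for DEFAULT_MODEL is an assumption based on the list above.

```typescript
// Hypothetical config shape mirroring the four variables listed above.
interface AppConfig {
  openRouterApiKey: string;
  defaultModel: string;
  port: number;
  dbPath: string;
}

// Validates an environment map and fills in defaults.
// Dependency-free stand-in for what a validation library would do.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("OPENROUTER_API_KEY is required");
  }

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    openRouterApiKey: apiKey,
    // Assumption: this model is the fallback when DEFAULT_MODEL is unset.
    defaultModel: env.DEFAULT_MODEL ?? "google/gemini-2.0-flash-001",
    port,
    dbPath: env.DB_PATH ?? "spec-to-ship.db",
  };
}

// Typical use at startup in Node.js: const config = loadConfig(process.env);
```

Failing fast on a missing key at startup is the main point: a pipeline run should not get as far as the first agent call before discovering it cannot authenticate.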

Usage

Terminal UI
Run the interactive CLI to start a new pipeline:

npm run start

The CLI uses Ink to provide real-time status updates and token streaming as each agent works through its stage. You can watch the pipeline progress in real time - each agent's output appears as it is generated rather than waiting for the full run to complete.

Industrial Dashboard
Start the API server:

npm run dev

Then open dashboard/index.html in your browser. The dashboard has an Industrial Command Center aesthetic with Dark Charcoal and Amber Glow styling, and it uses Server-Sent Events for real-time observability of the pipeline as it runs. It shows the same pipeline the CLI runs, which is useful for sharing progress with others or monitoring longer runs.
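For readers unfamiliar with Server-Sent Events, the wire format is plain text: optional id and event fields, a data field, and a blank line to end each message. The sketch below shows how a pipeline event might be serialized. The function name formatSseMessage and the event name and payload in the usage note are hypothetical; the project's actual event schema is not shown in this post.

```typescript
// Serializes one Server-Sent Events message.
// Field order and the trailing blank line follow the SSE wire format.
function formatSseMessage(
  eventName: string,
  data: Record<string, unknown>,
  id?: string,
): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${eventName}`);
  // JSON.stringify without indentation never emits raw newlines,
  // so the payload always fits on a single data line.
  lines.push(`data: ${JSON.stringify(data)}`);
  // A blank line terminates the message.
  return lines.join("\n") + "\n\n";
}
```

A server would write each formatted message to a response whose Content-Type is text/event-stream, and the dashboard could subscribe with the browser's EventSource API, for example by listening for a hypothetical "agent_started" event.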

Output Structure

Every pipeline run writes its artifacts to ./output/{runId}/. Each file maps directly to one agent's output, so you can inspect any stage independently:

spec.md : Architectural specification from the Architect agent. The source of truth every downstream agent works from.

tasks.json : Task breakdown from the Planner agent. The dependency-ordered list of what gets built.

src/ : Implementation code from the Engineer agent.

tests/ : Vitest tests from the QA agent.

review.md : Final review report from the Reviewer agent, including the score and approval status.

meta.json : Token usage, cost, and timing for the full run.

pipeline.log : NDJSON event log of the entire pipeline execution.

How I Built This Using NEO

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.

The idea was a multi-agent pipeline that mirrors how a real engineering team works: each role specialized, each handoff structured, and the whole thing running autonomously from a feature idea through to reviewed, committed code. The requirements included five distinct agent roles with clear responsibilities, a sequential handoff structure with a QA loop, production-grade TypeScript output, real-time observability via both a CLI and a web dashboard, and resilience patterns like exponential backoff and JSON retry logic.

NEO built the full system: the five agent implementations with their respective prompts and output schemas, the pipeline orchestration layer coordinating sequential handoffs, the Ink-based CLI with real-time token streaming, the Node.js API server with SSE for dashboard observability, the Industrial Command Center dashboard in HTML, the SQLite-backed database, the artifact output structure, and the envalid configuration layer.

How You Can Use and Extend This With NEO

Use it to go from idea to working code in a single command.
Write a feature description, run the pipeline, and get a complete implementation with architecture docs, TypeScript source, Vitest tests, and a reviewer score without manually coordinating any of the steps. The five-agent structure ensures each stage is handled by a role optimised for that specific task.

Use the reviewer score as a quality gate.
The ReviewerAgent scores every run from 0 to 100 across security, performance, and correctness. Teams can use this score as a threshold before accepting generated code - only promoting runs that clear a minimum score into the codebase.
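A quality gate like this can be a single function in a CI script. The sketch below assumes the score and approval status have already been extracted from the review report into an object; the ReviewResult shape, the passesQualityGate name, and the default threshold of 80 are all hypothetical choices for illustration.

```typescript
// Hypothetical shape for the values reported by the Reviewer stage.
interface ReviewResult {
  score: number; // 0 to 100
  approved: boolean;
}

// Returns true only when the run is approved AND clears the minimum score.
function passesQualityGate(review: ReviewResult, minScore = 80): boolean {
  if (!Number.isFinite(review.score) || review.score < 0 || review.score > 100) {
    throw new RangeError(`Score out of range: ${review.score}`);
  }
  return review.approved && review.score >= minScore;
}
```

Requiring both conditions keeps the gate conservative: a high score without approval, or an approval with a low score, is treated as a failed run.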

Use the NDJSON event log for pipeline observability.
Every run writes a structured pipeline.log in NDJSON format. This can be parsed by any log processing tool to track pipeline performance, token costs, and approval rates across runs over time.
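NDJSON is one JSON object per line, so parsing it needs nothing beyond JSON.parse. The sketch below totals token usage per agent from a log's text. The event fields (type, agent, tokens) are assumptions made for illustration; check a real pipeline.log for the actual field names before relying on them.

```typescript
// Hypothetical event shape; the real pipeline.log fields may differ.
interface PipelineLogEvent {
  type: string;
  agent?: string;
  tokens?: number;
}

// Parses NDJSON text: one JSON object per non-empty line.
function parseNdjson(text: string): PipelineLogEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as PipelineLogEvent);
}

// Sums the `tokens` field per agent, skipping events that lack either field.
function tokensByAgent(events: PipelineLogEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const event of events) {
    if (event.agent === undefined || event.tokens === undefined) continue;
    totals[event.agent] = (totals[event.agent] ?? 0) + event.tokens;
  }
  return totals;
}
```

Reading the file contents and passing them to parseNdjson is enough to build simple per-run reports; the same approach extends to cost or approval-rate aggregates once the field names are known.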

Extend it with additional agent roles.
The five-agent structure is sequential and modular. A new agent that receives the previous stage's output and produces its own artifact can be added without restructuring the existing pipeline - the handoff pattern is already established for each stage.
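One way to picture that handoff pattern is a generic agent interface where each stage consumes the previous stage's output. This is a sketch of the idea, not the repository's actual abstractions: the Agent interface, the chain helper, and the ChangelogAgent stage are all hypothetical, and the stage body is a stub where a real agent would call a model.

```typescript
// Hypothetical contract: an agent turns one stage's output into the next.
interface Agent<In, Out> {
  name: string;
  run(input: In): Promise<Out>;
}

// Composes two agents into one; the second receives the first's output.
function chain<A, B, C>(first: Agent<A, B>, second: Agent<B, C>): Agent<A, C> {
  return {
    name: `${first.name} -> ${second.name}`,
    run: async (input: A) => second.run(await first.run(input)),
  };
}

// Hypothetical extra stage appended after review. A real agent would call
// an LLM here; this stub only formats the review outcome.
const changelogAgent: Agent<{ score: number; approved: boolean }, string> = {
  name: "ChangelogAgent",
  run: async (review) =>
    review.approved
      ? `Approved with score ${review.score}`
      : `Rejected with score ${review.score}`,
};
```

Because the types line up at each boundary, the compiler rejects a new stage whose input does not match the previous stage's output, which is one way to keep handoffs structured as the pipeline grows.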

Final Notes

SPEC TO SHIP compresses the gap between a feature idea and production-ready code by distributing the work across five specialized agents, each focused on what it does best. Architecture, planning, implementation, testing, and review - all coordinated automatically, with structured handoffs and resilience built in at every stage.

The code is at https://github.com/dakshjain-1616/Spec-To-Ship
You can also build with NEO in your IDE using the VS Code extension or Cursor. 
You can use NEO MCP with Claude Code: https://heyneo.com/claude-code
