I'm a software engineer at a fintech company with 80+ microservices. I built a structured AI system around Claude Code that takes an epic from description to PR-ready code — while I stay in control of every decision.
Here's how it works, why I built it, and what it actually changed.
--- The Problem With Microservices at Scale
A single product epic at Propelld can touch 5 to 10 services simultaneously. Before I built this system, the pain was predictable:
- Dependencies got missed mid-sprint
- Architecture decisions lived in Slack and died there
- Code review surfaced patterns that should have been caught in design
- The same mistakes repeated across epics because nobody remembered what was decided three months ago
Standard tools don't solve this. Copilot suggests the next line. ChatGPT gives you a plan. Neither knows your codebase, your service boundaries, or your team's conventions.
I needed something that could reason about my system — not a generic one.
What I Built
An AI-assisted delivery system built on top of Claude Code (Anthropic's CLI for Claude). Not a side project. Not open source. Just something the team needed.
The system takes an epic from description → code analysis → architecture decision → implementation → PR-ready code.
My role: make decisions, review diffs, raise the PR.
The system's role: everything else.
--- The 6-Phase Pipeline
Every phase has a hard human gate. The AI cannot proceed until I explicitly approve.
Phase 1 — Service Identification
The system reads a service-index.json — a curated map of what each of the 80+ services does — and identifies which services the epic touches. No source code read yet. Just topology. Then it stops and asks clarifying questions.
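As a sketch, one entry in that index might look like the following. The field names here are my illustration of the idea, not the actual schema:

```json
{
  "service": "loan-origination",
  "domain": "lending",
  "summary": "Creates and tracks loan applications",
  "dependsOn": ["kyc-service", "credit-bureau-gateway"],
  "consumedBy": ["disbursement-service"]
}
```

With a map like this, the epic-to-service match is a topology lookup, not a source-code crawl.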
Phase 2 — Deep Code Analysis
A sub-agent dives into actual source code. It finds patterns, import conventions, error handling styles, and inter-service dependencies. Everything gets saved as checkpoint files — JSON and Markdown artifacts that carry context forward without re-running the analysis. Then it stops and surfaces a summary.
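A checkpoint artifact can be as simple as a typed JSON file written to disk and re-read by later phases. A minimal sketch, with a hypothetical shape I made up for illustration:

```typescript
import { writeFileSync, readFileSync } from "fs";

// Hypothetical shape of a Phase 2 checkpoint; field names are illustrative,
// not the real system's schema.
interface AnalysisCheckpoint {
  epicId: string;
  service: string;
  importStyle: "esm" | "commonjs"; // convention detected in the source
  errorHandler: string;            // e.g. the error helper the codebase uses
  dependencies: string[];          // inter-service calls discovered
}

function saveCheckpoint(path: string, cp: AnalysisCheckpoint): void {
  writeFileSync(path, JSON.stringify(cp, null, 2));
}

function loadCheckpoint(path: string): AnalysisCheckpoint {
  return JSON.parse(readFileSync(path, "utf8"));
}
```

The point is that a later phase calls `loadCheckpoint` instead of re-running the whole analysis.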
Phase 3 — Architecture Options
The system presents two or three architecture approaches with trade-offs. Pros, cons, complexity, risk. I choose. The AI never picks for me — this is a hard rule in the system prompt.
Phase 3.5 — Decision Records
My choice gets saved as an Architecture Decision Record (ADR). Future epics reference past decisions automatically. This is how institutional knowledge stops dying in Slack.
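The ADR format itself is standard. A minimal record, with the specific decision invented here purely for illustration, could read:

```markdown
# ADR-014: Queue-based handoff between origination and disbursement

## Status
Accepted

## Context
The epic touches two services; a synchronous call would couple
their deploy cycles.

## Decision
Publish an event to the existing queue instead of calling the
downstream service directly.

## Consequences
Looser coupling; eventual consistency must be handled by consumers.
```

Because these live as files, the next epic's analysis phase can search and cite them.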
Phase 4 — Implementation Guide
A second agent generates a full implementation guide: BEFORE/AFTER code diffs, file paths, service mapping, story points per task. This happens before a single line of production code is written.
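One way to structure a single task in that guide, sketched as a TypeScript type (the naming is mine, not the system's):

```typescript
// Hypothetical shape of one task in the generated implementation guide.
interface GuideTask {
  service: string;     // which microservice this task touches
  file: string;        // path to the file being changed
  before: string;      // excerpt of the current code
  after: string;       // proposed replacement
  storyPoints: number; // estimate attached per task
}

// An illustrative instance, invented for this example.
const exampleTask: GuideTask = {
  service: "loan-origination",
  file: "src/controllers/application.ts",
  before: "console.error(error)",
  after: "logger.error(formatException(error))",
  storyPoints: 2,
};
```

Reviewing a list of these before any code exists is what makes the Phase 5 handoff choice meaningful.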
Phase 5 — Handoff Options
I choose how to proceed:
- AI-guided execution with me driving
- TDD mode (AI writes tests first, then code)
- Take the guide and go fully manual
Phase 6 — Code Generation (TDD)
If I pick TDD mode, the AI writes code in an isolated workspace — never directly in the microservices directory. I review the diff. I raise the PR. No autonomous commits. No auto-formatting. No surprises.
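TDD mode in miniature: the assertion exists before the code it drives. A toy TypeScript sketch (the function and numbers are mine, not from the real system):

```typescript
// Step 2: the implementation, written only to make the pre-existing test pass.
// Standard EMI (equated monthly installment) formula.
function emiFor(principal: number, annualRatePct: number, months: number): number {
  const r = annualRatePct / 12 / 100;
  if (r === 0) return principal / months;
  const factor = Math.pow(1 + r, months);
  return (principal * r * factor) / (factor - 1);
}

// Step 1 (written first in TDD order): the test that drove the implementation.
const emi = emiFor(100000, 12, 12);
if (Math.abs(emi - 8884.88) > 0.05) throw new Error("EMI formula regressed");
```

The same discipline applies at service scale: the AI emits the failing test, I approve it, and only then does implementation code get written.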
The Architecture That Made It Work: Agents and Skills
The heavy lifting doesn't happen in the main conversation thread. That was the key insight.
I built custom agents and skills that run in parallel — like a lead engineer delegating to specialists:
- One agent explores the service graph
- Another analyzes source patterns
- Another generates the implementation guide
Each runs in its own context. Only the summary returns to the main thread.
Result: token usage dropped from ~75,000 to ~15,000 per epic. The main conversation stays clean enough to reason about.
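The delegation pattern is plain fan-out/fan-in: heavy detail stays inside each sub-agent, and only a short summary crosses back. A sketch in TypeScript terms, not the actual Claude Code agent API:

```typescript
// Hypothetical sub-agent: does the heavy work, returns only a short summary.
async function runSubAgent(task: string): Promise<string> {
  const detail = `...large analysis for ${task}...`; // stays in the sub-agent's context
  return `${task}: ${detail.length} chars analyzed`; // only this reaches the main thread
}

async function analyzeEpic(): Promise<string[]> {
  // Fan out to specialists in parallel, fan in their summaries.
  return Promise.all([
    runSubAgent("service-graph"),
    runSubAgent("source-patterns"),
    runSubAgent("implementation-guide"),
  ]);
}
```

The main conversation only ever sees the three summary strings, which is why its token budget stays small.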
--- The Part People Underestimate: Pattern Matching
This isn't vibe coding.
Before generating any code, the system reads the existing codebase and extracts:
- Brace style (Allman — braces on new lines, mandatory at Propelld)
- Import patterns (`import { logger } from "../config/logger"` — not CommonJS `require`s)
- Error handling conventions (`formatException(error)` — not `console.error`)
- Service client usage (`@propelld/service-clients` — never raw HTTP calls)
Code that comes out matches what's already there. It doesn't invent new conventions. It follows what exists.
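To make the house style above concrete, here is an illustrative snippet in that register: Allman braces, `logger`, and `formatException` instead of `console.error`. The `logger` and `formatException` below are local stand-ins I wrote for this post, not the real `../config/logger` or error helper:

```typescript
// Local stand-ins for the real logger and error helper mentioned above.
const logger = { info: (m: string) => m, error: (m: string) => m };

function formatException(error: unknown): string
{
  return error instanceof Error ? `${error.name}: ${error.message}` : String(error);
}

function fetchLoanStatus(loanId: string): string
{
  try
  {
    logger.info(`fetching status for ${loanId}`);
    return "ACTIVE"; // stand-in for a @propelld/service-clients call
  }
  catch (error)
  {
    logger.error(formatException(error));
    throw error;
  }
}
```

Generated code that matches these conventions reads like it was written by the team, which is what makes review fast.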
--- What Actually Changed After a Month
- Dependencies caught before a single line of code is written
- Architecture decisions documented and automatically referenced in future epics
- One engineer delivering PR-ready code across multiple services per sprint
- End-to-end implementation flow for all major feature types
- No need to download each service manually: I created a start.sh script that downloads all the services into one folder, and the agents later refer to those local copies for analysis.
--- What I'd Tell Anyone Thinking About This
The system is only as good as the structure you impose on it. Unconstrained AI in a complex codebase is noise. Constrained AI with explicit phase gates, checkpoint artifacts, and pattern-matching is leverage.
The AI isn't replacing judgment. It's handling the grunt work — so judgment is the only thing I'm spending time on.
That's the trade worth making.
Have you built structured AI workflows into your engineering process? I'm curious what's working and what isn't — drop it in the comments.