SATINATH MONDAL

Posted on May 14

I Built 40 AI Agents That Execute the Entire Software Development Lifecycle

#ai #productivity #agents

You paste a JIRA story into your IDE. Five minutes later, you have structured requirements, architecture decisions, interface contracts, a working implementation, test suites, a security audit, CI/CD pipelines, and monitoring dashboards — all generated autonomously by 40 coordinated AI agents.

No, this isn't a demo. It's a framework called autonomous-sdlc, and it's open source.

pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .

Let me explain what it is, why I built it, and what I learned about making AI agents actually work together.

The Problem: AI Coding Assistants Are Powerful but Unstructured

Every developer has experienced this: you open Copilot or Claude, paste a feature request, and start coding. The AI is helpful — brilliant, even — but it operates without a process. There's no requirements analysis. No architecture review. No security scan. No quality gate preventing you from shipping a broken abstraction.

You're essentially pair programming with a genius who has no discipline.

What if, instead of a single assistant, you had an entire engineering organization — product managers, architects, developers, testers, security engineers, and reviewers — all as AI agents, all following a rigorous SDLC?

That's what autonomous-sdlc does.

The Architecture: 40 Markdown Agents, Zero Dependencies

The framework has zero runtime dependencies. No server. No process. No Docker container. Every agent is a markdown file.

40 agents = 1 orchestrator + 10 stage agents + 29 subagents

The Three Tiers

Tier 1: Orchestrator — The parent agent that controls the entire lifecycle. It reads your spec, determines complexity, selects the right team of agents, and drives 11 sequential phases. It enforces quality gates between every phase and manages retries when things fail.

Tier 2: Stage Agents (10) — Each owns a phase: Product, Story-Tasks, Architecture, Design, Development, Testing, Security, Review, DevOps, Observability. They dispatch specialized subagents to do the actual work.

Tier 3: Subagents (29) — The workers. A Requirement Parser that structures your messy PRD. A Tech Stack Advisor that evaluates options with trade-off analysis. A Code Generator that implements features. A Secret Scanner that finds hardcoded API keys. And 25 others.

No Magic — Just Structured Prompts

Every agent follows the same format:

## GOAL
[What success looks like — measurable outcome]

## CONSTRAINTS
[Hard limits — what you cannot do]

## CONTEXT
[Files to read, previous attempts, related decisions]

## OUTPUT
[Exact deliverables expected — file paths, formats]

The AI IDE reads the agent's .md file, adopts that persona, executes, and hands off to the next agent. That's it. The "multi-agent orchestration" is just structured markdown and a state machine.
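To make the format concrete, here is a minimal sketch of how a tool could split an agent file into its four sections and reject one that is incomplete. The section names come from the template above; the `parse_agent` helper, the sample agent text, and the file paths inside it are hypothetical and not part of the framework.

```python
import re

# Section names taken from the agent template above.
REQUIRED_SECTIONS = ("GOAL", "CONSTRAINTS", "CONTEXT", "OUTPUT")

# Hypothetical agent file used only for illustration.
EXAMPLE = """## GOAL
Parse the PRD into structured requirements.

## CONSTRAINTS
Do not invent requirements.

## CONTEXT
docs/prd.md

## OUTPUT
requirements.json
"""


def parse_agent(markdown: str) -> dict[str, str]:
    """Split an agent file into its '## NAME' sections."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        # Only level-2 headings start a new section; '###' does not match.
        heading = re.match(r"^##\s+(\S.*?)\s*$", line)
        if heading:
            current = heading.group(1).upper()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"agent file is missing sections: {missing}")
    return {name: body.strip() for name, body in sections.items()}
```

Calling `parse_agent(EXAMPLE)` returns a dictionary keyed by section name, which is enough structure for a linter or a pre-commit check on agent prompts.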

The 11 Phases

The framework executes a complete SDLC in 11 phases:

Phase	What Happens	Key Output
0. Bootstrap	Parse input spec, detect complexity, select team	Normalized spec, project config
1. Product	Requirements analysis, risks, acceptance criteria	Structured requirements, Given/When/Then criteria
2. Story-Tasks	Decompose into epics, stories, tasks with dependencies	Story map, task queue, dependency graph
3. Architecture	High-level design, tech stack selection, ADRs	System design, Architecture Decision Records
4. Design	Interface contracts, data models, integration plans	OpenAPI specs, DB schemas, NFR assessment
5. Development	Implement the codebase	Production code, written directly into your repo
6. Testing	Unit, integration, and regression tests	Test suites with ≥80% coverage target
7. Security	Secret scanning, dependency audit, OWASP review	Security report, remediated vulnerabilities
8. Review	3 blind reviewers assess the full codebase	Review verdicts, remediation
9. DevOps	CI/CD pipelines, Docker, deployment configs	GitHub Actions, Dockerfile, IaC
10. Observability	SLOs, alerting, health checks, runbooks	Monitoring config, dashboards, runbooks

Every phase has a quality gate. If the gate fails, the agent self-corrects: it captures the error, analyzes the root cause, updates its learnings, and retries (max 3 attempts). If the gate still fails, the agent escalates to a human.
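The gate-and-retry loop can be sketched in a few lines. This is an illustration under stated assumptions, not the framework's implementation: `GateResult`, `PhaseOutcome`, and `run_phase` are hypothetical names, and "max 3 attempts" is read here as three attempts in total.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 3  # from the text; assumed to mean three attempts in total


@dataclass
class GateResult:
    passed: bool
    error: str = ""


@dataclass
class PhaseOutcome:
    status: str  # "PASSED" or "ESCALATED"
    attempts: int
    learnings: list[str] = field(default_factory=list)


def run_phase(execute: Callable[[list[str]], GateResult]) -> PhaseOutcome:
    """Run a phase until its gate passes or the attempt budget is spent."""
    learnings: list[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Each attempt sees what earlier attempts learned.
        result = execute(learnings)
        if result.passed:
            return PhaseOutcome("PASSED", attempt, learnings)
        learnings.append(f"attempt {attempt}: {result.error}")
    # Budget exhausted: hand the accumulated learnings to a human.
    return PhaseOutcome("ESCALATED", MAX_ATTEMPTS, learnings)
```

Passing the learnings list back into `execute` is the important part: a retry that cannot see why the previous attempt failed is just a re-roll.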

The Blind Review System

This is the feature I'm most proud of.

After every phase (except Bootstrap and Review itself), the orchestrator dispatches 3 independent reviewers:

Code Review Agent — SOLID principles, patterns, correctness
Maintainability Reviewer — Tech debt, complexity, readability
Performance Reviewer — Bottlenecks, N+1 queries, caching

They run simultaneously and blindly — no shared context between them. This prevents groupthink and sycophantic agreement. An anti-sycophancy check verifies they didn't just rubber-stamp the work.

If ≥2 reviewers find critical issues, the phase fails and gets reworked. This catches problems early — a flawed architecture in Phase 3 is caught before any code is written in Phase 5.
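Here is a rough sketch of the voting rule, assuming each reviewer is a plain function that receives only the artifact and returns its own verdict. The `Verdict` type and `blind_review` function are hypothetical; the threshold of two is the one stated above. In the real framework the reviewers are markdown agents run by the IDE, so the thread pool below only stands in for "run them in parallel with no shared context".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

CRITICAL_THRESHOLD = 2  # from the text: 2 or more critical verdicts fail the phase


@dataclass(frozen=True)
class Verdict:
    reviewer: str
    critical_issues: tuple[str, ...]


Reviewer = Callable[[str], Verdict]


def blind_review(artifact: str, reviewers: list[Reviewer]) -> tuple[bool, list[Verdict]]:
    """Run every reviewer on the same artifact; none sees another's verdict."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        # Each reviewer gets only the artifact, never another reviewer's output.
        verdicts = list(pool.map(lambda review: review(artifact), reviewers))
    flagged = sum(1 for verdict in verdicts if verdict.critical_issues)
    passed = flagged < CRITICAL_THRESHOLD
    return passed, verdicts
```

Because the aggregation only counts verdicts after all reviewers have finished, no reviewer can anchor on another's opinion.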

Memory That Persists Across Sessions

AI IDEs have token limits. Long conversations get truncated. Previous sessions are forgotten.

autonomous-sdlc solves this with a 3-tier memory system:

Episodic memory — Full traces of what happened in each phase
Semantic memory — Patterns extracted from the project (e.g., "this codebase uses repository pattern")
Learnings — Mistakes that were made and how they were fixed

Plus CONTINUITY.md — a working memory file the orchestrator reads at the start of every turn and updates at the end. When your IDE session expires and you start a new one, the orchestrator reads CONTINUITY.md and picks up exactly where it left off.

## Current State
- Phase: 5-development
- Status: IN_PROGRESS
- Current task: Implement user registration endpoint
- Blockers: None
- Last action: Generated database migration for users table

No context lost. No repeated work. Just seamless continuation.
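As an illustration, the "Current State" block shown above can be read and rewritten with a few lines of Python. The field names are the ones from the example; the `read_state` and `render_state` helpers are hypothetical, and the assumption that the section is a flat list of `- key: value` bullets is mine, not a documented format.

```python
# Sample taken from the CONTINUITY.md excerpt above.
EXAMPLE = """## Current State
- Phase: 5-development
- Status: IN_PROGRESS
- Current task: Implement user registration endpoint
- Blockers: None
- Last action: Generated database migration for users table
"""


def read_state(text: str) -> dict[str, str]:
    """Collect the '- key: value' bullets under '## Current State'."""
    state: dict[str, str] = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Current State"
        elif in_section and line.startswith("- ") and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line[2:].split(":", 1)
            state[key.strip()] = value.strip()
    return state


def render_state(state: dict[str, str]) -> str:
    """Write the state back in the same bullet format."""
    lines = ["## Current State"]
    lines += [f"- {key}: {value}" for key, value in state.items()]
    return "\n".join(lines) + "\n"
```

Reading at the start of a turn and rendering at the end is the whole protocol: the file, not the chat history, is the source of truth.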

Works With 9 AI IDEs

One sdlc init scaffolds everything into a .sdlc/ directory and configures your IDE automatically:

IDE	How It Works
GitHub Copilot	Agent appears in dropdown → select → paste spec
Windsurf	Type `/sdlc.orchestrator` → paste spec
Claude Code	Use `/sdlc-orchestrator` command
Cursor	Context auto-loads → just paste spec
Gemini CLI	Command available in `.gemini/commands/`
opencode	Command available in `.opencode/commands/`
Codex CLI	Command available in `.codex/commands/`
Amp	Command available in `.amp/commands/`
Kilo Code	Command available in `.kilocode/commands/`

The framework uses each IDE's native configuration system. No plugins. No extensions. Just markdown files in the directories your IDE already reads.

What Makes This Different

There are other AI coding tools. Here's why this approach is different:

1. Process, Not Just Code Generation

Most tools generate code. This framework executes an entire SDLC: requirements, architecture, design, implementation, testing, security, review, deployment, monitoring. Code generation is only one of the 11 phases (Phase 5, with numbering starting at 0).

2. Zero Vendor Lock-In

Agents are markdown files. The framework has no runtime. You can:

Switch IDEs freely (just re-run sdlc init with a different integration)
Fork and customize any agent prompt
Cherry-pick individual phases instead of running all 11
Read every agent's instructions in plain English

3. Quality Built In, Not Bolted On

Security scanning isn't an afterthought — it's Phase 7 with 4 specialized agents. Code review isn't optional — it's Phase 8 with 3 blind reviewers. Observability isn't "we'll add it later" — it's Phase 10 with SLOs, alerting, and runbooks.

4. Project-Type Agnostic

This isn't just for REST APIs. The framework handles:

APIs (REST, GraphQL, gRPC)
CLI tools (commands, flags, exit codes)
Frontends (React, Vue, component specs)
Mobile apps (screen flows, navigation)
ML pipelines (feature stores, model serving)
Libraries (public API surface, semver)
Desktop apps (window management, native APIs)
Embedded systems (hardware interfaces, protocols)

The Interface Designer adapts its output format to your project type. The Data Model Designer handles relational databases, NoSQL, file storage, in-memory state, and event stores.

Show Me the Money: A Real Example

Here's what happens when you paste this into your IDE:

PROJ-101 User Registration

As a new user, I want to register with email and password.

Acceptance Criteria:
- Valid email + password → 201 Created
- Duplicate email → 409 Conflict

Tech Stack: Python, FastAPI, PostgreSQL

The orchestrator kicks off and you get:

Structured requirements with functional/non-functional split
Risk analysis (what if email service is down?)
User stories with Given/When/Then criteria
Task breakdown with effort estimates
Architecture decisions (why FastAPI, why PostgreSQL, documented as ADRs)
OpenAPI spec for the registration endpoint
Database schema for the users table
Working FastAPI code with proper error handling
Unit and integration tests targeting ≥80% coverage
Security scan (no hardcoded secrets, dependencies audited, OWASP reviewed)
3 independent code reviews with remediation
Dockerfile + GitHub Actions pipeline
Health check endpoint + SLOs + alerting config

All from a 6-line JIRA story.
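To show how little the story actually pins down, here are its two acceptance criteria as a dependency-free sketch. This is not output from the framework and not FastAPI code: `UserRegistry` is a hypothetical in-memory stand-in, and treating emails as case-insensitive is my assumption. The story does not say what counts as a valid email or password, so the sketch does not validate either.

```python
import hashlib
import os
from http import HTTPStatus


class UserRegistry:
    """In-memory stand-in for the users table (illustration only)."""

    def __init__(self) -> None:
        # Maps normalized email to (salt, password hash).
        self._users: dict[str, tuple[bytes, bytes]] = {}

    def register(self, email: str, password: str) -> HTTPStatus:
        # Assumption: emails are compared case-insensitively.
        key = email.strip().lower()
        if key in self._users:
            return HTTPStatus.CONFLICT  # 409: duplicate email
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._users[key] = (salt, digest)
        return HTTPStatus.CREATED  # 201: new user stored
```

Everything else in the list above (validation rules, risk analysis, schema details) is something the agents have to infer or ask about, which is exactly why the requirements phase comes before the code.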

Lessons Learned Building Multi-Agent Systems

Agents need constraints, not capabilities

The biggest mistake in agent design is making them too capable. A "do everything" agent produces mediocre results. A narrowly-scoped agent with clear constraints produces excellent results. Each of our 29 subagents does exactly one thing.

Memory beats reasoning

An agent that remembers its mistakes outperforms one with better reasoning but no memory. Our learnings system captures every failure (for example, "tried to use SQLAlchemy 2.0 async syntax but the project uses 1.4") so that later iterations can avoid repeating it.

Blind review prevents sycophancy

When one agent reviews another agent's work, it tends to agree. The fix: make reviewers blind (no shared context) and run them in parallel. Our 3-reviewer system catches issues that a single pass never would.

State machines over chains

LangChain-style prompt chains are fragile. A state machine with explicit phases, gates, and retry logic is far more robust. Our orchestrator is essentially a state machine driven by orchestrator.json — every phase transition is explicit and reversible.
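A minimal sketch of what "explicit and reversible" can mean in practice. The phase names follow the table earlier in the post and the `N-name` ids mirror the CONTINUITY.md example, but the `PhaseMachine` class and the JSON schema (`phase`, `history`) are assumptions of mine, not the actual layout of orchestrator.json.

```python
import json
from typing import Optional

# Phase names follow the table above (phases 0 to 10).
PHASES = [
    "bootstrap", "product", "story-tasks", "architecture", "design",
    "development", "testing", "security", "review", "devops", "observability",
]


class PhaseMachine:
    def __init__(self, state: Optional[dict] = None) -> None:
        # Hypothetical schema: current phase index plus the path taken so far.
        self.state = state if state is not None else {"phase": 0, "history": []}

    @property
    def phase(self) -> str:
        index = self.state["phase"]
        return f"{index}-{PHASES[index]}"

    def advance(self, gate_passed: bool) -> str:
        """Move to the next phase only when the current gate passed."""
        if gate_passed and self.state["phase"] < len(PHASES) - 1:
            self.state["history"].append(self.state["phase"])
            self.state["phase"] += 1
        return self.phase

    def rollback(self) -> str:
        """Return to the phase we came from, if there is one."""
        if self.state["history"]:
            self.state["phase"] = self.state["history"].pop()
        return self.phase

    def dumps(self) -> str:
        """Serialize the state so a new session can resume from it."""
        return json.dumps(self.state)
```

The contrast with a prompt chain is that the current phase lives in data, so it can be saved, inspected, resumed with `PhaseMachine(json.loads(saved))`, or rolled back without replaying the conversation.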

Markdown is the right abstraction

Agents don't need Python classes or YAML configs. They need clear instructions in natural language. Markdown is human-readable, git-friendly, and every AI model understands it natively. Our entire framework is ~40 markdown files.

Try It

# No install required
uvx --from git+https://github.com/bitbitcodes/autonomous-sdlc.git sdlc init .

# Or install persistently
pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .

Then select the sdlc.orchestrator agent in your IDE and paste your spec.

GitHub: github.com/bitbitcodes/autonomous-sdlc

The framework is MIT licensed. Star it, fork it, break it, improve it. PRs welcome.

If you found this useful, follow me for more on AI-driven development, multi-agent systems, and developer tooling.
