DEV Community

Cover image for I Built 40 AI Agents That Execute the Entire Software Development Lifecycle
SATINATH MONDAL
SATINATH MONDAL

Posted on

I Built 40 AI Agents That Execute the Entire Software Development Lifecycle

You paste a JIRA story into your IDE. Five minutes later, you have structured requirements, architecture decisions, interface contracts, a working implementation, test suites, a security audit, CI/CD pipelines, and monitoring dashboards — all generated autonomously by 40 coordinated AI agents.

No, this isn't a demo. It's a framework called autonomous-sdlc, and it's open source.

pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .
Enter fullscreen mode Exit fullscreen mode

Let me explain what it is, why I built it, and what I learned about making AI agents actually work together.


The Problem: AI Coding Assistants Are Powerful but Unstructured

Every developer has experienced this: you open Copilot or Claude, paste a feature request, and start coding. The AI is helpful — brilliant, even — but it operates without a process. There's no requirements analysis. No architecture review. No security scan. No quality gate preventing you from shipping a broken abstraction.

You're essentially pair programming with a genius who has no discipline.

What if, instead of a single assistant, you had an entire engineering organization — product managers, architects, developers, testers, security engineers, and reviewers — all as AI agents, all following a rigorous SDLC?

That's what autonomous-sdlc does.


The Architecture: 40 Markdown Agents, Zero Dependencies

The framework has zero runtime dependencies. No server. No process. No Docker container. Every agent is a markdown file.

40 agents = 1 orchestrator + 10 stage agents + 29 subagents
Enter fullscreen mode Exit fullscreen mode

The Three Tiers

Tier 1: Orchestrator — The parent agent that controls the entire lifecycle. It reads your spec, determines complexity, selects the right team of agents, and drives 11 sequential phases. It enforces quality gates between every phase and manages retries when things fail.

Tier 2: Stage Agents (10) — Each owns a phase: Product, Story-Tasks, Architecture, Design, Development, Testing, Security, Review, DevOps, Observability. They dispatch specialized subagents to do the actual work.

Tier 3: Subagents (29) — The workers. A Requirement Parser that structures your messy PRD. A Tech Stack Advisor that evaluates options with trade-off analysis. A Code Generator that implements features. A Secret Scanner that finds hardcoded API keys. And 25 others.

No Magic — Just Structured Prompts

Every agent follows the same format:

## GOAL
[What success looks like — measurable outcome]

## CONSTRAINTS
[Hard limits — what you cannot do]

## CONTEXT
[Files to read, previous attempts, related decisions]

## OUTPUT
[Exact deliverables expected — file paths, formats]
Enter fullscreen mode Exit fullscreen mode

The AI IDE reads the agent's .md file, adopts that persona, executes, and hands off to the next agent. That's it. The "multi-agent orchestration" is just structured markdown and a state machine.


The 11 Phases

The framework executes a complete SDLC in 11 phases:

Phase What Happens Key Output
0. Bootstrap Parse input spec, detect complexity, select team Normalized spec, project config
1. Product Requirements analysis, risks, acceptance criteria Structured requirements, Given/When/Then criteria
2. Story-Tasks Decompose into epics, stories, tasks with dependencies Story map, task queue, dependency graph
3. Architecture High-level design, tech stack selection, ADRs System design, Architecture Decision Records
4. Design Interface contracts, data models, integration plans OpenAPI specs, DB schemas, NFR assessment
5. Development Implement the codebase Production code, written directly into your repo
6. Testing Unit, integration, and regression tests Test suites with ≥80% coverage target
7. Security Secret scanning, dependency audit, OWASP review Security report, remediated vulnerabilities
8. Review 3 blind reviewers assess the full codebase Review verdicts, remediation
9. DevOps CI/CD pipelines, Docker, deployment configs GitHub Actions, Dockerfile, IaC
10. Observability SLOs, alerting, health checks, runbooks Monitoring config, dashboards, runbooks

Every phase has a quality gate. If the gate fails, the agent self-corrects: captures the error, analyzes root cause, updates learnings, and retries (max 3 attempts). If it still fails, it escalates to the human.


The Blind Review System

This is the feature I'm most proud of.

After every phase (except Bootstrap and Review itself), the orchestrator dispatches 3 independent reviewers:

  1. Code Review Agent — SOLID principles, patterns, correctness
  2. Maintainability Reviewer — Tech debt, complexity, readability
  3. Performance Reviewer — Bottlenecks, N+1 queries, caching

They run simultaneously and blindly — no shared context between them. This prevents groupthink and sycophantic agreement. An anti-sycophancy check verifies they didn't just rubber-stamp the work.

If ≥2 reviewers find critical issues, the phase fails and gets reworked. This catches problems early — a flawed architecture in Phase 3 is caught before any code is written in Phase 5.


Memory That Persists Across Sessions

AI IDEs have token limits. Long conversations get truncated. Previous sessions are forgotten.

autonomous-sdlc solves this with a 3-tier memory system:

  • Episodic memory — Full traces of what happened in each phase
  • Semantic memory — Patterns extracted from the project (e.g., "this codebase uses repository pattern")
  • Learnings — Mistakes that were made and how they were fixed

Plus CONTINUITY.md — a working memory file the orchestrator reads at the start of every turn and updates at the end. When your IDE session expires and you start a new one, the orchestrator reads CONTINUITY.md and picks up exactly where it left off.

## Current State
- Phase: 5-development
- Status: IN_PROGRESS
- Current task: Implement user registration endpoint
- Blockers: None
- Last action: Generated database migration for users table
Enter fullscreen mode Exit fullscreen mode

No context lost. No repeated work. Just seamless continuation.


Works With 9 AI IDEs

One sdlc init scaffolds everything into a .sdlc/ directory and configures your IDE automatically:

IDE How It Works
GitHub Copilot Agent appears in dropdown → select → paste spec
Windsurf Type /sdlc.orchestrator → paste spec
Claude Code Use /sdlc-orchestrator command
Cursor Context auto-loads → just paste spec
Gemini CLI Command available in .gemini/commands/
opencode Command available in .opencode/commands/
Codex CLI Command available in .codex/commands/
Amp Command available in .amp/commands/
Kilo Code Command available in .kilocode/commands/

The framework uses each IDE's native configuration system. No plugins. No extensions. Just markdown files in the directories your IDE already reads.


What Makes This Different

There are other AI coding tools. Here's why this approach is different:

1. Process, Not Just Code Generation

Most tools generate code. This framework executes an entire SDLC — requirements, architecture, design, implementation, testing, security, review, deployment, monitoring. The code generation is Phase 5 of 11.

2. Zero Vendor Lock-In

Agents are markdown files. The framework has no runtime. You can:

  • Switch IDEs freely (just re-run sdlc init with a different integration)
  • Fork and customize any agent prompt
  • Cherry-pick individual phases instead of running all 11
  • Read every agent's instructions in plain English

3. Quality Built In, Not Bolted On

Security scanning isn't an afterthought — it's Phase 7 with 4 specialized agents. Code review isn't optional — it's Phase 8 with 3 blind reviewers. Observability isn't "we'll add it later" — it's Phase 10 with SLOs, alerting, and runbooks.

4. Project-Type Agnostic

This isn't just for REST APIs. The framework handles:

  • APIs (REST, GraphQL, gRPC)
  • CLI tools (commands, flags, exit codes)
  • Frontends (React, Vue, component specs)
  • Mobile apps (screen flows, navigation)
  • ML pipelines (feature stores, model serving)
  • Libraries (public API surface, semver)
  • Desktop apps (window management, native APIs)
  • Embedded systems (hardware interfaces, protocols)

The Interface Designer adapts its output format to your project type. The Data Model Designer handles relational databases, NoSQL, file storage, in-memory state, and event stores.


Show Me the Money: A Real Example

Here's what happens when you paste this into your IDE:

PROJ-101 User Registration

As a new user, I want to register with email and password.

Acceptance Criteria:
- Valid email + password → 201 Created
- Duplicate email → 409 Conflict

Tech Stack: Python, FastAPI, PostgreSQL
Enter fullscreen mode Exit fullscreen mode

The orchestrator kicks off and you get:

  1. Structured requirements with functional/non-functional split
  2. Risk analysis (what if email service is down?)
  3. User stories with Given/When/Then criteria
  4. Task breakdown with effort estimates
  5. Architecture decisions (why FastAPI, why PostgreSQL, documented as ADRs)
  6. OpenAPI spec for the registration endpoint
  7. Database schema for the users table
  8. Working FastAPI code with proper error handling
  9. Unit and integration tests with ≥80% coverage
  10. Security scan (no hardcoded secrets, dependencies audited, OWASP reviewed)
  11. 3 independent code reviews with remediation
  12. Dockerfile + GitHub Actions pipeline
  13. Health check endpoint + SLOs + alerting config

All from a 6-line JIRA story.


Lessons Learned Building Multi-Agent Systems

Agents need constraints, not capabilities

The biggest mistake in agent design is making them too capable. A "do everything" agent produces mediocre results. A narrowly-scoped agent with clear constraints produces excellent results. Each of our 29 subagents does exactly one thing.

Memory beats reasoning

An agent that remembers its mistakes outperforms one with better reasoning but no memory. Our learnings system captures every failure — "tried to use SQLAlchemy 2.0 async syntax but the project uses 1.4" — and prevents the same mistake in every future iteration.

Blind review prevents sycophancy

When one agent reviews another agent's work, it tends to agree. The fix: make reviewers blind (no shared context) and run them in parallel. Our 3-reviewer system catches issues that a single pass never would.

State machines over chains

LangChain-style prompt chains are fragile. A state machine with explicit phases, gates, and retry logic is far more robust. Our orchestrator is essentially a state machine driven by orchestrator.json — every phase transition is explicit and reversible.

Markdown is the right abstraction

Agents don't need Python classes or YAML configs. They need clear instructions in natural language. Markdown is human-readable, git-friendly, and every AI model understands it natively. Our entire framework is ~40 markdown files.


Try It

# No install required
uvx --from git+https://github.com/bitbitcodes/autonomous-sdlc.git sdlc init .

# Or install persistently
pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .
Enter fullscreen mode Exit fullscreen mode

Then select the sdlc.orchestrator agent in your IDE and paste your spec.

GitHub: github.com/bitbitcodes/autonomous-sdlc

The framework is MIT licensed. Star it, fork it, break it, improve it. PRs welcome.


If you found this useful, follow me for more on AI-driven development, multi-agent systems, and developer tooling.

Top comments (0)