You paste a JIRA story into your IDE. Five minutes later, you have structured requirements, architecture decisions, interface contracts, a working implementation, test suites, a security audit, CI/CD pipelines, and monitoring dashboards — all generated autonomously by 40 coordinated AI agents.
No, this isn't a demo. It's a framework called autonomous-sdlc, and it's open source.
pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .
Let me explain what it is, why I built it, and what I learned about making AI agents actually work together.
The Problem: AI Coding Assistants Are Powerful but Unstructured
Every developer has experienced this: you open Copilot or Claude, paste a feature request, and start coding. The AI is helpful — brilliant, even — but it operates without a process. There's no requirements analysis. No architecture review. No security scan. No quality gate preventing you from shipping a broken abstraction.
You're essentially pair programming with a genius who has no discipline.
What if, instead of a single assistant, you had an entire engineering organization — product managers, architects, developers, testers, security engineers, and reviewers — all as AI agents, all following a rigorous SDLC?
That's what autonomous-sdlc does.
The Architecture: 40 Markdown Agents, Zero Dependencies
The framework has zero runtime dependencies. No server. No process. No Docker container. Every agent is a markdown file.
40 agents = 1 orchestrator + 10 stage agents + 29 subagents
The Three Tiers
Tier 1: Orchestrator — The parent agent that controls the entire lifecycle. It reads your spec, determines complexity, selects the right team of agents, and drives 11 sequential phases. It enforces quality gates between every phase and manages retries when things fail.
Tier 2: Stage Agents (10) — Each owns a phase: Product, Story-Tasks, Architecture, Design, Development, Testing, Security, Review, DevOps, Observability. They dispatch specialized subagents to do the actual work.
Tier 3: Subagents (29) — The workers. A Requirement Parser that structures your messy PRD. A Tech Stack Advisor that evaluates options with trade-off analysis. A Code Generator that implements features. A Secret Scanner that finds hardcoded API keys. And 25 others.
No Magic — Just Structured Prompts
Every agent follows the same format:
## GOAL
[What success looks like — measurable outcome]
## CONSTRAINTS
[Hard limits — what you cannot do]
## CONTEXT
[Files to read, previous attempts, related decisions]
## OUTPUT
[Exact deliverables expected — file paths, formats]
The AI IDE reads the agent's .md file, adopts that persona, executes, and hands off to the next agent. That's it. The "multi-agent orchestration" is just structured markdown and a state machine.
The 11 Phases
The framework executes a complete SDLC in 11 phases:
| Phase | What Happens | Key Output |
|---|---|---|
| 0. Bootstrap | Parse input spec, detect complexity, select team | Normalized spec, project config |
| 1. Product | Requirements analysis, risks, acceptance criteria | Structured requirements, Given/When/Then criteria |
| 2. Story-Tasks | Decompose into epics, stories, tasks with dependencies | Story map, task queue, dependency graph |
| 3. Architecture | High-level design, tech stack selection, ADRs | System design, Architecture Decision Records |
| 4. Design | Interface contracts, data models, integration plans | OpenAPI specs, DB schemas, NFR assessment |
| 5. Development | Implement the codebase | Production code, written directly into your repo |
| 6. Testing | Unit, integration, and regression tests | Test suites with ≥80% coverage target |
| 7. Security | Secret scanning, dependency audit, OWASP review | Security report, remediated vulnerabilities |
| 8. Review | 3 blind reviewers assess the full codebase | Review verdicts, remediation |
| 9. DevOps | CI/CD pipelines, Docker, deployment configs | GitHub Actions, Dockerfile, IaC |
| 10. Observability | SLOs, alerting, health checks, runbooks | Monitoring config, dashboards, runbooks |
Every phase has a quality gate. If the gate fails, the agent self-corrects: captures the error, analyzes root cause, updates learnings, and retries (max 3 attempts). If it still fails, it escalates to the human.
The Blind Review System
This is the feature I'm most proud of.
After every phase (except Bootstrap and Review itself), the orchestrator dispatches 3 independent reviewers:
- Code Review Agent — SOLID principles, patterns, correctness
- Maintainability Reviewer — Tech debt, complexity, readability
- Performance Reviewer — Bottlenecks, N+1 queries, caching
They run simultaneously and blindly — no shared context between them. This prevents groupthink and sycophantic agreement. An anti-sycophancy check verifies they didn't just rubber-stamp the work.
If ≥2 reviewers find critical issues, the phase fails and gets reworked. This catches problems early — a flawed architecture in Phase 3 is caught before any code is written in Phase 5.
Memory That Persists Across Sessions
AI IDEs have token limits. Long conversations get truncated. Previous sessions are forgotten.
autonomous-sdlc solves this with a 3-tier memory system:
- Episodic memory — Full traces of what happened in each phase
- Semantic memory — Patterns extracted from the project (e.g., "this codebase uses repository pattern")
- Learnings — Mistakes that were made and how they were fixed
Plus CONTINUITY.md — a working memory file the orchestrator reads at the start of every turn and updates at the end. When your IDE session expires and you start a new one, the orchestrator reads CONTINUITY.md and picks up exactly where it left off.
## Current State
- Phase: 5-development
- Status: IN_PROGRESS
- Current task: Implement user registration endpoint
- Blockers: None
- Last action: Generated database migration for users table
No context lost. No repeated work. Just seamless continuation.
Works With 9 AI IDEs
One sdlc init scaffolds everything into a .sdlc/ directory and configures your IDE automatically:
| IDE | How It Works |
|---|---|
| GitHub Copilot | Agent appears in dropdown → select → paste spec |
| Windsurf | Type /sdlc.orchestrator → paste spec |
| Claude Code | Use /sdlc-orchestrator command |
| Cursor | Context auto-loads → just paste spec |
| Gemini CLI | Command available in .gemini/commands/
|
| opencode | Command available in .opencode/commands/
|
| Codex CLI | Command available in .codex/commands/
|
| Amp | Command available in .amp/commands/
|
| Kilo Code | Command available in .kilocode/commands/
|
The framework uses each IDE's native configuration system. No plugins. No extensions. Just markdown files in the directories your IDE already reads.
What Makes This Different
There are other AI coding tools. Here's why this approach is different:
1. Process, Not Just Code Generation
Most tools generate code. This framework executes an entire SDLC — requirements, architecture, design, implementation, testing, security, review, deployment, monitoring. The code generation is Phase 5 of 11.
2. Zero Vendor Lock-In
Agents are markdown files. The framework has no runtime. You can:
- Switch IDEs freely (just re-run
sdlc initwith a different integration) - Fork and customize any agent prompt
- Cherry-pick individual phases instead of running all 11
- Read every agent's instructions in plain English
3. Quality Built In, Not Bolted On
Security scanning isn't an afterthought — it's Phase 7 with 4 specialized agents. Code review isn't optional — it's Phase 8 with 3 blind reviewers. Observability isn't "we'll add it later" — it's Phase 10 with SLOs, alerting, and runbooks.
4. Project-Type Agnostic
This isn't just for REST APIs. The framework handles:
- APIs (REST, GraphQL, gRPC)
- CLI tools (commands, flags, exit codes)
- Frontends (React, Vue, component specs)
- Mobile apps (screen flows, navigation)
- ML pipelines (feature stores, model serving)
- Libraries (public API surface, semver)
- Desktop apps (window management, native APIs)
- Embedded systems (hardware interfaces, protocols)
The Interface Designer adapts its output format to your project type. The Data Model Designer handles relational databases, NoSQL, file storage, in-memory state, and event stores.
Show Me the Money: A Real Example
Here's what happens when you paste this into your IDE:
PROJ-101 User Registration
As a new user, I want to register with email and password.
Acceptance Criteria:
- Valid email + password → 201 Created
- Duplicate email → 409 Conflict
Tech Stack: Python, FastAPI, PostgreSQL
The orchestrator kicks off and you get:
- Structured requirements with functional/non-functional split
- Risk analysis (what if email service is down?)
- User stories with Given/When/Then criteria
- Task breakdown with effort estimates
- Architecture decisions (why FastAPI, why PostgreSQL, documented as ADRs)
- OpenAPI spec for the registration endpoint
- Database schema for the users table
- Working FastAPI code with proper error handling
- Unit and integration tests with ≥80% coverage
- Security scan (no hardcoded secrets, dependencies audited, OWASP reviewed)
- 3 independent code reviews with remediation
- Dockerfile + GitHub Actions pipeline
- Health check endpoint + SLOs + alerting config
All from a 6-line JIRA story.
Lessons Learned Building Multi-Agent Systems
Agents need constraints, not capabilities
The biggest mistake in agent design is making them too capable. A "do everything" agent produces mediocre results. A narrowly-scoped agent with clear constraints produces excellent results. Each of our 29 subagents does exactly one thing.
Memory beats reasoning
An agent that remembers its mistakes outperforms one with better reasoning but no memory. Our learnings system captures every failure — "tried to use SQLAlchemy 2.0 async syntax but the project uses 1.4" — and prevents the same mistake in every future iteration.
Blind review prevents sycophancy
When one agent reviews another agent's work, it tends to agree. The fix: make reviewers blind (no shared context) and run them in parallel. Our 3-reviewer system catches issues that a single pass never would.
State machines over chains
LangChain-style prompt chains are fragile. A state machine with explicit phases, gates, and retry logic is far more robust. Our orchestrator is essentially a state machine driven by orchestrator.json — every phase transition is explicit and reversible.
Markdown is the right abstraction
Agents don't need Python classes or YAML configs. They need clear instructions in natural language. Markdown is human-readable, git-friendly, and every AI model understands it natively. Our entire framework is ~40 markdown files.
Try It
# No install required
uvx --from git+https://github.com/bitbitcodes/autonomous-sdlc.git sdlc init .
# Or install persistently
pip install git+https://github.com/bitbitcodes/autonomous-sdlc.git
sdlc init .
Then select the sdlc.orchestrator agent in your IDE and paste your spec.
GitHub: github.com/bitbitcodes/autonomous-sdlc
The framework is MIT licensed. Star it, fork it, break it, improve it. PRs welcome.
If you found this useful, follow me for more on AI-driven development, multi-agent systems, and developer tooling.
Top comments (0)