TL;DR: Most teams use AI tools to generate a function here and autocomplete a line there. This article walks through a structured, open-source workflow that brings Claude Code into every phase of software delivery — from discovery to post-deployment observability — and shows you where it actually saves time.
Think this could be useful after reading? I've added some example prompts here: https://dev.to/anderson_leite/prompt-examples-ai-sdlc-workflow-in-practice-42f0
The Problem with How We Use AI in Development Today
If you're on an engineering team in 2026, you're almost certainly using some form of AI assistance. GitHub Copilot, Claude, ChatGPT. There's no shortage of options.
But ask most engineers how they use it, and the answer looks something like: "I ask it to write boilerplate," "I use it to explain code I didn't write," or "I paste in an error message and see what it says."
That's fine. That's genuinely useful. But it's leaving most of the value on the table.
Real software delivery isn't just writing code. It's:
- Understanding the problem before writing a single line
- Designing for the right architecture (and documenting why)
- Planning the implementation so it doesn't blow up in week 3
- Doing security analysis that usually gets skipped under time pressure
- Planning a deployment that won't need a 2am rollback
- Setting up the observability to know if something broke after you shipped
- Capturing lessons learned so the next feature goes better
AI can assist with all of these. Almost nobody has set it up to do so.
What the Workflow Is
It started with DenizOkcu's claude-code-ai-development-workflow, a clean 4-phase slash-command workflow (Research → Plan → Execute → Review) for Claude Code. No memory system, no security pipeline, no intelligence layer. Just structured prompts that turned Claude Code into something more than an autocomplete engine.
I forked it and extended it into a 10-phase SDLC framework that covers discovery, architecture, security (with pentesting and AI threat modeling), deployment, observability, and retrospectives. Along the way I added a code-intelligence pipeline, semantic retrieval, n8n automation, web scraping via Firecrawl, and a self-improving memory system. You can see a full overview at ai-sdlc.andersonleite.me.
You drop a .claude/ directory into your project and immediately get access to commands like /discover, /research, /design-system, /plan, /implement, /review, /security, /deploy-plan, /observe, and /retro.
Each command generates structured markdown artifacts (decision records, implementation plans, security audits, observability specs) that live in your repo alongside your code.
It's not magic. It's a well-structured prompt system that gives AI the context and constraints it needs to do useful work at each stage of delivery.
What It Doesn't Do
Before diving into the details, let's set expectations:
It doesn't replace your engineers. The artifacts it produces are starting points, not final deliverables. The security audit is a baseline, not a penetration test, though the optional /security/pentest phase with Shannon gets much closer to one.
It doesn't integrate with your existing tools out of the box. It's a file-based workflow. It produces markdown and HTML. Connecting that to Jira, Confluence, or your CI/CD pipeline is on you, though the /n8n integration gives you a path to automating those connections.
It requires Claude Code. This isn't a plugin for your IDE or a GitHub Action. It runs through Claude Code's CLI. If your team isn't comfortable with terminal-based tools, there's a learning curve.
The optional layers have dependencies. claude-context needs Ollama and Milvus (or cloud equivalents). Shannon needs Docker. Firecrawl needs either a cloud API key or a self-hosted instance. The core workflow has zero dependencies, but the extras do.
With that out of the way, here's what each phase actually does.
The 10 Phases, Explained for Humans
Here's what each phase actually does, and more importantly, why you'd care.
Phase 1 — Discover (/discover)
You describe a feature in plain language. The workflow scans your project, detects your tech stack (TypeScript? Terraform? Kubernetes? FastAPI?), and produces a DISCOVERY.md that scopes the work, identifies risks, and creates a STATUS.md that acts as a progress dashboard for everything that follows.
But discovery now does more than scoping. It also generates a repository map and symbol index: a structural fingerprint of your codebase that subsequent phases use to navigate files intelligently instead of guessing. This is part of the code-intelligence layer described below.
Why it matters: How many projects have failed because the scope wasn't clear at the start? This turns a vague brief into a structured starting point — without a meeting.
Phase 2 — Research (/research)
Claude digs into your codebase to understand existing patterns, dependencies, and potential conflict areas before any design happens.
In projects with 50+ files, the research phase activates the full code-intelligence pipeline: building a dependency graph, running targeted searches scoped to the candidate files identified during discovery, reranking results by relevance, and assembling a context pack capped at 8 files. The result is that the AI reads the right files instead of wasting tokens on irrelevant ones.
Why it matters: LLMs that don't understand your existing architecture will suggest things that contradict it. This phase grounds the AI in your project, not a generic one.
Phase 3 — Design (/design-system)
This generates ARCHITECTURE.md, Architecture Decision Records (ADRs), and a PROJECT_SPEC.md. ADRs document why a choice was made — not just what the choice was.
Why it matters for managers: ADRs are one of the most valuable things a team can produce, and one of the most consistently skipped. Having them generated automatically means they actually exist.
Why it matters for engineers: You get a design document you can review and push back on before spending three weeks building the wrong thing.
Phase 4 — Plan (/plan)
Produces an IMPLEMENTATION_PLAN.md with phased tasks, acceptance criteria, and a test strategy.
Why it matters: Planning is where scope creep starts. A structured plan gives the team (and the AI) a contract to execute against.
Phase 5 — Implement (/implement)
Code generation, guided by everything produced in the previous phases. Multi-file, with tests.
Why it matters: This is where most AI workflows start. This one arrives at implementation with full context: the architecture decisions, the codebase patterns, the test strategy, and (if the code-intelligence layer is active) a precise understanding of which files to touch and which to leave alone. The output is dramatically better for it.
Phase 6 — Review (/review)
Generates a CODE_REVIEW.md with an approval or rejection status based on pattern matching and checklist verification. If blocking issues are found, the workflow enters an automatic fix loop (up to 3 iterations) before re-reviewing.
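The bounded fix loop described above can be sketched as a simple retry, where `review` and `apply_fixes` are hypothetical stand-ins for the workflow's actual skills and only the control flow is illustrated:

```python
# Sketch of the review → fix → re-review loop: on blocking issues,
# apply fixes and re-review, up to 3 fix iterations. Function names
# are placeholders, not the workflow's real interfaces.

MAX_FIX_ITERATIONS = 3

def review_with_fix_loop(review, apply_fixes):
    """Run review; on blocking issues, fix and re-review up to 3 times."""
    for attempt in range(1, MAX_FIX_ITERATIONS + 1):
        result = review()
        if not result["blocking_issues"]:
            return {"status": "approved", "attempts": attempt}
        apply_fixes(result["blocking_issues"])
    # Final re-review after the last fix attempt
    result = review()
    status = "approved" if not result["blocking_issues"] else "rejected"
    return {"status": status, "attempts": MAX_FIX_ITERATIONS + 1}
```

The cap matters: without it, an AI reviewer and an AI fixer can ping-pong indefinitely on an issue neither can resolve.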
Why it matters for teams: It's not replacing human review. It's raising the floor, catching the obvious issues before a senior engineer has to spend time on them.
Phase 7 — Security (/security)
This is where the workflow has evolved the most since launch. Security is no longer a single phase — it's a four-stage DevSecOps pipeline:
7a — Static Security Audit (/security): Produces a SECURITY_AUDIT.md covering OWASP and STRIDE frameworks, plus a dependency vulnerability report. Every Critical or High finding must include a working proof of concept. "No exploit, no report" is the guiding principle.
7b — Dynamic Penetration Testing (/security/pentest): Runs Shannon, an autonomous AI pentester, against your staging environment inside Docker. The output is a PENTEST_REPORT.md containing only confirmed, reproduced exploits, not theoretical possibilities. This phase is optional but recommended for anything touching authentication, payments, or user data.
7c — AI/LLM Threat Modeling (/security/redteam-ai): Only activates if your stack includes LLM components. Produces an AI_THREAT_MODEL.md covering prompt injection surfaces, alignment constraints, and model-specific attack vectors. If you're not shipping AI features, this phase is automatically skipped.
7d — Hardening (/security/harden): Aggregates all findings from 7a through 7c, prioritizes them (P0 through P3), implements the P0 fixes immediately, and creates tracked issues for the rest. A HARDEN_PLAN.md documents the fix plan, regression tests, and re-verification results.
Why it matters: Security review is the phase that most often gets cut when a deadline approaches. Having a multi-layered automated baseline (from static analysis through dynamic pentesting to AI-specific threat modeling) means something meaningful gets checked even under pressure. The hardening loop ensures findings don't just get documented; they get fixed.
Phase 8 — Deploy (/deploy-plan)
Creates a DEPLOY_PLAN.md with rollout strategy, feature flag guidance, and a rollback playbook.
Why it matters for infra/DevOps teams: "We didn't have a rollback plan" is an embarrassingly common post-incident finding. This generates one by default.
Phase 9 — Observe (/observe)
Outputs OBSERVABILITY.md with metric definitions, alert thresholds, and dashboard specs.
Why it matters: Observability is either designed in at the start or retrofitted later at great pain. Generating a structured spec at deployment time, when observability is cheapest to implement, avoids the painful retrofit.
Phase 10 — Retro (/retro)
Generates a RETROSPECTIVE.md and automatically appends lessons learned to CLAUDE.md, the project's AI instruction file.
Why it matters: This is the self-improving piece. The next feature the AI helps build benefits from the lessons of this one. The system gets better over time, even across sessions.
What's New: Code Intelligence, Retrieval, and Extra Capabilities
The workflow started as a 4-phase command set in the original DenizOkcu repo. The fork has grown it significantly. Here's what was added and why it matters.
The Code-Intelligence Layer — Better Results with Fewer Tokens
One of the biggest problems with AI-assisted development at scale is context. An LLM has a finite context window, and if you feed it the wrong files, you get wrong answers. The code-intelligence layer addresses this with a multi-level pipeline that's worth understanding even if you never touch the implementation:
Level 1 — Repo Map + Symbol Index (~3K tokens): During /discover, the workflow scans your project using Glob and Grep (no external tools required) and produces a structural map: file tree, exported symbols, type/name/file/line entries. This is stored in DISCOVERY.md and acts as the foundation for every subsequent phase.
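The symbol index can be approximated with nothing but stdlib globbing and a regex, analogous to the Glob/Grep approach described above. This sketch targets TypeScript-style exports; the pattern is an illustrative assumption, not the workflow's actual scanner:

```python
# Minimal sketch of a repo map + symbol index: walk the file tree and
# extract exported symbol names per file. The export regex is a
# simplified assumption covering only common TypeScript declarations.
import pathlib
import re

EXPORT_RE = re.compile(r"export\s+(?:function|class|const)\s+(\w+)")

def build_symbol_index(root: str, pattern: str = "**/*.ts"):
    """Return {relative_path: [exported symbol names]} for the repo."""
    index = {}
    for path in pathlib.Path(root).glob(pattern):
        text = path.read_text(encoding="utf-8", errors="ignore")
        symbols = EXPORT_RE.findall(text)
        if symbols:
            index[str(path.relative_to(root))] = symbols
    return index
```

Even this crude map is enough for later phases to answer "which files mention authentication?" without reading every file in full.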
Level 1b — Dependency Graph (repos with 50+ files): During /research, the workflow traces import/export statements to build a file-level adjacency list: who imports whom, who tests whom. This lives only in the LLM's context window (not persisted), so it costs nothing to store but dramatically improves accuracy.
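The adjacency list itself is simple to picture. This sketch builds one by scanning ES-module-style relative imports; the regex and extension handling are illustrative assumptions, not the workflow's real tracer:

```python
# Sketch of Level 1b: a file-level dependency graph built by scanning
# import statements. Assumes relative ES-module imports that omit the
# file extension, as is common in TypeScript projects.
import pathlib
import re

IMPORT_RE = re.compile(r"""import\s+.*?from\s+['"](\.[^'"]+)['"]""")

def build_dependency_graph(root: str, pattern: str = "**/*.ts"):
    """Return {file: set(files it imports)} as an adjacency list."""
    root_path = pathlib.Path(root)
    graph = {}
    for path in root_path.glob(pattern):
        deps = set()
        for target in IMPORT_RE.findall(path.read_text(errors="ignore")):
            resolved = (path.parent / target).resolve()
            # Imports usually omit the extension; try adding it back.
            for candidate in (resolved, resolved.with_suffix(".ts")):
                if candidate.is_file():
                    deps.add(str(candidate.relative_to(root_path.resolve())))
                    break
        graph[str(path.relative_to(root_path))] = deps
    return graph
```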
Level 2 — Targeted Search: Instead of reading your entire codebase, the research phase queries only the candidate files identified in Level 1. If the optional claude-context retrieval server is configured, this uses hybrid BM25 + vector search over AST-indexed code chunks. If not, it falls back to Grep and Read. No degradation, just less precision on very large repos.
BM25 is a keyword-ranking algorithm used by ElasticSearch and Lucene; vector search matches by semantic meaning; AST-indexed means the code is split at function/class boundaries using Tree-sitter rather than arbitrary line ranges. See these links for deeper dives:
- https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
- https://weaviate.io/blog/hybrid-search-explained and https://www.meilisearch.com/blog/hybrid-search-rag
- https://medium.com/@email2dineshkuppan/semantic-code-indexing-with-ast-and-tree-sitter-for-ai-agents-part-1-of-3-eb5237ba687a and the official source https://github.com/tree-sitter/tree-sitter
Level 2b — Reranking (repos with 50+ files, 5+ candidates): Raw search results get scored by keyword overlap (40%), dependency proximity (35%), and file-type relevance (25%). The top 8 candidates are re-ordered by composite score before the AI reads them.
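The composite score is easy to express directly. The weights below come from the description above; the per-signal scorers are simplified assumptions:

```python
# Sketch of the Level 2b composite reranking score: keyword overlap
# (40%), dependency proximity (35%), file-type relevance (25%).
# Only the weighting scheme comes from the workflow's description.

WEIGHTS = {"keyword": 0.40, "proximity": 0.35, "filetype": 0.25}

def keyword_overlap(query_terms, file_terms):
    """Fraction of distinct query terms that appear in the file."""
    if not query_terms:
        return 0.0
    return len(set(query_terms) & set(file_terms)) / len(set(query_terms))

def rerank(candidates, top_k=8):
    """candidates: list of dicts with per-signal scores in [0, 1]."""
    def composite(c):
        return sum(WEIGHTS[k] * c[k] for k in WEIGHTS)
    return sorted(candidates, key=composite, reverse=True)[:top_k]
```

The `top_k=8` cap is what feeds the next step: only the eight highest-scoring candidates survive into the context pack.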
Level 3 — Context Pack (max 8 files): The final step assembles the seed files plus their 1-hop dependency imports and test files, applying progressive read depth (full for core files, partial for supporting ones, sections-only for large files).
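Assembly of the pack can be sketched on top of the dependency graph: seed files first, then their 1-hop imports and companion tests, stopping at the cap. The graph format and the test-file naming convention here are illustrative assumptions:

```python
# Sketch of Level 3 context-pack assembly: seed files plus their
# 1-hop imports and test files, capped at 8. The ".test.ts" naming
# convention is an assumption for illustration.

MAX_PACK_SIZE = 8

def build_context_pack(seeds, graph, all_files):
    """seeds: ranked list of files; graph: {file: set(imports)}."""
    pack = []
    for f in seeds:
        candidates = [f]
        candidates += sorted(graph.get(f, set()))   # 1-hop imports
        test = f.replace(".ts", ".test.ts")         # companion test file
        if test in all_files:
            candidates.append(test)
        for c in candidates:
            if c not in pack:
                pack.append(c)
            if len(pack) >= MAX_PACK_SIZE:
                return pack
    return pack
```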
The payoff: Instead of feeding Claude 50 files and hoping for the best, the pipeline delivers 8 precisely-selected files with relationship context. This means better answers and lower token usage, which directly translates to lower cost if you're on a pay-per-token plan. On a Team/Max subscription, it means less context window pressure and fewer "lost in context" hallucinations.
The design philosophy is pragmatic: The entire Level 1 pipeline works with Glob and Grep. Zero external dependencies. The optional retrieval layer (claude-context) adds precision for large repos but is never required. If services are down, everything degrades gracefully to the file-based tools that Claude Code already has.
Semantic Code Retrieval (claude-context) — Optional, Powerful
For teams working with larger codebases (200+ files), the workflow supports an optional retrieval layer built on the claude-context MCP server. This isn't custom-built tooling — it's an adopted, maintained package that provides:
- Hybrid BM25 + dense vector search over AST-indexed code chunks
- Tree-sitter parsing that breaks code into semantic units (functions, classes, methods) rather than arbitrary line ranges
- Merkle tree tracking for incremental re-indexing — only changed files get re-processed
- Embedding via Ollama (nomic-embed-text) for fully local operation, or cloud providers (OpenAI, Voyage, Gemini) if you prefer
- Storage via Docker Milvus (local) or Zilliz Cloud (managed)
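The incremental re-indexing idea is worth a minimal sketch: compare content hashes against the last index and re-process only what changed. claude-context's real Merkle tree implementation is more efficient than this flat comparison, which only illustrates the principle:

```python
# Sketch of hash-based incremental re-indexing: only files whose
# content hash differs from the previous index get re-processed.
# A flat-dict simplification of the Merkle tree tracking described above.
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def files_to_reindex(previous: dict, current_files: dict):
    """previous: {path: hash}; current_files: {path: file bytes}.
    Returns the set of paths that are new or changed."""
    changed = set()
    for path, data in current_files.items():
        if previous.get(path) != content_hash(data):
            changed.add(path)
    return changed
```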
Setup is handled by /retrieval/setup, which walks you through choosing your embedding provider and storage backend. Once configured, the research and implementation skills automatically use search_code for targeted retrieval, and the /retrieval command gives you manual search and index management.
The key design decision: retrieval is always optional. Every skill works identically without it. If claude-context isn't configured or the services are unavailable, skills silently fall back to Glob/Grep/Read. This means you can start using the workflow immediately and add retrieval later when your codebase grows large enough to benefit from it.
/visualize — Generate Rich HTML Diagrams, Not ASCII Art
The visualization layer is built on the Visual Explainer skill, and it solves a specific annoyance: when an AI assistant produces a complex comparison table, architecture diagram, or implementation plan, you get ASCII art in the terminal that's barely readable.
The /visual/ commands instead generate self-contained HTML pages (single files with inline CSS, Mermaid diagrams, Chart.js dashboards, and proper typography) that open in your browser. The output goes to ~/.agent/diagrams/, persists across sessions, and can be shared via /visual/share (one-command Vercel deployment).
Available commands include:
| Command | What It Does |
|---|---|
| `/visual/generate-web-diagram` | HTML diagram for any topic — architecture, flowcharts, ER diagrams |
| `/visual/diff-review` | Before/after architecture comparison with code review |
| `/visual/plan-review` | Compare a plan against the actual codebase with risk assessment |
| `/visual/project-recap` | Mental model snapshot for context-switching back to a project |
| `/visual/generate-slides` | Magazine-quality slide deck presentation |
| `/visual/generate-visual-plan` | Feature implementation visualized as a rich page |
| `/visual/fact-check` | Verify a document's accuracy against actual code |
| `/visual/share` | Deploy any HTML page to Vercel with one command |
The proactive table rule is worth highlighting: the skill is configured to automatically generate an HTML page whenever it would otherwise render an ASCII table with 4+ rows or 3+ columns. You don't have to ask — it just presents the data in a readable format by default.
The skill also includes opinionated anti-slop guardrails: forbidden fonts (Inter, Roboto as primary), forbidden color palettes (the cyan-magenta-purple neon combination), and forbidden patterns (emoji in section headers, gradient text on headings, glowing animated box shadows). These constraints exist because without them, AI-generated visualizations all look identical — and identically generic. The guardrails force distinctive, intentional design choices.
Why it matters: Documentation and planning artifacts that look good are more likely to be read, shared, and acted on. A rich HTML page with proper diagrams and typography communicates more effectively than markdown in a terminal.
/n8n — Workflow Automation from Inside Claude Code
If your team uses n8n for workflow automation, the /n8n command brings n8n's full capabilities into your Claude Code session via MCP (Model Context Protocol).
After running /n8n/setup (which configures the connection — hosted, Docker, npx, or local dev), you can:
- Search and explore n8n's node library and community templates
- Validate workflow configurations before deploying
- Build workflows by describing what you want in natural language
- Manage running workflows — list, activate, deactivate, trigger, debug
- Inspect executions — see what happened, why something failed
The integration operates in two modes: Basic (always available) gives you documentation, node search, template browsing, and validation. Full (requires an n8n instance connection with API key) adds workflow CRUD, execution management, and live triggering.
Why it matters: n8n workflows often interact with the same systems your code does — Slack notifications, database operations, CI/CD triggers, webhook handlers. Being able to search, build, and debug those workflows from the same terminal session where you're writing code removes a significant context-switch. For teams running the Ticket-to-Code agentic pipeline (where n8n orchestrates multi-agent workflows), this integration closes the loop between the AI development workflow and the automation layer.
/firecrawl — Web Scraping When WebFetch Isn't Enough
Claude Code has a built-in WebFetch tool, but it struggles with JavaScript-rendered pages, anti-bot protection (Cloudflare, etc.), and structured data extraction. The /firecrawl command integrates Firecrawl as a fallback via MCP.
After running /firecrawl/setup, you get:
| Tool | Purpose |
|---|---|
| `firecrawl_scrape` | Scrape a single URL to clean markdown or structured data |
| `firecrawl_crawl` | Recursively crawl a site with depth and limit control |
| `firecrawl_search` | Web search with automatic content extraction |
| `firecrawl_map` | Discover all URLs on a website (sitemap generation) |
| `firecrawl_extract` | LLM-powered structured data extraction with a schema |
Why it matters: Research phases often need to pull information from external documentation, API references, or competitor analysis. When the built-in fetch fails on a JS-heavy site, having Firecrawl as a configured fallback means you don't have to leave your terminal to go copy-paste from a browser. It also enables use cases like extracting structured data from product pages or crawling documentation sites for reference material.
Use Cases Worth Highlighting
For Engineering Managers
You're shipping features but struggling with documentation debt, inconsistent architecture decisions, and security reviews that happen too late. This workflow produces structured artifacts at every phase, not as a bureaucratic burden, but as a side effect of working. ADRs, specs, and security reports exist because the workflow generates them, not because someone had to find time to write them.
The STATUS.md dashboard also gives you a single place to see where any feature is in its lifecycle, which reduces the need for "what's the status?" interruptions.
The code-intelligence layer adds another dimension: cost predictability. By ensuring the AI reads 8 precisely-selected files instead of 50, you're spending tokens (and money) on relevant context, not noise. If your team is on a pay-per-token plan, the difference is measurable.
For Software Engineers
Not every ticket needs 10 phases. The repo itself is clear that a typo fix doesn't need a security audit. But for any feature of meaningful complexity, the structure helps you think more clearly, produce better-documented work, and catch problems earlier.
The /hotfix command compresses the workflow into a rapid Research → Fix → Review → Deploy loop for emergencies, keeping structure without slowing you down. The /sdlc/continue command handles session resumption: if you get interrupted mid-workflow, the next session detects incomplete work and offers to pick up where you left off.
The language-specific expert commands (/language/typescript-pro, /language/python-pro, etc.) activate best practices tailored to your stack — strict typing patterns, framework-specific conventions, linting recommendations.
And the /visual/ commands change how you present your work. Instead of pasting ASCII tables into a PR description, you can generate a rich HTML diff review or plan review that your team can actually read.
For Infrastructure and Platform Engineers
The infrastructure-specific commands are where this gets interesting for your world.
/language/terraform-pro enforces module patterns, for_each over count, state isolation, and security scanning with tflint and trivy. /language/kubernetes-pro covers Deployments, RBAC, NetworkPolicies, and GitOps patterns. /language/ansible-pro addresses idempotency, Vault encryption, and Molecule testing.
Cloud-specific commands exist for AWS, Azure, and GCP, each grounded in their respective Well-Architected Frameworks.
The expanded security pipeline (7a → 7b → 7c → 7d) is particularly relevant here. Static analysis catches configuration issues; dynamic pentesting with Shannon catches what static analysis misses; the hardening phase ensures findings become fixes, not just entries in a report nobody reads. For infrastructure that handles sensitive data, this is the difference between "we checked the boxes" and "we actually tested the attack surface."
The /observe and /deploy-plan phases produce the documentation that ops teams often never receive from product teams. And the /n8n integration means you can build and debug automation workflows (CI/CD triggers, alerting pipelines, incident response flows) from the same terminal where you manage infrastructure code.
For Teams Adopting AI for the First Time
If your organization is just starting to integrate AI into development and you're not sure where to begin, this workflow gives you structure. Rather than everyone on the team experimenting individually with "just ask the AI," you get a consistent, repeatable process that produces auditable artifacts.
The model routing is also worth understanding: phases requiring deep reasoning (research, design, planning, implementation) use Claude Opus; checklist-style phases (review, security, deploy, observe, retro) use Claude Sonnet, which is significantly cheaper. This isn't just a cost optimization. It's a signal that you don't need the most powerful model for every task. The code-intelligence layer reinforces this: by selecting context precisely, even the checklist phases get better inputs without needing a more expensive model.
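The routing described above amounts to a small phase-to-model mapping. This sketch reflects the article's description; the model identifiers and phase keys are placeholders, not the workflow's actual configuration:

```python
# Sketch of phase → model routing: deep-reasoning phases use Opus,
# checklist-style phases use the cheaper Sonnet. Keys and values are
# illustrative placeholders.

PHASE_MODELS = {
    "research": "opus", "design": "opus", "plan": "opus", "implement": "opus",
    "review": "sonnet", "security": "sonnet", "deploy": "sonnet",
    "observe": "sonnet", "retro": "sonnet",
}

def model_for_phase(phase: str) -> str:
    """Default to the cheaper model for any phase not explicitly listed."""
    return PHASE_MODELS.get(phase, "sonnet")
```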
The graceful degradation design means you can adopt features incrementally. Start with the 10 phases and nothing else. Add the visualization layer when you want better artifacts. Configure claude-context retrieval when your codebase grows. Set up /n8n or /firecrawl integrations when you need them. Nothing breaks if an optional layer isn't configured — the workflow just uses the tools it has.
Getting Started in 5 Minutes
```shell
# Clone the repo
git clone https://github.com/vakaobr/claude-code-ai-development-workflow

# Copy the .claude directory into your project
cp -r claude-code-ai-development-workflow/.claude/ /path/to/your/project/.claude/
```

Then open your project in Claude Code and run your first discovery:

```
/discover Add user authentication with email and password
```
That's it. The workflow detects your stack, generates a repo map and symbol index, creates a STATUS.md, and you're off.
If you want the extra capabilities:
```
/retrieval/setup   # Configure semantic code search (optional)
/n8n/setup         # Configure n8n workflow automation (optional)
/firecrawl/setup   # Configure web scraping fallback (optional)
```
The Bigger Picture
The reason this workflow exists is captured in a simple observation: most AI-assisted development workflows stop at "write code → review code." But software delivery has ten-plus distinct activities that all benefit from structured AI assistance.
The value of a system like this isn't any single phase. It's the compounding effect: every feature leaves behind a trail of documented decisions, reviewed code, security findings, deployment plans, and lessons learned. Over time, that's a dramatically better-informed team — and a dramatically better-informed AI assistant, thanks to the self-updating CLAUDE.md.
The recent additions (code intelligence, semantic retrieval, visualization, n8n automation, Firecrawl scraping) all serve the same principle: give the AI better context so it produces better results, while keeping the human in control of what ships.
The engineers who get the most out of AI in 2026 won't be the ones who write the best prompts. They'll be the ones who build the best systems around AI: systems that accumulate context, enforce structure, and improve over time.
This is one such system. It's open source, it's free to use, and it might be the most impactful .claude/ directory you ever add to a project.
Found this useful? The repo is at github.com/vakaobr/claude-code-ai-development-workflow. Want to see it as a polished web page, built entirely with this same workflow? It's at ai-sdlc.andersonleite.me. Star it, fork it, and adapt it to your team's workflow.