I keep getting the same DM:
"Cool, but does AI actually speed up shipping or is this just hype?"
So here's the table from one MVP build that ended last quarter. Numbers measured, not vibed.
## Per-feature time, with and without agents
| Activity | Traditional senior team | With agentic SDLC | Speedup |
|---|---|---|---|
| Plan a feature (ARCH doc + tasks) | 2–4h human discussion | 15 min (architect agent + gate:plan) | ~10× |
| Code a small feature | 1–3 days senior dev | 1–2h human review of agent output | ~10–15× |
| Code review | 2–4h, async over 1–2 days | 30 min (5 reviewers in parallel) | ~10× |
| QA / test suite | 1 day | 15 min (qa-engineer agent + spot check) | ~25× |
| Deploy (canary + monitoring) | ~4h | ~10 min (auto-canary) | ~25× |
| End-to-end per feature | ~3–5 days | ~3–5 hours | ~10× |
Shipping one feature drops from "we'll have it next week" to "we'll have it after lunch." For a real working developer, that's the metric that matters more than any "55% cost reduction" headline.
## The full MVP picture
OK, but a single-feature speedup doesn't necessarily mean the MVP ships faster. Sometimes you just spend the savings on more reviews. So here's the end-to-end picture:
| Work area | Traditional (1 PM + 4 eng, ~3 months) | With agents + voice-pack (1 PM + 2 eng + agents, ~6–8 weeks) |
|---|---|---|
| Architecture + ADRs | ~$20K | ~$10K |
| Backend (Twilio, OpenAI, call routing) | ~$80K | ~$30K |
| Frontend (operator dashboard) | ~$40K | ~$15K |
| Database + migrations | ~$15K | ~$5K |
| Test suite + QA | ~$25K | ~$10K |
| Security review + pen test | ~$20K | ~$15K (external pen test still required) |
| Compliance (voice-pack) | ~$42K | ~$22K |
| Deployment + CI/CD | ~$15K | ~$8K |
| Documentation | ~$10K | ~$3K |
| PM + buffer | ~$20K | ~$10K |
| Total | ~$287K | ~$128K |
| LLM compute | $0 | ~$500–$1,500 |
| Wall-clock | ~3 months | ~6–8 weeks |
| Headcount | 1 PM + 4 engineers | 1 PM + 2 engineers + agents |
Cost saving: ~55%. Time saving: ~40–50%. Headcount: 4 → 2 (not 0).
Two important honest details for working devs:
- LLM cost across the whole MVP is $500–$1,500. That's not a few cents – it's four-figure money burned across architecture drafting, code generation, parallel reviewers, deployment automation, and the memory feedback loop. Don't compare a single agent prompt to the full build.
- You still need engineers. "2 engineers + agents" means real humans operating the pipeline, reviewing agent output, fixing the bugs agents create, integrating Twilio (or whatever), and shipping the code. The startup that ships an MVP with zero humans in 2026 doesn't exist.
## What are "the agents" actually doing?
This is the part where most posts wave hands. The reality: thirty-four specialist agents, eight stages, two human gates per feature. Architecture diagram here: greatcto.systems/architecture – every box on the SVG clicks through to that agent's source on GitHub.
Daily-driver agents you'll see fire most:
- `architect` – drafts ARCH.md + ADR + cost estimate, before `gate:plan`
- `pm` – decomposes into beads tasks with explicit dependencies, parallel-friendly
- `senior-dev` (×N) – claims a task, TDD, isolated worktree, ships diff
- `qa-engineer` – type-check + lint + tests + coverage
- `security-officer` – OWASP, CVE scan, secret detection
- `code-reviewer` – 12-angle review on the final diff
- `devops` – canary + health checks + auto-rollback
- `l3-support` – production triage + postmortem
- `continuous-learner` – extracts lessons → `.great_cto/lessons.md`
Plus 26 archetype-specific reviewers that fire only when their domain triggers – voice-AI, healthcare, fintech, robotics, etc. The point isn't 34 always-on agents. The point is 5–7 fire on any given PR, and which 7 depend on what your repo looks like.
## The compliance packs (10 of them)
If you ship into a regulated industry, agentic SDLC alone isn't enough – you also need the right reviewer agents, and you need to know which gates to wire. Hence: packs.
A pack triggers on industry signals in your repo (e.g. twilio in package.json → voice). It attaches a specialist reviewer agent, generates a threat model, and wires named human gates. One-line each:
- `voice-pack` – `twilio`, `livekit`, `deepgram`, `elevenlabs` → TCPA + state recording consent + STIR/SHAKEN + PCI redaction
- `clinical-pack` – `clinical`, `PHI`, `SaMD`, `CDS` → FDA SaMD classification + HIPAA + 21 CFR Part 11
- `hr-ai-pack` – `recruit`, `candidate`, `ATS` → NYC LL 144 AEDT bias audit + EEOC + EU AI Act Annex III
- `api-platform-pack` – `REST`, `GraphQL`, `webhook`, `OpenAPI` → OAuth 2.1 + RFC 8594 Sunset + HMAC webhook signing + idempotency
- `lending-pack` – `loan`, `BNPL`, `credit`, `FCRA`, `ECOA` → ECOA Reg B adverse-action + BISG fair-lending + NMLS state matrix
- `clinical-trials-pack` – `CTMS`, `EDC`, `eConsent`, `FHIR`, `HL7` → ICH-GCP + Part 11 audit trail + CDISC + IRB-ready
- `robotics-pack` – `cobot`, `ROS 2`, `surgical robot` → ISO 10218 + IEC 61508 + HARA + SROS2
- `em-fintech-pack` – `RBI`, `CBN`, `BSP`, `UPI`, `PIX`, `M-Pesa` → India DPDP + cross-border + license strategy
- `climate-pack` – `Verra`, `Gold Standard`, `Scope 1/2/3`, `CDP`, `CSRD` → MRV methodology + biosecurity
- `drug-discovery-pack` – `binding affinity`, `ADMET`, `AlphaFold`, `LIMS`, `GLP` → applicability domain + IQ/OQ/PQ + ALCOA+
Each pack adds 1–4 reviewer agents, named human gates, eval fixtures, and a required-artefact list. Full breakdown with company catalogues at greatcto.systems/packs.
## How detection works (the part HN readers will ask)
```ts
{
  name: 'voice-pack',
  signals: {
    deps: ['twilio', '@livekit/agents', 'deepgram-sdk'],
    keywords: ['voice agent', 'IVR', 'phone tree'],
    files: ['twilio.config.*', 'livekit.yaml'],
  },
  attaches: {
    archetypes: ['ai-system', 'agent-product'],
    reviewer: 'voice-ai-reviewer',
    gates: ['gate:voice-compliance'],
  }
}
```
Exact-match keyword scanning, not fuzzy substring: `twilio` matches `twilio` in dependencies, not `twilio-helpers` in a README. Keeps false-positive overlay attachment under 1%.
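In sketch form, the exact-match check is roughly this – illustrative TypeScript under my own assumptions, not the actual packs.ts code:

```ts
// Illustrative only: exact dependency names + whole-phrase keywords, no fuzzy substrings.
interface PackSignals {
  deps: string[];      // exact dependency names
  keywords: string[];  // phrases expected in repo docs
}

function packMatches(
  signals: PackSignals,
  installedDeps: string[],  // keys of dependencies + devDependencies
  repoText: string,         // README and docs, concatenated
): boolean {
  // Dependencies: exact name equality, so 'twilio' hits but 'twilio-helpers' doesn't.
  const depHit = signals.deps.some(d => installedDeps.includes(d));

  // Keywords: whole-phrase match with word boundaries, not a substring scan.
  const keywordHit = signals.keywords.some(k => {
    const phrase = k.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${phrase}\\b`).test(repoText.toLowerCase());
  });

  return depHit || keywordHit;
}

// Fires: twilio is a real dependency.
packMatches({ deps: ['twilio'], keywords: ['voice agent'] }, ['twilio', 'express'], '');
// Doesn't fire: only a twilio-something fork mentioned in prose.
packMatches({ deps: ['twilio'], keywords: ['voice agent'] }, ['twilio-helpers'], 'we use twilio-helpers');
```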
Confession on that 1%: v0.1 did fuzzy substring matching and voice-pack triggered on a static-site-generator repo whose README said "we explicitly do not use Twilio." Spent an hour wondering why a blog generator was getting a TCPA threat model. Also, I shipped voice-pack without `'phone'` in the keyword list for two weeks. Two startups installed it, shipped voice features, the pack sat there politely without firing once. The boilerplate every new pack now starts from has a rule: include the most obvious keyword first, not last.
Packs stack additively. `twilio` + `stripe` + `livekit` → voice-pack + commerce-pack. If two packs name the same gate, the kernel dedupes by name. Reviewers run in parallel on the same PR; verdicts aggregate to one APPROVED / BLOCKED chip at `gate:ship`.
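A hedged sketch of that stacking behaviour – again illustrative TypeScript, not the real kernel code:

```ts
// Illustrative only: union the packs, dedupe gates by name, reduce verdicts to one chip.
type Verdict = 'APPROVED' | 'BLOCKED';

interface Pack {
  name: string;
  reviewers: string[];  // reviewer agents the pack attaches
  gates: string[];      // named human gates the pack wires
}

// Stacking: union of reviewers and gates, deduped by name.
function mergePacks(packs: Pack[]) {
  return {
    reviewers: [...new Set(packs.flatMap(p => p.reviewers))],
    gates: [...new Set(packs.flatMap(p => p.gates))],
  };
}

// Reviewers run in parallel; a single BLOCKED blocks the whole chip at gate:ship.
function aggregate(verdicts: Verdict[]): Verdict {
  return verdicts.every(v => v === 'APPROVED') ? 'APPROVED' : 'BLOCKED';
}
```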
Source: `skills/great_cto/packs/`, `packages/cli/src/packs.ts`.
## Install + try
```
npx great-cto init
```
Runs locally. MIT-licensed. You pay for your own LLM API usage. Works inside Claude Code, Cursor, OpenAI Codex CLI, Aider, and Continue via AGENTS.md + MCP.
After init:
```
/start "add a voice agent for restaurant order-taking"
```
Architect agent drafts the ARCH doc. PM decomposes it into beads tasks. `gate:plan` waits for your approval. Then senior-dev agents claim tasks in parallel; 5 reviewer agents fan out on the resulting diff; `gate:ship` waits for your approval again. Two clicks per feature. The rest runs unattended.
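If you prefer the loop as code rather than prose, here's a minimal sketch with the agent calls stubbed out – the names are illustrative, not the actual great_cto API:

```ts
// Illustrative only: two human gates, everything between them runs unattended.
type Verdict = 'APPROVED' | 'BLOCKED';

const architect = async (prompt: string) => `ARCH.md for: ${prompt}`;
const pm = async (_arch: string) => ['task-1', 'task-2', 'task-3'];
const seniorDev = async (task: string) => `diff for ${task}`;
const review = async (_reviewer: string, _diffs: string[]): Promise<Verdict> => 'APPROVED';

async function runFeature(prompt: string, approve: (gate: string) => Promise<boolean>) {
  const arch = await architect(prompt);            // architect drafts ARCH doc + ADR
  const tasks = await pm(arch);                    // pm decomposes into tasks
  if (!(await approve('gate:plan'))) return;       // human click #1

  // senior-dev agents claim tasks in parallel, each in an isolated worktree
  const diffs = await Promise.all(tasks.map(t => seniorDev(t)));

  // reviewer agents fan out on the resulting diff; which ones fire depends on the repo
  const reviewers = ['qa-engineer', 'security-officer', 'code-reviewer'];
  const verdicts = await Promise.all(reviewers.map(r => review(r, diffs)));
  if (!verdicts.every(v => v === 'APPROVED')) return; // BLOCKED before the gate

  if (!(await approve('gate:ship'))) return;       // human click #2
  // devops takes it from here: canary + health checks + auto-rollback
}
```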
## What does NOT speed up
The honest disclaimer because it matters more than the speedup headline:
- External audit cycles still take their natural time (LL 144 auditor ~2–4 weeks, FDA pre-sub 60–90 days)
- IRB approval still takes 2–3 months
- Regulator meetings still need to be scheduled
- Wet-lab validation is still real biology
- HARA signoff is a single calendar moment a human owns
Anything requiring another organization to commit time runs at human speed. The LLM accelerates your codebase and your compliance discovery. It doesn't accelerate someone else's calendar.
## TL;DR
- Per-feature time drops ~10× (3–5 days → 3–5 hours). MVP wall-clock drops ~40–50% (3 months → 6–8 weeks). Cost drops ~55%.
- LLM cost across the WHOLE MVP is $500–$1,500. Not free, not trivially cheap.
- Headcount drops 4 → 2 engineers + agents. Not 0. You still need humans.
- 10 compliance packs cover voice-AI, clinical, HR-AI, API platforms, lending, clinical trials, robotics, EM fintech, climate-MRV, drug discovery.
- Architecture diagram: greatcto.systems/architecture. One real run walked stage-by-stage: greatcto.systems/proof. MTTR benchmark methodology: docs/benchmarks/MTTR.md.
- Try: `npx great-cto init`. ⭐ if useful: github.com/avelikiy/great_cto.
Full deep-dive with per-pack details + the realistic MVP economics breakdown + the runway math is on Hashnode: Ten compliance packs for ten regulated industries.