I keep getting the same DM:
"Cool, but does AI actually speed up shipping or is this just hype?"
So here's the table from one MVP build that ended last quarter. Numbers measured, not vibed.
## Per-feature time, with and without agents
| Activity | Traditional senior team | With agentic SDLC | Speedup |
|---|---|---|---|
| Plan a feature (ARCH doc + tasks) | 2–4h human discussion | 15 min (architect agent + gate:plan) | ~10× |
| Code a small feature | 1–3 days senior dev | 1–2h human review of agent output | ~10–15× |
| Code review | 2–4h, async over 1–2 days | 30 min (5 reviewers in parallel) | ~10× |
| QA / test suite | 1 day | 15 min (qa-engineer agent + spot check) | ~25× |
| Deploy (canary + monitoring) | ~4h | ~10 min (auto-canary) | ~25× |
| End-to-end per feature | ~3–5 days | ~3–5 hours | ~10× |
Shipping one feature drops from "we'll have it next week" to "we'll have it after lunch." For a real working developer, that's the metric that matters more than any "55% cost reduction" headline.
## The full MVP picture
OK, but a single-feature speedup doesn't necessarily mean the MVP ships faster. Sometimes you just spend the savings on more reviews. So here's the end-to-end picture:
| Work area | Traditional (1 PM + 4 eng, ~3 months) | With agents + voice-pack (1 PM + 2 eng + agents, ~6–8 weeks) |
|---|---|---|
| Architecture + ADRs | ~$20K | ~$10K |
| Backend (Twilio, OpenAI, call routing) | ~$80K | ~$30K |
| Frontend (operator dashboard) | ~$40K | ~$15K |
| Database + migrations | ~$15K | ~$5K |
| Test suite + QA | ~$25K | ~$10K |
| Security review + pen test | ~$20K | ~$15K (external pen test still required) |
| Compliance (voice-pack) | ~$42K | ~$22K |
| Deployment + CI/CD | ~$15K | ~$8K |
| Documentation | ~$10K | ~$3K |
| PM + buffer | ~$20K | ~$10K |
| Total | ~$287K | ~$128K |
| LLM compute | $0 | ~$500–$1,500 |
| Wall-clock | ~3 months | ~6–8 weeks |
| Headcount | 1 PM + 4 engineers | 1 PM + 2 engineers + agents |
Cost saving: ~55%. Time saving: ~40–50%. Headcount: 4 → 2 (not 0).
Two important honest details for working devs:
- LLM cost across the whole MVP is $500–$1,500. That's not a few cents – it's four-figure money burned across architecture drafting, code generation, parallel reviewers, deployment automation, and the memory feedback loop. Don't compare a single agent prompt to the full build.
- You still need engineers. "2 engineers + agents" means real humans operating the pipeline, reviewing agent output, fixing the bugs agents create, integrating Twilio (or whatever), and shipping the code. The startup that ships an MVP with zero humans in 2026 doesn't exist.
## What are "the agents" actually doing?
This is the part where most posts wave hands. The reality: thirty-four specialist agents, eight stages, two human gates per feature. Architecture diagram here: greatcto.systems/architecture – every box on the SVG clicks through to that agent's source on GitHub.
Daily-driver agents you'll see fire most:
- `architect` – drafts ARCH.md + ADR + cost estimate, before `gate:plan`
- `pm` – decomposes into beads tasks with explicit dependencies, parallel-friendly
- `senior-dev` (×N) – claims a task, TDD, isolated worktree, ships diff
- `qa-engineer` – type-check + lint + tests + coverage
- `security-officer` – OWASP, CVE scan, secret detection
- `code-reviewer` – 12-angle review on the final diff
- `devops` – canary + health checks + auto-rollback
- `l3-support` – production triage + postmortem
- `continuous-learner` – extracts lessons → `.great_cto/lessons.md`
Plus 26 archetype-specific reviewers that fire only when their domain triggers – voice-AI, healthcare, fintech, robotics, etc. The point isn't 34 always-on agents. The point is 5–7 fire on any given PR, and which 7 depend on what your repo looks like.
## The compliance packs (10 of them)
If you ship into a regulated industry, agentic SDLC alone isn't enough – you also need the right reviewer agents, and you need to know which gates to wire. Hence: packs.
A pack triggers on industry signals in your repo (e.g. twilio in package.json → voice). It attaches a specialist reviewer agent, generates a threat model, and wires named human gates. One-line each:
- `voice-pack` – `twilio`, `livekit`, `deepgram`, `elevenlabs` → TCPA + state recording consent + STIR/SHAKEN + PCI redaction
- `clinical-pack` – `clinical`, `PHI`, `SaMD`, `CDS` → FDA SaMD classification + HIPAA + 21 CFR Part 11
- `hr-ai-pack` – `recruit`, `candidate`, `ATS` → NYC LL 144 AEDT bias audit + EEOC + EU AI Act Annex III
- `api-platform-pack` – `REST`, `GraphQL`, `webhook`, `OpenAPI` → OAuth 2.1 + RFC 8594 Sunset + HMAC webhook signing + idempotency
- `lending-pack` – `loan`, `BNPL`, `credit`, `FCRA`, `ECOA` → ECOA Reg B adverse-action + BISG fair-lending + NMLS state matrix
- `clinical-trials-pack` – `CTMS`, `EDC`, `eConsent`, `FHIR`, `HL7` → ICH-GCP + Part 11 audit trail + CDISC + IRB-ready
- `robotics-pack` – `cobot`, `ROS 2`, `surgical robot` → ISO 10218 + IEC 61508 + HARA + SROS2
- `em-fintech-pack` – `RBI`, `CBN`, `BSP`, `UPI`, `PIX`, `M-Pesa` → India DPDP + cross-border + license strategy
- `climate-pack` – `Verra`, `Gold Standard`, `Scope 1/2/3`, `CDP`, `CSRD` → MRV methodology + biosecurity
- `drug-discovery-pack` – `binding affinity`, `ADMET`, `AlphaFold`, `LIMS`, `GLP` → applicability domain + IQ/OQ/PQ + ALCOA+
Each pack adds 1–4 reviewer agents, named human gates, eval fixtures, and a required-artefact list. Full breakdown with company catalogues at greatcto.systems/packs.
## How detection works (the part HN readers will ask)
```ts
{
  name: 'voice-pack',
  signals: {
    deps: ['twilio', '@livekit/agents', 'deepgram-sdk'],
    keywords: ['voice agent', 'IVR', 'phone tree'],
    files: ['twilio.config.*', 'livekit.yaml'],
  },
  attaches: {
    archetypes: ['ai-system', 'agent-product'],
    reviewer: 'voice-ai-reviewer',
    gates: ['gate:voice-compliance'],
  }
}
```
Exact-match keyword scanning, not fuzzy substring: `twilio` matches `twilio` in dependencies, not `twilio-helpers` in a README. Keeps false-positive overlay attachment under 1%.
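In sketch form, the exact-match check is roughly this – illustrative TypeScript under my own assumptions, not the actual packs.ts code:

```ts
// Illustrative only: exact dependency names + whole-phrase keywords, no fuzzy substrings.
interface PackSignals {
  deps: string[];      // exact dependency names
  keywords: string[];  // phrases expected in repo docs
}

function packMatches(
  signals: PackSignals,
  installedDeps: string[],  // keys of dependencies + devDependencies
  repoText: string,         // README and docs, concatenated
): boolean {
  // Dependencies: exact name equality, so 'twilio' hits but 'twilio-helpers' doesn't.
  const depHit = signals.deps.some(d => installedDeps.includes(d));

  // Keywords: whole-phrase match with word boundaries, not a substring scan.
  const keywordHit = signals.keywords.some(k => {
    const phrase = k.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${phrase}\\b`).test(repoText.toLowerCase());
  });

  return depHit || keywordHit;
}

// Fires: twilio is a real dependency.
packMatches({ deps: ['twilio'], keywords: ['voice agent'] }, ['twilio', 'express'], '');
// Doesn't fire: only a twilio-something fork mentioned in prose.
packMatches({ deps: ['twilio'], keywords: ['voice agent'] }, ['twilio-helpers'], 'we use twilio-helpers');
```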
Confession on that 1%: v0.1 did fuzzy substring matching and voice-pack triggered on a static-site-generator repo whose README said "we explicitly do not use Twilio." Spent an hour wondering why a blog generator was getting a TCPA threat model. Also, I shipped voice-pack without `'phone'` in the keyword list for two weeks. Two startups installed it, shipped voice features, the pack sat there politely without firing once. The boilerplate every new pack now starts from has a rule: include the most obvious keyword first, not last.
Packs stack additively. `twilio` + `stripe` + `livekit` → voice-pack + commerce-pack. If two packs name the same gate, the kernel dedupes by name. Reviewers run in parallel on the same PR; verdicts aggregate to one APPROVED / BLOCKED chip at `gate:ship`.
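A hedged sketch of that stacking behaviour – again illustrative TypeScript, not the real kernel code:

```ts
// Illustrative only: union the packs, dedupe gates by name, reduce verdicts to one chip.
type Verdict = 'APPROVED' | 'BLOCKED';

interface Pack {
  name: string;
  reviewers: string[];  // reviewer agents the pack attaches
  gates: string[];      // named human gates the pack wires
}

// Stacking: union of reviewers and gates, deduped by name.
function mergePacks(packs: Pack[]) {
  return {
    reviewers: [...new Set(packs.flatMap(p => p.reviewers))],
    gates: [...new Set(packs.flatMap(p => p.gates))],
  };
}

// Reviewers run in parallel; a single BLOCKED blocks the whole chip at gate:ship.
function aggregate(verdicts: Verdict[]): Verdict {
  return verdicts.every(v => v === 'APPROVED') ? 'APPROVED' : 'BLOCKED';
}
```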
Source: `skills/great_cto/packs/`, `packages/cli/src/packs.ts`.
## Install + try
```
npx great-cto init
```
Runs locally. MIT-licensed. You pay for your own LLM API usage. Works inside Claude Code, Cursor, OpenAI Codex CLI, Aider, and Continue via AGENTS.md + MCP.
After init:
```
/start "add a voice agent for restaurant order-taking"
```
Architect agent drafts the ARCH doc. PM decomposes it into beads tasks. `gate:plan` waits for your approval. Then senior-dev agents claim tasks in parallel; 5 reviewer agents fan out on the resulting diff; `gate:ship` waits for your approval again. Two clicks per feature. The rest runs unattended.
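If you prefer the loop as code rather than prose, here's a minimal sketch with the agent calls stubbed out – the names are illustrative, not the actual great_cto API:

```ts
// Illustrative only: two human gates, everything between them runs unattended.
type Verdict = 'APPROVED' | 'BLOCKED';

const architect = async (prompt: string) => `ARCH.md for: ${prompt}`;
const pm = async (_arch: string) => ['task-1', 'task-2', 'task-3'];
const seniorDev = async (task: string) => `diff for ${task}`;
const review = async (_reviewer: string, _diffs: string[]): Promise<Verdict> => 'APPROVED';

async function runFeature(prompt: string, approve: (gate: string) => Promise<boolean>) {
  const arch = await architect(prompt);            // architect drafts ARCH doc + ADR
  const tasks = await pm(arch);                    // pm decomposes into tasks
  if (!(await approve('gate:plan'))) return;       // human click #1

  // senior-dev agents claim tasks in parallel, each in an isolated worktree
  const diffs = await Promise.all(tasks.map(t => seniorDev(t)));

  // reviewer agents fan out on the resulting diff; which ones fire depends on the repo
  const reviewers = ['qa-engineer', 'security-officer', 'code-reviewer'];
  const verdicts = await Promise.all(reviewers.map(r => review(r, diffs)));
  if (!verdicts.every(v => v === 'APPROVED')) return; // BLOCKED before the gate

  if (!(await approve('gate:ship'))) return;       // human click #2
  // devops takes it from here: canary + health checks + auto-rollback
}
```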
## What does NOT speed up
The honest disclaimer because it matters more than the speedup headline:
- External audit cycles still take their natural time (LL 144 auditor ~2–4 weeks, FDA pre-sub 60–90 days)
- IRB approval still takes 2–3 months
- Regulator meetings still need to be scheduled
- Wet-lab validation is still real biology
- HARA signoff is a single calendar moment a human owns
Anything requiring another organization to commit time runs at human speed. The LLM accelerates your codebase and your compliance discovery. It doesn't accelerate someone else's calendar.
## TL;DR
- Per-feature time drops ~10× (3–5 days → 3–5 hours). MVP wall-clock drops ~40–50% (3 months → 6–8 weeks). Cost drops ~55%.
- LLM cost across the WHOLE MVP is $500–$1,500. Not free, not trivially cheap.
- Headcount drops 4 → 2 engineers + agents. Not 0. You still need humans.
- 10 compliance packs cover voice-AI, clinical, HR-AI, API platforms, lending, clinical trials, robotics, EM fintech, climate-MRV, drug discovery.
- Architecture diagram: greatcto.systems/architecture. One real run walked stage-by-stage: greatcto.systems/proof. MTTR benchmark methodology: docs/benchmarks/MTTR.md.
- Try: `npx great-cto init`. ⭐ if useful: github.com/avelikiy/great_cto.
Full deep-dive with per-pack details + the realistic MVP economics breakdown + the runway math is on Hashnode: Ten compliance packs for ten regulated industries.