For a year GreatCTO was an engineering-process engine: agents, gates, reviewers, compliance packs. Good product. Wrong headline.
Here's the thing we kept observing: the people who got the most value weren't buying "a better SDLC." They were buying the outcome of a business function — claims coded, contracts reviewed, invoices matched, taxes filed. The pipeline was the means.
So in v2.40 we said it out loud: GreatCTO ships AI autopilots for business. Products that sell the outcome of a service, not a tool to a specialist. Packs, reviewers and gates didn't go anywhere — they became the under-the-hood trust layer instead of the headline.
What an autopilot actually is
A flow. One file per vertical — flows/<vertical>.flow.json — the single source of truth that renders the CLI behavior, the runtime, and the landing page from the same data:
- steps — intake → process → decide → deliver, each tagged with the agent, the tools, and whether a human signs it
- connectors — the real-world integrations the steps call
- gates — where a named, licensed human signs before the flow continues
- owner — one accountable person who answers for what the autopilot does
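To make the four parts concrete, here is a rough sketch of what such a flow could look like, written as a JavaScript object. The field names (agent, tools, humanSigns, gate, signer) and every value are assumptions for illustration; the post does not show the actual schema.

```javascript
// Hypothetical shape of flows/<vertical>.flow.json, written as a JS object.
// All field names and values are illustrative assumptions, not the real schema.
const exampleFlow = {
  vertical: "medical-coding",
  owner: { name: "Jane Doe", role: "Director of Coding" },
  connectors: [{ id: "claims-clearinghouse" }],
  gates: [{ id: "coder-signoff", signer: "Certified Professional Coder" }],
  steps: [
    { id: "intake", phase: "intake", agent: "intake-agent", tools: ["ocr"], humanSigns: false },
    { id: "code", phase: "process", agent: "coding-agent", tools: ["code-lookup"], humanSigns: false },
    { id: "review", phase: "decide", agent: "reviewer", tools: [], humanSigns: true, gate: "coder-signoff" },
    { id: "submit", phase: "deliver", agent: "delivery-agent", tools: ["claims-clearinghouse"], humanSigns: false },
  ],
};

// List the ids of the steps that need a human signature.
function stepsNeedingSignature(flow) {
  return flow.steps.filter((step) => step.humanSigns).map((step) => step.id);
}

console.log(stepsNeedingSignature(exampleFlow)); // ["review"]
```

Because the CLI, the runtime, and the landing page all read one object like this, a question such as "which steps need a signature?" has a single answer.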
The four autopilot invariants are machine-checkable (autopilot-gate.mjs): judgment boundary (confidence → escalation), accuracy-as-SLA, per-decision audit trail, per-outcome unit economics. Not a manifesto — a validator that exits 1.
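A validator for those four invariants could be as small as the sketch below. It is not the contents of autopilot-gate.mjs; the function name and the fields it reads (escalateBelow, accuracySla, auditTrail, costPerOutcome) are hypothetical stand-ins for however the real flow encodes them.

```javascript
// Sketch of an invariant check. Field names are assumptions for illustration.
function checkInvariants(flow) {
  const violations = [];

  // 1. Judgment boundary: a confidence threshold below which the flow escalates.
  const threshold = flow.escalateBelow;
  if (typeof threshold !== "number" || threshold <= 0 || threshold > 1) {
    violations.push("judgment boundary: escalateBelow must be a number in (0, 1]");
  }
  // 2. Accuracy-as-SLA: a committed accuracy target.
  if (typeof flow.accuracySla !== "number") {
    violations.push("accuracy-as-SLA: accuracySla is missing");
  }
  // 3. Per-decision audit trail.
  if (flow.auditTrail !== "per-decision") {
    violations.push("audit trail: must be recorded per decision");
  }
  // 4. Per-outcome unit economics.
  if (typeof flow.costPerOutcome !== "number") {
    violations.push("unit economics: costPerOutcome is missing");
  }
  return violations;
}

const draft = { escalateBelow: 0.85, accuracySla: 0.97, auditTrail: "per-decision" };
const problems = checkInvariants(draft);
console.log(problems); // one violation: the unit-economics field is missing

// A CLI wrapper would turn a non-empty list into a non-zero exit code.
const exitCode = problems.length > 0 ? 1 : 0;
console.log(exitCode); // 1
```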
6 → 16 → 25 verticals
We started with six (legal docs, medical coding, procurement, accounting, managed IT, tax). Then the expansion criterion clicked: a vertical is a fit when it pairs a large displaceable-labor pool with a legally-required named human who signs the risky call. That's the exact shape the safety engine is built for.
Ten more landed in v2.44: prior-auth ($35–56B), KYC/AML ($61B), managed SOC, insurance claims (~$36–38B), mortgage underwriting, title & escrow, provider credentialing, collections, freight brokerage, and clinical-trial ops. Then came more, among them immigration, appraisal, payroll, workers-comp, estate planning, and patent prosecution. Twenty-five total, every one shipping green on --validate.
Each carries its own compliance reviewer: False Claims Act + NCCI for coding, OFAC + BSA for AML, FDCPA + Reg F for collections, Circular 230 + §7216 for tax, FMCSA for freight. The regulation is a step in the flow, not a PDF you read later.
"Live" means live
A flow that calls mocked connectors is a demo. By v2.45, all verticals exercise at least one live connector — 17 live in the catalog, keyless by default (deterministic real logic or a curated public slice), switching to the real provider the moment you add a credential.
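The keyless-by-default switch can be pictured as a small resolver: no credential means the deterministic path, a credential means the provider path. The sketch below is an assumption about the pattern, not the catalog's code; resolveConnector, the SANCTIONS_API_KEY variable, and the sample connector are all made up for illustration.

```javascript
// Sketch: choose the keyless or the credentialed implementation of a connector.
// Every name here is hypothetical. The environment is passed in explicitly so
// the example runs anywhere.
function resolveConnector({ name, envVar, keyless, provider }, env) {
  const credential = env[envVar];
  if (credential) {
    return { name, mode: "provider", call: (input) => provider(input, credential) };
  }
  return { name, mode: "keyless", call: (input) => keyless(input) };
}

const sanctionsScreen = {
  name: "sanctions-screen",
  envVar: "SANCTIONS_API_KEY",
  // Keyless path: deterministic logic over a tiny curated list.
  keyless: (party) => ({ hit: ["ACME SHELL LTD"].includes(party.toUpperCase()) }),
  // Provider path: placeholder only; a real integration would call the provider here.
  provider: (party, key) => {
    throw new Error("provider call not implemented in this sketch");
  },
};

const connector = resolveConnector(sanctionsScreen, {}); // no credential set
console.log(connector.mode); // "keyless"
console.log(connector.call("Acme Shell Ltd")); // { hit: true }
```

The calling step never branches on the mode. It calls the connector the same way in both cases, which is what lets a credential swap the implementation without touching the flow.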
A few favorites:
- um-criteria (prior-auth) — CMS NCD/LCD-style medical-necessity matching that never auto-denies. Missing criteria escalates to the medical director. By design, not by prompt.
- sar-filing (AML) — generates a FinCEN SAR, and the filing is blocked without the BSA Officer's signature.
- comms-outreach (collections) — FDCPA/Reg F 7-in-7, TCPA, and the 8am–9pm window enforced as ALLOW/BLOCK per contact.
- primary-source (credentialing) — OIG LEIE / SAM exclusion screening as a hard block, plus a real NPI Luhn check.
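The NPI check is the easiest of these to show end to end. An NPI is validated with the Luhn algorithm as if it carried the prefix 80840, which works out to adding a constant 24 to the digit sum. The function name below is ours, and this is a minimal sketch rather than the connector's actual code.

```javascript
// NPI check-digit validation: Luhn over the 10 digits, plus a constant 24
// that accounts for the implied "80840" prefix.
function isValidNpi(npi) {
  if (!/^\d{10}$/.test(npi)) return false;
  let sum = 24;
  for (let i = 0; i < 10; i++) {
    let digit = Number(npi[9 - i]); // walk right to left; i = 0 is the check digit
    if (i % 2 === 1) {
      // Double every second digit, starting with the one left of the check digit.
      digit *= 2;
      if (digit > 9) digit -= 9; // same as summing the two digits of the product
    }
    sum += digit;
  }
  return sum % 10 === 0;
}

console.log(isValidNpi("1234567893")); // true
console.log(isValidNpi("1234567890")); // false (wrong check digit)
```

This is the kind of check that needs no credential at all: it is pure arithmetic, so the keyless path and the provider path can share it.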
The permission is never the wound
The scariest failure mode of an agent isn't going rogue. It's doing exactly what it's permitted to do, irreversibly, at machine speed, with no human hesitation. (Hat tip to Oleksandr Torlo's essay "The Permission Was the Wound.")
v2.43 made the boundary a runtime invariant, not a convention:
- Every flow step is tagged reversible or not, with a blast radius. Money moves, claim submission, e-signing, tax filing: irreversible.
- The runtime refuses to execute an irreversible step autonomously. No prior human gate → blocked-unsafe. Gate present → the step runs only after it's signed.
- validateFlow() enforces it statically: irreversible ⟹ preceded by a human checkpoint, and every autopilot names an accountable owner. All 25 verticals ship green.
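The two halves of that rule, the static check and the runtime refusal, fit in a few lines. What follows is a sketch of the idea, not the real validateFlow(): the step fields (humanGate, irreversible) and the "awaiting-signature" status are assumptions, and only blocked-unsafe comes from the runtime described above.

```javascript
// Static half: every irreversible step needs an earlier human gate,
// and the flow must name an accountable owner.
function validateFlowSketch(flow) {
  const errors = [];
  if (!flow.owner || !flow.owner.name) errors.push("flow has no accountable owner");
  let gateSeen = false;
  for (const step of flow.steps) {
    if (step.humanGate) gateSeen = true;
    else if (step.irreversible && !gateSeen) {
      errors.push(`${step.id}: irreversible step with no prior human gate`);
    }
  }
  return errors;
}

// Runtime half: pause at an unsigned gate, refuse an ungated irreversible step.
function runFlow(flow, signedGates) {
  const log = [];
  let lastGate = null;
  for (const step of flow.steps) {
    if (step.humanGate) {
      lastGate = step.id;
      if (!signedGates.has(step.id)) {
        log.push([step.id, "awaiting-signature"]);
        return log; // nothing after the gate runs until it is signed
      }
      log.push([step.id, "signed"]);
      continue;
    }
    if (step.irreversible && lastGate === null) {
      log.push([step.id, "blocked-unsafe"]);
      return log;
    }
    log.push([step.id, "executed"]);
  }
  return log;
}

const owner = { name: "A. Owner" };
const gated = { owner, steps: [{ id: "match" }, { id: "approve", humanGate: true }, { id: "pay", irreversible: true }] };
const ungated = { owner, steps: [{ id: "match" }, { id: "pay", irreversible: true }] };

console.log(validateFlowSketch(ungated)); // one error, naming the "pay" step
console.log(runFlow(ungated, new Set())); // match executed, pay blocked-unsafe
console.log(runFlow(gated, new Set(["approve"]))); // match executed, approve signed, pay executed
```

The static check catches the bad flow before it ships; the runtime check is the backstop if a flow gets past it anyway.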
The autopilot does the volume. The point of no return always waits for a person.
Quality is earned, not declared
Every vertical gets a 0–100 scorecard: seven weighted dimensions, golden + adversarial cases run through the reviewer with an LLM judge, and a regression gate so a score can't silently decay. Two measure→improve→re-measure cycles took legaltech from 85 to 94.75 and msp from 78 to 98.5.
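The arithmetic behind a scorecard like this is a weighted sum plus a comparison against the last accepted score. In the sketch below the seven dimension names, their weights, and the sample scores are all invented for illustration; the post does not list the real ones.

```javascript
// Seven hypothetical dimensions with weights that sum to 1.
const WEIGHTS = {
  accuracy: 0.25, safety: 0.2, compliance: 0.15,
  auditability: 0.1, coverage: 0.1, latency: 0.1, cost: 0.1,
};

// Weighted 0-100 score, rounded to two decimals.
function scorecard(scores, weights = WEIGHTS) {
  let total = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    if (typeof scores[dimension] !== "number") {
      throw new Error(`missing dimension: ${dimension}`);
    }
    total += weight * scores[dimension];
  }
  return Math.round(total * 100) / 100;
}

// Regression gate: passes only if the score has not dropped below the baseline.
function regressionGate(current, baseline, tolerance = 0) {
  return current >= baseline - tolerance;
}

const sampleScores = {
  accuracy: 96, safety: 100, compliance: 95,
  auditability: 90, coverage: 90, latency: 85, cost: 85,
};
const score = scorecard(sampleScores);
console.log(score); // 93.25
console.log(regressionGate(score, 92)); // true
console.log(regressionGate(score, 94.75)); // false: CI would fail here
```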
If we're going to claim an autopilot can hold a function, the claim should be a number someone measured — and a gate that fails CI when it stops being true.
Where this leaves you
Run npx great-cto init, name the function, and you get the flow: agents, connectors, human checkpoints, and the compliance pack for your domain. The pipeline that built features for a year now runs business functions, with the same receipts: all 25 autopilots, each with its flow, gates, and live-connector badges.
Next post: what happens after the flow pauses — the operator console where a human actually signs.