The operator console: where the autopilot's work waits for a signature

#ai #autopilots #hitl #product

Last post ended with the autopilot pausing at a human checkpoint. Pausing is easy — any workflow engine can stop. The hard questions are operational: where does the case wait, who is allowed to sign it, what do they see before signing, and what happens when the write fails at 2am?

That's what we built through v2.46–v2.63: the operator console. Run `great-cto board` and open /autopilot.html. It's the Operate-mode surface: the app for the licensed humans the flow escalates to, not for the engineer who wired it.

Durable runs: the signature crosses a process boundary

A run persists to disk and survives restarts. startRun advances the flow to the gate and parks it as awaiting-approval; approve(id, who) resumes it and executes the irreversible write; reject ends the run without executing anything irreversible. Every transition appends to an immutable audit trail.
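The lifecycle above can be sketched in a few lines. This is an illustrative Python model, not the great-cto run store: the `RunStore` class, the one-JSON-file-per-run layout, and the `write` callback are all assumptions, and `start_run` is a snake_case stand-in for startRun. The second `RunStore` instance stands in for a different process picking the run up after a restart.

```python
import json
import tempfile
import time
from pathlib import Path


class RunStore:
    """Hypothetical file-backed run store: one JSON file per run."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, run_id):
        return self.root / f"{run_id}.json"

    def load(self, run_id):
        return json.loads(self._path(run_id).read_text())

    def _transition(self, run, status, who=None):
        run["status"] = status
        # The audit trail is only ever appended to, never rewritten.
        run["audit"].append({"at": time.time(), "status": status, "who": who})
        self._path(run["id"]).write_text(json.dumps(run))
        return run

    def _load_parked(self, run_id):
        run = self.load(run_id)
        if run["status"] != "awaiting-approval":
            raise ValueError(f"run {run_id} is not awaiting approval")
        return run

    def start_run(self, run_id):
        run = {"id": run_id, "status": "running", "audit": [], "writes": []}
        # Reversible steps would execute here; then the run parks at the gate.
        return self._transition(run, "awaiting-approval")

    def approve(self, run_id, who, write):
        run = self._load_parked(run_id)
        self._transition(run, "approved", who)
        run["writes"].append(write(run))  # the irreversible step
        return self._transition(run, "completed", who)

    def reject(self, run_id, who):
        # No write callback is ever invoked on this path.
        return self._transition(self._load_parked(run_id), "rejected", who)


root = tempfile.mkdtemp()
RunStore(root).start_run("claim-1")
# A fresh store instance reads the parked run back from disk.
done = RunStore(root).approve("claim-1", "coder@example.com", lambda run: "837 submitted")
```

Because the gate state lives on disk rather than in memory, the approval does not depend on the process that started the run still being alive.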

The v2.43 safety invariant now holds end to end: the 837 claim is submitted only because a coder signed its protecting gate — provable across a process boundary, because the approve happens in a different process than the start.

We demonstrated it on medical coding live: intake → code → NCCI edits (three live connectors) → pause → the coder signs in the inbox → the claim goes out → completed. The reject path submits nothing.

Flows can require several signatures in sequence. Tax needs two: the preparer signs with their PTIN, then the taxpayer signs Form 8879 — the IRS e-file fires only after both. The board pushes a notification to the signer the moment a gate opens.
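A sequential gate like the tax example can be modelled as an ordered list of required roles, where the write becomes eligible only once every role has signed in order. The sketch below is a hypothetical illustration; `GatedWrite` and the role names are assumptions, not the project's API.

```python
from dataclasses import dataclass, field


@dataclass
class GatedWrite:
    """An irreversible write guarded by an ordered list of required signer roles."""

    required: list
    signatures: list = field(default_factory=list)

    def next_signer(self):
        # The role whose signature is expected next, or None when all have signed.
        if len(self.signatures) < len(self.required):
            return self.required[len(self.signatures)]
        return None

    def ready(self):
        return len(self.signatures) == len(self.required)

    def sign(self, role, who):
        expected = self.next_signer()
        if role != expected:
            raise PermissionError(f"expected signature from {expected!r}, got {role!r}")
        self.signatures.append((role, who))
        return self.ready()


efile = GatedWrite(required=["preparer", "taxpayer"])
after_preparer = efile.sign("preparer", "preparer with PTIN")
after_taxpayer = efile.sign("taxpayer", "taxpayer on Form 8879")
```

Here `after_preparer` is `False` and `after_taxpayer` is `True`, so the e-file step would only be released after the second signature.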

What the signer actually sees

A queue, then a case drawer. The drawer carries everything a decision needs in one panel:

The decision criteria — the SOP this case is judged against
Evidence, connector by connector — exactly what each integration found, with its live/stub flag and per-call latency
An AI-drafted determination — a templated rationale composed from the evidence, reviewed before signing
The audit trail — tamper-evident, with a "✓ verified" badge
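One common way to make an audit trail tamper-evident is a hash chain, where each entry commits to the hash of the entry before it. The post does not say which mechanism backs the "✓ verified" badge, so treat the following as a sketch of the general technique with hypothetical helper names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, event):
    # Canonical JSON (sorted keys) so the same event always hashes identically.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(trail, event):
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "prev": prev_hash, "hash": entry_hash(prev_hash, event)})


def verify(trail):
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append(trail, {"status": "awaiting-approval"})
append(trail, {"status": "completed", "who": "coder"})
intact = verify(trail)

trail[0]["event"]["status"] = "rejected"  # simulate someone editing history
tamper_detected = not verify(trail)
```

A chain like this detects edits after the fact; it does not prevent them, which is why verification has to run whenever the trail is displayed.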

Signing an irreversible write opens a signature ceremony: an alert dialog that names exactly what will execute — the gated step, its blast radius, the gate protecting it — and requires explicit confirmation. No one "accidentally approves" a wire transfer because the button was where their cursor happened to be.

And because humans override machines (that's the point), overrides are logged: sign against the AI recommendation and the divergence is recorded — case, recommendation, decision, who. Your regulator will ask. Now there's an answer.
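The override record described above has four fields, and the override rate falls out of it directly. A minimal sketch, assuming an in-memory list and hypothetical function names:

```python
def record_decision(override_log, case_id, recommendation, decision, who):
    """Log a divergence when the signer decides against the AI recommendation."""
    diverged = decision != recommendation
    if diverged:
        override_log.append({
            "case": case_id,
            "recommendation": recommendation,
            "decision": decision,
            "who": who,
        })
    return diverged


def override_rate(override_log, total_decisions):
    # Share of signed cases where the human disagreed with the recommendation.
    return len(override_log) / total_decisions if total_decisions else 0.0


log = []
record_decision(log, "case-1", "approve", "approve", "operator-a")
record_decision(log, "case-2", "approve", "reject", "operator-a")
rate = override_rate(log, total_decisions=2)
```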

The routing dial

Not every case deserves a human minute. Admin Settings sets a per-tenant confidence floor: a low-confidence approve is downgraded to escalate, and clean high-confidence cases are flagged auto-eligible. The dial moves as your trust does — start with everything escalated, widen straight-through as the override rate stays flat.
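The dial reduces to a small routing function. The version below is a sketch under assumptions: it reads "clean high-confidence" as an approve recommendation at or above the floor, and the `route` name, the 0 to 1 confidence scale, and the return labels are illustrative rather than taken from the product.

```python
def route(recommendation, confidence, floor):
    """Apply a per-tenant confidence floor to an AI recommendation."""
    if recommendation == "approve" and confidence < floor:
        return "escalate"  # low-confidence approve is downgraded
    if recommendation == "approve":
        return "auto-eligible"  # flagged only; a policy decides whether to act on it
    return recommendation  # reject and escalate pass through unchanged


strict = [route("approve", c, floor=0.99) for c in (0.70, 0.90, 0.95)]
relaxed = [route("approve", c, floor=0.85) for c in (0.70, 0.90, 0.95)]
```

With the floor at 0.99 all three cases escalate; lowering it to 0.85 lets the 0.90 and 0.95 cases through as auto-eligible, which is the "widening" the dial refers to.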

Around the queue, the things an operation actually needs:

Roles — operators sign; admins and compliance-leads see QA and Ops; invite links are scoped, with email invites and an impersonation banner when acting via a token
Smart views — All · Auto-eligible · Escalated · SLA at-risk · High blast, with SLA-aware sort and regulatory-deadline clocks on each case
QA sampling — a deterministic ~20% of closed cases lands in a QA queue to be scored 1–5; results land on the run, the audit, and Analytics
Bulk actions — multi-select (or "select auto-eligible") → approve / reject / escalate with a reason, RBAC-checked per case
Keyboard-first — ⌘K palette, j/k queue cursor, a/r/e/b decisions, ? cheatsheet
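"Deterministic ~20%" sampling is usually done by hashing a stable identifier into a bucket, so the same case always gets the same answer and nobody can cherry-pick what lands in QA. The post does not describe the actual selection rule; this sketch assumes a SHA-256 hash of a hypothetical case id.

```python
import hashlib


def in_qa_sample(case_id, rate_percent=20):
    """Deterministically place roughly rate_percent% of cases in the QA queue."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < rate_percent


cases = [f"case-{n}" for n in range(1000)]
sampled = [c for c in cases if in_qa_sample(c)]
```

Because the decision depends only on the case id, re-running the sampler after a restart or on another machine selects the same cases.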

The Ops tab: because writes fail

The least glamorous tab is the one that earns the trust. For admins and compliance-leads:

KPI tiles — runs, connector calls, estimated cost, average latency, retries, over-budget, dead-letters
Dead-letter queue — every failed post-gate write with its connectors and error, and a one-click ↻ Requeue that re-runs the write and recovers the run to completed. An off-tab badge makes a stuck write visible without clicking.
Connector health — per-connector 🟢/🔴, call count, failure rate, p95 latency, last error
Metering by industry — per-vertical runs / calls / latency / cost, sorted by spend

Retries never double-submit: an idempotency key, stable per run, is threaded into every write.
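To see why a stable key matters, consider the worst case: the write lands at the provider but the response is lost, so the run is dead-lettered and later requeued. The sketch below simulates that with a hypothetical `FakeProvider` that deduplicates on the key; the key format and function names are assumptions, and a real provider has to support idempotency keys for this to hold.

```python
class FakeProvider:
    """Stand-in for an external API that deduplicates on an idempotency key."""

    def __init__(self, lose_responses=0):
        self.submitted = {}  # idempotency key -> receipt
        self.lose_responses = lose_responses

    def submit(self, key, payload):
        if key in self.submitted:
            return self.submitted[key]  # replay: return the original receipt
        self.submitted[key] = f"receipt-{len(self.submitted) + 1}"
        if self.lose_responses > 0:
            self.lose_responses -= 1
            raise TimeoutError("response lost after the write landed")
        return self.submitted[key]


def idempotency_key(run_id, step):
    # Stable per run and step, so every retry carries the same key.
    return f"{run_id}:{step}"


def write_or_dead_letter(provider, run_id, step, payload, dead_letters):
    try:
        return provider.submit(idempotency_key(run_id, step), payload)
    except TimeoutError as err:
        dead_letters.append({"run_id": run_id, "step": step,
                             "payload": payload, "error": str(err)})
        return None


def requeue(provider, dead_letters):
    receipts = []
    while dead_letters:
        item = dead_letters.pop(0)
        key = idempotency_key(item["run_id"], item["step"])
        receipts.append(provider.submit(key, item["payload"]))
    return receipts


provider = FakeProvider(lose_responses=1)
dead_letters = []
first = write_or_dead_letter(provider, "run-7", "submit-claim", {"claim": "837"}, dead_letters)
recovered = requeue(provider, dead_letters)
```

The first attempt returns nothing and parks the write in the dead-letter list; the requeue gets back the receipt of the original submission, and the provider has recorded exactly one claim.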

Enterprise polish, measured

v2.63 was a full UI/UX pass, and we held it to numbers rather than adjectives:


| Area | Measure |
| --- | --- |
| Accessibility | WCAG 2.2 AA; axe-core: 0 violations, all tabs, both themes |
| Themes | light/dark (`prefers-color-scheme` + persist), white-label accent per tenant |
| Realtime | SSE pushes a change the instant any run mutates, whether from console, CLI, or webhook |
| Scale | render cap keeps 500+ case queues smooth |
| Reliability | durable-runtime e2e across all 25 verticals (start → gate → sign → write), 348/348 lib tests |

Multi-tenant scoping means an operator sees only their tenant's queue. Cases export to CSV, because the auditor's tooling is Excel and pretending otherwise helps no one.

Why this matters

"Human in the loop" is usually a checkbox in a pitch deck. Operationally it's a product: an inbox with SLA clocks, a drawer with evidence, a ceremony for the point of no return, override logs, QA sampling, and a dead-letter queue for the night the provider's API was down.

That product is what makes it safe to let the autopilot run the volume. Try it: run `npx great-cto init`, then `great-cto board`. Screenshots are on the landing page; the run store, runtime, and console are all in the repo.