June under the hood: the board becomes a control panel, prompts evolve behind a holdout gate, logs shrink 99.5%

#ai #claude #devtools #architecture

The last two posts were about the pivot — autopilots, live connectors, the operator console. This one is about the engine room: four upgrades that shipped in the same June sprint and that you'd otherwise only discover by reading the changelog. Users keep telling us they don't read the changelog. Fair.

1. The board is now a control panel, not a mirror

Until v2.64 the dev board showed you the pipeline: tasks, gates, costs. To act on anything you went back to the terminal.

Now approving a gate (or pressing Run) spawns a Claude Code agent headlessly in the project and streams its output into the board — assistant text, tool calls, result, parsed from stream-json and pushed over SSE. There's a Run-agent panel with a prompt field and a live stream, and an Approve + ▶run button right on the gate card. Approve the plan, watch the implementation start, without touching a terminal.
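For a feel of the plumbing, here is a minimal sketch of the parse-and-push step. It is not the board's actual code. The event shapes (a `type` of `assistant` or `result`, content blocks typed `text` or `tool_use`) and the names `to_sse` and `frames_from_stream` are assumptions made for illustration; only the SSE wire format (`event:`, `data:`, blank line) is standard.

```python
import json

def to_sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def frames_from_stream(lines):
    """Turn newline-delimited JSON from the agent into SSE frames.

    ASSUMPTION: the message layout below is illustrative, not a documented contract.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            msg = None
        if not isinstance(msg, dict):
            # Anything that is not a JSON object is forwarded as raw text, not dropped.
            yield to_sse("raw", {"text": raw})
            continue
        if msg.get("type") == "assistant":
            for block in msg.get("message", {}).get("content", []):
                if block.get("type") == "text":
                    yield to_sse("text", {"text": block.get("text", "")})
                elif block.get("type") == "tool_use":
                    yield to_sse("tool_call", {"name": block.get("name"),
                                               "input": block.get("input")})
        elif msg.get("type") == "result":
            yield to_sse("result", {"result": msg.get("result")})
```

A web handler would write each yielded frame to a response with `Content-Type: text/event-stream`; the browser side only needs an `EventSource` listening for the three event names.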

Running an autonomous agent that edits files from a web page is exactly as dangerous as it sounds, so the guardrails came first:

same-origin only, and the project must live under $HOME
one run per project — a second Run gets a 409
hard timeout (SIGTERM → SIGKILL), 2000-line ring buffer, child stdin closed
permission mode defaults to acceptEdits — full autonomy is an explicit opt-in env var, never the default

Verified end-to-end with a stub binary (all four guardrails, Stop button) and a real claude run.
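Three of those guardrails fit in a few lines each. The sketch below is an illustration under assumptions, not the shipped runner: `RunRegistry`, `under_home`, and `run_with_timeout` are hypothetical names, and a real runner would read output line by line as it arrives instead of collecting it at the end.

```python
import subprocess
import threading
from collections import deque
from pathlib import Path

MAX_LINES = 2000  # ring buffer size mentioned in the post

class RunRegistry:
    """One run per project: a second start attempt gets an HTTP-style 409."""
    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def try_start(self, project: str) -> int:
        with self._lock:
            if project in self._active:
                return 409
            self._active.add(project)
            return 200

    def finish(self, project: str) -> None:
        with self._lock:
            self._active.discard(project)

def under_home(project: str, home: Path) -> bool:
    """True only if the resolved project path sits inside the home directory."""
    try:
        Path(project).resolve().relative_to(home.resolve())
        return True
    except ValueError:
        return False

def run_with_timeout(cmd, timeout_s: float, grace_s: float = 2.0):
    """Run cmd with stdin closed; on timeout send SIGTERM, then SIGKILL after a grace period."""
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            out, _ = proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            out, _ = proc.communicate()
    buf = deque(maxlen=MAX_LINES)  # keeps only the newest MAX_LINES lines
    buf.extend(out.splitlines())
    return proc.returncode, buf
```

The `deque(maxlen=...)` is the whole ring buffer: once full, each new line silently evicts the oldest, so a runaway agent cannot grow the board's memory without bound.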

2. Prompts now have to prove they got better

Every agent in GreatCTO learns from lessons. The uncomfortable question: when the system rewrites an agent's prompt based on a lesson, who checks the rewrite didn't make it worse?

v2.37 closed the loop, porting the generate→evaluate→gate cycle from hexo-ai/sia:

Eval cases split into tuning (visible to the prompt-improver) and holdout (gate-only, anti-overfit)
A promotion gate blocks any candidate prompt that regresses on the holdout split — exit codes, not vibes
/prompt-evolve runs lesson → candidate → holdout gate → PROMOTE/REJECT, with a per-agent generation ledger you can audit
Each agent gets a generational changelog: which lesson, what held-out delta, full provenance

A learned improvement can no longer ship until it's re-proven on cases it never saw. The same loop later gated the compression layer below — turtles all the way down, but each turtle is tested.
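The gate itself is small. Here is a minimal sketch, assuming a pass/fail scorer per case and a hash-based split; both are illustrative choices, not a description of how GreatCTO or hexo-ai/sia implement it. `split_of`, `pass_rate`, `gate`, and `run_case` are hypothetical names, and treating a tie as a pass is an assumption.

```python
import hashlib
from typing import Callable, Sequence

def split_of(case_id: str, holdout_pct: int = 30) -> str:
    """Deterministically assign a case to 'tuning' or 'holdout' by hashing its id.

    ASSUMPTION: the hash split and the 30% default are illustrative.
    """
    bucket = int(hashlib.sha256(case_id.encode()).hexdigest(), 16) % 100
    return "holdout" if bucket < holdout_pct else "tuning"

def pass_rate(prompt: str, cases: Sequence[str],
              run_case: Callable[[str, str], bool]) -> float:
    """Fraction of cases the prompt passes."""
    return sum(run_case(prompt, c) for c in cases) / len(cases)

def gate(current: str, candidate: str, holdout: Sequence[str],
         run_case: Callable[[str, str], bool]) -> int:
    """Exit code 0 = PROMOTE, 1 = REJECT. The candidate must not regress on holdout."""
    if not holdout:
        return 1  # no held-out evidence, no promotion
    before = pass_rate(current, holdout, run_case)
    after = pass_rate(candidate, holdout, run_case)
    return 0 if after >= before else 1
```

The property that matters is that the prompt-improver only ever sees cases where `split_of` returns `tuning`, so a candidate that merely memorized its training cases has nothing to lean on when the gate runs.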

3. Context compression: 31,475 chars of CI log → 155

Agents read logs, test output, JSON dumps. Most of it is repetition. v2.38 added a compression layer — deterministic, $0, no LLM, no native deps, concepts borrowed from chopratejas/headroom:

Input	Result
CI log	31,475 → 155 chars (−99.5%), FATAL/ERROR/stacks kept verbatim
JSON	−43% minified, −98% with array crush
Noisy test run	−86%, the FAIL preserved

The part that makes aggressive compression safe: CCR — Compressed Context with Retrieval. Anything dropped is stored locally, content-addressed, and recoverable on demand; the memory filter appends a recall footer listing what it filtered. Lossless-on-demand. And a fidelity eval (through the v2.37 holdout gate, naturally) ensures a compressor only ships if the key fact survives.
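To make the CCR idea concrete, here is a toy version: keep the lines that matter, stash the original under its content hash, and leave a footer saying how to get it back. This is a sketch under assumptions, not the shipped compressor. The keep pattern, the in-memory `STORE`, the footer wording, and the names `compress_log` and `recall` are all made up for illustration.

```python
import hashlib
import re

# ASSUMPTION: a simple pattern for "lines worth keeping verbatim".
KEEP = re.compile(r"FATAL|ERROR|Traceback|^\s+at ")

# Stands in for the local content-addressed store; a real one would live on disk.
STORE: dict[str, str] = {}

def compress_log(text: str) -> str:
    """Keep matching lines verbatim, drop the rest, and append a recall footer."""
    lines = text.splitlines()
    kept = [line for line in lines if KEEP.search(line)]
    dropped = len(lines) - len(kept)
    if dropped == 0:
        return text
    # Content-addressed: the key is derived from the bytes being stored.
    key = hashlib.sha256(text.encode()).hexdigest()[:12]
    STORE[key] = text
    kept.append(f"[filtered {dropped} lines; recall with key {key}]")
    return "\n".join(kept)

def recall(key: str) -> str:
    """Return the full original text for a key printed in a recall footer."""
    return STORE[key]
```

Storing the whole original, not just the dropped lines, keeps recall trivially lossless: the agent that needs line 312 after all gets the exact log back, ordering included.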

l3-support compresses logs and qa-engineer compresses test output before reasoning — fewer tokens spent re-reading the same stack trace twelve times.

4. Scope creep is now caught mechanically

The classic agent failure: asked to fix the webhook, also "improved" the auth module. v2.39 added governance inspired by NaCl, all machine-checkable at $0:

impl-brief per task — files-to-modify allowlist, files-NOT-to-modify denylist, API contract, test spec. senior-dev refuses to commit out of scope; a denylist hit is a hard fail, override only via a signed exception
/trace — requirement → use-case → task → test traceability for impact analysis and coverage gaps
gap-closure waves — adopt strict gates on a legacy repo incrementally: criticals never deferred, every deferred gap held by a signed, expiring exception. Never a silent bypass.
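The allowlist/denylist check is the kind of thing that needs no model at all. A minimal sketch, assuming glob-style patterns and a list of changed paths (for example from `git diff --name-only`); `check_scope` is a hypothetical name, the pattern syntax is an assumption, and the signed-exception override is left out.

```python
from fnmatch import fnmatch

def check_scope(changed, allow, deny):
    """Return (exit_code, violations): 0 if every changed file is in scope, else 1.

    A denylist hit is checked first so it is reported as such even if an
    allow pattern would also match the same path.
    """
    violations = []
    for path in changed:
        if any(fnmatch(path, pat) for pat in deny):
            violations.append(("denylist", path))
        elif not any(fnmatch(path, pat) for pat in allow):
            violations.append(("out-of-scope", path))
    return (1 if violations else 0), violations

# Hypothetical brief for the "fix the webhook" task from the example above.
allow = ["src/webhook/*", "tests/webhook/*"]
deny = ["src/auth/*"]
code, why = check_scope(["src/webhook/handler.py", "src/auth/session.py"], allow, deny)
print(code, why)
```

Run as a pre-commit step, a non-zero exit code is enough to stop the commit, which is all "refuses to commit out of scope" has to mean mechanically.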

Also in June

Fable 5 support — agent-model: fable pins every managed agent to Claude Fable 5; the board's agent runner passes the model through verbatim.
CSRF guard on the board — cross-origin mutations now 403. A malicious page can no longer POST to your localhost and approve a gate. (Found by our own /audit, fixed the same day.)
The pre-push hook can no longer hang a push, and gate-approve survives GUI-launched shells with a minimal PATH.
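For the CSRF guard above, the core idea is a same-origin check on anything that mutates state. The sketch below is one common way to do it and not necessarily what the board does: `csrf_status` is a hypothetical helper, and letting requests without an `Origin` header through (so command-line clients still work) is an assumption that a stricter setup might not make.

```python
from urllib.parse import urlsplit

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def csrf_status(method: str, headers: dict, host: str) -> int:
    """Return 403 for a cross-origin mutation, 200 otherwise."""
    if method.upper() in SAFE_METHODS:
        return 200
    origin = headers.get("Origin")
    if origin is None:
        # ASSUMPTION: non-browser clients send no Origin and are allowed.
        return 200
    # Compare host:port of the Origin header with the host the board serves on.
    return 200 if urlsplit(origin).netloc == host else 403
```

Browsers attach `Origin` to cross-site POSTs, so a page on another site trying to approve a gate on `localhost` arrives with a mismatched origin and gets the 403.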

All of it: open source, MIT, zero telemetry, github.com/avelikiy/great_cto. The full gory detail lives in the CHANGELOG — but now you don't have to read it.