Beni

Posted on Mar 26

What We Actually Ship With MissionControl

#ai #agents #typescript #devops

Two days. Twenty-one commits. English in, pull requests out.

If you're joining mid-series: Post 1 covered the 16-hour build — Telegram bot in, pull requests out, ports-and-adapters architecture. Posts 2 and 3 were the bug safari that followed, including a $5.84 task that produced zero useful work and forced a rethink of the entire trust chain. This is what the system looks like after surviving all of that.

The Interface

MissionControl runs as a Telegram bot. No web UI. No dashboard. You message it, it does work, it messages you back.

Every interaction fits in a chat bubble. Send a task from your phone while walking the dog, get a PR link back before you're home. That constraint — everything must fit in a Telegram message — turned out to be a feature, not a limitation.

The full command set:

/task <description> — Queue a task against the default project. The agent picks it up, creates a branch, does the work, pushes, and opens a PR.
/task slug: <description> — Target a specific project.
/status — Current running task, queue depth, recent completions.
/cancel <id> — Kill a running task or remove a queued one.
/retry <id> — Re-queue a failed or cancelled task.
/logs <id> — Tail the last 50 lines of a task's execution log.
/budget — Today's spend, remaining daily budget, per-task breakdown.
/addproject <slug> <path> --github owner/repo — Register a repo. Auto-chowns .git/ for the worker user.
/create <slug> — Bootstrap a new project directory, init git, create the GitHub repo, register it.
/rmproject <slug> — Unregister and clean up.

No context switching. No browser tabs.
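A minimal sketch of how the two /task forms could be told apart. The function name, the return shape, and the rule that a leading "word:" is a project slug are all assumptions for illustration, not MissionControl's actual parser.

```typescript
// Hypothetical parser for "/task <description>" and "/task slug: <description>".
type TaskCommand = { project: string | null; description: string };

function parseTaskCommand(text: string): TaskCommand | null {
  const match = /^\/task\s+([\s\S]+)$/.exec(text.trim());
  if (!match) return null; // not a /task command, or no description given

  const body = match[1].trim();
  // Assumption: a leading "word:" names a project; anything else is all description.
  const slugMatch = /^([a-z0-9][a-z0-9-]*):\s+([\s\S]+)$/i.exec(body);
  if (slugMatch) {
    return { project: slugMatch[1], description: slugMatch[2].trim() };
  }
  return { project: null, description: body }; // falls back to the default project
}
```

One consequence of this rule: a description that itself starts with "fix: ..." would be read as a project named fix, so a real parser would probably check the slug against the registered projects first.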

The Numbers

Twenty tasks in the first 48 hours across three projects.

Metric	Value
Tasks created	20
Completed	4
Failed	13
Cancelled	2
Running	1
Total spend	$12.49
Avg cost (completed)	$2.95
Most expensive	$5.84 (Task #19 — did nothing)
Cheapest success	$1.49 (Task #5 — tier enforcement)

20% completion rate. Looks bad. It's not. Tasks 1 through 4 all failed on the same CLI spawn issue — the zero-stdout bug from Post 2. Four retries of the same failure before we understood it. After the bug fix sprint, the completion rate on new tasks jumped to roughly 50%. The remaining failures: budget timeouts and permission issues on freshly registered projects. Not systemic.

The number that matters: four completed tasks produced working code, passing builds, and merged PRs. One of them — a fitness trainer dashboard — was a full-stack Next.js app with auth, data visualization, and a PostgreSQL backend. Built autonomously. $2.00.

The Safety Stack

Every layer here exists because we shipped without it and something broke.

Budget caps. $50/day global. $5 per task default, configurable up to $10. Checked before the task starts and enforced by the CLI's own --max-budget-usd flag. Task #19 — the $5.84 zero-work disaster from Post 3 — proved that budget enforcement alone isn't enough. You also need to verify the agent actually produced something.
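The pre-flight half of that check might look like the sketch below. The dollar limits come from the paragraph above; the function name, the return shape, and the policy of refusing a task when the remaining daily budget is smaller than its cap are assumptions.

```typescript
// Limits from the post; everything else here is hypothetical.
const DAILY_BUDGET_USD = 50;
const DEFAULT_TASK_CAP_USD = 5;
const MAX_TASK_CAP_USD = 10;

type BudgetDecision =
  | { ok: true; capUsd: number }
  | { ok: false; reason: string };

function checkBudget(spentTodayUsd: number, requestedCapUsd?: number): BudgetDecision {
  // Clamp the per-task cap to the configurable maximum.
  const capUsd = Math.min(requestedCapUsd ?? DEFAULT_TASK_CAP_USD, MAX_TASK_CAP_USD);
  const remaining = DAILY_BUDGET_USD - spentTodayUsd;
  // Assumed policy: refuse to start if the day cannot cover a full task cap.
  if (remaining < capUsd) {
    return {
      ok: false,
      reason: `only $${remaining.toFixed(2)} left today, task cap is $${capUsd.toFixed(2)}`,
    };
  }
  return { ok: true, capUsd };
}
```

The returned capUsd is the value that would then be handed to the CLI's --max-budget-usd flag, so both layers enforce the same number.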

Timeouts. 30-minute soft limit, then a 5-minute grace period. Soft limit sends SIGTERM. Grace lets the agent wrap up and commit. After grace, SIGKILL. A separate kill timer 60 seconds post-SIGTERM ensures nothing lingers. Opus on a 2-core box analyzing a large codebase can burn 15 minutes just planning. Learned that the hard way.
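A rough sketch of the SIGTERM-then-SIGKILL escalation, assuming plain setTimeout timers. The Killable type mirrors the kill method on Node's ChildProcess; armTimeouts is a hypothetical name, and the separate 60-second kill timer is left out to keep the example short.

```typescript
// Anything with a ChildProcess-style kill method can be passed in.
type Killable = { kill(signal: "SIGTERM" | "SIGKILL"): boolean };

const SOFT_LIMIT_MS = 30 * 60 * 1000; // 30-minute soft limit
const GRACE_MS = 5 * 60 * 1000; // 5-minute grace period

// Arms both timers and returns a function that disarms them.
function armTimeouts(
  child: Killable,
  softMs: number = SOFT_LIMIT_MS,
  graceMs: number = GRACE_MS,
): () => void {
  let graceTimer: ReturnType<typeof setTimeout> | undefined;
  const softTimer = setTimeout(() => {
    child.kill("SIGTERM"); // ask the agent to wrap up and commit
    graceTimer = setTimeout(() => child.kill("SIGKILL"), graceMs); // then force it
  }, softMs);

  // Call this when the child exits on its own, so no stale signal is sent.
  return () => {
    clearTimeout(softTimer);
    if (graceTimer !== undefined) clearTimeout(graceTimer);
  };
}
```

In use, the returned function would be called from the child's exit handler. Forgetting that step leaves a timer pointed at a process that has already finished.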

Orphan cleanup. On process restart, any task stuck in running state gets reset to queued. Without this, a single PM2 restart freezes the entire queue. Sounds obvious in retrospect. Wasn't obvious at 2 AM.
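The reset itself is tiny. Below is an in-memory sketch with a hypothetical Task shape; against SQLite it would presumably be a single UPDATE, shown in the comment with an assumed table name.

```typescript
type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";
type Task = { id: number; status: TaskStatus };

// At startup no worker exists yet, so every "running" row is an orphan.
// Assumed SQL equivalent: UPDATE tasks SET status = 'queued' WHERE status = 'running'
function requeueOrphans(tasks: Task[]): Task[] {
  return tasks.map((t): Task => (t.status === "running" ? { ...t, status: "queued" } : t));
}
```

This only works if it runs once, before the queue loop starts. Run it later and it would requeue a task that is legitimately in progress.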

Commit verification. git rev-list --count main..HEAD — if zero, the task failed. No exceptions. The agent's self-assessment ("I completed the task successfully!") is advisory, not authoritative. We do not trust the agent's opinion of its own work.
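A sketch of that check using Node's execFileSync. The git command is the one quoted above; the helper names are hypothetical, and the sudo wrapper that runs git as the worker user is omitted.

```typescript
import { execFileSync } from "node:child_process";

// Runs git in the task's working copy and returns trimmed stdout.
function git(cwd: string, args: string[]): string {
  return execFileSync("git", args, { cwd, encoding: "utf8" }).trim();
}

// Parses the output of `git rev-list --count`; non-numeric output counts as zero.
function parseCommitCount(output: string): number {
  const n = Number.parseInt(output.trim(), 10);
  return Number.isNaN(n) ? 0 : n;
}

// True only if the task branch has at least one commit that the base branch lacks.
function producedCommits(cwd: string, base: string = "main"): boolean {
  return parseCommitCount(git(cwd, ["rev-list", "--count", `${base}..HEAD`])) > 0;
}
```

Treating unparseable output as zero keeps the check biased toward failure, which matches the "no exceptions" stance: when in doubt, the task did not succeed.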

Uncommitted work rescue. Before any branch cleanup: git status --porcelain. If dirty, git add -A && git commit -m 'WIP: auto-rescue'. Catches work the agent did but didn't commit — timeouts, crashes, the agent forgetting to stage files. Happens more often than expected.
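Sketched with an injected git runner so the logic can be exercised without a repository. GitRunner and rescueUncommittedWork are hypothetical names; the git commands and the commit message are the ones given above.

```typescript
// Executes git with the given arguments and returns stdout (hypothetical helper type).
type GitRunner = (args: string[]) => string;

// Commits any uncommitted changes; returns true if a rescue commit was made.
function rescueUncommittedWork(run: GitRunner): boolean {
  // --porcelain prints one line per changed path, and nothing for a clean tree.
  const dirty = run(["status", "--porcelain"]).trim().length > 0;
  if (!dirty) return false;
  run(["add", "-A"]);
  run(["commit", "-m", "WIP: auto-rescue"]);
  return true;
}
```

In production the runner would wrap a real git invocation in the task's working directory. In a test it can be a function that records its calls.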

Force checkout fallback. The finally block tries normal checkout first, then force checkout. A dirty working tree from a crashed task can't deadlock the next one.
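One way the finally block could be shaped, again with an injected git runner. RunGit, returnToBase, and runTask are hypothetical names, and the base branch is assumed to be main.

```typescript
// Executes git with the given arguments; throws if git exits non-zero (hypothetical helper type).
type RunGit = (args: string[]) => string;

// Returns to the base branch, forcing the checkout if the normal one is refused.
function returnToBase(run: RunGit, base: string = "main"): "normal" | "forced" {
  try {
    run(["checkout", base]);
    return "normal";
  } catch {
    run(["checkout", "--force", base]); // throws away local changes blocking the switch
    return "forced";
  }
}

function runTask(run: RunGit, branch: string, work: () => void): void {
  try {
    run(["checkout", "-b", branch]);
    work();
  } finally {
    returnToBase(run); // always runs, even when work() throws
  }
}
```

The forced checkout discards uncommitted changes, which is why the rescue commit has to happen before this cleanup rather than after it.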

Session isolation. --no-session-persistence on every CLI spawn. Every task starts clean. No stale context, no ghost sessions bleeding between runs.

Roughly sixty lines of verification and fallback logic. Least interesting code in the project. Most important.

The Architecture

Ports and adapters. Same as day one. Three boundaries:

MessagingPort — Telegram today. The interface is sendMessage(chatId, text). A Slack adapter would take an afternoon — that's the whole point of the pattern.
WorkerPort — Claude CLI today. Spawns the agent with JSON output, budget caps, tool restrictions. Could be swapped for any agent runtime that accepts a prompt and returns structured output.
VCSPort — GitHub today. Creates PRs, manages branches. Git operations happen locally through a sudo wrapper that runs everything as the sandboxed worker user.

The core — TaskRunner, TaskService, BudgetService — knows nothing about Telegram, Claude, or GitHub. It processes tasks, enforces budgets, delegates execution. That separation already paid off: changed how the CLI gets spawned twice in two days. Nothing else in the system noticed.
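To make the boundary concrete, here is a sketch of what the three ports could look like. Only sendMessage(chatId, text) and onCommand are named in this post; every other method name, parameter, and return type below is an assumption, and handleTask is a stand-in for the core rather than the real TaskRunner.

```typescript
// Hypothetical port shapes.
interface MessagingPort {
  sendMessage(chatId: string, text: string): Promise<void>;
  onCommand(handler: (chatId: string, text: string) => Promise<void>): void;
}

interface WorkerPort {
  run(prompt: string, budgetUsd: number): Promise<{ costUsd: number; output: string }>;
}

interface VCSPort {
  openPullRequest(branch: string, title: string): Promise<string>; // resolves to the PR URL
}

type Ports = { messaging: MessagingPort; worker: WorkerPort; vcs: VCSPort };

// Core logic sees only the ports, never Telegram, Claude, or GitHub.
async function handleTask(
  chatId: string,
  description: string,
  branch: string,
  ports: Ports,
): Promise<void> {
  const result = await ports.worker.run(description, 5); // $5 default task cap
  const prUrl = await ports.vcs.openPullRequest(branch, description);
  await ports.messaging.sendMessage(
    chatId,
    `PR opened: ${prUrl} ($${result.costUsd.toFixed(2)})`,
  );
}
```

Because handleTask depends only on the interfaces, it can be tested with three in-memory fakes, and swapping how the CLI is spawned means touching one WorkerPort implementation.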

State lives in SQLite via better-sqlite3. One file, no server, with PM2 keeping the process alive. Good enough for a single-operator system. Would need Postgres if this ever went multi-user.

What's Next

Three things on the roadmap, in priority order.

Crash recovery. If a task gets interrupted mid-work — server reboot, PM2 restart, OOM kill — it gets requeued from scratch. The branch exists with partial commits, but the retry starts a fresh conversation with no memory of what came before. Want to detect partial work on the branch and pass it as context: "Here's what you did before you were interrupted. Continue from commit X." This alone could cut the failure rate in half.
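Since this is still on the roadmap, the following is only a guess at the shape: parse the branch's commits (for example from git log --oneline main..HEAD) and fold them into the retry prompt. Both function names and the prompt layout are hypothetical.

```typescript
type PartialCommit = { sha: string; subject: string };

// Parses `git log --oneline` style lines ("<sha> <subject>") into commits.
function parseOneline(log: string): PartialCommit[] {
  return log
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const space = line.indexOf(" ");
      return space === -1
        ? { sha: line, subject: "" }
        : { sha: line.slice(0, space), subject: line.slice(space + 1) };
    });
}

// Builds the retry prompt; with no partial work it is just the original description.
function buildResumePrompt(description: string, commits: PartialCommit[]): string {
  if (commits.length === 0) return description;
  const latest = commits[0]; // git log lists the newest commit first
  const history = commits.map((c) => `- ${c.sha} ${c.subject}`).join("\n");
  return (
    `${description}\n\n` +
    `Here's what you did before you were interrupted:\n${history}\n` +
    `Continue from commit ${latest.sha}.`
  );
}
```

Commit subjects alone may be too thin as context. A fuller version might also include a diff summary, at the cost of a longer prompt.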

Slack adapter. Telegram works for a solo operator. Slack is where teams live. The MessagingPort interface is already clean — sendMessage and onCommand — so a Slack adapter that maps slash commands to the handler interface would open this up to team use without touching the core.

Issue watcher. Auto-queue tasks from GitHub issues. Label an issue mc-auto, MissionControl picks it up, creates a task, links the PR back to the issue. The scaffolding is already in the codebase. Needs a token scope update and it's live.

Should This Be Open Source?

Still deciding. The system is opinionated — single operator, Telegram, Claude CLI, GitHub — but the architecture is portable. Swap any layer without touching the core.

The bugs we found and fixed aren't novel. Stale sessions, permission boundaries, output verification, budget enforcement — every agent builder will hit these. Shipping the fixes as a reference implementation could save other builders the same $5.84 lessons.

No decision yet. Building something similar? Reach out.

The Closing Count

Two days of building. Two days of debugging. Twenty-one commits on main. Twenty tasks processed. Four successful PRs merged. $12.49 spent.

One system that takes English descriptions from a Telegram message and turns them into branches, commits, and pull requests — with budget caps, timeout enforcement, commit verification, and session isolation.

It breaks. We fix it. It breaks differently. We fix that too. The difference between "AI agent demo" and "AI agent that ships code" is those sixty lines of verification and fallback logic that nobody shows in the demo.

MissionControl isn't done. But it works. And it works because of everything that broke.

Next up: Post 5 — the bot builds a full MVP, deploys it to production, then tells us it failed.
