Jonathan Philipos

Posted on Feb 8

I built codex-monitor so I could ship code while I slept

#ai #opensource #llm #programming

The problem nobody talks about

AI coding agents are incredible. Copilot, Codex, Claude Code — they can write features, fix bugs, create pull requests. The pitch is simple: point them at a task, walk away, come back to shipped code.

Except that's not what actually happens.

What actually happens is you come back 4 hours later and discover your agent crashed 3 hours and 58 minutes ago. Or it's been looping on the same TypeScript error for 200 iterations, burning through your API credits like they're free. Or it created a PR that conflicts with three other PRs it also created. Or it just... stopped. No error, no output. Just silence.

I got tired of babysitting.

What I built

codex-monitor is the supervisor layer I wished existed. It watches your AI agents, detects when they're stuck, auto-fixes error loops, manages the full PR lifecycle, and keeps you informed through Telegram — so your agents actually deliver while you sleep.

npm install -g @virtengine/codex-monitor
cd your-project
codex-monitor

First run auto-detects it's a fresh setup and walks you through everything: which AI executors to use, API keys, Telegram bot, task management — the whole thing. After that, you just run codex-monitor and it handles the rest.

The stuff that makes it actually useful

1. It catches error loops before they eat your wallet

This was the original reason I built it. An agent tries to push, hits a pre-push hook failure — lint, typecheck, tests — tries to fix it, introduces a new error, tries to fix that, reintroduces the original error... forever. I've seen agents burn through thousands of API calls doing this.

codex-monitor watches the orchestrator's log output — the stdout and stderr that flow through the supervisor process. It doesn't peek inside the agent's sandbox or intercept what they're writing in real time. It just watches what comes out the other end. When it sees the same error pattern repeating 4+ times in 10 minutes, it pulls the emergency brake and triggers an AI-powered autofix — a separate analysis pass that actually understands the root cause instead of just throwing more code at it.
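Here is a minimal sketch of that "same error, 4+ times in 10 minutes" check. It is an illustration only, not codex-monitor's actual code: the class name, the `signature` normalization, and the defaults are all assumptions. The normalization is deliberately coarse (every run of digits collapses to `N`), so two error codes that differ only in their digits would count as the same pattern.

```javascript
// Hypothetical sketch: flags an error signature seen `threshold` times
// within a sliding time window. Not codex-monitor's real implementation.
class ErrorLoopDetector {
  constructor({ threshold = 4, windowMs = 10 * 60 * 1000 } = {}) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.seen = new Map(); // signature -> timestamps (ms) still inside the window
  }

  // Collapse volatile details (line numbers, counts) so repeats compare equal.
  static signature(line) {
    return line.replace(/\d+/g, "N").replace(/\s+/g, " ").trim().toLowerCase();
  }

  // Record one error line; returns true once the loop threshold is reached.
  record(line, now = Date.now()) {
    const key = ErrorLoopDetector.signature(line);
    const recent = (this.seen.get(key) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.seen.set(key, recent);
    return recent.length >= this.threshold;
  }
}

// Usage: feed it each error line from the log stream.
const detector = new ErrorLoopDetector();
const looping = detector.record("routes.ts(42,7): error TS2345: bad argument");
if (looping) console.log("error loop detected, trigger autofix");
```

Passing `now` explicitly keeps the detector easy to test without waiting ten real minutes.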

2. Live Telegram digest (this one's my favorite)

Instead of spamming you with individual notifications, it creates a single Telegram message per 10-minute window and continuously edits it as events happen. It looks like a real-time log right in your chat:

📊 Live Digest (since 22:29:33) — updating...
❌ 1 • ℹ️ 3

22:29:33 ℹ️ Orchestrator cycle started (3 tasks queued)
22:30:07 ℹ️ ✅ Task completed: "add user auth" (PR merged)
22:30:15 ❌ Pre-push hook failed: typecheck error in routes.ts
22:31:44 ℹ️ Auto-fix triggered for error loop

When the window expires, the message gets sealed and the next event starts a fresh one. You get full visibility without the notification hell.
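The windowing logic is simple enough to sketch on its own. The code below is a hypothetical illustration with the Telegram transport left out: the event shape (`time`, `level`, `text`) and the function names are assumptions. Each call tells the caller whether to send a new message or edit the current one. With the Telegram Bot API those would map to `sendMessage` and `editMessageText`.

```javascript
// Hypothetical sketch of the digest window only; no network calls here.
const WINDOW_MS = 10 * 60 * 1000;

function createDigest() {
  return { startedAt: null, events: [] };
}

// Render the counts line followed by one line per event.
function render(digest) {
  const errors = digest.events.filter((e) => e.level === "error").length;
  const infos = digest.events.length - errors;
  const lines = digest.events.map(
    (e) => `${e.time} ${e.level === "error" ? "❌" : "ℹ️"} ${e.text}`
  );
  return [`❌ ${errors} • ℹ️ ${infos}`, "", ...lines].join("\n");
}

// Add an event. Returns the next digest state plus the action the caller
// should take: "send" (a new window opened) or "edit" (same window).
function addEvent(digest, event, now = Date.now()) {
  const expired = digest.startedAt !== null && now - digest.startedAt >= WINDOW_MS;
  const isNew = digest.startedAt === null || expired;
  const next = isNew
    ? { startedAt: now, events: [event] }
    : { startedAt: digest.startedAt, events: [...digest.events, event] };
  return { digest: next, action: isNew ? "send" : "edit", text: render(next) };
}
```

Keeping the state transition pure means the expired window is "sealed" simply by never editing it again. The next event after expiry produces a `send`.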

You can also just... talk to it. More on that next.

3. An AI agent at the core — controllable from your phone

codex-monitor isn't just a passive watcher. There's an actual AI agent running inside it — powered by whatever SDK you've configured (Codex, Copilot, or both). That agent has full access to your workspace: it can read files, write code, run commands, search the codebase.

And you talk to it through Telegram.

Send any free-text message and the agent picks it up, works on it, and streams its progress back to you in a single continuously-edited message. You see every action live — files read, searches performed, code written — updating right in your chat:

🔧 Agent: refactor the auth middleware to use JWT
📊 Actions: 7 | working...
────────────────────────────
📄 Read src/middleware/auth.ts
🔎 Searched for "session" across codebase
✏️ src/middleware/auth.ts (+24 -18)
✏️ src/types/auth.d.ts (+6 -0)
📌 Follow-up: "also update the tests" (Steer ok.)
💭 Updating test assertions for JWT tokens...

If the agent is mid-task and you send a follow-up message, it doesn't get lost. codex-monitor queues it and steers the running agent to incorporate your feedback in real time. The follow-up shows up right in the streaming message so you can see it was received.
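One way to picture the queue-and-steer behavior is a small sketch like this. The class and method names are hypothetical, and how the real agent consumes follow-ups is not described in this post. The assumption here is that the agent loop checks for follow-ups between actions.

```javascript
// Hypothetical sketch: messages sent while the agent is busy are queued,
// then handed to the running task at its next step boundary.
class SteeringQueue {
  constructor() {
    this.busy = false;
    this.pending = [];
  }

  // Called for each incoming chat message.
  submit(text) {
    if (!this.busy) {
      this.busy = true;
      return { status: "started", task: text };
    }
    this.pending.push(text);
    return { status: "queued", position: this.pending.length };
  }

  // Called by the agent loop between actions; returns and clears follow-ups.
  takeFollowUps() {
    const followUps = this.pending;
    this.pending = [];
    return followUps;
  }

  // Called when the task ends. Follow-ups that were never taken are returned
  // so the caller can start them as fresh tasks instead of dropping them.
  finish() {
    this.busy = false;
    return this.takeFollowUps();
  }
}
```

The point of the design is that a follow-up is never lost: it is either taken by the running task or handed back when the task finishes.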

When it's done, the message gets a final summary — files modified, lines changed, the agent's response. All in one message thread. No notification hell, no scrolling through walls of output.

Built-in commands give you quick access to the operational stuff: /status, /tasks, /agents, /health, /logs. But the real power is just typing what you want done — "fix the failing test in routes.ts", "add error handling to the payment endpoint", "what's the current build status" — and having an agent with full repo context execute it on your workspace while you're on the bus.

4. Multi-executor failover

You're not limited to one AI agent. Configure Copilot, Codex, Claude Code — whatever you want — with weighted distribution. If one crashes or rate-limits, codex-monitor automatically fails over to the next one.

{
  "executors": [
    { "name": "copilot-claude", "executor": "COPILOT", "variant": "CLAUDE_OPUS_4_6", "weight": 40 },
    { "name": "codex-default", "executor": "CODEX", "variant": "DEFAULT", "weight": 35 },
    { "name": "claude-code", "executor": "CLAUDE", "variant": "SONNET_4_5", "weight": 25 }
  ],
  "failover": { "strategy": "next-in-line", "maxRetries": 3, "cooldownMinutes": 5 }
}

Or if you don't want to mess with JSON:

EXECUTORS=COPILOT:CLAUDE_OPUS_4_6:40,CODEX:DEFAULT:35,CLAUDE:SONNET_4_5:25
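To make the weighting concrete, here is a rough sketch of how a spec like that could be parsed and used. Everything in it is an assumption for illustration: the function names, the injectable `rand`, and the reading of "next-in-line" as "walk the list in order starting from the picked executor, skipping any in cooldown".

```javascript
// Hypothetical sketch; the real parser and scheduler are not shown in this post.
function parseExecutors(spec) {
  return spec.split(",").map((entry) => {
    const [executor, variant, weight] = entry.trim().split(":");
    return { executor, variant, weight: Number(weight) };
  });
}

// Weighted random choice. `rand` is injectable so the choice is testable.
function pickWeighted(executors, rand = Math.random) {
  const total = executors.reduce((sum, e) => sum + e.weight, 0);
  let roll = rand() * total;
  for (const e of executors) {
    roll -= e.weight;
    if (roll < 0) return e;
  }
  return executors[executors.length - 1]; // guard against float rounding
}

// Order to try executors in: the picked one first, then the rest in list
// order, leaving out any whose name is currently cooling down.
function failoverOrder(executors, picked, coolingDown = new Set()) {
  const start = executors.indexOf(picked);
  const rotated = [...executors.slice(start), ...executors.slice(0, start)];
  return rotated.filter((e) => !coolingDown.has(e.executor));
}

const executors = parseExecutors(
  "COPILOT:CLAUDE_OPUS_4_6:40,CODEX:DEFAULT:35,CLAUDE:SONNET_4_5:25"
);
const first = pickWeighted(executors);
console.log(failoverOrder(executors, first).map((e) => e.executor));
```

With weights of 40, 35, and 25, roughly 40% of tasks would start on the first executor over a long run. Any single pick is random.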

5. Smart PR flow

This is where it gets interesting. When an agent finishes a task:

Pre-commit and pre-push hooks run first and stop hard on any lint, security, build, or test failure.
Check the branch: any commits? Is it behind its configured upstream (main, staging, development)?
If 0 commits and far behind → archive the stale attempt (agent did nothing useful)
If there are commits → auto-rebase onto main
Merge conflicts? → AI-powered conflict resolution
Create PR through the task management API
CI passes? → merge automatically
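The steps above can be sketched as a single decision function. This is a simplified illustration, not the real flow. The field names, the "far behind" cutoff of 20 commits, and the "wait" outcome for a fresh branch with no commits are all assumptions.

```javascript
// Hypothetical sketch of the branch-handling decisions in the PR flow.
// `branch` is a plain object describing the agent's branch after a task.
function planPrFlow(branch, { staleBehindThreshold = 20 } = {}) {
  if (!branch.hooksPassed) return ["stop: hooks failed"];

  if (branch.commitsAhead === 0) {
    // Nothing was committed: archive only if the branch has also gone stale.
    return branch.commitsBehind >= staleBehindThreshold
      ? ["archive stale attempt"]
      : ["wait: no commits yet"];
  }

  const steps = [];
  if (branch.commitsBehind > 0) {
    steps.push("rebase onto upstream");
    if (branch.hasConflicts) steps.push("resolve conflicts");
  }
  steps.push("create PR", "merge when CI passes");
  return steps;
}

console.log(
  planPrFlow({ hooksPassed: true, commitsAhead: 3, commitsBehind: 2, hasConflicts: true })
);
```

Returning a list of step names rather than performing them keeps the decision logic separate from git and API side effects, which makes each branch of the flow easy to check.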

Zero human touch from task assignment to merged code. I've woken up to 20+ PRs merged overnight.

6. Task planner

You can go a step further and configure codex-monitor with a set of instructions for comparing a specification against its implementation. Once the task backlog runs dry, it uses them to identify gaps, problems, or issues where the implementation falls short of what the original specification and user stories required.

7. The safety stuff (actually important)

Letting AI agents commit code autonomously sounds terrifying. It should. Here's how I sleep at night:

Branch protection on main: agents can't merge without green CI (enforced by GitHub branch protection). Period.
Pre-push hooks: lint, typecheck, and tests run before anything leaves the machine. No --no-verify.
Singleton lock: only one codex-monitor instance per project. No duplicate agents creating conflicting PRs.
Stale attempt cleanup: dead branches with 0 commits get archived automatically.
No parallel agents on the same files: the orchestrator detects when a task would conflict with one that is already running and delays its execution.
Log rotation: agents generate a LOT of output. Auto-prune when the log folder exceeds your size cap.
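As an illustration of the "no parallel agents on the same files" rule, here is a minimal sketch. It assumes each task carries a list of files it is expected to touch. How codex-monitor actually predicts that is not described in this post, and the names below are hypothetical.

```javascript
// Hypothetical sketch: a task is delayed when its expected files overlap
// with any task that is already running.
function conflictsWith(task, runningTasks) {
  const wanted = new Set(task.files);
  return runningTasks
    .filter((r) => r.files.some((f) => wanted.has(f)))
    .map((r) => r.id);
}

// Split the queue into tasks that can start now and tasks that must wait.
function schedule(queue, runningTasks) {
  const start = [];
  const delayed = [];
  const active = [...runningTasks];
  for (const task of queue) {
    const blockers = conflictsWith(task, active);
    if (blockers.length === 0) {
      start.push(task.id);
      active.push(task); // later queue entries must not collide with this one
    } else {
      delayed.push({ id: task.id, blockedBy: blockers });
    }
  }
  return { start, delayed };
}
```

Delayed tasks keep the ids of whatever blocks them, so they can be retried as soon as those tasks finish.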

The architecture (for the curious)

cli.mjs ─── entry point, first-run detection, crash notification
    │
config.mjs ── unified config (env + JSON + CLI flags)
    │
monitor.mjs ── the brain
    ├── log analysis, error detection
    ├── smart PR flow
    ├── executor scheduling & failover
    ├── task planner auto-trigger
    │
    ├── telegram-bot.mjs ── interactive chatbot
    ├── autofix.mjs ── error loop detection
    └── maintenance.mjs ── singleton lock, cleanup

It's all Node.js ESM. No build step. The orchestrator wrapper can be PowerShell, Bash, or anything that runs as a long-lived process — codex-monitor doesn't care what your orchestrator looks like, it just supervises it.

Hot .env reload means you can tweak config without restarting. Self-restart on source changes means you can develop codex-monitor while it's running (yes, it monitors itself and reloads when you edit its own files).
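The core of a hot reload is small: re-parse the file and work out which keys changed. The sketch below shows only that part, under assumed names, and is not the project's actual loader. A file watcher such as Node's `fs.watch` would call it whenever `.env` changes.

```javascript
// Hypothetical sketch: parse KEY=VALUE lines and report which keys changed.
function parseEnv(text) {
  const env = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    env[key] = value;
  }
  return env;
}

// Keys that were added, removed, or given a new value, sorted by name.
function changedKeys(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...keys].filter((k) => before[k] !== after[k]).sort();
}
```

Knowing exactly which keys changed lets the supervisor reapply only the affected settings instead of restarting everything.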

What I learned building this

AI agents are unreliable in exactly the ways you don't expect. The code they write is usually fine. The operational reliability is where everything falls apart. They crash. They loop. They create PRs against the wrong branch. They push half-finished work and go silent. The agent code quality has gotten genuinely good — but nobody built the infrastructure to keep them running.

Telegram was the right call over Slack/Discord. Dead simple API, long-poll works great for bots, message editing enables the live digest feature, and I always have my phone. Push notification on my wrist when something goes critical. That's the feedback loop I wanted.

Failover between AI providers is more useful than I expected. Rate limits hit at the worst times. Having Codex fail over to Copilot, and Copilot fail over to Claude, means something is always working. The weighted distribution also lets you lean into whichever provider is performing best this week.

Try it

npm install -g @virtengine/codex-monitor
cd your-project
codex-monitor --setup

The setup wizard takes about 2 minutes. You need a Telegram bot token (free, takes 30 seconds via @BotFather) and at least one AI provider configured.

GitHub: virtengine/virtengine/scripts/codex-monitor

It's open source (Apache 2.0). If you're running AI agents on anything beyond toy projects, you probably need something like this. I built it because I needed it, and I figured other people would too.

If you've been running AI agents and have war stories about the failures, I'd love to hear them. The edge cases I've found while building this have been... educational.

Top comments (4)

Mykola Kondratiuk • Feb 8

the autonomous PR thing is wild but also makes me nervous lol. I get the appeal - waking up to completed work sounds amazing - but I'd be paranoid about what got merged while I slept. do you have rollback mechanisms? or like, a way to preview what it's about to do before it actually commits? I could see this being super useful for grunt work refactors or test writing, stuff where the constraints are clear. less sure about it for feature work where context matters more than speed. what kind of tasks have worked best for overnight runs?

Jonathan Philipos • Feb 9

I think building the structure for new projects is the best part about it. At that stage you don't have to worry much about what gets merged, as long as you have set up good constraints and guardrails to stop bad code from getting through.

You can definitely use this for things like auto-resolving SonarQube alerts, GitHub security alerts, and similar style issues.
You can also use this to continuously monitor certain critical log streams, identify bugs, and plan tasks that resolve them.
You can customize your task planner to do things like act as a tester: give it Playwright or Chrome DevTools access and let it go wild playing around in your web app. If it's a CLI, it's even easier for it to go through and actually test components of the functionality.

You don't need to merge into a main branch, just let it work on a development branch and plan your tasks well. Think of it like having a Jira board that gets magically done by the time you're done with the day. (or start of the day if you leave it overnight)

Mykola Kondratiuk • Feb 12

Haha yeah I get that - the first time mine opened a PR without asking I had this moment of 'wait am I actually comfortable with this?' But tbh the review process is still there, it's just creating the PR not merging it. The nervous feeling goes away after you see a few and realize it's not doing anything wild. Still wouldn't let it touch production without eyes on it though.
