How a restaurant operator in México built an autonomous AI agent that reads logs, fixes bugs, restarts services, and reports back — all from a Telegram message, using a $100/month plan.
The Problem
I run six production services on a Windows Server 2022 VM in GCP. A WhatsApp bot, a REST API, a CFO fiscal agent, a Telegram dispatcher, a portfolio site, and a monitor. When something breaks at 2 AM, I either wake up to fix it or it stays broken until morning.
I wanted autonomous self-correction. Every solution I found was one of three things:
- An expensive enterprise platform (Salesforce Agentforce, PagerDuty + AI add-ons)
- A complex LangChain/CrewAI setup with its own API key billing
- A demo that didn't survive contact with production
I already had Claude Code on the Max plan ($100/month). I was using it as my daily coding copilot. Then I realized: claude -p runs Claude non-interactively. It accepts a system prompt. It connects to MCP servers. And it uses my existing session — no API key, no extra cost.
That was the insight that made everything else possible.
The Architecture
The PMO Agent (Project Management Operations Agent) is the only component in my ecosystem capable of modifying code in production. It connects three systems:
```text
ADMIN (Telegram)
  !pmo tacos-api: fix the null check in ventas.js
        ↓
telegram-dispatcher (PM2)
        ↓
mensajes.db (SQLite WAL — shared queue)
        ↓
PMO Agent — polls every 10 seconds
        ↓
claude -p (Plan Max, no API key)
        ↓
MCP Project Server (Python, 24 tools per project)
        ↓
Read logs → Edit files → Restart PM2 → Verify → Report
        ↓
Telegram ← "Fixed. Null check added at line 45. Service online, 0 errors in last 15s."
```
The key principle: claude -p invokes the Claude Code CLI in non-interactive mode, reusing your authenticated Max subscription — the same session that powers your interactive terminal. There is no second API call and no token billing. The same $100/month that covers your daily coding covers your autonomous production agent.
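To make this concrete, here is a minimal sketch of how a wrapper can assemble and spawn such a call. It is Python for brevity (the actual claude-runner.js is Node.js), and the flag names should be checked against `claude --help` for your installed CLI version:

```python
import subprocess

def build_claude_command(prompt: str, system_prompt_path: str,
                         mcp_config: str, model: str = "sonnet") -> list[str]:
    """Assemble the argv for a non-interactive claude -p call.
    Flag names follow the Claude Code CLI docs at the time of writing;
    verify them for your installed version."""
    with open(system_prompt_path, encoding="utf-8") as f:
        system_prompt = f.read()
    return [
        "claude", "-p", prompt,              # -p = print mode (non-interactive)
        "--model", model,
        "--append-system-prompt", system_prompt,
        "--mcp-config", mcp_config,          # per-project MCP server config
        "--strict-mcp-config",               # ignore any other configured servers
    ]

def run_claude(cmd: list[str], timeout_s: int = 600) -> str:
    # The CLI reuses the already-authenticated Max session on this machine;
    # no API key is passed anywhere.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.stdout or result.stderr  # stderr fallback, as described later
```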
The MCP Server: 24 Tools in 6 Categories
Each project gets its own instance of a Python MCP server. Claude connects to exactly the right project's tools — no cross-contamination, no blast radius beyond the target service.
1,064 lines of Python. 38.5 KB. 24 tools.
| Category | Count | Tools |
|---|---|---|
| Read | 4 | `read_file`, `list_files`, `search_code`, `get_project_structure` |
| Write | 4 | `write_file`, `edit_file`, `delete_file`, `create_directory` |
| Git | 6 | `git_status`, `git_diff`, `git_log`, `git_pull`, `git_commit`, `git_add` |
| PM2 | 5 | `get_status`, `view_logs`, `restart_process`, `stop_process`, `start_process` |
| Testing | 2 | `run_tests`, `check_health` |
| Context | 3 | `read_claude_md`, `get_dependencies`, `run_command` |
The edit_file tool is the most critical. It works as a strict search-and-replace: it finds old_text exactly in the file and replaces it with new_text. If old_text is not unique in the file, it fails and asks for more context. This prevents accidental edits. Claude cannot blindly overwrite files — it must identify the exact code it wants to change.
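As a sketch, the core of that behavior fits in a dozen lines of Python; the real server.py additionally enforces the path, size, and binary checks described in the security section:

```python
from pathlib import Path

def edit_file(path: str, old_text: str, new_text: str) -> str:
    """Strict search-and-replace: old_text must appear exactly once,
    otherwise the edit is refused. Illustrative sketch only."""
    content = Path(path).read_text(encoding="utf-8")
    count = content.count(old_text)
    if count == 0:
        return "ERROR: old_text not found -- provide the exact current code"
    if count > 1:
        return f"ERROR: old_text appears {count} times -- add surrounding context"
    Path(path).write_text(content.replace(old_text, new_text), encoding="utf-8")
    return "OK: 1 occurrence replaced"
```

The uniqueness check is what forces Claude to quote enough surrounding context to pin down exactly one location before anything is written.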
Each MCP server is invoked with --strict-mcp-config, meaning Claude only sees the tools for the target project. Nothing else.
Two Modes of Operation
Mode 1: Admin Instruction via Telegram
You send a message prefixed with !pmo:
!pmo tacos-api: add a GET /health endpoint that returns { status: "ok" }
!pmo bot: change the welcome message to "Hola, qué vas a ordenar?"
!pmo cfo-agent: fix the bug where it doesn't parse DD/MM/YYYY dates
Telegram shortcuts with autocomplete (15 commands) make this feel like a native interface:
/pmo_api → targets tacos-api
/pmo_bot → targets TacosAragon WhatsApp bot
/pmo_cfo → targets cfo-agent
/pmo_telegram → targets telegram-dispatcher
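A sketch of the routing the dispatcher has to do, in Python for illustration (the real telegram-dispatcher is Node.js; the shortcut table below simply mirrors the examples above):

```python
# Hypothetical routing table mirroring the shortcuts listed above.
SHORTCUTS = {
    "/pmo_api": "tacos-api",
    "/pmo_bot": "TacosAragon",
    "/pmo_cfo": "cfo-agent",
    "/pmo_telegram": "telegram-dispatcher",
}

def route_admin_message(text: str):
    """Turn a Telegram admin message into (project, instruction),
    or None if the message is not addressed to the PMO."""
    text = text.strip()
    if text.startswith("!pmo "):
        body = text[len("!pmo "):]
        project, _, instruction = body.partition(":")
        return project.strip(), instruction.strip()
    first, _, rest = text.partition(" ")
    if first in SHORTCUTS:
        return SHORTCUTS[first], rest.strip()
    return None  # ordinary chat message, not a PMO command
```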
Mode 2: Autonomous Autocorrection
When the monitor detects a repeated error, it enqueues an autocorrect message:
AUTOCORRECT|tacos-api|TypeError: Cannot read property "precio" of undefined at src/routes/ventas.js:45
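Parsing that queue message is deliberately trivial; a Python sketch (the real agent is Node.js):

```python
def parse_autocorrect(message: str):
    """Split the pipe-delimited queue message into (project, error_detail).
    Format: AUTOCORRECT|<project>|<error text>. The error text may itself
    contain '|', so split at most twice."""
    kind, project, error = message.split("|", 2)
    if kind != "AUTOCORRECT":
        raise ValueError(f"not an autocorrect message: {kind}")
    return project, error
```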
The PMO Agent picks this up and executes a full repair cycle without any human input:
- Diagnose — reads logs, reads the file with the error, searches the codebase for the pattern
- Analyze — identifies the exact line, classifies severity (CRITICAL/HIGH/MEDIUM/LOW)
- Fix — applies the minimum change using `edit_file`, never refactors, preserves style
- Verify — restarts the process, waits 10 seconds, checks logs and HTTP health endpoint
- Report — sends a structured Telegram message (max 4,000 chars)
Example autocorrect report:
AUTOCORRECT [tacos-api] — SUCCESS
Error: TypeError at src/routes/ventas.js:45
Cause: variable "producto" can be null when item is out of stock
Fix: added null check before accessing .precio
Verification: service online, 0 errors in last 15s
Session Context: The Feature Most Builders Miss
Each admin session shares context for 1 hour. Claude remembers everything it has read, every change it has made, every project it has touched — across multiple messages.
```text
10:00  !pmo tacos-api: explain the architecture
       → NEW SESSION (expires 11:00). Claude reads files, responds.

10:05  !pmo tacos-api: add a /health endpoint
       → CONTINUES session (msg #2). Claude already knows the architecture.
       → Edits code, restarts, verifies.

10:12  !pmo cfo-agent: what endpoints does it have?
       → SAME session (msg #3). Claude remembers tacos-api AND reads cfo.

10:20  !pmo tacos-api: write the test for /health
       → SAME session (msg #4). Knows exactly what it built.

11:01  !pmo bot: add RFC validation
       → Session expired. NEW SESSION starts.
```
Implementation: first call uses --session-id UUID (creates a session on disk). Subsequent calls use --resume UUID (resumes with full context). After 1 hour, UUID expires and a new one is generated.
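A sketch of that session lifecycle in Python (the real agent is Node.js; the one-hour TTL matches the article, and the flag names should be verified against your CLI version):

```python
import time
import uuid

SESSION_TTL_S = 3600  # 1 hour, as in the article

class SessionTracker:
    """Decide whether a claude -p call should create or resume a session."""
    def __init__(self):
        self.session_id = None
        self.started_at = 0.0

    def flags(self, now=None) -> list:
        now = time.time() if now is None else now
        if self.session_id is None or now - self.started_at > SESSION_TTL_S:
            # First call, or the previous session expired: mint a fresh UUID.
            self.session_id = str(uuid.uuid4())
            self.started_at = now
            return ["--session-id", self.session_id]
        # Within the hour: resume with full accumulated context.
        return ["--resume", self.session_id]
```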
This is not available in any LangChain tutorial. It's a Claude Code CLI feature that most people using it interactively don't know exists.
The Security Model: 4 Concentric Layers
Autonomous code execution is dangerous, so no single safeguard is trusted: each layer is designed to hold even if the others fail.
Layer 1: MCP Server (per project)
- Path traversal blocked — `..` cannot escape the project directory
- Sensitive files blocked — `.env`, `*.pem`, `*.key`, `credentials.json`
- Binaries excluded — images, executables, databases not readable or editable
- Dangerous commands filtered — `rm -rf /`, `format`, `shutdown`, `reboot`
- Size limits — 500KB per file, 500 lines per output
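A Python sketch of the first two checks, assuming a POSIX-style layout for brevity; the real server.py runs on Windows and also filters binaries and enforces the size limits:

```python
from pathlib import Path

SENSITIVE_NAMES = {".env", "credentials.json"}
SENSITIVE_SUFFIXES = {".pem", ".key"}

def is_allowed(project_root: str, requested: str) -> bool:
    """Refuse any path that escapes the project root or matches a
    sensitive pattern. Illustrative sketch only."""
    root = Path(project_root).resolve()
    target = (root / requested).resolve()
    if root not in target.parents and target != root:
        return False  # '..' traversal escaped the project directory
    if target.name in SENSITIVE_NAMES or target.suffix in SENSITIVE_SUFFIXES:
        return False  # secrets stay off-limits even inside the project
    return True
```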
Layer 2: PMO Agent
- 1 concurrent execution maximum — no overlapping corrections
- 5-minute cooldown between corrections on the same service
- 10-minute timeout per Claude execution
- Full audit trail in the `pmo_ejecuciones` table
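The concurrency and cooldown gate is a few lines of state; a Python sketch of the logic (the real agent is Node.js):

```python
COOLDOWN_S = 300       # 5-minute cooldown per service, as configured
busy = False           # 1 concurrent execution maximum
last_run = {}          # service name -> epoch seconds of last correction

def may_execute(service: str, now: float) -> bool:
    """Refuse overlapping runs, and refuse re-running the same
    service inside its cooldown window."""
    if busy:
        return False
    return now - last_run.get(service, float("-inf")) >= COOLDOWN_S

def mark_started(service: str, now: float) -> None:
    global busy
    busy = True
    last_run[service] = now

def mark_finished() -> None:
    global busy
    busy = False
```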
Layer 3: System Prompt (autocorrect.md)
- NEVER modifies `.env`, credentials, or production configuration
- NEVER installs unknown packages
- If root cause not identified with certainty: report only, don't touch code
- If fix fails: revert with `git checkout`, report FAILURE
- Maximum 3 files modified per correction
Layer 4: Claude CLI
- Budget cap: $2.00 per execution (failsafe against token loops)
- Model: Sonnet (fast, economical for corrections)
- `--strict-mcp-config` — only uses configured MCP servers, ignores others
- 1-hour sessions — shared context per conversation, auto-expires
Real Production Data
The system has been running since March 21, 2026. Here is the unfiltered data from pmo_ejecuciones:
Execution Summary
| Metric | Value |
|---|---|
| Total executions | 9 |
| Completed (success) | 7 (78%) |
| Infrastructure errors | 2 (bugs in the PMO itself, now fixed) |
| Claude errors | 0 |
| Real success rate | 100% (7/7 when Claude ran) |
The 2 failures were bugs in my infrastructure code — the PMO dispatcher, not Claude. When Claude actually executed, it succeeded every single time.
Execution Timeline (March 21, 2026)
| # | Project | Status | Duration | Time |
|---|---|---|---|---|
| 1 | tacos-api | completed | 11s | 9:46 PM |
| 2 | TacosAragon | error_infra | 0s | 9:59 PM |
| 3 | TacosAragon | completed | 107s | 10:06 PM |
| 4 | TacosAragon | completed | n/a | 10:32 PM |
| 5 | TacosAragon | error_infra | 180s | 10:38 PM |
| 6 | TacosAragon | completed | 408s | 10:46 PM |
| 7 | TacosAragon | completed | 600s | 11:00 PM |
| 8 | TacosAragon | completed | 123s | 11:16 PM |
| 9 | TacosAragon | completed | 600s | 11:22 PM |
Resolution Time
| Metric | Value |
|---|---|
| Average | 308s (~5 min) |
| Minimum | 11s |
| Maximum | 600s (~10 min) |
Token Consumption (per execution)
| Component | Characters | Approx. Tokens |
|---|---|---|
| System prompt | ~1,800 | ~500 |
| User prompt | ~400 | ~110 |
| MCP tool calls (read ~5 files) | ~50,000 | ~13,000 |
| MCP tool results (code read) | ~80,000 | ~21,000 |
| Total input | ~132,200 | ~34,600 |
| Response (output) | ~2,651 | ~700 |
Note: Token counts are approximate estimates based on character counts. Claude Code does not expose exact token counts for claude -p executions.
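For back-of-envelope estimates, the ratio implied by the component rows above (roughly 3.8 characters per token, e.g. ~50,000 chars of tool calls to ~13,000 tokens) can be wrapped in a helper; real tokenizers vary, especially for code:

```python
def approx_tokens(chars: int, chars_per_token: float = 3.8) -> int:
    """Rough character-to-token estimate. The ~3.8 ratio is implied by
    the table's component rows; treat it as a heuristic, not a tokenizer."""
    return round(chars / chars_per_token)
```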
Database Footprint
| Table | Rows |
|---|---|
| mensajes_queue | 110 |
| mensajes_responses | 31 |
| pmo_ejecuciones | 9 |
| mensajes.db total size | 84 KB |
At this rate: ~10,000 executions before reaching 1 MB. The entire ecosystem's operational data fits in a text file.
The Cost Comparison
Using real token data (14,830 input / 193 output per execution):
| Platform | Cost/execution | Cost at 30 exec/day × 30 days | Annual |
|---|---|---|---|
| GPT-4o API ($2.50/M in, $10/M out) | $0.039 | $35.85/month | $430/year |
| Claude Sonnet API ($3/M in, $15/M out) | $0.047 | $42.50/month | $510/year |
| Plan Max | $0.000 | $100/month | $1,200/year |
Break-even point:
- vs GPT-4o: Max becomes cheaper at 84+ executions/day
- vs Sonnet API: Max becomes cheaper at 71+ executions/day
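The break-even arithmetic is easy to reproduce (prices per million tokens as in the table, token counts per execution as stated above):

```python
import math

IN_TOKENS, OUT_TOKENS = 14_830, 193    # per-execution figures from the article
MAX_PLAN_PER_DAY = 100 / 30            # $100/month spread over 30 days

def cost_per_execution(in_price_per_m: float, out_price_per_m: float) -> float:
    """Raw API cost of one execution at the given per-million-token prices."""
    return (IN_TOKENS * in_price_per_m + OUT_TOKENS * out_price_per_m) / 1e6

def break_even_per_day(in_price_per_m: float, out_price_per_m: float) -> int:
    """Executions/day at which raw API billing exceeds the flat Max plan."""
    return math.ceil(MAX_PLAN_PER_DAY / cost_per_execution(in_price_per_m, out_price_per_m))
```

With Sonnet's prices this reproduces the ~$0.047 per execution and the 71/day break-even; with GPT-4o's prices the same formula lands at roughly 85 executions/day, within a rounding step of the figure above.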
At 30 executions/day, the raw API cost is lower than Max. So why Max?
Because the PMO Agent is not the only thing running. The same $100/month also covers:
- All interactive Claude Code sessions (my daily coding copilot)
- The `aragon-git-guardian` security hook on every `git push`
- All Skills, CLAUDE.md hierarchies, and custom slash commands
- Any other `claude -p` automation I add
Everything below the break-even line is, in effect, a bonus that comes included.
The real financial argument is not the PMO alone. It's that for a solo operator running 6 production services, the $100/month Max plan functions as an entire DevOps + QA + on-call engineering layer.
What This Is Not
This is not a prototype. It is not a tutorial with TODO: add real logic here. The system ran 9 times in one evening, resolved real tasks on real production code, and left an audit trail in a SQLite database.
It is also not a replacement for proper SRE practices. Circuit breakers, cooldowns, budget caps, and human-in-the-loop escalation exist precisely because autonomous agents fail in unexpected ways. The system is designed to operate safely within defined boundaries and escalate outside them.
What Surprised Me
After building this, I searched for prior art. I found:
- Telegram + Claude Code bridges — several repos, all require a separate API key
- Self-healing infrastructure — enterprise frameworks (VIGIL, AIOps) costing orders of magnitude more
- PMO-style agents — LangChain tutorials, demo-quality, no production data
I did not find a single documented implementation of claude -p as a production autocorrection engine, with per-project MCP servers, session persistence, and a 4-layer security model — running on a Max subscription without a separate API key.
The capability was there. The documentation was not.
The full implementation is open source: github.com/Gumagonza1/pmo-agent
The Stack
```text
pmo-agent/
├── index.js            # PM2 process: polls SQLite every 10s
├── claude-runner.js    # Wrapper for claude -p (spawn + timeout)
├── config.js           # Project map, paths, timings
├── mcp-projects.json   # MCP config for claude -p
├── prompts/
│   ├── autocorrect.md      # System prompt: autonomous correction
│   └── pmo-instruction.md  # System prompt: admin instructions
└── state/              # Cooldown state per service

mcp-project-server/
└── server.py           # 1,064 lines, 24 tools, 38.5 KB

telegram-dispatcher/
└── index.js            # Recognizes the !pmo prefix, writes to SQLite

mensajes.db             # SQLite WAL, shared queue, 84 KB total
```
Configuration (config.js):
| Parameter | Value | Description |
|---|---|---|
| POLL_INTERVAL_MS | 10,000 | SQLite polling frequency |
| VERIFY_WAIT_MS | 15,000 | Wait after fix before verifying |
| CLAUDE_TIMEOUT_MS | 600,000 | 10-minute max per Claude execution |
| MAX_CONCURRENT | 1 | No overlapping executions |
| COOLDOWN_MS | 300,000 | 5-minute cooldown per service |
Managed projects:
| Project | PM2 Name | Port | Critical |
|---|---|---|---|
| TacosAragon | TacosAragon | 3003 | Yes |
| MonitorBot | MonitorBot | — | Yes |
| tacos-api | tacos-api | 3001 | Yes |
| telegram-dispatcher | telegram-dispatcher | — | No |
| cfo-agent | cfo-agent | 3002 | No |
| portfolio-aragon | portfolio-aragon | — | No |
Adding a New Project (3 Steps)
Step 1: register the project in `config.js`:

```js
'nuevo-proyecto': {
  mcp: 'project-nuevo',
  root: 'C:\\ruta\\al\\proyecto',
  pm2: 'nombre-en-pm2',
  puerto: 3005,
  critico: true,
}
```

Step 2: add its MCP server to `mcp-projects.json`:

```json
"project-nuevo": {
  "command": "python",
  "args": [
    "C:\\...\\mcp-project-server\\server.py",
    "--root", "C:\\ruta\\al\\proyecto",
    "--pm2", "nombre-en-pm2",
    "--name", "nuevo"
  ]
}
```

Step 3: restart the agent:

```shell
pm2 restart pmo-agent
```
A Real Fix: What Claude Actually Did
Theory is easy. Here is a specific production fix the PMO Agent diagnosed and applied on March 21, 2026.
The problem: TacosAragon (the WhatsApp ordering bot) was producing imprecise responses. The monitor flagged degraded quality in Gemini's outputs.
The PMO's diagnosis:
Causa principal: El menú CSV con UUIDs inyectaba ~25,000 chars de ruido al contexto de Gemini, degradando la precisión. Contexto total estimado bajó de ~76,500 a ~53,000 chars (-30%).
Translation: the menu being sent to Gemini included raw UUID identifiers from the Loyverse POS system — ~25,000 characters of noise the model didn't need. This was bloating the context by 30% and degrading response quality.
Three fixes applied autonomously to bot-tacos/index.js:
Fix 1 — menuParaIA() (line 211): Added UUID filtering before sending menu CSV to Gemini. Eliminated ~25,000 characters of irrelevant identifiers from the context.
Fix 2 — temperature reduction (lines 246, 262): Lowered from 0.5 to 0.3 to reduce hallucinations and improve ordering precision.
Fix 3 — MAX_HISTORIAL reduction (line 315): Reduced conversation history limit from 35,000 to 20,000 characters to prevent context overload in long sessions.
Result: Total context sent to Gemini dropped from ~76,500 to ~53,000 characters — a 30% reduction. Three targeted edits, no refactoring, code style preserved.
The Actual PMO Output
Here is the unedited message the PMO Agent sent to Telegram before applying any fix:
```text
✅ PMO TacosAragon — Completed (123s)
🔗 Session: msg #1, 58min remaining

🔴 ROOT CAUSE: Menu CSV sent with UUIDs to Gemini
Each call includes full menuRaw (118 lines × 10 columns, 6 of them
36-char UUIDs):

  agua-litro,10005,Agua cebada Litro,Bebidas,29,877e394e-...

Gemini receives ~25,000 chars of UUIDs it cannot use. This context
noise degrades instruction-following precision.

Fix: Filter menuRaw before injecting into the prompt.

  function menuParaIA(csv) {
    return csv.split('\n').map((linea, i) => {
      if (i === 0) return 'Handle,Nombre,Categoria,Precio';
      const cols = linea.split(',');
      return [cols[0], cols[2], cols[3], cols[4]].join(',');
    }).join('\n');
  }

🟡 History at 35,000 chars — reduce to 20,000
🟡 Model: Flash instead of Pro (line 243)
🟢 Temperature 0.5 — lower to 0.3

Context estimate:
  instrucciones.txt    ~12,000 chars
  menu.csv with UUIDs  ~28,000 chars
  History (max)        ~35,000 chars
  Profile + internal    ~1,500 chars
  TOTAL                ~76,500 chars
  With UUID fix        ~53,000 chars (-30%)

Shall I apply the fixes now?
```
Notice the last line: "Shall I apply the fixes now?"
The PMO Agent did not execute automatically. It read five files, identified four causes ordered by impact, calculated exact context sizes, wrote the implementation code, and then paused for confirmation before touching a single line of production code.
This is human-in-the-loop by design, not by accident. The system prompt instructs Claude to present the diagnosis and proposed changes before executing anything above a defined risk threshold. The agent has full capability to apply changes unilaterally — it has edit_file and restart_process tools available. The constraint is deliberate.
After receiving approval, it applied all three fixes, restarted the service, and the entire cycle completed in 123 seconds.
How the System Evolved: Before vs After
The initial version had significant issues. Here is the before/after from one evening of debugging:
| Aspect | Before | After |
|---|---|---|
| MCP servers per execution | 6 (all loaded simultaneously) | 1 (dynamic per project) |
| System prompt loading | Read from disk every time | Cached with mtime invalidation |
| Re-entrancy protection | None | Guard prevents duplicate execution |
| Orphan processes | Persisted after crashes | Auto-cleanup on exit |
| Execution timeout | 3 minutes | 10 minutes |
| Budget per execution | $0.50 | $2.00 |
| `procesado=1` timing | Set BEFORE execution | Set in `finally` (AFTER) |
| stderr handling | Ignored | Used as fallback if stdout empty |
The most critical bug: procesado=1 was being set before Claude ran. If Claude failed or timed out, the message was marked as processed and never retried. Setting it in finally (always runs, regardless of success or failure) was a one-line fix that changed the entire reliability profile.
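A Python sketch of the corrected shape (the real dispatcher is Node.js; the `mensajes_queue`/`procesado` names follow the article, while the `estado` column is an illustrative addition):

```python
import sqlite3

def process_message(db: sqlite3.Connection, msg_id: int, run) -> str:
    """Update the queue row in `finally`, after the attempt, never before
    it -- so a crash mid-run can no longer leave a message falsely marked
    as handled before anything actually happened."""
    estado = "error"
    try:
        run()                 # the claude -p execution (stubbed here)
        estado = "completed"
        return estado
    finally:
        # Runs on success, exception, or timeout alike.
        db.execute(
            "UPDATE mensajes_queue SET procesado = 1, estado = ? WHERE id = ?",
            (estado, msg_id),
        )
        db.commit()
```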
What I'm Building Next
- `aragon-ops-server` — MCP tools for Loyverse (sales/inventory), Facturama (CFDI fiscal), and SQLite analytics. Claude will be able to query yesterday's sales or generate an invoice directly from a Telegram message.
- Progressive cooldown backoff — 5 min → 15 min → 60 min for repeated failures on the same service
- Daily budget cap across all PMO executions
- Daily ops report generated automatically every morning
Conclusion
The most valuable insight from this project is not about Claude. It is about the claude -p flag.
Most developers who use Claude Code use it interactively. They type, Claude responds, they review. But claude -p turns Claude Code into a callable function — a reasoning engine you can invoke programmatically, with MCP tool access, session persistence, and your existing subscription credentials.
No separate API key. No LangChain dependency. No new billing surface.
The PMO Agent is 9 executions old as of this writing. It has never made a code error. It fixed real bugs, added real features, and restarted real services — autonomously, from a Telegram message, on a phone, while I was doing something else.
If you are running production services on a Claude Code Max plan and you have not looked at claude -p, you are leaving a significant capability on the table.
Gumaro González is the owner and CTO of Tacos Aragón, a family restaurant in Culiacán, Sinaloa, México. He has been building the Ecosistema Aragón — an AI-powered operations platform — as a solo developer since 2020.
GitHub: github.com/Gumagonza1
PMO Agent repo: github.com/Gumagonza1/pmo-agent
Live system: gumaro.dev.tacosaragon.com.mx