<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gumaro Gonzalez</title>
    <description>The latest articles on DEV Community by Gumaro Gonzalez (@gumagonza1).</description>
    <link>https://dev.to/gumagonza1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838095%2F114dcf9e-4f46-4fa9-ad92-aa9b3ad315e1.png</url>
      <title>DEV Community: Gumaro Gonzalez</title>
      <link>https://dev.to/gumagonza1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gumagonza1"/>
    <language>en</language>
    <item>
      <title>24 Custom MCP Tools Later: Why Your Agent's Biggest Cost Is Not the Model — It's the Prompt</title>
      <dc:creator>Gumaro Gonzalez</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:17:42 +0000</pubDate>
      <link>https://dev.to/gumagonza1/24-custom-mcp-tools-later-why-your-agents-biggest-cost-is-not-the-model-its-the-prompt-1723</link>
      <guid>https://dev.to/gumagonza1/24-custom-mcp-tools-later-why-your-agents-biggest-cost-is-not-the-model-its-the-prompt-1723</guid>
      <description>&lt;p&gt;Every time your agent sends a prompt like "read the file src/routes/ventas.js, find line 45, and tell me what's there", you're paying for 25 tokens of natural language that the model has to interpret, might misunderstand, and will probably hallucinate part of the answer.&lt;br&gt;
When my agent does the same thing, it calls:&lt;br&gt;
{&lt;br&gt;
  "tool": "read_file",&lt;br&gt;
  "parameters": {&lt;br&gt;
    "path": "src/routes/ventas.js",&lt;br&gt;
    "offset": 40,&lt;br&gt;
    "limit": 10&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
The model didn't generate that path from memory. It didn't guess what's on line 45. The MCP tool returned the actual file content with line numbers from the actual file system. Zero interpretation. Zero hallucination. Fewer tokens.&lt;br&gt;
I built 24 custom MCP tools organized in 6 categories. They power an autonomous agent that manages 6 production services for my business. This post is about what I learned building those tools, and why MCP is the single biggest lever you have for reducing cost, hallucination, and prompt bloat in any agent system.&lt;br&gt;
What "Native Prompts" Means (And Why It Matters More Than Model Choice)&lt;br&gt;
I use the term "native prompt" to describe something most agent builders overlook: every MCP tool definition is an instruction the model consumes without you writing it in the system prompt.&lt;br&gt;
When you register a tool like this:&lt;br&gt;
&lt;a class="mentioned-user" href="https://dev.to/server"&gt;@server&lt;/a&gt;.tool()&lt;br&gt;
async def search_code(pattern: str, glob: str = "*&lt;em&gt;/&lt;/em&gt;", case_insensitive: bool = True) -&amp;gt; str:&lt;br&gt;
    """Regex search across project files using ripgrep. &lt;br&gt;
    Returns matching lines with file paths and line numbers.&lt;br&gt;
    Use for finding function definitions, variable usage, &lt;br&gt;
    import patterns, or error-related code."""&lt;br&gt;
That docstring, those parameter names, those type hints — they are documentation the model actually reads. You don't need to write in your system prompt: "When you need to find code patterns, use regex search. Pass the pattern as the first argument..." The tool schema already communicates this.&lt;br&gt;
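&lt;/p&gt;

&lt;p&gt;To make this concrete, here is a rough sketch in plain Python of how a signature plus docstring becomes the structured schema the model consumes. This is not the MCP SDK's actual internals; TYPE_MAP and tool_schema are illustrative names:&lt;/p&gt;

```python
import inspect
import json

def search_code(pattern: str, glob: str = "**/*", case_insensitive: bool = True):
    """Regex search across project files using ripgrep.
    Returns matching lines with file paths and line numbers."""

# Minimal mapping from Python annotations to JSON Schema types.
TYPE_MAP = {str: "string", bool: "boolean", int: "integer"}

def tool_schema(fn):
    """Derive a JSON-Schema-style tool definition from a function."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)     # no default means the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),   # the docstring IS the native prompt
        "inputSchema": {"type": "object", "properties": props, "required": required},
    }

print(json.dumps(tool_schema(search_code), indent=2))
```

&lt;p&gt;Every field in that output is something the model reads without a single token of your system prompt being spent on it.&lt;/p&gt;

&lt;p&gt;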
This is a fundamental shift in how you think about prompt engineering for agents:&lt;br&gt;
TRADITIONAL AGENT PROMPT:&lt;br&gt;
┌─────────────────────────────────────────────┐&lt;br&gt;
│ System prompt (800 tokens)                  │&lt;br&gt;
│ ├── Role description                        │&lt;br&gt;
│ ├── How to read files (150 tokens)          │&lt;br&gt;
│ ├── How to edit files (200 tokens)          │&lt;br&gt;
│ ├── How to search code (100 tokens)         │&lt;br&gt;
│ ├── How to manage processes (120 tokens)    │&lt;br&gt;
│ ├── How to use git (130 tokens)             │&lt;br&gt;
│ └── Safety rules                            │&lt;br&gt;
│                                             │&lt;br&gt;
│ Every instruction = tokens you pay for      │&lt;br&gt;
│ Every ambiguity = hallucination risk        │&lt;br&gt;
└─────────────────────────────────────────────┘&lt;/p&gt;

&lt;p&gt;MCP-BASED AGENT:&lt;br&gt;
┌─────────────────────────────────────────────┐&lt;br&gt;
│ System prompt (200 tokens)                  │&lt;br&gt;
│ ├── Role description                        │&lt;br&gt;
│ └── Safety rules                            │&lt;br&gt;
│                                             │&lt;br&gt;
│ Tool schemas (consumed natively by model)   │&lt;br&gt;
│ ├── read_file: schema + docstring           │&lt;br&gt;
│ ├── edit_file: schema + docstring           │&lt;br&gt;
│ ├── search_code: schema + docstring         │&lt;br&gt;
│ ├── restart_process: schema + docstring     │&lt;br&gt;
│ ├── check_health: schema + docstring        │&lt;br&gt;
│ └── ... 19 more tools                       │&lt;br&gt;
│                                             │&lt;br&gt;
│ Instructions live IN the tools, not         │&lt;br&gt;
│ in prose the model might misread            │&lt;br&gt;
└─────────────────────────────────────────────┘&lt;br&gt;
The system prompt shrinks from 800 tokens to 200 because the tools carry their own documentation. And that documentation is structured (parameter names, types, descriptions), not free-text that the model has to parse and might misinterpret.&lt;br&gt;
Native prompts are cheaper, more precise, and harder to hallucinate against.&lt;br&gt;
The 24 Tools: Anatomy of a Custom MCP Server&lt;br&gt;
My MCP server is a single Python file using the MCP SDK. Each production service gets its own instance, parameterized by project root and process name. Here's the full toolbox:&lt;br&gt;
┌──────────────────────────────────────────────────┐&lt;br&gt;
│                 MCP PROJECT SERVER                │&lt;br&gt;
│            24 tools · 6 categories               │&lt;br&gt;
├──────────────┬───────────────────────────────────┤&lt;br&gt;
│  CODE READ   │  read_file         (with line #s) │&lt;br&gt;
│              │  list_files        (glob patterns) │&lt;br&gt;
│              │  search_code       (ripgrep regex) │&lt;br&gt;
│              │  get_project_structure  (dir tree) │&lt;br&gt;
├──────────────┼───────────────────────────────────┤&lt;br&gt;
│  CODE WRITE  │  edit_file    (search &amp;amp; replace)  │&lt;br&gt;
│              │  write_file   (create/overwrite)   │&lt;br&gt;
│              │  delete_file                       │&lt;br&gt;
│              │  create_directory                  │&lt;br&gt;
├──────────────┼───────────────────────────────────┤&lt;br&gt;
│  PM2 PROCESS │  get_status   (CPU, mem, uptime)  │&lt;br&gt;
│              │  view_logs    (last N lines)       │&lt;br&gt;
│              │  restart_process                   │&lt;br&gt;
│              │  stop_process                      │&lt;br&gt;
│              │  start_process                     │&lt;br&gt;
├──────────────┼───────────────────────────────────┤&lt;br&gt;
│  GIT         │  git_status                       │&lt;br&gt;
│              │  git_diff                          │&lt;br&gt;
│              │  git_log                           │&lt;br&gt;
│              │  git_pull                          │&lt;br&gt;
│              │  git_commit                        │&lt;br&gt;
│              │  git_add                           │&lt;br&gt;
├──────────────┼───────────────────────────────────┤&lt;br&gt;
│  TESTING     │  run_tests   (autodetect runtime) │&lt;br&gt;
│              │  check_health (HTTP status check)  │&lt;br&gt;
├──────────────┼───────────────────────────────────┤&lt;br&gt;
│  CONTEXT     │  read_claude_md   (project docs)  │&lt;br&gt;
│              │  get_dependencies (pkg/req files)  │&lt;br&gt;
│              │  run_command (shell, with timeout) │&lt;br&gt;
└──────────────┴───────────────────────────────────┘&lt;br&gt;
Every tool has hard constraints baked into the server code, not into the prompt:&lt;br&gt;
edit_file requires old_text to match exactly once in the file. If it's ambiguous, the tool returns an error — the model cannot apply a vague edit.&lt;br&gt;
read_file caps at 500 lines and 500KB — the model can't accidentally dump a 10MB log into context.&lt;br&gt;
run_command has a blocklist of 13 substrings + 5 regex patterns (path traversal, fork bombs, curl|bash piping, DROP TABLE, etc.).&lt;br&gt;
All tools are sandboxed to the project directory — path traversal with ../../ is blocked at the server level.&lt;br&gt;
These constraints would normally be paragraphs in your system prompt that the model might ignore under pressure. As MCP server logic, they are enforced by code, not by hope.&lt;br&gt;
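&lt;/p&gt;

&lt;p&gt;A minimal sketch of what "enforced by code" looks like for the read cap and the sandbox. Names and signatures here are illustrative, and offset is a 1-based start line; the real tool's conventions may differ:&lt;/p&gt;

```python
from pathlib import Path

MAX_LINES = 500          # caps from the article: 500 lines / 500 KB
MAX_BYTES = 500 * 1024

def read_file(root: Path, rel_path: str, offset: int = 1, limit: int = MAX_LINES) -> str:
    """Return numbered lines from a file, with every cap enforced in code."""
    target = (root / rel_path).resolve()
    # Sandbox: any path that resolves outside the project root is rejected.
    if not str(target).startswith(str(root.resolve())):
        return "ERROR: path escapes project root"
    if target.stat().st_size > MAX_BYTES:
        return "ERROR: file larger than 500KB"
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    limit = min(limit, MAX_LINES)                  # hard cap, not a suggestion
    window = lines[offset - 1 : offset - 1 + limit]
    return "\n".join(f"{offset + i}: {line}" for i, line in enumerate(window))
```

&lt;p&gt;Because the cap lives in the return path, no amount of prompting can talk the tool into dumping a 10MB log into context.&lt;/p&gt;

&lt;p&gt;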
How MCP Tools Reduce Token Consumption&lt;br&gt;
Let's trace a real operation: "Find where producto.precio is used and check if there's a null safety issue."&lt;br&gt;
Without MCP tools (raw prompt approach):&lt;br&gt;
TURN 1 — User prompt:&lt;br&gt;
"Search the codebase for producto.precio usage" (9 tokens)&lt;/p&gt;

&lt;p&gt;TURN 1 — Model output:&lt;br&gt;
"I'll search for that pattern. Let me run:&lt;br&gt;
grep -rn 'producto.precio' src/" (22 tokens generated)&lt;/p&gt;

&lt;p&gt;TURN 2 — Execution result injected:&lt;br&gt;
&lt;a href="https://dev.to~300%20tokens%20of%20context"&gt;raw grep output, 40 lines&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TURN 2 — Model output:&lt;br&gt;
"I found it in 3 files. Let me read ventas.js:&lt;br&gt;
cat -n src/routes/ventas.js | head -60" (28 tokens generated)&lt;/p&gt;

&lt;p&gt;TURN 3 — Execution result injected:&lt;br&gt;
&lt;a href="https://dev.to~400%20tokens%20of%20context"&gt;60 lines of code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TURN 3 — Model output:&lt;br&gt;
"Line 45 has no null check. Here's the fix..." (50 tokens)&lt;/p&gt;

&lt;p&gt;TOTAL: ~809 tokens across 3 turns&lt;br&gt;
With MCP tools:&lt;br&gt;
TURN 1 — Model calls tool:&lt;br&gt;
search_code(pattern="producto\.precio", glob="**/*.js")&lt;br&gt;
→ Tool returns structured matches (~120 tokens)&lt;/p&gt;

&lt;p&gt;TURN 1 — Model calls tool:&lt;br&gt;
read_file(path="src/routes/ventas.js", offset=40, limit=10)&lt;br&gt;
→ Tool returns 10 lines with numbers (~80 tokens)&lt;/p&gt;

&lt;p&gt;TURN 1 — Model output:&lt;br&gt;
"Line 45 has no null check. Here's the fix..." (50 tokens)&lt;/p&gt;

&lt;p&gt;TOTAL: ~250 tokens in 1 turn&lt;br&gt;
3.2x fewer tokens. 1 turn instead of 3. No generated bash commands. No raw output parsing.&lt;br&gt;
The savings compound across every execution. With structured tool responses, the model receives exactly the data it needs in a predictable format: no wasted tokens on grep headers, bash syntax, or conversational padding.&lt;br&gt;
The multiplier effect at scale&lt;br&gt;
Monthly executions:        30,000&lt;br&gt;
Tokens saved per exec:     ~550 (809 - 250)&lt;br&gt;
Total tokens saved/month:  16,500,000&lt;/p&gt;

&lt;p&gt;At Claude Sonnet API rates ($3 input / $15 output per MTok):&lt;br&gt;
Savings ≈ $150-$300/month just from token compression&lt;/p&gt;

&lt;p&gt;At GPT-4o rates ($2.50 / $10):&lt;br&gt;
Savings ≈ $100-$200/month&lt;br&gt;
This is before you factor in the flat subscription model. MCP tools reduce costs on API AND subscription plans on subscriptions because you consume less of your rate-limited quota per operation.&lt;br&gt;
The Cost Equation: Why Flat Beats Per-Token for Agents&lt;br&gt;
My agent runs on claude -p (Claude Code CLI) using a Max subscription at $100/month. No API key. No per-token billing. The CLI invokes Claude with native MCP support: it reads mcp-projects.json, connects to the specified servers via stdio, and exposes all tools to the model automatically.&lt;br&gt;
Here's what this looks like compared to API pricing for a moderately active agent (1,000 daily executions, ~2,600 tokens each):&lt;br&gt;
MONTHLY COST COMPARISON — 30,000 executions/month&lt;br&gt;
═══════════════════════════════════════════════════&lt;/p&gt;

&lt;p&gt;$100  ██ Claude Max (flat)&lt;/p&gt;

&lt;p&gt;$390  ████████ GPT-4o API ($2.50/$10 per MTok)&lt;/p&gt;

&lt;p&gt;$408  ████████ Gemini 3.1 Pro ($2.00/$12)&lt;/p&gt;

&lt;p&gt;$441  █████████ GPT-5.2 API ($1.75/$14)&lt;/p&gt;

&lt;p&gt;$540  ███████████ Claude Sonnet 4.6 API ($3/$15)&lt;/p&gt;

&lt;p&gt;$900  ██████████████████ Claude Opus 4.6 API ($5/$25)&lt;br&gt;
The flat model wins by 4-9x. But the real insight is: MCP tools make the flat model even flatter. Because each execution consumes fewer tokens (thanks to tool compression), you fit more executions within the same rate-limited window.&lt;br&gt;
One developer tracked 10 billion tokens of Claude Code usage over 8 months and estimated it would have cost over $15,000 on API pricing. He paid ~$800 total on the Max plan. That's a 93% saving before any MCP optimization.&lt;br&gt;
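&lt;/p&gt;

&lt;p&gt;The bars above can be reproduced with simple arithmetic, assuming roughly a 2:1 input-to-output token split (the split is my assumption for illustration, not a measured figure):&lt;/p&gt;

```python
EXECUTIONS = 30_000
TOKENS_PER_EXEC = 2_600
TOTAL_TOKENS = EXECUTIONS * TOKENS_PER_EXEC   # 78,000,000 tokens/month

def monthly_cost(in_rate: float, out_rate: float, input_share: float = 2 / 3) -> float:
    """API cost in USD, given $/MTok rates and an assumed input-token share."""
    mtok = TOTAL_TOKENS / 1_000_000
    return mtok * (input_share * in_rate + (1 - input_share) * out_rate)

print(round(monthly_cost(3.00, 15.00)))   # Claude Sonnet rates: 546, close to the $540 bar
print(round(monthly_cost(2.50, 10.00)))   # GPT-4o rates: 390, matching the $390 bar
```

&lt;p&gt;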
Dynamic MCP config: only load what you need&lt;br&gt;
Here's an optimization most people miss. Instead of loading all 6 MCP servers (one per project) into every execution, my agent generates a mini-config with only the target project's server:&lt;br&gt;
// claude-runner.js&lt;br&gt;
function generarMiniConfig(proyecto) {&lt;br&gt;
  const serverKey = PROYECTOS[proyecto].mcp;&lt;br&gt;
  return {&lt;br&gt;
    mcpServers: {&lt;br&gt;
      [serverKey]: fullConfig.mcpServers[serverKey]&lt;br&gt;
    }&lt;br&gt;
  };&lt;br&gt;
}&lt;br&gt;
Why does this matter? Because every MCP server loaded = tool schemas injected into context = tokens consumed. If you have 6 servers × 24 tools = 144 tool definitions in context, that's a significant chunk of your prompt budget wasted on tools the model won't use in this execution.&lt;br&gt;
Loading only the relevant server keeps the tool context tight: 24 tools instead of 144. That's ~80% reduction in tool-schema tokens per execution.&lt;br&gt;
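&lt;/p&gt;

&lt;p&gt;The same filtering is trivial in any language. A Python sketch, where PROJECT_MCP and full_config stand in for the article's PROYECTOS lookup table and global config:&lt;/p&gt;

```python
import json

# Hypothetical lookup: project name to MCP server key.
PROJECT_MCP = {"tacos-bot": "project-tacos-bot", "api": "project-api"}

full_config = {
    "mcpServers": {
        "project-tacos-bot": {"command": "python", "args": ["tacos-bot-mcp/server.py"]},
        "project-api": {"command": "python", "args": ["api-mcp/server.py"]},
    }
}

def mini_config(project: str) -> dict:
    """Keep only the one server the target project needs."""
    key = PROJECT_MCP[project]
    return {"mcpServers": {key: full_config["mcpServers"][key]}}

print(json.dumps(mini_config("api")))
```

&lt;p&gt;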
MCP as a Code Optimization Pattern&lt;br&gt;
Beyond cost and hallucination, custom MCP tools change how you structure agent code. Here are three patterns I've found most impactful:&lt;br&gt;
Pattern 1: Constraint enforcement via tool design&lt;br&gt;
Instead of writing in your prompt "never edit more than 3 files in a single operation", design the tool to enforce it:&lt;br&gt;
&lt;a class="mentioned-user" href="https://dev.to/server"&gt;@server&lt;/a&gt;.tool()&lt;br&gt;
async def edit_file(path: str, old_text: str, new_text: str) -&amp;gt; str:&lt;br&gt;
    """Edit a file using exact search and replace.&lt;br&gt;
    old_text must match exactly ONE location in the file.&lt;br&gt;
    If old_text appears 0 or 2+ times, the edit is rejected."""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content = read(path)
count = content.count(old_text)
if count == 0:
    return "ERROR: old_text not found in file"
if count &amp;gt; 1:
    return f"ERROR: old_text found {count} times. Be more specific."

new_content = content.replace(old_text, new_text, 1)
write(path, new_content)
return f"OK: replaced 1 occurrence in {path}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The constraint is impossible to bypass through prompt manipulation. No amount of creative prompting will make edit_file accept an ambiguous edit.&lt;br&gt;
Pattern 2: Context injection via tool responses&lt;br&gt;
Your read_claude_md tool is not just "reading a file." It's injecting project-specific context into the model's reasoning window at exactly the right moment:&lt;br&gt;
&lt;a class="mentioned-user" href="https://dev.to/server"&gt;@server&lt;/a&gt;.tool()&lt;br&gt;
async def read_claude_md() -&amp;gt; str:&lt;br&gt;
    """Read the project's CLAUDE.md documentation.&lt;br&gt;
    Contains architecture decisions, conventions, &lt;br&gt;
    known issues, and deployment notes.&lt;br&gt;
    Call this BEFORE making changes to understand project context."""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude_md = project_root / "CLAUDE.md"
if claude_md.exists():
    return claude_md.read_text(encoding="utf-8")[:5000]
return "No CLAUDE.md found for this project."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That docstring "Call this BEFORE making changes" is a native prompt. The model reads it as part of the tool schema and learns when to use the tool, not just how. You didn't write this timing instruction in your system prompt. The tool teaches it.&lt;br&gt;
Pattern 3: One server, N projects&lt;br&gt;
The most powerful code optimization: my entire MCP server is one Python file parameterized by CLI arguments.&lt;/p&gt;

&lt;p&gt;# Same server.py, different instances&lt;br&gt;
python server.py --root C:\projects\api    --pm2 tacos-api     --name api&lt;br&gt;
python server.py --root C:\projects\bot    --pm2 TacosAragon   --name bot&lt;br&gt;
python server.py --root C:\projects\cfo    --pm2 cfo-agent     --name cfo&lt;br&gt;
Adding a new project to the agent requires zero new code. Three config lines:&lt;br&gt;
"project-new": {&lt;br&gt;
  "command": "python",&lt;br&gt;
  "args": ["server.py", "--root", "C:\\new-project", "--pm2", "new-svc", "--name", "new"]&lt;br&gt;
}&lt;br&gt;
Restart. The agent now has full read/write/git/process/test capabilities over the new project. 24 tools, zero development time.&lt;br&gt;
This is the "N×M problem" that MCP was designed to solve. Without it, adding a new project would mean writing new integration code — bash scripts, API wrappers, custom parsers. With MCP, the protocol is the integration layer.&lt;br&gt;
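&lt;/p&gt;

&lt;p&gt;The parameterization itself is a few lines of argparse. A sketch using the flags from the invocations above (everything else about the server is omitted):&lt;/p&gt;

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """One server.py, many instances: everything project-specific is a flag."""
    parser = argparse.ArgumentParser(description="Per-project MCP server")
    parser.add_argument("--root", required=True, type=Path,
                        help="project root every tool is sandboxed to")
    parser.add_argument("--pm2", required=True,
                        help="PM2 process name for restart/stop/start tools")
    parser.add_argument("--name", required=True,
                        help="short label used in tool output")
    return parser.parse_args(argv)

# Forward-slash paths here just to keep the sketch cross-platform.
args = parse_args(["--root", "/projects/api", "--pm2", "tacos-api", "--name", "api"])
print(args.name)   # prints "api"
```

&lt;p&gt;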
Sessions: The Forgotten Token Optimization&lt;br&gt;
MCP tools compress tokens per execution. But sessions compress tokens across executions.&lt;br&gt;
My agent uses --session-id UUID to maintain context for 1 hour across up to 8 messages. Here's what this saves:&lt;br&gt;
WITHOUT SESSIONS:&lt;br&gt;
  Message 1: system prompt (200 tok) + tool schemas + task → response&lt;br&gt;
  Message 2: system prompt (200 tok) + tool schemas + task → response&lt;br&gt;
  Message 3: system prompt (200 tok) + tool schemas + task → response&lt;br&gt;
  Message 4: system prompt (200 tok) + tool schemas + task → response&lt;/p&gt;

&lt;p&gt;Total system prompt tokens: 800+ (repeated 4 times)&lt;br&gt;
  Context from previous messages: 0 (each starts fresh)&lt;/p&gt;

&lt;p&gt;WITH SESSIONS:&lt;br&gt;
  Message 1: system prompt (200 tok) + tool schemas + task → response&lt;br&gt;
  Message 2: task only → response (has full prior context)&lt;br&gt;
  Message 3: task only → response (has full prior context)&lt;br&gt;
  Message 4: task only → response (has full prior context)&lt;/p&gt;

&lt;p&gt;Total system prompt tokens: 200 (loaded once)&lt;br&gt;
  Context from previous messages: everything&lt;br&gt;
The model remembers files it read, changes it made, and errors it encountered without re-sending any of it. For a sequence of related operations (diagnose → fix → verify → report), sessions eliminate ~600 tokens of redundant context per follow-up message.&lt;br&gt;
Over 30,000 monthly executions with an average session length of 3 messages, that's roughly 12 million tokens saved: tokens that never enter the context window, never count against your rate limit, and never cost you a cent.&lt;br&gt;
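&lt;/p&gt;

&lt;p&gt;The 12-million figure follows directly from those numbers (the 600-token overhead per follow-up is the article's own estimate):&lt;/p&gt;

```python
EXECUTIONS_PER_MONTH = 30_000
MSGS_PER_SESSION = 3
OVERHEAD_TOKENS = 600        # redundant context eliminated per follow-up message

sessions = EXECUTIONS_PER_MONTH // MSGS_PER_SESSION    # 10,000 sessions
followups = sessions * (MSGS_PER_SESSION - 1)          # 20,000 follow-up messages
saved = followups * OVERHEAD_TOKENS
print(saved)   # 12000000 tokens per month
```

&lt;p&gt;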
The Blocklist: What Your MCP Server Should Never Allow&lt;br&gt;
If you're building MCP tools that execute code or commands, here's the blocklist I arrived at after running in production:&lt;br&gt;
BLOCKED_SUBSTRINGS = [&lt;br&gt;
    "rm -rf /", "rm -rf ~",      # filesystem destruction&lt;br&gt;
    "format", "del /s /q",        # Windows destruction&lt;br&gt;
    "rmdir /s /q",                # Windows recursive delete&lt;br&gt;
    "shutdown", "reboot",          # system control&lt;br&gt;
    "halt", "poweroff",            # system control&lt;br&gt;
    ":(){:|:&amp;amp;};:",                  # fork bomb&lt;br&gt;
    "DROP TABLE",                  # database destruction&lt;br&gt;
    "chmod 777",                   # permission escalation&lt;br&gt;
    "chown -R",                    # ownership takeover&lt;br&gt;
]&lt;/p&gt;

&lt;p&gt;BLOCKED_PATTERNS = [&lt;br&gt;
    r"curl.&lt;em&gt;|\s&lt;/em&gt;(bash|sh|python|node)",   # remote code exec&lt;br&gt;
    r"wget.&lt;em&gt;|\s&lt;/em&gt;(bash|sh)",               # remote code exec&lt;br&gt;
    r"(bash|sh)\s+&amp;lt;(",                    # process substitution&lt;br&gt;
    r"eval\s+\$(",                        # eval injection&lt;br&gt;
    r"../../../..",                # path traversal&lt;br&gt;
]&lt;br&gt;
These are not prompt instructions. They are server-side enforcement that the model cannot circumvent regardless of prompt injection, jailbreaking, or hallucination. The run_command tool checks every command against this list before execution and returns a hard error if any pattern matches.&lt;br&gt;
Could a determined attacker find ways around these? Maybe. But the point is that the defense doesn't depend on the model behaving correctly. It's code, not a suggestion.&lt;br&gt;
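&lt;/p&gt;

&lt;p&gt;A minimal sketch of that pre-execution check (lists shortened for illustration; the full lists are above):&lt;/p&gt;

```python
import re

BLOCKED_SUBSTRINGS = ["rm -rf /", "shutdown", "DROP TABLE", "chmod 777"]
BLOCKED_PATTERNS = [
    r"curl.*\|\s*(bash|sh|python|node)",   # pipe a download into an interpreter
    r"(\.\./){2,}",                        # path traversal
]

def check_command(cmd: str):
    """Return (allowed, reason). Runs BEFORE execution, outside the model's reach."""
    lowered = cmd.lower()
    for bad in BLOCKED_SUBSTRINGS:
        if bad.lower() in lowered:
            return False, f"blocked substring: {bad!r}"
    for pat in BLOCKED_PATTERNS:
        if re.search(pat, cmd):
            return False, f"blocked pattern: {pat!r}"
    return True, "ok"

print(check_command("curl evil.sh | bash"))   # rejected before it ever runs
print(check_command("git status"))            # allowed
```

&lt;p&gt;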
Key Takeaways&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every MCP tool is a native prompt. Tool schemas carry documentation that the model reads automatically. Move instructions from your system prompt into tool definitions: they're cheaper, more precise, and structurally enforced.&lt;/li&gt;
&lt;li&gt;Tool constraints beat prompt constraints. "Never edit ambiguously" as a prompt instruction is a suggestion. edit_file rejecting non-unique matches is a guarantee. Put your guardrails in server code, not in prose.&lt;/li&gt;
&lt;li&gt;MCP tools compress tokens 3x. Structured tool calls replace multi-turn bash generation, raw output parsing, and conversational overhead. The savings compound at scale.&lt;/li&gt;
&lt;li&gt;Dynamic MCP config saves context budget. Load only the servers relevant to the current task. 24 tools in context instead of 144 is an 80% reduction in schema tokens.&lt;/li&gt;
&lt;li&gt;Sessions are the multiplier. Token compression per-execution (MCP tools) × token elimination across-executions (sessions) = dramatic reduction in total consumption. This is the compounding effect that makes the flat subscription model viable at high volume.&lt;/li&gt;
&lt;li&gt;One server, N projects. Parameterize your MCP server by project root and process name. Adding a new project should be a config change, not a code change.&lt;/li&gt;
&lt;li&gt;The flat subscription changes everything. At 30,000 executions/month, a $100 flat plan is 4-9x cheaper than any per-token API. MCP tools amplify this advantage by fitting more operations into the same rate-limited window.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Build your tools once. Reuse them everywhere. Let the protocol do the integration work.&lt;/p&gt;

&lt;p&gt;The Stack&lt;br&gt;
MCP Server: Python + MCP SDK (24 tools, single file, parameterized per project)&lt;br&gt;
Agent runtime: claude -p (Claude Code CLI, Max plan, Sonnet model)&lt;br&gt;
Protocol: MCP over stdio (JSON-RPC 2.0)&lt;br&gt;
Sessions: --session-id / --resume (1-hour context retention)&lt;br&gt;
Config: Dynamic per-execution mini-config (only target project's server)&lt;/p&gt;

&lt;p&gt;I'm Gumaro González. I run a restaurant in Culiacán, México and I build the software behind it from the WhatsApp order bot to the autonomous agent infrastructure. Everything built with Claude Code as my copilot.&lt;/p&gt;

&lt;p&gt;GitHub: github.com/Gumagonza1&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>claude_runner — how I eliminated Claude API costs by using the subscription I was already paying for</title>
      <dc:creator>Gumaro Gonzalez</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:16:34 +0000</pubDate>
      <link>https://dev.to/gumagonza1/clauderunner-how-i-eliminated-claude-api-costs-by-using-the-subscription-i-was-already-paying-for-5gil</link>
      <guid>https://dev.to/gumagonza1/clauderunner-how-i-eliminated-claude-api-costs-by-using-the-subscription-i-was-already-paying-for-5gil</guid>
      <description>&lt;p&gt;For months I was paying for Claude twice. The monthly subscription and the API tokens every time an agent made a call.&lt;br&gt;
Turns out I didn't have to.&lt;br&gt;
The problem&lt;br&gt;
When you import the Anthropic SDK directly, every token gets billed:&lt;/p&gt;

&lt;p&gt;# This charges per token consumed&lt;br&gt;
from anthropic import Anthropic&lt;br&gt;
client = Anthropic(api_key="sk-...")&lt;br&gt;
response = client.messages.create(...)&lt;br&gt;
1,000 fiscal document analyses per month with Sonnet: between $25 and $80 USD on top of what you already pay for the subscription. And that scales linearly with every new agent you add to the system.&lt;br&gt;
The discovery&lt;br&gt;
Claude Code CLI has an authentication hierarchy that almost nobody documents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CLAUDE_CODE_USE_BEDROCK / USE_VERTEX   (cloud providers)&lt;/li&gt;
&lt;li&gt;ANTHROPIC_AUTH_TOKEN                   (proxies / gateways)&lt;/li&gt;
&lt;li&gt;ANTHROPIC_API_KEY                      ← per-token billing starts here&lt;/li&gt;
&lt;li&gt;apiKeyHelper script                    (rotating credentials)&lt;/li&gt;
&lt;li&gt;~/.claude/.credentials.json            ← your Max subscription lives here&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When you run claude login, the CLI stores an OAuth token at ~/.claude/.credentials.json. That token is exactly the same one your interactive terminal uses. If there is no ANTHROPIC_API_KEY defined in the environment, the CLI falls back to position 5 and uses your subscription session.&lt;br&gt;
claude -p spawned as a subprocess from your code does the same thing. No API key. No additional invoice.&lt;br&gt;
What claude_runner is&lt;br&gt;
It's a module that acts as a bridge between agents and Claude, using claude -p as a subprocess instead of the SDK.&lt;br&gt;
It exists in two versions:&lt;br&gt;
tacos-aragon-fiscal/src/claude_runner.py   ← Python, fiscal document analysis&lt;br&gt;
pmo-agent/claude-runner.js                 ← JavaScript, production agent&lt;br&gt;
Both do the same thing: spawn the claude -p process, parse the output, and return the response to the calling agent. No secrets in the code. No environment variable that can leak into logs.&lt;br&gt;
Python implementation (79 lines)&lt;br&gt;
The Python version is minimal. No external dependencies beyond the standard library.&lt;br&gt;
"""&lt;br&gt;
claude_runner.py — No API key, no per-token costs.&lt;br&gt;
Replaces direct Anthropic SDK calls.&lt;br&gt;
"""&lt;/p&gt;

&lt;p&gt;import subprocess, sys, os, tempfile&lt;/p&gt;

&lt;p&gt;CLAUDE_TIMEOUT = 300  # 5 minutes default&lt;/p&gt;

&lt;p&gt;def run(system_prompt: str, user_message: str,&lt;br&gt;
        model: str = "sonnet",&lt;br&gt;
        max_budget: float = 2.0,&lt;br&gt;
        timeout: int = CLAUDE_TIMEOUT,&lt;br&gt;
        session_id: str | None = None) -&amp;gt; str:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;full_prompt = f"=== INSTRUCTIONS ===\n{system_prompt}\n\n=== TASK ===\n{user_message}"

tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt',
                                  delete=False, encoding='utf-8')
tmp.write(full_prompt)
tmp.close()

try:
    cmd = (
        f'type "{tmp.name}" | claude -p'
        f' --output-format text'
        f' --model {model}'
        f' --permission-mode bypassPermissions'
        f' --max-budget-usd {max_budget}'
    )
    if session_id:
        cmd += f' --session-id {session_id}'

    comspec = os.environ.get('COMSPEC', r'C:\Windows\system32\cmd.exe')
    result = subprocess.run(
        [comspec, '/c', cmd],
        capture_output=True, text=True,
        timeout=timeout, encoding='utf-8', errors='replace'
    )

    if result.returncode != 0:
        raise RuntimeError(f"Claude error: {result.stderr}")

    return result.stdout.strip()

finally:
    os.unlink(tmp.name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Why a temp file instead of passing the prompt directly? Windows has a ~32,767 character limit on command-line arguments. Fiscal analysis prompts exceed that regularly. The temp file is the reliable solution.&lt;br&gt;
JavaScript implementation (with stream-json)&lt;br&gt;
const { spawn } = require('child_process');&lt;br&gt;
const crypto = require('crypto');&lt;br&gt;
const fs = require('fs');&lt;br&gt;
const os = require('os');&lt;br&gt;
const path = require('path');&lt;/p&gt;

&lt;p&gt;let _running = false; // Anti-reentrance guard&lt;/p&gt;

&lt;p&gt;async function runClaude({ promptFile, userPrompt, projectName }) {&lt;/p&gt;

&lt;p&gt;if (_running) {&lt;br&gt;
    return { ok: false, error: 'Execution already in progress' };&lt;br&gt;
  }&lt;br&gt;
  _running = true;&lt;/p&gt;

&lt;p&gt;const mcpConfig = getMcpConfigForProject(projectName);&lt;br&gt;
  const { sessionId, isNew } = getOrCreateSession(projectName);&lt;/p&gt;

&lt;p&gt;const finalPrompt = isNew&lt;br&gt;
    ? &lt;code&gt;${await fs.promises.readFile(promptFile, 'utf8')}\n\n${userPrompt}&lt;/code&gt;&lt;br&gt;
    : userPrompt;&lt;/p&gt;

&lt;p&gt;const tmpId     = crypto.randomBytes(4).toString('hex');&lt;br&gt;
  const tmpPrompt = path.join(os.tmpdir(), &lt;code&gt;pmo-prompt-${tmpId}.txt&lt;/code&gt;);&lt;br&gt;
  const tmpBat    = path.join(os.tmpdir(), &lt;code&gt;pmo-run-${tmpId}.bat&lt;/code&gt;);&lt;/p&gt;

&lt;p&gt;await fs.promises.writeFile(tmpPrompt, finalPrompt, 'utf8');&lt;/p&gt;

&lt;p&gt;const batContent = [&lt;br&gt;
    '&lt;a class="mentioned-user" href="https://dev.to/echo"&gt;@echo&lt;/a&gt; off',&lt;br&gt;
    &lt;code&gt;type "${tmpPrompt}" | claude -p ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--output-format stream-json ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--model sonnet ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--mcp-config "${mcpConfig}" ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--strict-mcp-config ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--permission-mode bypassPermissions ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--max-turns 20 ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--max-budget-usd 2.00 ^&lt;/code&gt;,&lt;br&gt;
    &lt;code&gt;--session-id ${sessionId}&lt;/code&gt;&lt;br&gt;
  ].join('\r\n');&lt;/p&gt;

&lt;p&gt;await fs.promises.writeFile(tmpBat, batContent, 'utf8');&lt;/p&gt;

&lt;p&gt;return new Promise((resolve) =&amp;gt; {&lt;br&gt;
    let finalOutput = '';&lt;br&gt;
    let watchdog;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const resetWatchdog = () =&amp;gt; {
  clearTimeout(watchdog);
  watchdog = setTimeout(() =&amp;gt; {
    proc.kill();
    resolve({ ok: false, error: 'Inactivity timeout' });
  }, 5 * 60 * 1000);
};

const proc = spawn('cmd.exe', ['/c', tmpBat]);

proc.stdout.on('data', (chunk) =&amp;gt; {
  resetWatchdog();
  for (const line of chunk.toString().split('\n')) {
    if (!line.trim()) continue;
    try {
      const event = JSON.parse(line);
      processStreamEvent(event, (output) =&amp;gt; { finalOutput = output; });
    } catch { /* non-JSON line, ignore */ }
  }
});

proc.on('close', async (code) =&amp;gt; {
  clearTimeout(watchdog);
  _running = false;
  await Promise.allSettled([
    fs.promises.unlink(tmpPrompt),
    fs.promises.unlink(tmpBat)
  ]);
  resolve({ ok: code === 0, output: finalOutput, exitCode: code, sessionId });
});

resetWatchdog();
});
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;The stream-json event parser&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function processStreamEvent(event, onResult) {
  switch (event.type) {
    case 'system':
      console.log(`🔌 MCP: ${event.mcp_servers.map(s =&amp;gt; s.name).join(', ')}`);
      break;

    case 'assistant':
      for (const block of event.message.content) {
        if (block.type === 'thinking') console.log(`💭 ${block.thinking}`);
        if (block.type === 'tool_use') console.log(`🔧 ${block.name}`);
        if (block.type === 'text')     console.log(`💬 ${block.text}`);
      }
      break;

    case 'result':
      onResult(event.result);
      // cost_usd is informational in Plan Max — not a real charge
      console.log(`💰 Equivalent cost: $${event.cost_usd.toFixed(4)}`);
      break;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Why stream-json instead of text? With text output the process looks frozen until it finishes. With stream-json the inactivity watchdog can tell Claude is still working, because each tool_use event resets the timer. Five minutes without any event means something went wrong.&lt;/p&gt;

&lt;h2&gt;The three decisions that make it work in production&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Anti-reentrance guard.&lt;/strong&gt; Without &lt;code&gt;_running&lt;/code&gt;, the same error can trigger the agent twice in a row. Two Claude instances operating on the same codebase simultaneously is a race condition that ends badly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic MCP config per project.&lt;/strong&gt; Instead of loading the global config with every server in the ecosystem, the runner generates a minimal JSON before each execution:
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "project-tacos-bot": {
      "command": "python",
      "args": ["C:/servers/tacos-bot-mcp/index.py"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
If you load 6 MCP servers with 24 tools each when you only need the 24 tools from one of them, you are injecting the schemas of 144 tools into the context: more input tokens, slower execution, and Claude has to ignore tools it will never use.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session management per project.&lt;/strong&gt; Each project gets a session UUID with a 1-hour TTL and an 8-message limit:
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const SESSION_TTL_MS       = 60 * 60 * 1000;
const SESSION_MAX_MESSAGES = 8;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;
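&lt;p&gt;The second decision can be sketched in a few lines of Python. This is my own illustration, not the runner's actual code: the function name and the temp-file handling are hypothetical, but the JSON shape matches the config above.&lt;/p&gt;

```python
import json
import tempfile

# Sketch of "dynamic MCP config per project" (my own illustration;
# write_minimal_mcp_config and the temp-file handling are hypothetical):
# the config contains only the one server the target project needs.
def write_minimal_mcp_config(project_name, command, args):
    config = {"mcpServers": {project_name: {"command": command, "args": args}}}
    with tempfile.NamedTemporaryFile(
            mode="w", suffix=".json", delete=False) as f:
        json.dump(config, f, indent=2)
        return f.name  # path to hand to claude -p via --mcp-config

config_path = write_minimal_mcp_config(
    "project-tacos-bot", "python", ["C:/servers/tacos-bot-mcp/index.py"])
```

&lt;p&gt;The returned path is what goes into &lt;code&gt;--mcp-config&lt;/code&gt;, with &lt;code&gt;--strict-mcp-config&lt;/code&gt; making sure nothing else gets loaded.&lt;/p&gt;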

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function getOrCreateSession(projectName) {
  const now = Date.now();
  const existing = activeSessions.get(projectName);

  if (existing &amp;amp;&amp;amp;
      (now - existing.createdAt) &amp;lt; SESSION_TTL_MS &amp;amp;&amp;amp;
      existing.messages &amp;lt; SESSION_MAX_MESSAGES) {
    return { sessionId: existing.sessionId, isNew: false };
  }

  const sessionId = crypto.randomUUID();
  activeSessions.set(projectName, { sessionId, createdAt: now, messages: 0 });
  return { sessionId, isNew: true };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On an active session, the prompt skips the instructions; Claude remembers them from the first message of the session. On a new session, they go in full. The difference in input tokens is significant when the system prompt is long.&lt;/p&gt;

&lt;h2&gt;The trap to avoid&lt;/h2&gt;

&lt;p&gt;If at any point you define ANTHROPIC_API_KEY in the server environment, the CLI detects it (position 3 in its credential hierarchy) and starts billing per token with no warning. The runner stops using the subscription without you noticing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check before going to production
echo $ANTHROPIC_API_KEY

# If this returns anything, there is a problem
unset ANTHROPIC_API_KEY
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;That variable must not exist in the environment where the agent runs.&lt;/p&gt;

&lt;h2&gt;The cost_usd field is not a charge&lt;/h2&gt;

&lt;p&gt;The result event in the stream-json output includes a cost_usd field. That number shows what the execution would have cost in API mode. In Plan Max it is not deducted from any balance and generates no billing. I log it as an efficiency reference, both to know exactly how much I am saving per execution and to catch a call that is consuming more context than expected.&lt;/p&gt;

&lt;h2&gt;Cost comparison&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Per execution&lt;/th&gt;
&lt;th&gt;500 analyses/month&lt;/th&gt;
&lt;th&gt;1,000/month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet SDK direct&lt;/td&gt;
&lt;td&gt;~$0.047&lt;/td&gt;
&lt;td&gt;~$23&lt;/td&gt;
&lt;td&gt;~$47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o API direct&lt;/td&gt;
&lt;td&gt;~$0.039&lt;/td&gt;
&lt;td&gt;~$19&lt;/td&gt;
&lt;td&gt;~$39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claude_runner on Plan Max&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Break-even against Plan Max at $200/month: 71 executions per day. After that point, the API costs more than the full plan.&lt;/p&gt;

&lt;h2&gt;What runs in production&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;tacos-aragon-fiscal&lt;/strong&gt; (Python): downloads CFDIs from Mexico's SAT tax authority, parses the fiscal XML, calculates taxes, and detects inconsistencies between declared income and actual sales.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pmo-agent&lt;/strong&gt; (JavaScript): detects errors in production processes, proposes fixes ordered by impact, waits for Telegram approval, applies changes, commits, restarts PM2, and reports exactly what it did. If something ends up worse than before, it runs git checkout automatically.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;aragon-git-guardian&lt;/strong&gt;: intercepts every git push, scans the repo with grep across 10 security categories, and if it finds anything calls claude_runner with the specific evidence for contextual analysis. No AI when there are no findings, no cost when the repo is clean.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monthly cost for all these agents combined: $0 additional on top of the subscription.&lt;/p&gt;

&lt;h2&gt;How to replicate it&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Install and authenticate Claude Code
npm install -g @anthropic-ai/claude-code
claude login

# 2. Verify there is no API key in the environment
unset ANTHROPIC_API_KEY
echo $ANTHROPIC_API_KEY  # should be empty

# 3. Test that the CLI uses the subscription
claude -p "respond with just: ok"
# Should respond without asking for an API key
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;After that: run claude -p as a subprocess (spawn in Node.js, subprocess in Python), parse the stream-json output, and manage the process lifecycle.&lt;br&gt;
The highest-impact decision is writing your own MCP server instead of using generic third-party MCPs. Your own context is cleaner, uses fewer tokens, and Claude enters the session already knowing how your system is built.&lt;/p&gt;
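&lt;p&gt;As a minimal Python sketch of that loop: the parser mirrors the Node version shown earlier, and the &lt;code&gt;--output-format stream-json&lt;/code&gt; flag is my assumption about the CLI surface, so verify it against your installed version.&lt;/p&gt;

```python
import json
import subprocess

# Keep only the final "result" event from a stream-json transcript
# (mirrors the Node parser shown earlier).
def extract_result(lines):
    final = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON noise from the CLI
        if event.get("type") == "result":
            final = event.get("result", "")
    return final

# Hypothetical driver: spawn claude -p and stream stdout through the parser.
def run_claude(prompt):
    proc = subprocess.Popen(
        ["claude", "-p", prompt, "--output-format", "stream-json"],
        stdout=subprocess.PIPE, text=True)
    try:
        return extract_result(proc.stdout)
    finally:
        proc.wait()
```

&lt;p&gt;The watchdog and cleanup logic from the Node runner would bolt onto this same structure.&lt;/p&gt;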

&lt;p&gt;PMO Agent repo: github.com/Gumagonza1/pmo-agent&lt;/p&gt;

&lt;p&gt;Claude_runner repo: &lt;a href="https://github.com/Gumagonza1/claude-runner" rel="noopener noreferrer"&gt;https://github.com/Gumagonza1/claude-runner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Self-healing server article: dev.to/gumagonza1/i-built-a-self-healing-production-server-using-claude-code-no-api-key-no-extra-cost-1eoo&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Built a Self-Healing Production Server Using Claude Code — No API Key, No Extra Cost</title>
      <dc:creator>Gumaro Gonzalez</dc:creator>
      <pubDate>Sun, 22 Mar 2026 08:11:17 +0000</pubDate>
      <link>https://dev.to/gumagonza1/i-built-a-self-healing-production-server-using-claude-code-no-api-key-no-extra-cost-1eoo</link>
      <guid>https://dev.to/gumagonza1/i-built-a-self-healing-production-server-using-claude-code-no-api-key-no-extra-cost-1eoo</guid>
      <description>&lt;p&gt;&lt;em&gt;How a restaurant operator in México built an autonomous AI agent that reads logs, fixes bugs, restarts services, and reports back — all from a Telegram message, using a $100/month plan.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I run six production services on a Windows Server 2022 VM in GCP. A WhatsApp bot, a REST API, a CFO fiscal agent, a Telegram dispatcher, a portfolio site, and a monitor. When something breaks at 2 AM, I either wake up to fix it or it stays broken until morning.&lt;/p&gt;

&lt;p&gt;I wanted autonomous self-correction. Every solution I found required either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An expensive enterprise platform (Salesforce Agentforce, PagerDuty + AI add-ons)&lt;/li&gt;
&lt;li&gt;A complex LangChain/CrewAI setup with its own API key billing&lt;/li&gt;
&lt;li&gt;Or it was a demo that didn't survive contact with production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I already had Claude Code on the Max plan ($100/month). I was using it as my daily coding copilot. Then I realized: &lt;code&gt;claude -p&lt;/code&gt; runs Claude non-interactively. It accepts a system prompt. It connects to MCP servers. And it uses &lt;strong&gt;my existing session&lt;/strong&gt; — no API key, no extra cost.&lt;/p&gt;

&lt;p&gt;That was the insight that made everything else possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The PMO Agent (Project Management Operations Agent) is the only component in my ecosystem capable of modifying code in production. It connects three systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ADMIN (Telegram)
    !pmo tacos-api: fix the null check in ventas.js
         ↓
telegram-dispatcher (PM2)
         ↓
mensajes.db (SQLite WAL — shared queue)
         ↓
PMO Agent — polls every 10 seconds
         ↓
claude -p (Plan Max, no API key)
         ↓
MCP Project Server (Python, 24 tools per project)
         ↓
Read logs → Edit files → Restart PM2 → Verify → Report
         ↓
Telegram ← "Fixed. Null check added at line 45. Service online, 0 errors in last 15s."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The key principle:&lt;/strong&gt; &lt;code&gt;claude -p&lt;/code&gt; invokes Claude Code CLI in non-interactive mode. It uses the authenticated Max subscription of the user — the same session that powers your interactive terminal. There is no second API call. There is no token billing. The same $100/month that covers your daily coding covers your autonomous production agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MCP Server: 24 Tools in 6 Categories
&lt;/h2&gt;

&lt;p&gt;Each project gets its own instance of a Python MCP server. Claude connects to exactly the right project's tools — no cross-contamination, no blast radius beyond the target service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1,064 lines of Python. 38.5 KB. 24 tools.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;list_files&lt;/code&gt;, &lt;code&gt;search_code&lt;/code&gt;, &lt;code&gt;get_project_structure&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;edit_file&lt;/code&gt;, &lt;code&gt;delete_file&lt;/code&gt;, &lt;code&gt;create_directory&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;git_status&lt;/code&gt;, &lt;code&gt;git_diff&lt;/code&gt;, &lt;code&gt;git_log&lt;/code&gt;, &lt;code&gt;git_pull&lt;/code&gt;, &lt;code&gt;git_commit&lt;/code&gt;, &lt;code&gt;git_add&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PM2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;get_status&lt;/code&gt;, &lt;code&gt;view_logs&lt;/code&gt;, &lt;code&gt;restart_process&lt;/code&gt;, &lt;code&gt;stop_process&lt;/code&gt;, &lt;code&gt;start_process&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;run_tests&lt;/code&gt;, &lt;code&gt;check_health&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;read_claude_md&lt;/code&gt;, &lt;code&gt;get_dependencies&lt;/code&gt;, &lt;code&gt;run_command&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;edit_file&lt;/code&gt; tool is the most critical. It works as a strict search-and-replace: it finds &lt;code&gt;old_text&lt;/code&gt; exactly in the file and replaces it with &lt;code&gt;new_text&lt;/code&gt;. If &lt;code&gt;old_text&lt;/code&gt; is not unique in the file, it fails and asks for more context. This prevents accidental edits. Claude cannot blindly overwrite files — it must identify the exact code it wants to change.&lt;/p&gt;
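&lt;p&gt;The contract is easy to picture in a few lines of Python. This is assumed behavior reconstructed from the description above, not the repo's implementation:&lt;/p&gt;

```python
# Assumed semantics of edit_file, reconstructed from the description
# (not the repo's implementation): old_text must match exactly once.
def apply_edit(content, old_text, new_text):
    count = content.count(old_text)
    if count == 0:
        raise ValueError("old_text not found; edit rejected")
    if count != 1:
        raise ValueError(f"old_text matches {count} times; add more context")
    return content.replace(old_text, new_text)
```

&lt;p&gt;The uniqueness requirement is the safety property: an ambiguous match fails loudly instead of editing the wrong occurrence.&lt;/p&gt;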

&lt;p&gt;Each MCP server is invoked with &lt;code&gt;--strict-mcp-config&lt;/code&gt;, meaning Claude only sees the tools for the target project. Nothing else.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Modes of Operation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mode 1: Admin Instruction via Telegram
&lt;/h3&gt;

&lt;p&gt;You send a message prefixed with &lt;code&gt;!pmo&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pmo tacos-api: add a GET /health endpoint that returns { status: "ok" }
!pmo bot: change the welcome message to "Hola, qué vas a ordenar?"
!pmo cfo-agent: fix the bug where it doesn't parse DD/MM/YYYY dates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Telegram shortcuts with autocomplete (15 commands) make this feel like a native interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/pmo_api   → targets tacos-api
/pmo_bot   → targets TacosAragon WhatsApp bot
/pmo_cfo   → targets cfo-agent
/pmo_telegram → targets telegram-dispatcher
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mode 2: Autonomous Autocorrection
&lt;/h3&gt;

&lt;p&gt;When the monitor detects a repeated error, it enqueues an autocorrect message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AUTOCORRECT|tacos-api|TypeError: Cannot read property "precio" of undefined at src/routes/ventas.js:45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
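&lt;p&gt;The message format is trivial to parse; a sketch, with the field layout inferred from the example above (&lt;code&gt;|&lt;/code&gt; as delimiter, and the error text, which may itself contain colons, as everything after the second one):&lt;/p&gt;

```python
# Splitting the queue message into its fields. Layout inferred from the
# example above; the error text is everything after the second "|".
def parse_autocorrect(raw):
    kind, service, error = raw.split("|", 2)
    if kind != "AUTOCORRECT":
        raise ValueError("not an autocorrect message")
    return service, error

service, error = parse_autocorrect(
    'AUTOCORRECT|tacos-api|TypeError: Cannot read property "precio" '
    'of undefined at src/routes/ventas.js:45')
```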



&lt;p&gt;The PMO Agent picks this up and executes a full repair cycle without any human input:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Diagnose&lt;/strong&gt; — reads logs, reads the file with the error, searches the codebase for the pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze&lt;/strong&gt; — identifies the exact line, classifies severity (CRITICAL/HIGH/MEDIUM/LOW)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; — applies the minimum change using &lt;code&gt;edit_file&lt;/code&gt;, never refactors, preserves style&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; — restarts the process, waits 10 seconds, checks logs and HTTP health endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report&lt;/strong&gt; — sends a structured Telegram message (max 4,000 chars)&lt;/li&gt;
&lt;/ol&gt;
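&lt;p&gt;The five steps above can be sketched as a driver skeleton. Every helper on &lt;code&gt;tools&lt;/code&gt; here is hypothetical; in the real system each step is backed by the MCP tools listed earlier.&lt;/p&gt;

```python
# The five-step repair cycle as a skeleton (my own sketch; each tools.*
# helper is hypothetical and stands in for one or more MCP tools).
def repair_cycle(service, error, tools, notify):
    diagnosis = tools.diagnose(service, error)   # 1. logs + source around the error
    plan = tools.analyze(diagnosis)              # 2. exact line + severity
    if plan is None:                             # no certain root cause: report only
        notify(f"AUTOCORRECT [{service}] — REPORT ONLY: root cause unclear")
        return False
    tools.fix(plan)                              # 3. minimal edit, never a refactor
    healthy = tools.verify(service, wait_s=10)   # 4. restart, wait, check health
    status = "SUCCESS" if healthy else "FAILURE"
    notify(f"AUTOCORRECT [{service}] — {status}")  # 5. structured report
    return healthy
```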

&lt;p&gt;Example autocorrect report:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AUTOCORRECT [tacos-api] — SUCCESS
Error: TypeError at src/routes/ventas.js:45
Cause: variable "producto" can be null when item is out of stock
Fix: added null check before accessing .precio
Verification: service online, 0 errors in last 15s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Session Context: The Feature Most Builders Miss
&lt;/h2&gt;

&lt;p&gt;Each admin session shares context for 1 hour. Claude remembers everything it has read, every change it has made, every project it has touched — across multiple messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10:00  !pmo tacos-api: explain the architecture
       → NEW SESSION (expires 11:00). Claude reads files, responds.

10:05  !pmo tacos-api: add a /health endpoint
       → CONTINUES session (msg #2). Claude already knows the architecture.
       → Edits code, restarts, verifies.

10:12  !pmo cfo-agent: what endpoints does it have?
       → SAME session (msg #3). Claude remembers tacos-api AND reads cfo.

10:20  !pmo tacos-api: write the test for /health
       → SAME session (msg #4). Knows exactly what it built.

11:01  !pmo bot: add RFC validation
       → Session expired. NEW SESSION starts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implementation: the first call uses &lt;code&gt;--session-id UUID&lt;/code&gt; (creates a session on disk). Subsequent calls use &lt;code&gt;--resume UUID&lt;/code&gt; (resumes with full context). After 1 hour the UUID expires and a new one is generated.&lt;/p&gt;
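&lt;p&gt;In Python, the flag selection described above might look like this. The two flags are the ones quoted in the paragraph; the bookkeeping dict is my own illustration:&lt;/p&gt;

```python
import time
import uuid

SESSION_TTL_S = 60 * 60  # 1-hour TTL, as described above

# My own bookkeeping around the two CLI flags quoted in the paragraph.
_sessions = {}

def session_flags(project, now=None):
    now = time.time() if now is None else now
    entry = _sessions.get(project)
    if entry and (now - entry["created"]) < SESSION_TTL_S:
        return ["--resume", entry["id"]]        # existing session: full context
    session_id = str(uuid.uuid4())
    _sessions[project] = {"id": session_id, "created": now}
    return ["--session-id", session_id]         # new session on disk
```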

&lt;p&gt;This is not available in any LangChain tutorial. It's a Claude Code CLI feature that most people using it interactively don't know exists.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Model: 4 Concentric Layers
&lt;/h2&gt;

&lt;p&gt;Autonomous code execution is dangerous, so the layers are independent: if one fails, the others still hold.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: MCP Server (per project)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Path traversal blocked — &lt;code&gt;..&lt;/code&gt; cannot escape the project directory&lt;/li&gt;
&lt;li&gt;Sensitive files blocked — &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;*.pem&lt;/code&gt;, &lt;code&gt;*.key&lt;/code&gt;, &lt;code&gt;credentials.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Binaries excluded — images, executables, databases not readable or editable&lt;/li&gt;
&lt;li&gt;Dangerous commands filtered — &lt;code&gt;rm -rf /&lt;/code&gt;, &lt;code&gt;format&lt;/code&gt;, &lt;code&gt;shutdown&lt;/code&gt;, &lt;code&gt;reboot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Size limits — 500KB per file, 500 lines per output&lt;/li&gt;
&lt;/ul&gt;
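&lt;p&gt;A minimal Python sketch of the path and sensitive-file checks (constants and names are mine, not the server's):&lt;/p&gt;

```python
from pathlib import Path

# Sketch of the Layer 1 checks above (constants and names are mine).
BLOCKED_NAMES = {".env", "credentials.json"}
BLOCKED_SUFFIXES = {".pem", ".key"}

def is_allowed(project_root, requested):
    root = Path(project_root).resolve()
    target = (root / requested).resolve()
    if root != target and root not in target.parents:
        return False  # ".." escaped the project directory
    if target.name in BLOCKED_NAMES or target.suffix in BLOCKED_SUFFIXES:
        return False
    return True
```

&lt;p&gt;Resolving the joined path before comparing against the root is what defeats &lt;code&gt;..&lt;/code&gt; traversal; string prefix checks alone are not enough.&lt;/p&gt;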

&lt;h3&gt;
  
  
  Layer 2: PMO Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1 concurrent execution maximum — no overlapping corrections&lt;/li&gt;
&lt;li&gt;5-minute cooldown between corrections on the same service&lt;/li&gt;
&lt;li&gt;10-minute timeout per Claude execution&lt;/li&gt;
&lt;li&gt;Full audit trail in &lt;code&gt;pmo_ejecuciones&lt;/code&gt; table&lt;/li&gt;
&lt;/ul&gt;
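&lt;p&gt;The concurrency and cooldown rules can be sketched like this (the values come from the list above; the structure is my own illustration):&lt;/p&gt;

```python
# Sketch of the Layer 2 throttling rules (values from the list above;
# the bookkeeping class is my own illustration).
COOLDOWN_S = 5 * 60  # 5-minute cooldown per service

class Throttle:
    def __init__(self):
        self.running = False  # 1 concurrent execution maximum
        self.last_fix = {}

    def may_correct(self, service, now):
        if self.running:
            return False
        last = self.last_fix.get(service)
        if last is not None and (now - last) < COOLDOWN_S:
            return False  # still cooling down
        return True

    def record(self, service, now):
        self.last_fix[service] = now
```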

&lt;h3&gt;
  
  
  Layer 3: System Prompt (&lt;code&gt;autocorrect.md&lt;/code&gt;)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NEVER modifies &lt;code&gt;.env&lt;/code&gt;, credentials, or production configuration&lt;/li&gt;
&lt;li&gt;NEVER installs unknown packages&lt;/li&gt;
&lt;li&gt;If root cause not identified with certainty: report only, don't touch code&lt;/li&gt;
&lt;li&gt;If fix fails: revert with &lt;code&gt;git checkout&lt;/code&gt;, report FAILURE&lt;/li&gt;
&lt;li&gt;Maximum 3 files modified per correction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 4: Claude CLI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Budget cap: $2.00 per execution (failsafe against token loops)&lt;/li&gt;
&lt;li&gt;Model: Sonnet (fast, economical for corrections)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--strict-mcp-config&lt;/code&gt; — only uses configured MCP servers, ignores others&lt;/li&gt;
&lt;li&gt;1-hour sessions — shared context per conversation, auto-expires&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real Production Data
&lt;/h2&gt;

&lt;p&gt;The system has been running since March 21, 2026. Here is the unfiltered data from &lt;code&gt;pmo_ejecuciones&lt;/code&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total executions&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completed (success)&lt;/td&gt;
&lt;td&gt;7 (78%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure errors&lt;/td&gt;
&lt;td&gt;2 (bugs in the PMO itself, now fixed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude errors&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real success rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100% (7/7 when Claude ran)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 2 failures were bugs in my infrastructure code — the PMO dispatcher, not Claude. When Claude actually executed, it succeeded every single time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Timeline (March 21, 2026)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;tacos-api&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;11s&lt;/td&gt;
&lt;td&gt;9:46 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;error_infra&lt;/td&gt;
&lt;td&gt;0s&lt;/td&gt;
&lt;td&gt;9:59 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;107s&lt;/td&gt;
&lt;td&gt;10:06 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;10:32 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;error_infra&lt;/td&gt;
&lt;td&gt;180s&lt;/td&gt;
&lt;td&gt;10:38 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;408s&lt;/td&gt;
&lt;td&gt;10:46 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;600s&lt;/td&gt;
&lt;td&gt;11:00 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;123s&lt;/td&gt;
&lt;td&gt;11:16 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;600s&lt;/td&gt;
&lt;td&gt;11:22 PM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Resolution Time
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Average&lt;/td&gt;
&lt;td&gt;308s (~5 min)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum&lt;/td&gt;
&lt;td&gt;11s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum&lt;/td&gt;
&lt;td&gt;600s (~10 min)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Token Consumption (per execution)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Characters&lt;/th&gt;
&lt;th&gt;Approx. Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;~1,800&lt;/td&gt;
&lt;td&gt;~500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User prompt&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP tool calls (read ~5 files)&lt;/td&gt;
&lt;td&gt;~50,000&lt;/td&gt;
&lt;td&gt;~13,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP tool results (code read)&lt;/td&gt;
&lt;td&gt;~80,000&lt;/td&gt;
&lt;td&gt;~21,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~34,600&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response (output)&lt;/td&gt;
&lt;td&gt;~2,651&lt;/td&gt;
&lt;td&gt;~700&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: Token counts are approximate estimates based on character counts. Claude Code does not expose exact token counts for &lt;code&gt;claude -p&lt;/code&gt; executions.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Footprint
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mensajes_queue&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mensajes_responses&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pmo_ejecuciones&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mensajes.db total size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;84 KB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At this rate: ~10,000 executions before reaching 1 MB. The entire ecosystem's operational data fits in a text file.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Using real token data (14,830 input / 193 output per execution):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Cost/execution&lt;/th&gt;
&lt;th&gt;Cost at 30 exec/day × 30 days&lt;/th&gt;
&lt;th&gt;Annual&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o API ($2.50/M in, $10/M out)&lt;/td&gt;
&lt;td&gt;$0.039&lt;/td&gt;
&lt;td&gt;$35.85/month&lt;/td&gt;
&lt;td&gt;$430/year&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet API ($3/M in, $15/M out)&lt;/td&gt;
&lt;td&gt;$0.047&lt;/td&gt;
&lt;td&gt;$42.50/month&lt;/td&gt;
&lt;td&gt;$510/year&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan Max&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$100/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,200/year&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Break-even point:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vs GPT-4o: Max becomes cheaper at &lt;strong&gt;84+ executions/day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;vs Sonnet API: Max becomes cheaper at &lt;strong&gt;71+ executions/day&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
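&lt;p&gt;Those break-even figures can be reproduced from the per-execution costs; a back-of-envelope check using the token counts quoted above:&lt;/p&gt;

```python
# Back-of-envelope check of the break-even figures, using the
# per-execution token counts quoted above (14,830 input / 193 output).
def cost_per_exec(in_tokens, out_tokens, in_per_m, out_per_m):
    return in_tokens / 1e6 * in_per_m + out_tokens / 1e6 * out_per_m

sonnet = cost_per_exec(14_830, 193, 3.00, 15.00)   # ~$0.047
gpt4o = cost_per_exec(14_830, 193, 2.50, 10.00)    # ~$0.039
breakeven_sonnet = 100 / sonnet / 30               # ~70-71 executions/day
breakeven_gpt4o = 100 / gpt4o / 30                 # ~84-86 executions/day
```

&lt;p&gt;The small gaps against the rounded figures in the list come from rounding the per-execution costs.&lt;/p&gt;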

&lt;p&gt;At 30 executions/day, the raw API cost is lower than Max. So why Max?&lt;/p&gt;

&lt;p&gt;Because the PMO Agent is not the only thing running. The same $100/month also covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All interactive Claude Code sessions (my daily coding copilot)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;aragon-git-guardian&lt;/code&gt; security hook on every &lt;code&gt;git push&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;All Skills, CLAUDE.md hierarchies, and custom slash commands&lt;/li&gt;
&lt;li&gt;Any other &lt;code&gt;claude -p&lt;/code&gt; automation I add&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Counting all of that usage together is what pushes past the ~71 executions/day break-even; everything below it is a bonus that comes included.&lt;/p&gt;

&lt;p&gt;The real financial argument is not the PMO alone. It's that for a solo operator running 6 production services, the $100/month Max plan functions as an entire DevOps + QA + on-call engineering layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Is Not
&lt;/h2&gt;

&lt;p&gt;This is not a prototype. It is not a tutorial with &lt;code&gt;TODO: add real logic here&lt;/code&gt;. The system ran 9 times in one evening, resolved real tasks on real production code, and left an audit trail in a SQLite database.&lt;/p&gt;

&lt;p&gt;It is also not a replacement for proper SRE practices. Circuit breakers, cooldowns, budget caps, and human-in-the-loop escalation exist precisely because autonomous agents fail in unexpected ways. The system is designed to operate safely within defined boundaries and escalate outside them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Surprised Me
&lt;/h2&gt;

&lt;p&gt;After building this, I searched for prior art. I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telegram + Claude Code bridges&lt;/strong&gt; — several repos, all require a separate API key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing infrastructure&lt;/strong&gt; — enterprise frameworks (VIGIL, AIOps) costing orders of magnitude more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PMO-style agents&lt;/strong&gt; — LangChain tutorials, demo-quality, no production data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not find a single documented implementation of &lt;code&gt;claude -p&lt;/code&gt; as a production autocorrection engine, with per-project MCP servers, session persistence, and a 4-layer security model — running on a Max subscription without a separate API key.&lt;/p&gt;

&lt;p&gt;The capability was there. The documentation was not.&lt;/p&gt;

&lt;p&gt;The full implementation is open source: &lt;strong&gt;&lt;a href="https://github.com/Gumagonza1/pmo-agent" rel="noopener noreferrer"&gt;github.com/Gumagonza1/pmo-agent&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pmo-agent/
├── index.js              # PM2 process: polls SQLite every 10s
├── claude-runner.js      # Wrapper for claude -p (spawn + timeout)
├── config.js             # Project map, paths, timings
├── mcp-projects.json     # MCP config for claude -p
├── prompts/
│   ├── autocorrect.md    # System prompt: autonomous correction
│   └── pmo-instruction.md # System prompt: admin instructions
└── state/               # Cooldown state per service

mcp-project-server/
└── server.py             # 1,064 lines, 24 tools, 38.5 KB

telegram-dispatcher/
└── index.js             # Recognizes !pmo prefix, writes to SQLite

mensajes.db              # SQLite WAL, shared queue, 84 KB total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration (config.js):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POLL_INTERVAL_MS&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;SQLite polling frequency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VERIFY_WAIT_MS&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;Wait after fix before verifying&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE_TIMEOUT_MS&lt;/td&gt;
&lt;td&gt;600,000&lt;/td&gt;
&lt;td&gt;10-minute max per Claude execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MAX_CONCURRENT&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;No overlapping executions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;COOLDOWN_MS&lt;/td&gt;
&lt;td&gt;300,000&lt;/td&gt;
&lt;td&gt;5-minute cooldown per service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Managed projects:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;PM2 Name&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;TacosAragon&lt;/td&gt;
&lt;td&gt;3003&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MonitorBot&lt;/td&gt;
&lt;td&gt;MonitorBot&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tacos-api&lt;/td&gt;
&lt;td&gt;tacos-api&lt;/td&gt;
&lt;td&gt;3001&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;telegram-dispatcher&lt;/td&gt;
&lt;td&gt;telegram-dispatcher&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cfo-agent&lt;/td&gt;
&lt;td&gt;cfo-agent&lt;/td&gt;
&lt;td&gt;3002&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;portfolio-aragon&lt;/td&gt;
&lt;td&gt;portfolio-aragon&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Adding a New Project (3 Steps)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 1: config.js&lt;/span&gt;
&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nuevo-proyecto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;project-nuevo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;ruta&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;al&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;proyecto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;pm2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nombre-en-pm2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;puerto&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;critico&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mcp-projects.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"project-nuevo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;mcp-project-server&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;server.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"--root"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;ruta&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;al&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;proyecto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"--pm2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nombre-en-pm2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"--name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuevo"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 3&lt;/span&gt;
pm2 restart pmo-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  A Real Fix: What Claude Actually Did
&lt;/h2&gt;

&lt;p&gt;Theory is easy. Here is a specific production fix the PMO Agent diagnosed and applied on March 21, 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; TacosAragon (the WhatsApp ordering bot) was producing imprecise responses. The monitor flagged degraded quality in Gemini's outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The PMO's diagnosis:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Causa principal: El menú CSV con UUIDs inyectaba ~25,000 chars de ruido al contexto de Gemini, degradando la precisión. Contexto total estimado bajó de ~76,500 a ~53,000 chars (-30%).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation: the menu being sent to Gemini included raw UUID identifiers from the Loyverse POS system — ~25,000 characters of noise the model didn't need. This was bloating the context by 30% and degrading response quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three fixes applied autonomously to &lt;code&gt;bot-tacos/index.js&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 1 — &lt;code&gt;menuParaIA()&lt;/code&gt; (line 211):&lt;/strong&gt; Added UUID filtering before sending menu CSV to Gemini. Eliminated ~25,000 characters of irrelevant identifiers from the context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 2 — temperature reduction (lines 246, 262):&lt;/strong&gt; Lowered from 0.5 to 0.3 to reduce hallucinations and improve ordering precision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 3 — &lt;code&gt;MAX_HISTORIAL&lt;/code&gt; reduction (line 315):&lt;/strong&gt; Reduced conversation history limit from 35,000 to 20,000 characters to prevent context overload in long sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Total context sent to Gemini dropped from ~76,500 to ~53,000 characters — a 30% reduction. Three targeted edits, no refactoring, code style preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Actual PMO Output
&lt;/h3&gt;

&lt;p&gt;Here is the unedited message the PMO Agent sent to Telegram before applying any fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="err"&gt;✅&lt;/span&gt; &lt;span class="nx"&gt;PMO&lt;/span&gt; &lt;span class="nx"&gt;TacosAragon&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="nc"&gt;Completed &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;🔗&lt;/span&gt; &lt;span class="nx"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58&lt;/span&gt;&lt;span class="nx"&gt;min&lt;/span&gt; &lt;span class="nx"&gt;remaining&lt;/span&gt;

&lt;span class="err"&gt;🔴&lt;/span&gt; &lt;span class="nx"&gt;ROOT&lt;/span&gt; &lt;span class="nx"&gt;CAUSE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Menu&lt;/span&gt; &lt;span class="nx"&gt;CSV&lt;/span&gt; &lt;span class="nx"&gt;sent&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;UUIDs&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;Gemini&lt;/span&gt;

&lt;span class="nx"&gt;Each&lt;/span&gt; &lt;span class="nx"&gt;call&lt;/span&gt; &lt;span class="nx"&gt;includes&lt;/span&gt; &lt;span class="nx"&gt;full&lt;/span&gt; &lt;span class="nf"&gt;menuRaw &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;118&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;them&lt;/span&gt;
&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;char&lt;/span&gt; &lt;span class="nx"&gt;UUIDs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="nx"&gt;agua&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;litro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Agua&lt;/span&gt; &lt;span class="nx"&gt;cebada&lt;/span&gt; &lt;span class="nx"&gt;Litro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Bebidas&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;877&lt;/span&gt;&lt;span class="nx"&gt;e394e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="nx"&gt;Gemini&lt;/span&gt; &lt;span class="nx"&gt;receives&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;UUIDs&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="nx"&gt;cannot&lt;/span&gt; &lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;This&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;
&lt;span class="nx"&gt;noise&lt;/span&gt; &lt;span class="nx"&gt;degrades&lt;/span&gt; &lt;span class="nx"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;following&lt;/span&gt; &lt;span class="nx"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nx"&gt;Fix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Filter&lt;/span&gt; &lt;span class="nx"&gt;menuRaw&lt;/span&gt; &lt;span class="nx"&gt;before&lt;/span&gt; &lt;span class="nx"&gt;injecting&lt;/span&gt; &lt;span class="nx"&gt;into&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;menuParaIA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Handle,Nombre,Categoria,Precio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="err"&gt;🟡&lt;/span&gt; &lt;span class="nx"&gt;History&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="nx"&gt;reduce&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt;
&lt;span class="err"&gt;🟡&lt;/span&gt; &lt;span class="nx"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Flash&lt;/span&gt; &lt;span class="nx"&gt;instead&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nc"&gt;Pro &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;243&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;🟢&lt;/span&gt; &lt;span class="nx"&gt;Temperature&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;

&lt;span class="nx"&gt;Context&lt;/span&gt; &lt;span class="nx"&gt;estimate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;instrucciones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;txt&lt;/span&gt;    &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt;
  &lt;span class="nx"&gt;menu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;UUIDs&lt;/span&gt;  &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt;
  &lt;span class="nc"&gt;History &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt;
  &lt;span class="nx"&gt;Profile&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;    &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt;
  &lt;span class="nx"&gt;TOTAL&lt;/span&gt;               &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;76&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="nx"&gt;chars&lt;/span&gt;
  &lt;span class="nx"&gt;With&lt;/span&gt; &lt;span class="nx"&gt;UUID&lt;/span&gt; &lt;span class="nx"&gt;fix&lt;/span&gt;       &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;53&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="nf"&gt;chars &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;Shall&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="nx"&gt;apply&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;fixes&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the last line: &lt;strong&gt;"Shall I apply the fixes now?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The PMO Agent did not execute automatically. It read five files, identified four causes ordered by impact, calculated exact context sizes, wrote the implementation code, and then &lt;strong&gt;paused for confirmation&lt;/strong&gt; before touching a single line of production code.&lt;/p&gt;

&lt;p&gt;This is human-in-the-loop by design, not by accident. The system prompt instructs Claude to present the diagnosis and proposed changes before executing anything above a defined risk threshold. The agent has full capability to apply changes unilaterally — it has &lt;code&gt;edit_file&lt;/code&gt; and &lt;code&gt;restart_process&lt;/code&gt; tools available. The constraint is deliberate.&lt;/p&gt;
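&lt;p&gt;The gating logic that system prompt describes could be sketched like this; the tool names come from the article, but the risk scores and threshold are assumptions for illustration:&lt;/p&gt;

```javascript
// Hypothetical risk gate: read-only tools run freely, mutating tools
// (edit_file, restart_process) force a pause for human confirmation.
const RISK_THRESHOLD = 2;

const TOOL_RISK = {
  read_file: 0,        // read-only, always safe
  edit_file: 3,        // touches production code
  restart_process: 3,  // touches a live service
};

function requiresConfirmation(plannedTools) {
  // Unknown tools default to maximum risk rather than slipping through.
  const risks = plannedTools.map((t) => TOOL_RISK[t] ?? 3);
  return Math.max(...risks) > RISK_THRESHOLD;
}
```

&lt;p&gt;Defaulting unknown tools to maximum risk matters: as new MCP tools are added, the safe failure mode is "ask first", not "run silently".&lt;/p&gt;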

&lt;p&gt;After receiving approval, it applied all three fixes, restarted the service, and the entire cycle completed in 123 seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the System Evolved: Before vs After
&lt;/h2&gt;

&lt;p&gt;The initial version had significant issues. Here is the before/after comparison from one evening of debugging:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP servers per execution&lt;/td&gt;
&lt;td&gt;6 (all loaded simultaneously)&lt;/td&gt;
&lt;td&gt;1 (dynamic per project)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt loading&lt;/td&gt;
&lt;td&gt;Read from disk every time&lt;/td&gt;
&lt;td&gt;Cached with mtime invalidation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Re-entrancy protection&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Guard prevents duplicate execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orphan processes&lt;/td&gt;
&lt;td&gt;Persisted after crashes&lt;/td&gt;
&lt;td&gt;Auto-cleanup on exit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution timeout&lt;/td&gt;
&lt;td&gt;3 minutes&lt;/td&gt;
&lt;td&gt;20 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget per execution&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;procesado=1&lt;/code&gt; timing&lt;/td&gt;
&lt;td&gt;Set BEFORE execution&lt;/td&gt;
&lt;td&gt;Set in &lt;code&gt;finally&lt;/code&gt; (AFTER)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;stderr handling&lt;/td&gt;
&lt;td&gt;Ignored&lt;/td&gt;
&lt;td&gt;Used as fallback if stdout empty&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most critical bug: &lt;code&gt;procesado=1&lt;/code&gt; was being set before Claude ran. If Claude failed or timed out, the message was marked as processed and never retried. Setting it in &lt;code&gt;finally&lt;/code&gt; (always runs, regardless of success or failure) was a one-line fix that changed the entire reliability profile.&lt;/p&gt;
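&lt;p&gt;The shape of that fix, with stand-in names for the database and the Claude runner:&lt;/p&gt;

```javascript
// Sketch of the one-line reliability fix; db and runClaude are stand-ins,
// not the PMO Agent's actual code.
async function handleMessage(db, msg, runClaude) {
  // BEFORE: db.markProcessed(msg.id) sat here, so a failed or timed-out
  // run still left the message marked procesado=1 and it was never retried.
  try {
    await runClaude(msg);
  } finally {
    // AFTER: mark only once the attempt has finished. A hard crash that
    // kills the process before this line leaves the message unmarked,
    // so the next poll picks it up again.
    db.markProcessed(msg.id);
  }
}
```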




&lt;h2&gt;
  
  
  What I'm Building Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aragon-ops-server&lt;/code&gt; — MCP tools for Loyverse (sales/inventory), Facturama (CFDI fiscal), and SQLite analytics. Claude will be able to query yesterday's sales or generate an invoice directly from a Telegram message.&lt;/li&gt;
&lt;li&gt;Progressive cooldown backoff — 5min → 15min → 60min for repeated failures on the same service&lt;/li&gt;
&lt;li&gt;Daily budget cap across all PMO executions&lt;/li&gt;
&lt;li&gt;Daily ops report generated automatically every morning&lt;/li&gt;
&lt;/ul&gt;
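&lt;p&gt;The progressive backoff is simple to sketch; the mapping below (first failure 5 min, second 15 min, third and beyond 60 min) is my reading of the plan, not shipped code:&lt;/p&gt;

```javascript
// Planned progressive cooldown backoff: 5min, then 15min, then 60min for
// repeated failures on the same service. Names are illustrative.
const BACKOFF_MS = [300_000, 900_000, 3_600_000];

function cooldownFor(consecutiveFailures) {
  // 1 failure -> 5min, 2 -> 15min, 3 or more -> 60min (capped)
  const idx = Math.min(Math.max(consecutiveFailures, 1), BACKOFF_MS.length) - 1;
  return BACKOFF_MS[idx];
}
```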




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The most valuable insight from this project is not about Claude. It is about the &lt;code&gt;claude -p&lt;/code&gt; flag.&lt;/p&gt;

&lt;p&gt;Most developers who use Claude Code use it interactively. They type, Claude responds, they review. But &lt;code&gt;claude -p&lt;/code&gt; turns Claude Code into a callable function — a reasoning engine you can invoke programmatically, with MCP tool access, session persistence, and your existing subscription credentials.&lt;/p&gt;

&lt;p&gt;No separate API key. No LangChain dependency. No new billing surface.&lt;/p&gt;

&lt;p&gt;The PMO Agent is nine executions old as of this writing. In those nine runs it has not introduced a single code error. It has fixed real bugs, added real features, and restarted real services — autonomously, from a Telegram message, on a phone, while I was doing something else.&lt;/p&gt;

&lt;p&gt;If you are running production services on a Claude Code Max plan and you have not looked at &lt;code&gt;claude -p&lt;/code&gt;, you are leaving a significant capability on the table.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Gumaro González is the owner and CTO of Tacos Aragón, a family restaurant in Culiacán, Sinaloa, México. He has been building the Ecosistema Aragón — an AI-powered operations platform — as a solo developer since 2020.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub: &lt;a href="https://github.com/Gumagonza1" rel="noopener noreferrer"&gt;github.com/Gumagonza1&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;PMO Agent repo: &lt;a href="https://github.com/Gumagonza1/pmo-agent" rel="noopener noreferrer"&gt;github.com/Gumagonza1/pmo-agent&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Live system: &lt;a href="https://gumaro.dev.tacosaragon.com.mx" rel="noopener noreferrer"&gt;gumaro.dev.tacosaragon.com.mx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>ai</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
