signalscout

I Spent Two Days Debugging My Agent Stack. The Fix Was npm update.


A forensic investigation into how Codex CLI v0.50.0 quietly broke everything — and the 1,886 versions I skipped by not checking.


The Crime Scene

I run a multi-agent stack. OpenClaw orchestrates, Codex writes code, Gemini/Groq/DeepSeek handle the cheap inference, and the whole thing talks to itself through MCP (Model Context Protocol). It's either beautiful or terrifying depending on how you feel about autonomous systems. Most days, it works.

Last Tuesday, it stopped working.

Not dramatically — there was no stack trace, no segfault, no red alert. The kind of failure where you stare at logs for four hours before realizing the patient has been dead since morning. Codex sessions were silently dropping tool calls. MCP handshakes were timing out. The agent stack would spin up, do 40% of the work, then... nothing. No error. Just vibes.

I did what any reasonable person does: I blamed the LLM provider.

The Investigation

Here's the thing about debugging a system where five different AI models talk to each other through three protocol layers: everything is a suspect. My first 12 hours looked like this:

Hour 0-3: "It's definitely Groq's rate limits."
Nope. Switched to Gemini. Same behavior.

Hour 3-6: "MCP config must be wrong."
Rewrote my MCP server config. Twice. Compared against the docs character by character. Deployed. Same behavior.

Hour 6-9: "Maybe OpenClaw's routing is broken after the last update."
Filed two GitHub issues (#64085, #64086). Wrote detailed reproduction steps. Drew architecture diagrams. The maintainers were very polite about it.

Hour 9-11: "Let me check the Codex cache database."
Opened ~/.codex/logs_2.sqlite. Found 2,026 sessions. Scrolled through. Everything looked normal. The client_version field said 0.120.0. I nodded and moved on.

Hour 11: "Wait."

The Moment

I don't remember exactly what made me type it. Muscle memory, probably. Or divine intervention.

$ codex --version
0.50.0

I stared at the terminal for about ten seconds.

Then I stared at the cache database entry that said 0.120.0.

Then I ran:

$ which codex
/home/yin/.npm-global/bin/codex

$ ls -la $(which codex)
codex -> ../lib/node_modules/@openai/codex/bin/codex.js

$ npm list -g @openai/codex
└── @openai/codex@0.120.0

Huh. npm says 0.120.0. The binary says 0.50.0. The cache says 0.120.0. Three sources, two different answers, one tool.

What I had was a partially-updated installation where the npm package metadata had been updated but the actual binary was still running from a cached older version. The kind of bug you create by running npm install -g at 2 AM and not noticing the postinstall script failed.
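This is the check I should have scripted on day one. A minimal sketch: compare what the binary reports against what npm's metadata claims. The embedded JSON mirrors the shape of `npm list -g @openai/codex --json` output; the version strings are the ones from this post.

```python
# Sketch: detect a partially-updated global install by comparing the
# version npm's metadata claims against the version the binary reports.
import json

# Stand-in for the output of: npm list -g @openai/codex --json
npm_list_output = """
{
  "dependencies": {
    "@openai/codex": { "version": "0.120.0" }
  }
}
"""

npm_version = json.loads(npm_list_output)["dependencies"]["@openai/codex"]["version"]
binary_version = "0.50.0"  # what `codex --version` actually printed

if npm_version != binary_version:
    print(f"partial install: npm says {npm_version}, binary says {binary_version}")
```

In a real preflight script, both strings would come from subprocess calls instead of literals; the comparison is the whole trick.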

The Autopsy: What 1,886 Versions Changed

I was curious. How far behind was I, really?

$ npm view @openai/codex versions --json | python3 -c "
import json, sys
versions = json.load(sys.stdin)
print(f'Total published versions: {len(versions)}')
"
Total published versions: 1886

One thousand, eight hundred and eighty-six published versions, the overwhelming majority of them shipped between my installed v0.50.0 and the current v0.120.0. That's roughly 26 releases per day. The Codex team does not sleep.
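The one-liner above counts every published version. To count only the releases that landed strictly between the installed and current versions, sort by a semver-ish key and filter. The sample list here is made up for illustration; the real one comes from `npm view @openai/codex versions --json`.

```python
# Count releases strictly between two versions, prerelease-aware.

def sort_key(v: str) -> tuple:
    core, _, pre = v.partition("-")
    nums = tuple(int(p) for p in core.split("."))
    # A prerelease (e.g. 0.50.0-alpha.1) sorts before its final release.
    return nums + ((0, pre) if pre else (1, ""))

def releases_between(versions, installed, current):
    lo, hi = sort_key(installed), sort_key(current)
    return [v for v in versions if lo < sort_key(v) < hi]

# Hypothetical sample; the real list has 1,886 entries.
sample = ["0.50.0-alpha.1", "0.50.0", "0.51.0", "0.52.0", "0.119.0", "0.120.0"]
print(len(releases_between(sample, "0.50.0", "0.120.0")))  # -> 3
```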

The v0.50.0 lineage tells a story:

  • 0.50.0-alpha.1 — the optimistic beginning
  • 0.50.0-alpha.2 — "we found some issues"
  • 0.50.0-alpha.3 — "we found more issues"
  • 0.50.0 — "ship it, we'll fix it in 0.51"

And then they shipped 0.51. And 0.52. And kept going for eighteen hundred more releases while I sat on 0.50.0 like it was a vintage wine that would appreciate with age.

What Actually Broke

The root cause was MCP protocol compatibility. Between v0.50.0 and v0.120.0, the Codex CLI underwent significant architectural changes:

  1. Typed code-mode tool declarations. v0.120.0 introduced proper TypeScript-style type declarations for tool calls. v0.50.0 was sending untyped tool schemas. Modern MCP servers (including the ones OpenClaw spins up) expected typed declarations and silently dropped the untyped ones.

  2. Core crate extractions. The Codex team extracted core functionality into separate Rust crates. This changed the internal message format in subtle ways that only manifested when Codex talked to external MCP servers (as opposed to its built-in tools).

  3. MCP cleanup fixes. There were literal bug fixes for MCP session management — connection pooling, timeout handling, retry logic. My v0.50.0 was using MCP patterns that had known bugs which were fixed a thousand versions ago.

  4. Richer MCP app support. The newer version supports MCP apps as first-class citizens. My v0.50.0 was treating MCP connections as second-class tool providers, which meant every agent handoff was going through a compatibility shim that occasionally lost messages.

The beautiful irony: my config.toml was perfectly configured.

model = "gpt-5.4"
reasoning_effort = "medium"  
personality = "pragmatic"

[plugins]
gmail = "openai-curated"
github = "openai-curated"

The model migrations from gpt-5 → gpt-5.3-codex → gpt-5.4 were all properly specified. The config was fine. The binary executing that config was from a different geological era.

The Fix

$ npm install -g @openai/codex@latest
$ codex --version
0.120.0

Two seconds. Two seconds to fix what took me two days to diagnose.

The agent stack came back online immediately. MCP handshakes completed. Tool calls went through. Sessions that had been failing at 40% completion started running to 100%. The 2,026 sessions in ~/.codex/sessions/ started growing again.

Timeline of Discovery

| Time | Activity | Usefulness |
| --- | --- | --- |
| Hour 0-3 | Blame Groq | 0% |
| Hour 3-6 | Rewrite MCP config | 0% |
| Hour 6-9 | File GitHub issues against OpenClaw | 0% (but they were well-written) |
| Hour 9-11 | Forensic analysis of SQLite cache | 5% (found the version discrepancy clue) |
| Hour 11 | codex --version | 100% |
| Hour 11 + 2 sec | npm install -g @openai/codex@latest | ∞% |

Total debugging time: ~24 hours.
Total fix time: 2 seconds.
Ratio: 43,200:1.

Lessons Learned

1. Check the version first. Always.

Before you blame the cloud, blame the config, blame the provider, blame Mercury retrograde — run --version. I know this. I've told junior devs this. I've written it on whiteboards. And I still spent 24 hours not doing it.

2. npm global installs are haunted.

The failure mode here was a partial update: npm's package metadata updated, but the binary didn't get replaced. This is a known class of npm bugs that's existed for a decade. If you run a global npm tool in production (or production-adjacent) workflows, pin it with a version manager or at least verify the binary version matches npm list -g.

3. MCP compatibility is version-sensitive.

MCP is still young. The protocol is evolving fast. Unlike HTTP, where a server from 2015 can talk to a client from 2025, MCP servers and clients need to be within a reasonable version range of each other. When your MCP client is 1,886 versions behind, "reasonable" left the building months ago.
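One way to make that drift loud instead of silent is a startup guard. This is an illustrative sketch, not a real MCP feature: refuse to proceed when client and server versions are outside an allowed minor-version window, rather than limping along with dropped capabilities. The window size is an arbitrary assumption.

```python
# Illustrative fail-fast guard: reject version pairings that have
# drifted beyond a chosen minor-version window.

def major_minor(v: str) -> tuple:
    parts = v.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])

def within_window(client: str, server: str, max_minor_drift: int = 10) -> bool:
    c_major, c_minor = major_minor(client)
    s_major, s_minor = major_minor(server)
    return c_major == s_major and abs(c_minor - s_minor) <= max_minor_drift

# v0.50.0 talking to v0.120.0-era servers is far outside any sane window.
print(within_window("0.50.0", "0.120.0"))
print(within_window("0.119.0", "0.120.0"))
```

A check like this at handshake time would have turned two days of vibes-based debugging into one explicit startup error.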

4. Multi-agent stacks amplify version debt.

In a monolith, a stale dependency usually manifests as a clear error. In a multi-agent stack where five services talk through protocol bridges, a stale dependency manifests as mysterious partial failures with no error messages. The debugging surface area is multiplicative.

5. The cache lies.

My SQLite cache said client_version: 0.120.0 because it had been written by a different invocation of Codex (probably through OpenClaw's process spawning, which had its own newer copy). The lesson: cache metadata reflects the last writer, not the current runtime. Always verify at the binary level.
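The last-writer-wins behavior is easy to demonstrate. The table and column names below are hypothetical stand-ins for whatever schema ~/.codex actually uses; the point is only that a naive "what's the client_version?" query tells you about the most recent writer, not the binary you're about to run.

```python
# Minimal demo of "the cache reflects the last writer."
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, client_version TEXT)")

# The stale 0.50.0 binary writes a session, then a newer 0.120.0 copy
# (spawned by the orchestrator) writes the next one.
db.execute("INSERT INTO sessions (client_version) VALUES ('0.50.0')")
db.execute("INSERT INTO sessions (client_version) VALUES ('0.120.0')")

# Reading "the" client_version returns whatever wrote last, which says
# nothing about the binary your shell will invoke next.
(latest,) = db.execute(
    "SELECT client_version FROM sessions ORDER BY id DESC LIMIT 1"
).fetchone()
print(latest)  # 0.120.0, even though the shell's codex was still 0.50.0
```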

The Broader Point

We're in the era of agent stacks — systems where multiple AI-powered tools coordinate through shared protocols. These stacks are powerful but they have a failure mode that traditional software doesn't: silent degradation. When your REST API client is outdated, you get a 400 error. When your MCP client is outdated, you get a successful handshake that quietly drops half the capabilities.

The tooling will catch up. Version compatibility matrices, protocol negotiation, graceful degradation warnings — it's all coming. But right now, in April 2026, the state of the art is a developer staring at their terminal at 2 AM, typing --version for the thing they should have checked twelve hours ago.

My agent stack is humming now. All 2,026 sessions are flowing. Codex and OpenClaw are best friends again. MCP connections are solid.

And I've added a cron job:

0 9 * * 1 codex --version | mail -s "codex version check" me@example.com

Because I will forget again.
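The cron line above mails me every Monday whether or not anything is wrong, which is exactly the kind of noise I'll learn to ignore. A quieter variant, sketched here with the comparison as a pure function: only produce output (and therefore only trigger mail) when the binary and npm metadata disagree. The subprocess wiring is left to comments.

```python
# Sketch of a quieter cron check: report only on drift.

def drift_report(binary_version: str, npm_version: str):
    """Return a warning line on drift, or None when versions agree."""
    if binary_version != npm_version:
        return f"version drift: binary={binary_version} npm={npm_version}"
    return None

# In the real script: binary_version comes from `codex --version`,
# npm_version from parsing `npm list -g @openai/codex --json`, and the
# script exits nonzero on drift so the mail only fires when it matters.
print(drift_report("0.50.0", "0.120.0"))
print(drift_report("0.120.0", "0.120.0"))
```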


Ryan builds AI agent infrastructure at DreamSiteBuilders.com. He can be found on GitHub shipping tools that solve problems he created for himself.
