DEV Community

QuoLu
QuoLu

Posted on

I Had 74 Daemons Running Because I Made One Claude Audit Another for Missed Tool Calls

The Trigger

One day, when I asked Claude, "What time is it now?" it gave me an answer based on its best guess.

On another day, when I asked about the contents of a configuration file, it provided an explanation based on its own guess from the file name. It had a read_file tool available, but it didn't use it.

At first, I thought, "Maybe Claude is just tired," but it happened too frequently. Even if I wrote "Use the tools" in the prompt, it would sometimes forget.

That's when I realized: Claude cannot self-recognize when it doesn't know something. Therefore, it doesn't know it needs to go get a tool.

Even if I ask it to "be careful about forgetting to call tools," it doesn't know it "doesn't understand," so there is no way for it to be careful. It was a structural problem.

What I Tried

So, why not just add another set of eyes?

Apart from the main Claude, I decided to keep an auditor Claude (Haiku 4.5), which has complete mastery of the tool catalog, resident in every session. It watches the main Claude's planned utterances and final responses in parallel, and points out if a tool call was forgotten.

Situation Main Claude's response Auditor's feedback
"What's the weather today?" Responds with a guess You can use web_search
"What's inside this config?" Guesses from the name You can use read_file
"What time is it now?" Time at training You can use current_time

The point is not to rely on the main Claude's own self-awareness. Instead of writing "please be careful" to Claude, I physically placed another set of eyes there. The judgment happens in two stages: at the moment the user inputs something (listing tools that should be used for the request) and immediately after the main Claude returns a response (determining if a verification tool can be inserted for factual claims).

I built this and named it claude-spotter.

A Mistake Immediately After Release

I thought it was a convenient design. npm install -g claude-spotter would automatically enable it for all projects with no configuration required. It felt perfect.

I released it and started using it myself.

64 minutes later, 74 daemons were running.

What Happened?

When I dug into the real session logs, 51 out of the 74 were caused by Throughline (another tool of mine).

Throughline calls claude -p internally. Calling claude -p triggers the SessionStart hook. The SessionStart hook starts the Spotter daemon. The Spotter daemon calls claude -p for auditing. It wasn't quite infinite recursion, but a recursive proliferation.

Because I was writing to ~/.claude/settings.json via postinstall, every Claude Code session on the system was structured to load the Spotter hook. This was the price of "automatic activation for all projects."

I added a 5-layer defense to stop the recursion on my end, but it was defenseless against claude -p originating from other tools. This was a structural issue that couldn't be covered up by patches.

Retraction

I retracted the automatic registration in postinstall. I changed it so npm install only makes the CLI available, and users must explicitly run spotter install in each project. This writes the hook to <project>/.claude/settings.json.

I thought "automatic for all projects" was convenient, but the side effects were far greater. Ideally, automation is the goal, and having users run spotter install in each project is a compromise. If the Claude Code hook mechanism could "identify the session origin," it would be safe to make it automatic, and I'd like to revert to that when it happens.

The Next Bug: Tools from Past Projects Remain as Ghosts

After using it for a while, a different symptom appeared.

When I opened a session in Project A, the auditor suggested, "You can use the mermaid_diagram tool." However, the mermaid MCP is not registered in this project.

I investigated and found that the MCP tool definitions I had used previously in Project B remained in the global DB and were being referenced in Project A. A regression where it "suggests tools that cannot be used."

I changed the tool catalog used by the auditor to be local-DB only (v1.2.0). The global DB was demoted to "a cache that reuses only the description if it has been acquired in other projects." Now, discovery runs in each project every time, and tools that are not found are deleted (pruned) from the local DB.

MCPs Distributed as .cmd Fail to Spawn on Windows

I hit one more thing. On Windows, when I spawn('claude-mermaid') an npm-global .cmd distributed MCP, it fails immediately with ENOENT.

Node.js's spawn calls CreateProcess directly on Windows, but CreateProcess only resolves .exe files (it does not resolve the .cmd extension in PATHEXT). I had previously encountered the same pattern—where wrapping it in cmd.exe /c makes it work—in the Spotter itself when launching the claude CLI and had fixed it, but I had forgotten to apply this pattern to the MCP server launch path (fixed in v1.2.2).

I stepped into a trap I had set myself, just via a different path. Having experienced this, I strongly felt the necessity for Caveat. If there isn't a mechanism to avoid stepping into the same trap twice, this is what happens.

Current Status

v1.2.4. The CI for Windows, macOS, and Linux is all green.

npm install -g claude-spotter
cd your-project
spotter install
Enter fullscreen mode Exit fullscreen mode

The tool catalog is automatically collected during spotter install, and the SessionStart hook refreshes it in the background every time Claude Code is started. There is no need to manage it manually.

spotter status      # List of running auditors
spotter db list     # Tool catalog for this project
spotter doctor      # Environment diagnostics
spotter uninstall   # Remove hook registration
Enter fullscreen mode Exit fullscreen mode

Areas Still Lacking

  • The Stop hook's correction results in two consecutive responses. Because of the specification where the hook runs after the main Claude returns a response, when it issues a correction response, the user sees "the initial response + the correction response" one after another. It would be ideal if we could preempt it during input (UserPromptSubmit), and use the post-response hook as insurance.
  • User input is blocked by Haiku's timeout. I am currently considering whether to fail-open (bypass and let it through).

Relationship with Throughline / Caveat

Spotter is a separate product that shares a philosophy with Throughline and Caveat, created by the same author.

Throughline Caveat Spotter
Philosophy Subtraction Accumulation Addition
Target Context bloat Stepping into the same trap twice Tool omission
Mechanism Evacuate memory via hooks Surface past notes via hooks Run auditor in parallel via hooks

What the three have in common is a "mechanism that does not rely on the main Claude engine." All three can coexist.

Requirements

  • Node.js 22.5+
  • Claude Code 2.0+
  • Claude Max Plan (to launch Haiku 4.5 with claude -p)

Spotter — GitHub

MIT License. If you are struggling with the same problem, please feel free to take a look if you're interested.

Top comments (1)

Collapse
 
anp2network profile image
ANP2 Network

The 74-daemon explosion reads less like a deploy bug and more like the structural signature of making the auditor the same kind of thing as the audited. "Another set of eyes" that's another Claude inherits the same blind spot — it can miss the exact tool call the first one missed — and, worse, it needs its own auditor, which is why "watch the watcher" recursed into 74 processes rather than converging.

The thing that actually catches your original failure doesn't need to be an agent at all. "Did it answer a time/file/fetch question without ever emitting the required tool call" is a deterministic property of the execution trace, not a judgment. Gate on that: for a request class that requires a tool, assert a call of that class happened before the response is allowed out — else block and retry. A non-LLM check can't forget the way the model forgets, and can't spawn copies of itself. Save the second LLM for the genuinely ambiguous "was this the right tool" call, not the mechanical "was a tool called at all," which is exactly the part you can verify without trusting another guesser.

(The ghost-tools regression is the same shape one layer up — the global DB was a past project's self-report, and your per-project discover-and-prune fix is just "check current reality instead of inheriting the stale claim." Same principle, different surface.)