<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: UB3DQY</title>
    <description>The latest articles on DEV Community by UB3DQY (@ub3dqy).</description>
    <link>https://dev.to/ub3dqy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875580%2F3dd40e53-66ce-4b1c-bdb9-bd0bdd6ca2f5.jpg</url>
      <title>DEV Community: UB3DQY</title>
      <link>https://dev.to/ub3dqy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ub3dqy"/>
    <language>en</language>
    <item>
      <title>I thought my AI memory hook was broken. It turned out to be Windows, WSL, uv, and one missing login</title>
      <dc:creator>UB3DQY</dc:creator>
      <pubDate>Mon, 13 Apr 2026 23:29:37 +0000</pubDate>
      <link>https://dev.to/ub3dqy/i-thought-my-ai-memory-hook-was-broken-it-turned-out-to-be-windows-wsl-uv-and-one-missing-login-a6</link>
      <guid>https://dev.to/ub3dqy/i-thought-my-ai-memory-hook-was-broken-it-turned-out-to-be-windows-wsl-uv-and-one-missing-login-a6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Part of the series &lt;strong&gt;Debugging Claude Agent SDK pipelines&lt;/strong&gt;. One of the layers I'll mention near the end — hidden account-level Gmail / Calendar MCP integrations blocking my subprocesses — deserved its own write-up: &lt;a href="https://dev.to/ub3dqy/hidden-gmail-and-calendar-integrations-quietly-broke-my-claude-sdk-pipeline-18ei"&gt;Hidden Gmail and Calendar integrations quietly broke my Claude SDK pipeline&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I noticed something weird in Codex.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;UserPromptSubmit&lt;/code&gt; kept saying &lt;code&gt;completed&lt;/code&gt;, but &lt;code&gt;Stop&lt;/code&gt; kept saying &lt;code&gt;failed&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you're building a memory tool, that's a bad combination. It means the assistant can still &lt;strong&gt;read&lt;/strong&gt; old context, but it may be failing to &lt;strong&gt;write&lt;/strong&gt; new context back into long-term memory. In other words: it looks smart in the moment, but its memory may be quietly falling apart behind the scenes.&lt;/p&gt;

&lt;p&gt;I assumed this would be a small hook bug.&lt;/p&gt;

&lt;p&gt;It wasn't.&lt;/p&gt;

&lt;p&gt;It turned into one of those debugging sessions where every layer was technically "working" and the system as a whole still wasn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I was building
&lt;/h2&gt;

&lt;p&gt;I'm working on a markdown-first memory system for Claude Code and Codex. The shape is simple enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when a session ends, a hook grabs the transcript&lt;/li&gt;
&lt;li&gt;a background script decides whether the conversation is worth saving&lt;/li&gt;
&lt;li&gt;if it is, it writes a distilled note into a daily log&lt;/li&gt;
&lt;li&gt;later, that gets compiled into wiki pages and injected back into future sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The part that mattered here was the Codex &lt;code&gt;Stop&lt;/code&gt; hook. That's the capture path. If that hook fails, new memory may never make it into the wiki.&lt;/p&gt;

&lt;p&gt;So when I saw &lt;code&gt;Stop failed&lt;/code&gt; in the UI over and over again, I treated it as a real product problem, not a cosmetic one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first bug was real
&lt;/h2&gt;

&lt;p&gt;The first issue was exactly where I expected it to be: in the hook.&lt;/p&gt;

&lt;p&gt;The parser only understood one transcript shape. Codex was emitting another one. The hook would fire, look at the transcript, fail to extract meaningful context, and then skip capture.&lt;/p&gt;

&lt;p&gt;That part was straightforward to fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;teach the parser the real Codex transcript shape&lt;/li&gt;
&lt;li&gt;add a fallback for when &lt;code&gt;transcript_path&lt;/code&gt; is missing&lt;/li&gt;
&lt;li&gt;stop using the old turn-count gate and switch to a content-based threshold&lt;/li&gt;
&lt;li&gt;raise the timeout so the hook had room to finish&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that, things got better. The logs stopped saying &lt;code&gt;SKIP: empty context&lt;/code&gt;, and I started seeing the line I wanted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spawned flush.py for session ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At that point I thought I was done.&lt;/p&gt;

&lt;p&gt;I was not done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The second bug was weirder
&lt;/h2&gt;

&lt;p&gt;Now the hook was successfully spawning the downstream capture process, which should have been a win.&lt;/p&gt;

&lt;p&gt;And then it still ended with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BrokenPipeError: [Errno 32] Broken pipe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was one of those bugs that is annoying precisely because it happens &lt;strong&gt;after&lt;/strong&gt; the important part.&lt;/p&gt;

&lt;p&gt;The capture process had already started. Memory might already be on its way to being saved. But the hook still looked failed in the Codex UI, which meant I couldn't trust the system yet.&lt;/p&gt;

&lt;p&gt;The cause turned out to be simple in hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the hook took longer than the local timeout&lt;/li&gt;
&lt;li&gt;Codex closed stdout&lt;/li&gt;
&lt;li&gt;the hook tried to print its final success JSON into a pipe that no longer existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes, the system was partly working. It just wasn't finishing cleanly.&lt;/p&gt;

&lt;p&gt;That fix was also small: protect the final stdout write against a closed pipe.&lt;/p&gt;

&lt;p&gt;At this point I had fixed the parser bug and the broken pipe. Surely &lt;em&gt;now&lt;/em&gt; the memory pipeline would work.&lt;/p&gt;

&lt;p&gt;Still no.&lt;/p&gt;

&lt;h2&gt;
  
  
  The third bug wasn't in the hook at all
&lt;/h2&gt;

&lt;p&gt;Once I got past the broken pipe, the downstream process itself started failing.&lt;/p&gt;

&lt;p&gt;The hook would spawn &lt;code&gt;flush.py&lt;/code&gt;, and then &lt;code&gt;flush.py&lt;/code&gt; would die with the deeply unhelpful classic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Command failed with exit code 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No useful stderr. No obvious explanation. Just enough information to waste an afternoon.&lt;/p&gt;

&lt;p&gt;This is the moment where I finally stopped assuming I was debugging "the hook" and started treating the whole thing like what it really was: a chain of separate runtimes.&lt;/p&gt;

&lt;p&gt;Because that's what it was.&lt;/p&gt;

&lt;p&gt;Not one program. A chain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Codex UI&lt;/li&gt;
&lt;li&gt;hook runner&lt;/li&gt;
&lt;li&gt;Python process&lt;/li&gt;
&lt;li&gt;subprocess launcher&lt;/li&gt;
&lt;li&gt;WSL boundary&lt;/li&gt;
&lt;li&gt;bundled Claude CLI&lt;/li&gt;
&lt;li&gt;local authentication state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each link could fail for a different reason.&lt;/p&gt;

&lt;p&gt;And that is exactly what had happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  One of the real bugs was hiding in the boundary
&lt;/h2&gt;

&lt;p&gt;At that stage, the most immediate root cause of the &lt;code&gt;exit code 1&lt;/code&gt; failure wasn't inside my Python code at all.&lt;/p&gt;

&lt;p&gt;The Claude CLI inside the WSL runtime wasn't authenticated.&lt;/p&gt;

&lt;p&gt;That was the immediate bug in this part of the investigation. There was still another layer around hidden account-level MCP integrations, but that turned into its own separate story. I wrote that one up separately here: &lt;a href="https://dev.to/ub3dqy/hidden-gmail-and-calendar-integrations-quietly-broke-my-claude-sdk-pipeline-18ei"&gt;Hidden Gmail and Calendar integrations quietly broke my Claude SDK pipeline&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I had already authenticated Claude on the Windows side. Claude Code on Windows was fine. But Codex hooks were spawning work inside WSL, and that runtime had its own separate &lt;code&gt;~/.claude&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;So from one side of the system, Claude was logged in.&lt;/p&gt;

&lt;p&gt;From the other side, it wasn't.&lt;/p&gt;

&lt;p&gt;And because the failure was happening in a subprocess several layers down, what bubbled back up was just a generic process failure.&lt;/p&gt;

&lt;p&gt;That was the moment the whole debugging session clicked for me:&lt;/p&gt;

&lt;p&gt;I wasn't dealing with a broken feature. I was dealing with a system that crossed &lt;strong&gt;OS boundaries&lt;/strong&gt;, &lt;strong&gt;process boundaries&lt;/strong&gt;, and &lt;strong&gt;auth boundaries&lt;/strong&gt;, and I was still mentally treating it like one runtime.&lt;/p&gt;

&lt;p&gt;It wasn't one runtime.&lt;/p&gt;

&lt;p&gt;It was several. They just happened to be glued together tightly enough to look like one.&lt;/p&gt;

&lt;h2&gt;
  
  
  And then Windows joined in
&lt;/h2&gt;

&lt;p&gt;While I was cleaning that up, Claude Code on Windows started throwing a completely different kind of error on every hook run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: failed to remove file `.venv\\lib64`: Access is denied. (os error 5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That turned out to be another boundary problem.&lt;/p&gt;

&lt;p&gt;I had Windows-side and WSL-side tooling both touching the same project environment. &lt;code&gt;uv&lt;/code&gt; was trying to be helpful. Windows was trying to be Windows. A POSIX-style &lt;code&gt;lib64&lt;/code&gt; symlink was involved. None of this was improving my mood.&lt;/p&gt;

&lt;p&gt;So now I had two parallel truths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the Codex capture path was broken because WSL auth was missing&lt;/li&gt;
&lt;li&gt;the Claude-side hook launcher was unstable because the shared &lt;code&gt;.venv&lt;/code&gt; state was getting churned across environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both bugs were real.&lt;br&gt;
Neither bug lived in the same place.&lt;br&gt;
Both looked, from the outside, like "the AI tool is flaky again."&lt;/p&gt;
&lt;h2&gt;
  
  
  The part that actually mattered
&lt;/h2&gt;

&lt;p&gt;Here's the practical lesson I walked away with:&lt;/p&gt;

&lt;p&gt;When a system crosses runtime boundaries, the bug is often not in the place where the symptom shows up.&lt;/p&gt;

&lt;p&gt;The symptom showed up as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stop failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual causes were spread across multiple layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;transcript shape mismatch&lt;/li&gt;
&lt;li&gt;timeout mismatch&lt;/li&gt;
&lt;li&gt;unprotected stdout write&lt;/li&gt;
&lt;li&gt;missing WSL-side Claude auth&lt;/li&gt;
&lt;li&gt;shared Windows/WSL environment churn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I had kept looking only at the final error message, I would have kept "fixing" the wrong layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I changed
&lt;/h2&gt;

&lt;p&gt;I didn't rewrite the system. I just stopped letting it be vague.&lt;/p&gt;

&lt;p&gt;I added enough visibility so each boundary could tell me when it was the one failing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;transcript parsing now understands real Codex output&lt;/li&gt;
&lt;li&gt;the stop hook has a fallback when transcript data is missing&lt;/li&gt;
&lt;li&gt;success output no longer crashes on a closed pipe&lt;/li&gt;
&lt;li&gt;the flush path logs more useful process diagnostics&lt;/li&gt;
&lt;li&gt;I verified the actual runtime where the subprocess was running, instead of assuming it matched the one I was sitting in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And maybe the most boring but important fix of all:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I authenticated Claude in the runtime that was actually doing the work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not "somewhere on the machine."&lt;br&gt;
Not "the CLI works for me."&lt;br&gt;
The exact runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson I'll keep
&lt;/h2&gt;

&lt;p&gt;The real lesson wasn't "add more logging."&lt;/p&gt;

&lt;p&gt;It was this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;if your tool crosses process boundaries, OS boundaries, and auth boundaries, you do not have one runtime anymore.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You have a chain of semi-independent runtimes, and each one can fail in its own extremely specific way.&lt;/p&gt;

&lt;p&gt;That sounds obvious when written down. It was much less obvious when I was staring at one repeated &lt;code&gt;Stop failed&lt;/code&gt; message and assuming there had to be one neat root cause behind it.&lt;/p&gt;

&lt;p&gt;There wasn't.&lt;/p&gt;

&lt;p&gt;There were several small, ordinary failures, all stacked on top of each other. That's what made the bug feel slippery.&lt;/p&gt;

&lt;p&gt;And honestly, that is what a lot of debugging looks like in real life. Not one dramatic mistake. Just three or four boring mismatches, each living at a different seam, and all of them combining into one system that feels unreliable.&lt;/p&gt;

&lt;p&gt;Sometimes the hardest part is realizing that one ugly symptom actually belongs to more than one story.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you're building something similar
&lt;/h2&gt;

&lt;p&gt;Don't just test whether the hook runs.&lt;/p&gt;

&lt;p&gt;Test whether:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it runs in the runtime you think it runs in&lt;/li&gt;
&lt;li&gt;it has the credentials you think it has&lt;/li&gt;
&lt;li&gt;it can finish within the timeout you actually configured&lt;/li&gt;
&lt;li&gt;and the final side effect really happens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because "the script executed" is not the same thing as "the system worked."&lt;/p&gt;

&lt;p&gt;And if your tool is supposed to remember things for you, that difference matters a lot.&lt;/p&gt;




&lt;p&gt;I'm building this as part of &lt;a href="https://github.com/ub3dqy/llm-wiki" rel="noopener noreferrer"&gt;llm-wiki&lt;/a&gt;, a markdown-first memory layer for Claude Code and Codex. The part I underestimated wasn't the prompt design or the summarization logic. It was the plumbing around the boundaries.&lt;/p&gt;

&lt;p&gt;Which, in hindsight, is exactly where these systems like to break.&lt;/p&gt;

</description>
      <category>debugging</category>
      <category>python</category>
      <category>ai</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Hidden Gmail and Calendar integrations quietly broke my Claude SDK pipeline</title>
      <dc:creator>UB3DQY</dc:creator>
      <pubDate>Mon, 13 Apr 2026 16:49:42 +0000</pubDate>
      <link>https://dev.to/ub3dqy/hidden-gmail-and-calendar-integrations-quietly-broke-my-claude-sdk-pipeline-18ei</link>
      <guid>https://dev.to/ub3dqy/hidden-gmail-and-calendar-integrations-quietly-broke-my-claude-sdk-pipeline-18ei</guid>
      <description>&lt;p&gt;I lost a few hours to one of those bugs that feels fake when you first describe it out loud.&lt;/p&gt;

&lt;p&gt;My Claude Agent SDK pipeline kept failing with the most generic error possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Command failed with exit code 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No useful stderr. No clear traceback. No obvious repro beyond "real sessions fail, synthetic tests sometimes pass."&lt;/p&gt;

&lt;p&gt;At first I thought it was just another boring auth problem. Then I thought it was a subprocess visibility problem. Then I thought it was WSL. All of those were plausible. None of them were the whole story.&lt;/p&gt;

&lt;p&gt;The actual cause was stranger:&lt;/p&gt;

&lt;p&gt;after &lt;code&gt;claude auth login&lt;/code&gt;, my account quietly picked up &lt;strong&gt;account-level Gmail and Google Calendar MCP integrations&lt;/strong&gt; that I had never explicitly enabled, could not see in the Claude web UI, and could not remove from the CLI. Those integrations wanted an interactive Google OAuth flow, and that was enough to break every non-interactive Claude SDK subprocess I was using for automation.&lt;/p&gt;

&lt;p&gt;The workaround was one CLI flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;extra_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict-mcp-config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the end of the bug.&lt;/p&gt;

&lt;p&gt;Finding it was the hard part.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this showed up
&lt;/h2&gt;

&lt;p&gt;I use Claude Agent SDK inside a small memory pipeline.&lt;/p&gt;

&lt;p&gt;The shape is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a hook fires at the end of a Codex session&lt;/li&gt;
&lt;li&gt;the hook spawns &lt;code&gt;flush.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flush.py&lt;/code&gt; calls Claude Agent SDK&lt;/li&gt;
&lt;li&gt;the result gets written into a daily markdown log&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of automation that should be boring once it's set up.&lt;/p&gt;

&lt;p&gt;Instead, real Codex sessions started failing. The hook would fire, the downstream script would start, and then the Agent SDK step would die with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fatal error in message reader: Command failed with exit code 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was enough to break the memory pipeline completely. No flush, no daily log entry, no durable memory from those sessions.&lt;/p&gt;

&lt;p&gt;And because the error was happening in a subprocess layer, the surface signal was awful. It just looked like "the SDK sometimes fails."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this was so annoying to debug
&lt;/h2&gt;

&lt;p&gt;There were at least three reasons this bug wasted my time.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The first fix appeared to work
&lt;/h3&gt;

&lt;p&gt;At one point I thought I had solved it just by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And for a moment, it looked like I had.&lt;/p&gt;

&lt;p&gt;Synthetic tests passed. The bundled Claude binary responded. The pipeline looked alive again.&lt;/p&gt;

&lt;p&gt;That "fix" turned out to be fake.&lt;/p&gt;

&lt;p&gt;It worked briefly because the environment had not yet fully populated the auth-related cache state that was about to cause the real failure.&lt;/p&gt;

&lt;p&gt;So I got the worst possible debugging gift: a temporary false victory.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The important error wasn't where I was looking
&lt;/h3&gt;

&lt;p&gt;I had already added stderr visibility around the SDK call, expecting to catch the real CLI failure there.&lt;/p&gt;

&lt;p&gt;That didn't help much, because the Google MCP auth path wasn't producing a nice actionable stderr message.&lt;/p&gt;

&lt;p&gt;The auth prompt was effectively happening on the wrong channel for my diagnostic setup. The SDK stderr callback got nothing useful. &lt;code&gt;ProcessError.stderr&lt;/code&gt; was empty. All I had was the outer shell of the failure.&lt;/p&gt;

&lt;p&gt;From the outside, it still looked like "exit code 1, good luck."&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The integrations were invisible
&lt;/h3&gt;

&lt;p&gt;This was the part that really crossed from "normal debugging" into "what exactly is this system doing?"&lt;/p&gt;

&lt;p&gt;In Claude.ai web settings, I could see the usual visible things. But I could not see any Gmail or Google Calendar integrations anywhere I would expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;not in &lt;strong&gt;Settings → Connectors&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;not in &lt;strong&gt;Settings → Customize → Skills&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;not in &lt;strong&gt;Customize → Connectors&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet on disk, after &lt;code&gt;claude auth login&lt;/code&gt;, I could see this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"claude.ai Gmail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776092446619&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"claude.ai Google Calendar"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776092446683&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That came from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/mcp-needs-auth-cache.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the CLI confirmed the same story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude.ai Gmail:           https://gmail.mcp.claude.com/mcp - ! Needs authentication
claude.ai Google Calendar: https://gcal.mcp.claude.com/mcp - ! Needs authentication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At that point the bug finally started making sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was actually happening
&lt;/h2&gt;

&lt;p&gt;Here is the version I wish someone had handed me before I started digging:&lt;/p&gt;

&lt;p&gt;When you authenticate Claude via OAuth, your account may receive &lt;strong&gt;account-level MCP integrations&lt;/strong&gt; as part of its backend claims.&lt;/p&gt;

&lt;p&gt;In my case, that included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;claude.ai Gmail&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;claude.ai Google Calendar&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those integrations were not local project config.&lt;br&gt;
They were not something I had added manually in the repo.&lt;br&gt;
And they were not removable with normal local MCP commands.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude mcp remove "claude.ai Gmail"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No MCP server found with name: "claude.ai Gmail"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which makes sense once you realize the CLI isn't managing them as local entries. They are attached at the account/backend layer.&lt;/p&gt;

&lt;p&gt;Then the next failure follows naturally:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;bundled Claude CLI starts inside a non-interactive SDK subprocess&lt;/li&gt;
&lt;li&gt;it checks account-level MCP claims&lt;/li&gt;
&lt;li&gt;it sees Gmail and Calendar need additional Google auth&lt;/li&gt;
&lt;li&gt;it tries to initiate an interactive OAuth flow&lt;/li&gt;
&lt;li&gt;there is no TTY / browser / normal user interaction path&lt;/li&gt;
&lt;li&gt;subprocess stalls or exits with code 1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That was the whole bug.&lt;/p&gt;

&lt;p&gt;Not my prompt.&lt;br&gt;
Not my retry logic.&lt;br&gt;
Not the summary logic.&lt;br&gt;
Not even the main auth state in the obvious sense.&lt;/p&gt;

&lt;p&gt;Just hidden MCP integrations pulling a subprocess into an auth flow it had no way to complete.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why this matters beyond my repo
&lt;/h2&gt;

&lt;p&gt;This is not really about my particular memory pipeline.&lt;/p&gt;

&lt;p&gt;It matters because subprocess-based Claude SDK automation is a completely normal pattern. People use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capture pipelines&lt;/li&gt;
&lt;li&gt;background summarizers&lt;/li&gt;
&lt;li&gt;hook-triggered analysis&lt;/li&gt;
&lt;li&gt;scheduled jobs&lt;/li&gt;
&lt;li&gt;internal tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of those run in contexts where interactive browser auth is either awkward or impossible.&lt;/p&gt;

&lt;p&gt;If account-level integrations that require fresh OAuth can quietly attach themselves after login, and if they are invisible in the UI, then the failure mode becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;everything looks fine in your normal interactive CLI&lt;/li&gt;
&lt;li&gt;your automation suddenly fails in the background&lt;/li&gt;
&lt;li&gt;the error surface is generic&lt;/li&gt;
&lt;li&gt;and the root cause is not where you would reasonably look first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a nasty class of bug.&lt;/p&gt;
&lt;h2&gt;
  
  
  The workaround that fixed it
&lt;/h2&gt;

&lt;p&gt;The workaround was to isolate the subprocess from account-level MCP discovery.&lt;/p&gt;

&lt;p&gt;In practice, that meant passing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;ClaudeAgentOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;allowed_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict-mcp-config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That translates to the bundled Claude binary getting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--strict-mcp-config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And once I did that, the subprocess stopped trying to discover or auth those account-level MCP integrations.&lt;/p&gt;

&lt;p&gt;The pipeline started running cleanly again.&lt;/p&gt;

&lt;p&gt;That was the actual fix.&lt;/p&gt;

&lt;p&gt;Deleting the cache file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; ~/.claude/mcp-needs-auth-cache.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;did not really fix anything. It just bought a little time until the state was regenerated.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I still don't love
&lt;/h2&gt;

&lt;p&gt;The workaround is fine. I'm happy to use it in automation.&lt;/p&gt;

&lt;p&gt;What I don't love is the product behavior that made it necessary.&lt;/p&gt;

&lt;p&gt;From a developer point of view, a few things feel wrong here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hidden integrations are a bad default
&lt;/h3&gt;

&lt;p&gt;If my account has Gmail and Calendar MCP integrations attached, I should be able to see them in the UI and turn them off.&lt;/p&gt;

&lt;p&gt;Right now, from the outside, it feels like they exist in a shadow layer of account state that can affect automation without being visible where a user would normally manage integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Opt-out without visible opt-in is rough
&lt;/h3&gt;

&lt;p&gt;I didn't explicitly wire Gmail or Calendar into this project. Yet they still ended up influencing subprocess behavior after OAuth login.&lt;/p&gt;

&lt;p&gt;That is a surprising default for anyone using Claude as a programmable toolchain component instead of just a chat app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-interactive contexts should fail more gracefully
&lt;/h3&gt;

&lt;p&gt;If the process is clearly non-interactive, the CLI should not wander into a user-hostile auth path and then collapse into a generic exit code.&lt;/p&gt;

&lt;p&gt;At minimum, I would want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a clear message saying an MCP integration requires interactive authentication&lt;/li&gt;
&lt;li&gt;the name of the integration&lt;/li&gt;
&lt;li&gt;and ideally a way to skip it automatically in SDK subprocess mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I wish the docs said
&lt;/h2&gt;

&lt;p&gt;The one warning I really needed was something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you use Claude Agent SDK in non-interactive subprocesses, account-level MCP integrations attached through OAuth may trigger additional authentication flows. If those integrations are not fully authenticated, subprocess calls may fail. Use strict MCP config isolation for automation workloads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single paragraph would have saved me hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;If you hit a mysterious Claude Agent SDK subprocess failure with a generic &lt;code&gt;exit code 1&lt;/code&gt;, and your interactive CLI mostly works, check whether hidden account-level MCP integrations are involved before you start rewriting your own code.&lt;/p&gt;

&lt;p&gt;In my case, the real sequence was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I thought the pipeline was unauthenticated&lt;/li&gt;
&lt;li&gt;then I thought stderr visibility was missing&lt;/li&gt;
&lt;li&gt;then I thought WSL was the root cause&lt;/li&gt;
&lt;li&gt;and the real issue turned out to be hidden Gmail and Calendar MCP integrations trying to force Google OAuth inside a non-interactive subprocess&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix was not glamorous:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;extra_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict-mcp-config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But at least now I know what class of failure I was dealing with.&lt;/p&gt;

&lt;p&gt;And honestly, that's often the hardest part.&lt;/p&gt;




&lt;p&gt;I'm building this as part of a markdown-first memory system for Claude Code and Codex. I keep expecting the hard parts to be prompt design or summarization quality. More often than not, the real work is figuring out which invisible layer is making the obvious layer look broken.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>debugging</category>
      <category>claude</category>
    </item>
    <item>
      <title>How I shipped a broken capture pipeline and didn't notice for 3 days</title>
      <dc:creator>UB3DQY</dc:creator>
      <pubDate>Sun, 12 Apr 2026 23:00:36 +0000</pubDate>
      <link>https://dev.to/ub3dqy/how-i-shipped-a-broken-capture-pipeline-and-didnt-notice-for-3-days-4b1h</link>
      <guid>https://dev.to/ub3dqy/how-i-shipped-a-broken-capture-pipeline-and-didnt-notice-for-3-days-4b1h</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I built a hook-based capture system for Claude Code. Every session-end was supposed to get summarized and written into a daily log. My &lt;code&gt;doctor.py&lt;/code&gt; gate said &lt;code&gt;13/13 PASS&lt;/code&gt;. Lint was clean. CI was green on every commit.&lt;/p&gt;

&lt;p&gt;Then a user asked a simple question: &lt;em&gt;"Is the wiki actually capturing this conversation?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I checked the log. &lt;strong&gt;57% of my recent sessions had been silently dropped&lt;/strong&gt; for three days. The gate never told me. Every smoke test was passing. The system was broken in the one place no test was actually looking.&lt;/p&gt;

&lt;p&gt;This is what happened, how I caught it, and what I changed so I would not miss the same kind of bug again.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;The project is a memory system for Claude Code and Codex CLI. A session-end hook reads the transcript, hands it off to a background Python script, that script asks the Agent SDK whether the conversation is worth saving, and the result gets appended to &lt;code&gt;daily/YYYY-MM-DD.md&lt;/code&gt;. Fairly normal hook plumbing.&lt;/p&gt;

&lt;p&gt;I had a &lt;code&gt;doctor.py&lt;/code&gt; script with 13 smoke checks across the pipeline. &lt;code&gt;session-start.py&lt;/code&gt; produced valid JSON. &lt;code&gt;user-prompt-wiki.py&lt;/code&gt; could look up articles. &lt;code&gt;stop.py&lt;/code&gt; exited cleanly. I had structural lint. I had a green CI gate on every push.&lt;/p&gt;

&lt;p&gt;I had shipped eight commits over two days, each with &lt;code&gt;doctor --quick&lt;/code&gt; green, each with CI passing. I was telling myself the system was in good shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The moment of doubt
&lt;/h2&gt;

&lt;p&gt;Someone I was working with asked a very simple question: &lt;em&gt;"Just to confirm, is the wiki actually storing this conversation?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I almost said yes immediately. The hooks were wired. Every prompt I sent was coming back with wiki snippets injected at the top. &lt;code&gt;UserPromptSubmit&lt;/code&gt; was clearly doing its job. From the outside, the system looked alive.&lt;/p&gt;

&lt;p&gt;But I have been burned by "it looks alive" enough times that I checked instead of trusting the feeling. I opened &lt;code&gt;scripts/flush.log&lt;/code&gt;, the file where the session-end and pre-compact hooks write their operational log, and scrolled to the recent entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;2026-04-12 16:36:39 INFO [session-end] SessionEnd fired: session=...
2026-04-12 16:36:39 INFO [session-end] SKIP: only 2 turns (min 4)
2026-04-12 16:39:27 INFO [session-end] SessionEnd fired: session=...
2026-04-12 16:39:27 INFO [session-end] SKIP: only 2 turns (min 4)
2026-04-12 16:42:07 INFO [session-end] SessionEnd fired: session=...
2026-04-12 16:42:07 INFO [session-end] SKIP: only 2 turns (min 4)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the moment my confidence disappeared.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I was looking at
&lt;/h2&gt;

&lt;p&gt;The hooks &lt;strong&gt;were firing&lt;/strong&gt;. &lt;code&gt;SessionEnd fired&lt;/code&gt; is printed before any filtering happens, so those lines meant the hook chain from Claude Code to my Python script was intact. The wiring was not the problem.&lt;/p&gt;

&lt;p&gt;But then immediately after, on every single recent entry: &lt;code&gt;SKIP: only 2 turns (min 4)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My session-end code had this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MIN_TURNS_TO_FLUSH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;

&lt;span class="c1"&gt;# ... later ...
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;MIN_TURNS_TO_FLUSH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKIP: only %d turns (min %d)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MIN_TURNS_TO_FLUSH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was supposed to protect against flushing trivial sessions, the "one question, one answer, exit" pattern that is probably not worth archiving. The threshold &lt;code&gt;4&lt;/code&gt; felt reasonable when I wrote it. It felt reasonable when I reviewed it. It passed every test.&lt;/p&gt;

&lt;p&gt;What I had not really internalized was the shape of my own usage. A typical Claude Code session for me is: open terminal, ask one specific question, get one specific answer, close terminal. That is &lt;strong&gt;exactly 2 turns&lt;/strong&gt;. The rule I wrote to skip "trivial" sessions was skipping &lt;strong&gt;my normal session shape&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I ran the numbers over the full log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SessionEnd fired:       109
Spawned flush.py:        52  (48%)
Skipped (various):       57  (52%)
Most recent skip reason: "SKIP: only 2 turns (min 4)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Over half the sessions from the last three days had been silently dropped.&lt;/strong&gt; Not edge cases. Not weird corner traffic. Just normal usage. The daily log for those days had looked thinner than it should have been, and I had noticed that in the background, but never chased it because &lt;code&gt;doctor --quick&lt;/code&gt; was green and I trusted it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the gate didn't catch it
&lt;/h2&gt;

&lt;p&gt;The actual bug was trivial. Change a number. That part took no time. The question that mattered was: why did my gate tell me everything was fine while half the data was disappearing?&lt;/p&gt;

&lt;p&gt;Let me walk through what &lt;code&gt;doctor --full&lt;/code&gt; actually tested:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;check_session_start_smoke&lt;/code&gt;&lt;/strong&gt; — runs &lt;code&gt;session-start.py&lt;/code&gt; with an empty JSON input, verifies it prints a valid hook-output JSON. ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;check_user_prompt_smoke&lt;/code&gt;&lt;/strong&gt; — runs &lt;code&gt;user-prompt-wiki.py&lt;/code&gt;, verifies it returns additionalContext with articles. ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;check_stop_smoke&lt;/code&gt;&lt;/strong&gt; — runs &lt;code&gt;stop.py&lt;/code&gt;, verifies it exits cleanly on empty stdin. ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;check_index_freshness&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;check_structural_lint&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;check_env_settings&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;check_path_normalization&lt;/code&gt;&lt;/strong&gt; — the rest of the usual health-check surface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice what those tests are really asking. Each one asks: &lt;em&gt;"Does this script run without crashing?"&lt;/em&gt; That is a useful question. It catches real bugs: &lt;code&gt;ImportError&lt;/code&gt; after a refactor, &lt;code&gt;JSONDecodeError&lt;/code&gt; from bad stdin, &lt;code&gt;FileNotFoundError&lt;/code&gt; after a rename. But it is not the question I actually cared about: &lt;em&gt;"Does a real transcript, processed by this chain, end up in the daily log?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That question has three subtly different parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Does the hook fire when Claude Code ends a session?&lt;/strong&gt; (Yes — I could see it in the log.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does the hook's filter logic produce a "worth-saving" verdict for realistic input?&lt;/strong&gt; (Turns out: no, because of the bug above.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does the downstream chain actually write the result to the daily log?&lt;/strong&gt; (Unknown, because step 2 always said no.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;doctor --full&lt;/code&gt; tested a weak version of (1) by running the script with an empty payload. It did not test (2), because that needs a realistic transcript. It did not test (3), because the chain never got that far. Every link passed in isolation, and the chain as a whole was still broken.&lt;/p&gt;

&lt;p&gt;This is the old difference between &lt;strong&gt;smoke tests&lt;/strong&gt; and &lt;strong&gt;end-to-end tests&lt;/strong&gt;. In theory everybody knows it. In a personal tool, it is easy to get lazy about it. You know what the chain is supposed to do, so testing the whole thing can feel redundant. It is not redundant at all. The chain breaks in exactly the places where each individual component still passes its own tiny check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two things I added to stop this happening again
&lt;/h2&gt;

&lt;p&gt;The code fix itself was boring: replace the turn-based threshold with a content-based one. Short but substantial sessions, two turns and a couple thousand characters of real discussion, now get captured. Tiny sessions, two turns and thirty characters of "ok thanks", still get skipped, but by character count instead of turn count. That is not really the point of the post.&lt;/p&gt;

&lt;p&gt;The interesting part is what I added to &lt;code&gt;doctor.py&lt;/code&gt; afterward, because that is what turns this from a one-off fix into something the project can actually defend itself with.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Observability check: &lt;code&gt;check_flush_capture_health&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This one reads &lt;code&gt;scripts/flush.log&lt;/code&gt; over a rolling 7-day window and summarizes what the capture pipeline has actually been doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_flush_capture_health&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CheckResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# parse flush.log, count SessionEnd fired vs Spawned flush.py
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;detail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Last 7d: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;spawned&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_fired&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; flushes spawned (skip rate &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;skip_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;spawned&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CheckResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flush_capture_health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Pipeline appears broken: SessionEnds fired but nothing was spawned.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;skip_rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CheckResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flush_capture_health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [attention: high skip rate — consider lowering WIKI_MIN_FLUSH_CHARS]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CheckResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flush_capture_health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important design choice: &lt;strong&gt;this check only FAILs when the pipeline is observably broken&lt;/strong&gt;. If SessionEnds fired but nothing was spawned, that is a correctness bug. It does &lt;strong&gt;not&lt;/strong&gt; FAIL on high skip rate, because skip rate is historical data about past usage, not necessarily a problem with the current code. A fresh clone has no history and should pass. A repo with lots of short sessions may have a high skip rate and still be behaving correctly. Blocking the merge gate on historical observability would be a mistake.&lt;/p&gt;

&lt;p&gt;The check prints an &lt;code&gt;[attention]&lt;/code&gt; marker in the detail line when the skip rate goes above 50%. On the first run in my own repo after I added it, it printed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[PASS] flush_capture_health: Last 7d: 50/121 flushes spawned (skip rate 59%)
       [attention: high skip rate — consider lowering WIKI_MIN_FLUSH_CHARS]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one line would have saved me three days.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. End-to-end acceptance test: &lt;code&gt;check_flush_roundtrip&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the answer to the "why didn't any test catch this?" question. It only runs in &lt;code&gt;doctor --full&lt;/code&gt;, because it is more expensive than the fast smoke checks.&lt;/p&gt;

&lt;p&gt;The test writes a dummy 6-turn transcript, about 2000 characters of realistic content, to a temp file. Then it invokes &lt;code&gt;hooks/session-end.py&lt;/code&gt; as a real subprocess with a realistic hook-input JSON on stdin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;test_session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doctor-roundtrip-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;transcript_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SCRIPTS_DIR&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doctor-transcript-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;test_session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# ... write dummy turns ...
&lt;/span&gt;
&lt;span class="n"&gt;hook_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;test_session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doctor-roundtrip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cwd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROOT_DIR&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WIKI_FLUSH_TEST_MODE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_end_script&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hook_input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;WIKI_FLUSH_TEST_MODE=1&lt;/code&gt; environment variable. That is the trick. The downstream script, &lt;code&gt;flush.py&lt;/code&gt;, checks it at startup and, if it is set, skips the real Agent SDK call and writes a marker file to a known location instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In flush.py
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WIKI_FLUSH_TEST_MODE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;TEST_MARKER_FILE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FLUSH_TEST_OK session=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ts=...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test then polls for that marker file with a 15-second timeout, verifies it contains the right session ID, and cleans up. If any link in the chain is broken — if &lt;code&gt;session-end.py&lt;/code&gt; does not spawn &lt;code&gt;flush.py&lt;/code&gt;, if &lt;code&gt;flush.py&lt;/code&gt; fails to import, if the environment is not inherited correctly — the marker never appears and the test fails with a clear message.&lt;/p&gt;

&lt;p&gt;This is an actual end-to-end test, not a smoke test. It exercises the real subprocess invocation, real environment inheritance, real stdin/stdout piping, and real timing. The only thing it fakes is the API call itself, because that would cost money and pollute the real daily log.&lt;/p&gt;

&lt;p&gt;On my machine right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[PASS] flush_roundtrip: session-end -&amp;gt; flush.py chain completed in test mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I had had this test two weeks earlier, I would have caught the &lt;code&gt;MIN_TURNS = 4&lt;/code&gt; bug on the first realistic transcript. It would not have needed to be clever. A visible skip where a spawn was expected would have been enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lessons, short enough to remember
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Smoke tests are not end-to-end tests, and they do not substitute for one.&lt;/strong&gt; I had nine smoke checks in &lt;code&gt;doctor.py&lt;/code&gt;, and all of them were correct in isolation. None of them ran the actual production chain from a realistic input to a verifiable output. If you have a multi-process pipeline, you need at least one test that exercises the whole thing. It does not have to be fast and it does not have to run on every commit. It just has to exist somewhere meaningful in your gate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Observability is a design choice, not an afterthought.&lt;/strong&gt; My hooks were writing perfectly good operational logs. I just was not reading them, and my gate was not reading them either. Adding a check that summarizes those logs took about forty lines of code and would have turned a silent three-day outage into a visible &lt;code&gt;[attention]&lt;/code&gt; marker from day one. Logs you do not read are not much better than logs you never wrote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. If a test could have caught the bug, it belongs in the gate — even if adding it feels obvious in hindsight.&lt;/strong&gt; The wrong question is "why didn't I add this on day one?" The better question is "what class of future bugs does this protect me from now?" Hindsight is always perfect about the bug you already know. What you want is &lt;strong&gt;general immunity&lt;/strong&gt; to the class of bugs you just learned about.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you are building similar hook-based systems, the code for the project where this happened is at &lt;a href="https://github.com/ub3dqy/llm-wiki" rel="noopener noreferrer"&gt;github.com/ub3dqy/llm-wiki&lt;/a&gt;. It is a markdown-first memory system for Claude Code and Codex CLI, and both fixes described here — the content-based threshold and the roundtrip test — live in &lt;code&gt;scripts/doctor.py&lt;/code&gt; and &lt;code&gt;hooks/session-end.py&lt;/code&gt;. No API keys, no vector database, and it boots with &lt;code&gt;uv run python scripts/setup.py&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>debugging</category>
      <category>python</category>
    </item>
  </channel>
</rss>
