Two fix iterations. Both correct. Both landed in the wrong worktree and were cleaned up before anyone noticed. That is how a three line error handling patch became a P6 recovery item.
The problem
_claude_p shells out to claude p via subprocess.run. When the subprocess times out, Python raises subprocess.TimeoutExpired. Nothing was catching it. The exception propagated up through the drafting layer and crashed the caller with a traceback pointing at subprocess internals instead of anything meaningful about what failed.
Callers should not need to import subprocess just to handle a drafting timeout. They want a RuntimeError with a message they can log and move on.
The fix
try:
result = subprocess.run(cmd, timeout=timeout,...)
except subprocess.TimeoutExpired:
raise RuntimeError(f"claude p drafting timed out after {timeout}s")
Three lines. The tradeoff: TimeoutExpired carries the original command and the timeout value as attributes; RuntimeError does not. I decided that was acceptable. Callers log the message. The message has the timeout duration in it. If retry logic needs a different ceiling, the call site already has that context.
Why it took three attempts
I delegated the first two fixes to free debug workers running in isolated git worktrees. Both workers implemented the change correctly, ran the full test suite, and exited clean. I saw "tests passing" in the task output and assumed the fix had propagated to the main tree.
It had not. The changes existed in throwaway branches inside throwaway worktrees. Neither branch was merged. The main tree was untouched. The fix evaporated twice.
Third attempt: opened the file directly, made the edit, ran the tests, committed. Four minutes.
What I would do differently
Do not delegate a three line patch. The overhead of spinning up a subprocess worker, managing worktree lifecycle, and parsing task output exceeds the cost of the edit itself. Reserve agent dispatch for things that genuinely benefit from parallelism or isolation. A single file fix in a known location does not meet that bar.
Treat agent output and repo state as distinct things. A worker reporting passing tests tells you the logic is correct in its branch. It tells you nothing about your working tree. If you delegate a code change and care where it lands, verify the diff in the target tree before closing the task. Passing in the worker is necessary but not sufficient.
The _claude_p function now raises RuntimeError on timeout with the duration in the message. The upstream catch block logs it and skips the draft. The pipeline recovers cleanly instead of crashing on a subprocess internal.
Top comments (0)