DEV Community

UB3DQY

Remote-WSL broke my AI agent hooks with one malformed cwd

I spent most of this week debugging what looked like a flaky hook pipeline.

The project itself is simple on purpose: a local, markdown-first knowledge base that I use with coding agents. Hooks capture session output, a small filter decides what is worth keeping, and everything stays in git. No separate backend, no database to babysit, no infrastructure just because a text-heavy workflow could be turned into infrastructure.
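To make the shape of that pipeline concrete, here is a minimal sketch of the "small filter" idea. The keyword heuristic and function name are illustrative assumptions, not the real pipeline:

```python
import re

# Hypothetical sketch of a "worth keeping" filter: keep captured session
# lines that look like decisions, errors, or follow-ups, drop the noise.
# The keyword list is an assumption, not the actual filter.
KEEP = re.compile(r"\b(decision|error|todo|fixme)\b", re.IGNORECASE)

def filter_session(lines):
    """Return the subset of captured session lines worth committing to git."""
    return [line for line in lines if KEEP.search(line)]
```

The point of keeping it this dumb is that the whole pipeline stays inspectable: plain functions over plain text, with git as the only storage layer.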

That simplicity is exactly why the bug annoyed me so much.

The failure looked like an application problem. It smelled like an application problem. It even produced the kind of vague symptoms that make you think, “great, another intermittent pipeline bug.”

It was not an application problem.

It was one malformed working directory.

What I thought I was debugging

At first, I thought I was dealing with one recurring failure in my capture pipeline.

Sometimes a hook would appear to fail. Sometimes downstream logging would be empty. Sometimes a task would complete, but the surrounding automation would look half-dead. It was inconsistent enough to feel flaky and consistent enough to feel real.

That is an especially bad combination.

I started by doing the usual careful thing: grouping similar failures together and checking how often they happened.

That worked for about ten minutes.

Once I looked at timing, the nice tidy picture fell apart. Some failures happened almost immediately. Others took much longer. Same surface symptom, completely different execution pattern.

That was the first sign that I might be flattening two different problems into one label.

Then I went digging through the SDK layer and ran into a second problem: the tooling was not especially generous with useful error details. In a couple of places I could confirm that something had failed, but not why. The exact traceback or stderr I wanted either was not there or was being abstracted away into something much less helpful.

At that point I stopped treating this as a Python problem and started treating it as an environment problem.

That turned out to be the right move.

The setup that made it visible

I use two agents against the same repository:

  • Claude Code inside VS Code
  • Codex in parallel for implementation and verification

That setup is productive, but it puts real pressure on local tooling. You are suddenly relying on editors, terminals, shells, path translation, process spawning, and hook execution to behave consistently across a stack that spans Windows and Linux at the same time.

Naturally, I decided to make it more convenient.

I wanted a cleaner dual-window workflow inside VS Code instead of bouncing between an editor and a separate terminal. That pushed me deeper into VS Code Remote-WSL and duplicated workspace setups.

That is where the really confusing symptom showed up.

Codex could successfully answer a simple prompt, but the UI would still show hooks as failed.

That combination should immediately make you suspicious.

If the task result exists, but the hooks around it are marked failed and your own logging pipeline shows no fresh entries, then one of two things is true:

  1. the hook command really is starting and dying very early, or
  2. the command is never starting in a sane execution context to begin with.

The second explanation was uglier, but it fit the evidence better.

Manual tests made things weirder, not clearer

I checked the obvious suspects:

  • hook config
  • executable paths
  • shell path
  • Python runner
  • manual execution of the same command
  • exit code
  • runtime

Everything looked healthy.

The hook command worked perfectly when I ran it manually.

That made the problem harder, not easier. A command that fails only when launched through an editor integration is usually telling you that the command itself is fine and its environment is not.

So I stopped staring at the hook script and went looking for process-level clues on the Codex side.

That was when I realized I had been looking in the wrong place for logs. I expected plain text logs. The active event data I needed was actually in a SQLite log store.

Once I queried that, the whole thing cracked open.

The line that explained the whole day

Inside the recorded turn data, the working directory looked like this:

/mnt/c/.../Microsoft VS Code/e:\work\my-project

That path is nonsense.

It is a broken hybrid:

  • the VS Code install directory on the Windows side
  • a raw Windows-style workspace path

It is neither a valid WSL workspace path nor a valid normal working directory for a Linux-side process.

And once I saw it, the rest of the symptoms stopped being mysterious.

The issue was not that my hooks were unreliable.

The issue was that, in this Remote-WSL setup, the VS Code extension was handing Codex a malformed cwd.

Instead of turning something like:

E:\work\my-project

into:

/mnt/e/work/my-project

something in the chain appeared to be combining the raw Windows path with the wrong base first.
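The correct translation is mechanical, which is what made the hybrid so jarring. A rough sketch of the drive-letter mapping (`wslpath` does this properly on a real WSL system; the malformed-path check below is my own heuristic, not anything official):

```python
import re

def win_to_wsl(path):
    """Map a drive-letter Windows path to its /mnt/<drive>/ WSL equivalent."""
    m = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if not m:
        raise ValueError(f"not a drive-letter Windows path: {path!r}")
    drive, rest = m.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")

def looks_malformed(cwd):
    """Flag a supposedly-Linux cwd that still contains Windows path residue."""
    return "\\" in cwd or bool(re.search(r"[A-Za-z]:", cwd))
```

The broken hybrid from the log trips `looks_malformed` immediately, because a clean WSL path should never contain a backslash or a drive-letter colon.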

One bad cwd is enough to poison an entire process tree.

Once child processes inherit it, you start getting misleading secondary failures:

  • hooks reported as failed even though the commands are valid
  • subprocess behavior that differs from manual shell execution
  • empty downstream logs because the real work never starts in a usable context

That was exactly what I was seeing.

Why this bug was so misleading

This is my least favorite kind of tooling bug: the kind that breaks at the seams.

Nothing explodes cleanly.

The agent still answers.

The UI still looks alive.

The hook command still works in isolation.

The repository still exists where you expect it to exist.

Only one inherited bit of process state is wrong, and that is enough to make the system feel haunted.

From the outside, it looks like a flaky automation problem.
From the inside, it is just a bad path string.

That difference matters, because it changes what you should inspect first.

If a command works manually but fails only through an editor or agent integration, do not immediately assume the logic is wrong. Compare launch context first:

  • cwd
  • shell
  • environment variables
  • path translation
  • parent process

I should have done that earlier.

The workaround was almost embarrassingly simple

Once I knew what was broken, the local workaround was not clever at all:

do not run Codex through the VS Code extension in that setup.

Run it directly from a normal WSL shell in the project root:

cd /mnt/e/work/my-project
codex

Same machine. Same repository. Same hooks. Same scripts.

Different launch path.

And in that mode, everything immediately got boring again, which is exactly what you want from tooling:

  • clean cwd
  • hooks marked completed
  • logging pipeline resumes
  • downstream capture behaves normally

That was the moment I realized I had nearly talked myself into a much bigger solution for a much smaller problem.

At one point I was already mentally drifting toward “maybe I should move more of this workflow to a server” or “maybe the local-first design is too fragile.”

Nope.

The local-first design was fine.
The markdown-first architecture was fine.
The scripts were fine.
The hooks were fine.

The working directory was wrong.

What I changed after that

I stopped trying to force the elegant version of the workflow.

My stable setup now is much more boring:

  • Claude Code inside VS Code
  • Codex in a separate WSL terminal
  • both pointing at the same repository
  • shared append-only logs
  • no editor-managed cwd surprises

It is less stylish than the version I was trying to build.
It is also dramatically more reliable.

And honestly, that feels like the right ending for this story.

I spent a day chasing what looked like a deep pipeline bug.
It turned out that the best fix was to put the CLI tool back in a terminal.

What I took away from it

Three things.

First: if multiple failures share a label, that does not mean they share a cause.

Second: editor integrations often fail in ways that look like application bugs when they are really process-launch bugs.

Third: if you are debugging anything that touches Windows, WSL, and an editor extension at the same time, inspect cwd much earlier than feels necessary.

One malformed working directory was enough to waste an entire day.

I would rather nobody else lose a day the same way.
