The test test_warmup_eval_no_flag_subprocess_exits_0 was running the real warmup eval command, no , dry run, against the live production state file. It had a comment claiming "in memory fallback." That comment was wrong.
The editable install (pip install e) resolves STATE_PATH at import time to wherever the real state.json lives. There is no separate test context, no isolated state, no in memory anything. Every time that test ran, it loaded the live warm up anchor, evaluated it, and wrote back to the same file my launchd job reads in production.
The reason this went unnoticed: the code path the test exercised was a pass through. Warm up state with no updates needed does nothing, so nothing got written back. The test passed, the file was unchanged, nobody noticed the test was running against live data.
Then I landed a fix in 1e1e53d that seeds the dwell anchor on the first warmup evaluate call. Now that code path writes something. The next full test run seeded the live anchor with a test timestamp and immediately violated the invariant that tests do not touch production state. The test was not lying about its intent. It was lying about its isolation.
The fix is straightforward once you see it. Add an X_ENGINE_STATE_PATH environment variable that overrides config.STATE_PATH when set. Default is unchanged so nothing in production is affected. All three warmup eval subprocess tests now point at a tmp_path fixture file instead of the real one. Subprocess tests are the trickiest case here because the subprocess inherits the parent environment and then imports config fresh, so the override has to be in the subprocess environment, not a monkeypatch on the parent process.
I verified it by running the full suite (141 tests) and diffing state.json before and after. Identical byte for byte.
What I would do differently
Add the env override at the same time I write any subprocess test that touches stateful code. The pattern is obvious in retrospect. Subprocess tests shell out to the installed package, which resolves paths against the real install location, not some test scoped mock. If your config has a file path and that path points somewhere real, you need an override mechanism before the test even exists.
Also: read your own test comments skeptically. "In memory fallback" is the kind of comment that gets written once, never verified, and silently becomes a lie the next time a code path changes. If the comment is load bearing, meaning someone might skip adding isolation because they read it and trust it, it needs a test that proves it. An unverified comment about isolation is just technical debt with a nice wrapper.
The real tell was that the pass through path masked the mutation. Tests that only pass because they happen to be a no op are not safe tests. They are time bombs waiting for the first code change that makes the path do something real. 1e1e53d was that change.
Now the suite is actually isolated. A full run can no longer advance, roll back, or halt the live warm up sequence. That is the bar subprocess tests should have been held to from the beginning.
Top comments (0)