The team adopted an AI-assisted refactor to standardize naming across a medium-sized TypeScript codebase. We asked the model to rename a domain concept, `userAccount` to `account`, and to update all references. At first glance the suggestions looked reasonable: per-file diffs with tidy renames and updated comments. After quick local builds and light unit test runs, we pushed several patches without thinking much of it. The trouble showed up later, in production logs and in a user-reported workflow: an intermittently failing payment reconciliation job. The stack traces pointed to a function receiving an object of an unexpected shape, and further investigation revealed that in some modules the AI had suggested `acct` while in others it used `account` or `userAccount`. The inconsistent names led to shadowed imports and a subtle runtime mismatch that surfaced only in the reconciliation path, because of how the code loaded modules lazily.
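Each per-file suggestion, viewed on its own, was hard to fault. A sketch of the kind of patch one session produced (file and identifier names here are illustrative, not the actual code):

```typescript
// payments/reconcile.ts after one session's suggested rename.
// This particular session happened to settle on `acct`; another file's
// session settled on `account` for the same concept.

interface Acct {                 // was: interface UserAccount
  id: string;
  balanceCents: number;
}

// Reconcile a single acct against the ledger (comment dutifully updated too).
export function reconcileAcct(acct: Acct): boolean {
  return acct.balanceCents >= 0;
}
```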
What went wrong
The AI made locally plausible per-file edits but did not maintain a global naming policy. The model operates on the prompt and the local file context; when we fed it files one at a time, it produced consistent output within each file but made independent choices between sessions. Compounding the issue, the refactor tool we used applied each AI patch eagerly, without verifying project-wide symbol resolution. There were no cross-file refactor checks and no centralized rename operation. Because some files import names indirectly (via barrel files and re-exports), small name differences quietly produced two different runtime bindings. TypeScript’s structural typing masked some of the mismatches at compile time, and our unit tests, focused as they were on isolated modules, still passed, leaving the bug to appear only during integrated, stateful runs.
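A minimal sketch of the mechanism, with hypothetical module and store names: once one patch introduces a new module under the renamed symbol and the barrel file re-exports both, the old and new names resolve to two different runtime bindings, and nothing at compile time ties them together.

```typescript
// Hypothetical module layout; names are illustrative, not the actual repo.

// domain/user-account.ts — pre-refactor module that some files still import
export const userAccountStore = new Map<string, { balanceCents: number }>();

// domain/account.ts — introduced by one AI patch as the "renamed" module
export const accountStore = new Map<string, { balanceCents: number }>();

// domain/index.ts — the barrel re-exports both, so both names resolve cleanly
export { userAccountStore } from "./user-account";
export { accountStore } from "./account";

// payments/ingest.ts — updated code writes through the new binding
import { accountStore } from "../domain";
accountStore.set("cust-42", { balanceCents: 1999 });

// payments/reconcile.ts — the lazily loaded job still reads the old binding;
// the two stores hold structurally identical values, so nothing flags the split
export async function reconcile(customerId: string) {
  const { userAccountStore } = await import("../domain");
  return userAccountStore.get(customerId); // undefined at runtime
}
```

In this sketch, state written through `accountStore` is never visible to the lazily loaded reconciliation path, which matches the integration-only failure mode described above.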
How this surfaced during development
We first noticed the failure after a customer workflow started dropping items in a background job. Reproducing it locally required mirroring the container startup and the deferred job scheduling. Diagnosis revealed inconsistent imports and multiple symbols that referred to the same conceptual entity under different identifiers. A code search showed three common variants across the repo. Pair debugging helped: we used the AI-assisted chat interface to iterate on hypotheses faster than we could by digging through commit history, though the tool will happily repeat the original mistake if you feed it single-file diffs. The chat also made it easy to generate quick patches, which increased the velocity of changes but amplified the inconsistency, because we never enforced a single canonical source for the name.
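The search itself was nothing fancy; a rough sketch of the kind of script we ran (the `src` root and the variant list are assumptions for illustration):

```typescript
// Walk the source tree and count occurrences of each naming variant, so you
// can see which files still disagree about the identifier.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const VARIANTS = ["userAccount", "acct", "account"] as const;

function* sourceFiles(dir: string): Generator<string> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory() && entry.name !== "node_modules") {
      yield* sourceFiles(full);
    } else if (entry.isFile() && /\.tsx?$/.test(entry.name)) {
      yield full;
    }
  }
}

const counts = new Map<string, Map<string, number>>();
for (const file of sourceFiles("src")) {
  const text = readFileSync(file, "utf8");
  for (const variant of VARIANTS) {
    const hits = text.match(new RegExp(`\\b${variant}\\b`, "g"))?.length ?? 0;
    if (hits > 0) {
      const perFile = counts.get(variant) ?? new Map<string, number>();
      perFile.set(file, hits);
      counts.set(variant, perFile);
    }
  }
}

for (const [variant, perFile] of counts) {
  console.log(`${variant} appears in ${perFile.size} files`);
}
```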
Why it was subtle and how small behaviors compounded
Small model behaviors, like per-prompt randomness and limited cross-file context, compounded into a larger fault. The model’s completion is probabilistic: different invocations will prefer different tokens (`acct` vs `account`). When the model is used as a large-scale refactor engine without deterministic anchoring, those small divergences become a correctness problem across dozens of files. The fix was procedural rather than magical: centralize renames through the language server or a single-symbol refactor tool, run full integration tests, and add a pre-merge check that searches for legacy name variants. We also documented the failure mode and started using a project-wide prompt that included a small symbol map to bias the model toward one canonical name. For verification and cross-referencing during postmortems we used a lightweight project-wide search and our internal "deep research" checklist to validate all rename targets. These measures reduced recurrence and made the trade-off explicit: AI speeds up edits, but deterministic, project-wide refactor tools should own symbol renames.
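The symbol map is deliberately small. A sketch of its shape, with the file name and entries as illustrative assumptions; the idea is that the same map feeds both the prompt preamble and the pre-merge check:

```typescript
// canonical-names.ts (illustrative): one canonical spelling per concept plus
// the legacy variants that must no longer appear anywhere in source.
export const CANONICAL_NAMES = {
  account: ["userAccount", "acct"],
} as const;

export type CanonicalName = keyof typeof CANONICAL_NAMES;

// The pre-merge check fails the build if any legacy spelling still appears;
// the prompt preamble serializes the same map so every session sees it.
export function legacyVariants(): string[] {
  return Object.values(CANONICAL_NAMES).flat();
}
```

Keeping one file as the source of truth means a new rename only ever happens in one place, and both the model prompt and the CI check drift together instead of apart.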