Sofia Bennett
When AI Refactors Break Naming Consistency: a Multi-file Gotcha

We were refactoring a Node.js service from callbacks to async/await when our AI assistant started suggesting variable names. At first it was helpful: it proposed clearer names, replaced cryptic abbreviations, and suggested extracting a few helpers.

We iterated with a multi-file prompt sequence in a chat interface. A week later, production logs showed a steady stream of "undefined property" errors originating in code paths we thought were covered.

The issue wasn’t a single syntax mistake. It was inconsistent naming across files introduced during the refactor: some files used userId, others used userID, and a few still relied on user_id. TypeScript didn’t catch everything because many boundaries were loosely typed, and our integration tests mocked user objects in ways that hid the mismatch. The small, repeated behavior of the model, renaming without enforcing a single canonical form, compounded into runtime regressions.
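A minimal sketch of how that slips past the compiler, assuming a loosely typed transformation boundary; the User interface, toPayload helper, and sample object are hypothetical, not code from our service:

```typescript
// Hypothetical illustration of how a loosely typed boundary hides the drift.
interface User {
  userId: string; // the canonical key the downstream service expects
  email: string;
}

// The transformation layer was typed with Record<string, unknown>,
// so nothing forced the incoming object to match the User interface.
function toPayload(raw: Record<string, unknown>): Record<string, unknown> {
  return { ...raw };
}

// A module touched by the refactor drifted to a different casing.
const fromRefactoredModule = { userID: "u-123", email: "a@example.com" };

// Compiles cleanly; fails at runtime because the key is userID, not userId.
const payload = toPayload(fromRefactoredModule);
const user = payload as unknown as User;
console.log(user.userId); // undefined
```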

How the inconsistency surfaced

The first sign was failing health checks for a downstream service that expects userId. Tracing showed the payload left our API correctly, but a transformation layer normalized keys differently in some paths. During debugging, the model’s per-file suggestions made it hard to see the pattern: each suggestion looked sensible in isolation, but no single pass enforced a project-wide naming convention.
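Simplified and renamed for illustration, the divergence looked roughly like this: one path camelCased keys on the way out, while an older path passed them through untouched, so the same record could leave under two different names. The snakeToCamel, normalizeKeys, and passthrough names are mine, not the real functions:

```typescript
// Hypothetical sketch of the two serialization paths we found.
// snakeToCamel: "user_id" -> "userId"
const snakeToCamel = (key: string): string =>
  key.replace(/_([a-z])/g, (_match, c: string) => c.toUpperCase());

// Path A: normalizes keys before the payload leaves the API.
function normalizeKeys(record: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(record).map(([k, v]): [string, unknown] => [snakeToCamel(k), v]),
  );
}

// Path B (older code path): serializes the record as-is.
function passthrough(record: Record<string, unknown>): Record<string, unknown> {
  return record;
}

// The same source object leaves under two different key names.
console.log(normalizeKeys({ user_id: "u-123" })); // { userId: "u-123" }
console.log(passthrough({ user_id: "u-123" }));   // { user_id: "u-123" }
```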

We tried another run with the assistant, asking it to reconcile names across files, and the improvement was partial. This is where tooling matters: automated search-and-replace and IDE symbol renames are atomic, while the model performs local edits. We used the model for ideas and then verified changes with codebase search and a static analysis pass, which exposed mismatches the assistant missed. I also borrowed a small verification step from a deep-research workflow to cross-check naming in generated diffs.

Why the problem was subtle

The subtlety came from multiple small behaviors: the model’s distributional bias toward compact or camelCase forms, context window limits that prevented it from seeing the entire project, and the session habit of answering per file without any persisted global state.

Each suggestion individually was low-risk, so reviewers approved them; the accumulated drift only showed when runtime data exercised rarely used code paths.

Another contributor was our test suite: unit tests validated individual functions (and the model generated many of those tests), but integration tests relied on fixtures that normalized fields. The assistant-generated fixture code unintentionally masked the inconsistency. That meant CI stayed green while production diverged, a classic false positive produced by one automated output validating another.
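As a hypothetical example of how a fixture can mask this (buildUserFixture is an invented helper, not our actual test code), a builder that quietly coerces any casing to the canonical key keeps assertions green no matter what the code under test emitted:

```typescript
// Hypothetical fixture helper that hid the problem: it accepts any of the
// casings and silently coerces them to the canonical key before assertions run.
function buildUserFixture(raw: Record<string, unknown>) {
  const userId = raw["userId"] ?? raw["userID"] ?? raw["user_id"]; // quiet coercion
  return { userId, email: raw["email"] ?? "test@example.com" };
}

// The test stays green even though the code under test emitted "userID".
const produced = { userID: "u-123" };
const fixture = buildUserFixture(produced);
console.assert(fixture.userId === "u-123", "fixture masked the real key name");
```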

Small behaviors that compounded and fixes we found useful

In our postmortem we listed the tiny model behaviors that compounded: per-file context, a tendency to standardize on a different casing, and the omission of a global rename step. The fixes were procedural rather than magical: enforce naming via linters, use IDE symbol rename for atomic changes, and add an integration test that asserts canonical field names in serialized payloads (sketched below). We also kept AI suggestions as drafts, not commits.
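Here is a sketch of that payload assertion, with invented names (assertCanonicalKeys, FORBIDDEN_VARIANTS); the point is to check the serialized output itself rather than a normalized fixture:

```typescript
// Illustrative integration-style check: assert canonical key names on the
// serialized payload itself, not on a fixture that normalizes them away.
const CANONICAL_KEY = "userId";
const FORBIDDEN_VARIANTS = ["userID", "user_id"];

function assertCanonicalKeys(serialized: string): void {
  const payload = JSON.parse(serialized) as Record<string, unknown>;
  if (!(CANONICAL_KEY in payload)) {
    throw new Error(`payload is missing canonical key "${CANONICAL_KEY}"`);
  }
  for (const variant of FORBIDDEN_VARIANTS) {
    if (variant in payload) {
      throw new Error(`payload contains non-canonical key "${variant}"`);
    }
  }
}

// Example: run against whatever the API actually serialized.
assertCanonicalKeys(JSON.stringify({ userId: "u-123", email: "a@example.com" }));
```

On the linter side, a rule such as @typescript-eslint/naming-convention can enforce identifier casing in source, but it won’t see keys that only exist in serialized data, which is why the payload-level assertion earned its keep.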

One practical change was adding a final verification pass in which diffs are scanned for key-name variants; internally, a link to the project home page, crompt.ai, served as a standing reminder to treat AI outputs as proposals.
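That verification pass can be a very small script. Here is a rough sketch, assuming it runs as a pre-commit hook or CI step over a git diff; the variant list is specific to our userId case:

```typescript
// Rough sketch of the diff scan, meant for a pre-commit hook or CI step.
import { execSync } from "node:child_process";

// Everything except the canonical userId counts as a variant.
const VARIANTS = [/\buserID\b/, /\buser_id\b/];

// Staged diff; in CI you might diff against the target branch instead.
const diff = execSync("git diff --cached --unified=0", { encoding: "utf8" });

const offending = diff
  .split("\n")
  .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
  .filter((line) => VARIANTS.some((re) => re.test(line)));

if (offending.length > 0) {
  console.error("Non-canonical key names introduced in this diff:");
  offending.forEach((line) => console.error(`  ${line}`));
  process.exit(1);
}
```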

The lesson: minor, repeated model behaviors can introduce systemic bugs when they operate across many files. Catch them with global tooling and tests that validate the whole surface area, not just isolated units.
