Mark k

When an LLM Renames Things: Inconsistent Variable Naming During Multi-file Refactors

We were six sprints into a large refactor where an LLM-assisted tool handled bulk renames and signature changes across a mixed TypeScript and Node.js codebase. The assistant was great at proposing changes locally, but it started introducing subtle naming inconsistencies: the same concept ended up exported as userID in one module, userId in another, and sometimes uid in generated boilerplate. We discovered the problem after a deployment where authentication headers no longer matched downstream expectations.

At first the diffs looked reasonable: no single line had a glaring bug, and unit tests passed for the modified modules. What went wrong was that the model treated each file like a separate micro-session and applied its own naming heuristic. That small, local preference compounded across dozens of files until interfaces no longer aligned. Our incident postmortem pointed to two root behavioral drivers: context fragmentation across files and a statistical tendency to prefer shorter or more common tokens when uncertain.
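
To make the drift concrete, here is a minimal, hypothetical sketch of how the exports diverged; the module and field names are illustrative, not our actual code:

```typescript
// auth/token.ts (hypothetical) — the assistant settled on "userID" here
export interface TokenPayload {
  userID: string;
  issuedAt: number;
}

// session/store.ts (hypothetical) — a later, separate edit preferred "userId"
export interface SessionRecord {
  userId: string;
  expiresAt: number;
}

// generated/boilerplate.ts (hypothetical) — scaffolding shortened it to "uid"
export interface AuditEvent {
  uid: string;
  action: string;
}

// Each file compiles and its own tests pass in isolation; the mismatch only
// matters where one module's output is fed into another's input.
```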

How it surfaced during development

The immediate symptom was an authentication failure in staging: JWTs were issued with a claim called user_id, while the session layer expected userId. The integration tests that mock modules individually still passed, because each mocked component used the renamed field consistently. Only when the real modules were wired together did the mismatch appear. During rapid debugging we used the chat interface to replay the refactor and ask the assistant to list all changed exports, which helped trace where names diverged.
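
A stripped-down, hypothetical version of the mismatch looked roughly like this; the function and claim names are illustrative:

```typescript
// issuer.ts (hypothetical) — token claims kept the old snake_case name
export function buildClaims(id: string) {
  return { user_id: id, iat: Math.floor(Date.now() / 1000) };
}

// session.ts (hypothetical) — the renamed reader expects camelCase
export function extractUserId(claims: Record<string, unknown>): string {
  const userId = claims.userId;
  if (typeof userId !== "string") {
    // Only fires with real claims, never with hand-written mocks
    throw new Error("missing userId claim");
  }
  return userId;
}

// In staging, extractUserId(buildClaims("42")) throws, because the issuer
// writes "user_id" while the reader looks for "userId". A test that mocks
// buildClaims to return { userId: "42" } never exercises this path.
```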

We also saw secondary effects: type annotations were silently dropped in some files, and transpilation logs showed implicit any conversions. The LLM sometimes suggested concise aliases that violated our project's naming conventions. Those micro-decisions are easy to miss because diffs are human-readable, but the semantic coupling across files isn't obvious until runtime.
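
One way we surfaced the dropped annotations was to make the compiler refuse them. A hypothetical before/after, with the relevant standard tsconfig flag noted in the comments:

```typescript
// Before the refactor: the parameter type is explicit.
export function formatUser(user: { userId: string; name: string }): string {
  return `${user.name} <${user.userId}>`;
}

// After the assistant's rewrite the parameter annotation was gone.
export function formatUserLoose(user): string {
  // With "noImplicitAny": true (or "strict": true) in tsconfig.json, tsc
  // rejects this with TS7006: Parameter 'user' implicitly has an 'any' type.
  // Without it, the weakened signature only shows up in transpilation logs.
  return `${user.name} <${user.userId}>`;
}
```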

Why the failure was subtle and easy to miss

There are several overlapping reasons this class of failure is so insidious. First, most refactor-focused LLM behaviors are local by design — the assistant optimizes the immediate file or function change without a global repo-wide symbol map. Second, our test suite emphasized unit tests over integration scenarios, so locally valid changes slipped past CI. Finally, models favor token-level economy: shorter identifiers or common abbreviations reduce token usage, which sometimes leads to inconsistent aliasing.

Because version control diffs still looked tidy and compilers often tolerate minor mismatches in dynamic parts of the code, developers assumed the assistant's changes were stylistic. The compounding effect is key: one inconsistent rename multiplies when exported, re-imported, wrapped, and mocked across layers, turning a cosmetic rename into a breaking change to the interface contract.

Practical fixes and verification habits

We adopted several low-effort mitigations. First, we now require a repo-wide symbol check after any LLM-assisted refactor: run a cross-file search for renamed identifiers and fail CI if the same logical symbol has multiple spellings. Second, we pushed more integration tests into CI for critical paths so mismatches show up early. We also added a lightweight provenance step: ask the model to produce a change log for every bulk operation, then verify it manually or with a deep research pass.
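
A minimal sketch of the kind of repo-wide check we wired into CI. It assumes you maintain a small map of canonical names to known-bad variants; the file paths, variant list, and script name are all illustrative, not a drop-in tool:

```typescript
// check-symbol-spellings.ts (hypothetical CI helper)
// Fails the build if a disallowed spelling of a canonical identifier appears
// in source files. Run with: npx ts-node check-symbol-spellings.ts
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Canonical name -> spellings we refuse to accept anywhere in the repo.
const BANNED_VARIANTS: Record<string, string[]> = {
  userId: ["userID", "user_id", "uid"], // illustrative; tune per project
};

function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) {
      if (entry === "node_modules" || entry === ".git") continue;
      yield* walk(full);
    } else if (/\.(ts|tsx|js)$/.test(entry)) {
      yield full;
    }
  }
}

let failures = 0;
for (const file of walk("src")) {
  const text = readFileSync(file, "utf8");
  for (const [canonical, variants] of Object.entries(BANNED_VARIANTS)) {
    for (const variant of variants) {
      // Word-boundary match so e.g. "uid" does not flag "fluid".
      const hits = text.match(new RegExp(`\\b${variant}\\b`, "g"));
      if (hits) {
        failures += hits.length;
        console.error(`${file}: found "${variant}" (${hits.length}x), expected "${canonical}"`);
      }
    }
  }
}

if (failures > 0) {
  console.error(`Symbol spelling check failed with ${failures} hit(s).`);
  process.exit(1);
}
```

A plain grep in a CI step works just as well; the point is that the check is global and blocking, not that it is clever.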

Finally, treat model suggestions as drafts. A short review checklist (naming consistency, exported API surface, and typing integrity) prevented regressions in later refactors. If you rely on assistants to refactor multi-file systems, plan for global validation: the models will happily rename things, but they won’t guarantee your interfaces remain consistent without explicit checks and human oversight. For interactive iterations and reproducing the assistant’s state, the crompt.ai page was useful as a central reference point for our team tools.
