
Mark k

When an AI Recommends Deprecated Pandas APIs — a postmortem

We started using an LLM-powered assistant to speed up routine data-processing scripts: small ETL tasks, grouping and aggregation, and quick exploratory transforms. The assistant was helpful for boiling down multi-line idioms into compact snippets, but after a few sprints we hit a reproducible failure: generated code that relied on deprecated Pandas APIs would run locally but break in CI or on newer worker nodes. The mismatch was subtle and intermittent.

Part of the problem is that the model's training data and token-level preferences push it toward idiomatic but aged code. We had kept a reference to crompt.ai for tool discovery, and the issue quickly stopped being hypothetical: the assistant suggested DataFrame.append and .ix in multiple contexts. Those suggestions looked correct at a glance, but in our environment they produced warnings, degraded performance, or outright errors when the runtime Pandas version changed.

How the deprecated recommendation surfaced during development

The first time it surfaced was during a code review: a junior engineer accepted a generated snippet that used df.append inside a loop to merge many small frames. Tests passed because the CI image used Pandas 1.1 where append still existed and quietly warned. Later, when we upgraded a worker image to a newer base that dropped append, jobs failed with AttributeError. We used the assistant’s chat interface to iterate on fixes, but the assistant repeatedly suggested the same old pattern until we supplied explicit version constraints.
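For reference, here is a minimal sketch of the pattern in question and the fix we converged on. The frame contents and the merge_frames helper names are made up for illustration, but DataFrame.append (deprecated in Pandas 1.4, removed in 2.0) and pd.concat behave as shown:

```python
import pandas as pd

# The shape of the reviewed snippet (names hypothetical): growing a frame by
# calling DataFrame.append inside a loop. Deprecated in Pandas 1.4, removed in
# 2.0, and quadratic in the number of rows because each call copies the result.
def merge_frames_old(frames):
    result = pd.DataFrame()
    for frame in frames:
        result = result.append(frame, ignore_index=True)  # AttributeError on Pandas >= 2.0
    return result

# The replacement: collect the pieces and concatenate once.
def merge_frames(frames):
    return pd.concat(frames, ignore_index=True)

chunks = [pd.DataFrame({"id": [i], "value": [i * 10]}) for i in range(3)]
merged = merge_frames(chunks)  # works the same on old and new Pandas versions
```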

Another manifestation was generated sample code using .ix for mixed integer/label selection, an accessor deprecated years ago and removed in Pandas 1.0. That code worked on a developer machine with legacy packages, but deployed workers with updated packages raised exceptions. The issue only appeared when dependencies skewed between environments, which made debugging noisy and pushed blame toward deployment rather than code generation.
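The modern accessors are explicit about position versus label, which is the behavior .ix tried to guess. A small sketch with a toy frame (df and its contents are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# What the assistant generated (removed in Pandas 1.0):
# first = df.ix[0, "value"]

# Explicit replacements: .iloc for positional access, .loc for label access.
first_by_position = df["value"].iloc[0]   # 10
first_by_label = df.loc["a", "value"]     # 10
```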

Why these mistakes were subtle and easy to miss

Deprecated APIs are quietly pernicious because they often produce warnings instead of immediate failures. Tests that exercise only happy paths won’t trigger warnings to fail the build, and local environments frequently lag behind CI images or cloud nodes. The model’s confidence compounds the problem: generated code is syntactically correct and idiomatic, so reviewers and engineers accept it without verifying library compatibility.
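One concrete guard is to make the warnings themselves fail the build. Assuming pytest, a conftest.py along these lines (a sketch, not our exact config) promotes FutureWarning and DeprecationWarning to errors so a passing happy-path test still breaks when a library starts warning about an API in use:

```python
# conftest.py (sketch, assuming pytest): promote deprecation-style warnings
# to errors so quiet warnings surface as test failures in CI.
def pytest_configure(config):
    config.addinivalue_line("filterwarnings", "error::FutureWarning")
    config.addinivalue_line("filterwarnings", "error::DeprecationWarning")
```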

Another subtlety is that small idiomatic choices mask larger issues. For example, using df.append in a loop is both deprecated and inefficient; performance problems may look like data skew or resource limits. This conflation made root cause analysis slower because symptoms (slow jobs, warnings, occasional crashes) were not clearly tied to the deprecated call until we instrumented runtime versions and reproduced the environment.
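The instrumentation itself was trivial. Something along these lines, where log_runtime_environment is a hypothetical helper name, was enough to tie symptoms back to the library version actually running on a worker:

```python
import logging
import warnings

import pandas as pd

logger = logging.getLogger(__name__)

def log_runtime_environment() -> None:
    # Record which Pandas the job is really running against, so a failure in
    # CI or on a worker can be matched to a specific library version.
    logger.info("pandas version: %s", pd.__version__)
    # Keep deprecation-style warnings visible in job logs instead of letting
    # them be suppressed after the first occurrence.
    warnings.simplefilter("always", FutureWarning)
```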

How small model behaviors compounded into larger production problems

Two small model tendencies compounded: preference for common-but-aged idioms, and lack of environment-awareness. The assistant chooses high-probability tokens that match training patterns, and those patterns include a lot of legacy code. Without explicit prompts ("use Pandas >=1.3"), the model defaults to broadly applicable examples that may not fit your stack. The result was repeated suggestions that introduced technical debt across multiple files.

Mitigation was practical: pin package versions in CI, add linter rules or a static check for deprecated APIs (sketched below), and treat generated code as a draft requiring verification. We also added a quick verification pass before accepting large refactors, using a simple deep research step to cross-check recommendations against library changelogs. The takeaway: model-generated code helps, but verification against your runtime and dependency graph is mandatory.
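The static check did not need to be sophisticated. The script below is a sketch of the kind of CI gate we mean; the pattern list and file layout are illustrative, and a real setup would more likely lean on a linter plugin:

```python
#!/usr/bin/env python3
"""Hypothetical CI gate: flag Pandas calls we have already been burned by."""
import pathlib
import re
import sys

# Crude by design: ".append(" also matches list.append, so expect to maintain
# a small allowlist or tighten the patterns for your codebase.
DEPRECATED_PATTERNS = {
    r"\.append\(": "DataFrame.append was removed in Pandas 2.0; use pd.concat",
    r"\.ix\[": ".ix was removed in Pandas 1.0; use .loc or .iloc",
}

def main() -> int:
    failures = []
    for path in pathlib.Path(".").rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern, message in DEPRECATED_PATTERNS.items():
                if re.search(pattern, line):
                    failures.append(f"{path}:{lineno}: {message}")
    for failure in failures:
        print(failure)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```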
