AI writes you a working Python script in about ninety seconds. It runs. You move on.
But the script has a long afterlife. It picks up a hardcoded Downloads path. It sprouts a # edit before running!! comment. Someone copies it to script_v2_FINAL.py. And six weeks later a teammate asks to use it, feeds it a slightly different CSV, and it dies with a raw traceback — or worse, it silently writes a half-finished file and exits 0.
The gap between "runs on my machine" and "I'd let someone else run this" is the actual engineering work. AI almost never closes it for you, because you never asked. (That's an observation from watching a lot of generated code, not a measured statistic, but I'd bet you've seen it too.)
This post is a teardown. I'm going to take one deliberately typical messy script, name every defect, and walk through exactly what changes to ship it. By the end you'll have a 10-point checklist you can run on your own code today, with zero tools beyond Python and pytest.
The "before" — a script that works exactly once
Here's the real starting script. It totals an expenses CSV by category and flags anything over a limit. It works. That's the trap. (Trimmed in the middle for length; the structure is what matters.)
# expense report script v2 FINAL (working!!)
import csv

INPUT = "/Users/me/Downloads/expenses.csv"  # <- edit this before running!!
LIMIT = 500

print("starting...")
rows = []
try:
    f = open(INPUT)
    r = csv.reader(f)
    next(r)  # skip header
    for row in r:
        rows.append(row)
except:
    print("error reading file")

for row in rows:
    cat = row[1]
    amt = float(row[2])
    # ...total by category, flag over LIMIT, print a report...

out = open("report.txt", "w")
# ...write each category total...
print("done, wrote report.txt")
Count the landmines:
- A hardcoded home path (/Users/me/Downloads/expenses.csv) you must edit in source to use the tool.
- A bare except: that swallows every error, including bugs in your own code, and then keeps running on an empty rows list.
- Positional column access (row[1], row[2]) that explodes on a missing column and gives no clue which one.
- float(row[2]) that crashes the whole run on one malformed cell.
- print() for everything: diagnostics, results, and the "FLAGGED" lines all jammed into stdout together.
- A non-atomic write (out = open("report.txt", "w")) that leaves a corrupt file if the program dies mid-loop.
- No tests, no install path, no --help.
None of this is exotic. This is what working AI-generated Python looks like. Let's ship it.
The checklist (run this on any script)
This is the rubric I run before code goes to other people. Score each 0–2, twenty points max; be strict, "partially" is a 1. Each item is a Failure → Fix pair so you can skim it.
- Error handling. Failure: a bare except: (automatic 0), or raw tracebacks on bad input. Fix: catch specific exceptions around I/O, parsing, and network calls; failures produce actionable messages and a nonzero exit code.
- Secrets & config. Failure: hardcoded keys, tokens, or home paths. Fix: config comes from arguments or env. Grep your own code for api_key =, password =, token = before you ship.
- Inputs & validation. Failure: the script assumes well-formed input. Fix: check every external input. Empty file? Missing column? A path with spaces?
- Logging & observability. Failure: print() for diagnostics, or total silence on failure. Fix: logging with levels; user output separated from debug noise.
- Tests. Failure: none (most scripts). Fix: a pytest suite covering the happy path and at least three failure modes, running green.
- Dependency hygiene. Failure: undeclared or unpinned deps, dead imports. Fix: declare dependencies with version bounds.
- Interface & UX. Failure: values you edit in source to run it. Fix: a real CLI (--help, exit codes) or a documented API.
- Packaging & install. Failure: "clone it and run python script.py and hope." Fix: pip install . works, an entry point is defined, it runs from any directory.
- Documentation. Failure: no runnable example. Fix: a README with one-line purpose, install, a copy-pasteable example, and expected output.
- Portability. Failure: open() without encoding= falls back to the platform's default encoding, so a UTF-8 file can blow up or be misread on a machine with a different locale. Fix: always pass encoding="utf-8"; state the Python version; verify from a fresh venv.
The one rule that makes this checklist actually useful: every finding must cite a file and line. "Improve error handling" is banned. "open() at line 14 crashes on a missing file" is the standard.
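To make the Inputs & validation item concrete, here is a minimal sketch of per-cell amount parsing that fails with the row number and the offending value instead of a raw ValueError. The names parse_amount and ValidationError are hypothetical, chosen for illustration; they are not taken from the sample project.

```python
import math


class ValidationError(Exception):
    """Hypothetical error type for a bad cell; the message names the row."""


def parse_amount(raw: str, row_number: int) -> float:
    """Convert one CSV cell to a float, or raise an error a user can act on."""
    text = raw.strip()
    if not text:
        raise ValidationError(f"row {row_number}: amount is empty")
    try:
        value = float(text)
    except ValueError:
        raise ValidationError(
            f"row {row_number}: amount {raw!r} is not a number"
        ) from None
    if not math.isfinite(value):
        # float() accepts "nan" and "inf", which would poison every total.
        raise ValidationError(
            f"row {row_number}: amount {raw!r} is not a finite number"
        )
    return value
```

Whether a bad row should abort the run or be skipped with a warning is a policy choice; either way, the message should point at the row.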
Now the three fixes that matter most.
Fix 1: bare except → specific exceptions + real exit codes
The doctrine is short: catch specific, never broad; catch at the edge, not in the middle. Inner code raises; it doesn't apologize. Only main() translates an error into a message and an exit code: 0 for success, 1 for a runtime failure, 2 for a usage error. Get those codes wrong and you break every shell pipeline and cron job built on the script.
import csv
from pathlib import Path


class AppError(Exception):
    """Expected, user-facing failure. Message says what to do, not just what broke."""


REQUIRED_COLUMNS = ("date", "category", "amount")


def read_expenses(path: Path) -> list[dict[str, str]]:
    # Fail early with an actionable message instead of a raw FileNotFoundError.
    if not path.is_file():
        raise AppError(f"input file not found: {path}")
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, hence the "or []".
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise AppError(f"{path} is missing required column(s): {', '.join(missing)}")
        return list(reader)
Crucially, main() catches AppError only. An unexpected exception still surfaces loudly with its traceback — because a swallowed crash is undiagnosable, while a loud one tells you exactly what to fix. The output write also became atomic — write to a .tmp file, then rename it into place — so a crash never leaves a corrupt report behind.
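A rough sketch of those two ideas together, the edge-only catch in main() and the temp-file-then-rename write, might look like the following. It is an illustration under assumptions, not the sample project's code: write_report_atomic, run, and the one-argument usage line are hypothetical, and AppError is redefined here only so the block runs on its own.

```python
from __future__ import annotations

import os
import sys
import tempfile
from pathlib import Path


class AppError(Exception):
    """Expected, user-facing failure (same role as AppError above)."""


def write_report_atomic(path: Path, text: str) -> None:
    """Write to a temp file next to the target, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # os.replace swaps the file in one step when both paths
        # live on the same filesystem, which is why the temp file
        # is created in the target's own directory.
        os.replace(tmp_name, path)
    except BaseException:
        # Any failure: remove the partial temp file, then re-raise unchanged.
        Path(tmp_name).unlink(missing_ok=True)
        raise


def run(output: Path) -> None:
    """Hypothetical stand-in for the real work; raises AppError when expected."""
    if output.is_dir():
        raise AppError(f"output path is a directory: {output}")
    write_report_atomic(output, "category,total\n")


def main(argv: list[str] | None = None) -> int:
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 1:
        print("usage: report OUTPUT", file=sys.stderr)
        return 2  # usage error
    try:
        run(Path(args[0]))
    except AppError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1  # expected runtime failure
    return 0  # anything unexpected propagates with its traceback
```

A script would typically end with raise SystemExit(main()) under an if __name__ == "__main__": guard, so the return value becomes the process exit code.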
Fix 2: print() → logging without breaking your pipes
The distinction nobody gets right: program output is not logging. If the script's job is to print a report to stdout, that stays print(). Logging is the diagnostic narration around it, and it goes to stderr. Mix them and you break script.py > out.csv for every user.
import logging
import sys

def setup_logging(verbosity: int = 0) -> None:
    level = [logging.WARNING, logging.INFO, logging.DEBUG][min(verbosity, 2)]
    logging.basicConfig(level=level, handlers=[logging.StreamHandler(sys.stderr)])
Level discipline: per-row detail is DEBUG; "wrote report.csv" is INFO; "skipped 3 rows" is WARNING. Use the lazy %s form so formatting is skipped when the level is off.
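The lazy %s point is easy to check for yourself. The sketch below uses a hypothetical helper class, CountingRepr, that counts how often it gets turned into a string; the logger name is also just an illustration.

```python
import logging

log = logging.getLogger("expense_report")  # illustrative logger name
log.setLevel(logging.WARNING)  # DEBUG and INFO are switched off


class CountingRepr:
    """Hypothetical stand-in for an expensive-to-format object."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive row dump"


row = CountingRepr()

# Lazy form: the logger checks the level first, so str(row) never runs.
log.debug("parsed row: %s", row)
calls_after_lazy = row.calls

# Eager form: the f-string is built before debug() is even called.
log.debug(f"parsed row: {row}")
calls_after_eager = row.calls
```

Here calls_after_lazy stays at 0 and calls_after_eager ends at 1: the f-string pays the formatting cost even though the message is thrown away.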
Fix 3: loose script → installable, tested CLI
The hardcoded INPUT path became an argparse CLI with --help, --limit, --output, --force, -v, and --version. The loose file became a src/ layout package with a pyproject.toml entry point:
[project.scripts]
expense-report = "expense_report.cli:main"
That's what turns "clone it and run python script.py and hope" into a real command. I verified the whole chain in a fresh virtual environment before writing this:
python3 -m venv .venv && .venv/bin/pip install ".[dev]"
.venv/bin/python -m pytest # 16 passed
.venv/bin/expense-report --help
All sixteen tests passing, clean install, runs from any directory. Evidence over confidence — that's the whole point.
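For a sense of what the argparse side of Fix 3 can look like, here is a minimal sketch built from the flags listed above. The details are assumptions rather than the sample project's code: the positional input argument, the report.csv default, the meaning given to --force, the verbosity attribute name, and the 0.0.0 version string are all placeholders.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="expense-report",
        description="Total an expenses CSV by category and flag amounts over a limit.",
    )
    parser.add_argument("input", type=Path, help="path to the expenses CSV")
    parser.add_argument(
        "--limit", type=float, default=500.0,
        help="flag amounts over this value (default: %(default)s)",
    )
    parser.add_argument(
        "--output", type=Path, default=Path("report.csv"),
        help="where to write the report (default: %(default)s)",
    )
    parser.add_argument(
        "--force", action="store_true",
        help="overwrite the output file if it already exists (assumed meaning)",
    )
    parser.add_argument(
        "-v", dest="verbosity", action="count", default=0,
        help="-v for INFO, -vv for DEBUG",
    )
    parser.add_argument(
        "--version", action="version", version="%(prog)s 0.0.0",  # placeholder
    )
    return parser


# Example: parse a command line without touching sys.argv.
args = build_parser().parse_args(["expenses.csv", "--limit", "250", "-vv"])
```

argparse supplies --help for free and exits with status 2 on a malformed command line, which lines up with the usage-error code from Fix 1.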
Run it on your own code
That checklist is yours; the worked example above is the whole method. Audit one of your "finished" scripts against the ten categories, fix the blocking issues first, and prove each fix with a failure you can actually trigger.
If you'd rather your AI assistant enforce this loop instead of doing it by hand, I packaged the discipline as eight Claude Code skills — the scored audit (ship-check), plus harden-errors, add-logging, make-cli, add-tests, package-it, write-readme, and a release-prep gate — with the full before/after sample project. Full disclosure: I built it, and it's a paid kit ($19) at jackiecole.gumroad.com/l/lcscdf. The checklist in this post stands on its own either way.