2-glitch

Posted on Jun 15

I Shipped One Messy Python Script. Here's the 10-Point Checklist That Got It There.

#ai #testing #python #productivity

AI writes you a working Python script in about ninety seconds. It runs. You move on.

But the script has a long afterlife. It picks up a hardcoded Downloads path. It sprouts a # edit before running!! comment. Someone copies it to script_v2_FINAL.py. And six weeks later a teammate asks to use it, feeds it a slightly different CSV, and it dies with a raw traceback — or worse, it silently writes a half-finished file and exits 0.

That gap between "runs on my machine" and "I'd let someone else run this" — that distance is the actual engineering work. And AI almost never closes it for you, because you never asked. (That's an observation from watching a lot of generated code, not a measured statistic — but I'd bet you've seen it too.)

This post is a teardown. I'm going to take one deliberately typical messy script, name every defect, and walk through exactly what changes to ship it. By the end you'll have a 10-point checklist you can run on your own code today, with zero tools beyond Python and pytest.

The "before" — a script that works exactly once

Here's the real starting script. It totals an expenses CSV by category and flags anything over a limit. It works. That's the trap. (Trimmed in the middle for length; the structure is what matters.)

# expense report script v2 FINAL (working!!)
import csv

INPUT = "/Users/me/Downloads/expenses.csv"   # <- edit this before running!!
LIMIT = 500

print("starting...")

rows = []
try:
    f = open(INPUT)
    r = csv.reader(f)
    next(r)  # skip header
    for row in r:
        rows.append(row)
except:
    print("error reading file")

for row in rows:
    cat = row[1]
    amt = float(row[2])
    # ...total by category, flag over LIMIT, print a report...

out = open("report.txt", "w")
# ...write each category total...
print("done, wrote report.txt")

Count the landmines:

A hardcoded home path (/Users/me/Downloads/expenses.csv) you must edit in source to use the tool.
A bare except: that swallows every error — including bugs in your own code — and then keeps running on an empty rows list.
Positional column access (row[1], row[2]) that explodes on a missing column and gives no clue which one.
float(row[2]) that crashes the whole run on one malformed cell.
print() for everything — diagnostics, results, and the "FLAGGED" lines all jammed into stdout together.
A non-atomic write (out = open("report.txt", "w")) that leaves a corrupt file if the program dies mid-loop.
No tests, no install path, no --help.

None of this is exotic. This is what working AI-generated Python looks like. Let's ship it.

The checklist (run this on any script)

This is the rubric I run before code goes to other people. Score each 0–2, twenty points max; be strict, "partially" is a 1. Each item is a Failure → Fix pair so you can skim it.

Error handling — Failure: a bare except: (automatic 0), or raw tracebacks on bad input. Fix: wrap I/O, parsing, and network calls in specific exceptions; failures produce actionable messages and a nonzero exit code.
Secrets & config — Failure: hardcoded keys, tokens, or home paths. Fix: config comes from arguments or env. Grep your own code for api_key =, password =, token = before you ship.
Inputs & validation — Failure: the script assumes well-formed input. Fix: check every external input — empty file? missing column? a path with spaces?
Logging & observability — Failure: print() for diagnostics, or total silence on failure. Fix: logging with levels; user output separated from debug noise.
Tests — Failure: none (most scripts). Fix: a pytest suite covering the happy path and at least three failure modes, running green.
Dependency hygiene — Failure: undeclared or unpinned deps, dead imports. Fix: declare dependencies with version bounds.
Interface & UX — Failure: values you edit in source to run it. Fix: a real CLI (--help, exit codes) or a documented API.
Packaging & install — Failure: "clone it and run python script.py and hope." Fix: pip install . works, an entry point is defined, it runs from any directory.
Documentation — Failure: no runnable example. Fix: a README with one-line purpose, install, a copy-pasteable example, and expected output.
Portability — Failure: open() without encoding= blows up on a non-UTF-8 file. Fix: always pass encoding="utf-8"; state the Python version; verify from a fresh venv.

The one rule that makes this checklist actually useful: every finding must cite a file and line. "Improve error handling" is banned. "open() at line 14 crashes on a missing file" is the standard.

Now the three fixes that matter most.

Fix 1: bare `except` → specific exceptions + real exit codes

The doctrine is short: catch specific, never broad; catch at the edge, not in the middle. Inner code raises; it doesn't apologize. Only main() translates an error into a message and an exit code — 0 success, 1 runtime failure, 2 usage error — or you break every shell pipeline and cron job built on the script.

class AppError(Exception):
    """Expected, user-facing failure. Message says what to do, not just what broke."""

REQUIRED_COLUMNS = ("date", "category", "amount")

def read_expenses(path: Path) -> list[dict[str, str]]:
    if not path.is_file():
        raise AppError(f"input file not found: {path}")
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise AppError(f"{path} is missing required column(s): {', '.join(missing)}")
        return list(reader)

Crucially, main() catches AppError only. An unexpected exception still surfaces loudly with its traceback — because a swallowed crash is undiagnosable, while a loud one tells you exactly what to fix. The output write also became atomic — write to a .tmp file, then rename it into place — so a crash never leaves a corrupt report behind.

Fix 2: `print()` → logging without breaking your pipes

The distinction nobody gets right: program output is not logging. If the script's job is to print a report to stdout, that stays print(). Logging is the diagnostic narration around it, and it goes to stderr. Mix them and you break script.py > out.csv for every user.

def setup_logging(verbosity: int = 0) -> None:
    level = [logging.WARNING, logging.INFO, logging.DEBUG][min(verbosity, 2)]
    logging.basicConfig(level=level, handlers=[logging.StreamHandler(sys.stderr)])

Level discipline: per-row detail is DEBUG; "wrote report.csv" is INFO; "skipped 3 rows" is WARNING. Use the lazy %s form so formatting is skipped when the level is off.

Fix 3: loose script → installable, tested CLI

The hardcoded INPUT path became an argparse CLI with --help, --limit, --output, --force, -v, and --version. The loose file became a src/ layout package with a pyproject.toml entry point:

[project.scripts]
expense-report = "expense_report.cli:main"

That's what turns "clone it and run python script.py and hope" into a real command. I verified the whole chain in a fresh virtual environment before writing this:

python3 -m venv .venv && .venv/bin/pip install ".[dev]"
.venv/bin/python -m pytest        # 16 passed
.venv/bin/expense-report --help

All sixteen tests passing, clean install, runs from any directory. Evidence over confidence — that's the whole point.

Run it on your own code

That checklist is yours; the worked example above is the whole method. Audit one of your "finished" scripts against the ten categories, fix the blocking issues first, and prove each fix with a failure you can actually trigger.

If you'd rather your AI assistant enforce this loop instead of doing it by hand, I packaged the discipline as eight Claude Code skills — the scored audit (ship-check), plus harden-errors, add-logging, make-cli, add-tests, package-it, write-readme, and a release-prep gate — with the full before/after sample project. Full disclosure: I built it, and it's a paid kit ($19) at jackiecole.gumroad.com/l/lcscdf. The checklist in this post stands on its own either way.

DEV Community

I Shipped One Messy Python Script. Here's the 10-Point Checklist That Got It There.

The "before" — a script that works exactly once

The checklist (run this on any script)

Fix 1: bare `except` → specific exceptions + real exit codes

Fix 2: `print()` → logging without breaking your pipes

Fix 3: loose script → installable, tested CLI

Run it on your own code

Top comments (0)

The "before" — a script that works exactly once

The checklist (run this on any script)

Fix 1: bare except → specific exceptions + real exit codes

Fix 2: print() → logging without breaking your pipes

Fix 3: loose script → installable, tested CLI

Run it on your own code

Fix 1: bare `except` → specific exceptions + real exit codes

Fix 2: `print()` → logging without breaking your pipes