Olivia Craft

Posted on • Originally published at gist.github.com

CLAUDE.md for Python: 14 Rules That Make AI Write Clean, Idiomatic Code

A drop-in CLAUDE.md for Python projects. Place it at the repository root so AI coding assistants (Claude Code, Cursor, GitHub Copilot, Aider) follow your house rules before they touch a .py file.

Targets Python 3.11+, type-checked with mypy --strict or pyright, tested with pytest, packaged with the src/ layout.


Rule 1: Type hints on every function signature — no exceptions

AI ships untyped helpers because untyped Python "just runs". Three months later you are reading def process(data, opts=None) and you don't know what either argument is.

BAD

def process(data, opts=None):
    out = []
    for x in data:
        if opts and opts.get("upper"):
            x = x.upper()
        out.append(x)
    return out

GOOD

from collections.abc import Iterable

def process(data: Iterable[str], *, upper: bool = False) -> list[str]:
    return [x.upper() if upper else x for x in data]

Why: Types are documentation the type checker enforces. mypy --strict catches the bug before it reaches the test suite.
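The same discipline extends to generic helpers. A sketch (the `first` function is hypothetical, not from the article) showing how a `TypeVar` lets the checker track element types through a container utility:

```python
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")

def first(items: Sequence[T], default: T) -> T:
    # The checker infers first([1, 2], 0) -> int and first(["a"], "") -> str,
    # so a caller mixing types gets flagged before runtime.
    return items[0] if items else default

assert first([1, 2], 0) == 1
assert first([], "fallback") == "fallback"
```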


Rule 2: dataclass (or pydantic.BaseModel) for structured data — never a dict with string keys

AI defaults to {"user_id": 1, "name": "..."} because it scraped a decade of tutorials. The dict has no schema, no autocomplete, and silently accepts typos like usr_id.

BAD

def make_user(name, email):
    return {"name": name, "email": email, "active": True}

u = make_user("Ada", "ada@example.com")
print(u["emial"])   # KeyError at runtime, not import time

GOOD

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class User:
    name: str
    email: str
    active: bool = True

u = User(name="Ada", email="ada@example.com")

Why: A dataclass is a typed contract; typos and missing fields fail at construction, not at the database boundary three layers down.


Rule 3: pathlib.Path for filesystem work — never os.path string concatenation

os.path.join(base, "data", name) is the 2014 idiom AI keeps emitting. It returns a bare string, so every downstream helper has to re-parse the path, and each extra operation (existence checks, suffix swaps, globbing) drags in yet another os or glob function.

BAD

import os

def load(base, name):
    full = os.path.join(base, "data", name)
    if not os.path.exists(full):
        raise FileNotFoundError(full)
    with open(full) as f:
        return f.read()

GOOD

from pathlib import Path

def load(base: Path, name: str) -> str:
    target = base / "data" / name
    if not target.exists():
        raise FileNotFoundError(target)
    return target.read_text(encoding="utf-8")

Why: Path is an object with methods (.exists(), .read_text(), .with_suffix()); strings are not. Every os.path call is a missed method.
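Path objects also compose and transform purely, with no filesystem access and no string surgery. A few of the transformations you get for free:

```python
from pathlib import Path

report = Path("exports") / "2024" / "sales.csv"

# Suffixes, names, and components are methods and attributes, not regexes
assert report.suffix == ".csv"
assert report.with_suffix(".json").name == "sales.json"
assert report.parts == ("exports", "2024", "sales.csv")
```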


Rule 4: Context managers for every resource — no manual close(), no try/finally ceremony

Files, locks, DB connections, HTTP sessions, subprocesses, temp dirs — all of them have __enter__/__exit__. AI still writes f = open(...); f.close() and leaks file descriptors when an exception fires between the two.

BAD

f = open("data.json")
data = json.load(f)
f.close()                  # never reached if json.load raises

GOOD

from pathlib import Path
import json

with Path("data.json").open(encoding="utf-8") as f:
    data = json.load(f)

Why: with guarantees cleanup on normal exit AND on exceptions; manual close() does not.
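The same guarantee is easy to give your own resources via contextlib. A minimal sketch (the `tracked` acquire/release pair is hypothetical) showing cleanup runs even when the body raises:

```python
from collections.abc import Iterator
from contextlib import contextmanager

events: list[str] = []

@contextmanager
def tracked(name: str) -> Iterator[str]:
    events.append(f"acquire {name}")
    try:
        yield name
    finally:
        events.append(f"release {name}")  # runs on normal exit AND on exceptions

try:
    with tracked("lock"):
        raise RuntimeError("boom")
except RuntimeError:
    pass

assert events == ["acquire lock", "release lock"]
```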


Rule 5: No bare except: — catch the exception type you mean, and re-raise what you don't

except: swallows KeyboardInterrupt, SystemExit, and the bug you were supposed to fix; a blanket except Exception: at least lets those two escape (they derive from BaseException), but still hides real bugs. AI reaches for both because they make the linter quiet.

BAD

try:
    user = fetch_user(uid)
except:
    user = None    # which error? a 404? a network blip? a bug in fetch_user?

GOOD

import logging
from myapp.errors import UserNotFound

log = logging.getLogger(__name__)

try:
    user = fetch_user(uid)
except UserNotFound:
    user = None
except TimeoutError:
    log.warning("fetch_user timed out for uid=%s", uid)
    raise

Why: Bare except hides bugs and breaks Ctrl-C. Catching specific types makes intent explicit and lets unexpected errors bubble up where they can be fixed.
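When ignoring one specific error genuinely is the intent, contextlib.suppress says so in one line instead of an empty except body:

```python
from contextlib import suppress

cache: dict[str, int] = {"a": 1}

# Explicit: "a missing key here is fine"; anything else still propagates
with suppress(KeyError):
    del cache["missing"]

assert cache == {"a": 1}
```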


Rule 6: No mutable default arguments — use None and assign inside

def f(items=[]): items.append(1); return items returns [1], then [1, 1], then [1, 1, 1]. AI writes this every week because it doesn't remember that defaults are evaluated once.

BAD

def add_tag(item, tags=[]):
    tags.append(item)
    return tags

add_tag("a")    # ['a']
add_tag("b")    # ['a', 'b']  — surprise!

GOOD

def add_tag(item: str, tags: list[str] | None = None) -> list[str]:
    tags = list(tags) if tags is not None else []
    tags.append(item)
    return tags

Why: Default values are evaluated at function definition, not at every call — mutable defaults become shared state across calls.
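When None is itself a meaningful argument value, the same pattern works with a module-level sentinel. A hypothetical sketch (`get_setting` and `_MISSING` are illustrative names, not from the article):

```python
_MISSING = object()  # unique sentinel: can never collide with a real value

def get_setting(overrides: dict[str, str], key: str, default: object = _MISSING) -> object:
    if key in overrides:
        return overrides[key]
    if default is _MISSING:
        raise KeyError(key)
    return default  # None is allowed here and means "explicitly no value"

assert get_setting({"a": "1"}, "a") == "1"
assert get_setting({}, "a", None) is None
```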


Rule 7: pydantic.BaseModel at every external boundary — never trust raw JSON

The HTTP handler receives a dict from request.json(), passes it three layers down, and crashes on KeyError deep in business logic. AI plumbs the raw dict because validation feels like ceremony.

BAD

def create_user(payload: dict):
    return User(name=payload["name"], age=int(payload["age"]))

GOOD

from pydantic import BaseModel, EmailStr, Field

class CreateUserRequest(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: EmailStr
    age: int = Field(ge=0, le=150)

def create_user(req: CreateUserRequest) -> User:
    return User(name=req.name, email=req.email, age=req.age)

Why: Pydantic validates, coerces, and produces a typed object at the boundary; everything inside the boundary can trust its inputs.
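Where pulling in pydantic isn't an option, the boundary idea can be approximated with a stdlib dataclass and __post_init__ checks. A sketch only — it does coercion by hand and is not equivalent to pydantic's validation:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class CreateUserRequest:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Validation fires at construction, so bad payloads never cross the boundary
        if not self.name:
            raise ValueError("name must be non-empty")
        if not 0 <= self.age <= 150:
            raise ValueError("age out of range")

def parse_request(payload: dict[str, Any]) -> CreateUserRequest:
    # Coerce and validate once, at the edge; everything inside trusts the object
    return CreateUserRequest(name=str(payload.get("name", "")), age=int(payload.get("age", -1)))

assert parse_request({"name": "Ada", "age": "36"}).age == 36
```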


Rule 8: Dependency injection via constructor — never import a singleton inside a function

from myapp.db import db in the middle of a service is how you end up with tests that need a real Postgres to run. AI does this because it's the shortest path to "it works on my machine".

BAD

from myapp.db import db    # module-level singleton

def get_user(uid: int) -> User:
    return db.query("SELECT * FROM users WHERE id = %s", uid)

GOOD

from typing import Protocol

class UserRepo(Protocol):
    def get(self, uid: int) -> User: ...

class UserService:
    def __init__(self, repo: UserRepo) -> None:
        self._repo = repo

    def get_user(self, uid: int) -> User:
        return self._repo.get(uid)

Why: Constructor injection makes dependencies visible in the type signature and trivially substitutable in tests — no monkeypatching, no global teardown.
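In a test, any object with a matching get method satisfies the protocol structurally, with no inheritance and no mocking library. A self-contained sketch (using str in place of the User type for brevity):

```python
from typing import Protocol

class UserRepo(Protocol):
    def get(self, uid: int) -> str: ...

class UserService:
    def __init__(self, repo: UserRepo) -> None:
        self._repo = repo

    def get_user(self, uid: int) -> str:
        return self._repo.get(uid)

class FakeRepo:
    """In-memory stand-in; matches UserRepo by shape alone."""
    def __init__(self, users: dict[int, str]) -> None:
        self._users = users

    def get(self, uid: int) -> str:
        return self._users[uid]

service = UserService(FakeRepo({1: "Ada"}))
assert service.get_user(1) == "Ada"
```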


Rule 9: async def end-to-end — never call sync I/O from inside an async function

A single requests.get(...) inside an async def blocks the entire event loop. AI mixes requests and httpx.AsyncClient because both look like HTTP clients.

BAD

import requests

async def fetch(url: str) -> dict:
    return requests.get(url).json()    # blocks the event loop

GOOD

import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> dict:
    response = await client.get(url)
    response.raise_for_status()
    return response.json()

Why: Async only buys concurrency if every I/O call yields. One sync call inside an async coroutine serializes the whole loop.
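When a sync library genuinely can't be replaced, asyncio.to_thread pushes the blocking call onto a worker thread so the loop keeps running. A sketch with a stand-in blocking function (`legacy_fetch` is hypothetical):

```python
import asyncio
import time

def legacy_fetch(url: str) -> str:
    time.sleep(0.01)  # stands in for unavoidable blocking I/O
    return f"payload from {url}"

async def fetch(url: str) -> str:
    # Runs the blocking call off the event loop; other tasks keep progressing
    return await asyncio.to_thread(legacy_fetch, url)

result = asyncio.run(fetch("https://example.com"))
assert result == "payload from https://example.com"
```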


Rule 10: Structured concurrency with asyncio.TaskGroup — never fire-and-forget asyncio.create_task

AI fires tasks with asyncio.create_task(...) and never awaits them. The task raises, Python prints "Task exception was never retrieved" to stderr, and the parent function returns success.

BAD

async def sync_all(users):
    for u in users:
        asyncio.create_task(sync_one(u))    # fire-and-forget; errors lost

GOOD

import asyncio

async def sync_all(users: list[User]) -> None:
    async with asyncio.TaskGroup() as tg:
        for u in users:
            tg.create_task(sync_one(u))

Why: TaskGroup (Python 3.11+) cancels siblings on the first failure and re-raises an ExceptionGroup — orphaned tasks become impossible.


Rule 11: pytest with fixtures and parametrize — no unittest.TestCase, no test classes

AI writes class TestFoo(unittest.TestCase): def test_bar(self): self.assertEqual(...). Pytest doesn't need any of it; the result is half the lines and twice the readability.

BAD

import unittest

class TestSlugify(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_unicode(self):
        self.assertEqual(slugify("Olá Mundo"), "ola-mundo")

GOOD

import pytest

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Hello World", "hello-world"),
        ("Olá Mundo", "ola-mundo"),
        ("", ""),
    ],
)
def test_slugify(raw: str, expected: str) -> None:
    assert slugify(raw) == expected

Why: parametrize turns N similar tests into one declarative table; failures report which row failed, not which method.
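For context, a minimal slugify the table above might target. This is a hypothetical sketch, not the article's implementation; a real project's version is whatever it ships:

```python
import re
import unicodedata

def slugify(raw: str) -> str:
    # Strip accents (NFKD splits "á" into "a" + combining mark, which ASCII drops)
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode()
    # Keep alphanumeric runs, join with hyphens
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))

assert slugify("Hello World") == "hello-world"
assert slugify("Olá Mundo") == "ola-mundo"
assert slugify("") == ""
```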


Rule 12: logging.getLogger(__name__) per module — never print(), never the root logger

print("got user", user) ships to production, fills CloudWatch with noise, and gives the SRE no way to filter it out. AI uses print because the prompt didn't ask for logging.

BAD

def charge(amount: int) -> None:
    print(f"charging {amount}")    # not structured, not leveled, not filterable

GOOD

import logging

log = logging.getLogger(__name__)

def charge(amount_cents: int) -> None:
    log.info("charging amount_cents=%d", amount_cents)

Why: Per-module loggers inherit configuration, support levels and handlers, and produce structured output your log aggregator can parse.
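Because per-module loggers are plain objects, their output is testable too. A sketch capturing records with a small custom handler (`ListHandler` is illustrative):

```python
import logging

log = logging.getLogger("myapp.billing")
log.setLevel(logging.INFO)

class ListHandler(logging.Handler):
    """Collects formatted records in memory for assertions."""
    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))

handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
log.addHandler(handler)

log.info("charging amount_cents=%d", 1999)
assert handler.messages == ["INFO myapp.billing charging amount_cents=1999"]
```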


Rule 13: src/ layout with pyproject.toml — never code at the repo root, never setup.py

AI scaffolds projects with mypackage/__init__.py next to tests/ at the repo root. Tests then accidentally import the package from the working directory instead of the installed copy, so a broken pip install -e . goes unnoticed until a user installs it for real.

BAD

myproject/
├── mypackage/
│   └── __init__.py
├── tests/
└── setup.py

GOOD

myproject/
├── pyproject.toml
├── src/
│   └── mypackage/
│       └── __init__.py
└── tests/
    └── test_mypackage.py
# pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mypackage"
version = "0.1.0"
requires-python = ">=3.11"

[tool.hatch.build.targets.wheel]
packages = ["src/mypackage"]

Why: The src/ layout forces tests to import the installed package, not whatever cwd happens to expose — the same code the user will run.


Rule 14: Composition over inheritance — Protocol for duck typing, not deep class hierarchies

AI loves class PremiumUser(User): class TrialPremiumUser(PremiumUser):. Three months later you're debugging which __init__ ran in what order, and super().method() resolves to a class nobody remembers writing.

BAD

class User:
    def discount(self) -> float: return 0.0

class PremiumUser(User):
    def discount(self) -> float: return 0.1

class TrialPremium(PremiumUser):
    def discount(self) -> float: return 0.05

GOOD

from dataclasses import dataclass
from typing import Protocol

class DiscountPolicy(Protocol):
    def rate(self) -> float: ...

class FlatDiscount:
    def __init__(self, rate: float) -> None:
        self._rate = rate
    def rate(self) -> float:
        return self._rate

@dataclass
class User:
    name: str
    discount: DiscountPolicy

ada = User("Ada", FlatDiscount(0.1))

Why: Composed objects are independently testable and swappable; deep inheritance couples behavior to type identity and breaks the moment requirements change.
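Swapping behaviour then becomes a constructor argument, not a subclass. A sketch extending the example with a second policy and a hypothetical checkout function:

```python
from typing import Protocol

class DiscountPolicy(Protocol):
    def rate(self) -> float: ...

class FlatDiscount:
    def __init__(self, rate: float) -> None:
        self._rate = rate

    def rate(self) -> float:
        return self._rate

class NoDiscount:
    def rate(self) -> float:
        return 0.0

def total(price_cents: int, policy: DiscountPolicy) -> int:
    # Same call site works for every policy: no isinstance, no hierarchy walk
    return round(price_cents * (1 - policy.rate()))

assert total(1000, FlatDiscount(0.1)) == 900
assert total(1000, NoDiscount()) == 1000
```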


Wrapping up

These 14 rules don't replace PEP 8 or the Python docs — they encode the failure modes AI repeats most often in real Python codebases. Type hints over untyped helpers, dataclasses over dicts, pathlib over os.path, context managers over manual cleanup, specific except over bare, no mutable defaults, pydantic at the boundary, DI over import-singletons, async end-to-end, TaskGroup over fire-and-forget, pytest over unittest, per-module loggers over print, src/ layout over root-level packages, and composition over inheritance — that's the difference between Python that ships and Python that gets rewritten in six months.

Drop this file at the root of your repo. The next AI prompt produces Python your future self won't have to apologise for in a code review.

— OliviaCraft · oliviacraft.lat


Want 35+ more production rules across 40+ stacks? → https://oliviacraftlat.gumroad.com/l/skdgt


Original Gist: https://gist.github.com/oliviacraft/8ea9ea2459902e31c5e24da39b534e73

