DEV Community

Peyton Green
Claude Code 2.1.85: My Python Project Setup After Six Months of Daily Use

Claude Code just hit v2.1.85 — it skipped 2.1.82 entirely and pushed three patch versions in roughly 24 hours before promoting 2.1.85 to latest. Whatever was getting fixed, it was getting fixed fast.

I've been using Claude Code as my primary AI coding tool for Python projects for about six months now. The rapid patch cycle is a good moment to write up what I've actually learned — the setup that works, the patterns that save time, and the things that looked useful but weren't.

This isn't a "Claude Code is amazing" post. It's the practical stuff I wish someone had written when I started.


The thing most people skip: CLAUDE.md

Claude Code reads a CLAUDE.md file at the root of your project before every session. Most tutorials mention it briefly and move on. It's actually the highest-leverage configuration you can do.

Without it, Claude Code is making educated guesses about your project. With a good one, it knows exactly how to run your tests, what your linting setup is, which files to avoid, and how you prefer to structure code.

Here's the template I use for Python projects:

```markdown
# Project: [name]

## Stack
- Python 3.12
- pytest + pytest-cov for testing
- ruff for linting/formatting (replaces black + flake8)
- mypy for type checking (strict mode)
- [any frameworks: FastAPI / SQLAlchemy / etc]

## Commands
- Run tests: `pytest tests/ -x --tb=short`
- Run single test: `pytest tests/test_foo.py::test_bar -v`
- Lint: `ruff check . && ruff format --check .`
- Type check: `mypy src/`
- All checks: `make check`

## Project structure
- `src/[package]/` — main package code
- `tests/` — pytest tests (mirrors src/ structure)
- `scripts/` — one-off scripts, not production code

## Conventions
- Type hints on all public functions
- Docstrings on public classes and non-obvious functions
- Tests use fixtures from conftest.py — check there before creating new ones
- No bare `except:` — always catch specific exceptions

## Don't touch
- `migrations/` — hands off unless explicitly asked
- `.env` files — never read or modify
- `pyproject.toml` dependency versions — propose changes, don't apply
```

The "Don't touch" section is underrated. Claude Code is eager to help, which means it'll sometimes edit things adjacent to what you asked for. Explicit boundaries save time.


The three commands I use every day

1. `claude` (plain, in project root)

Opens an interactive session with full project context. This is my primary mode. I describe what I want, it proposes changes, I approve or reject. The approval flow feels slow the first week, then becomes natural — you're reviewing diffs, not babysitting output.

2. `claude -p "task description"`

One-shot mode. I use this for well-defined tasks where I know exactly what I want:

```shell
claude -p "Write a pytest fixture that creates a temp SQLite database for testing, tears it down after each test, and lives in tests/conftest.py"
```

One-shot is faster than interactive for tasks that are specific enough to describe completely. The key is being precise — vague one-shot prompts produce vague results.

3. `claude --continue`

Resumes the previous session. More useful than I expected. When I leave something half-done and come back, `--continue` picks up the thread. It's not perfect context restoration, but it's better than re-explaining everything.


Patterns that actually work

Test-first, always

The single most effective workflow I've found: describe the behavior you want as a test, then let Claude Code implement the function.

```text
Me: Write the test first. I want a function `parse_config(path: str) -> Config`
that reads a YAML file and returns a validated Config dataclass.
The test should cover: valid input, missing required field,
invalid type for a field, and file not found.

Claude: [writes tests/test_config.py]

Me: Now implement parse_config to pass those tests.
```

This works because tests constrain the solution space. Claude Code is much better at implementing against a spec than guessing what you want.
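To make the pattern concrete, here is a self-contained sketch of that exchange. I've swapped YAML for JSON so it runs on the standard library alone; `Config` and `parse_config` are the hypothetical names from the dialogue, defined inline rather than imported from a real project:

```python
import json
from dataclasses import dataclass

import pytest


@dataclass
class Config:
    name: str
    port: int


def parse_config(path: str) -> Config:
    """Read a JSON config file and return a validated Config."""
    with open(path) as f:  # raises FileNotFoundError if the file is missing
        raw = json.load(f)
    for field in ("name", "port"):
        if field not in raw:
            raise ValueError(f"missing required field: {field}")
    if not isinstance(raw["port"], int):
        raise TypeError("port must be an int")
    return Config(name=raw["name"], port=raw["port"])


# The tests written first; the implementation above was made to pass them.
def test_valid_input(tmp_path):
    path = tmp_path / "conf.json"
    path.write_text('{"name": "api", "port": 8080}')
    assert parse_config(str(path)) == Config(name="api", port=8080)


def test_missing_required_field(tmp_path):
    path = tmp_path / "conf.json"
    path.write_text('{"name": "api"}')
    with pytest.raises(ValueError):
        parse_config(str(path))


def test_invalid_field_type(tmp_path):
    path = tmp_path / "conf.json"
    path.write_text('{"name": "api", "port": "8080"}')
    with pytest.raises(TypeError):
        parse_config(str(path))


def test_file_not_found(tmp_path):
    with pytest.raises(FileNotFoundError):
        parse_config(str(tmp_path / "absent.json"))
```

Notice how each failure mode the prompt listed maps to exactly one test; that mapping is the spec Claude Code implements against.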

Ask for the plan before the code

For anything non-trivial, I ask "what's your plan?" before "write the code." This takes an extra exchange but prevents long stretches of code I'd reject immediately.

```text
Me: I need to add rate limiting to our API endpoints.
What's your plan before you write anything?
```

You'll catch misunderstandings early. "I'd add a Redis-backed rate limiter using..." when you don't have Redis in your stack is much cheaper to correct as text than as 200 lines of code.
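As an illustration, the corrected, Redis-free plan might land on something like this minimal in-process token bucket (my sketch, not Claude's output; state is per-process, so it won't coordinate across multiple workers):

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The point of asking for the plan first is that "in-process token bucket vs. Redis-backed counter" is a one-line correction at the planning stage, and a rewrite afterwards.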

Keep sessions focused

Claude Code's context window is large but not infinite, and performance degrades as context fills. I've learned to:

  • Start a new session for each distinct task
  • Not drag in unrelated files "just in case"
  • Use `--continue` only for the same task, not for adjacent ones

One task per session. It sounds inefficient but it's faster in practice.


The prompt library problem

Claude Code is a tool, not a solution. The quality of what it produces is directly proportional to the quality of what you ask for.

After a few months of using it, I noticed I was writing the same types of prompts repeatedly — for test generation, for refactoring, for API design, for documentation. I started keeping a local file of prompts that had worked well.

That file turned into the AI Developer Toolkit, which I published last week — 272 prompts organized by task type (testing, refactoring, debugging, architecture decisions, code review, etc.). It's what I actually use, not a "1000 ChatGPT prompts" dump.

The relevant point here: the prompts that work best with Claude Code are the ones that give it a clear output format, explicit constraints, and an example or two. Generic prompts get generic code.


What doesn't work

Asking it to "look at the codebase and suggest improvements"

You'll get a list of things that are either obvious, wrong for your context, or technically correct but not worth doing. Claude Code is good at executing specific tasks. It's not good at prioritizing.

Letting it run multiple steps without checkpoints

I have `autoAcceptEdits: false` in my settings. Some people turn this on for speed. My experience: you end up reviewing three interdependent changes at once instead of one at a time, and when something is wrong it's harder to identify where.
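For reference, that setting as it appears in my project's `.claude/settings.json` (key name per my install; newer versions may organize settings differently, so check the docs for yours):

```json
{
  "autoAcceptEdits": false
}
```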

Using it for first drafts of complex architecture

For a simple CRUD endpoint, first draft is fine. For something genuinely complex — a custom event loop, a tricky state machine, distributed coordination — I've had better results sketching the design myself first and using Claude Code to fill in the implementation.


The stable vs latest question

Claude Code ships `stable`, `latest`, and `next` tags via npm. I run `latest` (currently 2.1.85). `stable` is still pinned at 2.1.77 — well behind.

For daily use on Python projects, I haven't hit stability issues on `latest`. If you're running it in CI or as part of automated workflows, I'd be more conservative. But for interactive development sessions, `latest` is fine.


The 2.1.85 patch cycle

I don't know what's in 2.1.83 through 2.1.85 specifically — the npm metadata doesn't include changelogs. But three patches in 24 hours before promotion suggests something was wrong and then fixed. That pattern is familiar to anyone who's shipped software: sometimes you catch a regression mid-release and hotfix until it's stable.

Whatever it was, the tool feels solid on my current Python projects. I'll update this if anything notable surfaces.


Quick-start CLAUDE.md for a Python package

If you're starting from scratch, here's the minimal version:

```markdown
# [Package Name]

## Test command
pytest tests/ -x

## Lint command
ruff check . && ruff format --check .

## Python version
3.12

## Don't modify
- pyproject.toml (propose changes only)
```

Add more as you learn what Claude Code consistently gets wrong for your project. The CLAUDE.md is a living document — I update mine every few weeks.


That's what six months of daily Python use has taught me. The tool is good. The setup matters more than most people realize.

If you're using Claude Code on Python projects and have patterns that work for you, I'd like to hear them in the comments — especially around testing workflows and larger codebases.

Top comments (1)

klement Gunndu

The one-task-per-session discipline is the biggest win here. Context bleed across tasks was our number one quality killer until we enforced the same boundary.