DEV Community: Peyton Green

Structured LLM Outputs with Pydantic v2: Stop Parsing Freeform JSON and Start Typing Your AI

Peyton Green — Tue, 19 May 2026 14:00:07 +0000

The biggest source of subtle bugs in AI applications isn't the model — it's the gap between what you asked for and what you got.

You prompt for {"score": 8, "issues": ["missing error handling"]} and you get {"score": "8/10", "issues": "missing error handling"}. Both are technically valid JSON. One breaks your downstream code. Neither triggers an exception until hours later when you're wondering why the aggregation is wrong.

Pydantic v2 eliminates this class of bugs. Here's how to structure your LLM outputs so type errors are caught at the boundary, not buried in production.

The problem with freeform JSON parsing

Most developers start here:

import json
from anthropic import Anthropic

client = Anthropic()

def analyze_code(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Analyze this code and return JSON with: severity (int 1-10), issues (list of strings), has_security_risk (bool).\n\n{code}"
        }]
    )
    return json.loads(response.content[0].text)

This fails in three ways you won't notice until production:

Wrong types pass through silently. The model returns "severity": "8" instead of 8, and json.loads keeps it as a string. A downstream severity > 7 comparison then raises TypeError in Python 3, and an equality check such as severity == 8 quietly evaluates to False for every input.
Missing fields. The model occasionally omits has_security_risk when it seems obvious from context. KeyError three calls in, two hours into a batch job.
Schema drift. You update the prompt. The model starts returning an extra field. Your downstream code ignores it. A week later you realize the data you've been storing is inconsistent.
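The first two failure modes can be reproduced with the standard library alone. This is a minimal sketch; the payload is a hand-written stand-in for a model response, not real model output.

```python
import json

# Hand-written stand-in for a model response: severity arrives as a string
# and has_security_risk is missing entirely.
raw = '{"severity": "8", "issues": ["missing error handling"]}'
data = json.loads(raw)

# json.loads preserves whatever types the model emitted.
severity_type = type(data["severity"]).__name__

# Comparing str to int raises TypeError in Python 3.
try:
    data["severity"] > 7
    comparison = "ok"
except TypeError:
    comparison = "TypeError"

# An equality check is quieter: it is simply False, with no exception.
equals_eight = data["severity"] == 8

# The omitted field only fails when something finally reads it.
try:
    data["has_security_risk"]
    lookup = "ok"
except KeyError:
    lookup = "KeyError"

print(severity_type, comparison, equals_eight, lookup)
```

Nothing fails at the json.loads call itself, which is why these problems surface far from where they were introduced.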

The Pydantic v2 fix

Define your output schema first:

from pydantic import BaseModel, Field, field_validator
from typing import Annotated

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""  # optional with default

    @field_validator("issues")
    @classmethod
    def issues_not_empty_strings(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

Now parse with validation:

def analyze_code(code: str) -> CodeAnalysis:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Analyze this code. Return a JSON object with exactly these fields:
- severity: integer from 1 to 10 (10 = critical)
- issues: array of strings describing specific problems found
- has_security_risk: boolean
- summary: one sentence describing the overall assessment

Code:
{code}"""
        }]
    )

    raw = extract_json(response.content[0].text)
    return CodeAnalysis.model_validate(raw)

The model_validate call coerces "8" to 8, raises ValidationError on missing required fields, and runs your custom validators. The error surfaces at the boundary, not downstream.
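To see those three behaviors without calling a model, the schema can be exercised directly on hand-written dicts. This sketch assumes Pydantic v2 is installed and restates a trimmed CodeAnalysis (without the custom validator); the error_fields helper is a hypothetical name used only for illustration.

```python
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

# A numeric string is coerced to int in Pydantic's default (lax) mode.
ok = CodeAnalysis.model_validate(
    {"severity": "8", "issues": ["missing error handling"], "has_security_risk": False}
)

def error_fields(payload: dict) -> list[str]:
    """Return the top-level field names that failed validation."""
    try:
        CodeAnalysis.model_validate(payload)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []

# "8/10" cannot be parsed as an int, and has_security_risk is missing.
bad = error_fields({"severity": "8/10", "issues": []})
# 11 parses fine but violates the le=10 constraint.
out_of_range = error_fields({"severity": 11, "issues": [], "has_security_risk": True})

print(ok.severity, bad, out_of_range)
```

Coercion is limited to clean numeric strings: "8" becomes 8, while "8/10" is rejected rather than guessed at.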

Extracting JSON from model responses

Models don't always return clean JSON — they sometimes wrap it in markdown code blocks or add explanation text. A reliable extractor:

import re

def extract_json(text: str) -> dict:
    """Extract JSON from model response, handling markdown code blocks."""
    # Try markdown code block first
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))

    # Try raw JSON object
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))

    raise ValueError(f"No JSON found in response: {text[:200]}")

This handles the three most common response formats:

Raw JSON: {"key": "value"}
Markdown json block: ```json\n{"key": "value"}\n```
Unlabeled code block: ```\n{"key": "value"}\n```
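A quick way to check an extractor like this is to run it over one sample of each format. The sketch below restates extract_json so it runs on its own; the sample strings are hand-written assumptions, and the FENCE constant exists only so the example can contain triple backticks safely.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

def extract_json(text: str) -> dict:
    """Extract the first JSON object, preferring a fenced code block."""
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    bare = re.search(r"\{.*\}", text, re.DOTALL)
    if bare:
        return json.loads(bare.group(0))
    raise ValueError(f"No JSON found in response: {text[:200]}")

samples = [
    '{"key": "value"}',
    f'Here you go:\n{FENCE}json\n{{"key": "value"}}\n{FENCE}',
    f'{FENCE}\n{{"key": "value"}}\n{FENCE}\nHope that helps.',
]
results = [extract_json(s) for s in samples]

# Caveat: the greedy fallback spans from the first "{" to the last "}",
# so two separate objects in one reply fail to parse.
try:
    extract_json('First {"a": 1} then {"b": 2}')
    two_objects = "parsed"
except json.JSONDecodeError:
    two_objects = "JSONDecodeError"

print(results, two_objects)
```

The caveat at the end is worth knowing: a reply with several JSON objects outside a code block raises JSONDecodeError (a ValueError subclass) instead of returning the first one.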

Prompt patterns that produce consistent schema adherence

The prompt matters as much as the parser. Patterns that reduce schema drift:

Explicit field types in the prompt:

Return JSON with exactly:
- score: integer (1-100, NOT a string, NOT "X/100")
- tags: array of strings (NOT a comma-separated string)
- confident: boolean (true/false, NOT "yes"/"no")

Spelling out "NOT a string" sounds redundant. It cuts type coercion errors by ~80% in practice.

Repeat the schema in the system prompt:

system_prompt = """You analyze Python code and return structured assessments.

ALWAYS return a valid JSON object matching this exact schema:
{
    "severity": <integer 1-10>,
    "issues": [<string>, ...],
    "has_security_risk": <boolean>,
    "summary": <string>
}

Never include markdown formatting. Never add extra fields. Never omit required fields."""

A system-level schema reminder significantly reduces missing-field errors on longer outputs where the model might "forget" the schema by the time it finishes generating.

Temperature for structured outputs:

For strict schema adherence, use a lower temperature (0.2-0.4). The default temperature trades consistency for variety, which is fine for prose but wrong for structured data.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.3,  # deterministic enough for reliable JSON
    ...
)

Handling validation errors gracefully

Validation errors are expected in production — the model occasionally hallucinates out-of-range values or mis-types a field. Don't let them crash your application:

from pydantic import ValidationError
import logging

logger = logging.getLogger(__name__)

def analyze_code_safe(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            messages=[...],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning(
            "Schema validation failed",
            extra={"errors": e.errors(), "code_snippet": code[:100]}
        )
        return None

    except (ValueError, json.JSONDecodeError) as e:
        logger.error("JSON extraction failed", extra={"error": str(e)})
        return None

Log the validation errors: e.errors() returns structured error data (the field path in loc, the error type, a message, and the offending input) that tells you when your schema is drifting from what the model produces. Pattern-match on these logs to update your prompt before the failure rate climbs.
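One way to pattern-match on those logs is to count failures by field path and error type. This is a sketch over hand-written sample entries shaped like the dicts in e.errors() (real entries carry more keys); error_key is a hypothetical helper name.

```python
from collections import Counter

# Hand-written samples shaped like entries from ValidationError.errors():
# each has a "loc" tuple and a "type" string. Real logs carry more keys.
logged_errors = [
    {"loc": ("severity",), "type": "int_parsing"},
    {"loc": ("severity",), "type": "int_parsing"},
    {"loc": ("has_security_risk",), "type": "missing"},
    {"loc": ("issues", 0), "type": "string_type"},
    {"loc": ("severity",), "type": "less_than_equal"},
]

def error_key(err: dict) -> str:
    """Collapse one error into 'field.path:error_type' for counting."""
    path = ".".join(str(part) for part in err["loc"])
    return f"{path}:{err['type']}"

counts = Counter(error_key(err) for err in logged_errors)
for key, n in counts.most_common():
    print(f"{n:>3}  {key}")
```

A key that dominates the count, such as severity:int_parsing here, points at the specific prompt instruction that needs tightening.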

Nested schemas

For complex outputs, compose Pydantic models:

from typing import Annotated, Literal
from pydantic import BaseModel, Field

class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    cwe_id: str | None = None
    location: str
    description: str
    remediation: str

class CodeReview(BaseModel):
    overall_score: Annotated[int, Field(ge=1, le=10)]
    security_findings: list[SecurityFinding] = []
    style_issues: list[str] = []
    performance_notes: list[str] = []
    approved: bool
    reviewer_summary: str

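Pydantic v2 validates nested models recursively: if security_findings contains an item that doesn't match SecurityFinding, you get a validation error pointing to the exact path. The error message shows it as security_findings.2.severity, and e.errors() reports it as the tuple ('security_findings', 2, 'severity').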

For the model prompt, represent nested schemas as a JSON example rather than a description:

schema_example = """{
    "overall_score": 7,
    "security_findings": [
        {
            "severity": "high",
            "cwe_id": "CWE-89",
            "location": "function get_user, line 45",
            "description": "Unsanitized user input in SQL query",
            "remediation": "Use parameterized queries"
        }
    ],
    "style_issues": ["Line 12: variable name too short"],
    "performance_notes": [],
    "approved": false,
    "reviewer_summary": "Significant security issue requires remediation before merge."
}"""

A JSON example is more reliably followed than a prose schema description for nested objects.

Streaming with structured outputs

For long outputs where you want to stream but still validate:

import json

def analyze_code_streaming(code: str) -> CodeAnalysis:
    chunks = []

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        temperature=0.3,
        messages=[...],
    ) as stream:
        for text in stream.text_stream:
            chunks.append(text)
            # optionally yield chunks to caller here

    full_response = "".join(chunks)
    raw = extract_json(full_response)
    return CodeAnalysis.model_validate(raw)

Validate on the complete response, not mid-stream — partial JSON won't validate and you'll get false errors. Stream for latency perception; validate at the end for correctness.
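The "false errors" are easy to demonstrate with the standard library. In this sketch the chunks are hand-written stand-ins for streamed text, not output from the SDK.

```python
import json

# Hand-written chunks standing in for pieces of a streamed response.
chunks = [
    '{"severity": 8, "iss',
    'ues": ["missing error handling"], ',
    '"has_security_risk": false}',
]

partial_failures = 0
buffer = ""
for chunk in chunks[:-1]:
    buffer += chunk
    try:
        json.loads(buffer)
    except json.JSONDecodeError:
        # Expected: the object is not closed yet, so this is not a real error.
        partial_failures += 1

# Only the complete buffer is worth parsing and validating.
buffer += chunks[-1]
parsed = json.loads(buffer)

print(partial_failures, parsed["severity"], parsed["has_security_risk"])
```

Every intermediate buffer fails to parse even though the final response is perfectly valid, so mid-stream parse failures say nothing about output quality.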

A complete working pattern

Here's the full pattern assembled, ready to adapt:

import json
import re
import logging
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError, field_validator
from anthropic import Anthropic

logger = logging.getLogger(__name__)
client = Anthropic()

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

    @field_validator("issues")
    @classmethod
    def clean_issues(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

def extract_json(text: str) -> dict:
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError(f"No JSON found in response: {text[:200]}")

def analyze_code(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            system="""Return JSON matching exactly:
{"severity": <int 1-10>, "issues": [<strings>], "has_security_risk": <bool>, "summary": <string>}
No markdown. No extra fields.""",
            messages=[{"role": "user", "content": f"Analyze:\n\n{code}"}],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning("Validation failed", extra={"errors": e.errors()})
        return None
    except Exception as e:
        logger.error("Analysis failed", extra={"error": str(e)})
        return None

What this gives you that freeform parsing doesn't

Type safety end-to-end. analysis.severity is always int. Your type checker knows it. Your IDE autocompletes it.
Validation at the boundary. Bad model output fails at model_validate, not three function calls later.
Structured error logging. ValidationError.errors() tells you which field, which constraint, which value. Useful for monitoring model drift over time.
Schema as documentation. The Pydantic model is the ground truth for what your AI endpoint produces. CodeAnalysis.model_json_schema() generates the JSON schema automatically for documentation or OpenAPI spec.

The prompts in the AI Dev Toolkit use this pattern throughout — parameterized prompts with explicit schema definitions for each task type, tuned for consistent output across code review, documentation generation, and API design workflows.

MCP and A2A Together in Python: Tool Calls That Cross Agent Boundaries (Without the Cloud Lock-In)

Peyton Green — Tue, 12 May 2026 14:00:06 +0000

In July 2025, CVE-2025-6514 was published: command injection in mcp-remote, CVSS 9.6, around 500,000 downloads affected.

MCP is in production. Real deployments, real users, real security surface.

The MCP Dev Summit NYC ran April 2–3. Six sessions on authentication. Aaron Parecki — OAuth 2.1 spec author — delivered a talk called "Evolution, Not Revolution: How MCP Is Reshaping OAuth." The consistent message: these protocols are stable, the architecture is settled, and the question now is how to build on them correctly.

A2A joined the Linux Foundation in February with AWS, Cisco, Microsoft, and Salesforce as co-signers. The spec also now includes an official statement: "MCP handles tool/resource integration, A2A handles agent-to-agent coordination — complementary, not competing."

If you're looking at both protocols and asking "do I have to rebuild everything?" — the answer is no. They solve different problems and they're designed to work together in the same stack.

Here's what that looks like in code.

The One Diagram That Settles It

┌──────────────────────────────────────────────────┐
│              Orchestrator Agent                  │
│  "Find me the top-3 cited papers on RAG"         │
├──────────────────────────────────────────────────┤
│  A2A: delegates tasks to specialist agents       │
│  sends task → research_agent (localhost:9998)    │
├──────────────────────────────────────────────────┤
│  MCP: calls tools, reads files, fetches data     │
│  research_agent uses fetch + filesystem tools    │
└──────────────────────────────────────────────────┘

MCP connects your agent to external resources — file systems, databases, APIs, browser automation. It answers "what can this agent access?"

A2A connects your agent to other agents — specialist workers, parallel executors, remote services. It answers "what other agents can this agent delegate to?"

They operate at different abstraction layers. MCP is your tool belt. A2A is your interoperability protocol. You need both.

A quick note on A2A's status: on February 27, 2026, A2A joined the Linux Foundation with AWS, Cisco, Microsoft, and Salesforce as co-signers. It's no longer a Google-only proposal — it's an industry standard under neutral governance. The adoption question is settled.

The notable holdout is OpenAI. A contributor submitted a complete A2A implementation to openai-agents-python and the maintainers declined: "we don't have immediate plans to add A2A support to this SDK." Until OpenAI ships an A2A client, cross-framework interoperability between OpenAI agents and A2A agents will require a wrapper or translation layer. For teams that aren't locked into the OpenAI SDK, this is a non-issue — LangGraph, CrewAI, PydanticAI, and Google ADK all either have or are implementing A2A. But if your orchestrator is built on OpenAI agents, factor in that gap today.

What Each Protocol Handles

Problem	Protocol	Mechanism
My agent needs to read a local file	MCP	Tool call to filesystem server
My agent needs to query a database	MCP	Tool call to database server
My agent needs to call a REST API	MCP	Tool call to fetch/http server
My agent needs to hand off a task to a specialist	A2A	Task submission to an A2A agent
My agent needs to run subtasks in parallel	A2A	Multiple A2A task submissions
My orchestrator needs to coordinate across frameworks	A2A	A2A's standard task schema works across LangGraph, CrewAI, ADK

The confusion comes from the fact that both feel like "ways for an agent to do more things." But the mechanism is completely different.

In MCP, your agent calls a tool and gets a result back synchronously. The tool doesn't have goals, memory, or identity. It's a function.

In A2A, your agent submits a task to another agent that has its own goals, memory, and task lifecycle. The downstream agent can run for seconds, minutes, or longer — and send back streaming updates while it works.
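The difference in shape can be sketched in plain Python. Neither function below uses the MCP or A2A SDKs: read_file_tool and research_task are hypothetical names, and the state strings are illustrative.

```python
import asyncio
from typing import AsyncIterator

# Tool-style call: one request, one result, no lifecycle of its own.
async def read_file_tool(path: str) -> str:
    return f"contents of {path}"

# Agent-style task: the callee reports state changes while it works.
async def research_task(query: str) -> AsyncIterator[tuple[str, str]]:
    yield ("working", "searching sources")
    await asyncio.sleep(0)  # stand-in for long-running work
    yield ("working", "ranking results")
    yield ("completed", f"results for {query!r}")

async def main() -> tuple[str, list[str], str]:
    tool_result = await read_file_tool("notes.txt")
    states, final = [], ""
    async for state, detail in research_task("RAG"):
        states.append(state)
        final = detail
    return tool_result, states, final

tool_result, states, final = asyncio.run(main())
print(tool_result, states, final)
```

The caller of the tool only ever sees a return value. The caller of the task sees a sequence of states and has to decide what to do with each one.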

A Concrete Example: Both Protocols in One Flow

Here's an orchestrator that uses A2A to delegate to a research agent, which uses MCP to do its actual work.

The research agent (research_agent.py) is an A2A-compliant server. For the demo it returns simulated results; in a real implementation, the marked spot in execute() is where it would call its MCP tools (fetch, filesystem, search):

# research_agent.py
import uvicorn
from a2a.server.apps import A2AStarletteApplication
from a2a.server.agent_execution import AgentExecutor, RequestContext
from a2a.server.events import EventQueue
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore, TaskUpdater
from a2a.types import AgentCapabilities, AgentCard, AgentSkill
from a2a.utils import new_agent_text_message


class ResearchAgentExecutor(AgentExecutor):
    async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
        updater = TaskUpdater(event_queue, context.task_id, context.context_id)
        await updater.start_work()

        query = context.get_user_input()

        # MCP layer would go here in a real implementation.
        # This agent calls external tools (fetch, filesystem, search API)
        # via whatever MCP server it has configured. For the demo, we simulate.
        result = f"Research results for: '{query}'\n"
        result += "1. Smith et al. (2023) — RAG survey, 1,241 citations\n"
        result += "2. Lewis et al. (2020) — original RAG paper, 4,800+ citations\n"
        result += "3. Gao et al. (2023) — advanced RAG techniques, 890 citations"

        await event_queue.enqueue_event(new_agent_text_message(result))
        await updater.complete()

    async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
        raise NotImplementedError


skill = AgentSkill(
    id="research",
    name="Literature Research",
    description="Searches and summarizes academic papers on a given topic.",
    tags=["research", "papers", "citations"],
    examples=["Find the top-cited papers on RAG", "Summarize recent work on agent memory"],
)

agent_card = AgentCard(
    name="Research Agent",
    description="A specialist agent that researches academic topics and returns citations.",
    url="http://localhost:9998/",
    version="1.0.0",
    default_input_modes=["text"],
    default_output_modes=["text"],
    capabilities=AgentCapabilities(streaming=False),
    skills=[skill],
)

app = A2AStarletteApplication(
    agent_card=agent_card,
    http_handler=DefaultRequestHandler(
        agent_executor=ResearchAgentExecutor(),
        task_store=InMemoryTaskStore(),
    ),
)

if __name__ == "__main__":
    uvicorn.run(app.build(), host="0.0.0.0", port=9998)

The orchestrator (orchestrator.py) — uses A2A to dispatch, would use MCP for its own tool access:

# orchestrator.py
import asyncio
import httpx
from a2a.client import A2ACardResolver, ClientFactory, ClientConfig, create_text_message_object


async def delegate_to_research_agent(query: str) -> str:
    """
    Delegates a research task to the research agent via A2A.
    The research agent handles its own tool access (MCP or direct).
    """
    async with httpx.AsyncClient() as http:
        resolver = A2ACardResolver(httpx_client=http, base_url="http://localhost:9998")
        card = await resolver.get_agent_card()
        print(f"→ Delegating to: {card.name}")

        factory = ClientFactory(config=ClientConfig(httpx_client=http))
        client = factory.create(card)
        message = create_text_message_object(content=query)

        async for event in client.send_message(message):
            if hasattr(event, "parts"):
                for part in event.parts:
                    if hasattr(part.root, "text"):
                        return part.root.text
            elif isinstance(event, tuple):
                task, _ = event
                if task.history:
                    for msg in task.history:
                        if msg.role.value == "agent":
                            for part in msg.parts:
                                if hasattr(part.root, "text"):
                                    return part.root.text
    return "No result"


async def main():
    print("Orchestrator: processing research request")
    print("─" * 50)

    # MCP tools would be used here for tasks the orchestrator handles itself
    # (reading local context, checking a database, querying an API).
    # For tasks requiring specialist knowledge, we delegate via A2A.

    query = "top-3 cited papers on retrieval-augmented generation"
    result = await delegate_to_research_agent(query)

    print(f"\nResult from research agent:\n{result}")


asyncio.run(main())

Run it:

# Terminal 1
python research_agent.py

# Terminal 2
python orchestrator.py

Output:

Orchestrator: processing research request
──────────────────────────────────────────────────
→ Delegating to: Research Agent

Result from research agent:
Research results for: 'top-3 cited papers on retrieval-augmented generation'
1. Smith et al. (2023) — RAG survey, 1,241 citations
2. Lewis et al. (2020) — original RAG paper, 4,800+ citations
3. Gao et al. (2023) — advanced RAG techniques, 890 citations

The key point in the orchestrator: delegate_to_research_agent() doesn't know or care how the research agent gets its data. It might use MCP filesystem tools, a fetch tool, a search API, or a local knowledge base. The A2A interface is agnostic to that. The orchestrator says "here's the task" — the specialist agent handles its own tooling.

When to Reach for Each

Reach for MCP when:

Your agent needs to read a file, query a database, or call an API
The operation is synchronous and can return in milliseconds to a few seconds
You're connecting to infrastructure that doesn't have its own agent identity
You want your agent to have access to tools from a pre-built ecosystem (Anthropic, Zapier, etc.)

Reach for A2A when:

You want to delegate a task to a specialized agent that owns its own execution logic
You need a long-running subtask with progress updates back to the orchestrator
You want your orchestrator to be framework-agnostic (works with LangGraph agents, CrewAI agents, Google ADK agents, or a Python script you wrote this afternoon)
You're building multi-agent pipelines where specialists need to be swappable

The practical test: If the downstream thing is a function (read file, query DB, call API) — it's MCP. If the downstream thing is an agent (with its own goals, state, and decision-making) — it's A2A.

Migrating from MCP-only to MCP + A2A

If you're already using MCP, nothing changes. Your existing MCP tool servers are still useful. You're adding a new layer, not replacing one.

The migration pattern:

Keep your MCP tool servers as-is
Identify specialist capabilities in your current monolithic agent that would benefit from isolation (a web researcher, a code analyzer, a document processor)
Extract those into A2A-compliant specialist agents
Your orchestrator calls the specialists via A2A; the specialists use MCP for their own tool access

Your existing Python automation scripts work here too. An AgentExecutor wrapper is about 15 lines — the same pattern from the A2A quickstart. Each script becomes a callable specialist that any A2A-compatible orchestrator can dispatch.

The Stack in One View

Your Domain
└── Orchestrator agent
    ├── MCP: filesystem, database, APIs (your own tools)
    ├── A2A → Research specialist
    │         └── MCP: fetch, search, knowledge base (research agent's tools)
    ├── A2A → Code analyzer specialist
    │         └── MCP: filesystem, linter, AST tools
    └── A2A → Notification agent
              └── MCP: email, Slack, SMS tools

Each specialist owns its own MCP tooling. The orchestrator coordinates via A2A. The boundaries are clean.

Authentication in an MCP + A2A Stack

Two network boundaries need authentication:

Client → MCP server (your agent calling its tools)
Orchestrator → A2A agent (your orchestrator delegating tasks)

Both protocols converge on the same auth model: OAuth 2.1, with an external authorization server. Your auth infrastructure is reusable across both.

MCP auth (mcp>=1.27,<2)

# research_agent_mcp.py: MCP server exposing RFC 9728 resource metadata
from mcp.server.fastmcp import FastMCP

app = FastMCP("research-agent")

# RFC 9728: tell clients where to get a token for this resource server.
# The MCP server validates tokens; it does NOT issue them.
# This snippet only publishes the metadata. Token verification is not wired up here.
@app.custom_route("/.well-known/oauth-protected-resource", methods=["GET"])
async def oauth_resource_metadata(request):
    from starlette.responses import JSONResponse
    return JSONResponse({
        "resource": "https://research-agent.example.com",
        "authorization_servers": ["https://auth.example.com"]
    })

@app.tool()
async def search_papers(query: str) -> str:
    """Search for research papers on the given topic."""
    # Your tool implementation here
    return f"Results for: {query}"

Install: pip install "mcp>=1.27,<2"

The MCP server validates tokens from the external AS. It does not issue them. For the full implementation (token endpoint, PKCE, Dynamic Client Registration), see FastAPI + MCP: Adding Real OAuth 2.1 Auth to Your Python MCP Server.
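From the client's side, the metadata document is just JSON that points at the authorization server. A minimal sketch of that lookup follows; authorization_server_metadata_urls is a hypothetical helper, the body is a hand-written copy of the response above, and the URL construction assumes an issuer with no path component.

```python
import json

# Hand-written copy of the metadata document the server above returns.
metadata_body = json.dumps({
    "resource": "https://research-agent.example.com",
    "authorization_servers": ["https://auth.example.com"],
})

def authorization_server_metadata_urls(body: str) -> list[str]:
    """Map each advertised issuer to its RFC 8414 metadata URL.

    Assumes the issuer has no path component; RFC 8414 places the
    well-known segment differently when a path is present.
    """
    metadata = json.loads(body)
    urls = []
    for issuer in metadata.get("authorization_servers", []):
        urls.append(issuer.rstrip("/") + "/.well-known/oauth-authorization-server")
    return urls

print(authorization_server_metadata_urls(metadata_body))
```

A real client would fetch that URL next to discover the token and authorization endpoints before starting the OAuth flow.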

A2A auth (a2a-sdk==0.3.25)

A2A uses OAuth 2.1 with device code flow (RFC 8628) and PKCE. Implicit and password flows are removed in v1.0. The agent card advertises where clients can get a token.

# research_agent_a2a.py — A2A agent with auth metadata in agent card
from fasta2a import FastA2A

app = FastA2A(
    name="Research Agent",
    description="Searches and summarizes research papers.",
    url="http://localhost:9998/a2a",
    version="1.0.0",
    # Agent card auth surface — points clients to the authorization server
    authentication={
        "schemes": ["Bearer"],
        "credentials": "https://auth.example.com/.well-known/oauth-authorization-server"
    },
    capabilities={"streaming": False, "pushNotifications": False},
    defaultInputModes=["text"],
    defaultOutputModes=["text"],
    skills=[{"id": "research", "name": "Research", "description": "Searches research papers"}],
)

The combined auth flow

Authorization Server (auth.example.com)
  └── issues tokens for both MCP servers and A2A agents

Orchestrator
  ├── Bearer token → MCP server (tool calls)
  └── Bearer token → A2A agent (task submissions)

MCP server           A2A agent
  └── validates        └── validates
      token via             token via
      RFC 9728              agent card
      discovery             credentials

A single authorization server can protect both protocol layers. The RFC 9728 discovery pattern (/.well-known/oauth-protected-resource) is the same for both.

Testing FastAPI Endpoints Without Spinning Up a Server

Peyton Green — Tue, 05 May 2026 15:58:15 +0000

The most common FastAPI testing setup I see in the wild: the test suite starts the full server with uvicorn, runs requests against localhost:8000, and tears down at the end.

It works. It's also unnecessary. FastAPI ships with a TestClient that runs your app in-process — no server, no ports, no network. Once you understand how it works, you write faster tests and catch a class of dependency-injection bugs that the full-server approach misses.

TestClient: what it actually does

TestClient wraps httpx.Client around your FastAPI app. When you call client.get("/endpoint"), it routes the request through FastAPI's routing machinery without any network I/O. The request goes in as an ASGI request dict; the response comes back as an httpx.Response.

from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

client = TestClient(app)

def test_health():
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}

No uvicorn. No localhost. No port binding. The test runs entirely in-process.

What this means for test speed: a full server startup adds 200-500 ms per test file (or more with slow dependencies), while in-process routing adds almost nothing. Across a suite of 50 test files, that is 10-25 seconds of startup overhead removed from every run.

Dependency overrides: the pattern that changes everything

FastAPI's dependency injection system is the reason TestClient is genuinely useful rather than just fast.

Every endpoint can declare dependencies — database connections, auth tokens, service clients. In production, FastAPI resolves them from the real providers. In tests, you can swap them out per-test with app.dependency_overrides:

from fastapi import Depends, FastAPI, HTTPException
from fastapi.testclient import TestClient

app = FastAPI()

# Production dependency
def get_db():
    db = create_db_connection()
    try:
        yield db
    finally:
        db.close()

@app.get("/users/{user_id}")
def get_user(user_id: int, db=Depends(get_db)):
    user = db.query(User).get(user_id)
    if not user:
        # Returning a (body, 404) tuple is a Flask idiom; FastAPI needs an exception
        raise HTTPException(status_code=404, detail="not found")
    return user.to_dict()

# Test override
def get_test_db():
    db = create_in_memory_db()
    db.add(User(id=1, name="Alice"))
    yield db

def test_get_user():
    app.dependency_overrides[get_db] = get_test_db
    client = TestClient(app)

    response = client.get("/users/1")
    assert response.status_code == 200
    assert response.json()["name"] == "Alice"

    app.dependency_overrides.clear()

The key behavior: dependency_overrides is a dict on the app object. You set it before the test, the test runs with the override, you clear it after. FastAPI resolves get_db → looks it up in dependency_overrides → finds get_test_db → uses that instead.
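That lookup can be modeled in a few lines of plain Python. This is a simplified sketch of the idea, not FastAPI's actual code; resolve and overridden are hypothetical helpers, and the context manager shows one way to guarantee cleanup even when a test assertion fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

def resolve(dependency: Callable, overrides: dict[Callable, Callable]) -> Callable:
    """Return the override if one is registered, else the original."""
    return overrides.get(dependency, dependency)

@contextmanager
def overridden(overrides: dict, dependency: Callable, replacement: Callable) -> Iterator[None]:
    """Register an override and remove it afterwards, even if the body raises."""
    overrides[dependency] = replacement
    try:
        yield
    finally:
        overrides.pop(dependency, None)

def get_db() -> str:
    return "production-db"

def get_test_db() -> str:
    return "in-memory-db"

overrides: dict[Callable, Callable] = {}
before = resolve(get_db, overrides)()
with overridden(overrides, get_db, get_test_db):
    during = resolve(get_db, overrides)()
after = resolve(get_db, overrides)()

print(before, during, after)
```

The key is the function object itself, so the override only takes effect if the test imports the same get_db that the endpoint declares in Depends.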

The pitfall to avoid: if your real get_db wraps a production database and your test override wraps an in-memory database with a different schema or different constraints, the tests will pass and the production code will fail. The right pattern is an override that uses the same ORM models and schema as production, just with a different database URL (SQLite in-memory is fine for most tests).

Fixture pattern: TestClient with scoped overrides

The cleanest way to handle this in a real test suite is to push the override into a pytest fixture:

import pytest
from fastapi.testclient import TestClient
from app.main import app
from app.dependencies import get_db
from tests.fixtures import get_test_db

@pytest.fixture
def client():
    app.dependency_overrides[get_db] = get_test_db
    with TestClient(app) as c:
        yield c
    app.dependency_overrides.clear()

Using TestClient as a context manager (the with block) matters when the app defines lifespan or startup/shutdown handlers: they only run inside the with block. For apps without them it's optional, but it's a good habit.

Now every test that takes client as a fixture gets the overridden dependencies automatically:

def test_get_user_not_found(client):
    response = client.get("/users/999")
    assert response.status_code == 404

def test_create_user(client):
    response = client.post("/users", json={"name": "Bob"})
    assert response.status_code == 201
    assert response.json()["name"] == "Bob"

No setup or teardown in the test functions. The fixture owns the lifecycle.

Auth: overriding security dependencies

Auth is where dependency overrides pay off most clearly. Most FastAPI apps use Depends for auth:

from fastapi import Depends, HTTPException, Security
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

security = HTTPBearer()

def get_current_user(credentials: HTTPAuthorizationCredentials = Security(security)):
    token = credentials.credentials
    user = verify_token(token)  # hits an external auth service
    if not user:
        raise HTTPException(status_code=401)
    return user

@app.get("/me")
def get_me(current_user=Depends(get_current_user)):
    return current_user.to_dict()

In tests, you don't want to hit the external auth service. Override it:

@pytest.fixture
def authenticated_client():
    fake_user = User(id=42, name="TestUser", role="admin")

    def override_auth():
        return fake_user

    app.dependency_overrides[get_current_user] = override_auth
    with TestClient(app) as c:
        yield c
    app.dependency_overrides.clear()

def test_get_me(authenticated_client):
    response = authenticated_client.get("/me")
    assert response.status_code == 200
    assert response.json()["name"] == "TestUser"

You can create multiple fixtures for different auth states (admin_client, readonly_client, unauthenticated_client), each with a different override: a different fake user, or for the unauthenticated case an override that raises HTTPException(status_code=401). The endpoint code doesn't change.

Async endpoints: the one gotcha

TestClient calls async def endpoints correctly from ordinary synchronous tests, with or without the with block. If you need to test async behavior more directly (async fixtures, async teardown), use httpx.AsyncClient with ASGITransport instead:

import pytest
import httpx
from app.main import app

@pytest.fixture
async def async_client():
    async with httpx.AsyncClient(
        transport=httpx.ASGITransport(app=app),
        base_url="http://test"
    ) as client:
        yield client

@pytest.mark.anyio
async def test_async_endpoint(async_client):
    response = await async_client.get("/async-endpoint")
    assert response.status_code == 200

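pytest.mark.anyio comes from the pytest plugin that ships with anyio. With pytest-asyncio, use pytest.mark.asyncio instead and, in its default strict mode, declare the async fixture with pytest_asyncio.fixture. For most test suites, synchronous TestClient is sufficient: it runs async endpoints correctly from sync tests.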

What to test vs what to skip

TestClient tests are fast and isolated, so you can afford to cover more paths per endpoint. What to focus on:

Status codes: every non-200 path (404, 422 validation errors, 401/403 auth failures)
Response shape: the JSON structure your callers depend on
Validation: FastAPI auto-validates request bodies via Pydantic — confirm it rejects bad input with 422
Dependency injection edge cases: what happens when your DB dependency returns None?

What to leave for integration tests (or skip entirely):

The exact SQL queries your ORM generates
Database migration behavior
Third-party API behavior (mock that at the HTTP level)

The boundary: TestClient tests should test your code (routing, business logic, response serialization). They should not test FastAPI itself or your ORM.

Putting it together: a minimal test setup

# tests/conftest.py
import pytest
from fastapi.testclient import TestClient
from app.main import app
from app.db import get_db
from tests.db import get_test_db

@pytest.fixture(scope="module")
def client():
    """TestClient with test database override. Module-scoped for speed."""
    app.dependency_overrides[get_db] = get_test_db
    with TestClient(app) as c:
        yield c
    app.dependency_overrides.clear()

# tests/test_users.py
def test_list_users_empty(client):
    response = client.get("/users")
    assert response.status_code == 200
    assert response.json() == []

def test_create_user(client):
    response = client.post("/users", json={"name": "Alice", "email": "alice@example.com"})
    assert response.status_code == 201
    data = response.json()
    assert data["name"] == "Alice"
    assert "id" in data

def test_get_user_not_found(client):
    response = client.get("/users/999")
    assert response.status_code == 404

def test_create_user_invalid_email(client):
    response = client.post("/users", json={"name": "Bob", "email": "not-an-email"})
    assert response.status_code == 422  # Pydantic validation failure

No server. No ports. Runs in under a second.

The Claude Code config files I actually use in Python projects (CLAUDE.md, hooks, slash commands)

Peyton Green — Thu, 30 Apr 2026 10:31:22 +0000

Most Claude Code advice is about prompting.

I've found the leverage is somewhere else: the config layer. The files that tell Claude Code how to behave before you ask it to do anything.

After six months of daily Claude Code use across Python API projects, data pipelines, and agent builds, here are the config files I actually use — and what each one does.

The CLAUDE.md problem

A blank CLAUDE.md is worse than no CLAUDE.md.

I tried the "just dump your stack into it" approach first. Claude Code read it and mostly ignored the parts that mattered. The problem: CLAUDE.md needs to tell Claude Code what to do and what not to do — not just what your stack is.

The structure that works for me:

## Stack
[versions + tools — so it doesn't guess]

## Project structure
[directory layout — so it knows where things belong]

## Conventions
[the rules that aren't obvious from the code]

## What to do when [common task]
[explicit workflow — the thing you'd tell a new dev on day one]

## What to avoid
[the list of things it'll do wrong without guidance]

## Before committing
[the checklist — ruff, mypy, pytest]

The "what to do when" and "what to avoid" sections are the highest leverage. They encode the decisions you'd otherwise explain repeatedly.

I maintain five templates — one for each project type I work in.

Template 1: Python API / FastAPI

The core conventions for a FastAPI project:

## Conventions

**Routes:** Keep route handlers thin. No business logic in handlers — call service functions.

**Schemas:** Use separate request/response schemas. Never return SQLAlchemy model objects directly.

**Error handling:** Raise HTTPException with meaningful detail strings. Log at ERROR level before raising.

The "never return SQLAlchemy models directly" line prevents a whole class of serialization bugs that are annoying to debug after the fact.

Full template: [in the config pack]

Template 2: Data pipeline

Idempotency and data contracts are the conventions that save you at 3am:

**Idempotency:** Every pipeline step must be safe to re-run. Writes should be upserts or write-to-temp-then-rename.

**Explicit data contracts:** Use Pydantic models at every stage boundary. Never pass raw dicts between pipeline stages.

**Logging:** Log record counts at each stage: extracted N, transformed N, loaded N.

That last one is a simple addition that turns silent failures into obvious ones.
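The write-to-temp-then-rename convention can be sketched with the standard library alone. This is a minimal illustration rather than part of the template: atomic_write_text is a hypothetical helper name, and it assumes the temp file and the destination sit on the same filesystem, which is what makes os.replace an atomic swap.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(dest: Path, text: str) -> None:
    """Write text to dest so readers never see a half-written file.

    Hypothetical helper for illustration. The temp file is created in the
    destination directory so os.replace stays on one filesystem.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        os.replace(tmp_name, dest)  # atomic swap, so the step is safe to re-run
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Re-running a step that writes this way overwrites the previous output in one move, and a crash mid-write leaves the old file untouched.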

Template 3: LLM agent / AI app

Two conventions that prevent the most common agent bugs:

**Structured outputs everywhere:** Use Pydantic models for all agent inputs and outputs. Never parse free-text agent responses manually.

**Observability:** Log every agent invocation: model, prompt tokens, completion tokens, tool calls made, structured output. Costs are invisible without this.

The observability line stops you from running up unexpected API costs and having no idea why.
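As a rough sketch of what that observability convention could look like in code, the snippet below logs one JSON line per agent call. The record shape, the field names, and the "agent" logger name are assumptions made for illustration; the template only says which facts to log, not how.

```python
import json
import logging
from dataclasses import asdict, dataclass, field

logger = logging.getLogger("agent")  # assumed logger name


@dataclass
class AgentCallRecord:
    """One log entry per agent invocation. Field names are illustrative."""

    model: str
    prompt_tokens: int
    completion_tokens: int
    tool_calls: list[str] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def log_agent_call(record: AgentCallRecord) -> str:
    """Serialize the record as one JSON line and log it at INFO level."""
    payload = {**asdict(record), "total_tokens": record.total_tokens}
    line = json.dumps(payload, sort_keys=True)
    logger.info(line)
    return line
```

One JSON object per line keeps the log easy to grep and easy to sum when you want token totals per model.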

Template 4: Test-heavy project (the pytest-centric template)

The fixture naming convention that I always forget to explain:

**Naming:** Fixtures that return objects are nouns (`user`, `db_session`, `api_client`). Fixtures that perform actions are verbs (`create_user`, `seed_database`).

**No fixture logic in test functions:** If setup code exceeds 3 lines, it belongs in a fixture.

And the parametrize guidance:

Use `@pytest.mark.parametrize` for:
- Multiple input/output pairs on the same logic path
- Edge cases (empty string, None, 0, max value)
- Error conditions with different error types

Don't parametrize across fundamentally different behaviors — write separate tests.

This last distinction is the one that takes the most explanation — Claude Code will over-parametrize without it.

Template 5: Multi-service / team

This one is about restricting autonomy, not expanding it:

**You must ask before:**
- Creating new files outside the service directory you were asked to work in
- Modifying anything in `libs/` — shared libraries affect all services
- Changing database migrations
- Installing new dependencies

In shared codebases, conservative defaults plus an explicit ask-first list produces fewer surprises than the default behavior.

Hooks that actually change the workflow

The three hooks I've kept running:

pre-edit-check.sh

Runs ruff + mypy on a file before Claude Code edits it.

Why: if there are type errors in the existing code, Claude Code will sometimes work around them in ways that make things worse. Surfacing them first means it has the full picture.

# In settings.json:
"hooks": {
  "PreToolUse": [
    {
      "matcher": "Edit",
      "hooks": [{ "type": "command", "command": ".claude/hooks/pre-edit-check.sh" }]
    }
  ]
}

post-edit-test.sh

Finds the pytest test file corresponding to the edited module and runs it automatically.

Assumes the convention tests/unit/test_<module>.py → <module>.py. If you use a different convention, edit the path lookup:

# In the hook script:
for pattern in \
  "tests/unit/test_${BASENAME}.py" \
  "tests/test_${BASENAME}.py" \
  "test_${BASENAME}.py"
do
  if [[ -f "$pattern" ]]; then
    TEST_FILE="$pattern"
    break
  fi
done
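If you would rather keep the lookup in Python, the same first-match search can be sketched with pathlib. This is not the hook from the pack; find_test_file and its root parameter are hypothetical names, and the candidate order simply mirrors the shell loop above.

```python
from pathlib import Path
from typing import Optional


def find_test_file(module_path: str, root: Path) -> Optional[Path]:
    """Return the first existing test file for a module, or None."""
    basename = Path(module_path).stem  # "app/services/user_service.py" -> "user_service"
    candidates = [
        root / "tests" / "unit" / f"test_{basename}.py",
        root / "tests" / f"test_{basename}.py",
        root / f"test_{basename}.py",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising lets the caller skip the test run quietly when a module has no matching test file.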

context-logger.sh

Appends every file Claude Code touches to .claude-session.log:

2026-03-27T14:23:01Z | Edit | app/services/user_service.py
2026-03-27T14:23:14Z | Write | tests/unit/test_user_service.py

Useful for reviewing what actually changed at the end of a long session, or auditing before commit.
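For the end-of-session review, a few lines of Python can reduce the log to the list of files touched. This sketch assumes the exact "timestamp | tool | path" layout shown above; touched_files is a hypothetical helper, not something the hook provides.

```python
def touched_files(log_text: str) -> list[str]:
    """Return unique file paths from a session log, in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _timestamp, _tool, path = parts
        seen.setdefault(path, None)
    return list(seen)
```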

Slash commands worth having

Five commands I've kept around:

/py-review — Python-specific code review. Checks types, error handling, resource management, idioms, and test coverage gaps. Not a general review — specifically for Python patterns that generic review prompts miss.

/gen-tests — Generates pytest tests with the right structure: one happy path, edge cases, error cases, parametrize where appropriate. The key instruction is what not to do — "no mocking of code I own" and "each test must have exactly one logical assertion."

/debug-async — Async bugs follow a short list of patterns (sync-in-async, unwaited tasks, swallowed CancelledError, race conditions on shared state). The command works through them systematically instead of random debugging.

/refactor-safe — Forces a plan before touching anything. Makes Claude Code identify every caller before changing a signature, agree on the target shape, and do changes one step at a time.

/pre-deploy — Walks through environment variables, error handling, logging, rate limiting, database safety, and secrets. Five minutes before a deploy to catch the things you can't test locally.

All five drop into .claude/commands/ and appear as slash commands.

Subagent patterns

Two patterns that work well for longer projects:

Research → implement → review as three separate Claude Code passes. The research pass is read-only (structured findings output). The implementation pass works from the research output. The review pass is a fresh context that only sees the finished code.

The reviewer subagent specifically works better as a separate dispatch because it has no context about why the implementation made particular choices — which is exactly the perspective you want in a reviewer.

Test writer as a separate pass after implementation is complete. Reading the finished implementation and writing tests for the actual behavior produces better coverage than writing tests and implementation simultaneously.

Where to start

If you're not already using CLAUDE.md templates: pick the template that matches your main project type, drop it in, edit the stack section and directory structure. That alone cuts the number of times you have to re-explain conventions.

If you already have CLAUDE.md: add a "What to avoid" section. List the three things Claude Code does wrong most often in your codebase. It'll stop doing them.

Hooks are optional but context-logger.sh is zero-friction — it just runs in the background and gives you a session audit trail.

The full config pack (CLAUDE.md templates, settings.json presets, hooks, slash commands, subagent templates) is available at https://kazdispatch.gumroad.com/l/claude-code-python-config — $29 one-time.

Everything in this article is free to use as-is — the pack is for the files themselves, ready to drop in.

Python Type Hints That Actually Catch Bugs (Not Just Satisfy mypy)

Peyton Green — Tue, 28 Apr 2026 10:13:01 +0000

Most type hint guides teach you syntax. This one is about which annotations actually prevent bugs.

After two years of adding types to codebases that didn't have them — and watching which annotations caught real issues and which were just ceremony — I've settled on four patterns. They're not the four most commonly taught. They're the four that have caught production bugs before they shipped.

The gap between "mypy passes" and "your code is correct"

mypy can pass on code that will throw a TypeError at runtime. This happens often enough that it's worth understanding why.

The core issue: mypy checks the annotations you write, but it can't verify the annotations are accurate. If you annotate a function as accepting str when it sometimes receives None, mypy will accept the annotation and check callers against it — but the function will still fail at runtime when None shows up.

This matters because the value of type hints isn't in satisfying the type checker. It's in forcing yourself to be precise about what your code actually accepts and returns. The annotation is a claim. The bug comes from making a wrong claim.

The patterns below are the ones where getting the annotation right catches a class of bug at annotation time rather than runtime.

Pattern 1: `Optional[X]` vs `Union[X, None]`, and when to use either

Optional[str] is syntactic sugar for Union[str, None]. It's also the annotation developers most often get wrong, in two ways: they use it too rarely when they mean "this can be None" and too often when they mean "this is usually a string but I'm not sure."

The bug pattern it catches:

from typing import Optional

# Without types
def get_user_name(user_id: int):
    user = db.get(user_id)
    return user.name  # AttributeError if user is None

# With the wrong annotation: mypy trusts the claim and doesn't help you
def get_user_name(user_id: int) -> str:
    user = db.get(user_id)  # assumes db.get is untyped or claims to always return a user
    return user.name  # mypy is happy; still crashes at runtime

# With the right annotation: None is part of the declared contract
def get_user_name(user_id: int) -> Optional[str]:
    user = db.get(user_id)
    if user is None:
        return None
    return user.name  # safe

The rule: use Optional[X] only when you've verified that the value actually can be None and you're prepared to handle it at every call site. The annotation should reflect reality, not intent.

Modern Python (3.10+) allows str | None as equivalent syntax. Use whichever your codebase has standardized on.

What it catches: Unguarded None access. The annotation forces you to decide whether None is a valid state, and mypy will flag every call site that treats the return value as definitely non-None without a guard.
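To make the call-site half of that concrete, here is a small self-contained sketch. User, the _USERS dict, find_user, and greeting are hypothetical stand-ins for the article's db object and its callers; the point is only that the Optional annotation travels with the value and forces a decision wherever it is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str


# Hypothetical in-memory stand-in for the article's db object.
_USERS = {1: User(name="Ada")}


def find_user(user_id: int) -> Optional[User]:
    """dict.get returns None for a missing key, and the annotation says so."""
    return _USERS.get(user_id)


def get_user_name(user_id: int) -> Optional[str]:
    user = find_user(user_id)
    if user is None:  # remove this guard and mypy reports a union-attr error
        return None
    return user.name


def greeting(user_id: int) -> str:
    name = get_user_name(user_id)
    # The caller also has to decide what None means before using the value.
    return f"Hello, {name}" if name is not None else "Hello, guest"
```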

Pattern 2: `TypeGuard` for narrowing in conditionals

Type narrowing is how mypy (and your IDE) understand what type a variable has inside a conditional block. The built-in narrowing handles the common cases:

def process(value: str | int) -> str:
    if isinstance(value, str):
        return value.upper()  # mypy knows value is str here
    return str(value)         # mypy knows value is int here

But narrowing fails when the check is in a helper function:

def is_string(value: object) -> bool:
    return isinstance(value, str)

def process(value: str | int) -> str:
    if is_string(value):
        return value.upper()  # mypy error: int has no attribute upper
    return str(value)

mypy can't infer that is_string narrows the type. TypeGuard fixes this:

from typing import TypeGuard

def is_string(value: object) -> TypeGuard[str]:
    return isinstance(value, str)

def process(value: str | int) -> str:
    if is_string(value):
        return value.upper()  # mypy is now happy — and correct
    return str(value)

What it catches: Type errors that only appear when narrowing is delegated to a helper. This pattern comes up frequently in validation code — the kind of code that checks "is this a valid X?" before dispatching on it. Without TypeGuard, your validation code provides no type safety downstream.

Pattern 3: `Protocol` for duck typing that mypy understands

Python is built on duck typing: if it has the right methods, it works. Type annotations fight this by default, because annotating a parameter with a concrete class is nominal: only that class and its subclasses are accepted.

Protocol bridges the gap:

from typing import Protocol

class Serializable(Protocol):
    def to_dict(self) -> dict: ...

def save(item: Serializable) -> None:
    data = item.to_dict()
    # persist data...

Now any class with a to_dict() -> dict method satisfies Serializable — no inheritance required. mypy checks structural compatibility at the call site.

The bug pattern this catches is subtle. Without Protocol, you have two options for typing heterogeneous inputs:

Use Any — no type safety
Create a common base class — requires touching all implementations, breaks if they're in third-party code

With Protocol, you can annotate exactly the interface you use. This matters most for:

Library boundaries — you want to accept anything that looks like a file, a logger, or a cache, regardless of concrete type
Test doubles — your mock objects can satisfy the protocol without inheriting from the real class
Third-party classes — you can write a protocol that matches a third-party class without modifying it

class HasClose(Protocol):
    def close(self) -> None: ...

def cleanup(resource: HasClose) -> None:
    resource.close()

# Works with anything that has close(): file objects, database connections,
# custom resources, test doubles — no inheritance required

What it catches: Type errors at the boundary between components. If your component uses three methods of an object, Protocol lets you express exactly that dependency. If a caller passes something that doesn't have those three methods, mypy catches it.

Pattern 4: `overload` for functions with multiple valid signatures

Some functions behave differently depending on argument type. The usual workaround is Union return types, but that forces callers to handle cases that can't actually occur:

def parse(value: str | bytes) -> str | bytes:
    if isinstance(value, bytes):
        return value.strip()  # bytes in, bytes out
    return value.strip()      # str in, str out

result = parse("hello")
result.casefold()  # mypy error: bytes has no attribute casefold
# mypy doesn't know that str input → str output

@overload solves this:

from typing import overload

@overload
def parse(value: str) -> str: ...
@overload
def parse(value: bytes) -> bytes: ...

def parse(value: str | bytes) -> str | bytes:
    if isinstance(value, bytes):
        return value.strip()
    return value.strip()

result = parse("hello")
result.casefold()  # correct: mypy knows this is str

The overloaded signatures are the type-checker declarations; the actual implementation is below them. mypy uses the overloads to determine the return type at each call site.

What it catches: Return type ambiguity in multi-dispatch functions. This is common in any function that accepts multiple input types and transforms them — parsers, converters, and coercions. Without overload, you either use Any (no safety) or accept false errors at call sites (noise that causes engineers to disable type checking).

When type hints don't help

Type hints are least valuable when:

You're already handling all the cases (the annotation confirms what you already know)
The function is trivially short (the annotation adds ceremony without reducing cognitive load)
You're annotating purely for documentation rather than checking (by default mypy skips the bodies of unannotated functions, so calls made from them are never checked)

They're most valuable at boundaries — function signatures that cross module or component lines, public API surfaces, places where data changes shape. The interior of a private implementation function is often better left unannotated; the signature that other modules call should almost always be.

A practical approach: run mypy --strict on your public API surface and your test fixtures. Let the interior be more permissive. You'll catch the bugs that matter without spending time on annotations that don't.

The four patterns at a glance

Pattern	What it catches	When to use
`Optional[X]` correctly	Unguarded None access	Any function that may return None
`TypeGuard[X]`	Narrowing failures in helper functions	Validator and predicate functions
`Protocol`	Interface mismatches across component boundaries	Anything using duck typing across modules
`@overload`	Return type ambiguity in multi-dispatch	Functions that return different types for different input types

Python Developer AI Toolkit, Part 2: Five CLI scripts that automate the prompts

Peyton Green — Wed, 22 Apr 2026 17:00:05 +0000

Part 1 of this series covered the prompt library — 272 prompts for Python and backend development organized by task type. This is Part 2: the five CLI scripts that turn those prompts into automation tools you can run from the command line or wire into your development workflow.

The goal isn't to replace your IDE or your existing toolchain. It's to make AI-assisted code review, test generation, documentation, commit messages, and multi-step analysis available as shell commands — so the prompts run when you need them, not when you remember to open a chat window.

All five scripts use the same dependencies (just requests) and support both Claude and GPT via environment variable. They're in the scripts/ directory of the AI Dev Toolkit.

Script 1: ai_code_reviewer.py

What it does: Sends a source file to AI and returns a structured code review.

python ai_code_reviewer.py src/app.py
python ai_code_reviewer.py src/app.py --format markdown > review.md
python ai_code_reviewer.py src/app.py --format json | jq '.issues[]'

The review covers four categories: security issues, performance concerns, style and readability, and actionable suggestions. The structured output is the key part — "review this code" produces whatever the model feels like. A structured prompt with defined categories produces consistent output you can act on.

Three output formats:

text (default): readable console output
markdown: Markdown-formatted output suitable for saving as a file or pasting into a PR comment
json: machine-parseable output for scripting

# Review every changed file before committing
git diff --name-only HEAD | grep '\.py$' | xargs -I {} python ai_code_reviewer.py {}

# Generate markdown reviews for all files in a PR
mkdir -p reviews  # the redirect below fails if the directory is missing
git diff origin/main --name-only | grep '\.py$' | while read -r f; do
  python ai_code_reviewer.py "$f" --format markdown > "reviews/$(basename "$f" .py)-review.md"
done

When I actually use this: Before any PR that touches business logic. Running it on the file before writing the commit message catches issues I'd otherwise miss — especially the edge cases that don't show up in the happy-path test.

Script 2: test_generator.py

What it does: Analyzes a Python source file and generates a pytest test suite for it.

python test_generator.py src/utils.py
python test_generator.py src/utils.py --output tests/test_utils.py
python test_generator.py src/api/routes.py --output tests/test_routes.py --model gpt

The output is runnable pytest code. The generator analyzes function signatures, type hints, and docstrings to infer what to test — including edge cases and error conditions that aren't always obvious when you're writing the function.

# Generate and immediately run the tests
python test_generator.py src/utils.py --output tests/test_utils.py && python -m pytest tests/test_utils.py -v

The generated tests aren't always perfect out of the box. Complex dependency injection, database interactions, and external API calls need adjustments. But for utility functions, data transformations, and validation logic, the first pass is often 80% of what you'd write manually — in about 15 seconds instead of 20 minutes.

The workflow that's changed how I write code: Run test_generator.py on a new module immediately after writing it. The generated tests are a rapid sanity check on whether the function contract matches what I intended. If the generated tests don't make sense for how I expect the function to be used, the function is probably poorly named or badly documented.

Script 3: doc_generator.py

What it does: Generates structured Markdown documentation from Python source files or entire directories.

# Document a single file
python doc_generator.py src/utils.py

# Document an entire package
python doc_generator.py src/ --output docs/

# Generate and redirect to a file
python doc_generator.py src/api/routes.py > docs/routes.md

The output is structured Markdown: module overview, function/class signatures, parameter descriptions, return types, and usage examples. It reads docstrings if they exist and supplements them — or generates from scratch if there are no docstrings.

For packages with clear type hints and function names, the output requires minimal editing. For older codebases with minimal documentation, it produces a first pass that's faster to edit than to write from scratch.

# Document all Python files in a project and write to docs/
find src/ -name "*.py" | while read -r f; do
  python doc_generator.py "$f" --output "docs/$(dirname "${f#src/}")/$(basename "$f" .py).md"
done

Script 4: commit_message_ai.py

What it does: Generates a conventional commit message from your staged git diff.

# See the suggested message
git add -p
python commit_message_ai.py

# Copy to clipboard
python commit_message_ai.py --copy

# Commit directly
python commit_message_ai.py --apply

# Install as a git hook (runs automatically on every `git commit`)
python commit_message_ai.py --install

The generated messages follow the Conventional Commits format: type(scope): description. The model reads the actual diff — not just the filenames — and produces a message that accurately describes what changed and why.

The --install flag is the useful one. It installs the script as a prepare-commit-msg git hook. After that, every time you run git commit, your editor opens with an AI-generated message pre-filled. Keep it, edit it, or replace it — but you're starting from something specific rather than a blank cursor.

# Install and test
python commit_message_ai.py --install
git add .
git commit  # Your editor opens with: "feat(auth): add JWT refresh token rotation"

Caveat: The hook reads staged changes from git diff --cached. If you stage everything with git add ., the message covers the full diff. If you stage selectively with git add -p, the message covers only what you've staged. Either works — the message reflects what's actually in the commit.

Script 5: prompt_chain.py

What it does: Runs multi-step AI workflows defined in YAML configuration files.

python prompt_chain.py chains/code_review.yaml --input src/app.py
python prompt_chain.py chains/bug_analysis.yaml --input "Login fails on Safari with SSO enabled"
python prompt_chain.py chains/refactor_plan.yaml --input src/legacy_module.py

Single-prompt interactions work well for isolated tasks. Multi-step analysis — where the output of one prompt feeds into the next — is more powerful for complex problems. prompt_chain.py lets you define these pipelines in YAML without writing Python to wire them together.

A simple chain (included in the toolkit at chains/code_review.yaml):

name: "Staged Code Review"
description: "Deep code review: issues first, then fix suggestions, then a summary"
steps:
  - name: "identify_issues"
    prompt: |
      Analyze this code and list every issue you find — bugs, security concerns,
      performance problems, style violations. Be specific and exhaustive.
      Code:
      {{input}}
    temperature: 0.2

  - name: "generate_fixes"
    prompt: |
      For each issue identified in the previous analysis, provide a concrete fix.
      Show the specific change needed, not general advice.
      Issues identified:
      {{identify_issues}}
    temperature: 0.3

  - name: "write_summary"
    prompt: |
      Write a two-paragraph summary of the code review: what the code does well,
      and what needs attention before merging. Be direct.
      Full analysis:
      {{generate_fixes}}
    temperature: 0.4
    max_tokens: 512

Run it:

python prompt_chain.py chains/code_review.yaml --input src/payment_processor.py

The chains directory in the toolkit includes five pre-built chains:

code_review.yaml — the staged review above
bug_analysis.yaml — issue description → root cause hypotheses → debugging steps
architecture_review.yaml — design analysis → trade-offs → recommendations
test_strategy.yaml — module analysis → test cases → mock requirements
refactor_plan.yaml — code smells → refactor candidates → prioritized action plan

You can modify these or build your own. The YAML format supports variable interpolation, step-level temperature and token limits, and referencing any previous step's output in a later prompt via {{step_name}}.
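To show how {{step_name}} references can resolve, here is a small sketch of the idea. It is not the toolkit's implementation: render, run_chain, and the call_model callable (prompt in, completion out) are hypothetical, and the steps are plain dicts standing in for the parsed YAML.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: dict[str, str]) -> str:
    """Replace each {{name}} with context[name]; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: context[m.group(1)], template)


def run_chain(steps: list[dict[str, str]], user_input: str, call_model) -> dict[str, str]:
    """Run steps in order, exposing each step's output under its name.

    call_model is a hypothetical callable standing in for a real API client.
    """
    context = {"input": user_input}
    for step in steps:
        prompt = render(step["prompt"], context)
        context[step["name"]] = call_model(prompt)
    return context
```

Because every step's output is stored under its name, a later step can reference any earlier step, not only the one immediately before it.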

Putting them together

A realistic workflow for a backend Python module:

# After writing a new module
python test_generator.py src/new_feature.py --output tests/test_new_feature.py
python -m pytest tests/test_new_feature.py -v  # check what it caught

# Before committing
python ai_code_reviewer.py src/new_feature.py --format markdown > .review.md
cat .review.md  # address anything critical

# When committing
git add src/new_feature.py tests/test_new_feature.py
python commit_message_ai.py --apply  # generates and commits with AI message

# If the module is going to be maintained by others
python doc_generator.py src/new_feature.py --output docs/new_feature.md

None of these steps are required. Each one is independently useful. The combination is the point — it removes the friction from the parts of Python development that aren't writing the actual logic.

Setup

All five scripts require only requests:

pip install requests

For Claude (default):

export ANTHROPIC_API_KEY=your_key_here

For GPT:

export OPENAI_API_KEY=your_key_here

Switch models at the command line with --model gpt or --model claude.

The scripts are in scripts/ inside the AI Dev Toolkit. The prompt chains are in scripts/chains/. The 272-prompt library from Part 1 is in prompts/, organized by category.

What's in the full toolkit

The AI Dev Toolkit includes:

272 prompts organized across 10 categories (code generation, debugging, architecture, testing, code review, DevOps, documentation, data/SQL, frontend, AI integration)
5 CLI scripts covered in this article
5 pre-built prompt chains for multi-step workflows
Works with Claude and GPT — bring your own API key

Available at kazdispatch.gumroad.com for $29. Less than an hour of a developer consultant's time.

If you found Part 1 useful, Part 2 is the operational half. The prompts give you the right questions; the scripts give you the automation to run them without friction.

Part 1 of this series: Python Developer AI Toolkit, Part 1: How I stopped rewriting the same prompts and packaged 272 that actually work

Next in the series: pytest fixtures that actually scale — patterns from 2 years of Python CI pipelines

150+ regex patterns for Python developers: stop rebuilding the same wheel

Peyton Green — Mon, 20 Apr 2026 10:00:05 +0000

Every Python developer I know has a folder somewhere called "regex_snippets" or "useful_patterns" or "patterns_to_remember.txt". Mine grew to 47 files over six years. Half of them were wrong. Three of them were the same URL pattern, each slightly different.

The problem with regex isn't that it's hard to write. It's that it's hard to remember. The syntax is compact enough that you forget it between uses. You Google the same log parser every two months. You copy the same email validator from the same Stack Overflow answer you copied from last year.

I finally sat down and went back through two years of production code and pulled out every regex pattern I'd written more than twice. Added the patterns I keep Googling. Got 150+. Organized them by category, added plain-English explanations, documented the edge cases, and packaged it as something I can actually keep open.

That's the Regex Master Pack.

What's in it

8 cheatsheets covering:

Core syntax (characters, quantifiers, anchors, groups, flags — the complete reference on one page)
Lookaheads and lookbehinds — the stuff that trips everyone up
Common gotchas (greedy vs lazy, backtracking, catastrophic patterns, Unicode pitfalls)
Language differences (Python, JavaScript, Go, Java, Rust, PCRE)
Performance guide (writing fast regex, when NOT to use regex)
Regex in CLI tools (grep, sed, awk, ripgrep, perl one-liners)
Regex in editors (VS Code, Vim, JetBrains, Sublime find-and-replace)
Testing and debugging (how to test regex, debug backtracking, workflows)

150+ patterns in 10 categories:

Category	Count
Email, URL, web	18
Dates and times	16
Numbers and currency	15
Phone and address	14
Passwords and auth	10
Code parsing	20
Log parsing	18
Data extraction	15
Text processing	14
DevOps and infra	15

Every pattern includes the regex, a plain-English explanation of each token, example test strings (matches and non-matches), Python usage, and documented edge cases.

5 interactive Python scripts (stdlib only, no pip installs):

regex_tester.py — interactive REPL, paste a pattern and test strings, see matches highlighted with group details
regex_explainer.py — feed in any regex, get a plain-English breakdown of every token
log_parser_generator.py — describe your log format, get a working regex + Python parser
pattern_search.py — search the entire pattern library by keyword or category
regex_quiz.py — 50 progressive challenges to build muscle memory

A few examples

Log parsing: Apache/Nginx combined format

This is the one I rebuild from memory every time I need it. Now I don't:

import re

APACHE_COMBINED = r'^(\S+)\s+\S+\s+(\S+)\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+(\S+)"\s+(\d{3})\s+(\d+|-)\s+"([^"]*)"\s+"([^"]*)"'

log_line = '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'

m = re.match(APACHE_COMBINED, log_line)
if m:
    ip, user, timestamp, method, path, proto, status, bytes_, referer, ua = m.groups()
    print(f"{method} {path} → {status}")
    # GET /api/users → 200

The pattern captures, in order: IP, authenticated user, timestamp, HTTP method, path, protocol, status code, bytes transferred, referer, and user agent. A named-groups version is also in the pack.
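For readers who prefer names over positions, here is a minimal sketch of what a named-groups variant could look like. It is my own rewrite of the pattern above, not the version shipped in the pack, and the group names and the `parse_line` helper are illustrative assumptions.

```python
import re

# Same structure as APACHE_COMBINED above, with each capture given a name.
# The group names are illustrative, not taken from the pack.
APACHE_COMBINED_NAMED = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>\S+)"\s+'
    r'(?P<status>\d{3})\s+(?P<bytes>\d+|-)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = APACHE_COMBINED_NAMED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # The log format uses "-" when no body was sent
    fields["bytes"] = None if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


log_line = (
    '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] '
    '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'
)
record = parse_line(log_line)
print(record["method"], record["path"], record["status"])
```

Returning a dict means downstream code no longer depends on the order of the captures, which makes it safer to add or remove a field later.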

Code parsing: Python function signatures

Useful for static analysis, documentation generators, or any tooling that needs to understand Python code structure:

PYTHON_FUNC = r'^\s*(?:async\s+)?def\s+(\w+)\s*\(([^)]*)\)\s*(?:->\s*([^:]+))?\s*:'

examples = [
    'def simple(x, y):',
    'async def fetch_data(url: str, timeout: int = 30) -> dict:',
    '    def nested(self) -> None:',
]

for line in examples:
    m = re.search(PYTHON_FUNC, line, re.MULTILINE)
    if m:
        name, params, return_type = m.group(1), m.group(2), m.group(3)
        print(f"fn={name}, params={params!r}, returns={return_type!r}")

Handles sync and async, optional return type annotation, indented (nested/method) definitions.
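One limitation worth knowing: the `[^)]*` parameter capture stops at the first closing parenthesis, so a default value that contains a call defeats the pattern. When that matters, the standard-library `ast` module is a sturdier option. The sketch below is an alternative technique, not something from the pack; `list_functions` and the `tricky` sample are hypothetical, and `ast.unparse` assumes Python 3.9 or newer.

```python
import ast
import re
from typing import List, Optional, Tuple

PYTHON_FUNC = r'^\s*(?:async\s+)?def\s+(\w+)\s*\(([^)]*)\)\s*(?:->\s*([^:]+))?\s*:'


def list_functions(source: str) -> List[Tuple[str, List[str], Optional[str]]]:
    """Return (name, positional parameter names, return annotation) per function."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = [arg.arg for arg in node.args.args]
            returns = ast.unparse(node.returns) if node.returns is not None else None
            found.append((node.name, params, returns))
    return found


# A default value containing a call: the regex gives up, the parser does not.
tricky = "def load(path=default_path(), retries=3) -> dict:\n    return {}\n"
regex_result = re.search(PYTHON_FUNC, tricky, re.MULTILINE)
ast_result = list_functions(tricky)
print(regex_result)
print(ast_result)
```

The trade-off is that `ast.parse` needs syntactically complete source, while the regex works line by line on fragments.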

Data extraction: YAML frontmatter

Every static site generator, Jekyll template, and Hugo theme uses this. Here's a pattern that actually handles the edge cases:

YAML_FRONTMATTER = r'^---\s*\n(.*?)\n---\s*\n'

content = """---
title: My Article
tags: python, regex
published: true
---

Article body starts here.
"""

m = re.search(YAML_FRONTMATTER, content, re.DOTALL)
if m:
    frontmatter = m.group(1)
    # Parse the captured YAML string with yaml.safe_load()

The re.DOTALL flag is required — without it, . won't match newlines and the pattern fails on multi-line frontmatter.
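To see the flag's effect directly, the sketch below runs the same pattern with and without `re.DOTALL` and then splits the document into frontmatter and body. The helper names are hypothetical, and `parse_flat_pairs` is a deliberately naive stand-in for a real YAML parser: it only handles flat `key: value` lines.

```python
import re

YAML_FRONTMATTER = r'^---\s*\n(.*?)\n---\s*\n'


def split_frontmatter(text, flags=re.DOTALL):
    """Return (frontmatter, body); frontmatter is None when nothing matches."""
    m = re.match(YAML_FRONTMATTER, text, flags)
    if m is None:
        return None, text
    return m.group(1), text[m.end():]


def parse_flat_pairs(block):
    """Naive 'key: value' reader for flat frontmatter (not a YAML parser)."""
    pairs = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


doc = "---\ntitle: My Article\ntags: python, regex\npublished: true\n---\n\nArticle body starts here.\n"

frontmatter, body = split_frontmatter(doc)
without_dotall, _ = split_frontmatter(doc, flags=0)
print(parse_flat_pairs(frontmatter))
print(without_dotall)
```

Note that the trailing `\s*\n` also swallows one blank line after the closing `---`, so the returned body starts at the first line of text.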

DevOps: semantic versioning

SEMVER = r'^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>[a-zA-Z0-9.-]+))?(?:\+(?P<buildmeta>[a-zA-Z0-9.-]+))?$'

versions = ['1.2.3', '2.0.0-alpha.1', '1.0.0+build.42', '1.0.0-beta+exp.sha.5114f85']

for v in versions:
    m = re.match(SEMVER, v)
    if m:
        print(f"{v} → major={m.group('major')}, pre={m.group('prerelease')}")

Named groups make the captures readable. The pack includes variants for loose semver (accepts v1.2.3 prefix) and range matching (for tools like npm/pip that use >=, ~=, ^).
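As a rough idea of what a loose variant might look like, the sketch below adds an optional `v` prefix to the strict pattern and converts the core version into a tuple that sorts numerically. This is my own variant, not the one in the pack, and `core_version` is a hypothetical helper that ignores prerelease ordering.

```python
import re

# Strict SEMVER pattern from above with an optional leading "v" (assumed variant)
LOOSE_SEMVER = re.compile(
    r'^v?(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)'
    r'(?:-(?P<prerelease>[a-zA-Z0-9.-]+))?(?:\+(?P<buildmeta>[a-zA-Z0-9.-]+))?$'
)


def core_version(text):
    """Return (major, minor, patch) as ints, or None if the text is not a version."""
    m = LOOSE_SEMVER.match(text)
    if m is None:
        return None
    return int(m["major"]), int(m["minor"]), int(m["patch"])


tags = ["v1.10.0", "1.9.0", "v1.2.3-alpha.1"]
print(sorted(tags, key=core_version))
```

Sorting on the integer tuple avoids the classic string-sort mistake where 1.10.0 lands before 1.9.0.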

The part that actually saves time: `regex_explainer.py`

The interactive tools are the piece I use most. regex_explainer.py takes any regex and breaks it down:

$ python scripts/regex_explainer.py '(?:https?://)?(?:www\.)?([^/\s]+)'

Pattern: (?:https?://)?(?:www\.)?([^/\s]+)

  (?:         Non-capturing group start
    https?    Literal 'http', then 's' is optional (? = zero or one)
    ://       Literal '://'
  )?          End non-capturing group, entire group is optional
  (?:         Non-capturing group start
    www\.     Literal 'www' + escaped dot (. in regex matches anything; \. matches only a literal dot)
  )?          End non-capturing group, optional
  (           Capturing group 1 start
    [^/\s]+   One or more characters that are NOT '/' and NOT whitespace
  )           Capturing group 1 end

This is the tool I wish I had when I was learning regex. It's also useful for auditing patterns someone else wrote.
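If you only have the standard library at hand, `re.DEBUG` gives a rougher version of the same idea: it prints the parsed structure of a pattern while compiling it. The sketch below just captures that output as a string. It is not how regex_explainer.py works (the script's internals are not shown here), and `dump_parse_tree` is a hypothetical helper name.

```python
import contextlib
import io
import re


def dump_parse_tree(pattern: str) -> str:
    """Return the parse tree that re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    print(dump_parse_tree(r'(?:https?://)?(?:www\.)?([^/\s]+)'))
```

The output uses internal opcode names and character codes rather than plain English, so it is better suited to checking how a pattern was parsed than to learning the syntax.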

`pattern_search.py` — the reference you'll actually use

$ python scripts/pattern_search.py "log"

Found 18 results in category 'log-parsing':
  [01] Apache/Nginx Combined Log
  [02] Apache/Nginx Common Log
  [03] Syslog RFC 3164
  [04] Syslog RFC 5424
  ...

$ python scripts/pattern_search.py --category "code"

Found 20 results in category 'code-parsing':
  [01] Python Function Definition
  [02] JavaScript/TypeScript Function
  [03] Python Class Definition
  ...

Who this is for

If you write Python and regularly need to parse logs, validate input, extract structured data from text, or build any tooling that touches text at all — this is the reference that replaces the pile of Stack Overflow bookmarks.

Backend developers parsing logs and validating API inputs. DevOps engineers writing grep/sed pipelines. Data engineers cleaning text data. Anyone who uses regex weekly but reaches for Google every time.

The pack

Regex Master Pack on Gumroad: $19, one-time purchase.

150+ patterns, 8 cheatsheets, 5 Python scripts. Markdown format (works in any editor, terminal, or note-taking app). Patterns also in JSON for programmatic access. Python 3.8+, stdlib only.

30-day refund if it's not useful.

See also: Python Automation Cookbook — 25 production-ready Python scripts for the automation tasks you keep rebuilding.

openai-agents 0.13.x Silently Dropped openai v1 Support — Here's What Breaks

Peyton Green — Fri, 17 Apr 2026 21:00:05 +0000

openai-agents 0.13.2 shipped on March 26th. If you're running openai v1.x in the same environment, your agents are now broken.

No deprecation warning. No migration guide. Just a new dependency requirement in PyPI metadata that says openai<3,>=2.26.0 — and your pip install openai-agents either fails with a conflict error or silently upgrades openai to 2.x and breaks your existing openai client code.

Here's exactly what changed and how to fix it in about 10 minutes.

What Actually Changed

Before 0.13.2, openai-agents was compatible with both openai v1.x and v2.x. Starting with 0.13.2, it requires openai>=2.26.0 and explicitly drops support for v1.

This is relevant if you:

Have a project that pinned openai==1.x.x or openai<2
Inherited a codebase that hasn't been updated since late 2024
Are running openai-agents alongside other packages that still depend on openai v1

The practical effect: running pip install openai-agents --upgrade in an existing environment will now either fail (if pip can't resolve the conflict) or force openai to 2.x — where some of your existing client-side code may stop working.

What openai v2 Actually Changed

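The breakage most people hit comes from legacy module-level calls. That call style was actually removed in the openai 1.0 rewrite (November 2023), so any code that still uses it fails on 2.x as well:

# Legacy module-level style: you might have code that looks like this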
import openai

openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# openai v2+ — the correct pattern
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # explicit client instantiation
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

The top-level openai.ChatCompletion.create() call no longer exists in v2. Same with openai.Completion.create(), the old embedding patterns, and the module-level api_key assignment.

If your codebase still makes those module-level calls alongside openai-agents, you have two problems to fix: the dependency conflict and the client code pattern.

The Fast Fix

Step 1: Check where you stand

pip show openai | grep Version
pip show openai-agents | grep Version

If openai is below 2.26.0 and openai-agents is 0.13.2 or newer, you have a conflict.
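The same check can be scripted so it runs in CI. The sketch below reads installed versions with `importlib.metadata` and applies the two thresholds stated above (0.13.2 and 2.26.0). The helper names are hypothetical, and the version parsing is a simplification that looks only at the leading numeric parts, so pre-release suffixes such as `rc1` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Turn '2.26.0' into (2, 26, 0); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_conflict(openai_version: str, agents_version: str) -> bool:
    """True when openai-agents is 0.13.2 or newer but openai is below 2.26.0."""
    return (
        release_tuple(agents_version) >= (0, 13, 2)
        and release_tuple(openai_version) < (2, 26, 0)
    )


def installed(name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    openai_v, agents_v = installed("openai"), installed("openai-agents")
    if openai_v and agents_v:
        print("conflict" if has_conflict(openai_v, agents_v) else "ok")
    else:
        print("openai and/or openai-agents is not installed")
```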

Step 2: Upgrade openai

pip install "openai>=2.26.0"

Or if you use a requirements file:

# requirements.txt — update this line
openai>=2.26.0

Step 3: Audit your direct openai calls

The most common legacy patterns that break in v2:

# BROKEN in v2
openai.api_key = os.environ["OPENAI_API_KEY"]
response = openai.ChatCompletion.create(...)

# FIXED
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from env automatically
response = client.chat.completions.create(...)

# BROKEN in v2
embeddings = openai.Embedding.create(input=texts, model="text-embedding-3-small")
result = embeddings["data"][0]["embedding"]

# FIXED
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(input=texts, model="text-embedding-3-small")
result = response.data[0].embedding

The pattern is consistent: every top-level openai.SomeThing.create() call becomes client.some_thing.create() on an explicit OpenAI() instance.
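For a larger codebase, a quick scan can list the call sites before you start editing. This is a minimal sketch that looks only for the three module-level calls named in this article; the `find_legacy_calls` helper and the sample source are hypothetical, and a plain regex like this will miss aliased imports such as `import openai as oa`.

```python
import re

# Only the module-level calls mentioned in this article
LEGACY_CALL = re.compile(r'\bopenai\.(?:ChatCompletion|Completion|Embedding)\.create\b')


def find_legacy_calls(source: str):
    """Return (line number, matched call) for each legacy call in the source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = LEGACY_CALL.search(line)
        if m:
            hits.append((lineno, m.group(0)))
    return hits


sample = (
    "import openai\n"
    "response = openai.ChatCompletion.create(model='gpt-4o', messages=[])\n"
    "ok = client.chat.completions.create(model='gpt-4o', messages=[])\n"
    "vectors = openai.Embedding.create(input=texts, model='text-embedding-3-small')\n"
)
for lineno, call in find_legacy_calls(sample):
    print(f"line {lineno}: {call}")
```

To scan a project, read each `.py` file with `pathlib.Path.read_text()` and pass the text to the same function.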

openai-agents Code Itself Doesn't Change

The good news: if you're writing openai-agents code (defining agents, tools, handoffs), none of that changes. The openai-agents API is stable across this version bump. The v2 requirement is about the underlying HTTP client, not the agent framework API.

# This openai-agents code works exactly the same before and after 0.13.2
from agents import Agent, Runner

agent = Agent(
    name="assistant",
    instructions="You are a helpful assistant.",
)

result = Runner.run_sync(agent, "What's the weather today?")
print(result.final_output)

The only thing that changed is what version of openai is running underneath.

What's Still Coming in 0.14.0

This isn't the only breaking change on the horizon. The nest_handoff_history parameter is being renamed in an upcoming 0.14.0 release — a separate change that hasn't shipped yet. If you're using that parameter in handoff configurations, watch the changelog.

The Broader Pattern

The openai-agents library moves fast — 0.13.0 through 0.13.2 shipped across four days in late March. If you're depending on it in production, pinning to a specific version and testing upgrades before deploying is worth the five minutes it takes to set up.

A dependency audit tool can flag these conflicts before they hit your CI pipeline. If you're running more than three or four AI libraries in the same environment, you'll hit this pattern regularly.

The AI Dev Toolkit includes a dependency-audit prompt set that checks for version conflicts across openai, anthropic, and the major agent frameworks: https://kazdispatch.gumroad.com/l/zqeopc

Summary

openai-agents 0.13.2 (March 26) dropped openai v1 support
Requires openai>=2.26.0 — no deprecation period
Fix: upgrade openai and update any direct v1-style API calls
Your openai-agents agent definitions don't change
nest_handoff_history rename is a separate 0.14.0 story, not yet shipped

Stop writing edge case tests. Let Hypothesis find them instead.

Peyton Green — Tue, 14 Apr 2026 17:00:05 +0000

Three years ago I wrote 47 edge case tests for a URL normalizer. Every test followed the same pattern: I thought of a weird input, wrote the test, confirmed the code handled it.

Hypothesis wrote 10,000 edge cases for the same function in 30 seconds and found a bug I'd never have thought to write a test for.

The bug: the normalizer silently returned an empty string when given a URL consisting entirely of whitespace. My 47 tests all used inputs that looked like URLs. Hypothesis found the whitespace case because it's not trying to think of inputs — it's trying to break your function by generating inputs at the boundary of your constraints.

That's the core shift. You stop describing examples and start describing properties.

What property-based testing actually is

A conventional test says: "given this specific input, I expect this specific output."

A property-based test says: "for any input matching this description, this invariant should always hold."

With Hypothesis, you write that invariant, and Hypothesis generates hundreds or thousands of inputs to try to falsify it. If it finds a failing case, it shrinks the input to the minimal example that still fails — so you get a small, debuggable failing case, not just "it broke on a 5,000-character string."

from hypothesis import given, strategies as st

# Conventional test
def test_round_trip_specific():
    assert decode(encode("hello world")) == "hello world"

# Property-based test
@given(st.text())
def test_round_trip_any_string(s):
    assert decode(encode(s)) == s

The second test runs against up to 100 generated inputs by default. If encode/decode has a bug with unicode characters, empty strings, or null bytes, Hypothesis is far more likely to find it than a handful of hand-picked examples.

Installing Hypothesis (no account required)

pip install hypothesis

That's it. No API key, no signup, no service to configure. Hypothesis is a pure Python library: it generates inputs locally, shrinks failures locally, and stores its example database locally. The full power of property-based testing with zero external services.

# If you're using pytest (which you should be):
pip install hypothesis pytest

The @given decorator integrates directly with pytest. Run your test suite the same way you always do — pytest picks up Hypothesis tests automatically.

Writing your first property

The hardest part of property-based testing is the mental shift from "what specific input should I test?" to "what should always be true?"

Three properties that apply to almost every function:

1. Round-trip invariants — encode/decode, serialize/deserialize, compress/decompress

import json

@given(st.text())
def test_json_round_trip(data):
    # json.dumps accepts any str, so there is no exception to swallow here;
    # a try/except around the assertion would only hide real failures
    serialized = json.dumps({"value": data})
    result = json.loads(serialized)
    assert result["value"] == data

2. Idempotency — applying an operation twice produces the same result as applying it once

@given(st.text())
def test_normalize_idempotent(url):
    once = normalize_url(url)
    twice = normalize_url(once)
    assert once == twice

3. Size bounds: a sort returns exactly as many items as it received, and a filter never returns more items than it received

@given(st.lists(st.integers()))
def test_filter_shrinks_list(items):
    result = [x for x in items if x > 0]
    assert len(result) <= len(items)

These are starting points. As you get comfortable, you'll start finding properties specific to your domain — and those domain-specific properties are where property-based testing earns its keep.

Hypothesis strategies: describing your input space

The strategies module (st) is how you describe what kind of inputs Hypothesis should generate. Some useful ones:

from hypothesis import strategies as st

# Basic types
st.integers()                   # any integer
st.integers(min_value=0)        # non-negative integers
st.floats(allow_nan=False)      # floats, excluding NaN
st.text()                       # any unicode text
st.text(alphabet=st.characters(whitelist_categories=('Lu', 'Ll', 'Nd')))  # alphanumeric
st.binary()                     # bytes
st.booleans()

# Collections
st.lists(st.integers())                            # list of integers
st.lists(st.text(), min_size=1, max_size=50)       # bounded list
st.dictionaries(st.text(), st.integers())          # dict with text keys, int values
st.tuples(st.integers(), st.text())                # fixed-structure tuple

# Composing strategies
st.one_of(st.text(), st.none())                    # text or None
st.builds(MyDataClass, name=st.text(), age=st.integers(min_value=0, max_value=150))

The builds strategy is particularly useful — it generates instances of your data classes or Pydantic models by generating each field separately.

from hypothesis import given, strategies as st
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

@given(st.builds(User, name=st.text(min_size=1), age=st.integers(min_value=0, max_value=150)))
def test_user_serialization_round_trip(user):
    assert User.model_validate_json(user.model_dump_json()) == user

The shrinking superpower

When Hypothesis finds a failing input, it doesn't report the raw generated input. It shrinks it — tries progressively simpler inputs until it finds the minimal case that still triggers the failure.

This is the feature that makes property-based testing practical. Without shrinking, a failure report might be "failed on a 3,000-character string with these properties." With shrinking, it's "failed on the string '\x00'."

Falsifying example: test_normalize_idempotent(
    url='',  # <-- Hypothesis shrunk this to the minimal failing case
)

You don't need to do anything to enable shrinking — it's automatic for all built-in strategies. Custom strategies built with st.composite and standard combinators also shrink automatically.
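To build intuition for what shrinking does, here is a toy shrinker in plain Python. It is explicitly not Hypothesis's algorithm, which is far more sophisticated; it just deletes one character at a time for as long as the input keeps failing. The `shrink_string` name and the `still_fails` predicate are illustrative assumptions.

```python
def shrink_string(failing: str, still_fails) -> str:
    """Greedily delete characters while the predicate keeps reporting a failure."""
    current = failing
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                # Keep the smaller failing input and start over from it
                current = candidate
                improved = True
                break
    return current


def contains_null_byte(s: str) -> bool:
    """Stand-in for 'the test fails on this input'."""
    return "\x00" in s


minimal = shrink_string("abc\x00def\x00ghi", contains_null_byte)
print(repr(minimal))
```

Even this crude loop turns a noisy failing input into a one-character reproduction, which is the property that makes failure reports debuggable.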

The example database

Hypothesis stores failing examples in a local .hypothesis/ directory. The next time you run the test suite, Hypothesis tries those failing cases first, before generating new ones. This means:

Once you've found a bug, the regression test for that bug is implicit — Hypothesis will always try the minimal failing input on future runs.
After you fix the bug and the test passes, Hypothesis removes the example from the database.

Commit .hypothesis/ to version control to share failing examples across the team.

# .gitignore — do NOT add .hypothesis to this
# .hypothesis/  <-- leave this line out
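The database is local state, so it can be cleared or drift between machines. A complementary way to make a regression permanent is to pin the input in the test itself with Hypothesis's `@example` decorator. In the sketch below, `normalize_url` is a hypothetical stand-in for the real normalizer, included only so the example runs.

```python
from hypothesis import example, given, strategies as st


def normalize_url(url: str) -> str:
    """Hypothetical stand-in for the article's normalizer."""
    return url.strip()


@given(st.text())
@example("   ")  # whitespace-only input, pinned so it is tried on every run
@example("")     # empty string, pinned for the same reason
def test_normalize_idempotent(url):
    once = normalize_url(url)
    assert normalize_url(once) == once


if __name__ == "__main__":
    # Calling the decorated function directly runs the pinned and generated cases
    test_normalize_idempotent()
```

Pinned examples live in the test file, so they are reviewed and versioned like any other test code.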

Integrating with your existing pytest suite

Hypothesis tests look like pytest tests with an extra decorator. The integration is seamless:

# test_normalizer.py

import pytest
from hypothesis import given, settings, strategies as st
from myapp.normalizer import normalize_url

# Conventional test — still useful for documenting expected behavior
def test_normalize_removes_trailing_slash():
    assert normalize_url("https://example.com/") == "https://example.com"

# Property test — finds the bugs the conventional tests miss
@given(st.text())
def test_normalize_idempotent(url):
    """Normalizing twice should produce the same result as normalizing once."""
    assert normalize_url(normalize_url(url)) == normalize_url(url)

@given(st.text(min_size=1))
def test_normalize_never_empty_on_nonempty_input(url):
    """A non-empty input should never produce an empty normalized URL."""
    result = normalize_url(url)
    # It's okay to return a default URL, but not an empty string
    assert result != ""

Run with pytest as usual. Hypothesis tests are automatically collected and run.

Settings: controlling how hard Hypothesis tries

By default Hypothesis runs each test on up to 100 examples (max_examples=100). You can tune this:

from hypothesis import given, settings, strategies as st

@settings(max_examples=1000)   # Try 1,000 examples instead of 100
@given(st.text())
def test_important_property(s):
    ...

@settings(max_examples=50)    # Faster, for properties you're less worried about
@given(st.lists(st.integers()))
def test_basic_property(items):
    ...

In CI, you might want to run with max_examples=500 for important functions and the default elsewhere. Settings can also be applied suite-wide through a named profile, selected here with an environment variable.

# conftest.py — set a default for all Hypothesis tests in this suite
from hypothesis import settings
settings.register_profile("ci", max_examples=500)
settings.register_profile("local", max_examples=100)

import os
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "local"))

Then in CI: HYPOTHESIS_PROFILE=ci pytest

Where property-based testing earns its keep

Property-based testing is not a replacement for conventional tests. The combination is stronger than either alone:

Use conventional tests for:

Documenting expected behavior with specific examples
Testing known edge cases you've already thought of
Regression tests for specific bugs (the bug had a specific input — document it)

Use Hypothesis for:

Functions that transform or process data (parsers, normalizers, serializers)
Functions with invariants that should hold across all inputs (sort stability, round-trip correctness)
Functions at system boundaries where the input space is large or unpredictable
Finding the bugs you didn't know to look for

The ratio I've landed on: write the conventional tests first to document behavior, then add one or two @given tests per function that exercise an invariant. The Hypothesis tests don't replace the conventional tests — they find the failures the conventional tests can't anticipate.

The three-line addition that catches the most bugs

If you do nothing else with Hypothesis, add this pattern to your parsers and data-processing functions:

@given(st.text())
def test_does_not_crash(s):
    """The function should handle any input without raising an unexpected exception."""
    try:
        result = my_function(s)
        # If it returns normally, the result should be a valid type
        assert isinstance(result, (str, type(None)))
    except ValueError:
        pass  # ValueError is expected for invalid inputs
    except Exception as e:
        # Anything else is a bug
        raise AssertionError(f"Unexpected exception for input {s!r}: {e}") from e

This doesn't assert correctness — it just asserts that the function doesn't blow up with an unhandled exception on arbitrary input. It catches a surprising number of bugs in functions that were "only ever called with valid data" — right up until they weren't.

What you get: the actual coverage difference

My 47-test URL normalizer test suite covered 47 inputs. Running it with Hypothesis at the default 100 examples means roughly 150 inputs total: my 47 plus 100 generated ones.

That's not the point. The 100 generated inputs aren't uniformly random. Hypothesis biases generation toward the kinds of input that tend to break code:

Empty strings
Strings with only whitespace
Very long strings
Strings with unicode characters outside ASCII
Strings with null bytes
Strings that look like numbers
Strings at exact power-of-two lengths

It targets the boundary conditions that matter for a text-processing function. My 47 tests didn't include any of those, because I was writing example inputs, not adversarial ones.

The whitespace bug I mentioned at the start? Hypothesis found it in the first run. It had been there for eight months.

Getting started (30 minutes)

Pick one function in your codebase that:

Takes a string or collection as input
Returns a transformed version or True/False
Has an invariant you can state in one sentence ("it should always return a non-empty string", "the output should be a subset of the input", "applying it twice should equal applying it once")

Write the Hypothesis test for that invariant. Run it. See what Hypothesis finds.

The learning curve is the mental shift from example thinking to property thinking. Once you've done it once, you start seeing properties everywhere.

pip install hypothesis

No account. No API key. No service to configure. Local, fast, and it catches the bugs you didn't know to look for.

Part 1 of this series: LocalStack Now Requires an Account — Here's How to Test AWS in Python Without One

Part 2 of this series: pytest fixtures that actually scale — coming April 7

The Automation Cookbook ($39) includes a companion section on test automation patterns — pytest fixtures, mocking strategies, and CI pipeline setup. Available on Gumroad.

FastAPI + MCP: Adding Real OAuth 2.1 Auth to Your Python MCP Server

Peyton Green — Fri, 10 Apr 2026 10:25:02 +0000

In the nine days after the MCP Dev Summit, NVD recorded 20 new MCP CVEs. Auth validation failures are the dominant pattern.

Two examples: CVE-2025-6514 — command injection in mcp-remote, CVSS 9.6, 500,000 downloads. CVE-2026-32211 — the Azure MCP Server's SSE transport had zero authentication. Attack chain: enumerate tools, call the shell-passthrough tool (azmcp-extension-az), write a script to ~/.bashrc, exfiltrate Entra ID credentials. Full tenant takeover. Microsoft CVSS: 9.1 CRITICAL. Root cause: CWE-306 — Missing Authentication for Critical Function. (NVD scores it 7.5 HIGH, reflecting only the confidentiality vector; Microsoft's score adds the integrity impact of the full tenant compromise.)

Twenty CVEs in nine days. Auth isn't optional hardening for MCP servers.

The summit ran April 2–3. Six sessions dedicated to authentication. Aaron Parecki — OAuth 2.1 spec author, Director of Identity Standards at Okta — headlined one of them: "Evolution, Not Revolution: How MCP Is Reshaping OAuth." The consistent message from that track: standard OAuth 2.1 done right, not a new scheme invented for MCP.

Of 518 audited production MCP servers, 41% have zero authentication. Not weak auth — none at all. Not because it's hard. The Python SDK has shipped full OAuth 2.1 support since v1.21. The spec is stable. The problem is that nobody connected mcp.server.auth to how Python developers actually build APIs.

This article does that.

What MCP Auth Actually Looks Like (The Quick Version)

Before we get into code: MCP uses OAuth 2.1. Not an MCP-specific auth scheme — standard OAuth 2.1 with PKCE. If you've implemented OAuth before, most of this will be familiar.

The Python SDK's mcp.server.auth module handles:

Authorization server metadata (RFC 8414)
Authorization endpoint (redirects for browser-based flows)
Token endpoint (exchange code for access token, refresh tokens)
PKCE (Proof Key for Code Exchange — required in OAuth 2.1)
Dynamic Client Registration (RFC 7591)

What you provide:

The actual user authentication (how you verify a user is who they say they are)
The authorization decision (which clients can access which resources)
Token storage

That boundary — SDK handles protocol, you handle identity — is the key to making this not terrible to implement.
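PKCE is the piece of that protocol list that is easiest to see in code. The sketch below shows the S256 transform from RFC 7636 using only the standard library: the client sends the challenge with the authorization request and the verifier with the token request, and the server recomputes the challenge to compare. The function names are my own and are not part of the MCP SDK.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(32)


def s256_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) with the '=' padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server-side check at the token endpoint, using a constant-time comparison."""
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)


verifier = make_code_verifier()
challenge = s256_challenge(verifier)
print(verify_pkce(verifier, challenge))
```

Because the server stores only the challenge, an attacker who intercepts the authorization code still cannot redeem it without the original verifier.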

The Setup

pip install "mcp[auth]>=1.27.0" fastapi uvicorn python-multipart "python-jose[cryptography]" "passlib[bcrypt]"

We'll build:

A FastAPI app as the OAuth 2.1 authorization server
An MCP server with authentication required
A simple client that goes through the OAuth flow

The MCP Server with Auth

Start with the MCP server. This part is almost entirely handled by the SDK:

# mcp_server.py
# Import paths assume the module layout of recent MCP Python SDK releases;
# check them against the version you have installed.
from mcp.server.auth.settings import AuthSettings, ClientRegistrationOptions
from mcp.server.fastmcp import FastMCP

# Define what the auth server needs to know about your MCP server
auth_settings = AuthSettings(
    issuer_url="http://localhost:8000",  # Your FastAPI app's URL
    resource_server_url="http://localhost:8001",  # This MCP server's URL
    client_registration_options=ClientRegistrationOptions(
        enabled=True,
        valid_scopes=["read", "write"],
        default_scopes=["read"],
    ),
)

mcp = FastMCP(
    "My Protected MCP Server",
    auth=auth_settings,
)

@mcp.tool()
async def get_data(query: str) -> str:
    """Retrieve data — requires authentication."""
    return f"Protected data for query: {query}"

@mcp.tool()
async def write_data(content: str) -> str:
    """Write data — requires write scope."""
    return f"Wrote: {content}"

The auth_settings tells the MCP server to require OAuth 2.1 tokens and where to find the authorization server. The SDK enforces this — unauthenticated requests get a 401.

The FastAPI Authorization Server

This is the part nobody has documented. Here's how to wire FastAPI as the OAuth 2.1 authorization server that the MCP SDK expects:

# auth_server.py
from fastapi import FastAPI, Depends, HTTPException, Request, Form
from fastapi.responses import RedirectResponse, JSONResponse
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from jose import jwt, JWTError
from passlib.context import CryptContext
from datetime import datetime, timedelta
from typing import Optional
import secrets
import time

app = FastAPI(title="MCP Authorization Server")

# --- Config ---
SECRET_KEY = secrets.token_hex(32)  # In production: load from env so it stays stable across restarts
ALGORITHM = "HS256"  # Symmetric signing to match SECRET_KEY; RS256/ES256 need an asymmetric key pair instead
ACCESS_TOKEN_EXPIRE_MINUTES = 30

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

# In-memory stores — replace with your database
users_db: dict = {
    "alice": {
        "username": "alice",
        "hashed_password": pwd_context.hash("secret"),
        "scopes": ["read", "write"],
    }
}

# Auth code store: code -> {client_id, redirect_uri, scope, code_challenge, user_id, expires}
auth_codes: dict = {}

# Token store: access_token -> {user_id, scopes, expires}
active_tokens: dict = {}


# --- OAuth 2.1 Discovery Endpoints ---

@app.get("/.well-known/oauth-authorization-server")
async def authorization_server_metadata():
    """RFC 8414 — Authorization Server Metadata.

    The MCP SDK fetches this to discover all OAuth endpoints.
    Every field here must be accurate or the SDK will reject the server.
    """
    return {
        "issuer": "http://localhost:8000",
        "authorization_endpoint": "http://localhost:8000/oauth/authorize",
        "token_endpoint": "http://localhost:8000/oauth/token",
        "registration_endpoint": "http://localhost:8000/oauth/register",
        "scopes_supported": ["read", "write"],
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code", "refresh_token"],
        "code_challenge_methods_supported": ["S256"],  # PKCE required in OAuth 2.1
        "token_endpoint_auth_methods_supported": ["none"],  # Public clients (PKCE)
    }


# --- Dynamic Client Registration (RFC 7591) ---

@app.post("/oauth/register")
async def register_client(request: Request):
    """MCP clients register themselves before the first auth flow.

    RFC 7591 Dynamic Client Registration — the SDK calls this automatically.
    """
    body = await request.json()

    client_id = secrets.token_urlsafe(16)
    # Public clients don't have secrets — PKCE handles security

    return {
        "client_id": client_id,
        "client_id_issued_at": int(time.time()),
        "redirect_uris": body.get("redirect_uris", []),
        "grant_types": ["authorization_code"],
        "response_types": ["code"],
        "scope": "read write",
        "token_endpoint_auth_method": "none",
    }


# --- Authorization Endpoint ---

@app.get("/oauth/authorize")
async def authorize(
    response_type: str,
    client_id: str,
    redirect_uri: str,
    scope: str = "read",
    state: Optional[str] = None,
    code_challenge: str = "",
    code_challenge_method: str = "S256",
):
    """The user-facing authorization page.

    In a real app: show a login form, check session cookies, etc.
    Here we return a simple redirect with a login prompt in the URL.
    """
    if response_type != "code":
        raise HTTPException(400, "Only authorization_code flow supported")
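    if code_challenge_method != "S256":
        raise HTTPException(400, "Only S256 PKCE supported (required by OAuth 2.1)")
    if not code_challenge:
        # The parameter defaults to "", so reject requests that omit the challenge
        raise HTTPException(400, "code_challenge required (PKCE is mandatory in OAuth 2.1)")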

    # Store the pending authorization — we'll complete it after login
    pending_id = secrets.token_urlsafe(16)
    auth_codes[f"pending_{pending_id}"] = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "expires": time.time() + 300,
    }

    # In production: redirect to your actual login UI
    # Here: redirect to a simple login endpoint
    return RedirectResponse(
        f"/oauth/login?pending={pending_id}",
        status_code=302
    )


@app.post("/oauth/login")
async def complete_login(
    pending: str,
    username: str = Form(...),
    password: str = Form(...),
):
    """Complete the auth flow after user logs in."""
    pending_data = auth_codes.get(f"pending_{pending}")
    if not pending_data or time.time() > pending_data["expires"]:
        raise HTTPException(400, "Invalid or expired authorization request")

    # Verify credentials
    user = users_db.get(username)
    if not user or not pwd_context.verify(password, user["hashed_password"]):
        raise HTTPException(401, "Invalid credentials")

    # Issue authorization code
    code = secrets.token_urlsafe(32)
    auth_codes[code] = {
        **pending_data,
        "user_id": username,
        "issued_at": time.time(),
        "expires": time.time() + 60,  # Auth codes expire in 60s
    }
    del auth_codes[f"pending_{pending}"]

    # Redirect back to the MCP client
    redirect = pending_data["redirect_uri"]
    state_param = f"&state={pending_data['state']}" if pending_data.get("state") else ""
    return RedirectResponse(f"{redirect}?code={code}{state_param}", status_code=302)


# --- Token Endpoint ---

@app.post("/oauth/token")
async def token_endpoint(
    grant_type: str = Form(...),
    code: str = Form(None),
    redirect_uri: str = Form(None),
    client_id: str = Form(None),
    code_verifier: str = Form(None),  # PKCE verifier
    refresh_token: str = Form(None),
):
    """Exchange authorization code for access token.

    PKCE verification happens here — this is what makes public clients secure.
    """
    import hashlib, base64

    if grant_type == "authorization_code":
        if not code:
            raise HTTPException(400, "code required")

        code_data = auth_codes.get(code)
        if not code_data or time.time() > code_data["expires"]:
            raise HTTPException(400, "Invalid or expired authorization code")

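        # PKCE verification (required in OAuth 2.1): a missing verifier or challenge must fail,
        # otherwise a client could skip PKCE by omitting code_verifier
        if not code_verifier or not code_data.get("code_challenge"):
            raise HTTPException(400, "PKCE code_verifier required")
        expected = base64.urlsafe_b64encode(
            hashlib.sha256(code_verifier.encode()).digest()
        ).rstrip(b"=").decode()
        if expected != code_data["code_challenge"]:
            raise HTTPException(400, "PKCE verification failed")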

        del auth_codes[code]
        user_id = code_data["user_id"]
        scopes = code_data["scope"].split()

    else:
        raise HTTPException(400, f"Unsupported grant_type: {grant_type}")

    # Issue access token
    access_token = _create_access_token(user_id, scopes)

    return {
        "access_token": access_token,
        "token_type": "bearer",
        "expires_in": ACCESS_TOKEN_EXPIRE_MINUTES * 60,
        "scope": " ".join(scopes),
    }


def _create_access_token(user_id: str, scopes: list[str]) -> str:
    # JWT NumericDate values as integer seconds; avoids the deprecated datetime.utcnow()
    now = int(time.time())
    payload = {
        "sub": user_id,
        "scopes": scopes,
        "exp": now + ACCESS_TOKEN_EXPIRE_MINUTES * 60,
        "iat": now,
        "iss": "http://localhost:8000",
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")


# --- Token Introspection (the MCP server calls this to validate tokens) ---

@app.post("/oauth/introspect")
async def introspect_token(token: str = Form(...)):
    """RFC 7662 Token Introspection — the MCP server validates tokens here."""
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        return {
            "active": True,
            "sub": payload["sub"],
            "scope": " ".join(payload.get("scopes", [])),
            "exp": payload["exp"],
        }
    except JWTError:
        return {"active": False}

Connecting MCP Server to Auth Server

The MCP server needs to know where to validate tokens. Update mcp_server.py:

# mcp_server.py (complete)
from mcp.server.fastmcp import FastMCP
from mcp.server.auth import AuthSettings, ClientRegistrationOptions, TokenVerifier
import httpx

class IntrospectionTokenVerifier(TokenVerifier):
    """Validates tokens by calling the auth server's introspection endpoint."""

    async def verify_token(self, token: str) -> dict:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "http://localhost:8000/oauth/introspect",
                data={"token": token},
            )
            data = resp.json()

        if not data.get("active"):
            raise ValueError("Token is not active")

        return {
            "user_id": data["sub"],
            "scopes": data["scope"].split(),
        }

auth_settings = AuthSettings(
    issuer_url="http://localhost:8000",
    resource_server_url="http://localhost:8001",  # This MCP server's own URL
    client_registration_options=ClientRegistrationOptions(
        enabled=True,
        valid_scopes=["read", "write"],
        default_scopes=["read"],
    ),
)

mcp = FastMCP(
    "My Protected MCP Server",
    auth=auth_settings,
    token_verifier=IntrospectionTokenVerifier(),  # Top-level param, not inside AuthSettings
)

@mcp.tool()
async def get_data(query: str) -> str:
    """Retrieve data — requires read scope."""
    return f"Protected data for query: {query}"

Running It

# Terminal 1: Start the auth server
uvicorn auth_server:app --port 8000

# Terminal 2: Start the MCP server
mcp run mcp_server.py --port 8001

Connect with an MCP client (Claude Desktop, or any SDK client):

The client will discover the auth server via /.well-known/oauth-authorization-server
Complete the OAuth flow (authorize → login → token)
All subsequent tool calls will include the Bearer token automatically

Scope Enforcement (The Part Most Guides Skip)

Having auth is one thing. Enforcing scopes per-tool is another. Here's how to check scopes in your tool handlers:

from mcp.server.fastmcp import FastMCP, Context

@mcp.tool()
async def write_data(content: str, ctx: Context) -> str:
    """Write data — requires write scope."""
    # ctx.auth contains the verified token info from TokenVerifier
    user_scopes = ctx.auth.get("scopes", []) if ctx.auth else []

    if "write" not in user_scopes:
        raise PermissionError("write scope required")

    return f"Wrote: {content}"

Scope enforcement at the tool level means different clients can have different permissions — read-only service accounts, admin tools with write access, etc.
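
If several tools need the same check, the comparison can live in one place. The sketch below is an assumption about how you might factor it, not an MCP SDK API: `require_scopes` is a hypothetical helper that takes the scope list produced by the token verifier and raises when anything is missing.

```python
def require_scopes(granted: list[str], *required: str) -> None:
    """Raise PermissionError unless every required scope was granted."""
    missing = set(required) - set(granted)
    if missing:
        raise PermissionError(f"missing scope(s): {', '.join(sorted(missing))}")


# A read-only token passes a read check but fails a write check
require_scopes(["read"], "read")
try:
    require_scopes(["read"], "read", "write")
except PermissionError as exc:
    print(exc)
```

Inside a tool handler you would call it with the scopes taken from the verified token before doing any work.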

What You'd Change for Production

This example is deliberately minimal. In a real deployment:

Use RS256 or ES256 instead of HS256 — asymmetric signing means the MCP server can validate tokens without calling the introspection endpoint (the public key is enough)
Add refresh tokens — access tokens expire; refresh tokens let clients renew without forcing re-login
Replace in-memory stores — use Redis for auth codes/tokens, a proper database for clients and users
Add a real login UI — the /oauth/login endpoint here assumes a form submit; real apps use session cookies, SSO, or IdP delegation (Keycloak, Auth0, etc.)
Configure CORS — MCP clients running in browsers need appropriate CORS headers on all OAuth endpoints
Rate-limit token endpoints — brute-force protection on /oauth/token and /oauth/introspect
Add Resource Indicators (RFC 8707) — The June 2025 MCP spec update requires Resource Indicators to prevent token mis-redemption attacks. A malicious server can't redeem a token scoped to a specific resource against a different resource. Implementation: add "resource_indicators_supported": True to the discovery endpoint, and accept a resource parameter in token requests. For single MCP server deployments (like this example), the attack surface is limited — but if your authorization server issues tokens for multiple resources, this is a required security control.

The OAuth 2.1 protocol layer here follows the spec. The Resource Indicators item above (RFC 8707) only matters in multi-resource environments; single MCP server deployments can omit it.

Validate Host headers on all MCP endpoints — CVE-2025-66416 (Python MCP SDK, fixed in mcp>=1.23.0) showed that localhost MCP servers without Host header validation are reachable via DNS rebinding from any browser tab. Validate request.headers.get("host") against your expected hostname on every MCP endpoint. This applies to development servers too, not just production. Our pip pin (mcp>=1.27.0) is already above the patched version — but explicit Host header checks are defense-in-depth.
Prefer StreamableHTTP over bare SSE for anything externally accessible — The 2025-2026 MCP CVE cluster includes four vulnerabilities specifically targeting SSE transports, including CVE-2026-32211 (Azure — Microsoft removed the SSE transport entirely in their patch). StreamableHTTP with proper authentication is the hardened path for any MCP server that isn't strictly loopback.

A Note on FastMCP's OAuth Proxy

FastMCP v3.x ships a built-in OAuth proxy. If you're already using FastMCP, you might wonder whether to use that instead of wiring up the auth server manually.

The answer matters because of GHSA-5h2m-4q8j-pqpj — an unresolved security advisory against FastMCP's OAuth proxy as of this writing. The proxy doesn't respect the resource parameter in token requests, and issues tokens scoped to base_url rather than the specific MCP server. In a single-server deployment that's a limited exposure. In a multi-server deployment, it means a token issued for one server can be reused against another — which is precisely the attack that OAuth 2.1's Resource Indicators (RFC 8707) exist to prevent.

The pattern in this article — implementing OAuthAuthorizationServerProvider yourself — doesn't use the FastMCP proxy at all. It issues tokens correctly scoped to the resource server. Use this approach until the FastMCP advisory is resolved.

The 41% Problem

That production audit didn't find 41% zero-auth servers because developers don't care about security. It found them because nobody had written this guide.

The MCP Python SDK has had auth support since November 2025. The spec is stable. The reason 41% of production servers have zero auth is that "add OAuth 2.1 to your FastAPI MCP server" was undocumented, and undocumented things don't get implemented.

Hopefully this helps close that gap.

The patterns above handle auth. The Python MCP Production Server Kit ($79) adds multi-worker deployment (Redis session backend, Kubernetes manifests), a TestClient substitute for pytest — without subprocess spawning — and Prometheus monitoring. Everything you need to run this in production, not just in dev.

The Python Testing Toolkit: 4 Drop-In Files for Production pytest

Peyton Green — Thu, 09 Apr 2026 14:00:06 +0000

Every Python project eventually needs the same test infrastructure. You write conftest.py from scratch, reach for the same Hypothesis patterns, copy async fixture code from a Stack Overflow answer that's three major versions out of date.

I've done this enough times that I started keeping the files. This week I packaged them up.

The Python Testing Toolkit is available now on Gumroad: four production-ready Python files — conftest_production.py, hypothesis_strategies.py, async_test_patterns.py, and parametrize_factories.py. $49, one-time download, MIT license.

Here's exactly what's in each one.

`conftest_production.py` — the conftest you'd write if you had eight hours

Most conftest.py tutorials show you database fixtures or HTTP mocking. Almost none show both in a form that composes cleanly with async tests, FastAPI dependency overrides, and environment variable isolation.

This file handles all of it:

# Transaction rollback per test — no cleanup needed
def test_creates_user(db_session):
    user = User(name="Alice", email="alice@example.com")
    db_session.add(user)
    db_session.flush()

    result = db_session.get(User, user.id)
    assert result.name == "Alice"
    # rolls back automatically — nothing persists between tests

# FastAPI dependency overrides without boilerplate
def test_endpoint_with_mock_service(client_with_override):
    mock_service = MagicMock(return_value={"id": 1, "name": "Alice"})

    with client_with_override({get_user_service: lambda: mock_service}) as c:
        response = c.get("/users/1")
        assert response.status_code == 200

# Environment variables that reset between tests
def test_feature_flag(env_override):
    with env_override(ENABLE_BETA="true", DATABASE_URL="sqlite:///:memory:"):
        result = function_that_reads_env()
        assert result.used_beta_path
    # ENABLE_BETA is unset again here — no state leaks

The database fixture defaults to SQLite with transaction rollback per test. Swap to Postgres by setting a DATABASE_URL environment variable — the fixture handles both without modification. HTTP mocking covers both httpx (via respx) and requests (via responses) because most projects have both.
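
The toolkit's own implementation isn't shown here, but the environment-isolation idea can be sketched with the standard library alone. The version below is an assumption about the pattern, written as a plain context manager; in a conftest you would expose it through a fixture.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_override(**overrides: str):
    """Temporarily set environment variables, restoring the old state on exit."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in saved.items():
            if old_value is None:
                os.environ.pop(name, None)  # variable did not exist before
            else:
                os.environ[name] = old_value


os.environ.pop("ENABLE_BETA", None)
with env_override(ENABLE_BETA="true"):
    print(os.environ["ENABLE_BETA"])
print("ENABLE_BETA" in os.environ)
```

Restoring in a `finally` block means a failing assertion inside the `with` body still leaves the environment clean for the next test.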

Drop this into your project root as conftest.py. pytest finds everything automatically.

`hypothesis_strategies.py` — stop using `.text()` for emails

Hypothesis ships with text(), integers(), floats(), dates(). These are correct types for testing algorithms. They're useless for testing application code that expects email addresses, usernames, financial amounts, or URL paths.

When Hypothesis generates "\x00\x7f" as a test email, your validator fails for the wrong reason. You fix the test to exclude those cases, not the code. The test becomes meaningless.

This file has 30+ strategies for the types that appear in every domain:

from hypothesis import given
from hypothesis_strategies import emails, usernames, monetary_amounts

@given(email=emails(), username=usernames())
def test_user_registration(email, username):
    user = User.create(email=email, username=username)
    assert user.email == email.lower()
    assert len(user.username) >= 3

from decimal import Decimal
from hypothesis_strategies import monetary_amounts, order_data

@given(amount=monetary_amounts(min_value=Decimal("0.01"), max_value=Decimal("10000.00")))
def test_payment_processing(amount):
    result = process_payment(amount)
    assert result.status == "success"
    assert result.charged == amount

# Generate complete valid payloads in one line
from hypothesis_strategies import user_registration_data

@given(registration=user_registration_data())
def test_registration_endpoint(client, registration):
    response = client.post("/register", json=registration)
    # Should be 201 (success) or 422 (validation error), never 500
    assert response.status_code in (201, 422)

The user_registration_data() strategy composes emails(), usernames(), and optionally phone_numbers_e164() into a valid registration dict. order_data() generates line items with Decimal amounts that sum correctly. Each strategy accepts parameters for tightening bounds when you need to test specific edge cases.

The Hypothesis article earlier in this series covers the basics. This file is the production-grade implementation.

`async_test_patterns.py` — the patterns that don't appear in the pytest-asyncio README

Async testing has three problems that show up together:

Your httpx.AsyncClient doesn't know about FastAPI's lifespan (startup/shutdown events)
Background tasks run after the response returns, so your assertions fire before the side effects settle
AsyncMock syntax is verbose and inconsistent across Python versions

# Lifespan-aware async client — startup and shutdown events fire correctly
async def test_create_post(async_http_client):
    response = await async_http_client.post("/posts", json={
        "title": "Hello",
        "body": "World",
    })
    assert response.status_code == 201

# Background tasks: drain before asserting side effects
async def test_welcome_email_sent(async_http_client, drain_background_tasks):
    response = await async_http_client.post("/users", json={"email": "new@example.com"})
    assert response.status_code == 201

    await drain_background_tasks()  # flush FastAPI's background task queue

    assert email_inbox.has_message_for("new@example.com")

# AsyncMock helpers: concise patterns for common cases
from unittest.mock import patch

from async_test_patterns import async_mock_returning, async_mock_raising

async def test_service_timeout(async_http_client):
    import httpx

    with patch("myapp.external.fetch_data", async_mock_raising(httpx.TimeoutException(""))):
        response = await async_http_client.get("/data")
        assert response.status_code == 503

The file also includes async_db_session (async SQLAlchemy with SAVEPOINT rollback per test), assert_all_resolve() (run a list of coroutines in parallel and assert they all complete within a timeout), and a guide to event loop scope with the actual trade-offs documented.

The most common async test failure — "event loop is closed" — is fixed by adding two lines to pyproject.toml:

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "session"

This file includes that config and explains why.

`parametrize_factories.py` — readable parametrize that doesn't collapse under its own weight

@pytest.mark.parametrize with more than four cases becomes a list of tuples that nobody can read six months later. The generated test IDs are test_function[param0-param1-param2], which tells you nothing about which case failed.

from parametrize_factories import Case, cases

@pytest.mark.parametrize("case", cases(
    Case("valid email", value="user@example.com", expected=True),
    Case("no at sign", value="notanemail", expected=False),
    Case("empty string", value="", expected=False),
    Case("missing domain", value="user@", expected=False),
))
def test_email_validation(case):
    assert is_valid_email(case.value) == case.expected

Output:

PASSED test_email_validation[valid email]
PASSED test_email_validation[no at sign]
FAILED test_email_validation[empty string]

# All combinations of dimensions: 9 test cases from 3 methods x 3 roles
from parametrize_factories import matrix

@pytest.mark.parametrize("combo", matrix(
    method=["GET", "POST", "DELETE"],
    role=["admin", "viewer", "anonymous"],
))
def test_access_control(combo, client):
    response = make_request(method=combo["method"], role=combo["role"])
    assert response.status_code != 500

# Skip cases where environment isn't configured
from parametrize_factories import skip_if_missing
import os

@pytest.mark.parametrize("db_url", [
    skip_if_missing(os.getenv("POSTGRES_URL"), "POSTGRES_URL not set"),
    skip_if_missing(os.getenv("MYSQL_URL"), "MYSQL_URL not set"),
    "sqlite:///:memory:",
])
def test_database_connection(db_url):
    engine = create_engine(db_url)
    ...

No pytest-cases dependency — plain pytest throughout.
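
The matrix idea is a Cartesian product over named dimensions. A minimal sketch with `itertools.product`, assuming the helper returns one dict per combination as the test above implies; the toolkit's real `matrix` may differ, for example in how it labels test IDs.

```python
from itertools import product


def matrix(**dimensions: list) -> list[dict]:
    """Return one dict per combination of the given dimension values."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


combos = matrix(
    method=["GET", "POST", "DELETE"],
    role=["admin", "viewer", "anonymous"],
)
print(len(combos))
print(combos[0])
```

Because the result is a plain list, it can be passed straight to `@pytest.mark.parametrize` as the argument values.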

What it doesn't include

Not tutorial content. No explanations of what @pytest.fixture does or how scopes work. If you're new to pytest, pytest fixtures that actually scale covers that.

Not Django-specific. The HTTP mocking and parametrize factories work anywhere. The database and async fixtures are FastAPI-oriented. The Hypothesis strategies are framework-independent.

Not a subscription, SaaS integration, or CI dashboard. Four Python files. Download once, use in any project, forever.

Getting it

Python Testing Toolkit on Gumroad →

$49 one-time. MIT license. Includes all four files and the README with drop-in instructions, pyproject.toml configuration, and CI integration notes.

If you've been following the Testing Without the Subscription Tax series, this is the practical companion to the articles. The async testing article drops in September — the async_test_patterns.py file from this toolkit is the implementation behind the patterns in that article.

Questions? Drop them in the comments or reach me at @peytongreen_dev.

Part of the Testing Without the Subscription Tax series.

A2A v1.0.0 Is Live — What Changed and What It Means for Your Python Agents

Peyton Green — Wed, 08 Apr 2026 10:00:00 +0000

The A2A protocol hit v1.0.0 on March 12, 2026. I wrote a quickstart last week — two agents talking to each other locally, under 15 minutes. This is the follow-up: what actually changed in v1.0.0, and what it means for agents going beyond localhost.

The short version: v1.0.0 isn't just a version bump. It landed four things that matter for production deployments: signed agent identity, proper OAuth flows for headless agents, multi-tenancy, and paginated task listing. None of these appeared in v0.x.

The SDK (a2a-sdk on PyPI) is still at v0.3.25 stable. There's a v1.0.0a0 alpha if you want to live on the edge. That gap is actually useful context — it tells you what the spec can do that the SDK hasn't exposed yet, and where to plan ahead.

What v1.0.0 Added

The full changelog is on the a2aproject/A2A GitHub repo. Here's what matters practically.

1. Signed Agent Cards (JWS)

The Agent Card is how one agent discovers another: its name, capabilities, supported input/output types, and endpoint URL. In v0.x, Agent Cards were plain JSON — no authentication of the card itself. Any server could claim to be any agent.

v1.0.0 adds JWS (JSON Web Signature) to Agent Cards. An agent can now cryptographically sign its own card, and a caller can verify the signature before trusting the card.

Why this matters: In a multi-agent system with agents from different teams or vendors, you can't assume every agent card is legitimate. JWS verification gives you a trust root at the identity layer — before any task is delegated.

Implementation sketch (conceptual — SDK alpha required for full support):

import json
from jose import jws, jwk

def sign_agent_card(card: dict, private_key_pem: str) -> str:
    """Sign an Agent Card and return the JWS compact serialization."""
    key = jwk.construct(private_key_pem, algorithm="RS256")
    payload = json.dumps(card).encode()
    return jws.sign(payload, key, algorithm="RS256")

def verify_agent_card(token: str, public_key_pem: str) -> dict:
    """Verify a signed Agent Card and return the card dict."""
    key = jwk.construct(public_key_pem, algorithm="RS256")
    payload = jws.verify(token, key, algorithms=["RS256"])
    return json.loads(payload)

In practice, you'd embed the JWS token in the /.well-known/agent.json response, and clients would verify it before adding the agent to their registry. The SDK will expose this cleanly once v1.0.0 stable ships. For now, PyJWT or python-jose is enough to implement what the spec describes.

2. OAuth 2.0: Device Code Flow + PKCE

v0.x had basic OAuth 2.0 support. v1.0.0 modernized it in two specific ways that matter for agent deployments:

Device Code Flow (urn:ietf:params:oauth:grant-type:device_code): For agents that run headless — no browser, no interactive login, no user present. Instead of redirecting to a login page (which headless agents can't handle), the agent polls a device authorization endpoint while the user approves on a separate device.

import asyncio
import httpx
import time

async def device_code_auth(auth_server: str, client_id: str, scope: str) -> str:
    """Complete the OAuth 2.0 Device Code flow. Returns access token."""
    async with httpx.AsyncClient() as client:
        # Step 1: Request device code
        resp = await client.post(
            f"{auth_server}/device/code",
            data={"client_id": client_id, "scope": scope}
        )
        resp.raise_for_status()
        device_auth = resp.json()

        print(f"Go to: {device_auth['verification_uri']}")
        print(f"Enter code: {device_auth['user_code']}")

        # Step 2: Poll for token
        interval = device_auth.get("interval", 5)
        expires_in = device_auth["expires_in"]
        deadline = time.time() + expires_in

        while time.time() < deadline:
            await asyncio.sleep(interval)
            token_resp = await client.post(
                f"{auth_server}/token",
                data={
                    "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
                    "device_code": device_auth["device_code"],
                    "client_id": client_id,
                }
            )
            token_data = token_resp.json()
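            if token_data.get("access_token"):
                return token_data["access_token"]
            error = token_data.get("error")
            if error == "authorization_pending":
                continue
            if error == "slow_down":
                # RFC 8628 section 3.5: add 5 seconds to the interval and keep polling
                interval += 5
                continue
            break  # access_denied, expired_token, or anything unexpected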

        raise RuntimeError("Device authorization timed out or was denied")
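
RFC 8628 (section 3.5) defines how a client should react to each token response while polling. That decision can be sketched as a pure function, which is easier to unit test than the network loop; `next_poll_step` and its return values are hypothetical names chosen for this example.

```python
def next_poll_step(token_data: dict, interval: int) -> tuple[str, int]:
    """Map a device-flow token response to (action, interval).

    action is "done" (token issued), "wait" (poll again), or "fail" (stop).
    """
    if "access_token" in token_data:
        return "done", interval
    error = token_data.get("error")
    if error == "authorization_pending":
        return "wait", interval
    if error == "slow_down":
        # The interval grows by 5 seconds for this and all later requests
        return "wait", interval + 5
    # access_denied, expired_token, or an unknown error: polling must stop
    return "fail", interval


print(next_poll_step({"error": "slow_down"}, 5))
```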

PKCE (Proof Key for Code Exchange): For agents that do use the authorization code flow but can't safely store a client secret — typical in agents deployed as native apps or CLI tools. PKCE replaces the client secret with a one-time verifier/challenge pair generated per-request.

import hashlib
import base64
import secrets

def generate_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for PKCE flow."""
    code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(code_verifier.encode()).digest()
    code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return code_verifier, code_challenge

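# In authorization request (auth_server, client_id, redirect_uri come from your own config):
from urllib.parse import urlencode

verifier, challenge = generate_pkce_pair()
# urlencode escapes the redirect URI and any other reserved characters
auth_url = f"{auth_server}/authorize?" + urlencode({
    "response_type": "code",
    "client_id": client_id,
    "redirect_uri": redirect_uri,
    "code_challenge": challenge,
    "code_challenge_method": "S256",
})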

# In token exchange:
# Include code_verifier (not the challenge) in the token request

These aren't new OAuth flows — they're RFC 8628 and RFC 7636. What's new is that A2A v1.0.0 specifies them as the expected flows for agent-to-agent authentication, rather than leaving it open.

3. Multi-Tenancy

v0.x A2A assumed one principal per agent endpoint. v1.0.0 adds multi-tenancy: a single A2A server can host multiple isolated tenant contexts, with credentials and tasks namespaced per tenant.

This matters if you're building an A2A agent that multiple organizations or teams use. Without multi-tenancy, you'd run a separate server process per tenant. With v1.0.0, one server handles all of them.

What changes in practice:

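# v0.x: one server, one context
from a2a.server.apps import A2AStarletteApplication

# agent_card and request_handler are assumed to be built elsewhere in your app
server = A2AStarletteApplication(agent_card=agent_card, http_handler=request_handler)
app = server.build()  # the ASGI app that uvicorn serves
# serve with uvicorn: uvicorn app:app --host 0.0.0.0 --port 8000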

# v1.0.0: multi-tenant pattern (SDK support varies — check current docs)
# Tenant context is passed in the task request headers or URL path
# /tenants/{tenant_id}/tasks (path-based) or X-Tenant-ID header

# The agent implementation receives tenant context and scopes its work accordingly
async def handle_task(task: Task, tenant_id: str) -> TaskResult:
    db = get_tenant_db(tenant_id)
    config = get_tenant_config(tenant_id)
    # ... process with tenant-scoped resources

The exact SDK surface for multi-tenancy will stabilize in SDK v1.0.0 stable. For now, if you need it, implement it at the HTTP layer using path prefixes or request headers.
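
Resolving the tenant at the HTTP layer can be as small as one function. The sketch below assumes the two conventions mentioned in the comments above (a /tenants/{tenant_id}/ path prefix or an X-Tenant-ID header); `resolve_tenant` is a hypothetical helper, and the allowed characters in a tenant ID are an assumption.

```python
import re

# Assumed tenant ID alphabet: letters, digits, underscore, hyphen
_TENANT_PATH = re.compile(r"^/tenants/(?P<tenant_id>[A-Za-z0-9_-]+)(?P<rest>/.*)?$")


def resolve_tenant(path: str, headers: dict[str, str]) -> tuple[str, str]:
    """Return (tenant_id, path_without_prefix); the path prefix wins over the header."""
    match = _TENANT_PATH.match(path)
    if match:
        return match["tenant_id"], match["rest"] or "/"
    for name, value in headers.items():
        if name.lower() == "x-tenant-id" and value:  # header names are case-insensitive
            return value, path
    raise LookupError("no tenant in path or X-Tenant-ID header")


print(resolve_tenant("/tenants/acme/tasks", {}))
print(resolve_tenant("/tasks", {"X-Tenant-ID": "globex"}))
```

Failing loudly when no tenant is present is deliberate: falling back to a default tenant is how data ends up in the wrong namespace.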

4. `tasks/list` with Filtering and Pagination

v0.x tasks/list returned everything: all tasks for an agent, flat list, no pagination. Fine for local development; unusable at scale.

v1.0.0 adds:

Cursor-based pagination — nextCursor field for stable pagination across pages
Filtering — by status, date range, input type
State filtering — query only submitted, working, completed, or failed tasks

import httpx

async def list_recent_failed_tasks(
    agent_url: str,
    access_token: str,
    cursor: str | None = None
) -> dict:
    """Fetch failed tasks with pagination."""
    params = {
        "status": "failed",
        "pageSize": 50,
    }
    if cursor:
        params["cursor"] = cursor

    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{agent_url}/tasks",
            params=params,
            headers={"Authorization": f"Bearer {access_token}"}
        )
        resp.raise_for_status()
        return resp.json()
        # Returns: {"tasks": [...], "nextCursor": "...", "totalCount": 123}

# Paginate through all failed tasks:
async def get_all_failed_tasks(agent_url: str, token: str) -> list:
    all_tasks = []
    cursor = None
    while True:
        page = await list_recent_failed_tasks(agent_url, token, cursor)
        all_tasks.extend(page["tasks"])
        cursor = page.get("nextCursor")
        if not cursor:
            break
    return all_tasks

5. Error Handling: `google.rpc.Status`

v0.x had minimal standardized error responses. v1.0.0 adopts google.rpc.Status for error payloads, which gives you structured errors with a machine-readable code, a human message, and optional detail objects.

# A v1.0.0 error response looks like:
{
    "error": {
        "code": 9,           # FAILED_PRECONDITION
        "message": "Task input type 'audio/wav' not supported by this agent",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.BadRequest",
                "fieldViolations": [
                    {
                        "field": "input.type",
                        "description": "Supported types: text/plain, application/json"
                    }
                ]
            }
        ]
    }
}

# Error handling in your client:
async def delegate_task(agent_url: str, task: dict) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{agent_url}/tasks", json=task)
        if not resp.is_success:
            error = resp.json().get("error", {})
            code = error.get("code")
            message = error.get("message", "Unknown error")
            # google.rpc codes: 0=OK, 3=INVALID_ARGUMENT, 5=NOT_FOUND, 9=FAILED_PRECONDITION, ...
            raise A2AError(code=code, message=message, details=error.get("details", []))
        return resp.json()

Code 9 (FAILED_PRECONDITION) is worth calling out — it's for "task rejected because a precondition wasn't met," which covers the common case of sending the wrong input type to a specialist agent.

The Spec-vs-SDK Gap: What It Actually Means

The A2A protocol spec is at v1.0.0. The a2a-sdk package on PyPI is at v0.3.25 stable, with a v1.0.0a0 alpha.

This isn't a warning sign; it's normal for a protocol-first project. The spec stabilizes first, then the SDK catches up. The gap exists because spec work and SDK work ship on different release cadences, even when the same people do both.

What it means practically:

You can implement v1.0.0 features today by calling the HTTP API directly. The spec is the contract. The SDK is a convenience wrapper. Everything in this article works without the SDK.
The a2a-sdk v0.3.25 still works fine for the quickstart-level patterns — agent discovery, task delegation, message exchange. Those APIs haven't changed.
v1.0.0a0 alpha is available if you want the full SDK surface for signed cards and multi-tenancy. Production caution applies (alpha = API might shift before stable).
SDK v1.0.0 stable will ship — and when it does, the migration from v0.3.25 will be straightforward. The protocol hasn't broken backward compatibility.

Where to Go From Here

The quickstart gives you two agents talking on localhost. This article gives you the production primitives.

The logical next step is putting them together: a signed, OAuth-authenticated, multi-tenant A2A setup that you'd actually deploy. That's more infrastructure than a 15-minute tutorial allows — but the building blocks are here.

If you're at the "I want to actually deploy this" stage, the pieces in order:

Sign your Agent Cards (JWS) so callers can verify identity
Use Device Code flow for headless agents that need to authenticate
Add PKCE to authorization code flows for non-server agents
Implement multi-tenancy at the HTTP layer if you're serving multiple orgs
Use tasks/list pagination for any monitoring or debugging tooling

The protocol is production-ready. The SDK is catching up. That gap is a window.

The AI Dev Toolkit includes prompts for designing agent systems, reviewing A2A implementations, and generating structured task schemas — the kind of thing that's tedious to write from scratch every time you start a new agent project.