DEV Community

Peter Huang

Your AI writes PR descriptions from your commit messages. That's the bug.

TL;DR — Commit messages describe your intentions. The diff describes reality. They drift apart over the life of a branch, and most AI PR-description tools summarize the wrong one. A good PR agent reads the diff. Here's the design, and a free agent to start from.

The PR description that lied

A reviewer pinged me on a PR last year: "The description says this adds rate limiting, but I'm looking at the diff and it also changes how we hash session tokens. Was that intentional?"

It was intentional. I'd done both on the same branch. But my PR description only mentioned the rate limiting, because I'd generated it from my commit messages, and my commit messages were `wip`, `rate limit middleware`, `fix`, `fix again`, and `tweak`. The token-hashing change rode in under `fix again`. The tool that wrote my description faithfully summarized my commits, and my commits faithfully hid half of what I'd actually done.

The reviewer caught it. But the whole point of a PR description is that the reviewer shouldn't have to reconstruct the change from the diff themselves. That's the job I was supposed to have done.

Commits are intentions; the diff is reality

Here's the core problem with generating PR descriptions from commit messages: a commit message is what you believed you were doing at the moment you typed it. It's written before the change is finished, often mid-thought, frequently as `fix` or `address review comments`. It captures intent, badly, at a point in time.

The diff against the base branch is different. It's the complete, current, factual statement of what will actually change when this merges. Every renamed function, every altered return shape, every migration, every accidental console.log — all of it is in the diff, and none of it lies, because the diff is the thing being merged.

When an AI tool paraphrases your commit messages, it's summarizing a summary — one written hastily, by you, before you were done. Garbage in, plausible-sounding garbage out. The description reads fine. It's just not true.

A PR agent should read the diff

The fix is almost embarrassingly simple to state: have the agent read `git diff <base>...HEAD`, not `git log`. But there are a few design choices that separate a PR agent you trust from one that produces filler.
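As a minimal sketch of that difference, the Python below builds the two git invocations and labels their roles. The function names and dictionary keys are illustrative assumptions, not part of any tool mentioned in this post. The only git behavior it relies on is that the three-dot form of `git diff` compares HEAD against the merge base, so only this branch's changes appear.

```python
import subprocess


def diff_command(base: str) -> list[str]:
    # Three-dot form: diff from the merge base of <base> and HEAD to HEAD,
    # i.e. everything this branch will change when it merges.
    return ["git", "diff", f"{base}...HEAD"]


def log_command(base: str) -> list[str]:
    # Two-dot form for log: commits reachable from HEAD but not from <base>.
    return ["git", "log", "--oneline", f"{base}..HEAD"]


def collect_change_context(base: str) -> dict[str, str]:
    """Run both commands in the current repository and label their roles."""

    def run(cmd: list[str]) -> str:
        return subprocess.run(
            cmd, capture_output=True, text=True, check=True
        ).stdout

    return {
        "source_of_truth": run(diff_command(base)),  # what will actually merge
        "context_only": run(log_command(base)),      # what the author believed
    }
```

Handing both strings to an agent is fine, as long as the prompt says which one wins when they disagree.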

Here's the shape of one such agent (this is a real Claude Code subagent; the structure matters more than the exact wording):

---
name: pr-surgeon
description: 'Use when about to open or push a pull request. Reads the
  actual diff against the base branch and writes a tight PR title + body
  with a real test plan. Triggers on "open a PR", "PR description",
  "ready to merge".'
tools: Bash, Read, Grep, Glob
model: inherit
---

You write PR descriptions that reviewers actually read.

## Procedure

1. Find the base branch (gh pr view --json baseRefName, else origin HEAD,
   else main).
2. Read the FULL diff: `git diff <base>...HEAD`. Also read the commit
   list — but the diff is the source of truth. Commit messages lie; diffs
   don't.
3. Identify the ONE thing this PR does. If it does more than one thing,
   say so explicitly under a "Note to reviewer" heading — do not hide it.

## Output

### Title  — <70 chars, imperative, matches the repo's recent PR style>
### What changed — 2–5 bullets, each a concrete behavior change, not a file
### Why — the user-visible reason. If you can't find it, ASK. Don't invent.
### Test plan — a checklist a reviewer can actually run. Real commands.
### Risk / rollback — one line: what breaks if this is wrong, how to revert.

Three things in there are doing the real work:

1. "The diff is the source of truth. Commit messages lie." This single instruction is the whole thesis. The agent is allowed to read the commits for context, but it must reconcile them against the diff and trust the diff. That's what would have caught my token-hashing change — it's in the diff whether or not a commit mentions it.

2. "If it does more than one thing, say so explicitly." Agents, like people, want to present a clean single-purpose story. An honest PR agent surfaces scope creep instead of smoothing it over. The "Note to reviewer" heading is where the token-hashing change would have been forced into the open.

3. "Why — if you can't find it, ASK. Don't invent." This is the anti-hallucination guard. The diff tells you what changed but rarely why. A weaker agent fills that vacuum with confident fiction ("this refactor improves maintainability"). A good one admits the gap and asks you, because a made-up rationale is worse than a blank.
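Step 1 of the procedure, the base-branch fallback chain, can also be sketched outside the agent. This is a hedged Python illustration, not how the agent itself runs: `run_quiet` and `find_base_branch` are hypothetical names, and the sketch assumes `gh` and `git` are installed and that `origin/HEAD` is set in the local clone.

```python
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], Optional[str]]


def run_quiet(cmd: list[str]) -> Optional[str]:
    """Return stripped stdout, or None if the command is missing or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def find_base_branch(run: Runner = run_quiet) -> str:
    # 1. An already-open PR knows its own base branch.
    base = run(["gh", "pr", "view", "--json", "baseRefName", "--jq", ".baseRefName"])
    if base:
        return base
    # 2. Otherwise use the remote's default branch, e.g. "origin/main".
    ref = run(["git", "symbolic-ref", "--short", "refs/remotes/origin/HEAD"])
    if ref:
        return ref.split("/", 1)[1] if "/" in ref else ref
    # 3. Last resort, matching the procedure's final fallback.
    return "main"
```

Passing the runner in as a parameter keeps the fallback order testable without a real repository: a fake runner can simulate a missing `gh` or an unset `origin/HEAD`.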

The section everyone skips: the test plan

Most PR descriptions — human or AI — stop at "what changed." But the highest-value part of a PR description for the person reviewing it is the test plan: a concrete, runnable checklist of how to verify the change does what it claims.

Not "tested locally." That's noise. A real test plan looks like:

- [ ] `npm run test:rate-limit` passes
- [ ] Hit `/api/login` 6× in 10s → 6th returns 429
- [ ] Existing session tokens still validate after deploy (no forced logout)

That last line is exactly the kind of thing a diff-reading agent can generate and a commit-summarizing agent never could — because the token-hashing change is in the diff, so the agent knows to tell the reviewer to check that existing sessions survive.
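The fixed output format, test plan included, is simple enough to check mechanically before a description is posted. The sketch below is one possible way to do that in Python. The `PRDescription` class, `validate`, and `render_test_plan` are hypothetical names introduced for illustration; the limits (title under 70 characters, 2 to 5 bullets, a non-empty "why", a one-line risk note) are taken from the agent file's Output section.

```python
from dataclasses import dataclass


@dataclass
class PRDescription:
    title: str
    what_changed: list[str]
    why: str
    test_plan: list[str]
    risk_rollback: str
    note_to_reviewer: str = ""  # filled in when the PR does more than one thing


def validate(pr: PRDescription) -> list[str]:
    """Return a list of problems; an empty list means the format is respected."""
    problems = []
    if len(pr.title) >= 70:
        problems.append("title must be under 70 characters")
    if not 2 <= len(pr.what_changed) <= 5:
        problems.append("what changed needs 2 to 5 bullets")
    if not pr.why.strip():
        problems.append("why is missing: ask the author, do not invent one")
    if not pr.test_plan:
        problems.append("test plan needs at least one runnable step")
    risk = pr.risk_rollback.strip()
    if not risk or "\n" in risk:
        problems.append("risk / rollback must be exactly one line")
    return problems


def render_test_plan(steps: list[str]) -> str:
    """Render steps as an unchecked task list a reviewer can tick off."""
    return "\n".join(f"- [ ] {step}" for step in steps)
```

A check like this only enforces the shape of the description. Whether each bullet matches the diff is still the agent's, and the author's, job.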

Build your own, or start from a free one

The pattern generalizes to any "summarize a change" agent: read the artifact that represents reality (the diff, the schema, the built output), not the artifact that represents intention (commit messages, ticket titles, your own memory).

If you want a starting point, my free, MIT-licensed Claude Code agent is a good template for the structure of a focused agent — tool scoping, fixed output format, explicit refusal rules:

👉 github.com/allcanprophesy-ops/claude-code-shipping-coach

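# Run from the cloned repo; create the user-level agents directory if it is missing
mkdir -p ~/.claude/agents
cp shipping-coach.md ~/.claude/agents/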

It's a pre-merge checker rather than a PR writer, but it's built on the same bones, and reading one well-structured agent file teaches you more than any amount of theory. The pr-surgeon agent sketched above and a few others (regression-sentinel, test-gap-hunter) are linked from that repo's README if you'd rather not build from scratch. But honestly: read the free one first, then decide.


What's the worst PR description you've ever had to review — or write? I'm collecting examples of where "summarize the commits" goes wrong. Drop one in the comments.
