In the last piece, I moved the review gate outside the agent's control. The hook reads files, not sentences. If a scan didn't write an artifact to disk, the agent doesn't get to post its review to GitHub, no matter how confidently it reports that the scan ran. That fixed the enforcement problem. It left a more basic question sitting there untouched: what are the scans actually looking for?
Five things. This piece is about those five, and about the small, slightly embarrassing moment where each one stopped being a comment I kept retyping and turned into a script that retypes it for me.
None of these is clever
I want to say that up front, because the temptation in an article like this is to make the engineering sound harder than it was. It wasn't hard. Most of these scans are a few lines of bash and a regex I would not show a real engineer without apologizing first.
The value was never in the sophistication. It's in the repetition. A script runs the same scan, the same way, against every diff, every time. I don't. I read carefully on a Tuesday morning and I read like a man late for lunch on a Thursday afternoon, and the gap between those two reviews used to be the gap between catching a passive-voice definition and shipping it to a developer who then had to parse it. The scan doesn't have a Thursday afternoon. That's the entire pitch.
Here's the pattern that produced all five of them. I'd leave a review comment on a PR. Then I'd leave the same comment the next week, on someone else's PR, in nearly the same words. By the third time you type a sentence, you start to suspect the sentence wants to be a script. Every one of these scans is just a review comment I got tired of writing.
The five scans
The real ones live in a script called style-gate.sh. Each one runs against the added prose lines from the diff, with code and link syntax stripped out first, so the scans only ever see prose. What follows is the shape of each, simplified down to the part that does the work.
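A minimal sketch of what that preprocessing step could look like, assuming the input is a unified diff of markdown files. The function name extract_added_prose is hypothetical, and keeping a link's visible text while dropping its URL is an assumption about what "stripped" means here; the real style-gate.sh may do this differently.

```shell
# Sketch only: reduce a unified diff (stdin) to its added prose lines (stdout).
# Hypothetical usage, assuming a main branch on origin:
#   git diff origin/main...HEAD -- '*.md' | extract_added_prose > "$added_prose_lines"
extract_added_prose() {
  # 1. keep added lines, 2. drop the "+++ b/file" header, 3. drop the "+" marker
  grep -E '^\+' \
    | grep -Ev '^\+\+\+ ' \
    | sed -E 's/^\+//' \
    | awk '
        # Build the fence marker from character codes (96 is a backtick).
        BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }
        # A line that starts with a fence toggles code mode and is dropped.
        index($0, fence) == 1 { in_fence = !in_fence; next }
        # Only lines outside fenced code survive.
        !in_fence { print }
      ' \
    | sed -E \
        -e 's/`[^`]*`//g' \
        -e 's/\[([^]]*)\]\([^)]*\)/\1/g'
  # The last sed removes inline code spans, then rewrites [text](url) as text.
}
```

One known gap in this sketch: it only sees added lines, so an added line inside a code fence that was opened in unchanged context is still treated as prose.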
Will and would: the one that started it
Documentation describes how a system behaves now, not how it will behave at some unspecified future point. So "the client will retry failed requests" is a small lie of tense. The client retries failed requests. Present, indicative, true at read time.
The comment I kept leaving: we describe current behavior in present tense, so "will retry" should be "retries." Every week. Sometimes twice.
The scan is about as dumb as it sounds:
# will/would: future and conditional framing in prose that should be present-tense
grep -nE '\b(will|would)\b' "$added_prose_lines"
The false positives are real and I left them in on purpose. Sometimes "will" is correct: a deprecation notice ("this parameter will be removed in v3") is a genuine statement about the future. So the scan flags, it doesn't block. It hands me a list of line numbers, and I spend ten seconds deciding which ones are tense errors and which are legitimate. Ten seconds I will actually spend, because the list is sitting in front of me, instead of the careful reading I would have skipped.
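"Flags, doesn't block" comes down to what the scan does with its exit status. A rough sketch, assuming the scans run inside a script that may use set -e; flag_scan and its arguments are hypothetical names, not taken from style-gate.sh.

```shell
# Sketch only: run one scan in "flag" mode.
# Hits are printed for a human to triage; the function returns 0 either way,
# so a match never fails the gate on its own.
flag_scan() {
  label=$1     # short name shown in the report
  pattern=$2   # extended regex handed to grep -E
  file=$3      # file holding the added prose lines

  # grep exits 1 when nothing matches; "|| true" keeps that from aborting
  # a script that runs under "set -e".
  hits=$(grep -nEi -- "$pattern" "$file" || true)

  if [ -n "$hits" ]; then
    count=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
    printf '[%s] %s line(s) to review:\n' "$label" "$count"
    printf '%s\n' "$hits"
  fi
  return 0
}

# Hypothetical call, reusing the pattern from the scan above:
#   flag_scan will-would '\b(will|would)\b' "$added_prose_lines"
```

A blocking scan would do the opposite and return non-zero on any hit, which is the wrong trade for a pattern with known legitimate matches.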
Passive voice: flagged, never failed
This is the noisiest scan I run, and the one I trust least to be right, which is exactly why it never blocks anything.
The comment it replaced: consider active voice here so the reader knows who does what. "The token is validated by the middleware" hides the actor at the back of the sentence. "The middleware validates the token" puts it back up front.
Detecting passive voice with a regex is a crude business. You're looking for a form of "to be" followed by a past participle, and English has opinions about that you cannot fully encode in grep:
# passive voice: a 'to be' form followed by a likely past participle
grep -nEi '\b(is|are|was|were|been|being)\s+\w+(ed|en)\b' "$added_prose_lines"
It catches real passives. It also catches "the data is embedded" when embedded is doing perfectly honest work, and it misses passives built on irregular participles. I could spend a weekend tightening it. I won't, because passive voice is the one dimension where the regex genuinely cannot tell a good call from a bad one. "The request is rate-limited" is fine. The scan can't know that. So it surfaces candidates and stays out of the decision. I read the list. I keep the ones that earn their place.
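To make the hit-and-miss behavior concrete, here is a small illustration that runs the same pattern over three sample sentences. The sentences and the passive_demo name are invented for the example, and the pattern relies on GNU grep extensions (\b, \s, \w), as the original does.

```shell
# Sketch only: feed three sample sentences through the passive-voice pattern.
# Sentence 1 is a real passive, sentence 2 is a debatable hit, and sentence 3
# is a passive built on an irregular participle ("sent") that the pattern misses.
passive_demo() {
  printf '%s\n' \
    'The token is validated by the middleware.' \
    'The data is embedded in the response.' \
    'The response is sent by the worker.' \
    | grep -nEi '\b(is|are|was|were|been|being)\s+\w+(ed|en)\b' || true
}
```

Running passive_demo prints the first two sentences with their line numbers and stays silent on the third, which is the gap the regex leaves for a human reader.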
Placeholders: TODO, TBD, and the bracket that shipped
This one exists because of a specific humiliation. A [TODO: confirm the actual rate limit] once made it into a merged docs PR, sat in published documentation for a sprint, and got found by a developer rather than by me.
The comment it replaced wasn't even a sentence. It was the sigh you make when you find scaffolding in the finished building.
# placeholders: leftover scaffolding in prose
grep -nE '(TODO|TBD|FIXME|XXX|[Ll]orem ipsum)' "$added_prose_lines"
The interesting tuning problem was brackets. I wanted to flag a stray [fill this in], but markdown link syntax is full of legitimate brackets, and so is CLI documentation ([options], [--flag]). My first version flagged every link on the page and I nearly threw the whole idea out. The fix was boring: run the bracket check only on prose lines, after stripping inline code spans and link syntax. Most of tuning these scans is teaching them where not to look.
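The bracket check described above might be sketched like this. It is an illustration under assumptions, not the script's actual code: bracket_scan is a hypothetical name, and the sed patterns only cover inline code spans, inline links and images, reference-style links, and link reference definitions.

```shell
# Sketch only: flag stray bracketed placeholders such as "[fill this in]".
# $1 is a file of added prose lines. Lines are blanked rather than deleted,
# so the line numbers grep reports still match the input file.
bracket_scan() {
  sed -E \
    -e 's/`[^`]*`//g' \
    -e 's/!?\[[^]]*\]\([^)]*\)//g' \
    -e 's/\[[^]]*\]\[[^]]*\]//g' \
    -e 's/^\[[^]]*\]:.*$//' \
    "$1" \
    | grep -nE '\[[^]]+\]' || true
  # Whatever bracket pair survives the four substitutions gets flagged.
}
```

CLI syntax such as [options] is only ignored here when it sits inside a code span, and task-list checkboxes would still show up as hits, so this remains a flag for a human to triage rather than a blocker.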
The marketing words
Documentation has a vocabulary it borrows from the marketing site, and it should give it back. "This powerful endpoint simply returns the user object." Powerful is doing no work. Simply is doing worse than no work, because it tells a reader who finds the thing hard that they were supposed to find it easy.
The comments I kept leaving were two: "powerful" is a marketing word, not a documentation word, cut it, and "simply" assumes it's simple for the reader, who may not agree.
So there's a word list:
# marketing creep and reader-blaming minimizers
grep -nEiw '(powerful|robust|seamless|effortless|blazing|simply|just|easily|obviously)' "$added_prose_lines"
"Just" and "easily" are the troublemakers here, because "just the Authorization header, not the whole set" is a legitimate use of just, and the scan can't see the difference. Same answer as everywhere else: it flags, I scan the hits, I cut the ones that minimize the reader's difficulty and keep the ones doing honest narrowing. The list is short by design. A long banned-words list produces so much noise that you start ignoring it, and a gate everyone ignores is just checkbox theater with extra steps.
When a sentence should have been a fact
This is the subtlest of the five and the one I'm still tuning. It catches hedged, conditional prose standing in for a fact the reader actually needs.
"The cache may or may not be invalidated depending on the header." That sentence has the shape of information and the content of a shrug. The reader has a yes/no question (does my request invalidate the cache?) and the sentence answers "maybe." Somewhere a fact exists: setting Cache-Control: no-store invalidates the cache; other values leave it intact. The hedge was hiding a condition I could have just stated.
The comment it replaced: this reads as uncertainty, but there's a deterministic rule underneath, so state the condition instead of hedging it.
# hedged prose where a stated condition would serve the reader better
grep -nEi '(may or may not|can either|might also|sometimes|in some cases|it depends)' "$added_prose_lines"
The honest disclosure on this one: it has the second-highest false-positive rate of the five, behind only passive voice, because sometimes the uncertainty is real. Some behavior genuinely depends on runtime state you can't reduce to a clean rule. So this scan is the least mechanical of the lot in practice, even though the grep is trivial. It mostly works as a prompt: you wrote "may or may not," are you sure there isn't a fact here you're avoiding? Often there is.
What a scan can't see
A regex catches a shape, not a meaning. The passive scan flags "the token is validated by the middleware," and most of the time it's right to. But it cannot tell you that "the middleware validates the token" is the better sentence, because it doesn't know what the sentence is for. It knows the sentence has the shape of passive voice. That is the entire extent of what it knows.
So none of these scans decides anything. They surface candidates. I decide. This is the part that gets lost when people hear "automated review" and picture the machine doing the reviewing.
It's worth being precise about the division of labor, because it's the whole reason the system works. The scan is not there to replace my judgment. It's there to protect my judgment from my own inconsistency. I'm good at deciding whether a passive construction earns its place in a given sentence. I'm bad at remembering to look for it at four o'clock on a Thursday. The scan does the looking, which it never forgets to do. I do the deciding, which it can't do at all. We are each assigned the part we're actually reliable at.
What doesn't have a scan yet
This handles the failure modes I can name: the ones I've typed often enough to recognize as a pattern and reduce to a regex. There's a whole other category underneath it: the comment I haven't gotten tired of yet, because I've only had to make it once. Those don't have scans. They have a log.
That's the next piece. The gap log, and the genuinely annoying problem of how a lesson I learn on one PR survives long enough to become a scan I run on the next one. I have not fully solved it. But there's a workable shape, and I'll show you exactly where the friction still is, because pretending it's seamless would land me on my own marketing-words list.
If you run scans like these, I'd want to see your false-positive list before anything else. That's where the real knowledge lives. Mine took far longer to tune than to write, and the tuning is the part nobody puts in the article.
I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on LinkedIn.
