Stephanie Dover

Posted on Jun 20

Why I scrub AI prose with regex, not a second LLM

#ai #opensource #python #showdev

Written by Stephanie Dover, Software Engineer 10+ YOE, ex GitHub, Twitch, Microsoft. Creator of Klaussy.


TL;DR

klaussy-agents is a free, MIT-licensed CLI (pip install klaussy-agents) that makes the prose an AI coding agent writes (PR comments, review notes, commit messages) read like a person wrote it. It works in two layers: a humanization spec baked into the agent's skills so it writes clean prose up front, and a deterministic klaussy humanize pass that scrubs the output afterward. The scrubber is rule-based regex, not an LLM, and it never touches code. There's also a part I didn't expect going in: once the AI tells are gone, what's left can read curt and run long, so the spec also handles tone (don't be rude) and length (one sentence for a reply, one to five for a review comment). Repo: github.com/steph-dove/klaussy-agents.

The problem

You can spot AI-written text now. Everyone can. And the place it grates most is a code review comment or a commit message, where the prose sits next to your name in a thread your teammates read.

The tells are consistent. The em-dash is the biggest one. Right behind it: filler openers like "It's worth noting that…" and "I wanted to point out that…", chatbot scaffolding like "Hope this helps!" and "Let me know if you have questions!", and stacked hedges like "could potentially". An agent that leaves those in your PR reads like a bot, and people notice.

The obvious fix is to tell the model not to do it. Add "don't sound like AI" to the prompt and move on. That helps, inconsistently, and it regresses silently the moment you change the model or the prompt drifts. Editing every comment by hand works too, but that defeats the point of having an agent write them. I wanted something I could trust without rereading.

Why "just tell the model" wasn't enough

The honest answer to "why not just prompt for it" is: a prompt asks, it doesn't enforce. The model tries to comply. Sometimes it complies fully, sometimes it slips an em-dash back in on a longer comment, and you don't find out until the tell is already in the thread. Prompt compliance is soft by nature: you can make it better, but you can't make it guaranteed.

So I stopped treating the prompt as the whole answer and treated it as the first of two layers.

The first layer is prompt-side. There's a single shared humanization block, internally HUMANIZE_BLOCK, that gets substituted into every prose-output skill: review, pr, commit, explain, across all five of the agents klaussy-agents ships. The rules in it are plain: no em or en dashes, no filler openers, no chatbot scaffolding, tighten hedges, no emoji, no "Certainly", vary sentence shape, and never reword code. One spec, applied everywhere, so the agent isn't given conflicting instructions in different places.
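A minimal sketch of what "one block substituted into every skill" could look like. The rule text comes from the list above; the template bodies, the placeholder handling, and the function name render_skill are assumptions for illustration, not the package's actual source.

```python
# Hypothetical sketch: one shared rule block, substituted into every skill template.
HUMANIZE_BLOCK = "\n".join([
    "- No em or en dashes.",
    "- No filler openers.",
    "- No chatbot scaffolding.",
    "- Tighten hedges.",
    "- No emoji.",
    '- No "Certainly".',
    "- Vary sentence shape.",
    "- Never reword code.",
])

PLACEHOLDER = "{{HUMANIZE}}"

# Illustrative template bodies; the real skill prompts are not shown in this post.
SKILL_TEMPLATES = {
    "review": "Review the diff and leave comments.\n\n{{HUMANIZE}}\n",
    "pr": "Write the pull request description.\n\n{{HUMANIZE}}\n",
    "commit": "Write the commit message.\n\n{{HUMANIZE}}\n",
    "explain": "Explain the selected code.\n\n{{HUMANIZE}}\n",
}


def render_skill(template: str) -> str:
    """Fill the placeholder, and fail loudly if a prose skill forgot to include it."""
    if PLACEHOLDER not in template:
        raise ValueError("prose skill template is missing the humanization placeholder")
    return template.replace(PLACEHOLDER, HUMANIZE_BLOCK)


rendered = {name: render_skill(body) for name, body in SKILL_TEMPLATES.items()}
```

Raising on a missing placeholder is a design choice in this sketch: a skill that silently ships without the rules is exactly the kind of drift a single shared block is meant to prevent.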

The second layer is the part that actually enforces. After the agent writes, the text goes through klaussy humanize, a deterministic pass that makes the high-confidence edits regardless of how well the model followed instructions. The prompt asks; the scrubber enforces. Neither layer alone is enough, which is the whole reason there are two.

Why deterministic, and not a second model

The tempting design is to run the agent's prose through another LLM with a "rewrite this to sound human" instruction. That's how most "AI humanizer" tools on the market work. I deliberately didn't.

A rewrite model can fix awkward phrasing a regex never will. But it can also paraphrase something into being subtly wrong, and over technical prose that's a real risk: it can rewrite an identifier, mangle a command, or "improve" an example until it no longer runs. For text that ends up in a PR comment a teammate will act on, I didn't want a process that could introduce a new mistake while removing an em-dash.

So the scrubber is rule-based regex. The consequences of that choice are all upsides for this job: it's fast, it's free, it runs offline with no network call, and because it only ever does a fixed set of high-confidence substitutions, it can't introduce a new error. It will not restructure a sentence or paraphrase a paragraph. It does a small, reliable set of edits and stops.

How it works

The edits it makes

The scrubber does a short list of conservative transforms. Each one is a tell with a known, safe fix:

Dashes. Em-dashes (—) and en-dashes (–) in prose become commas or hyphens.
Filler openers. Sentence-initial filler ("It's worth noting that", "I noticed that", "Please note that", and similar) is stripped, and the next word is re-capitalized so the sentence still reads correctly.
Chatbot scaffolding. Trailing lines like "Let me know if…", "Hope this helps", "Feel free to…" are dropped.
Verbose phrasings. A few get tightened: "in order to" becomes "to", "could potentially" becomes "could".

Here's the before and after on the exact cases the test suite covers:

Input	Output
`Leaks a connection — wrap it.`	`Leaks a connection, wrap it.`
`range 1–5 here`	`range 1 - 5 here`
`It's worth noting that the handler swallows the error.`	`The handler swallows the error.`
`This races on startup.` `Let me know if you have questions!`	`This races on startup.`
`Refactor in order to avoid the N+1.`	`Refactor to avoid the N+1.`
`This could potentially deadlock.`	`This could deadlock.`

Every one of those is a rule a human editor would apply without thinking. None of them changes what the comment means. That's the bar for inclusion: if an edit could change meaning, it doesn't make the list.
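The transforms above can be sketched in a few lines of standard-library Python. This is an illustration written to reproduce the table rows, not the package's real rule set: the patterns, their order, and the name scrub_prose are all assumptions, and the real opener and scaffolding lists are longer.

```python
import re

# Assumed patterns, chosen only to reproduce the table rows above.
FILLER_OPENER = re.compile(
    r"(?:^|(?<=[.!?] ))"  # start of a line, or right after a sentence end
    r"(?:It['\u2019]s worth noting that|I noticed that|Please note that)\s+(\w)",
    re.MULTILINE,
)
SCAFFOLDING = re.compile(
    r"\s*(?:Let me know if|Hope this helps|Feel free to)[^.!?\n]*[.!?]*\s*$"
)
VERBOSE = [
    # Lowercase only, to keep the sketch short.
    (re.compile(r"\bin order to\b"), "to"),
    (re.compile(r"\bcould potentially\b"), "could"),
]
EM_DASH = re.compile(r"\s*\u2014\s*")
EN_DASH = re.compile(r"\s*\u2013\s*")


def scrub_prose(text):
    """Apply the fixed substitutions to a prose-only string."""
    if not isinstance(text, str):
        return text  # non-string input passes through unchanged
    # Strip the opener and re-capitalize the word that now starts the sentence.
    text = FILLER_OPENER.sub(lambda m: m.group(1).upper(), text)
    text = SCAFFOLDING.sub("", text)
    for pattern, replacement in VERBOSE:
        text = pattern.sub(replacement, text)
    text = EM_DASH.sub(", ", text)
    text = EN_DASH.sub(" - ", text)
    return text
```

Each rule is a plain substitution with a fixed replacement, so the pass never reorders or rephrases anything: it either matches a known tell or leaves the text alone.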

Never touching code

The single biggest risk in running any text transform over developer prose is that it reaches into a code example and breaks it. A "humanizer" that turns a dash inside a shell command into a comma has made your example wrong.

The scrubber avoids this structurally. It splits the input on fenced blocks and on inline code, scrubs only the prose segments, and leaves every code segment byte-for-byte untouched. The dashes in a command stay dashes. The identifiers stay identifiers.

Concretely, given a string that mixes prose and code like Use `a — b` then:, followed by a fenced block containing x — y, the dashes inside the inline span and inside the fence are kept exactly as written. Only a dash out in the prose would be normalized. Code in, code out, unchanged.
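Here is one way the split-then-scrub idea could be written, as a sketch under assumptions: the segment pattern and the name scrub_outside_code are hypothetical, and it only recognizes triple-backtick fences and single-backtick inline spans. The property it demonstrates is that re.split with a capturing group keeps the code segments in the result, so they can be passed through untouched.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Fenced blocks are tried first, then inline spans. The outer capturing group makes
# re.split return the code segments too, at the odd indexes of the result.
CODE_SEGMENT = re.compile(
    "(" + re.escape(FENCE) + r".*?" + re.escape(FENCE) + r"|`[^`\n]*`)",
    re.DOTALL,
)


def scrub_outside_code(text, scrub):
    """Run `scrub` over prose segments only; code segments are copied verbatim."""
    parts = CODE_SEGMENT.split(text)
    return "".join(part if i % 2 else scrub(part) for i, part in enumerate(parts))


def dash_to_comma(prose):
    # Stand-in for the full scrubber: only the em-dash rule.
    return re.sub(r"\s*\u2014\s*", ", ", prose)


sample = (
    "Use `a \u2014 b` then: pick one \u2014 carefully.\n"
    + FENCE + "\nx \u2014 y\n" + FENCE + "\n"
)
result = scrub_outside_code(sample, dash_to_comma)
```

In result, the dash in the prose becomes a comma while the dashes inside the inline span and inside the fence survive exactly as written.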

The interface

There are two ways to use it. As a library:

from klaussy.humanize import humanize

clean = humanize("It's worth noting that this races, fix it.")
# -> "This races, fix it."

humanize(text: str) -> str. Non-string input passes through unchanged.

As a CLI, it's built to drop into a pipe or a CI gate:

# stdin to stdout
printf '%s' "$comment" | klaussy humanize

# rewrite a file in place
klaussy humanize NOTES.md --write

# CI gate: exit 1 if the file would change
klaussy humanize NOTES.md --check

The --check mode is the one I lean on most. It turns "did an AI tell slip into this doc" into a check that fails a pull request instead of a thing someone has to catch by eye.
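The logic behind a check mode like this is small enough to sketch. The version below is an assumption about the shape, not the CLI's source: check_file and the stand-in tighten transform are hypothetical names, and the only behavior taken from the post is "exit 1 if the file would change".

```python
import tempfile
from pathlib import Path


def check_file(path, transform):
    """Return a shell-style status: 1 if `transform` would change the file, else 0."""
    text = Path(path).read_text(encoding="utf-8")
    return 1 if transform(text) != text else 0


def tighten(text):
    # Stand-in for the real scrubber: a single verbose-phrasing rule.
    return text.replace("in order to", "to")


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "NOTES.md"
    notes.write_text("Refactor in order to avoid the N+1.\n", encoding="utf-8")
    dirty_status = check_file(notes, tighten)  # 1: the file still has a tell

    # Equivalent of a write pass, followed by a second check.
    notes.write_text(tighten(notes.read_text(encoding="utf-8")), encoding="utf-8")
    clean_status = check_file(notes, tighten)  # 0: nothing left to change
```

This only works as a gate because the transform is deterministic and idempotent: running it on already-clean text changes nothing, so a passing check stays passing.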

One spec, shared across products

The rules in both layers, the prompt-side block and the CLI, are a faithful port of one source: humanize-comment.js from the klaussy desktop codebase. (That desktop app is a separate product; this is the open-source klaussy-agents package. That's the only time I'll mention it.) Porting from one canonical implementation means the prompt rules and the scrubber rules don't drift apart over time, and any pipeline (CI, the desktop app, or your own scripts) can pipe through the same behavior.

Comment hygiene, not just prose

The same instinct shows up one layer down, in code review. Beyond the prose tells, the generated review skill flags excessive or narrating comments in code, the kind that restate what the line does or read like a changelog, and the commit guard blocks committed commented-out code via ruff --select ERA. The judgment-heavy part lives in the skill where a model can weigh context; the deterministic part lives in the hook. Same division of labor as the two prose layers: ask where you need judgment, enforce where you can be certain.
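For the deterministic half, a commit guard along these lines is one plausible shape. It is a sketch, not the hook klaussy generates: the function names are hypothetical, it assumes git and ruff are on PATH, and it uses the ruff check subcommand with the ERA (commented-out code) rule selector.

```python
import subprocess


def staged_python_files():
    """List staged .py files (added, copied, or modified) using git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line.endswith(".py")]


def era_command(files):
    """Build the ruff invocation that flags commented-out code in the given files."""
    return ["ruff", "check", "--select", "ERA", *files]


def main():
    files = staged_python_files()
    if not files:
        return 0  # nothing staged that ruff should look at
    # ruff exits non-zero when it finds violations, which blocks the commit.
    return subprocess.run(era_command(files)).returncode
```

A pre-commit hook script would end with sys.exit(main()), so a non-zero status from ruff stops the commit.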

Clean isn't the same as kind, or short

Removing the tells exposed a second problem I didn't plan for: stripping the filler also strips the softening. Scrub "It's worth noting that this could potentially swallow the error, you may want to wrap it" down to its substance and you get "This swallows the error. Wrap it." That's clean, and on a PR thread it's also a little cold, and cold reads as curt. Review comments are the worst case, because they land on a person's work. A real one, lightly defanged: "Personally I don't find these unit tests useful, because you are mocking everything." The tells aren't the problem there; the framing is.

So humanizing can't be purely subtractive. After removing what makes prose sound like a machine, the spec adds back what makes it sound like a considerate human, split the same two ways as everything else.

Prompt-side, a civility floor: critique the work, never the person; prefer a question over a flat verdict; keep it a light touch, not filler praise. It's a floor, not forced warmth, so a review you asked to be blunt stays blunt, it just can't tip into insulting. A second rule covers replies: read the comment you're answering for substance but not temperature, and neutralize its rudeness before drafting, so a hostile thread doesn't prime a hostile reply. And a brevity rule with actual numbers instead of "be concise": a thread reply aims for one sentence, a single review comment for one to five, anything longer gets cut, not summarized.

Deterministic-side, the scrubber's opener list grew to catch the editorializing lead-ins that prime a dismissive read: "Personally," "Honestly," "Frankly," "IMO," "In my opinion," "If you ask me." So "Personally I don't find these useful." becomes "I don't find these useful." with the same guarantee as the rest, and still never inside code.
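A sketch of how the editorializing openers could be handled with the same strip-and-recapitalize approach as the filler openers. The pattern and the name strip_editorial_openers are assumptions; only the list of lead-ins and the "Personally" example come from the post.

```python
import re

# Assumed pattern: the opener must start a line or follow a sentence end, may carry
# a trailing comma, and the first letter of the next word is captured for re-casing.
EDITORIAL_OPENER = re.compile(
    r"(?:^|(?<=[.!?] ))"
    r"(?:Personally|Honestly|Frankly|IMO|In my opinion|If you ask me),?\s+(\w)",
    re.MULTILINE,
)


def strip_editorial_openers(text):
    """Drop sentence-initial editorializing lead-ins and re-capitalize what follows."""
    return EDITORIAL_OPENER.sub(lambda m: m.group(1).upper(), text)


print(strip_editorial_openers("Personally I don't find these useful."))
# -> I don't find these useful.
```

Because the pattern is anchored to sentence starts and is case-sensitive, a mid-sentence "I personally think" is left alone in this sketch.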

The honest caveat: tone and brevity are mostly judgment, so most of that work lives in the soft prompt layer. Only the opener stripping is guaranteed.

A quick demo

The clearest way to see it is to pipe a comment that has several tells stacked up:

printf '%s' "It's worth noting that this handler swallows the error, wrap it. Hope this helps!" \
 | klaussy humanize

The opener gets stripped, the next word is re-capitalized, and the trailing scaffolding is dropped. An em-dash in the prose would become a comma in the same pass. What comes out reads like a note an engineer left, because the things that made it read like a bot are gone and nothing else was touched.

The whole pass is pure standard library, no network, no model. The test suite covers every transform above plus the code-preservation cases, ported from the desktop test suite; 137 tests pass overall.

What's next, and where the line is

The deliberate limit is the headline tradeoff: deterministic means limited. A regex scrubber catches the reliable tells, but it will not rewrite genuinely awkward phrasing the way an LLM could. That's the trade I chose on purpose: the same property that makes it safe, fast, and incapable of introducing an error is the property that keeps it from doing deeper rewrites. If you want paraphrasing, this isn't that tool, and it's not trying to be.

A couple of other honest limits:

The prompt layer is soft. The {{HUMANIZE}} block depends on the model following instructions, which is why there are two layers: neither alone is enough.
It's opinionated. Some people like em-dashes. The scrubber normalizes them, and that's a stance. It's also open code, so if you disagree, the rules are right there to edit.
Scope is prose, not code. By design it won't touch anything inside a code block. The flip side is that it won't fix a tell living inside a code comment unless you wire it to do so.

None of these are things I'm hiding. They're the consequences of picking "safe and predictable" over "clever and risky" for text that goes in front of your team.

Try it

pip install klaussy-agents
printf '%s' "It's worth noting that this races, fix it. Hope this helps!" | klaussy humanize

Repo and docs: github.com/steph-dove/klaussy-agents

If you've got an AI tell that drives you up the wall and the scrubber doesn't catch it yet, open an issue with the before/after. That's exactly the kind of case I want to see.
