Daniel Nwaneri
Why I Built My Own Humanizer (And Why You Should Too)

There's a tool called humanizer — a Claude Code skill built by blader, inspired by Wikipedia's guide to detecting AI writing. It's good. 6,600 stars, hundreds of forks, an active community adding patterns and language support. If you want to strip AI tells from any text, it does that well.

I used it. Then I built something different.

The problem isn't that humanizer is wrong. It's that it's solving a slightly different problem than the one I actually have.

Humanizer checks your writing against a generic human baseline. It knows what AI writing looks like and flags the patterns — significance inflation, copula avoidance, the rule of three, em dash overuse. Twenty-four patterns derived from Wikipedia's AI cleanup guide. Run your draft through it, find the tells, rewrite.

That works if your goal is writing that doesn't look AI-generated.

My goal is writing that sounds like me.

Those are related but not the same thing. I can write a draft that passes every humanizer check and still sounds nothing like my published work. No AI tells, no voice. Sterile, voiceless prose is as detectable as slop — it just gets detected by different readers.


The thing I needed wasn't a list of patterns to avoid. It was a calibration against my own writing at its best.

So I built voice-humanizer. Same foundation as blader's tool — same 24 patterns, now 27 with three new ones from a community PR. But with one addition that changes what it does: a CORPUS.md file containing your own published writing, from which the skill extracts your voice fingerprint before it checks anything else.

Voice check first. AI pattern check second.

The fingerprint tracks what you reach for and — just as important — what you don't. Rhythm, specificity, the patterns absent from your corpus that signal drift.
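To make "fingerprint" concrete, here is a rough sketch of the idea in Python. The feature set and function names are my illustrative assumptions, not code from the repo; the actual skill works through prompt instructions, not a script like this.

```python
# Illustrative sketch only: the features (sentence length, em dash rate,
# list width) and the function names are assumptions, not the skill's internals.
import re
from statistics import mean

def extract_fingerprint(corpus: str) -> dict:
    """Reduce a writing corpus to a few measurable voice signals."""
    sentences = [s for s in re.split(r"[.!?]+\s+", corpus) if s.strip()]
    # Crude list detection: comma-separated items inside one sentence.
    list_widths = [s.count(",") + 1 for s in sentences if "," in s]
    return {
        "avg_sentence_words": mean(len(s.split()) for s in sentences),
        "em_dashes_per_1k_words": 1000 * corpus.count("—") / max(len(corpus.split()), 1),
        "typical_list_width": mean(list_widths) if list_widths else 0,
    }

def flag_voice_drift(draft: str, fp: dict) -> list[str]:
    """Voice check first: compare a draft against the corpus baseline."""
    draft_fp = extract_fingerprint(draft)
    flags = []
    if draft_fp["typical_list_width"] > fp["typical_list_width"] + 2:
        flags.append("list wider than your corpus norm")
    if draft_fp["em_dashes_per_1k_words"] > 2 * fp["em_dashes_per_1k_words"] + 1:
        flags.append("em dash rate above your corpus baseline")
    return flags
```

Run a five-item sentence against a corpus that compresses lists to two, and the list-width check fires even though no generic AI pattern would.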

Here's what that looks like in practice. I ran voice-humanizer on a draft of this post before publishing. It caught this:

Before (draft):

The fingerprint tracks rhythm patterns, paragraph opening style, specificity signals, what you reach for when you need a concrete detail, and — just as important — what you don't do.

Flag:

Voice drift — list of five items where your corpus shows you compress to two. Em dash doing emotional emphasis work your corpus handles structurally.

After:

The fingerprint tracks what you reach for and — just as important — what you don't. Rhythm, specificity, the patterns absent from your corpus that signal drift.

No AI pattern was triggered. A generic humanizer would have passed this. Voice-humanizer caught it because the corpus knew this author compresses lists. That's the difference.


When it flags something now, it doesn't just say "this pattern looks like AI." It says "this reads as Claude because it uses three parallel items where your corpus shows you compress to two. Here's what you'd likely do instead."

That's a different kind of feedback.


The corpus approach also solves a problem humanizer can't: false positives.

My writing uses em dashes. Not excessively, but deliberately — once per piece, structurally. A generic humanizer would flag that. Voice-humanizer won't, because it appears in the corpus. It's my pattern, not AI bleeding through.

Same for any other stylistic choice that looks like an AI tell in isolation but is actually part of your voice. The corpus is the ground truth.


You can use voice-humanizer with your own writing. The repo is public: github.com/dannwaneri/voice-humanizer

CORPUS.md is gitignored — your writing stays private. CORPUS.example.md shows you what to put there. Five questions in SETUP.md help you extract your own voice fingerprint before you start.

It won't work without a corpus. That's intentional. A humanizer calibrated to nobody's voice in particular isn't calibrated to yours.

Credit to blader for the foundation — the pattern list and skill format this is built on. Voice-humanizer solves a narrower problem for a specific kind of writer: someone who's been writing long enough to know what their best work sounds like — and doesn't want AI assistance to flatten it.

Top comments (16)

Ali-Funk

I saved it to read a second time. Before I build my own, I'll test the standard market tools first. Cool project. @dannwaneri

Daniel Nwaneri

That's exactly the right order. Test the standard tools first, know what they do and don't catch, then decide if calibrating to your own voice is worth the extra setup.

Curious what you find when you run your writing through the generic humanizer. What it flags might tell you something about your patterns.

Ali-Funk

Your ideas are genuinely refreshing and I need to try this out first thing tomorrow

Ingo Steinke, web developer

The question remains: what is "my tone," and how does training on past material not hold back developing a better writing style? I always felt my sentences were too long and my texts too hard to understand, yet some of my posts became quite popular. I'd choose popular + recent + revised posts with a high readability score and retrain the system with newer material again a few months from now. And I'd have to split technical DEV posts from cultural blogging aimed at a more general audience.

Daniel Nwaneri

The em dash problem is real. On a Mac it's Option+Shift+Hyphen; on Windows it's Alt+0151. Worth adding to muscle memory if you write a lot.

The corpus staleness question is the sharpest thing in this thread. You're right that training on old material risks optimizing for who you were. Weight recent pieces more heavily, revisit the corpus every few months, and treat it as a living document, not a fixed baseline.

The split corpus idea — technical posts separate from cultural writing — is something I hadn't considered and probably should implement. Different registers, different fingerprints. Worth building into the setup instructions.

Ingo Steinke, web developer

Linux doesn't seem to have a similar preconfigured way to type that character and I haven't even been missing it at all. I suppose that you also use special typographic quotation marks instead of the ASCII 34 " replacement?

Ingo Steinke, web developer

"My writing uses em dashes."

I always wondered how people do this. There is no such symbol in the standard German keyboard layout, so for me it's less likely than using an emoji in my text.

Matthew Hou

The corpus-first approach is the right call. I've been working on something adjacent — maintaining voice consistency across different content types (articles, comments, product copy) — and the lesson is the same: generic detection catches AI tells but misses voice drift. The false positive problem you describe is real. I use em dashes deliberately too, and every generic checker flags them. Having a ground truth corpus that says 'this is actually how this person writes' changes the signal entirely. One thing I'd add: the corpus should probably evolve. Your writing voice shifts over months. A static CORPUS.md calibrated to writing from a year ago might start penalizing your current voice. Have you thought about a rolling window approach?

Daniel Nwaneri

The rolling window problem is real and I don't have a clean solution yet. My working approach is manual: revisit the corpus every few months, weight recent pieces more heavily, remove anything that no longer represents how I write.

The harder version of your question: if your voice is shifting because of AI assistance, the corpus starts capturing AI-influenced voice as authentic. At what point does the ground truth stop being ground truth?

Maintaining voice consistency across content types is the adjacent problem I haven't solved. Articles and comments already feel like different registers, and product copy is a third one entirely. Separate corpus files per content type is the obvious answer, but it adds friction most people won't accept.
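The "weight recent pieces more heavily" policy could be automated as exponential decay over a piece's age. The 180-day half-life and the data shape here are assumptions for illustration, not anything implemented in voice-humanizer:

```python
# Illustrative sketch: exponential recency weighting for corpus pieces.
# The half-life default and the dataclass shape are assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class CorpusPiece:
    title: str
    published: date
    text: str

def piece_weight(piece: CorpusPiece, today: date, half_life_days: float = 180.0) -> float:
    """A piece half_life_days old counts half as much as one published today."""
    age_days = (today - piece.published).days
    return 0.5 ** (age_days / half_life_days)
```

With a 180-day half-life, a year-old post contributes roughly a quarter of the weight of a new one, which approximates the manual "revisit every few months" policy without deleting anything.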

klement Gunndu

The corpus decay problem is real — we ran into this building content pipelines where the "voice fingerprint" drifted after just a few months of AI-assisted editing. Curious if you've considered versioning the CORPUS.md to track that drift over time?

Daniel Nwaneri

Versioning CORPUS.md is the right instinct: treating the voice fingerprint as a living document rather than a fixed calibration. I haven't implemented it yet, but the approach would be semantic versioning tied to publishing milestones: v1.0 captures pre-AI-assisted writing, each major drift gets a new version, and rollback is available if the current voice diverges too far from the baseline.

The harder question you're pointing at: at what point does the drifted version become the authentic voice rather than a corrupted one? If the writing genuinely improved through AI collaboration, penalizing that drift is the wrong call. The version history then becomes a record of how the voice evolved, which is a different thing from a record of how it degraded.
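One way to decide when "drift gets a new version": compare a cheap fingerprint vector across corpus revisions and bump the major version once the distance crosses a threshold. The features and the threshold below are assumptions for illustration, not an implemented part of the skill:

```python
# Illustrative sketch: fingerprint distance between two CORPUS.md versions.
# Features are unnormalized and the threshold is arbitrary; tune both.
import math

def fingerprint_vector(text: str) -> list[float]:
    words = text.split() or [""]
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        len(words) / max(len(sentences), 1),  # average sentence length
        text.count("—") / len(words),         # em dash rate per word
        text.count(",") / len(words),         # list density proxy
    ]

def drift(old_corpus: str, new_corpus: str) -> float:
    """Euclidean distance between the two versions' fingerprints."""
    a, b = fingerprint_vector(old_corpus), fingerprint_vector(new_corpus)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def needs_major_bump(old_corpus: str, new_corpus: str, threshold: float = 0.25) -> bool:
    return drift(old_corpus, new_corpus) >= threshold
```

Tagging each CORPUS.md revision in git and running a check like this between tags would give the rollback point described above.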

Dejan

Congratulations on your success. However, human evaluation of text or speech has common flaws. Some detection systems rate sentences of fewer than 10 characters as over 80% AI-generated; that's a problem with the tools, not a methodological one. Speaking of the corpus you mentioned: it knows how to recognize sentences. Whether a model finds a word, learns the probability of the next word, and then constructs the entire sentence differs from model to model, and there are many common methods. Plenty of identification tools could probably rate this very comment as AI-generated. It's important to remember that AI must recognize that humans are prone to making mistakes. That doesn't require reinforcement learning or strong AI; it needs to be able to analyze the sentiment of the entire sentence. I think that's the answer. In any case, congratulations again, and I sincerely hope you continue to post articles like this.

adam raphael

Hello. I saved it so I can read it again later. I also want to give this a try.

Benjamin Nguyen

Wow!
