DEV Community


Why I Built My Own Humanizer (And Why You Should Too)

Daniel Nwaneri on February 24, 2026

There's a tool called humanizer — a Claude Code skill built by blader, inspired by Wikipedia's guide to detecting AI writing. It's good. 6,600 star...
Ali-Funk

I saved it to read again later. Before I build my own, I'll test out the standard market tools first. Cool project. @dannwaneri

Daniel Nwaneri

That's exactly the right order. Test the standard tools first, know what they do and don't catch, then decide if calibrating to your own voice is worth the extra setup.

Curious what you find when you run your writing through the generic humanizer. What it flags might tell you something about your patterns.

Ali-Funk

Your ideas are genuinely refreshing and I need to try this out first thing tomorrow

Ingo Steinke, web developer

The question remains: what is "my tone", and how does training on past material not hold back developing a better writing style? I always felt like my sentences were too long and my texts too hard to understand, yet some of my posts became quite popular. I'd choose popular + recent + revised posts with a high readability score and retrain the system with newer material again in a few months. And I'd have to split technical DEV posts from cultural blogging for a more general audience.

Daniel Nwaneri

The em dash problem is real. On a Mac it's Option+Shift+Hyphen; on Windows it's Alt+0151. Worth adding to muscle memory if you write a lot.

The corpus staleness question is the sharpest thing in this thread. You're right that training on old material risks optimizing for who you were. Weight recent pieces more heavily, revisit the corpus every few months, and treat it as a living document, not a fixed baseline.
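A minimal sketch of that recency weighting, assuming an exponential half-life. The file names, dates, and the 180-day half-life are all illustrative, not part of the actual tool:

```python
from datetime import date

# Half-life decay: a piece counts half as much every ~6 months (assumption).
HALF_LIFE_DAYS = 180

def recency_weight(published: date, today: date) -> float:
    """Exponential decay weight based on how old a corpus piece is."""
    age_days = (today - published).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

# Hypothetical corpus entries: (file name, publication date).
corpus = [
    ("2024-02-post.md", date(2024, 2, 1)),
    ("2025-08-post.md", date(2025, 8, 1)),
    ("2026-01-post.md", date(2026, 1, 15)),
]

today = date(2026, 2, 24)
weights = {name: recency_weight(d, today) for name, d in corpus}

# Normalize so the weights sum to 1 before feeding them to calibration.
total = sum(weights.values())
weights = {name: w / total for name, w in weights.items()}
```

Under these numbers the newest post dominates the fingerprint while the two-year-old one still contributes a little, which matches the "living document, not a fixed baseline" idea.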

The split-corpus idea (technical posts separate from cultural writing) is something I hadn't considered and probably should implement. Different registers, different fingerprints. Worth building into the setup instructions.

Ingo Steinke, web developer

Linux doesn't seem to have a similar preconfigured way to type that character, and I haven't been missing it at all. I suppose you also use proper typographic quotation marks instead of the ASCII 34 " replacement?

Ingo Steinke, web developer

My writing uses em dashes.

I always wondered how people do this. There is no such symbol in the standard German keyboard layout, so for me it's less likely than using an emoji in my text.

Matthew Hou

The corpus-first approach is the right call. I've been working on something adjacent — maintaining voice consistency across different content types (articles, comments, product copy) — and the lesson is the same: generic detection catches AI tells but misses voice drift. The false positive problem you describe is real. I use em dashes deliberately too, and every generic checker flags them. Having a ground truth corpus that says 'this is actually how this person writes' changes the signal entirely. One thing I'd add: the corpus should probably evolve. Your writing voice shifts over months. A static CORPUS.md calibrated to writing from a year ago might start penalizing your current voice. Have you thought about a rolling window approach?

Daniel Nwaneri

The rolling window problem is real and I don't have a clean solution yet. My working approach is manual: revisit the corpus every few months, weight recent pieces more heavily, remove anything that no longer represents how I write.

The harder version of your question: if your voice is shifting because of AI assistance, the corpus starts capturing AI-influenced voice as authentic. At what point does the ground truth stop being ground truth?

Maintaining voice consistency across content types is the adjacent problem I haven't solved. Articles and comments already feel like different registers; product copy is a third one entirely. Separate corpus files per content type is the obvious answer, but it adds friction most people won't accept.
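To make the friction concrete, the per-register split could be as small as this sketch. The `corpus/` directory layout and file names here are purely hypothetical, not something the tool ships with:

```python
from pathlib import Path

# Hypothetical layout: one corpus file per register.
CORPUS_DIR = Path("corpus")
REGISTERS = {
    "article": CORPUS_DIR / "articles.md",
    "comment": CORPUS_DIR / "comments.md",
    "product": CORPUS_DIR / "product-copy.md",
}

def corpus_for(content_type: str) -> Path:
    """Pick the corpus file that matches the register being written in."""
    try:
        return REGISTERS[content_type]
    except KeyError:
        raise ValueError(f"no corpus for register {content_type!r}") from None
```

The setup cost is three files instead of one; the ongoing cost is remembering which register a new piece belongs to, which is where most people would give up.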

klement Gunndu

The corpus decay problem is real — we ran into this building content pipelines where the "voice fingerprint" drifted after just a few months of AI-assisted editing. Curious if you've considered versioning the CORPUS.md to track that drift over time?

Daniel Nwaneri

Versioning CORPUS.md is the right instinct: treat the voice fingerprint as a living document rather than a fixed calibration. I haven't implemented it yet, but the approach would be semantic versioning tied to publishing milestones: v1.0 captures pre-AI-assisted writing, each major drift gets a new version, and rollback is available if the current voice diverges too far from the baseline.

The harder question you're pointing at: at what point does the drifted version become the authentic voice rather than a corrupted one? If the writing genuinely improved through AI collaboration, penalizing that drift is the wrong call. The version history is also the record of how the voice evolved, which is different from how it degraded.
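A rough sketch of what that versioning could look like with plain git tags. The tag names, commit messages, and throwaway repo are assumptions for illustration, not an implemented feature:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email corpus@example.com
git config user.name corpus

# v1.0: freeze the pre-AI-assisted baseline.
echo "baseline: pre-AI-assisted voice" > CORPUS.md
git add CORPUS.md && git commit -qm "corpus v1.0"
git tag corpus-v1.0

# v2.0: each major drift gets a new tag.
echo "drift: six months of AI-assisted edits" > CORPUS.md
git commit -qam "corpus v2.0"
git tag corpus-v2.0

# Rollback: restore the baseline file if the current voice diverged too far.
git checkout -q corpus-v1.0 -- CORPUS.md
```

The tag list then doubles as the drift history: diffing `corpus-v1.0` against `corpus-v2.0` shows exactly what changed in the fingerprint between milestones.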

Dejan

Congratulations on your success. However, automated evaluation of human text or speech has common flaws. Some detection systems rate sentences of fewer than 10 characters as over 80% AI-generated; that's a problem with the tools, not a methodological one. As for the corpus you mentioned, it knows how to recognize sentences. Whether it's finding a word, learning the probability of the next word, and then constructing the entire sentence, each model is different, and there are many common methods. You probably have many identification tools that would evaluate my comment as AI-generated. However, it's important to remember that AI must recognize that humans are prone to making mistakes. That doesn't require reinforcement learning or strong AI; it needs to be able to analyze the sentiment of the entire sentence. I think that's the answer. In any case, congratulations on your success, and I sincerely hope you continue to post articles like this.

adam raphael

Hello. I saved it so I can read it again later. I also want to give this a try.

Benjamin Nguyen

Wow!