Adrian Vega

I Fed My 10 Best Blog Posts to 5 Different AI Writing Tools. None of Them Nailed My Voice.

Last month I published a breakdown of 500 AI-generated LinkedIn posts and why they all sound the same. The response was interesting — a lot of people agreed with the problem, but several pushed back with the same argument:

"Just give the AI examples of your writing. Problem solved."

Fair point. I should test that.

So I did. I took 10 of my best-performing blog posts — pieces with real engagement, real comments, real shares — and fed them to 5 AI writing tools that claim to learn or match your style. Then I asked each tool to write a new post on a topic I've covered before.

The results were... educational.

The Setup

I wanted this to be as fair as possible, so here's what I did:

My 10 reference posts covered a range of topics: SaaS growth, content strategy, AI tools, personal branding. Word counts ranged from 800 to 2,400 words. All had performed above my baseline engagement metrics.

The 5 tools were a mix of dedicated AI writing platforms, general-purpose LLMs with "custom instructions," and one tool that specifically markets voice matching as a core feature. I'm not naming them because this isn't a hit piece — the findings apply broadly.

The task: Write a ~600-word LinkedIn post about "why most content repurposing fails." A topic I've written about before, so I had a clear baseline for what my version looks like.

How I evaluated: I pulled 14 quantifiable markers from my actual writing — things like average sentence length, opener pattern, paragraph length distribution, vocabulary frequency, rhetorical question usage, and transition word preferences. Then I scored each AI output against those markers.
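
If you want to run a similar comparison on your own writing, here's a rough sketch of the kind of marker extraction I mean. It's illustrative Python using only the standard library, not my actual analysis script, and the marker definitions (how I count contractions, what counts as a "short" paragraph) are simplified approximations.

```python
import re
import statistics

def extract_markers(text):
    """Compute a handful of style markers for a plain-text post.
    Rough approximations, not an exact scoring script."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    paragraphs = [p for p in text.split("\n\n") if p.strip()]

    sentence_lengths = [len(s.split()) for s in sentences]
    short_paras = [p for p in paragraphs
                   if len(re.split(r"(?<=[.!?])\s+", p.strip())) <= 2]
    contractions = re.findall(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", text)

    return {
        "avg_sentence_length": statistics.mean(sentence_lengths),
        "short_paragraph_ratio": len(short_paras) / max(len(paragraphs), 1),
        "rhetorical_questions": sum(1 for s in sentences if s.endswith("?")),
        "contraction_rate": len(contractions) / max(len(sentences), 1),
        "filler_words": len(re.findall(r"\b(?:look|honestly)\b", text, re.I)),
    }

def score_against_baseline(candidate_markers, baseline_markers):
    """Relative deviation per marker: 0.0 is a perfect match."""
    return {
        name: abs(candidate_markers[name] - target) / (abs(target) or 1)
        for name, target in baseline_markers.items()
    }
```

In practice I averaged the markers across all 10 reference posts to get a baseline, then compared each AI output against that baseline. A real pass also needs to handle markdown, quotes, and lists, which skew sentence and paragraph counts.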

I also did a blind test: I showed all 6 versions (5 AI + my real one) to 8 people who read my content regularly and asked them to identify which one was mine.

Finding #1: They All Got the Topic Right. None Got the Texture Right.

Every tool produced a competent post about content repurposing. The arguments were sound. The structure was logical. If you just needed "a post about this topic," any of them would work.

But when I compared the outputs to my actual writing, the mismatch was measurable:

| Marker | My Writing | Best AI Output | Worst AI Output |
| --- | --- | --- | --- |
| Avg. sentence length | 11.3 words | 16.8 words | 22.4 words |
| Short paragraphs (1-2 sentences) | 68% | 41% | 12% |
| Rhetorical questions | 4 per post | 1 | 0 |
| Contractions used | 89% | 52% | 31% |
| Opens with personal anecdote | Yes | No (4/5 tools) | |
| Uses "look" or "honestly" | 3x | 0x | 0x |

The AI outputs were consistently longer, more formal, and more structured than how I actually write. Even the tool that specifically claims to match your voice produced sentences that were 48% longer than mine on average.

Finding #2: "Custom Instructions" Are a Blunt Instrument

Three of the tools let me paste custom instructions or style notes. So I wrote detailed ones: "Use short sentences. Open with a personal story. Keep paragraphs to 1-2 sentences. Use contractions. Conversational tone."

This helped. The outputs got shorter and less formal. But here's the thing — they started sounding like a generic casual writer, not like me. Following instructions like "be conversational" produces a different output than actually writing the way I write.

It's the difference between a musician sight-reading sheet music and a musician playing from feel. The notes might be right, but the groove is off.

Finding #3: The Blind Test Was Brutal

Remember those 8 readers I asked to identify my real post? Results:

  • 6 out of 8 correctly identified my post
  • 0 out of 8 thought any AI version was mine
  • The most common reason: "Yours sounds like you're actually talking to me. The others sound like they're performing."

One reader nailed it: "Your version has this thing where you'll make a point, then immediately undercut it or add a 'but.' The AI versions just march forward."

That pattern — the self-interruption, the mid-paragraph pivot — is the kind of thing that doesn't show up in "tone: casual" instructions. It's structural. It's rhythmic. And no tool captured it.

Finding #4: More Examples Didn't Help (After a Point)

I ran a follow-up test. Instead of 10 reference posts, I tried 3, then 5, then 15, then 25.

The jump from 3 to 5 examples was significant — maybe 30% better across my markers. But from 5 to 15? Almost nothing. From 15 to 25? Actually slightly worse on two markers.

There's a plateau. And it happens much earlier than you'd expect. The bottleneck isn't data — it's how the tools use the data.

Finding #5: The Gap Isn't in Understanding — It's in Encoding

This is the part that surprised me most. When I asked each tool to describe my writing style (rather than replicate it), most of them did a reasonable job. They'd say things like "short sentences, conversational, uses rhetorical questions, personal anecdotes."

They understood the style. They just couldn't reproduce it.

It's like the difference between a music critic who can describe exactly what makes Miles Davis's trumpet playing distinctive and a trumpet player who can actually play like that. Analysis and execution are different skills.

What This Actually Means

I don't think AI voice matching is impossible. I think the current approach is wrong.

The tools I tested all treat voice as a set of preferences — tone sliders, style dropdowns, custom instructions. But voice is more like a fingerprint. It's a unique combination of 20+ markers that interact with each other in ways that simple instructions can't capture.

You need to extract those markers explicitly, quantify them, and enforce them as constraints — not suggestions. And you need the right reference examples selected for the specific content you're generating, not just a random sample of everything you've ever written.
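
To make "constraints, not suggestions" concrete, here's a minimal sketch of the shape I'm describing: measure every draft against a target profile and regenerate until it lands within tolerance. Everything in it is illustrative; `generate_draft` stands in for whatever model or tool you call, `extract_markers` is the same kind of function sketched earlier, and the tolerances are made up.

```python
# Sketch of "markers as constraints": keep regenerating until a draft lands
# inside the target voice profile, or give up after a few attempts.
# Profile numbers and tolerances are illustrative, not real thresholds.

TARGET_PROFILE = {
    # marker: (target value, allowed deviation)
    "avg_sentence_length": (11.3, 2.0),
    "short_paragraph_ratio": (0.68, 0.10),
    "rhetorical_questions": (4, 2),
    "contraction_rate": (0.89, 0.15),
}

def within_constraints(markers, profile):
    return all(abs(markers[name] - target) <= tolerance
               for name, (target, tolerance) in profile.items())

def generate_with_voice(prompt, generate_draft, extract_markers, max_attempts=5):
    """Enforce the profile as a hard gate rather than a polite suggestion."""
    draft = ""
    for attempt in range(max_attempts):
        draft = generate_draft(prompt)
        markers = extract_markers(draft)
        if within_constraints(markers, TARGET_PROFILE):
            return draft
        # A real system would feed the specific violations back into the next
        # prompt instead of retrying blind.
        prompt = f"{prompt}\n\nThe last draft missed these voice targets: {markers}"
    return draft  # best attempt after max_attempts
```

The retry loop is crude; the point is that the markers act as a pass/fail gate instead of a tone suggestion the model is free to ignore.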

That's the approach I'm taking with VoiceForge. Instead of asking "what tone do you want?" it analyzes your actual writing and extracts what I've been calling your Writing DNA — the measurable patterns that make your content sound like you. Early results are promising, but I'll save that data for another post.

If you've tried getting AI to match your voice — whether successfully or not — I'd love to hear what worked and what didn't. The comments on the last post were genuinely useful, and I'm still learning from this problem.


This is part of an ongoing series where I'm stress-testing AI content tools with actual data instead of vibes. Previously: I Analyzed 500 AI-Generated LinkedIn Posts.
