Adrian Vega

The Writing DNA Experiment: I Analyzed 20 Creators' Voices and Found 6 Patterns Nobody Talks About

A few weeks ago I ran a small experiment. I took 20 creators — newsletter writers, LinkedIn thought leaders, dev bloggers, indie hackers — and ran a full Writing DNA analysis on each of them. 5 pieces of content per person, 100 pieces total.

The goal was simple: find out what actually makes a writing voice distinct. Not in a vague "tone and style" way. In a measurable, quantifiable way.

Some of the findings were predictable. But a few genuinely surprised me.

The Setup

I sourced 20 creators across 5 niches:

  • 4 tech/dev bloggers
  • 4 SaaS founders writing on LinkedIn
  • 4 newsletter operators (Substack, Beehiiv)
  • 4 solo consultants
  • 4 creator-educators (course sellers, coaches)

For each person, I collected their 5 highest-engagement posts. Then I extracted 23 quantitative markers from every piece — sentence length distribution, paragraph structure, opener patterns, vocabulary fingerprint, rhetorical device frequency, CTA style, and more.

That gave me a 20x23 matrix. Here's what jumped out.
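To make the markers concrete, here's a minimal sketch of the kind of extraction involved. The marker names are mine for illustration, not the actual 23 from the analysis, and the sentence splitter is deliberately naive:

```python
import re
from statistics import mean, stdev

def sentence_word_counts(text):
    # Naive split on ., !, ? followed by whitespace; count words per sentence.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    return [len(s.split()) for s in sentences]

def extract_markers(text):
    # A handful of illustrative markers; the real analysis used 23.
    lengths = sentence_word_counts(text)
    paras = [p for p in text.split('\n\n') if p.strip()]
    para_sents = [len(sentence_word_counts(p)) for p in paras]
    return {
        'avg_sentence_len': mean(lengths),
        'sentence_len_std': stdev(lengths) if len(lengths) > 1 else 0.0,
        'pct_single_sentence_paras': sum(1 for n in para_sents if n == 1) / len(paras),
        'em_dashes_per_100_words': 100 * text.count('—') / max(1, len(text.split())),
    }
```

Run that over 5 posts per creator, average the rows, and you have one feature vector per writer.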

Pattern #1: Sentence Length Variance Matters More Than Average

Every AI tool I've tested treats sentence length as a single number: "average 14 words per sentence." But when I looked at the data, average sentence length was nearly identical across most creators — between 12 and 17 words.

What actually varied was the spread.

The most distinctive writers had high variance. They'd alternate between 4-word punches and 30-word explanatory runs. The bland writers (the ones whose AI-generated content is almost indistinguishable from their real stuff) had flat distributions — every sentence within 2-3 words of the mean.

| Writer type | Avg sentence length | Std deviation |
| --- | --- | --- |
| Highly distinctive (top 5) | 14.1 words | 8.7 words |
| Average distinctiveness | 14.8 words | 5.2 words |
| Low distinctiveness (bottom 5) | 15.3 words | 3.1 words |

That standard deviation column is doing all the work. Matching someone's average sentence length gets you maybe 10% of the way there. Matching their rhythm — the alternation between short and long — gets you to 60%.
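You can check your own rhythm in a few lines. A rough sketch with a naive sentence splitter; the example texts are mine, invented to show the contrast:

```python
import re
from statistics import mean, stdev

def rhythm(text):
    # Words per sentence; the spread (stdev), not the mean, carries the voice.
    lengths = [len(s.split()) for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    return mean(lengths), stdev(lengths)

punchy = ("Stop. Most writing advice focuses on what you say rather than how "
          "the sentences themselves rise and fall across a paragraph. See?")
flat = ("The meeting starts at nine tomorrow. The report is due on Friday. "
        "The budget needs one more review.")
```

The punchy sample lands well above the 8.7-word band from the table; the flat sample sits near zero.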

Pattern #2: The "Signature Opener" Effect

16 out of 20 creators had a dominant opener pattern that appeared in 3+ of their 5 posts. And it wasn't random.

Here's the breakdown:

  • Question opener (7 creators): "Have you ever noticed that...?" / "What happens when...?"
  • Declarative statement (5 creators): "Here's something nobody tells you about X."
  • Personal anecdote (3 creators): "Last Tuesday I was doing X when..."
  • Contrarian hook (1 creator): "Everything you've been told about X is wrong."

The remaining 4 had no dominant pattern — they rotated between types. And here's the interesting part: those 4 were rated lowest on "voice recognizability" when I ran a blind test with readers.

Your opener isn't just a hook. It's a fingerprint. When someone reads your first sentence, they should already know it's you.

Pattern #3: Paragraph Length Is a Stronger Signal Than Vocabulary

This one surprised me. I assumed vocabulary would be the biggest differentiator — the words you use, the jargon, the catchphrases. And yes, vocabulary matters. But paragraph structure was a stronger signal by a significant margin.

When I ran a clustering algorithm on all 23 markers to see which ones separated writers most cleanly, paragraph length distribution came in #1. Specifically:

  • What percentage of paragraphs are 1 sentence?
  • What percentage are 2-3 sentences?
  • What percentage are 4+ sentences?

Some writers are "choppy" — 70%+ single-sentence paragraphs. Others are "blocky" — mostly 3-4 sentence paragraphs. And the distribution is remarkably consistent within a single writer across different topics.
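Those three percentages can be measured directly. A sketch, reusing the same naive sentence splitting as above:

```python
import re

def paragraph_profile(text):
    # Share of paragraphs with 1, 2-3, and 4+ sentences.
    paras = [p for p in text.split('\n\n') if p.strip()]
    counts = [len([s for s in re.split(r'(?<=[.!?])\s+', p.strip()) if s])
              for p in paras]
    n = len(counts)
    return {'one': sum(c == 1 for c in counts) / n,
            'two_three': sum(2 <= c <= 3 for c in counts) / n,
            'four_plus': sum(c >= 4 for c in counts) / n}
```

A "choppy" writer scores high on `one`; a "blocky" writer scores high on `four_plus`.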

The top distinguishing markers, in order:

  1. Paragraph length distribution (r = 0.81)
  2. Sentence length variance (r = 0.74)
  3. Transition word preferences (r = 0.68)
  4. Vocabulary fingerprint (r = 0.61)

Vocabulary was #4. Not irrelevant — but not king.

Pattern #4: There Are Only 3 "Voice Archetypes"

When I clustered all 20 writers, they fell into three natural groups:

The Conversationalist (9/20): Short paragraphs, high use of "you" and "I," lots of rhetorical questions, casual transitions ("so," "but," "look"), soft CTAs or none at all. Most common among newsletter writers and LinkedIn creators.

The Teacher (7/20): Structured headers, medium paragraphs, uses "we" more than "I," numbered lists and frameworks, hard CTAs. Most common among course creators and consultants.

The Storyteller (4/20): Longer paragraphs, anecdote-heavy openers, low use of lists, high use of analogies and metaphors, often no CTA. Most common among dev bloggers and essayists.

Every writer was a blend, but leaned heavily into one archetype. And the interesting thing — AI tools are overwhelmingly biased toward The Teacher archetype. If you're a Conversationalist or Storyteller, that's why AI output feels "off" even with good prompting. The model's default register is Teacher.

Pattern #5: CTA Style Is Surprisingly Unique

I almost didn't track this one. Glad I did.

How a creator closes their content is one of the most distinctive markers. And it has almost no overlap with their opening style. A writer who opens with questions might close with a bold statement. A writer who opens with anecdotes might close with a direct ask.

The distribution:

  • No CTA / soft fade (6 creators): Just... ends. Sometimes with a thought-provoking question.
  • Direct ask (5 creators): "Sign up here" / "DM me" / "Try this today."
  • Callback to intro (4 creators): Circles back to the opening anecdote or question.
  • Community invitation (3 creators): "What's your experience with this? Tell me in the comments."
  • Next-in-series tease (2 creators): "Next week I'll break down..."

When I tested AI tools on these 20 writers, CTA style had the worst reproduction accuracy of any marker. Every tool defaulted to "share your thoughts in the comments" regardless of the writer's actual pattern.

Pattern #6: The "Invisible Rhythm" — Punctuation as Signature

This was the unexpected one.

Three of the most distinctive writers had unusual punctuation habits that turned out to be core to their voice:

  • One used em dashes obsessively — three to four per post — in place of commas or parentheses.
  • One used ellipsis as a pacing tool... to create pauses... where most people would use periods.
  • One used parenthetical asides (almost always with a joke or self-deprecating comment inside) in every other paragraph.

When I stripped punctuation from all 100 posts and ran the clustering again, the separation between writers got measurably worse. Punctuation carries voice information that word choice alone doesn't.

Most style-matching systems completely ignore punctuation patterns. They focus on words. That's a blind spot.
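Fixing that blind spot is cheap. A crude sketch of a punctuation fingerprint, counting marks per 100 words (my own illustration, not the actual pipeline):

```python
def punctuation_fingerprint(text, marks=('—', '...', '(', ';')):
    # Occurrences of each mark per 100 whitespace-separated words:
    # a simple per-writer signature.
    # Note: counts the literal '...'; the Unicode ellipsis '…' would
    # need its own entry.
    words = max(1, len(text.split()))
    return {m: round(100 * text.count(m) / words, 2) for m in marks}
```

Comparing these dicts across writers already separates the em-dash writer from the ellipsis writer from the parenthetical writer.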

What This Means for AI Voice Matching

The takeaway from all of this is that voice is structural, not cosmetic. It lives in rhythm, paragraph shape, opener patterns, and punctuation habits — not just in word choice and "tone."

If you tell an AI "write in a casual tone," you'll get generic casual. If you tell it "write with 65% single-sentence paragraphs, sentence length standard deviation of 8+ words, open with a question, close with a callback to the intro, and use em dashes instead of parentheses" — now you're getting somewhere.
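A sketch of what that looks like programmatically: generating explicit constraints from extracted markers. The marker names and output format here are hypothetical, not the actual Writing DNA output:

```python
def style_constraints(markers):
    # Turn measured numbers into hard prompt constraints, not suggestions.
    return "\n".join([
        f"- Sentence length std deviation: at least {markers['sentence_len_std']:.1f} words.",
        f"- Single-sentence paragraphs: about {markers['pct_single_para']:.0%} of all paragraphs.",
        f"- Open with {markers['opener']}; close with {markers['cta']}.",
        "- Prefer em dashes over parentheses for asides.",
    ])
```

Feed the resulting block into the system prompt and the model has numbers to hit instead of adjectives to interpret.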

That's the whole idea behind Writing DNA. Extract the structural fingerprint. Enforce it as constraints, not suggestions.

I'm running free Writing DNA analyses for a handful of creators — if you want to see your own voice broken down into actual numbers, drop your email at tryvoiceforge.com and I'll send yours over.


This is part 3 of "The AI Voice Lab" series. Previously: "I Analyzed 500 AI-Generated LinkedIn Posts" and "I Fed My 10 Best Blog Posts to 5 AI Tools."
