gofortool
Why AI Text Gets Detected - The Linguistics Behind It

I've been building an AI text humanizer and spent weeks studying how AI detection actually works. The results surprised me - it's not about grammar, vocabulary, or even factual accuracy. It's about statistical patterns that humans produce naturally but language models don't.

Here's what I found.

The Three Metrics That Matter

AI detectors primarily measure three properties:

1. Perplexity

Perplexity measures how predictable the next word is given the previous context. Lower perplexity = more predictable text.

Language models generate text by sampling from a probability distribution over next tokens, one skewed heavily toward high-probability choices. This produces consistently low perplexity. Human writing has higher perplexity because we make unexpected word choices - idioms, slang, unusual metaphors, sentence fragments.

Think of it this way: if you can easily predict what word comes next, it was probably written by AI.
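To make this concrete, here's a minimal sketch of how perplexity is computed: the exponential of the average negative log-probability of each token. The token probabilities below are made up for illustration; a real detector gets them from a language model.

```python
import math

def perplexity(tokens, probs):
    """Perplexity = exp of the average negative log-probability
    of each token under the model's predicted distribution."""
    nll = -sum(math.log(probs[t]) for t in tokens) / len(tokens)
    return math.exp(nll)

# Hypothetical next-token probabilities, for illustration only.
model_probs = {"the": 0.5, "cat": 0.3, "quantum": 0.01, "sat": 0.19}

predictable = ["the", "cat", "sat"]      # all high-probability choices
surprising = ["the", "quantum", "cat"]   # one rare word in the mix

print(perplexity(predictable, model_probs))  # low perplexity
print(perplexity(surprising, model_probs))   # noticeably higher
```

One rare word is enough to push the score up, which is exactly why human quirks raise perplexity.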

2. Burstiness

Burstiness measures the variation in sentence complexity across a piece of text.

AI text has low burstiness - sentences hover around 15-20 words with similar grammatical complexity. Human text has high burstiness - a 5-word sentence followed by a 40-word one, a simple declarative followed by a complex compound-complex structure.

This is the metric I find most interesting because it maps directly to how humans think. We don't maintain a consistent "complexity level." We shift between simple and complex depending on emphasis, emotion, and flow.
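A rough way to measure burstiness yourself is the coefficient of variation of sentence lengths - standard deviation divided by mean. This is a simplification (real detectors also look at grammatical complexity, not just word counts), but it captures the idea:

```python
import re
import statistics

def burstiness(text):
    """Coefficient of variation of sentence lengths (in words).
    Higher = more swing between short and long sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = ("The model writes a sentence. The model writes another "
           "sentence. The model keeps a very steady pace here.")
varied = ("Short. Then a much longer sentence that wanders through "
          "several clauses before finally stopping. Done.")

print(burstiness(uniform))  # low: similar lengths throughout
print(burstiness(varied))   # high: 1-word and 13-word sentences mixed
```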

3. Vocabulary Distribution

Zipf's law says that in natural language, a word's frequency is roughly inversely proportional to its rank: the second most common word appears about half as often as the first, the third about a third as often, and so on. AI text follows this distribution almost perfectly - too perfectly. Human text deviates in characteristic ways: we overuse certain words, underuse others, and occasionally use rare words that break the expected pattern.

What This Means Practically

If you're writing with AI assistance, the fix isn't to "add errors" or "dumb it down." It's to:

  1. Vary your rhythm - short sentences. Then a longer one. Fragment. Another long one that goes on a bit longer than expected.
  2. Break predictability - use an unexpected word where a common one would go.
  3. Add your voice - hedges, opinions, asides. "Honestly, this part surprised me."

I built a free tool that does this automatically: GoForTool AI Humanizer. It analyzes text for these statistical patterns and adjusts them to match human writing distributions. Everything runs in the browser - no server processing.

The irony of building AI to make AI sound less like AI isn't lost on me. But the underlying linguistics are genuinely fascinating, and understanding them makes you a better writer regardless of whether AI is involved.

What patterns have you noticed in AI-generated text? I'd love to hear what bugs people most about it.
