I write a lot. Blog posts, docs, READMEs — probably 2,000 words a week. Last month I got lazy and let AI generate a few blog paragraphs for me. They looked fine. Professional, even polished.
But something felt off. So I ran them through a readability scorer.
The numbers were brutal.
## The Experiment
I took four AI-generated blog paragraphs (the kind ChatGPT/Claude produce when you say "write me a blog intro about web development") and four paragraphs I'd written myself. Then I scored all eight using textlens, an open-source text analysis library.
Here's the scoring code — it's dead simple:
```typescript
import { readability } from 'textlens';

const aiText = `In today's rapidly evolving technological landscape,
developers are constantly seeking innovative solutions to streamline
their workflows and enhance productivity. The emergence of artificial
intelligence has fundamentally transformed the way we approach
software development, offering unprecedented opportunities for
automation and optimization.`;

const humanText = `I write a lot. Blog posts, docs, READMEs — probably
2,000 words a week. Last month I got lazy and let AI write three posts
for me. They looked fine. Professional, even. But something felt off.
So I ran them through a readability scorer. The numbers were bad.`;

console.log('AI:', readability(aiText));
console.log('Human:', readability(humanText));
```
## The Scores
Here's what came back. Lower grade level = easier to read. Higher Flesch score = more readable.
| Metric | AI-Written (avg) | Human-Written (avg) | Winner |
|---|---|---|---|
| Flesch Reading Ease | -4.7 | 73.8 | Human |
| FK Grade Level | 19.9 | 5.1 | Human |
| Gunning Fog Index | 24.9 | 7.5 | Human |
Read that again. The AI text scored a negative Flesch Reading Ease. That's below the bottom of the usual 0–100 scale; even dense academic papers typically sit around 30. The grade level was 19.9, roughly twenty years of formal education: you'd need a PhD candidate to comfortably read a blog post intro.
The human-written text? Grade 5. Any fifth grader could read it.
## Why AI Text Scores So Poorly
It comes down to two things that readability formulas actually measure: sentence length and syllable count.
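These formulas are public math, so you can sketch them from scratch. Here's a minimal, self-contained TypeScript version of Flesch Reading Ease and Gunning Fog. This is not textlens's internals, and the syllable counter is a rough vowel-group heuristic, so expect its numbers to drift a little from any real library's:

```typescript
// From-scratch sketches of the standard readability formulas (not textlens
// internals). The syllable counter is a crude vowel-group heuristic.

function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!w) return 0;
  const groups = w.match(/[aeiouy]+/g) ?? [];
  let n = groups.length;
  if (w.endsWith('e') && n > 1) n -= 1; // drop most silent e's
  return Math.max(n, 1);
}

function counts(text: string) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim()).length || 1;
  const words = text.split(/\s+/).filter(w => /[a-z]/i.test(w));
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const complex = words.filter(w => countSyllables(w) >= 3).length;
  return { sentences, words: words.length, syllables, complex };
}

// Flesch Reading Ease: long sentences and many syllables per word push the
// score down — with enough jargon it goes negative, like the AI text did.
function fleschReadingEase(text: string): number {
  const c = counts(text);
  return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words);
}

// Gunning Fog: average sentence length plus the percentage of 3+-syllable words.
function gunningFog(text: string): number {
  const c = counts(text);
  return 0.4 * (c.words / c.sentences + 100 * (c.complex / c.words));
}
```

Feed it "The cat sat on the mat." and you get a Flesch score well above 100; feed it a jargon-dense compound sentence and the 84.6 syllable penalty alone can drag the result below zero. That's all a negative score means.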
AI defaults to long, compound sentences packed with multi-syllable jargon. Here's a real example from my test:
AI wrote:
"The implementation of sentiment analysis algorithms represents a fascinating intersection of natural language processing and machine learning technologies."
I would write:
"Sentiment analysis sounds complex, but the code is simple."
Same topic. The AI version has a Gunning Fog score of 28.4 (postgraduate level). Mine scores 7.5 (7th grade).
AI also loves filler words. "Leverage," "innovative," "comprehensive," "unprecedented" — words that add syllables without adding meaning. Real developers don't talk like that. We say "use," "new," and "full."
## The Sentiment Surprise
One thing I didn't expect: AI text consistently scored more positive in sentiment analysis. Every AI paragraph came back with a positive sentiment score, while my writing was closer to neutral.
```typescript
import { sentiment } from 'textlens';

const aiResult = sentiment(aiText);
// { score: 4, comparative: 0.074, positive: ['innovative', ...], ... }

const humanResult = sentiment(humanText);
// { score: -1, comparative: -0.034, positive: [], negative: ['lazy', 'bad'] }
```
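For intuition, here's how a lexicon-based scorer of this shape works under the hood. This is a toy sketch with a handful of hand-picked word valences, not textlens's actual word list: each token's valence is summed into `score`, and `comparative` normalizes by token count.

```typescript
// Toy lexicon-based sentiment sketch (illustrative valences, NOT textlens's
// real lexicon). score = sum of valences; comparative = score / token count.
const LEXICON: Record<string, number> = {
  innovative: 2, powerful: 2, exceptional: 3, revolutionary: 2,
  lazy: -1, bad: -3,
};

function scoreSentiment(text: string) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let score = 0;
  const positive: string[] = [];
  const negative: string[] = [];
  for (const t of tokens) {
    const v = LEXICON[t] ?? 0;
    score += v;
    if (v > 0) positive.push(t);
    if (v < 0) negative.push(t);
  }
  const comparative = tokens.length ? score / tokens.length : 0;
  return { score, comparative, positive, negative };
}
```

Run it on "This lazy approach is bad" and you get a negative score with `lazy` and `bad` flagged. This also explains why honest writing scores near neutral: most real sentences just don't contain many lexicon words at all.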
AI text is relentlessly upbeat. Words like "exciting," "powerful," "exceptional," and "revolutionary" show up constantly. My writing had words like "lazy" and "bad" — because I was being honest. Readers can smell fake enthusiasm.
## What I Actually Do Now
I didn't stop using AI for drafts. But I added a scoring step to my workflow:
```typescript
import { analyze } from 'textlens';

function checkDraft(text: string) {
  const result = analyze(text);
  const { fleschReadingEase, fleschKincaidGrade } = result.readability;

  let ok = true;
  if (fleschKincaidGrade.score > 10) {
    ok = false;
    console.warn(`⚠️ Grade level ${fleschKincaidGrade.score} — too complex`);
    console.warn('Simplify sentences and reduce jargon.');
  }
  if (fleschReadingEase.score < 50) {
    ok = false;
    console.warn(`⚠️ Flesch score ${fleschReadingEase.score} — hard to read`);
  }
  // Only celebrate when both checks actually pass.
  if (ok) {
    console.log(`✅ Grade: ${fleschKincaidGrade.score} | Flesch: ${fleschReadingEase.score}`);
  }
}
```
My rule: nothing ships above grade 8. If AI gives me a grade-16 paragraph, I rewrite it until the score drops. It usually takes 30 seconds — shorten the sentences, swap the fancy words.
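If you want to enforce that rule automatically, the check is easy to wire into a publish script. This is a hypothetical gate I'm sketching with the standard Flesch-Kincaid grade formula rather than textlens itself (so the exact numbers may differ slightly from the library's):

```typescript
// Hypothetical pre-publish gate for the "nothing ships above grade 8" rule.
// Uses the standard Flesch-Kincaid grade formula with a rough syllable
// heuristic, so treat the cutoff as approximate.
const MAX_GRADE = 8;

function syllables(word: string): number {
  const groups = word.toLowerCase().replace(/[^a-z]/g, '').match(/[aeiouy]+/g) ?? [];
  return Math.max(groups.length, 1);
}

function fkGrade(text: string): number {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim()).length || 1;
  const words = text.split(/\s+/).filter(w => /[a-z]/i.test(w));
  const syl = words.reduce((n, w) => n + syllables(w), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (syl / words.length) - 15.59;
}

function shipGate(draft: string): boolean {
  const grade = fkGrade(draft);
  if (grade > MAX_GRADE) {
    console.warn(`Grade ${grade.toFixed(1)} > ${MAX_GRADE} — rewrite before publishing.`);
    return false;
  }
  return true;
}
```

Hook `shipGate` into your build (exit nonzero on failure) and a grade-16 paragraph never makes it to production by accident.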
## The Takeaway
AI is great at generating volume. It's terrible at generating readable text. The irony is that AI produces writing that sounds impressive but performs poorly — high bounce rates, low engagement, readers who skim and leave.
The fix isn't to avoid AI. The fix is to measure what you publish.
Readability isn't subjective. It's math. Sentence length, syllable count, word frequency — these are numbers you can check before you hit publish.
So next time AI writes something for you, don't just skim it and ship it. Score it.
The tool I used: textlens — zero-dependency text analysis for Node.js. Readability scores, sentiment analysis, keyword extraction, and more. Install it with npm install textlens and run npx textlens "your text here" to try it.
What's your experience with AI-generated content quality? Have you measured it, or just eyeballed it? Drop a comment — I'm curious if others are seeing the same patterns.