I used LLMs to rewrite meta descriptions for 1,600 articles — honest results

Meta descriptions are the most underrated SEO element on content-heavy sites.

They don't affect rankings directly, but they determine whether someone clicks your result in Google. A bad meta description on a well-ranked article is traffic you're leaving on the table.

I had 1,600+ cybersecurity articles. About 40% had no meta description at all. Another 30% had descriptions that were truncated, keyword-stuffed, or copy-pasted from the first paragraph (which almost never makes a good description).

So I automated the rewrite. Here's what actually happened.

The constraint: 140-160 characters, every time

The rule I set is simple and brutal: every meta description must be 140-160 characters. Not words: characters. Including spaces.

Under 140: Google often ignores your description and rewrites it automatically (usually badly).
Over 160: usually truncated with "…" in search results (the real cutoff is pixel width, so 160 is an approximation), which hurts CTR.

This is harder than it sounds when you're generating text with an LLM. The model works in tokens, not characters, so it can't reliably count characters as it writes. It optimizes for coherence, not length.

My first naive prompt:

Write a meta description for this article about {topic}.
Keep it under 160 characters.

Results: descriptions ranging from 95 to 210 characters. Useless.

The prompt engineering that actually worked

After a lot of iteration, the prompt that consistently landed in the 140-160 range:

Write a meta description for this cybersecurity article.

Rules:
- EXACTLY 140 to 160 characters (count carefully, including spaces)
- Start with an action verb or a direct hook
- Include the main topic and one concrete benefit
- No buzzwords (comprehensive, ultimate, complete)
- No "In this article" or "This guide"

Article title: {title}
Article excerpt: {excerpt}
Main keywords: {keywords}

Output only the description, nothing else.

The key changes:

"EXACTLY" with a two-sided range instead of "under": in my runs the model followed an explicit minimum and maximum far more reliably than a loose upper bound
Positive framing: say what to include, not just what to avoid
No meta-commentary: "Output only the description" stops the model from explaining what it did
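
For reference, here is a minimal sketch of how the template above could be filled in. The pipeline code further down calls a `build_prompt` helper without showing it, so the body here is an assumption about its shape, not the exact implementation; `PROMPT_TEMPLATE` is a name introduced for illustration.

```python
from typing import Sequence

# The prompt from this post, with placeholders for the per-article fields
PROMPT_TEMPLATE = """Write a meta description for this cybersecurity article.

Rules:
- EXACTLY 140 to 160 characters (count carefully, including spaces)
- Start with an action verb or a direct hook
- Include the main topic and one concrete benefit
- No buzzwords (comprehensive, ultimate, complete)
- No "In this article" or "This guide"

Article title: {title}
Article excerpt: {excerpt}
Main keywords: {keywords}

Output only the description, nothing else."""


def build_prompt(title: str, excerpt: str, keywords: Sequence[str]) -> str:
    """Fill the template with one article's fields (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        title=title.strip(),
        excerpt=excerpt.strip(),
        keywords=", ".join(keywords),  # keywords arrive as a list, the prompt wants one line
    )
```

Keeping the template in one constant makes it easy to diff prompt versions when the out-of-range rate changes.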

Even with this prompt, I got out-of-range results ~15% of the time. So I added a validation + retry loop.

The validation pipeline

import re
from typing import Optional

def validate_meta_description(desc: str) -> dict:
    """Check length and banned phrasing; return validity plus any issues."""
    length = len(desc)  # characters, spaces included

    issues = []
    if length < 140:
        issues.append(f"Too short: {length} chars (min 140)")
    if length > 160:
        issues.append(f"Too long: {length} chars (max 160)")
    if desc.startswith(("In this", "This article", "This guide")):
        issues.append("Starts with forbidden phrase")
    if re.search(r'\b(comprehensive|ultimate|complete)\b', desc, re.I):
        issues.append("Contains buzzword")

    return {
        "valid": len(issues) == 0,
        "length": length,
        "issues": issues,
    }

def generate_meta_description(title: str, excerpt: str, keywords: list,
                              max_retries: int = 3) -> Optional[str]:
    # build_prompt() and call_llm() are the pipeline's own helpers (not shown here)
    base_prompt = build_prompt(title, excerpt, keywords)
    hint = ""
    for _ in range(max_retries):
        # Strip stray whitespace so it doesn't count toward the length
        desc = call_llm(base_prompt + hint).strip()
        result = validate_meta_description(desc)

        if result["valid"]:
            return desc

        # Retry with an explicit correction hint appended to the next prompt
        hint = f"\n\nPrevious attempt failed: {', '.join(result['issues'])}. Try again."

    return None  # manual review needed

After 3 failed attempts, I flagged the description for manual review. About 6% needed human intervention (38 of the 640 articles in the run below).

The results: honest numbers

I ran this across 640 articles (the ones with missing or clearly bad descriptions first).

Outcome	Count	%
Valid on first try	487	76%
Valid after retry	115	18%
Failed (manual review)	38	6%

Quality assessment (I manually reviewed a random sample of 80):

71% — better than what I had before
22% — similar quality
7% — worse (usually missing context that wasn't in the excerpt)

The 7% worse cases had a common pattern: articles where the excerpt was weak or missing. The model had nothing to work with. This is the content problem again — LLMs can't fix bad source material.

What I measured in Search Console

I waited 6 weeks after the bulk update before looking at data (Google needs time to recrawl and the signal needs to stabilize).

Results on the articles that were updated vs. a control group that wasn't:

CTR: +0.8 percentage points average (statistically significant at this scale)
Impressions: unchanged (as expected — meta descriptions don't affect rankings)
Position: unchanged (also expected)

0.8pp CTR improvement across 640 articles with meaningful traffic adds up. It's not a dramatic transformation — anyone promising dramatic results from meta description optimization is lying to you. But it's real and it's free once the pipeline is built.
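
One common way to sanity-check a CTR difference between an updated group and a control group is a two-proportion z-test. The sketch below is illustrative only: the function name and the click and impression counts are made up, not the figures behind the result above, and it treats every impression as independent, which real search traffic is not.

```python
from math import sqrt
from statistics import NormalDist


def ctr_uplift_z_test(clicks_a: int, impressions_a: int,
                      clicks_b: int, impressions_b: int) -> dict:
    """Two-proportion z-test: group A (updated) vs group B (control)."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both groups share one CTR
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (ctr_a - ctr_b) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"uplift_pp": (ctr_a - ctr_b) * 100, "z": z, "p_value": p_value}


# Hypothetical numbers for illustration only
result = ctr_uplift_z_test(clicks_a=3_800, impressions_a=100_000,
                           clicks_b=3_000, impressions_b=100_000)
print(result)
```

With large impression counts even a sub-point uplift can clear a significance threshold, which is why the size of the effect matters more than the p-value here.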

The unexpected failure: near-duplicate descriptions

One thing I didn't anticipate: the model started producing structurally similar descriptions across articles in the same category.

When I had 50 guides about Active Directory security, many descriptions ended up following the same pattern:

"Learn how to [verb] [AD concept] to protect your environment from [threat]. Step-by-step guide with [tool]."

Technically valid. Practically, if someone searches and sees 5 results from the same site with near-identical descriptions, they'll click none of them.

Fix: I added a deduplication check that compares new descriptions against already-generated ones using simple n-gram similarity. If similarity > 0.7, force a regeneration with an explicit instruction to use a different structure.
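
A minimal sketch of that kind of check, assuming word bigrams compared with Jaccard similarity. The n-gram size, the Jaccard measure, and the function names are assumptions for illustration; only the 0.7 threshold comes from the text, and the right threshold depends on which n-gram definition you pick.

```python
import re


def word_ngrams(text: str, n: int = 2) -> set:
    """Lowercase the text and return its set of word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Too short for a full n-gram: treat the whole text as one unit
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity between the n-gram sets of two descriptions."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_near_duplicate(candidate: str, existing: list, threshold: float = 0.7) -> bool:
    """True if the candidate is too similar to any already-generated description."""
    return any(ngram_similarity(candidate, prev) > threshold for prev in existing)
```

Comparing every new description against all previous ones is quadratic, which is fine for a few thousand articles. Restricting the comparison to descriptions in the same category would cut the work and matches where the problem showed up.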

Things I'd do differently

1. Fix excerpts before running LLM generation

The quality of the generated description depends heavily on the quality of the excerpt. I should have audited and fixed all excerpts first. I did it in the wrong order.

2. Category-specific prompts

A prompt for a "news" article should be different from a "guide" or "checklist" article. News descriptions need urgency; guides need the benefit; checklists need the scope. I used one prompt for everything and paid for it in quality.
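
One way to do this without maintaining three separate prompts is to keep the shared rules and append one category-specific line. This is a sketch of the idea, not something I ran: the category names come from the paragraph above, while the wording of each rule and the function names are placeholders.

```python
# Extra rule per article category (wording is a placeholder, tune per site)
CATEGORY_GUIDANCE = {
    "news": "Convey urgency: say what happened and why it matters now.",
    "guide": "Lead with the concrete benefit the reader gets from following the guide.",
    "checklist": "State the scope: what the checklist covers and for which system.",
}
DEFAULT_GUIDANCE = "Include the main topic and one concrete benefit."


def category_rule(category: str) -> str:
    """Return the extra prompt rule for a category (case-insensitive)."""
    return CATEGORY_GUIDANCE.get(category.strip().lower(), DEFAULT_GUIDANCE)


def add_category_rule(base_prompt: str, category: str) -> str:
    """Append the category-specific rule to an existing prompt."""
    return f"{base_prompt}\n\nCategory-specific rule: {category_rule(category)}"
```

Unknown categories fall back to the generic rule, so a new content type never breaks the pipeline.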

3. Track CTR per article, not just aggregate

I know the average improved, but I don't know which specific articles drove the improvement. Better instrumentation would let me learn which description styles work for which query intents.
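
A sketch of what that instrumentation could look like, assuming you have per-URL clicks and impressions for a period before and a period after the update (for example from a Search Console export). The `PageStats` class, the function name, the minimum-impressions cutoff, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    clicks: int
    impressions: int

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def ctr_deltas(before: dict, after: dict, min_impressions: int = 100) -> list:
    """Per-URL CTR change in percentage points, largest gain first."""
    rows = []
    for url, pre in before.items():
        post = after.get(url)
        # Skip pages missing from either period or with too little traffic to judge
        if post is None or min(pre.impressions, post.impressions) < min_impressions:
            continue
        rows.append((url, round((post.ctr - pre.ctr) * 100, 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample data for illustration
before = {"/ad-hardening": PageStats(30, 1000), "/pfsense-checklist": PageStats(50, 1000)}
after = {"/ad-hardening": PageStats(45, 1000), "/pfsense-checklist": PageStats(48, 1000)}
for url, delta in ctr_deltas(before, after):
    print(f"{url}: {delta:+.2f} pp")
```

Joining this output with the article category and description style is what would show which styles work for which query intents.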

The actual takeaway

LLMs are genuinely useful for this kind of bulk text generation task if you:

Write a tight prompt with hard constraints
Build validation + retry logic (don't trust the model to self-validate)
Have decent source material to work from
Measure the actual downstream metric (CTR), not a proxy

They're not useful if you expect them to compensate for bad content strategy. Garbage in, slightly better-formatted garbage out.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. Content covers pentesting, Active Directory, cloud security and compliance. 17 free hardening checklists available (PDF + Excel) — FortiGate, Palo Alto, pfSense, Active Directory and more.