binky

Posted on Jun 3

Stop Publishing Blind: Build an AI Content Scorer in 30 Minutes

#claudeapi #contentstrategy #python #automation

Your best drafts languish while you second-guess yourself. Here's how to get a concrete engagement score before the post goes live.

I've watched creators spend more time re-reading a post than writing it. The real problem isn't overthinking—it's flying blind. You don't know if the hook lands, if readers will scroll past paragraph three, or if your CTA will convert until the post is already indexed.

There's a faster way. Claude can read your draft, score it across engagement dimensions, and flag weak sections in under 30 seconds. This guide walks you through building that system, training it on your own data, and wiring it into your publishing workflow.

The Real Cost of Gut-Feel Publishing

Most publish decisions come down to fatigue. You tweak the headline, adjust the opening, and eventually ship it because you've read it too many times to judge fairly.

The issue isn't effort. It's that you lack signal until the post is live and can't be changed. You don't know:

If your headline creates enough curiosity gap
Whether the structure holds attention through the middle
If people will actually finish reading
Which sections confuse or bore readers

What you need: a pre-flight checklist that runs automatically before the draft leaves your folder. Score the headline. Rate the hook. Flag structural weakness. Estimate read-through rates. All in 30 seconds.

That's what we're building.

How It Works: The Three-Layer Architecture

The system has three parts:

Content Analyzer: Breaks your draft into scoreable dimensions (headline clarity, hook strength, readability, structure, predicted engagement)
Prediction Engine: Sends your content to Claude with strict schema constraints—forces JSON output every time, no parsing required
Calibration Loop: Feeds real engagement data back into the system prompt so predictions tighten over time

Here's the scoring schema:

EngagementScore {
headline_score: 0-100
hook_strength: 0-100
readability: 0-100
structure_score: 0-100
predicted_read_rate: 0-100 (% who finish)
predicted_share_probability: 0-100
weak_sections: [list of flagged areas]
improvement_suggestions: [actionable fixes]
overall_score: 0-100
}

Claude does the heavy lifting. You pass the full draft plus a system prompt that defines high-engagement content based on platform data. The structured output constraint forces proper JSON—no freeform prose to parse.

Building the Python CLI Tool

Setup

bash
pip install anthropic rich click python-frontmatter
export ANTHROPIC_API_KEY="sk-ant-your-key-here"

rich colors and formats terminal output. click builds the CLI. python-frontmatter parses markdown with YAML metadata (standard for static site generators).

The Core Predictor

python
import anthropic
import json
import click
import frontmatter
from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.progress import Progress, SpinnerColumn, TextColumn

console = Console()

SCORING_SYSTEM_PROMPT = """You are an expert content strategist with deep knowledge of engagement metrics
across Dev.to, Hashnode, Medium, and LinkedIn. You analyze draft posts and predict engagement performance
based on proven content patterns.

When analyzing content, evaluate these dimensions:

Headline: clarity, curiosity gap, specificity, keyword relevance
Hook (first 150 words): problem identification, relatability, promise of value
Structure: use of headers, code blocks, lists, paragraph length variance
Readability: sentence complexity, jargon density, active vs passive voice
Content depth: actionable specificity vs vague generalities
CTA quality: clarity and placement of calls-to-action

Return ONLY valid JSON matching this exact schema:
{
"headline_score": ,
"hook_strength": ,
"readability": ,
"structure_score": ,
"predicted_read_rate": ,
"predicted_share_probability": ,
"overall_score": ,
"weak_sections": [, ...],
"improvement_suggestions": [, ...]
}

Be precise and critical. A score above 80 means genuinely publish-ready content."""

def analyze_draft(content: str, title: str, platform: str = "dev.to") -> dict:
"""Send draft to Claude and get structured engagement predictions."""
client = anthropic.Anthropic()

prompt = f"""Analyze this draft post for {platform} and predict its engagement performance.

TITLE: {title}

CONTENT:
{content}

Return your analysis as JSON matching the specified schema exactly."""

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=SCORING_SYSTEM_PROMPT,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

response_text = message.content[0].text.strip()

# Strip markdown code fences if Claude wraps the JSON
if response_text.startswith(""):
    lines = response_text.split("\n")
    response_text = "\n".join(lines[1:-1])

return json.loads(response_text)

def render_scores(scores: dict, title: str):
"""Render engagement prediction as a formatted terminal table."""
table = Table(title=f"Engagement Prediction: {title[:60]}", show_header=True)
table.add_column("Metric", style="cyan", width=28)
table.add_column("Score", justify="center", width=10)
table.add_column("Signal", justify="center", width=10)

metrics = [
    ("Headline", scores["headline_score"]),
    ("Hook Strength", scores["hook_strength"]),
    ("Readability", scores["readability"]),
    ("Structure", scores["structure_score"]),
    ("Predicted Read Rate", scores["predicted_read_rate"]),
    ("Share Probability", scores["predicted_share_probability"]),
]

for name, score in metrics:
    if score >= 75:
        signal = "✅"
        style = "green"
    elif score >= 50:
        signal = "⚠️"
        style = "yellow"
    else:
        signal = "❌"
        style = "red"
    table.add_row(name, f"[{style}]{score}[/{style}]", signal)

console.print(table)
console.print()

overall = scores["overall_score"]
overall_color = "green" if overall >= 75 else "yellow" if overall >= 50 else "red"
console.print(Panel(
    f"[bold {overall_color}]Overall Score: {overall}/100[/bold {overall_color}]",
    expand=False
))

if scores.get("weak_sections"):
    console.print("\n[bold red]⚠ Weak Sections:[/bold red]")
    for section in scores["weak_sections"]:
        console.print(f"  • {section}")

if scores.get("improvement_suggestions"):
    console.print("\n[bold yellow]💡 Suggestions:[/bold yellow]")
    for suggestion in scores["improvement_suggestions"]:
        console.print(f"  → {suggestion}")

@click.command()
@click.argument("filepath", type=click.Path(exists=True))
@click.option("--platform", default="dev.to", help="Target platform: dev.to, hashnode, medium, linkedin")
@click.option("--min-score", default=70, help="Minimum overall score to pass (exit code 0)")
@click.option("--json-output", is_flag=True, help="Output raw JSON instead of formatted table")
def score_post(filepath: str, platform: str, min_score: int, json_output: bool):
"""Predict engagement scores for a draft post before publishing."""
path = Path(filepath)

with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), transient=True) as progress:
    progress.add_task("Reading draft...", total=None)

    if path.suffix in [".md", ".mdx"]:
        post = frontmatter.load(str(path))
        content = post.content
        title = post.get("title", path.stem)
    else:
        content = path.read_text()
        title = path.stem

with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), transient=True) as progress:
    progress.add_task("Analyzing with Claude...", total=None)
    scores = analyze_draft(content, title, platform)

if json_output:
    print(json.dumps(scores, indent=2))
else:
    render_scores(scores, title)

if scores["overall_score"] < min_score:
    raise SystemExit(1)

if name == "main":
score_post()

analyze_draft() calls Claude with strict schema constraints. render_scores() formats output with color-coded signals. The CLI command ties everything together.

The exit code behavior is intentional—exit code 1 on low scores makes CI/CD integration work.

Running It

bash

Basic usage

python predictor.py my-draft-post.md

Target a specific platform

python predictor.py my-draft-post.md --platform linkedin

Fail if score is below 75

python predictor.py my-draft-post.md --min-score 75

Get raw JSON for piping

python predictor.py my-draft-post.md --json-output | jq '.overall_score'

Training on Your Own Data: Calibration That Actually Works

Here's where I hit a real problem: feeding historical engagement data directly into the user message made Claude pattern-match against my specific numbers instead of reasoning about content quality.

The fix: move historical data into the system prompt as calibration examples. This keeps the reasoning clean.

python
def build_calibrated_system_prompt(historical_data: list[dict]) -> str:
"""
Inject real engagement data as calibration examples into the system prompt.

historical_data format:
[{"title": str, "overall_score": int, "actual_views": int, 
  "actual_read_rate": float, "actual_shares": int}]
"""
base_prompt = SCORING_SYSTEM_PROMPT

if not historical_data:
    return base_prompt

calibration_section = "\n\nCALIBRATION DATA (real published posts and their actual metrics):\n"

for post in historical_data[:10]:  # Cap at 10 examples
    actual_read_pct = int(post["actual_read_rate"] * 100)
    calibration_section += (
        f'- "{post["title"][:60]}": predicted {post["overall_score"]}/100, '
        f'actual views={post["actual_views"]}, '
        f'read_rate={actual_read_pct}%, shares={post["actual_shares"]}\n'
    )

calibration_section += (
    "\nUse this data to calibrate your predictions. "
    "If your previous predictions were consistently off in one direction, adjust accordingly."
)

return base_prompt + calibration_section

def load_engagement_history(history_file: str = "engagement_history.json") -> list[dict]:
"""Load historical engagement data from a local JSON file."""
path = Path(history_file)
if not path.exists():
return []
with open(path) as f:
return json.load(f)

def record_actual_performance(
title: str,
predicted_score: int,
actual_views: int,
actual_read_rate: float,
actual_shares: int,
history_file: str = "engagement_history.json"
):
"""Append real engagement data after a post goes live."""
history = load_engagement_history(history_file)
history.append({
"title": title,
"overall_score": predicted_score,
"actual_views": actual_views,
"actual_read_rate": actual_read_rate,
"actual_shares": actual_shares
})
with open(history_file, "w") as f:
json.dump(history, f, indent=2)
console.print(f"[green]✓ Recorded performance data for '{title}'[/green]")

After a post goes live 48–72 hours, call record_actual_performance() with real metrics. After 15–20 data points, the calibrated system prompt tightens predictions because Claude sees the gap between prior predictions and actual results.

Integration: Git Hooks and CI/CD

Git Pre-Commit Hook

Save this as .git/hooks/pre-commit and run chmod +x .git/hooks/pre-commit:

bash

!/bin/bash

Scores any staged .md files before allowing a commit

STAGED_MD_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '.(md|mdx)$')

if [ -z "$STAGED_MD_FILES" ]; then
exit 0
fi

echo "🔍 Scoring staged content..."

FAILED=0
for FILE in $STAGED_MD_FILES; do
echo "Analyzing: $FILE"
python predictor.py "$FILE" --min-score 65
if [ $? -ne 0 ]; then
echo "❌ $FILE scored below minimum threshold"
FAILED=1
fi
done

if [ $FAILED -eq 1 ]; then
echo ""
echo "One or more posts scored below threshold. Fix the flagged issues or use 'git commit --no-verify' to bypass."
exit 1
fi

exit 0

GitHub Actions Workflow

For teams, add this as .github/workflows/content-score.yml:

yaml
name: Content Quality Check

on:
pull_request:
paths:
- 'posts//*.md'
- 'content//*.md'

jobs:
score-content:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

  - name: Set up Python
    uses: actions/setup-python@v4
    with:
      python-version: '3.11'

  - name: Install dependencies
    run: pip install anthropic rich click python-frontmatter

  - name: Score changed posts
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    run: |
      CHANGED=$(git diff --name-only origin/main...HEAD | grep '\.md$' || true)
      if [ -z "$CHANGED" ]; then
        echo "No markdown files changed."
        exit 0
      fi
      for FILE in $CHANGED; do
        echo "Scoring $FILE"
        python predictor.py "$FILE" --min-score 70
      done

Store ANTHROPIC_API_KEY in GitHub Secrets.

The Final Piece: What This Actually Changes

You get three immediate wins:

Speed: Score any draft in 30 seconds instead of re-reading it five times
Objectivity: A score removes the "is this good?" guessing game
Iteration: Weak sections are flagged by name, so you know exactly what to fix

After 20 published posts, the calibrated predictions start beating your intuition because Claude sees your actual engagement patterns. It knows which of your hooks convert. It knows which structures work for your audience.

Stop publishing blind.

Follow for more practical AI and productivity content.

DEV Community