Your best drafts languish while you second-guess yourself. Here's how to get a concrete engagement score before the post goes live.
I've watched creators spend more time re-reading a post than writing it. The real problem isn't overthinking—it's flying blind. You don't know if the hook lands, if readers will scroll past paragraph three, or if your CTA will convert until the post is already indexed.
There's a faster way. Claude can read your draft, score it across engagement dimensions, and flag weak sections in under 30 seconds. This guide walks you through building that system, training it on your own data, and wiring it into your publishing workflow.
The Real Cost of Gut-Feel Publishing
Most publish decisions come down to fatigue. You tweak the headline, adjust the opening, and eventually ship it because you've read it too many times to judge fairly.
The issue isn't effort. It's that you lack signal until the post is live and can't be changed. You don't know:
- If your headline creates enough curiosity gap
- Whether the structure holds attention through the middle
- If people will actually finish reading
- Which sections confuse or bore readers
What you need: a pre-flight checklist that runs automatically before the draft leaves your folder. Score the headline. Rate the hook. Flag structural weakness. Estimate read-through rates. All in 30 seconds.
That's what we're building.
How It Works: The Three-Layer Architecture
The system has three parts:
- Content Analyzer: Breaks your draft into scoreable dimensions (headline clarity, hook strength, readability, structure, predicted engagement)
- Prediction Engine: Sends your content to Claude with strict schema constraints—forces JSON output every time, no parsing required
- Calibration Loop: Feeds real engagement data back into the system prompt so predictions tighten over time
Here's the scoring schema:
EngagementScore {
headline_score: 0-100
hook_strength: 0-100
readability: 0-100
structure_score: 0-100
predicted_read_rate: 0-100 (% who finish)
predicted_share_probability: 0-100
weak_sections: [list of flagged areas]
improvement_suggestions: [actionable fixes]
overall_score: 0-100
}
Claude does the heavy lifting. You pass the full draft plus a system prompt that defines high-engagement content based on platform data. The structured output constraint forces proper JSON—no freeform prose to parse.
Building the Python CLI Tool
Setup
bash
pip install anthropic rich click python-frontmatter
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
rich colors and formats terminal output. click builds the CLI. python-frontmatter parses markdown with YAML metadata (standard for static site generators).
The Core Predictor
python
import anthropic
import json
import click
import frontmatter
from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.progress import Progress, SpinnerColumn, TextColumn
console = Console()
SCORING_SYSTEM_PROMPT = """You are an expert content strategist with deep knowledge of engagement metrics
across Dev.to, Hashnode, Medium, and LinkedIn. You analyze draft posts and predict engagement performance
based on proven content patterns.
When analyzing content, evaluate these dimensions:
- Headline: clarity, curiosity gap, specificity, keyword relevance
- Hook (first 150 words): problem identification, relatability, promise of value
- Structure: use of headers, code blocks, lists, paragraph length variance
- Readability: sentence complexity, jargon density, active vs passive voice
- Content depth: actionable specificity vs vague generalities
- CTA quality: clarity and placement of calls-to-action
Return ONLY valid JSON matching this exact schema:
{
"headline_score": ,
"hook_strength": ,
"readability": ,
"structure_score": ,
"predicted_read_rate": ,
"predicted_share_probability": ,
"overall_score": ,
"weak_sections": [, ...],
"improvement_suggestions": [, ...]
}
Be precise and critical. A score above 80 means genuinely publish-ready content."""
def analyze_draft(content: str, title: str, platform: str = "dev.to") -> dict:
"""Send draft to Claude and get structured engagement predictions."""
client = anthropic.Anthropic()
prompt = f"""Analyze this draft post for {platform} and predict its engagement performance.
TITLE: {title}
CONTENT:
{content}
Return your analysis as JSON matching the specified schema exactly."""
message = client.messages.create(
model="claude-opus-4-5",
max_tokens=1024,
system=SCORING_SYSTEM_PROMPT,
messages=[
{"role": "user", "content": prompt}
]
)
response_text = message.content[0].text.strip()
# Strip markdown code fences if Claude wraps the JSON
if response_text.startswith(""):
lines = response_text.split("\n")
response_text = "\n".join(lines[1:-1])
return json.loads(response_text)
def render_scores(scores: dict, title: str):
"""Render engagement prediction as a formatted terminal table."""
table = Table(title=f"Engagement Prediction: {title[:60]}", show_header=True)
table.add_column("Metric", style="cyan", width=28)
table.add_column("Score", justify="center", width=10)
table.add_column("Signal", justify="center", width=10)
metrics = [
("Headline", scores["headline_score"]),
("Hook Strength", scores["hook_strength"]),
("Readability", scores["readability"]),
("Structure", scores["structure_score"]),
("Predicted Read Rate", scores["predicted_read_rate"]),
("Share Probability", scores["predicted_share_probability"]),
]
for name, score in metrics:
if score >= 75:
signal = "✅"
style = "green"
elif score >= 50:
signal = "⚠️"
style = "yellow"
else:
signal = "❌"
style = "red"
table.add_row(name, f"[{style}]{score}[/{style}]", signal)
console.print(table)
console.print()
overall = scores["overall_score"]
overall_color = "green" if overall >= 75 else "yellow" if overall >= 50 else "red"
console.print(Panel(
f"[bold {overall_color}]Overall Score: {overall}/100[/bold {overall_color}]",
expand=False
))
if scores.get("weak_sections"):
console.print("\n[bold red]⚠ Weak Sections:[/bold red]")
for section in scores["weak_sections"]:
console.print(f" • {section}")
if scores.get("improvement_suggestions"):
console.print("\n[bold yellow]💡 Suggestions:[/bold yellow]")
for suggestion in scores["improvement_suggestions"]:
console.print(f" → {suggestion}")
@click.command()
@click.argument("filepath", type=click.Path(exists=True))
@click.option("--platform", default="dev.to", help="Target platform: dev.to, hashnode, medium, linkedin")
@click.option("--min-score", default=70, help="Minimum overall score to pass (exit code 0)")
@click.option("--json-output", is_flag=True, help="Output raw JSON instead of formatted table")
def score_post(filepath: str, platform: str, min_score: int, json_output: bool):
"""Predict engagement scores for a draft post before publishing."""
path = Path(filepath)
with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), transient=True) as progress:
progress.add_task("Reading draft...", total=None)
if path.suffix in [".md", ".mdx"]:
post = frontmatter.load(str(path))
content = post.content
title = post.get("title", path.stem)
else:
content = path.read_text()
title = path.stem
with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), transient=True) as progress:
progress.add_task("Analyzing with Claude...", total=None)
scores = analyze_draft(content, title, platform)
if json_output:
print(json.dumps(scores, indent=2))
else:
render_scores(scores, title)
if scores["overall_score"] < min_score:
raise SystemExit(1)
if name == "main":
score_post()
analyze_draft() calls Claude with strict schema constraints. render_scores() formats output with color-coded signals. The CLI command ties everything together.
The exit code behavior is intentional—exit code 1 on low scores makes CI/CD integration work.
Running It
bash
Basic usage
python predictor.py my-draft-post.md
Target a specific platform
python predictor.py my-draft-post.md --platform linkedin
Fail if score is below 75
python predictor.py my-draft-post.md --min-score 75
Get raw JSON for piping
python predictor.py my-draft-post.md --json-output | jq '.overall_score'
Training on Your Own Data: Calibration That Actually Works
Here's where I hit a real problem: feeding historical engagement data directly into the user message made Claude pattern-match against my specific numbers instead of reasoning about content quality.
The fix: move historical data into the system prompt as calibration examples. This keeps the reasoning clean.
python
def build_calibrated_system_prompt(historical_data: list[dict]) -> str:
"""
Inject real engagement data as calibration examples into the system prompt.
historical_data format:
[{"title": str, "overall_score": int, "actual_views": int,
"actual_read_rate": float, "actual_shares": int}]
"""
base_prompt = SCORING_SYSTEM_PROMPT
if not historical_data:
return base_prompt
calibration_section = "\n\nCALIBRATION DATA (real published posts and their actual metrics):\n"
for post in historical_data[:10]: # Cap at 10 examples
actual_read_pct = int(post["actual_read_rate"] * 100)
calibration_section += (
f'- "{post["title"][:60]}": predicted {post["overall_score"]}/100, '
f'actual views={post["actual_views"]}, '
f'read_rate={actual_read_pct}%, shares={post["actual_shares"]}\n'
)
calibration_section += (
"\nUse this data to calibrate your predictions. "
"If your previous predictions were consistently off in one direction, adjust accordingly."
)
return base_prompt + calibration_section
def load_engagement_history(history_file: str = "engagement_history.json") -> list[dict]:
"""Load historical engagement data from a local JSON file."""
path = Path(history_file)
if not path.exists():
return []
with open(path) as f:
return json.load(f)
def record_actual_performance(
title: str,
predicted_score: int,
actual_views: int,
actual_read_rate: float,
actual_shares: int,
history_file: str = "engagement_history.json"
):
"""Append real engagement data after a post goes live."""
history = load_engagement_history(history_file)
history.append({
"title": title,
"overall_score": predicted_score,
"actual_views": actual_views,
"actual_read_rate": actual_read_rate,
"actual_shares": actual_shares
})
with open(history_file, "w") as f:
json.dump(history, f, indent=2)
console.print(f"[green]✓ Recorded performance data for '{title}'[/green]")
After a post goes live 48–72 hours, call record_actual_performance() with real metrics. After 15–20 data points, the calibrated system prompt tightens predictions because Claude sees the gap between prior predictions and actual results.
Integration: Git Hooks and CI/CD
Git Pre-Commit Hook
Save this as .git/hooks/pre-commit and run chmod +x .git/hooks/pre-commit:
bash
!/bin/bash
Scores any staged .md files before allowing a commit
STAGED_MD_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '.(md|mdx)$')
if [ -z "$STAGED_MD_FILES" ]; then
exit 0
fi
echo "🔍 Scoring staged content..."
FAILED=0
for FILE in $STAGED_MD_FILES; do
echo "Analyzing: $FILE"
python predictor.py "$FILE" --min-score 65
if [ $? -ne 0 ]; then
echo "❌ $FILE scored below minimum threshold"
FAILED=1
fi
done
if [ $FAILED -eq 1 ]; then
echo ""
echo "One or more posts scored below threshold. Fix the flagged issues or use 'git commit --no-verify' to bypass."
exit 1
fi
exit 0
GitHub Actions Workflow
For teams, add this as .github/workflows/content-score.yml:
yaml
name: Content Quality Check
on:
pull_request:
paths:
- 'posts//*.md'
- 'content//*.md'
jobs:
score-content:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: pip install anthropic rich click python-frontmatter
- name: Score changed posts
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
CHANGED=$(git diff --name-only origin/main...HEAD | grep '\.md$' || true)
if [ -z "$CHANGED" ]; then
echo "No markdown files changed."
exit 0
fi
for FILE in $CHANGED; do
echo "Scoring $FILE"
python predictor.py "$FILE" --min-score 70
done
Store ANTHROPIC_API_KEY in GitHub Secrets.
The Final Piece: What This Actually Changes
You get three immediate wins:
- Speed: Score any draft in 30 seconds instead of re-reading it five times
- Objectivity: A score removes the "is this good?" guessing game
- Iteration: Weak sections are flagged by name, so you know exactly what to fix
After 20 published posts, the calibrated predictions start beating your intuition because Claude sees your actual engagement patterns. It knows which of your hooks convert. It knows which structures work for your audience.
Stop publishing blind.
Follow for more practical AI and productivity content.
Top comments (0)