binky

Posted on Jun 2

Build a Multi-Platform Content Repurposing API: Auto-Convert One Article Into 10 Formats

#contentrepurposing #apidevelopment #python #automation

One blog post. Ten platforms. One API call.

I built a Python service that converts long-form content into optimized Twitter threads, LinkedIn posts, YouTube descriptions, and email sequences — with working code you can deploy today.

This started with a real problem: I watched a client spend 6 hours manually reformatting a single 2,000-word article for five different platforms. That's not a content problem — that's an automation problem.

Why Manual Reformatting Kills Productivity

Most creators write once, then either skip distribution or spend more time reformatting than writing. Twitter demands punchy threads. LinkedIn wants narrative arcs with whitespace. YouTube descriptions need keyword-dense paragraphs plus timestamps. Email sequences require subject lines, preview text, and CTAs per email.

These aren't minor formatting differences. Each platform has its own editorial grammar. Switching between them, manually, for every piece of content? That's pure friction.

The solution is a transformation pipeline that understands each platform's constraints and handles the mechanical work automatically.

Architecture: Three Layers

The service has three layers:

Ingestion: Accept raw markdown
Transformation: Call Claude with platform-specific prompts in parallel
Output: Validate format constraints, return structured JSON

I used async job processing because each transformation takes 10–30 seconds. Blocking a web request that long ruins the experience. Batching multiple platform transformations into parallel API calls cuts wall-clock time significantly.

Article Markdown → ContentTransformer → [Async Tasks per Platform] → Validated Output JSON
                                                    ↓
                                        Claude API (claude-opus-4-5)
                                                    ↓
                                        Format Validator → Cache Layer

The cache layer matters for cost. If someone requests the same article transformed to Twitter format twice, you shouldn't pay for two API calls.
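To make the cost argument concrete, here is a minimal sketch of the caching idea that uses an in-memory dict instead of Redis. The names `make_cache_key`, `fake_transform`, and `cached_transform` are hypothetical stand-ins for illustration, and `fake_transform` only pretends to be the paid API call. The key scheme (SHA-256 of the platform name plus the article text) is the same one the service uses later in this post.

```python
import hashlib

_cache: dict[str, str] = {}  # stand-in for Redis
api_calls = 0                # counts how often the "expensive" call runs


def make_cache_key(platform: str, article: str) -> str:
    """Identical platform + article text always yields the same key."""
    return hashlib.sha256(f"{platform}:{article}".encode()).hexdigest()


def fake_transform(platform: str, article: str) -> str:
    """Hypothetical placeholder for the paid API call."""
    global api_calls
    api_calls += 1
    return f"[{platform}] {article[:20]}"


def cached_transform(platform: str, article: str) -> str:
    """Only call the expensive function when the key is not cached yet."""
    key = make_cache_key(platform, article)
    if key not in _cache:
        _cache[key] = fake_transform(platform, article)
    return _cache[key]


first = cached_transform("twitter_thread", "My long article text")
second = cached_transform("twitter_thread", "My long article text")  # served from cache
other = cached_transform("linkedin_post", "My long article text")    # new platform, new call
```

Three requests, two "API calls": the repeated Twitter request is free. A different platform or a single changed character in the article produces a different key and a fresh call.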

Setup

bash
pip install anthropic aiohttp redis python-dotenv markdown2 tiktoken

Create a .env file:

bash
ANTHROPIC_API_KEY=your_key_here
REDIS_URL=redis://localhost:6379
MAX_CONCURRENT_REQUESTS=5
CACHE_TTL_SECONDS=86400
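The service code in this post reads `REDIS_URL` and `CACHE_TTL_SECONDS` from the environment, but passes the concurrency limit as a function argument. One way to keep all three values in one place is a small settings object. This is only a sketch: `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the .env file above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    redis_url: str
    max_concurrent_requests: int
    cache_ttl_seconds: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "86400")),
    )


# Passing a plain dict makes the function easy to exercise without touching the real environment
settings = load_settings({"MAX_CONCURRENT_REQUESTS": "3"})
```

With something like this in place, `settings.max_concurrent_requests` could be passed as the `max_concurrent` argument instead of relying on the hard-coded default.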

The Core Transformation Service

python
import asyncio
import hashlib
import json
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional

import anthropic
import redis
from dotenv import load_dotenv

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
cache = redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379"))

class Platform(Enum):
    TWITTER_THREAD = "twitter_thread"
    LINKEDIN_POST = "linkedin_post"
    LINKEDIN_CAROUSEL = "linkedin_carousel"
    YOUTUBE_DESCRIPTION = "youtube_description"
    EMAIL_SEQUENCE = "email_sequence"
    INSTAGRAM_CAPTION = "instagram_caption"
    NEWSLETTER_INTRO = "newsletter_intro"
    PODCAST_SHOWNOTES = "podcast_shownotes"
    REDDIT_POST = "reddit_post"
    FACEBOOK_POST = "facebook_post"

@dataclass
class PlatformConfig:
    max_chars: Optional[int]  # None means no hard limit
    tone: str
    structure_hints: str
    output_format: str  # "list", "text", "json"

PLATFORM_CONFIGS = {
    Platform.TWITTER_THREAD: PlatformConfig(
        max_chars=280,  # per tweet, not per thread
        tone="punchy, direct, no fluff",
        structure_hints="Number each tweet 1/, 2/, etc. Hook in tweet 1. Each tweet standalone. End with CTA.",
        output_format="list",
    ),
    Platform.LINKEDIN_POST: PlatformConfig(
        max_chars=3000,
        tone="professional but human, first-person narrative",
        structure_hints="3-line hook. Whitespace between paragraphs. 3-5 bullet insights. CTA question at end. 3-5 hashtags.",
        output_format="text",
    ),
    Platform.LINKEDIN_CAROUSEL: PlatformConfig(
        max_chars=None,
        tone="educational, slide-by-slide clarity",
        structure_hints="Return JSON array. Each slide has 'title' (max 60 chars) and 'body' (max 150 chars). 7-10 slides. First slide is hook, last is CTA.",
        output_format="json",
    ),
    Platform.YOUTUBE_DESCRIPTION: PlatformConfig(
        max_chars=5000,
        tone="SEO-aware, keyword-rich first 150 chars",
        structure_hints="First 2 sentences are searchable summary. Then timestamps placeholder. Then 3 paragraph expansion. Then links section. Then hashtags.",
        output_format="text",
    ),
    Platform.EMAIL_SEQUENCE: PlatformConfig(
        max_chars=None,
        tone="conversational, direct, one idea per email",
        structure_hints="Return JSON array of 5 emails. Each has 'subject', 'preview_text' (max 90 chars), 'body', and 'cta'. Space emails across a week.",
        output_format="json",
    ),
    Platform.INSTAGRAM_CAPTION: PlatformConfig(
        max_chars=2200,
        tone="visual storytelling, emotional hook",
        structure_hints="Hook line. Story or insight. Lesson. CTA. 10-15 hashtags on new lines.",
        output_format="text",
    ),
    Platform.NEWSLETTER_INTRO: PlatformConfig(
        max_chars=500,
        tone="warm, editor's-note style",
        structure_hints="2-3 sentences. Why this content matters right now. What reader will get from it.",
        output_format="text",
    ),
    Platform.PODCAST_SHOWNOTES: PlatformConfig(
        max_chars=None,
        tone="informative, scannable",
        structure_hints="Episode summary. Key topics as bullet list. 3-5 key takeaways. Guest/resource mentions.",
        output_format="text",
    ),
    Platform.REDDIT_POST: PlatformConfig(
        max_chars=40000,
        tone="authentic, community-aware, anti-promotional",
        structure_hints="TL;DR at top. Explain context. Share actual findings. Invite discussion. No overt CTAs.",
        output_format="text",
    ),
    Platform.FACEBOOK_POST: PlatformConfig(
        max_chars=63206,
        tone="story-driven, shareable",
        structure_hints="Relatable hook. Personal angle. 3 key points. Question to drive comments. Optional emoji use.",
        output_format="text",
    ),
}

def build_prompt(article_markdown: str, platform: Platform) -> str:
    config = PLATFORM_CONFIGS[platform]
    # For threads, max_chars applies to each tweet rather than to the whole thread
    scope = "per tweet" if platform is Platform.TWITTER_THREAD else "total"
    char_constraint = f"Max length ({scope}): {config.max_chars} characters." if config.max_chars else ""

    return f"""You are a professional content strategist specializing in platform-native content.

Convert the following article into optimized content for: {platform.value.replace('_', ' ').title()}

TONE: {config.tone}
STRUCTURE: {config.structure_hints}
{char_constraint}
OUTPUT FORMAT: {config.output_format} — if JSON, return only valid JSON with no surrounding text.

ARTICLE:
{article_markdown}

Return only the transformed content. No preamble, no explanation."""

async def transform_single(
    article_markdown: str,
    platform: Platform,
    semaphore: asyncio.Semaphore,
) -> dict:
    cache_key = hashlib.sha256(
        f"{platform.value}:{article_markdown}".encode()
    ).hexdigest()

    # Note: this is the sync Redis client, so each call briefly blocks the event loop
    cached = cache.get(cache_key)
    if cached:
        return {"platform": platform.value, "content": json.loads(cached), "cached": True}

    async with semaphore:
        try:
            # Run sync anthropic client in thread pool to avoid blocking event loop
            loop = asyncio.get_running_loop()
            response = await loop.run_in_executor(
                None,
                lambda: client.messages.create(
                    model="claude-opus-4-5",
                    max_tokens=2048,
                    messages=[{"role": "user", "content": build_prompt(article_markdown, platform)}],
                ),
            )

            raw_content = response.content[0].text
            config = PLATFORM_CONFIGS[platform]

            if config.output_format == "json":
                # Strip markdown code fences if model wrapped JSON
                raw_content = raw_content.strip()
                if raw_content.startswith("```"):
                    raw_content = raw_content.split("\n", 1)[1]
                    raw_content = raw_content.rsplit("```", 1)[0]
                    raw_content = raw_content.strip()
                parsed = json.loads(raw_content)
                output = parsed
            else:
                output = raw_content

            cache.setex(
                cache_key,
                int(os.getenv("CACHE_TTL_SECONDS", 86400)),
                json.dumps(output),
            )

            return {
                "platform": platform.value,
                "content": output,
                "cached": False,
                "tokens_used": response.usage.input_tokens + response.usage.output_tokens,
            }

        except json.JSONDecodeError as e:
            return {"platform": platform.value, "error": f"JSON parse failed: {e}", "raw": raw_content}
        except anthropic.APIError as e:
            return {"platform": platform.value, "error": str(e)}

async def repurpose_article(
    article_markdown: str,
    platforms: Optional[list[Platform]] = None,
    max_concurrent: int = 5,
) -> dict:
    if platforms is None:
        platforms = list(Platform)

    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [transform_single(article_markdown, p, semaphore) for p in platforms]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    output = {}
    for result in results:
        # With return_exceptions=True, failures come back as exception objects
        if isinstance(result, Exception):
            print(f"Task failed with exception: {result}")
            continue
        output[result["platform"]] = result

    return output

The semaphore limits concurrent requests to 5 by default, which keeps the service from hammering the API. The cache key is the SHA-256 hash of the platform name plus the article content, so identical inputs hit the cache until the entry's TTL expires.
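If you want to convince yourself the semaphore really caps concurrency, you can check it without spending any tokens. The sketch below swaps the API call for a short `asyncio.sleep` and records the highest number of jobs in flight at once. `run_with_limit` and `fake_job` are hypothetical names used only for this demonstration.

```python
import asyncio


async def run_with_limit(n_tasks: int, max_concurrent: int) -> int:
    """Run n_tasks fake jobs and return the highest concurrency observed."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fake_job() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the API round trip
            in_flight -= 1

    await asyncio.gather(*(fake_job() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_with_limit(n_tasks=10, max_concurrent=5))
print(peak)  # 5
```

Ten jobs are scheduled at once, but no more than five ever run at the same time. The rest wait at `async with semaphore` until a slot frees up.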

Format Validation: Where Theory Meets Reality

Format validation is where the pipeline deals with what the model actually returned. Claude is reliable, but at scale you hit edge cases: a tweet at 295 characters, a JSON email missing the subject field, or, weirdly, markdown code fences wrapped around JSON.

python
def validate_and_fix(result: dict) -> dict:
    platform = Platform(result["platform"])
    config = PLATFORM_CONFIGS[platform]
    content = result.get("content")

    if not content or "error" in result:
        return result

    # Twitter thread: enforce per-tweet character limits
    if platform == Platform.TWITTER_THREAD:
        if isinstance(content, str):
            # Treat each non-empty line as one tweet
            tweets = [line.strip() for line in content.split("\n") if line.strip()]
        else:
            tweets = content

        fixed_tweets = []
        for tweet in tweets:
            if len(tweet) > 280:
                # Cut at a word boundary, leaving room for the ellipsis
                truncated = tweet[:277].rsplit(" ", 1)[0] + "..."
                fixed_tweets.append(truncated)
            else:
                fixed_tweets.append(tweet)

        result["content"] = fixed_tweets
        result["tweet_count"] = len(fixed_tweets)

    # LinkedIn: enforce char limit and hashtag presence
    elif platform == Platform.LINKEDIN_POST:
        text = str(content)
        # Generic fallback hashtags, appended only when the post has none
        suffix = "" if "#" in text else "\n\n#contentmarketing #productivity"
        # Reserve room for the suffix so the final post stays within the limit
        budget = config.max_chars - len(suffix)
        if len(text) > budget:
            text = text[: budget - 3] + "..."
            result["truncated"] = True
        result["content"] = text + suffix

    # Email sequence: validate required JSON fields
    elif platform == Platform.EMAIL_SEQUENCE:
        if isinstance(content, list):
            for i, email in enumerate(content):
                if "subject" not in email:
                    email["subject"] = f"Email {i+1}"
                if "cta" not in email:
                    email["cta"] = "Reply to this email with your thoughts."
                if len(email.get("preview_text", "")) > 90:
                    email["preview_text"] = email["preview_text"][:87] + "..."
        result["email_count"] = len(content) if isinstance(content, list) else 0

    # Newsletter intro: hard char cap
    elif platform == Platform.NEWSLETTER_INTRO:
        if isinstance(content, str) and len(content) > config.max_chars:
            result["content"] = content[: config.max_chars - 3] + "..."

    return result

def validate_all(results: dict) -> dict:
    return {k: validate_and_fix(v) for k, v in results.items()}

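This layer catches the 1-in-100 calls where a tweet goes over 280 characters, plus emails that come back without a required field. Fence-wrapped JSON is handled earlier, inside transform_single, before json.loads() runs. It's defensive but crucial at scale.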
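The validator repeats the same "cut and add an ellipsis" step for several platforms with different limits. If you extend it to more platforms, a small helper keeps the off-by-three arithmetic in one place. This is a sketch under my own naming: `truncate_with_ellipsis` is a hypothetical helper, not something the code above already defines.

```python
def truncate_with_ellipsis(text: str, limit: int, word_boundary: bool = False) -> str:
    """Return text unchanged if it fits, else cut it so the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    if limit <= 3:
        # Not enough room for an ellipsis; fall back to a plain cut
        return text[:limit]
    cut = text[: limit - 3]
    if word_boundary and " " in cut:
        # Drop the last (possibly partial) word, as the tweet logic does
        cut = cut.rsplit(" ", 1)[0]
    return cut + "..."


tweet = truncate_with_ellipsis("word " * 100, 280, word_boundary=True)
preview = truncate_with_ellipsis("x" * 120, 90)
```

The same call then covers tweets (280), LinkedIn posts (3000), preview text (90), and newsletter intros (500), with `word_boundary=True` wherever a mid-word cut would look sloppy.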

Retry Logic and Batch Processing

python
from functools import wraps

def with_retry(max_retries: int = 3, backoff_base: float = 2.0):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return await func(*args, **kwargs)
                except anthropic.RateLimitError:
                    if attempt == max_retries - 1:
                        raise
                    # Exponential backoff: 1s, 2s, 4s, ... for backoff_base=2.0
                    wait = backoff_base ** attempt
                    print(f"Rate limited. Waiting {wait}s before retry {attempt + 1}/{max_retries}")
                    await asyncio.sleep(wait)
                except anthropic.APIConnectionError:
                    if attempt == max_retries - 1:
                        raise
                    await asyncio.sleep(backoff_base ** attempt)
            return None
        return wrapper
    return decorator

# Caveat: transform_single catches anthropic.APIError, the parent class of both
# exceptions handled above, and returns an error dict instead of raising. The
# retry only fires if transform_single lets those two exceptions propagate.
@with_retry(max_retries=3, backoff_base=2.0)
async def transform_single_with_retry(article_markdown, platform, semaphore):
    return await transform_single(article_markdown, platform, semaphore)

async def process_batch(articles: list[str], platforms: list[Platform]) -> list[dict]:
    """Process multiple articles with cost management."""
    all_results = []
    for i, article in enumerate(articles):
        print(f"Processing article {i+1}/{len(articles)}")
        results = await repurpose_article(article, platforms)
        validated = validate_all(results)
        all_results.append(validated)

        # Small delay between articles to be a good API citizen
        if i < len(articles) - 1:
            await asyncio.sleep(1)

    return all_results
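Retry code is easy to get subtly wrong, so it helps to exercise the pattern against a stub that fails a known number of times before trusting it with real traffic. The sketch below does that with no network calls. Everything in it (`FlakyError`, `retry_async`, `make_flaky`, the `scale` parameter, and the 10% jitter) is a hypothetical illustration of exponential backoff, not part of the Anthropic SDK or the decorator above.

```python
import asyncio
import random


class FlakyError(Exception):
    """Hypothetical stand-in for a retryable API error."""


async def retry_async(func, max_retries: int = 3, backoff_base: float = 2.0, scale: float = 1.0):
    """Call func(); on FlakyError wait scale * backoff_base**attempt (plus jitter), then retry."""
    for attempt in range(max_retries):
        try:
            return await func()
        except FlakyError:
            if attempt == max_retries - 1:
                raise  # out of attempts, surface the error
            delay = scale * (backoff_base ** attempt)
            # A little random jitter keeps parallel tasks from retrying in lockstep
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))


def make_flaky(failures: int):
    """Build a coroutine function that fails `failures` times, then succeeds."""
    calls = {"count": 0}

    async def flaky():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise FlakyError("simulated rate limit")
        return calls["count"]

    return flaky


# scale=0.01 keeps the demo fast; scale=1.0 would give waits of 1s, 2s, ...
attempts_needed = asyncio.run(retry_async(make_flaky(2), scale=0.01))
print(attempts_needed)  # 3
```

Two simulated failures followed by a success means the call lands on the third attempt. With five failures and `max_retries=3`, the final `FlakyError` propagates to the caller, which is the behavior you want from the last attempt.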

The Hidden Production Bug

I hit a subtle issue with the JSON platforms (email sequences and LinkedIn carousels). Claude would occasionally wrap the JSON in a markdown code block, opening with ```json and closing with ```. That broke json.loads().

The fix was simple but took too long to find. I added preprocessing inside transform_single:

python
if raw_content.startswith("```"):
    raw_content = raw_content.split("\n", 1)[1]  # remove first line (the opening fence)
    raw_content = raw_content.rsplit("```", 1)[0]  # remove closing fence
    raw_content = raw_content.strip()

This runs before json.loads(). In local tests, Claude never wrapped the JSON. In production, it happened in about 1 in 10 calls. The lesson: test with higher concurrency and longer sequences than you think you need.
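The inline version assumes the response has a newline after the opening fence and a closing fence at the end. A standalone function is easier to test against odd cases such as leading whitespace or a missing closing fence. Here is one possible sketch. `strip_code_fence` is a hypothetical helper name, and the fence marker is built from single backticks only so the example does not clash with markdown rendering.

```python
import json

FENCE = "`" * 3  # three backticks


def strip_code_fence(text: str) -> str:
    """Remove a leading fence line and, if present, a trailing fence."""
    text = text.strip()
    if not text.startswith(FENCE):
        return text
    # Drop the opening fence line, which may carry a language tag such as "json"
    _, _, rest = text.partition("\n")
    rest = rest.rstrip()
    # Drop the closing fence if there is one
    if rest.endswith(FENCE):
        rest = rest[: -len(FENCE)]
    return rest.strip()


wrapped = f'{FENCE}json\n[{{"subject": "Hi"}}]\n{FENCE}'
parsed = json.loads(strip_code_fence(wrapped))
print(parsed)  # [{'subject': 'Hi'}]
```

Unwrapped JSON passes through untouched, so it is safe to call on every response before json.loads().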

Deployment Considerations

Before shipping this, consider:

Cost: Each transformation costs tokens. Cache aggressively and track spend by platform.
Latency: Email sequences and carousels take longer than tweets. Consider separate timeout thresholds.
Quality gates: Run spot checks on output. Email subjects should be under 60 characters. LinkedIn hashtags should exist.
Rate limits: Anthropic's API has rate limits. Use the retry decorator and respect backoff windows.
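For the cost point, the result dicts already carry what you need: every uncached transformation includes a `tokens_used` count, keyed by platform. A rough sketch of per-platform tracking could look like this. `tokens_by_platform` is a hypothetical helper, and the numbers in `sample` are made up purely to show the shape of the data.

```python
from collections import defaultdict


def tokens_by_platform(batches: list[dict]) -> dict[str, int]:
    """Sum `tokens_used` per platform across many repurpose results.

    Cached results and error results carry no `tokens_used`, so they count as 0.
    """
    totals: dict[str, int] = defaultdict(int)
    for batch in batches:
        for platform, result in batch.items():
            totals[platform] += result.get("tokens_used", 0)
    return dict(totals)


# Illustrative data only: two articles, two platforms each
sample = [
    {"twitter_thread": {"tokens_used": 1200}, "email_sequence": {"tokens_used": 3100}},
    {"twitter_thread": {"cached": True}, "email_sequence": {"tokens_used": 2900}},
]
totals = tokens_by_platform(sample)
```

Feeding it the list returned by process_batch gives a per-platform token total you can multiply by your current per-token price, which also shows at a glance how much the cache is saving.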

Start with one platform, validate quality, then scale to ten. The architecture handles it, but your processes need to catch edge cases your local tests missed.

This pattern—one input, many outputs, smart caching, format validation—works across any content transformation task. Apply it to code documentation, tutorials, social promos, or customer success case studies.

Follow for more practical AI and productivity content.

DEV Community