Olamide Olaniyan
Build a Video-to-Blog Post Converter That Works Across Every Platform

You published a banger 10-minute video last week. Thousands of views. Good engagement. And then it's gone — buried in the algorithm.

Meanwhile, a written version of that same content would rank on Google for years. But who has time to manually transcribe and rewrite every video?

Let's build a Video-to-Blog Converter that:

  1. Grabs the transcript from any video (TikTok, YouTube, Instagram, Twitter/X, Facebook)
  2. Cleans and structures it automatically
  3. Uses AI to rewrite it as an SEO-optimized blog post
  4. Suggests titles, meta descriptions, and internal links

One video → one blog post → years of organic traffic. Let's go.

Why Every Video Should Become a Blog Post

Here's something most creators miss: video and search audiences barely overlap.

Google indexes text, not video. Your YouTube video might get 50K views in the first week and then plateau. But a blog post covering the same topic can compound traffic for years.

The math:

  • A 10-minute video = ~1,500 words of spoken content
  • A 1,500-word blog post = competitive for most long-tail keywords
  • Time to manually transcribe + edit = 2 hours
  • Time with this tool = 30 seconds
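That ~1,500-word figure assumes an average speaking pace of roughly 150 words per minute — a common rule of thumb, not a fixed constant; real pace varies by speaker. As a sanity check:

```python
# Rough estimate of how much written content a video yields,
# assuming an average speaking pace of ~150 words per minute.
WORDS_PER_MINUTE = 150

def estimated_words(video_minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate spoken-word count for a video of the given length."""
    return int(video_minutes * wpm)

print(estimated_words(10))  # a 10-minute video ≈ 1500 words
```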

The Stack

  • Python: Language
  • SociaVault API: Multi-platform transcript endpoints
  • OpenAI: Content transformation
  • python-dotenv: Config management

Step 1: Setup

mkdir video-to-blog
cd video-to-blog
pip install requests openai python-dotenv

Create .env:

SOCIAVAULT_API_KEY=your_key_here
OPENAI_API_KEY=your_openai_key
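One thing worth doing before any API calls: fail fast if a key didn't load. A minimal stdlib-only sketch (`missing_env` is my own helper; the variable names match the `.env` above):

```python
import os

REQUIRED_KEYS = ("SOCIAVAULT_API_KEY", "OPENAI_API_KEY")

def missing_env(required=REQUIRED_KEYS) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

# Call this right after load_dotenv(), before constructing any clients:
# if missing_env():
#     raise SystemExit(f"Missing env vars: {', '.join(missing_env())}")
```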

Step 2: Universal Transcript Fetcher

The beauty here — SociaVault has transcript endpoints for every major platform. One function handles them all.

Create converter.py:

import os
import re
import json
import requests
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

API_BASE = "https://api.sociavault.com"
HEADERS = {"Authorization": f"Bearer {os.getenv('SOCIAVAULT_API_KEY')}"}
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def detect_platform(url: str) -> str:
    """Auto-detect which platform a URL belongs to."""
    url_lower = url.lower()

    if "tiktok.com" in url_lower:
        return "tiktok"
    elif "youtube.com" in url_lower or "youtu.be" in url_lower:
        return "youtube"
    elif "instagram.com" in url_lower:
        return "instagram"
    elif "twitter.com" in url_lower or "x.com" in url_lower:
        return "twitter"
    elif "facebook.com" in url_lower or "fb.watch" in url_lower:
        return "facebook"
    else:
        raise ValueError(f"Unsupported platform: {url}")


def get_transcript(url: str) -> dict:
    """Fetch transcript from any supported platform."""
    platform = detect_platform(url)

    endpoints = {
        "tiktok": "/v1/scrape/tiktok/transcript",
        "youtube": "/v1/scrape/youtube/transcript",
        "instagram": "/v1/scrape/instagram/transcript",
        "twitter": "/v1/scrape/twitter/transcript",
        "facebook": "/v1/scrape/facebook/transcript",
    }

    endpoint = endpoints.get(platform)
    if not endpoint:
        raise ValueError(f"No transcript endpoint for {platform}")

    print(f"📥 Fetching {platform} transcript...")

    response = requests.get(
        f"{API_BASE}{endpoint}",
        params={"url": url},
        headers=HEADERS,
        timeout=30,  # don't hang forever on a slow scrape
    )
    response.raise_for_status()

    data = response.json().get("data", {})

    # Normalize transcript format across platforms
    transcript_text = ""

    if isinstance(data, str):
        transcript_text = data
    elif isinstance(data, dict):
        transcript_text = data.get("transcript", "") or data.get("text", "")

        # Handle timestamped segments
        if not transcript_text and "segments" in data:
            transcript_text = " ".join(
                seg.get("text", "") for seg in data["segments"]
            )
    elif isinstance(data, list):
        transcript_text = " ".join(
            item.get("text", "") if isinstance(item, dict) else str(item)
            for item in data
        )

    word_count = len(transcript_text.split())
    print(f"  ✓ Got {word_count:,} words from {platform}")

    return {
        "platform": platform,
        "url": url,
        "transcript": transcript_text,
        "word_count": word_count,
        "raw_data": data,
    }
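The normalization branch is the fiddliest part of `get_transcript`, so it helps to see the same logic in isolation. A standalone sketch covering the three response shapes (plain string, dict, list of segments):

```python
def normalize_transcript(data) -> str:
    """Flatten the three response shapes into plain text (mirrors get_transcript)."""
    if isinstance(data, str):
        return data
    if isinstance(data, dict):
        text = data.get("transcript", "") or data.get("text", "")
        if not text and "segments" in data:
            text = " ".join(seg.get("text", "") for seg in data["segments"])
        return text
    if isinstance(data, list):
        return " ".join(
            item.get("text", "") if isinstance(item, dict) else str(item)
            for item in data
        )
    return ""

# The same logic covers all three shapes:
assert normalize_transcript("hello world") == "hello world"
assert normalize_transcript({"segments": [{"text": "a"}, {"text": "b"}]}) == "a b"
assert normalize_transcript([{"text": "x"}, "y"]) == "x y"
```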

Step 3: AI Blog Post Generator

Here's where the transcript becomes a polished blog post:

def transcript_to_blog(
    transcript_data: dict,
    target_keyword: str | None = None,
    tone: str = "conversational but authoritative",
    word_count_target: int = 1500,
) -> dict:
    """Convert a video transcript into an SEO-optimized blog post."""

    transcript = transcript_data["transcript"]
    platform = transcript_data["platform"]

    if not transcript.strip():
        raise ValueError("Empty transcript — video might not have speech")

    print(f"\n✍️  Converting {transcript_data['word_count']} words to blog post...")

    keyword_instruction = ""
    if target_keyword:
        keyword_instruction = f"""
        Target SEO keyword: "{target_keyword}"
        - Include it in the title, first paragraph, and 2-3 subheadings
        - Use it naturally 3-5 times in the body
        - Include related long-tail variations
        """

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"""Convert this video transcript into a polished blog post.

TRANSCRIPT (from {platform}):
{transcript[:6000]}

INSTRUCTIONS:
- Write ~{word_count_target} words
- Tone: {tone}
- Structure with H2 and H3 headings
- Add an engaging introduction that hooks the reader
- Break into scannable sections
- Add a clear conclusion with a takeaway
- Write like a human, not AI — use short sentences, contractions, and personality
- Keep the speaker's original insights and examples
- Don't add information that wasn't in the transcript
- Remove verbal tics (um, uh, like, you know, so basically)
{keyword_instruction}

Return JSON:
{{
  "title": "SEO-optimized blog title",
  "meta_description": "155 character meta description",
  "slug": "url-friendly-slug",
  "blog_post": "Full markdown blog post with headings",
  "tags": ["5 relevant tags"],
  "estimated_read_time": "X min read",
  "seo_suggestions": ["3 internal linking opportunities"],
  "social_snippets": {{
    "twitter": "Tweet-length summary",
    "linkedin": "LinkedIn post summary (2-3 sentences)"
  }}
}}"""
        }],
        response_format={"type": "json_object"}
    )

    result = json.loads(completion.choices[0].message.content)

    actual_words = len(result["blog_post"].split())
    print(f"  ✓ Generated {actual_words}-word blog post")
    print(f"  📝 Title: {result['title']}")
    print(f"  🔗 Slug: /{result['slug']}")
    print(f"  ⏱️  {result['estimated_read_time']}")

    return result
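One caveat: the model-generated `slug` gets used as a filename later, and nothing guarantees it's filesystem-safe. A small defensive sketch (`safe_slug` is my own helper, not part of either API):

```python
import re

def safe_slug(slug: str, max_length: int = 80) -> str:
    """Sanitize a model-generated slug before using it as a filename."""
    slug = slug.lower().strip()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)    # collapse anything unsafe to hyphens
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug[:max_length] or "untitled"

print(safe_slug("Build a REST API!"))  # build-a-rest-api
```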

Step 4: Batch Converter for Content Libraries

Got a backlog of videos? Convert them all:

def batch_convert(urls: list, output_dir: str = "blog_posts") -> list:
    """Convert multiple videos to blog posts."""

    os.makedirs(output_dir, exist_ok=True)
    results = []

    print(f"\n🔄 Converting {len(urls)} videos to blog posts...\n")

    for i, url in enumerate(urls, 1):
        print(f"{'=' * 50}")
        print(f"[{i}/{len(urls)}] {url}")

        try:
            transcript = get_transcript(url)

            if transcript["word_count"] < 50:
                print(f"  ⚠️  Skipping — only {transcript['word_count']} words")
                continue

            blog = transcript_to_blog(transcript)

            # Save as markdown file
            filename = f"{blog['slug']}.md"
            filepath = os.path.join(output_dir, filename)

            frontmatter = f"""---
title: "{blog['title']}"
description: "{blog['meta_description']}"
tags: {json.dumps(blog['tags'])}
source_video: "{url}"
source_platform: "{transcript['platform']}"
read_time: "{blog['estimated_read_time']}"
---

"""

            with open(filepath, "w", encoding="utf-8") as f:
                f.write(frontmatter + blog["blog_post"])

            print(f"  💾 Saved: {filepath}")

            results.append({
                "url": url,
                "title": blog["title"],
                "slug": blog["slug"],
                "file": filepath,
                "word_count": len(blog["blog_post"].split()),
            })

        except Exception as e:
            print(f"  ❌ Error: {e}")
            results.append({"url": url, "error": str(e)})

    # Summary
    successful = [r for r in results if "title" in r]
    print(f"\n{'=' * 50}")
    print(f"✅ Converted: {len(successful)}/{len(urls)} videos")

    total_words = sum(r["word_count"] for r in successful)
    print(f"📝 Total content: {total_words:,} words")
    print(f"📁 Output: {output_dir}/")

    return results
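A subtle failure mode in the frontmatter f-strings: a title containing a double quote produces invalid YAML. One way to harden it (`yaml_quote` is my own helper; `json.dumps` escaping is valid for YAML double-quoted scalars, since YAML 1.2 is a superset of JSON):

```python
import json

def yaml_quote(value: str) -> str:
    """Quote a string safely for YAML frontmatter."""
    return json.dumps(value)

# Instead of title: "{blog['title']}", build the line like this:
title = 'How to "Really" Ship Fast'
frontmatter_line = f"title: {yaml_quote(title)}"
print(frontmatter_line)
```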

Step 5: Content Calendar Generator

Plan which videos to convert based on performance:

def suggest_conversion_priority(video_urls_with_views: list) -> list:
    """Rank videos by conversion priority."""

    print("\n📊 Analyzing conversion priority...\n")

    scored = []

    for item in video_urls_with_views:
        url = item["url"]
        views = item.get("views", 0)

        # Higher views = more proven topic
        # Longer videos = more content to work with
        transcript = get_transcript(url)

        word_count = transcript["word_count"]

        # Score: views (topic validation) × words (content depth)
        score = 0
        if word_count >= 500:
            score += 40  # Enough content for a full post
        elif word_count >= 200:
            score += 20
        else:
            score += 5

        if views >= 100000:
            score += 50
        elif views >= 10000:
            score += 35
        elif views >= 1000:
            score += 20
        else:
            score += 10

        scored.append({
            "url": url,
            "views": views,
            "word_count": word_count,
            "score": score,
            "platform": transcript["platform"],
        })

    scored.sort(key=lambda x: x["score"], reverse=True)

    print("Priority ranking:")
    for i, item in enumerate(scored, 1):
        emoji = "🟢" if item["score"] >= 60 else "🟡" if item["score"] >= 30 else "🔴"
        print(f"  {emoji} {i}. [{item['platform']}] Score: {item['score']}")
        print(f"     {item['url']}")
        print(f"     {item['views']:,} views | {item['word_count']} words")
        print()

    return scored
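The scoring thresholds are buried inside the loop; factoring them into a pure helper makes them easy to tune and test. Same numbers as above, just extracted:

```python
def content_score(views: int, word_count: int) -> int:
    """Score a video for conversion priority (same thresholds as the loop above)."""
    # Content depth: enough words for a full post?
    score = 40 if word_count >= 500 else 20 if word_count >= 200 else 5
    # Topic validation: views prove audience demand.
    if views >= 100_000:
        score += 50
    elif views >= 10_000:
        score += 35
    elif views >= 1_000:
        score += 20
    else:
        score += 10
    return score

print(content_score(150_000, 800))  # 90 — viral + long-form tops the list
```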

Step 6: Full Pipeline CLI

def main():
    import sys

    if len(sys.argv) < 2:
        print("Video-to-Blog Converter")
        print()
        print("Usage:")
        print("  python converter.py convert <video_url>")
        print("  python converter.py convert <url> --keyword 'target keyword'")
        print("  python converter.py batch urls.txt")
        print("  python converter.py transcript <url>")
        print()
        print("Supported platforms: TikTok, YouTube, Instagram, Twitter/X, Facebook")
        return

    command = sys.argv[1]

    if command == "transcript":
        url = sys.argv[2]
        result = get_transcript(url)
        print(f"\n--- TRANSCRIPT ({result['word_count']} words) ---\n")
        preview = result["transcript"][:2000]
        print(preview)
        if len(result["transcript"]) > 2000:
            print(f"\n... ({len(result['transcript']) - 2000:,} more characters)")

    elif command == "convert":
        url = sys.argv[2]
        keyword = None
        if "--keyword" in sys.argv:
            idx = sys.argv.index("--keyword")
            keyword = sys.argv[idx + 1]

        transcript = get_transcript(url)
        blog = transcript_to_blog(transcript, target_keyword=keyword)

        # Save it
        filename = f"{blog['slug']}.md"
        frontmatter = f"""---
title: "{blog['title']}"
description: "{blog['meta_description']}"
tags: {json.dumps(blog['tags'])}
source: "{url}"
---

"""
        with open(filename, "w", encoding="utf-8") as f:
            f.write(frontmatter + blog["blog_post"])

        print(f"\n💾 Saved to {filename}")

        print(f"\n🐦 Twitter: {blog['social_snippets']['twitter']}")
        print(f"\n💼 LinkedIn: {blog['social_snippets']['linkedin']}")

    elif command == "batch":
        filepath = sys.argv[2]
        with open(filepath) as f:
            urls = [line.strip() for line in f if line.strip()]
        batch_convert(urls)

    else:
        print(f"Unknown command: {command}")


if __name__ == "__main__":
    main()

Running It

# Convert a single YouTube video
python converter.py convert "https://youtube.com/watch?v=abc123"

# Convert with SEO keyword targeting
python converter.py convert "https://youtube.com/watch?v=abc123" --keyword "python web scraping"

# Just grab the transcript
python converter.py transcript "https://tiktok.com/@user/video/123"

# Batch convert from a file of URLs
python converter.py batch my_videos.txt

Sample Output

From a 12-minute YouTube tutorial:

📥 Fetching youtube transcript...
  ✓ Got 1,847 words from youtube

✍️  Converting 1,847 words to blog post...
  ✓ Generated 1,623-word blog post
  📝 Title: How to Build a REST API with Express.js in 2025
  🔗 Slug: /build-rest-api-express-2025
  ⏱️  7 min read

💾 Saved to build-rest-api-express-2025.md

🐦 Twitter: Just converted my Express.js tutorial into a blog
   post automatically. 1,800 words of video → 1,600 words of
   SEO-optimized content in 30 seconds.

💼 LinkedIn: Published a comprehensive guide on building REST
   APIs with Express.js. Originally a video tutorial, now
   optimized for search with structured headings and examples.

The Content Multiplication Strategy

Here's the real play:

  1. Record one video with your expertise
  2. Convert to blog post (this tool)
  3. Extract tweet threads from key sections
  4. Pull LinkedIn posts from the social snippets
  5. Create email newsletter from the summary

One video → 5 pieces of content. Every week.

The creators winning right now aren't creating 5x more content. They're repurposing 5x more effectively.
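Step 3 of the strategy (tweet threads) can be bootstrapped from the markdown itself: split on H2 headings and trim each section. A naive sketch of my own — the real thread still deserves a human editing pass:

```python
import re

def extract_thread(markdown_post: str, max_tweets: int = 8) -> list:
    """Naive thread builder: one tweet per H2 section, snippet trimmed to
    ~200 chars so heading + text stays comfortably under Twitter's limit."""
    sections = re.split(r"^## +", markdown_post, flags=re.MULTILINE)[1:]
    tweets = []
    for i, section in enumerate(sections[:max_tweets], 1):
        heading, _, body = section.partition("\n")
        snippet = " ".join(body.split())[:200]
        tweets.append(f"{i}/ {heading.strip()}: {snippet}")
    return tweets

post = "Intro\n\n## First point\nDetails here.\n\n## Second point\nMore details."
print(extract_thread(post))
```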

Cost Comparison

  Method                  Cost            Time
  Manual transcription    Free            2-3 hours
  Rev.com transcription   $1.50/min       12-hour turnaround
  Descript                $24/mo          30 min editing
  This tool               ~$0.02/video    30 seconds

The SociaVault transcript costs 1 credit per video. OpenAI rewrite costs about $0.01. Total: pennies per blog post.
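At those prices, the arithmetic for a whole backlog is almost embarrassing (treating the ~$0.01 OpenAI figure as the only dollar cost, which is an approximation):

```python
# Back-of-envelope dollar cost for converting a backlog,
# using the ~$0.01-per-rewrite estimate above.
OPENAI_COST_PER_POST = 0.01

def rewrite_cost(num_videos: int, cost_per_post: float = OPENAI_COST_PER_POST) -> float:
    return round(num_videos * cost_per_post, 2)

print(rewrite_cost(100))  # a 100-video backlog ≈ $1.00 in OpenAI spend
```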

Get Started

  1. Get your API key at sociavault.com
  2. Pick your best-performing video
  3. Convert it and publish — it'll probably rank within a month

Your best content is already recorded. It's just stuck in video format where Google can't find it.


Every video you've ever made is a blog post waiting to happen. Stop leaving SEO traffic on the table.

#python #contentcreation #seo #webdev
