Olamide Olaniyan

Building an AI Content Repurposer: Turn 1 Video into 10 Pieces (Full Tutorial)

Creating content is exhausting.

You spend hours making a YouTube video, then realize you need:

  • A blog post
  • 5 Twitter threads
  • 3 LinkedIn posts
  • Instagram captions
  • TikTok scripts
  • Newsletter content

That's another 10+ hours of work.

Or... you could automate it.

I built a system that takes any video URL and generates all these formats automatically. Today I'll show you exactly how.

What We're Building

An AI-powered content repurposer that:

  1. Extracts the transcript from any video (YouTube, TikTok, Instagram)
  2. Analyzes the content structure and key points
  3. Generates platform-optimized content for 10+ formats
  4. Outputs ready-to-post content with proper formatting

One input β†’ Ten outputs. Let's build it.

The Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Video URL Input                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Transcript Extraction Layer                 β”‚
β”‚                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                β”‚
β”‚  β”‚ YouTube  β”‚ β”‚  TikTok  β”‚ β”‚Instagram β”‚                β”‚
β”‚  β”‚   API    β”‚ β”‚   API    β”‚ β”‚   API    β”‚                β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Content Analysis (GPT-4)                    β”‚
β”‚                                                          β”‚
β”‚  β€’ Key points extraction                                β”‚
β”‚  β€’ Topic identification                                 β”‚
β”‚  β€’ Audience analysis                                    β”‚
β”‚  β€’ Tone detection                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          Platform-Specific Generation (GPT-4)           β”‚
β”‚                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Blog   β”‚ β”‚ Twitter β”‚ β”‚LinkedIn β”‚ β”‚   Email    β”‚   β”‚
β”‚  β”‚  Post   β”‚ β”‚ Thread  β”‚ β”‚  Posts  β”‚ β”‚ Newsletter β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
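
Before we write any code, here's the whole pipeline as three calls. Each class is built in the steps below:

# High-level flow (each class is implemented in Steps 2-4)
from transcript import TranscriptExtractor
from analyzer import ContentAnalyzer
from generator import ContentGenerator

url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

video_data = TranscriptExtractor().extract(url)     # Step 2: get the transcript
analyzed = ContentAnalyzer().analyze(video_data)    # Step 3: extract key points
pieces = ContentGenerator().generate_all(analyzed)  # Step 4: one piece per format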

Step 1: Project Setup

mkdir content-repurposer && cd content-repurposer
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install requests openai python-dotenv tiktoken

Create .env:

SOCIAVAULT_API_KEY=your_sociavault_key
OPENAI_API_KEY=your_openai_key
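
Optional but worth it: a quick sanity check that both keys actually load before you burn API calls. A minimal sketch:

# check_env.py -- fail fast if a key is missing
import os
from dotenv import load_dotenv

load_dotenv()

for key in ("SOCIAVAULT_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise SystemExit(f"Missing {key} in .env")
print("All keys loaded")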

Step 2: Transcript Extraction

First, we need to get the video transcript from any supported platform:

# transcript.py
import os
import re
import requests
from typing import Dict
from urllib.parse import urlparse
from dotenv import load_dotenv

load_dotenv()

class TranscriptExtractor:
    def __init__(self):
        self.api_key = os.getenv("SOCIAVAULT_API_KEY")
        self.base_url = "https://api.sociavault.com/v1"
        self.headers = {"Authorization": f"Bearer {self.api_key}"}

    def extract(self, url: str) -> Dict:
        """Extract transcript from any supported video URL."""

        platform = self._detect_platform(url)

        if platform == "youtube":
            return self._extract_youtube(url)
        elif platform == "tiktok":
            return self._extract_tiktok(url)
        elif platform == "instagram":
            return self._extract_instagram(url)
        else:
            raise ValueError(f"Unsupported platform: {platform}")

    def _detect_platform(self, url: str) -> str:
        """Detect which platform the URL is from."""

        domain = urlparse(url).netloc.lower()

        if "youtube.com" in domain or "youtu.be" in domain:
            return "youtube"
        elif "tiktok.com" in domain:
            return "tiktok"
        elif "instagram.com" in domain:
            return "instagram"
        else:
            return "unknown"

    def _extract_youtube(self, url: str) -> Dict:
        """Extract transcript from YouTube video."""

        # Extract video ID
        video_id = self._extract_youtube_id(url)

        response = requests.get(
            f"{self.base_url}/scrape/youtube/transcript",
            params={"videoId": video_id},
            headers=self.headers,
            timeout=60
        )
        response.raise_for_status()

        data = response.json().get("data", {})

        # Also get video metadata
        meta_response = requests.get(
            f"{self.base_url}/scrape/youtube/video",
            params={"videoId": video_id},
            headers=self.headers,
            timeout=30
        )
        meta_response.raise_for_status()
        meta = meta_response.json().get("data", {})

        return {
            "platform": "youtube",
            "title": meta.get("title", ""),
            "description": meta.get("description", ""),
            "transcript": data.get("transcript", ""),
            "duration": meta.get("duration", 0),
            "views": meta.get("viewCount", 0),
            "author": meta.get("channelTitle", ""),
            "url": url
        }

    def _extract_tiktok(self, url: str) -> Dict:
        """Extract transcript from TikTok video."""

        response = requests.get(
            f"{self.base_url}/scrape/tiktok/transcript",
            params={"url": url},
            headers=self.headers,
            timeout=60
        )
        response.raise_for_status()

        data = response.json().get("data", {})

        return {
            "platform": "tiktok",
            "title": data.get("description", "")[:100],
            "description": data.get("description", ""),
            "transcript": data.get("transcript", ""),
            "duration": data.get("duration", 0),
            "views": data.get("playCount", 0),
            "author": data.get("author", {}).get("uniqueId", ""),
            "url": url
        }

    def _extract_instagram(self, url: str) -> Dict:
        """Extract transcript from Instagram Reel."""

        response = requests.get(
            f"{self.base_url}/scrape/instagram/transcript",
            params={"url": url},
            headers=self.headers,
            timeout=60
        )
        response.raise_for_status()

        data = response.json().get("data", {})

        return {
            "platform": "instagram",
            "title": data.get("caption", "")[:100],
            "description": data.get("caption", ""),
            "transcript": data.get("transcript", ""),
            "duration": data.get("duration", 0),
            "views": data.get("playCount", 0),
            "author": data.get("ownerUsername", ""),
            "url": url
        }

    def _extract_youtube_id(self, url: str) -> str:
        """Extract video ID from YouTube URL."""

        patterns = [
            r"(?:v=|\/)([0-9A-Za-z_-]{11}).*",
            r"(?:embed\/)([0-9A-Za-z_-]{11})",
            r"(?:youtu\.be\/)([0-9A-Za-z_-]{11})"
        ]

        for pattern in patterns:
            match = re.search(pattern, url)
            if match:
                return match.group(1)

        raise ValueError("Could not extract YouTube video ID")
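
To smoke-test the extractor on its own before wiring up the rest (any supported URL works; fields like views may be empty depending on the video):

# try_transcript.py
from transcript import TranscriptExtractor

extractor = TranscriptExtractor()
video = extractor.extract("https://www.youtube.com/watch?v=dQw4w9WgXcQ")

print(video["platform"], "|", video["title"])
print(f"{len(video['transcript'])} characters of transcript")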

Step 3: Content Analyzer

Now let's analyze the transcript to understand its structure:

# analyzer.py
import os
import json
from openai import OpenAI
from typing import Dict
from dotenv import load_dotenv

load_dotenv()

class ContentAnalyzer:
    def __init__(self):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def analyze(self, video_data: Dict) -> Dict:
        """Analyze video content and extract key information."""

        transcript = video_data.get("transcript", "")
        title = video_data.get("title", "")

        if not transcript:
            raise ValueError("No transcript available")

        prompt = f"""Analyze this video transcript and extract the following:

TITLE: {title}

TRANSCRIPT:
{transcript[:8000]}  # Limit to avoid token limits

Please provide:

1. MAIN_TOPIC: What is this video primarily about? (1 sentence)

2. KEY_POINTS: List the 5-7 main takeaways (bullet points)

3. TARGET_AUDIENCE: Who would find this valuable? (1-2 sentences)

4. TONE: Is this educational, entertaining, inspirational, controversial, or mixed?

5. HOOK: What's the most compelling/attention-grabbing element?

6. QUOTABLE_MOMENTS: 3-5 short quotes that stand alone well

7. ACTIONABLE_TIPS: Any specific, actionable advice given

8. STORY_ELEMENTS: Any personal stories or examples used

Return your response as valid JSON using exactly these keys: MAIN_TOPIC, KEY_POINTS, TARGET_AUDIENCE, TONE, HOOK, QUOTABLE_MOMENTS, ACTIONABLE_TIPS, STORY_ELEMENTS."""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are a content analyst. Respond only with valid JSON."},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"},
            temperature=0.3
        )

        analysis = json.loads(response.choices[0].message.content)

        return {
            **video_data,
            "analysis": analysis
        }
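
The generator in the next step reads specific keys from this analysis dict, so it helps to know the shape. The values below are made up purely for illustration; the real ones come from GPT-4:

# Illustrative only -- actual values are produced by the model
example_analysis = {
    "MAIN_TOPIC": "Strategies for growing an audience with short-form video",
    "KEY_POINTS": ["Hook viewers in the first 3 seconds", "Post consistently"],
    "TARGET_AUDIENCE": "Creators who want to repurpose long-form content",
    "TONE": "educational",
    "HOOK": "You're wasting 90% of every video you make",
    "QUOTABLE_MOMENTS": ["Distribution beats production"],
    "ACTIONABLE_TIPS": ["Batch-record once a week"],
    "STORY_ELEMENTS": ["A creator's journey from 0 to 10k subscribers"],
}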

Step 4: Multi-Platform Generator

The main engine that generates content for each platform:

# generator.py
import os
from openai import OpenAI
from typing import Dict, List
from dataclasses import dataclass
from dotenv import load_dotenv

load_dotenv()

@dataclass
class GeneratedContent:
    platform: str
    format_type: str
    content: str
    character_count: int
    ready_to_post: bool

class ContentGenerator:
    def __init__(self):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def generate_all(self, analyzed_data: Dict) -> List[GeneratedContent]:
        """Generate content for all platforms."""

        results = []

        # Blog post
        results.append(self._generate_blog_post(analyzed_data))

        # Twitter/X thread
        results.extend(self._generate_twitter_thread(analyzed_data))

        # LinkedIn posts (3 variations)
        results.extend(self._generate_linkedin_posts(analyzed_data))

        # Instagram caption
        results.append(self._generate_instagram_caption(analyzed_data))

        # Email newsletter
        results.append(self._generate_newsletter(analyzed_data))

        # TikTok script (for repurposing to your own TikTok)
        results.append(self._generate_tiktok_script(analyzed_data))

        return results

    def _generate_blog_post(self, data: Dict) -> GeneratedContent:
        """Generate SEO-optimized blog post."""

        analysis = data.get("analysis", {})
        transcript = data.get("transcript", "")

        prompt = f"""Transform this video content into an SEO-optimized blog post.

VIDEO ANALYSIS:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Target Audience: {analysis.get('TARGET_AUDIENCE', '')}

TRANSCRIPT:
{transcript[:6000]}

Requirements:
1. Write a compelling headline (60 characters max)
2. Include an intro that hooks the reader
3. Use H2 and H3 subheadings
4. Include the key points as scannable sections
5. Add a conclusion with call-to-action
6. Target 1,200-1,500 words
7. Write in a conversational, engaging tone
8. Include a meta description (155 characters)

Format:
---
META_DESCRIPTION: [description]
---

# [HEADLINE]

[CONTENT]"""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are an expert blog writer who creates engaging, SEO-friendly content."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=3000
        )

        content = response.choices[0].message.content

        return GeneratedContent(
            platform="blog",
            format_type="long_form_article",
            content=content,
            character_count=len(content),
            ready_to_post=True
        )

    def _generate_twitter_thread(self, data: Dict) -> List[GeneratedContent]:
        """Generate Twitter/X thread."""

        analysis = data.get("analysis", {})

        prompt = f"""Create a viral Twitter thread from this content.

CONTENT SUMMARY:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Quotable Moments: {analysis.get('QUOTABLE_MOMENTS', [])}
- Hook: {analysis.get('HOOK', '')}

Requirements:
1. Tweet 1: Strong hook that makes people want to read (no "Thread:" prefix)
2. Tweets 2-8: Key insights, one per tweet
3. Final tweet: Call-to-action and engagement prompt
4. Each tweet must be under 280 characters
5. Use line breaks for readability
6. Include 1-2 relevant emojis per tweet (not excessive)
7. End some tweets with incomplete thoughts to encourage reading next

Format each tweet on a new line, separated by ---"""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are a Twitter expert who creates viral threads."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.8,
            max_tokens=2000
        )

        content = response.choices[0].message.content
        tweets = [t.strip() for t in content.split("---") if t.strip()]

        return [
            GeneratedContent(
                platform="twitter",
                format_type=f"thread_tweet_{i+1}",
                content=tweet,
                character_count=len(tweet),
                ready_to_post=len(tweet) <= 280
            )
            for i, tweet in enumerate(tweets)
        ]

    def _generate_linkedin_posts(self, data: Dict) -> List[GeneratedContent]:
        """Generate 3 LinkedIn post variations."""

        analysis = data.get("analysis", {})

        variations = [
            ("story", "Tell this as a personal story/lesson learned"),
            ("listicle", "Present as a numbered list of insights"),
            ("hot_take", "Frame as a bold/contrarian perspective")
        ]

        results = []

        for var_type, instruction in variations:
            prompt = f"""Create a LinkedIn post from this content.

CONTENT:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Stories: {analysis.get('STORY_ELEMENTS', [])}

STYLE: {instruction}

Requirements:
1. Strong first line (this appears before "see more")
2. 1,200-1,500 characters total
3. Use line breaks every 1-2 sentences
4. Include a question to drive comments
5. Add 3-5 relevant hashtags at the end
6. Professional but personable tone
7. No emojis in first line, max 2-3 total"""

            response = self.client.chat.completions.create(
                model="gpt-4-turbo-preview",
                messages=[
                    {"role": "system", "content": "You are a LinkedIn content expert."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.8,
                max_tokens=1000
            )

            content = response.choices[0].message.content

            results.append(GeneratedContent(
                platform="linkedin",
                format_type=f"post_{var_type}",
                content=content,
                character_count=len(content),
                ready_to_post=len(content) <= 3000
            ))

        return results

    def _generate_instagram_caption(self, data: Dict) -> GeneratedContent:
        """Generate Instagram caption."""

        analysis = data.get("analysis", {})

        prompt = f"""Create an Instagram caption from this content.

CONTENT:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Hook: {analysis.get('HOOK', '')}

Requirements:
1. Hook in first line (shows before "more")
2. 150-300 words total
3. Mix of value and personality
4. Call-to-action (save, share, comment)
5. 20-30 relevant hashtags in a separate block
6. Use emojis naturally (5-10 total)
7. Instagram's casual, friendly tone"""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are an Instagram content creator."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.8,
            max_tokens=800
        )

        content = response.choices[0].message.content

        return GeneratedContent(
            platform="instagram",
            format_type="caption",
            content=content,
            character_count=len(content),
            ready_to_post=len(content) <= 2200
        )

    def _generate_newsletter(self, data: Dict) -> GeneratedContent:
        """Generate email newsletter section."""

        analysis = data.get("analysis", {})
        transcript = data.get("transcript", "")

        prompt = f"""Create an email newsletter section from this content.

CONTENT:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Actionable Tips: {analysis.get('ACTIONABLE_TIPS', [])}

Requirements:
1. Compelling subject line options (3 variations)
2. Preview text (40-90 characters)
3. Intro that creates curiosity
4. 3-5 key takeaways with brief explanations
5. "One thing to try this week" section
6. Casual, friendly tone (like writing to a friend)
7. 400-600 words total"""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are an email marketing expert."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=1200
        )

        content = response.choices[0].message.content

        return GeneratedContent(
            platform="email",
            format_type="newsletter_section",
            content=content,
            character_count=len(content),
            ready_to_post=True
        )

    def _generate_tiktok_script(self, data: Dict) -> GeneratedContent:
        """Generate TikTok script for repurposing."""

        analysis = data.get("analysis", {})

        prompt = f"""Create a TikTok script based on this content.

CONTENT:
- Topic: {analysis.get('MAIN_TOPIC', '')}
- Key Points: {analysis.get('KEY_POINTS', [])}
- Hook: {analysis.get('HOOK', '')}

Requirements:
1. 30-60 second script
2. Hook in first 3 seconds
3. Fast-paced delivery
4. One clear takeaway
5. Call-to-action at end
6. Include visual/editing suggestions in [brackets]
7. Conversational, energetic tone

Format:
HOOK (0-3 sec): [script]
BODY (3-50 sec): [script with visual notes]
CTA (50-60 sec): [script]"""

        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "You are a TikTok content creator expert."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.8,
            max_tokens=800
        )

        content = response.choices[0].message.content

        return GeneratedContent(
            platform="tiktok",
            format_type="script",
            content=content,
            character_count=len(content),
            ready_to_post=True
        )
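
Tweets that exceed 280 characters come back flagged with ready_to_post=False. One simple fix is to ask the model to shorten just the offending tweets. A minimal sketch of that retry (not part of the class above):

# shorten_tweets.py -- retry tweets over the 280-char limit
from generator import ContentGenerator, GeneratedContent

def shorten_if_needed(gen: ContentGenerator, item: GeneratedContent) -> GeneratedContent:
    """Rewrite an over-length tweet; pass everything else through unchanged."""
    if item.platform != "twitter" or item.ready_to_post:
        return item

    response = gen.client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{
            "role": "user",
            "content": f"Rewrite this tweet in under 280 characters, keeping the meaning:\n\n{item.content}"
        }],
        temperature=0.5
    )
    text = response.choices[0].message.content.strip()

    return GeneratedContent(
        platform=item.platform,
        format_type=item.format_type,
        content=text,
        character_count=len(text),
        ready_to_post=len(text) <= 280
    )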

Step 5: Main Runner

Putting it all together:

#!/usr/bin/env python3
# main.py

import json
import os
from datetime import datetime
from transcript import TranscriptExtractor
from analyzer import ContentAnalyzer
from generator import ContentGenerator, GeneratedContent
from typing import List

def repurpose_video(url: str, output_dir: str = "output") -> List[GeneratedContent]:
    """Main function to repurpose a video into multiple content pieces."""

    print(f"\n{'='*60}")
    print(f"Content Repurposer")
    print(f"{'='*60}")
    print(f"\nProcessing: {url}")

    # Step 1: Extract transcript
    print("\nπŸ“ Extracting transcript...")
    extractor = TranscriptExtractor()
    video_data = extractor.extract(url)
    print(f"   βœ“ Platform: {video_data['platform']}")
    print(f"   βœ“ Title: {video_data['title'][:50]}...")
    print(f"   βœ“ Transcript length: {len(video_data['transcript'])} chars")

    # Step 2: Analyze content
    print("\nπŸ” Analyzing content...")
    analyzer = ContentAnalyzer()
    analyzed_data = analyzer.analyze(video_data)
    analysis = analyzed_data.get('analysis', {})
    print(f"   βœ“ Topic: {analysis.get('MAIN_TOPIC', 'Unknown')[:50]}...")
    print(f"   βœ“ Key points: {len(analysis.get('KEY_POINTS', []))} found")

    # Step 3: Generate content
    print("\n✨ Generating content for all platforms...")
    generator = ContentGenerator()
    results = generator.generate_all(analyzed_data)

    # Step 4: Save outputs
    print("\nπŸ’Ύ Saving outputs...")
    os.makedirs(output_dir, exist_ok=True)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    for content in results:
        filename = f"{content.platform}_{content.format_type}_{timestamp}.txt"
        filepath = os.path.join(output_dir, filename)

        with open(filepath, "w", encoding="utf-8") as f:
            f.write(f"Platform: {content.platform}\n")
            f.write(f"Format: {content.format_type}\n")
            f.write(f"Characters: {content.character_count}\n")
            f.write(f"Ready to post: {content.ready_to_post}\n")
            f.write(f"\n{'='*40}\n\n")
            f.write(content.content)

        status = "βœ“" if content.ready_to_post else "⚠"
        print(f"   {status} {content.platform}/{content.format_type}: {content.character_count} chars")

    # Save summary
    summary = {
        "source_url": url,
        "source_platform": video_data["platform"],
        "source_title": video_data["title"],
        "processed_at": timestamp,
        "outputs": [
            {
                "platform": c.platform,
                "format": c.format_type,
                "characters": c.character_count,
                "ready": c.ready_to_post
            }
            for c in results
        ]
    }

    with open(os.path.join(output_dir, f"summary_{timestamp}.json"), "w") as f:
        json.dump(summary, f, indent=2)

    print(f"\n{'='*60}")
    print(f"Done! Generated {len(results)} content pieces")
    print(f"Output saved to: {output_dir}/")
    print(f"{'='*60}\n")

    return results


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: python main.py <video_url>")
        print("\nSupported platforms:")
        print("  - YouTube (youtube.com, youtu.be)")
        print("  - TikTok (tiktok.com)")
        print("  - Instagram Reels (instagram.com)")
        sys.exit(1)

    url = sys.argv[1]
    repurpose_video(url)

Usage

# Repurpose a YouTube video
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Repurpose a TikTok
python main.py "https://www.tiktok.com/@user/video/123456789"

# Repurpose an Instagram Reel
python main.py "https://www.instagram.com/reel/ABC123/"

Output structure:

output/
β”œβ”€β”€ blog_long_form_article_20260201_143022.txt
β”œβ”€β”€ twitter_thread_tweet_1_20260201_143022.txt
β”œβ”€β”€ twitter_thread_tweet_2_20260201_143022.txt
β”œβ”€β”€ ...
β”œβ”€β”€ linkedin_post_story_20260201_143022.txt
β”œβ”€β”€ linkedin_post_listicle_20260201_143022.txt
β”œβ”€β”€ linkedin_post_hot_take_20260201_143022.txt
β”œβ”€β”€ instagram_caption_20260201_143022.txt
β”œβ”€β”€ email_newsletter_section_20260201_143022.txt
β”œβ”€β”€ tiktok_script_20260201_143022.txt
└── summary_20260201_143022.json

Cost Breakdown

Let's be transparent about costs:

Per video repurposed:

Service                      Usage         Cost
SociaVault transcript API    1 request     ~$0.01
GPT-4 Turbo (analysis)       ~2K tokens    ~$0.02
GPT-4 Turbo (generation)     ~10K tokens   ~$0.10
Total                                      ~$0.13

Monthly costs (if you repurpose 100 videos):

  • API costs: ~$13
  • Time saved: 50+ hours

Compare that to hiring a content writer or doing it manually.
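
If you want to sanity-check the math for your own volume (using the approximate figures above):

# Rough cost estimate from the per-video figures above
COST_PER_VIDEO = 0.01 + 0.02 + 0.10  # transcript + analysis + generation

def monthly_cost(videos_per_month: int) -> float:
    return videos_per_month * COST_PER_VIDEO

print(f"${monthly_cost(100):.2f}")  # -> $13.00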

Advanced: Batch Processing

For repurposing multiple videos:

# batch.py
import csv
from main import repurpose_video
import time

def batch_repurpose(csv_file: str):
    """Repurpose multiple videos from a CSV file."""

    with open(csv_file, "r") as f:
        reader = csv.DictReader(f)

        for i, row in enumerate(reader):
            url = row.get("url")

            if not url:
                continue

            print(f"\n[{i+1}] Processing: {url}")

            try:
                repurpose_video(url, output_dir=f"output/batch_{i+1}")
            except Exception as e:
                print(f"Error processing {url}: {e}")

            # Rate limiting
            time.sleep(5)

if __name__ == "__main__":
    batch_repurpose("videos.csv")
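
The script expects a CSV with a url column, for example:

url
https://www.youtube.com/watch?v=dQw4w9WgXcQ
https://www.tiktok.com/@user/video/123456789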

Quality Tips

The AI output is good, but here's how to make it great:

  1. Always review before posting - AI can miss nuance
  2. Add your personal touch - Include specific experiences
  3. Check platform-specific rules - Character limits, hashtag best practices
  4. Maintain your voice - Edit to match your style
  5. Test and iterate - See what performs, adjust prompts

What's Next?

You could extend this to:

  • Auto-post to platforms via their APIs
  • A/B test different variations
  • Track performance and feed back to improve prompts
  • Add image generation for social posts
  • Queue content in a scheduler

Want the transcript extraction without building it yourself?

SociaVault provides transcript APIs for YouTube, TikTok, and Instagram. Pay-as-you-go, no minimums.
