DEV Community

Trần Quang Đạt


Building a YouTube-to-Podcast Pipeline with yt-dlp, ffmpeg, and Backblaze B2

YouTube has an enormous amount of great audio content — earnings calls, university lectures, audiobooks, speeches — but none of it is available as a podcast. You can't subscribe to a YouTube channel in Apple Podcasts. There's no RSS feed. If you miss a video, you miss it.

I wanted to fix that. So I built Castify, a system that turns YouTube channels and playlists into real podcast RSS feeds. You add a URL, and it handles the rest: scanning for new videos, extracting audio, uploading to cloud storage, and generating an RSS feed that works in any podcast app.

Here's how the pipeline works.

Architecture

The system has three components:

┌─────────────────┐       ┌──────────────┐       ┌──────────────┐
│  Desktop App    │──API──▶│  Go Server   │──RSS──▶│  Podcast App │
│  (Tauri + Rust) │       │  (chi + GORM)│       │  (any client) │
│                 │──PUT──▶│              │       └──────────────┘
│  yt-dlp, ffmpeg │       │  MySQL, B2   │
└─────────────────┘       └──────────────┘
  • Desktop app (Tauri v2 + Rust): runs yt-dlp and ffmpeg as sidecar binaries, handles the scan → download → upload pipeline locally.
  • Go server (chi router, GORM, MySQL): REST API for feed/episode management, generates RSS XML, serves audio via Backblaze B2 presigned URLs.
  • Backblaze B2: cheap object storage for the audio files (~$0.005/GB/month).

The key insight: audio extraction happens on the user's machine, not the server. This keeps server costs near zero and avoids running yt-dlp in the cloud.

Step 1: Scanning for New Videos

When a user adds a YouTube channel, the app calls yt-dlp in --flat-playlist mode to quickly fetch video metadata without downloading anything:

let mut args = vec![
    "--ignore-errors",
    "--flat-playlist",
    "--dump-json",
    "--playlist-end", "30",
];
args.push(channel_url);

let output = Command::new("yt-dlp").args(&args).output().await?;

Each line of stdout is a JSON object with the video ID, title, upload date, duration, and view count. We parse these into playlist entries and diff against existing episodes in the database to find new ones.
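To make that diff concrete, here's a sketch in Go of parsing the newline-delimited JSON and keeping only unseen videos. (The app itself does this in Rust; `FlatEntry` and `newEntries` are illustrative names, and the JSON keys shown are the usual yt-dlp field names.)

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// FlatEntry holds a subset of the fields yt-dlp emits per line in
// --flat-playlist --dump-json mode.
type FlatEntry struct {
	ID       string  `json:"id"`
	Title    string  `json:"title"`
	Duration float64 `json:"duration"`
}

// newEntries parses newline-delimited JSON from yt-dlp's stdout and
// keeps only videos not already registered as episodes.
func newEntries(stdout string, known map[string]bool) []FlatEntry {
	var fresh []FlatEntry
	sc := bufio.NewScanner(strings.NewReader(stdout))
	for sc.Scan() {
		var e FlatEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // skip malformed lines (--ignore-errors can produce them)
		}
		if !known[e.ID] {
			fresh = append(fresh, e)
		}
	}
	return fresh
}

func main() {
	out := `{"id":"abc123","title":"Q3 Earnings Call","duration":3600}
{"id":"def456","title":"Lecture 1","duration":5400}`
	known := map[string]bool{"abc123": true} // already in the database
	fmt.Println(newEntries(out, known))     // only the unseen entry survives
}
```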

For YouTube specifically, yt-dlp returns results newest-first. We also filter out Shorts, live streams, and non-public videos:

entries.retain(|entry| {
    !entry.url.contains("/shorts/")
        && entry.live_status.as_deref() != Some("is_live")
        && entry.availability.as_deref().map(|a| a == "public").unwrap_or(true)
});

New entries get registered as episodes on the server (status: pending) and pushed into a priority download queue.

Step 2: Downloading and Extracting Audio

Download workers pull jobs from a priority channel (urgent → high → normal) with bounded concurrency:

let max_concurrent = num_cpus::get().clamp(2, 4);
let semaphore = Arc::new(Semaphore::new(max_concurrent));

loop {
    let job = tokio::select! {
        biased;
        Some(job) = urgent_rx.recv() => job,
        Some(job) = high_rx.recv() => job,
        Some(job) = normal_rx.recv() => job,
    };

    let permit = semaphore.acquire_owned().await?;
    tokio::spawn(async move {
        let _permit = permit;
        process_download(&state, job).await;
    });
}

The actual download uses yt-dlp to extract audio with ffmpeg post-processing. yt-dlp downloads the best available audio stream and converts it to M4A:

yt-dlp --no-playlist --retries 3 \
       -x --audio-format m4a --audio-quality 0 \
       -o "%(id)s.tmp.%(ext)s" \
       "https://www.youtube.com/watch?v=VIDEO_ID"

Then we re-encode with ffmpeg to normalize the output — mono, 48kbps AAC. This cuts file sizes dramatically (about 22MB per hour of audio) while keeping quality good enough for speech content:

let ffmpeg_args = vec![
    "-i", &input_path,
    "-vn",              // drop video stream
    "-map", "0:a:0",    // first audio stream only
    "-ac", "1",         // mono
    "-b:a", "48k",      // 48kbps bitrate
    "-f", "mov",
    "-c:a", "aac",      // AAC codec
    "-y",               // overwrite
    &output_path,
];
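The arithmetic behind that file-size figure is easy to verify. A small Go sketch (`bytesPerHour` is an illustrative helper; the ~$0.005/GB/month B2 price is the one quoted above):

```go
package main

import "fmt"

// bytesPerHour converts an audio bitrate (bits per second) into bytes
// per hour of audio, ignoring container overhead.
func bytesPerHour(bitrateBits int) int {
	return bitrateBits / 8 * 3600
}

func main() {
	b := bytesPerHour(48_000) // 48kbps mono
	fmt.Printf("48kbps audio: %.1f MB per hour\n", float64(b)/1e6)

	// Monthly B2 storage cost for a 30-episode, one-hour-each feed.
	cost := float64(30*b) / 1e9 * 0.005 // ~$0.005/GB/month
	fmt.Printf("30-episode feed: ~$%.4f/month\n", cost)
}
```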

Handling YouTube's Anti-Bot Measures

YouTube has gotten aggressive about blocking automated downloads. Our pipeline tries multiple strategies:

  1. Default client: plain yt-dlp, works for most public videos
  2. Web client: --extractor-args youtube:player_client=web
  3. Android client: --extractor-args youtube:player_client=android
  4. Cookie fallback: --cookies-from-browser chrome for auth-gated content

If one fails, the next one is tried. We also bundle Deno as a sidecar because yt-dlp uses it to solve YouTube's JavaScript challenges.
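The fallback chain amounts to looping over strategy-specific argument lists until one download succeeds. A Go sketch of the idea (the app itself is Rust; `argsFor` and `downloadWithFallback` are illustrative names):

```go
package main

import (
	"fmt"
	"os/exec"
)

// clientStrategies lists the fallback chain in order; an empty slice
// means plain yt-dlp with no extra flags.
var clientStrategies = [][]string{
	{}, // default client
	{"--extractor-args", "youtube:player_client=web"},
	{"--extractor-args", "youtube:player_client=android"},
	{"--cookies-from-browser", "chrome"},
}

// argsFor builds the full yt-dlp argument list for one strategy.
func argsFor(extra, baseArgs []string, videoURL string) []string {
	args := append(append([]string{}, baseArgs...), extra...)
	return append(args, videoURL)
}

// downloadWithFallback tries each strategy until one succeeds.
func downloadWithFallback(videoURL string, baseArgs []string) error {
	var lastErr error
	for _, extra := range clientStrategies {
		cmd := exec.Command("yt-dlp", argsFor(extra, baseArgs, videoURL)...)
		if lastErr = cmd.Run(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("all strategies failed: %w", lastErr)
}

func main() {
	// Show the argument list the android-client strategy would produce.
	fmt.Println(argsFor(clientStrategies[2],
		[]string{"-x", "--audio-format", "m4a"},
		"https://www.youtube.com/watch?v=VIDEO_ID"))
}
```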

Step 3: Uploading to Backblaze B2

Once the audio file is ready, the upload worker requests a presigned upload URL from the server, then uploads directly to B2:

pub async fn upload_to_b2(
    file_path: &Path,
    upload_url: &str,
    auth_token: &str,
    file_name: &str,
) -> Result<(), AppError> {
    let data = tokio::fs::read(file_path).await?;
    let sha1_hex = hex_sha1(&data);

    reqwest::Client::new()
        .post(upload_url)
        .header("Authorization", auth_token)
        .header("X-Bz-File-Name", file_name)
        .header("Content-Type", "audio/mp4")
        .header("Content-Length", data.len())
        .header("X-Bz-Content-Sha1", &sha1_hex)
        .body(data)
        .send()
        .await?
        .error_for_status()?; // surface non-2xx responses as errors
    Ok(())
}

B2 can return 503s under load, so we retry with exponential backoff (up to 6 attempts), re-fetching the upload URL before each retry as B2 recommends.

After a successful upload, the episode status is marked as ready on the server.

Step 4: RSS Feed Generation

The Go server generates iTunes-compatible RSS XML on the fly when a podcast app requests it:

func Build(db *gorm.DB, feedSlug string, baseURL string) (string, error) {
    // Fetch feed metadata
    var feed Feed
    if err := db.Where("feed_slug = ?", feedSlug).First(&feed).Error; err != nil {
        return "", err
    }

    // Fetch ready episodes
    var episodes []Episode
    db.Where("feed_id = ? AND status = ?", feed.ID, "ready").
        Order("pub_date DESC").Find(&episodes)

    // Build RSS XML with iTunes extensions. (In production, user-supplied
    // strings like titles should be XML-escaped before concatenation.)
    xml := `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>` + feed.Name + `</title>
    <itunes:author>` + author + `</itunes:author>
    <itunes:type>` + itunesType + `</itunes:type>`

    for _, ep := range episodes {
        audioURL := fmt.Sprintf("%s/audio/%s/%s.m4a",
            baseURL, feedSlug, ep.ID)
        xml += `
    <item>
      <title>` + ep.Title + `</title>
      <enclosure url="` + audioURL + `"
                 type="audio/mp4"/>
      <itunes:duration>` + duration + `</itunes:duration>
    </item>`
    }

    xml += `
  </channel>
</rss>`
    return xml, nil
}

The RSS feed is served at /rss/{slug}.xml — a standard URL that any podcast app can subscribe to. Apple Podcasts, Pocket Casts, and Overcast all just work.
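Serving that XML over HTTP is the last piece. Here's a stdlib-only Go sketch of the route; the real server mounts this on a chi router, and `slugFromPath`, `rssHandler`, and `buildFeed` are illustrative names:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// slugFromPath extracts the feed slug from a /rss/{slug}.xml path.
func slugFromPath(path string) string {
	return strings.TrimSuffix(strings.TrimPrefix(path, "/rss/"), ".xml")
}

// rssHandler serves generated feed XML; buildFeed stands in for the
// Build function shown above.
func rssHandler(buildFeed func(slug string) (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		xml, err := buildFeed(slugFromPath(r.URL.Path))
		if err != nil {
			http.Error(w, "feed not found", http.StatusNotFound)
			return
		}
		// Podcast clients expect an XML media type.
		w.Header().Set("Content-Type", "application/rss+xml; charset=utf-8")
		fmt.Fprint(w, xml)
	}
}

func main() {
	http.Handle("/rss/", rssHandler(func(slug string) (string, error) {
		return `<?xml version="1.0"?><rss version="2.0"/>`, nil
	}))
	fmt.Println("handler registered for /rss/")
	// http.ListenAndServe(":8080", nil) // uncomment to actually serve
}
```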

What We Built With It

Using this pipeline, we've created podcast feeds for content that previously had no RSS — earnings calls, university lectures, and audiobooks.

Lessons Learned

Bundle your dependencies. We ship yt-dlp, ffmpeg, and Deno as Tauri sidecar binaries with platform-specific naming (yt-dlp-aarch64-apple-darwin). Never assume users have these installed.

Audio extraction is surprisingly cheap. Mono 48kbps AAC works out to roughly 22MB per hour of audio. A feed with 30 one-hour episodes is about 650MB, roughly $0.003/month to store on B2. Even at 500 feeds, total storage costs are negligible.

yt-dlp breaks regularly. YouTube changes their player code frequently. The EJS challenge solver, Deno runtime integration, and multiple client fallbacks are all workarounds we've had to add over time. Budget for maintenance.

Priority queues matter. When a user adds a new feed, those downloads should jump ahead of background sync jobs. We use tokio::select! with biased; to always drain the urgent channel first.

Stack Summary

Component        | Tech
Desktop app      | Tauri v2, Rust, React 19
Server           | Go 1.22, chi, GORM, MySQL
Audio extraction | yt-dlp + ffmpeg (sidecar binaries)
Object storage   | Backblaze B2
RSS              | iTunes-compatible XML, generated on the fly

If you're interested in subscribing to earnings calls or audiobooks as podcasts, check out the free podcast catalog. And if you want to turn your own YouTube channels into podcast feeds, the desktop app is available for Mac, Windows, and Linux.


Have questions about the pipeline? Drop a comment below.
