RenderIO

Posted on Apr 8 • Originally published at renderio.dev

Extract Audio from Video in Zapier

#ffmpeg #video #webdev #api

Video in. Audio out. Automatically.

You record a video interview. You need the audio as an MP3 for your podcast feed. You film a webinar. You need the audio for transcription. You receive user-generated content (UGC) videos. You need to check the audio quality without downloading each whole file.

Each of these requires extracting audio from video. That's a one-line FFmpeg command, but Zapier can't run FFmpeg natively.

With RenderIO, it becomes an automated step in any Zap. Video goes in, MP3 comes out.

The basic Zap

Step 1: Trigger

App: Google Drive
Event: New File in Folder
Folder: "Videos for Audio Extraction"

Alternative triggers:

Dropbox: New file
Email: New attachment
Webhook: Custom trigger
Typeform: New file upload

Step 2: Extract audio

App: Webhooks by Zapier
Event: POST
URL: https://renderio.dev/api/v1/run-ffmpeg-command

Headers:

X-API-KEY: your_api_key
Content-Type: application/json

Body:

{
  "ffmpeg_command": "-i {{in_video}} -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": {
    "in_video": "{{step1_file_url}}"
  },
  "output_files": {
    "out_audio": "extracted-audio.mp3"
  }
}

The flags:

-vn: No video (strip the video stream entirely)
-c:a libmp3lame: Encode audio as MP3
-b:a 192k: 192 kbps bitrate (good quality for speech)
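To test the same request outside Zapier, here is a minimal Python sketch that builds (but does not send) the Step 2 POST using only the standard library. The endpoint, header name, and body come from this article; the video URL and API key are placeholders, and the shape of RenderIO's response is not covered here.

```python
import json
import urllib.request

# Endpoint as given in this article; the key passed in below is a placeholder.
RENDERIO_URL = "https://renderio.dev/api/v1/run-ffmpeg-command"


def build_extract_request(video_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST that the Webhooks step makes."""
    body = {
        # {{in_video}} and {{out_audio}} are RenderIO placeholders filled from the maps below
        "ffmpeg_command": "-i {{in_video}} -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
        "input_files": {"in_video": video_url},
        "output_files": {"out_audio": "extracted-audio.mp3"},
    }
    # urllib stores header names capitalized ("X-api-key"); HTTP header names are case-insensitive
    return urllib.request.Request(
        RENDERIO_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_extract_request("https://example.com/interview.mp4", "your_api_key")
# Sending it would be urllib.request.urlopen(req); handling the response is out of scope here.
```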

Step 3: Wait

App: Delay by Zapier
Duration: 20 seconds

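Audio extraction is fast because -vn tells FFmpeg to skip the video stream, so only the audio is decoded and re-encoded. Jobs usually finish within seconds, but the real time depends on file size and how quickly the input downloads, so treat this delay as a first guess and confirm completion in the next step.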

Step 4: Check status

App: Webhooks by Zapier
Event: GET
URL: https://renderio.dev/api/v1/commands/{{step2_command_id}}
Headers: X-API-KEY: your_api_key
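A fixed delay is only a guess: a large file or a slow download can outlast it. If you drive the API from your own script instead, a polling loop with a timeout is more robust. Below is a minimal sketch that doubles the wait between checks. It assumes you supply a `fetch_status` function that GETs the status URL above and returns the parsed JSON; the `status` field and the `SUCCESS` / `PROCESSING` values in the usage example are hypothetical, so check RenderIO's API reference for the real names.

```python
import time
from typing import Callable


def wait_for_command(
    fetch_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    timeout_s: float = 300.0,
    first_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until is_finished(status) is true, doubling the wait up to max_delay_s."""
    waited = 0.0
    delay = first_delay_s
    while True:
        status = fetch_status()
        if is_finished(status):
            return status
        # Give up rather than sleep past the overall timeout
        if waited + delay > timeout_s:
            raise TimeoutError(f"command not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)


# Usage with a fake fetcher; the "status" field and its values are hypothetical.
responses = iter([{"status": "PROCESSING"}, {"status": "PROCESSING"}, {"status": "SUCCESS"}])
sleeps = []
result = wait_for_command(
    lambda: next(responses),
    lambda s: s["status"] == "SUCCESS",
    sleep=sleeps.append,  # record waits instead of actually sleeping
)
```

The `sleep` parameter exists so the loop can be exercised without real waiting; in production you would leave it at `time.sleep`.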

Step 5: Save the MP3

App: Google Drive
Event: Upload File
File URL: {{step4_output_url}}
Folder: "Extracted Audio"
Filename: {{step1_filename}}.mp3
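If the trigger's file name still carries its original extension, appending .mp3 produces names like interview.mp4.mp3. A minimal sketch of swapping the extension instead, for example in a code step or your own script (the function name is illustrative):

```python
import os


def audio_filename(source_name: str, ext: str = ".mp3") -> str:
    """Replace the video extension with an audio one: 'interview.mp4' -> 'interview.mp3'."""
    stem, _old_ext = os.path.splitext(source_name)  # splits off only the last extension
    return stem + ext


name = audio_filename("interview.mp4")
```

Names with several dots keep everything but the last extension, and names with no extension simply gain .mp3.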

Audio format options

MP3 (most compatible)

{
  "ffmpeg_command": "-i {{in_video}} -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "audio.mp3" }
}

Best for: Podcast distribution, general sharing, email attachments.

WAV (lossless)

{
  "ffmpeg_command": "-i {{in_video}} -vn -c:a pcm_s16le {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "audio.wav" }
}

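Best for: Audio editing and music production. WAV adds no further compression loss, but it cannot restore detail already lost if the video's audio track was lossy (MP4 files commonly carry AAC audio).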

AAC (smaller files than MP3 at similar quality)

{
  "ffmpeg_command": "-i {{in_video}} -vn -c:a aac -b:a 128k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "audio.m4a" }
}

Best for: Apple ecosystem, when file size matters.

FLAC (lossless, compressed)

{
  "ffmpeg_command": "-i {{in_video}} -vn -c:a flac {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "audio.flac" }
}

Best for: Archival, when you want lossless but smaller than WAV.
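The four bodies above differ only in the encoder arguments and the output extension. If you generate requests from a script, a small lookup table keeps them in one place. A minimal sketch; the table entries are copied from the examples above, and the function name is illustrative:

```python
# Encoder arguments and file extensions from the four format examples above.
AUDIO_FORMATS = {
    "mp3": ("-c:a libmp3lame -b:a 192k", "mp3"),
    "wav": ("-c:a pcm_s16le", "wav"),
    "aac": ("-c:a aac -b:a 128k", "m4a"),
    "flac": ("-c:a flac", "flac"),
}


def extraction_body(file_url: str, fmt: str, basename: str = "audio") -> dict:
    """Return a RenderIO request body that extracts audio in the chosen format."""
    codec_args, ext = AUDIO_FORMATS[fmt]  # raises KeyError for unsupported formats
    return {
        # Plain concatenation keeps the {{...}} placeholders literal
        "ffmpeg_command": "-i {{in_video}} -vn " + codec_args + " {{out_audio}}",
        "input_files": {"in_video": file_url},
        "output_files": {"out_audio": basename + "." + ext},
    }


body = extraction_body("https://example.com/video.mp4", "aac")
```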

Audio processing options

Normalize volume

Ensure consistent loudness across extracted audio:

{
  "ffmpeg_command": "-i {{in_video}} -vn -af \"loudnorm=I=-16:TP=-2:LRA=11\" -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "normalized.mp3" }
}

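I=-16 targets -16 LUFS, a common loudness target for podcasts. TP=-2 caps true peaks at -2 dBTP, and LRA=11 sets the target loudness range to 11 LU. This brings extracted audio to a consistent level regardless of how loud the original recording was. Single-pass loudnorm is an approximation; the filter also supports a more precise two-pass workflow that feeds in measured values from a first run.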
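The \" sequences in the body above are JSON escapes for the literal quotes around the filter. If you build the body in code, a JSON serializer adds them for you. A minimal sketch; whether RenderIO needs those quotes at all depends on how it splits the command string, which this article doesn't specify.

```python
import json

LOUDNORM = "loudnorm=I=-16:TP=-2:LRA=11"  # -16 LUFS target, -2 dBTP ceiling, 11 LU range

# Plain concatenation keeps the {{in_video}} / {{out_audio}} placeholders literal
command = (
    "-i {{in_video}} -vn -af \"" + LOUDNORM + "\" "
    "-c:a libmp3lame -b:a 192k {{out_audio}}"
)

# json.dumps escapes the inner double quotes as \" automatically
encoded = json.dumps(command)
print(encoded)
```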

Remove background noise

Basic noise reduction with FFmpeg:

{
  "ffmpeg_command": "-i {{in_video}} -vn -af \"highpass=f=80,lowpass=f=12000,afftdn=nf=-20\" -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "clean.mp3" }
}

This applies:

High-pass filter at 80 Hz (reduces low rumble)
Low-pass filter at 12 kHz (reduces high-frequency hiss above that point)
FFT-based noise reduction via afftdn, with the noise floor set to -20 dB (reduces steady ambient noise)
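In an FFmpeg filter chain, filters run left to right and are separated by commas, while each filter's options are key=value pairs separated by colons. A minimal sketch of assembling the -af string from parts, which makes it easy to add or reorder steps (the helper name is illustrative):

```python
def filter_chain(*filters: tuple[str, dict]) -> str:
    """Join FFmpeg audio filters: options as key=value with ':', filters with ','."""
    parts = []
    for name, options in filters:
        opts = ":".join(f"{key}={value}" for key, value in options.items())
        parts.append(f"{name}={opts}" if opts else name)
    return ",".join(parts)


chain = filter_chain(
    ("highpass", {"f": 80}),    # cut rumble below about 80 Hz
    ("lowpass", {"f": 12000}),  # cut hiss above about 12 kHz
    ("afftdn", {"nf": -20}),    # FFT denoiser, noise floor -20 dB
)
```

Because the filters apply in order, appending a loudnorm step after the denoiser normalizes the already-cleaned signal.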

Extract specific time range

Extract audio from a specific portion of the video:

{
  "ffmpeg_command": "-i {{in_video}} -ss 00:02:30 -t 00:10:00 -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "segment.mp3" }
}

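-ss 00:02:30 starts at 2 minutes 30 seconds, and -t 00:10:00 extracts 10 minutes, so the clip ends at 12:30. Because -ss comes after -i here, FFmpeg decodes and discards everything before 2:30; that is accurate but slower on long files. Placing -ss before -i makes FFmpeg seek in the input instead, which is usually much faster.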
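-ss and -t accept either HH:MM:SS timestamps or plain seconds, and -t is a duration, not an end time. A minimal sketch that converts both forms to seconds and computes where a clip ends (function names are illustrative):

```python
def to_seconds(timestamp: str) -> float:
    """Convert 'HH:MM:SS', 'MM:SS', or plain seconds such as '600' to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def segment_bounds(start: str, duration: str) -> tuple[float, float]:
    """Return (start, end) in seconds for '-ss start -t duration'."""
    s = to_seconds(start)
    return s, s + to_seconds(duration)


bounds = segment_bounds("00:02:30", "00:10:00")
```

For the command above this gives a window from 150 s to 750 s, that is, 2:30 to 12:30.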

Split into chapters

Extract multiple segments from one video:

First segment:

{
  "ffmpeg_command": "-i {{in_video}} -ss 0 -t 600 -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "chapter-1.mp3" }
}

Second segment:

{
  "ffmpeg_command": "-i {{in_video}} -ss 600 -t 600 -vn -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "chapter-2.mp3" }
}

Use multiple Webhooks steps, or Looping by Zapier, to create all chapters.
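If you generate the chapter requests in code, you can compute every segment from the video's total length. A minimal sketch, assuming you already know the duration in seconds (for example from ffprobe; obtaining it is not covered here). It mirrors the two bodies above and shortens the last chapter to whatever remains:

```python
def chapter_bodies(file_url: str, total_s: int, chapter_s: int = 600) -> list[dict]:
    """One RenderIO request body per chapter of chapter_s seconds."""
    bodies = []
    for number, start in enumerate(range(0, total_s, chapter_s), start=1):
        length = min(chapter_s, total_s - start)  # last chapter may be shorter
        command = (
            "-i {{in_video}} -ss " + str(start) + " -t " + str(length)
            + " -vn -c:a libmp3lame -b:a 192k {{out_audio}}"
        )
        bodies.append({
            "ffmpeg_command": command,
            "input_files": {"in_video": file_url},
            "output_files": {"out_audio": f"chapter-{number}.mp3"},
        })
    return bodies


chapters = chapter_bodies("https://example.com/talk.mp4", total_s=1500)
```

A 25-minute (1500 s) video yields three bodies: two 10-minute chapters and a final 5-minute one.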

Use case: Podcast repurposing

A common workflow for video podcasters:

Trigger: New video uploaded to Google Drive (after recording)
Extract full audio: MP3, 192kbps, normalized
Extract first 60 seconds: MP3, for social media teaser
Save full audio: Upload to podcast hosting (Buzzsprout, Anchor)
Save teaser: Upload to social media scheduler

Full episode:

{
  "ffmpeg_command": "-i {{in_video}} -vn -af \"loudnorm=I=-16:TP=-2:LRA=11\" -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "full-episode.mp3" }
}

Teaser:

{
  "ffmpeg_command": "-i {{in_video}} -t 60 -vn -af \"loudnorm=I=-16:TP=-2:LRA=11,afade=t=out:st=55:d=5\" -c:a libmp3lame -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "teaser.mp3" }
}

The teaser fades out over its final 5 seconds, starting at the 55-second mark.
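The fade start (st) must equal the teaser length minus the fade duration, or the clip ends before the fade completes. A minimal sketch that derives it, assuming the teaser starts at the beginning of the video as in the command above (the function name is illustrative):

```python
def teaser_filter(length_s: int = 60, fade_s: int = 5) -> str:
    """loudnorm followed by a fade-out that ends exactly when the teaser ends."""
    if not 0 < fade_s <= length_s:
        raise ValueError("fade must be positive and no longer than the teaser")
    fade_start = length_s - fade_s
    return f"loudnorm=I=-16:TP=-2:LRA=11,afade=t=out:st={fade_start}:d={fade_s}"


af = teaser_filter()
```

Pair the result with a matching -t value (here 60) so the cut and the fade line up.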

Use case: Transcription prep

Before sending audio to a transcription service (Otter, Rev, Whisper):

{
  "ffmpeg_command": "-i {{in_video}} -vn -af \"highpass=f=80,lowpass=f=8000,loudnorm=I=-16\" -ar 16000 -ac 1 -c:a libmp3lame -b:a 64k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "for-transcription.mp3" }
}

This optimizes for transcription:

Keeps roughly the 80 Hz to 8 kHz band, where most speech content lies
Normalizes volume
Downsamples to 16kHz (sufficient for speech)
Mono channel (speech doesn't need stereo)
64kbps (small file size for upload)

At 64 kbps mono, an hour of audio comes to roughly 29 MB, usually a small fraction of the source video's size, which means faster uploads to transcription services.
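For constant-bitrate audio, file size is simply bitrate times duration divided by 8. A minimal sketch of that arithmetic; it ignores container and metadata overhead, so real files come out slightly larger:

```python
def cbr_size_mb(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate size of a constant-bitrate audio file in megabytes (10**6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


transcription_hour = cbr_size_mb(64, 3600)   # the 64 kbps transcription preset
podcast_hour = cbr_size_mb(192, 3600)        # the 192 kbps MP3 preset
```

One hour at 64 kbps is about 28.8 MB, versus about 86.4 MB at 192 kbps.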

Use case: Audio quality check

Before reviewing hours of UGC video, check audio quality quickly:

{
  "ffmpeg_command": "-i {{in_video}} -t 30 -vn -c:a libmp3lame -b:a 128k {{out_audio}}",
  "input_files": { "in_video": "{{file_url}}" },
  "output_files": { "out_audio": "preview.mp3" }
}

Extract the first 30 seconds as a quick preview. Listen in Slack or email without downloading the full video.

Cost

Audio extraction is one of the lightest FFmpeg operations, and jobs typically finish in seconds.

Volume	Monthly commands (approx.)	Plan	Cost
10 videos/week	~40	Starter	$9/mo
5 videos/day	~150	Starter	$9/mo
20 videos/day	~600	Growth	$29/mo
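The command counts follow from videos per day times days per month times commands per video. A minimal sketch of that estimate, assuming a 30-day month and that only the POST that starts a job counts as a command (whether status checks count is not stated here). Workflows that run more than one command per video, such as the podcast repurposing Zap with its full episode plus teaser, double the count:

```python
import math

DAYS_PER_MONTH = 30  # simplification; real months run 28 to 31 days


def monthly_commands(videos_per_day: float, commands_per_video: int = 1) -> int:
    """Estimated RenderIO commands per month, rounded up."""
    return math.ceil(videos_per_day * DAYS_PER_MONTH * commands_per_video)


basic = monthly_commands(5)                            # basic Zap, 5 videos/day
repurposing = monthly_commands(5, commands_per_video=2)  # full episode + teaser
```

With a 30-day month, 10 videos a week works out to about 43 commands; the table's ~40 assumes four weeks.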

Video contains audio. FFmpeg extracts it. Zapier automates it. That's the whole story.
