Pull audio from video without touching a terminal
You have video interviews to transcribe. Or podcast episodes recorded as video. Or a music library trapped in MP4 files. You need the audio track extracted.
FFmpeg does this in one command. n8n can trigger that command automatically whenever a new video appears. No manual steps. No terminal. No server to manage.
The problem: n8n can't extract audio natively
n8n doesn't have an audio extraction node. The cloud version doesn't allow shell commands. Even self-hosted, running FFmpeg inside n8n blocks the worker and risks crashes on large files.
The solution: send the extraction command to RenderIO's API via n8n's HTTP Request node. RenderIO runs FFmpeg in an isolated container. Your n8n instance stays responsive.
Use the RenderIO n8n node
RenderIO has a partner-verified community node on the n8n marketplace. Install from Settings → Community Nodes → search "renderio". It provides a visual interface for FFmpeg commands, including audio extraction.
The node handles authentication and request formatting automatically. The extraction examples below use HTTP Request nodes for full flexibility, but the same FFmpeg commands work with the native node.
Basic extraction: MP4 to MP3
The simplest workflow: video URL in, MP3 URL out.
HTTP Request node configuration:
- Method: POST
- URL: https://renderio.dev/api/v1/run-ffmpeg-command
- Authentication: Header Auth (X-API-KEY)
- Body:
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
"input_files": {
"in_video": "{{ $json.videoUrl }}"
},
"output_files": {
"out_audio": "extracted.mp3"
}
}
-vn drops the video stream. -acodec libmp3lame selects the LAME MP3 encoder, and -q:a 2 sets its variable-bitrate quality (0 = best, 9 = worst; 2 is high quality, averaging roughly 190 kbps).
The request starts a job rather than returning the audio directly. Poll until the job completes, then use the output file's URL.
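In n8n, polling is usually a Wait node plus a status check that loops back until the job is done. The same logic in code is a bounded retry loop. This is a minimal sketch, assuming a hypothetical `getJobStatus(jobId)` function that you supply and that resolves to an object with a `status` field. The status strings `"SUCCESS"` and `"FAILED"` are placeholders, not RenderIO's documented values; check the API reference for the real status endpoint and field names.

```javascript
// Poll an asynchronous job until it reaches a terminal state.
// getJobStatus is a hypothetical, caller-supplied function: jobId -> Promise<{ status, error? }>.
// "SUCCESS" and "FAILED" are placeholder status names; match them to the real API.
async function pollUntilDone(getJobStatus, jobId, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJobStatus(jobId);
    if (job.status === "SUCCESS") return job; // done: the output URL should be on this object
    if (job.status === "FAILED") {
      throw new Error(`Job ${jobId} failed: ${job.error ?? "unknown error"}`);
    }
    if (attempt < maxAttempts) await sleep(intervalMs); // still running: wait, then ask again
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

With the defaults it gives up after roughly five minutes (60 checks, 5 seconds apart). Raise `maxAttempts` for long videos rather than shortening the interval, so you do not hammer the status endpoint.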
Extraction formats
MP3 (most compatible)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
"input_files": { "in_video": "{{ $json.videoUrl }}" },
"output_files": { "out_audio": "audio.mp3" }
}
Best for: sharing, podcast distribution, general use.
WAV (lossless)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec pcm_s16le -ar 44100 {{out_audio}}",
"input_files": { "in_video": "{{ $json.videoUrl }}" },
"output_files": { "out_audio": "audio.wav" }
}
Best for: transcription services (they often prefer WAV), audio editing, archival.
AAC (Apple/streaming)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec aac -b:a 192k {{out_audio}}",
"input_files": { "in_video": "{{ $json.videoUrl }}" },
"output_files": { "out_audio": "audio.m4a" }
}
Best for: Apple devices, streaming platforms, and smaller files than MP3 at comparable quality.
FLAC (lossless compressed)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec flac {{out_audio}}",
"input_files": { "in_video": "{{ $json.videoUrl }}" },
"output_files": { "out_audio": "audio.flac" }
}
Best for: archival when you want lossless but smaller than WAV (typically 50-60% of WAV size).
OGG/Opus (web)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec libopus -b:a 128k {{out_audio}}",
"input_files": { "in_video": "{{ $json.videoUrl }}" },
"output_files": { "out_audio": "audio.ogg" }
}
Best for: web applications, voice recordings, VoIP.
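The five commands above share the same input and output placeholders and differ only in codec flags and file extension. If one workflow has to produce several formats, a Code node can keep them in a lookup table and build the request body from it. This is a sketch: `AUDIO_PRESETS` and `buildExtractionBody` are illustrative names, not part of RenderIO's API, and the body shape simply mirrors the examples in this article.

```javascript
// Codec flags and file extension for each format shown above.
const AUDIO_PRESETS = {
  mp3: { flags: "-acodec libmp3lame -q:a 2", ext: "mp3" },
  wav: { flags: "-acodec pcm_s16le -ar 44100", ext: "wav" },
  aac: { flags: "-acodec aac -b:a 192k", ext: "m4a" },
  flac: { flags: "-acodec flac", ext: "flac" },
  opus: { flags: "-acodec libopus -b:a 128k", ext: "ogg" },
};

// Build a request body with the same shape as the examples in this article.
function buildExtractionBody(videoUrl, format, baseName = "audio") {
  const preset = AUDIO_PRESETS[format];
  if (!preset) throw new Error(`Unknown format: ${format}`);
  return {
    ffmpeg_command: `-i {{in_video}} -vn ${preset.flags} {{out_audio}}`,
    input_files: { in_video: videoUrl },
    output_files: { out_audio: `${baseName}.${preset.ext}` },
  };
}
```

Keeping the flags in one table means a format change is a one-line edit instead of a hunt through several HTTP Request nodes.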
Complete workflow: Extract and transcribe
Combine audio extraction with a transcription service:
Google Drive Trigger (new video)
→ HTTP Request: Extract audio (RenderIO)
→ Wait + Poll
→ HTTP Request: Download audio
→ HTTP Request: Send to Whisper API / AssemblyAI / Deepgram
→ Google Sheets: Write transcript
→ Slack: Notify team
Node 1: Google Drive Trigger
Watches a "Videos" folder for new uploads.
Node 2: Extract audio (HTTP Request)
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec pcm_s16le -ar 16000 -ac 1 {{out_audio}}",
"input_files": { "in_video": "{{ $json.downloadUrl }}" },
"output_files": { "out_audio": "for_transcription.wav" }
}
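Note: -ar 16000 -ac 1 resamples to 16 kHz mono. Many speech-to-text models work at this rate natively (Whisper's model, for example, operates on 16 kHz audio), so downsampling typically costs no accuracy while making the file about 5.5x smaller than 44.1 kHz stereo and faster to upload.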
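Uncompressed PCM size follows directly from sample rate × channels × bytes per sample, per second. A small helper makes the saving concrete. The function names are illustrative, and the estimate ignores the WAV header, which is only a few dozen bytes.

```javascript
// Bytes per second of raw PCM audio: sampleRate * channels * (bitDepth / 8).
function pcmBytesPerSecond(sampleRate, channels, bitDepth = 16) {
  return sampleRate * channels * (bitDepth / 8);
}

// Approximate size in megabytes (10^6 bytes) for a given duration, header ignored.
function pcmSizeMB(durationSec, sampleRate, channels, bitDepth = 16) {
  return (pcmBytesPerSecond(sampleRate, channels, bitDepth) * durationSec) / 1e6;
}

const stereo441 = pcmBytesPerSecond(44100, 2); // 176,400 bytes per second
const mono16 = pcmBytesPerSecond(16000, 1);    // 32,000 bytes per second
console.log(`16 kHz mono is ${(stereo441 / mono16).toFixed(1)}x smaller`);
console.log(`60 min at 16 kHz mono: ${pcmSizeMB(3600, 16000, 1).toFixed(1)} MB`);
```

An hour-long interview comes to about 115 MB as 16 kHz mono WAV versus roughly 635 MB at 44.1 kHz stereo. If your transcription service caps upload size, that difference can decide whether WAV works at all or whether a compressed format is the better handoff.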
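Nodes 3-5: Poll and get result
The standard polling loop (wait, check the job status, repeat until it finishes), followed by an HTTP Request that downloads the finished audio as a binary file.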
Node 6: Send to transcription
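{
"method": "POST",
"url": "https://api.openai.com/v1/audio/transcriptions",
"headers": { "Authorization": "Bearer {{ $credentials.openAiApi.apiKey }}" },
"body": {
"model": "whisper-1",
"file": "<binary audio from the Download audio node>"
}
}
This endpoint expects multipart/form-data with the audio as a file upload, so passing the output URL as a string in "file" does not work. In the HTTP Request node, set Body Content Type to Form-Data, add model as a regular field, and add file as an n8n Binary File field that reads the data property produced by the download step.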
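The transcription endpoint takes the audio as a multipart file upload. Outside n8n, that request can be written with the `fetch`, `FormData`, and `Blob` globals built into Node.js 18+. This is a sketch: `audioBytes` is assumed to be the file you downloaded from the extraction output URL, and the function names are illustrative. The endpoint and `whisper-1` model match the configuration above; the default JSON response carries the transcript in a `text` field.

```javascript
// Build the multipart body: the audio as a file part plus the model name as a text field.
function buildTranscriptionForm(audioBytes, filename = "for_transcription.wav", mimeType = "audio/wav") {
  const form = new FormData();
  form.append("model", "whisper-1");
  form.append("file", new Blob([audioBytes], { type: mimeType }), filename);
  return form;
}

// Upload the audio and return the transcript text (needs network access and an API key).
async function transcribe(apiKey, audioBytes) {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    // No manual Content-Type: fetch sets multipart/form-data with the boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audioBytes),
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  const data = await res.json();
  return data.text;
}
```

Leaving `Content-Type` unset is deliberate: a hand-written multipart header would lack the boundary string and the server could not parse the parts.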
Batch extraction from a video library
Process an entire folder of videos:
Step 1: Get video list
Use a Code node or fetch from a spreadsheet:
const videos = [
{ url: "https://example.com/interview1.mp4", name: "interview1" },
{ url: "https://example.com/interview2.mp4", name: "interview2" },
{ url: "https://example.com/interview3.mp4", name: "interview3" },
];
return videos.map(v => ({ json: v }));
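When the list comes from a spreadsheet that only has URLs, the name field can be derived from the URL itself. This sketch assumes the file name in the URL path is a good enough label, and replaces characters that could cause trouble in an output file name such as `{{ $json.name }}.mp3`. `nameFromUrl` is a hypothetical helper, not an n8n built-in.

```javascript
// Derive a safe base name from a video URL: last path segment, extension dropped,
// anything other than letters, digits, dot, dash, or underscore replaced with "_".
function nameFromUrl(videoUrl) {
  const path = new URL(videoUrl).pathname; // query string and fragment are excluded
  const file = decodeURIComponent(path.split("/").pop() || "video");
  const base = file.replace(/\.[^.]+$/, ""); // strip the final extension
  return base.replace(/[^A-Za-z0-9._-]/g, "_") || "video";
}

// In a Code node you would return this array: rows with only a url become { url, name } items.
const rows = [{ url: "https://example.com/clips/Interview%201.mp4?dl=1" }];
const items = rows.map((r) => ({ json: { url: r.url, name: nameFromUrl(r.url) } }));
```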
Step 2: Split in Batches (size: 5)
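Split in Batches hands the nodes after it a fixed number of items per pass through the loop, so each pass submits at most five extraction jobs. The grouping is equivalent to this plain chunking function, shown for illustration only; it is not n8n's implementation.

```javascript
// Group items into consecutive batches of `size`; the last batch may be smaller.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) throw new Error("size must be a positive integer");
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const videos = Array.from({ length: 12 }, (_, i) => `interview${i + 1}`);
const batches = chunk(videos, 5); // three batches: 5, 5, and 2 videos
```

Twelve videos therefore run as three passes. Small batches keep the number of jobs in flight bounded, which helps if the API or your plan limits concurrent requests.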
Step 3: Submit extraction for each
{
"ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
"input_files": { "in_video": "{{ $json.url }}" },
"output_files": { "out_audio": "{{ $json.name }}.mp3" }
}
Step 4: Poll and collect URLs
Step 5: Write results to spreadsheet
| Video | Audio URL | Status |
|---|---|---|
| interview1 | https://media.renderio.dev/interview1.mp3 | extracted |
| interview2 | https://media.renderio.dev/interview2.mp3 | extracted |
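Steps 4 and 5 amount to reshaping each finished job into one spreadsheet row. This is a sketch under stated assumptions: each polled job object has been merged with its input item's `name`, carries a `status` field, and exposes the output URL at `output_files.out_audio.storage_url`. The `"SUCCESS"` status value is a placeholder.

```javascript
// Turn polled job results into rows for the results sheet.
// Field names (name, status, output_files.out_audio.storage_url) and the
// "SUCCESS" status value are assumptions about the response shape.
function toSheetRows(jobs) {
  return jobs.map((job) => {
    const url = job.output_files?.out_audio?.storage_url ?? "";
    return {
      Video: job.name,
      "Audio URL": url,
      Status: job.status === "SUCCESS" && url ? "extracted" : "failed",
    };
  });
}
```

Marking a job as failed when the URL is missing, even if the status says success, keeps empty cells from looking like finished work.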
Audio processing after extraction
Once you have the audio, you can process it further:
Normalize volume (loudnorm upsamples to 192 kHz internally, so set the output rate explicitly):
-i {{in_audio}} -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 44100 {{out_audio}}
Trim silence from start/end:
-i {{in_audio}} -af silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB,areverse,silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB,areverse {{out_audio}}
Convert sample rate:
-i {{in_audio}} -ar 44100 {{out_audio}}
Chain these into your workflow as additional RenderIO calls after extraction, mapping in_audio in input_files to the extracted file's URL.
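Run separately, each step above is its own FFmpeg command and its own job. FFmpeg also accepts several filters in one -af chain, separated by commas, so trimming and normalization can happen in a single pass. This sketch assembles such a command; `buildAudioCommand` is an illustrative helper, and the filter strings are copied from the examples above.

```javascript
// Combine several audio filters into a single comma-separated -af chain,
// so one FFmpeg job does the work of several.
function buildAudioCommand(filters, { sampleRate } = {}) {
  if (filters.length === 0) throw new Error("at least one filter is required");
  const rate = sampleRate ? ` -ar ${sampleRate}` : "";
  return `-i {{in_audio}} -af ${filters.join(",")}${rate} {{out_audio}}`;
}

const trimEdges = [
  "silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB",
  "areverse",
  "silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB",
  "areverse",
];
const normalize = "loudnorm=I=-16:TP=-1.5:LRA=11";

// Trim first, then normalize what remains; -ar pins the output rate after loudnorm.
const command = buildAudioCommand([...trimEdges, normalize], { sampleRate: 44100 });
```

Trimming before loudnorm means the leading and trailing silence is already gone when loudness is measured, and one combined job costs one command instead of three.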
Error handling
Common extraction failures:
No audio track: Some screen recordings or animations have no audio. FFmpeg returns an error. Handle with an IF node that checks the error message for "does not contain any stream."
Corrupted audio: Add -err_detect ignore_err before -i to attempt extraction despite minor corruption.
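Very long videos: Extraction is usually quick (often 10-30 seconds for typical videos) because FFmpeg never decodes the video frames; it only processes the audio stream. Time still grows with duration and with how long the source file takes to download, so allow more time for multi-hour recordings.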
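The IF-node check for missing audio generalizes to a small classifier that routes each failure to its own branch. This sketch assumes you can read FFmpeg's error output as a string from the failed job. Only the "does not contain any stream" pattern comes from this article; the other two are common FFmpeg messages included as assumptions, so verify them against the errors your jobs actually return.

```javascript
// Map an FFmpeg error message to a workflow branch.
// Patterns other than "does not contain any stream" are assumptions; check them
// against the messages your failed jobs actually return.
const ERROR_RULES = [
  { pattern: /does not contain any stream/i, kind: "no_audio" }, // e.g. silent screen recording
  { pattern: /Invalid data found when processing input/i, kind: "corrupt_input" },
  { pattern: /Server returned 4\d\d/i, kind: "bad_url" }, // source URL not reachable
];

function classifyFfmpegError(message) {
  const rule = ERROR_RULES.find((r) => r.pattern.test(message ?? ""));
  return rule ? rule.kind : "unknown";
}
```

A no_audio result can simply be logged and skipped, corrupt_input is a candidate for a retry with -err_detect ignore_err, and anything unknown should notify a person.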
Get started
The Starter plan at $9/mo includes 500 commands, enough to set up and test your audio extraction workflow.