RenderIO

Posted on Apr 12 • Originally published at renderio.dev

Extract Audio from Video in n8n

#ffmpeg #api #video #webdev

Pull audio from video without touching a terminal

You have video interviews to transcribe. Or podcast episodes recorded as video. Or a music library trapped in MP4 files. You need the audio track extracted.

FFmpeg does this in one command. n8n can trigger that command automatically whenever a new video appears. No manual steps. No terminal. No server.

The problem: n8n can't extract audio natively

n8n doesn't have an audio extraction node. The cloud version doesn't allow shell commands. Even self-hosted, running FFmpeg inside n8n blocks the worker and risks crashes on large files.

The solution: send the extraction command to RenderIO's API via n8n's HTTP Request node. RenderIO runs FFmpeg in an isolated container. Your n8n instance stays responsive.

Use the RenderIO n8n node

RenderIO has a partner-verified community node on the n8n marketplace. Install from Settings → Community Nodes → search "renderio". It provides a visual interface for FFmpeg commands, including audio extraction.

The node handles authentication and request formatting automatically. The extraction examples below use HTTP Request nodes for full flexibility, but the same FFmpeg commands work with the RenderIO node.

Basic extraction: MP4 to MP3

The simplest workflow: video URL in, MP3 URL out.

HTTP Request node configuration:

Method: POST
URL: https://renderio.dev/api/v1/run-ffmpeg-command
Authentication: Header Auth (X-API-KEY)
Body:

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
  "input_files": {
    "in_video": "{{ $json.videoUrl }}"
  },
  "output_files": {
    "out_audio": "extracted.mp3"
  }
}

-vn drops the video stream. -q:a 2 sets the LAME VBR quality level (0 = best, 9 = worst); level 2 is high quality, averaging around 190 kbps.
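If you prefer to assemble the request body in a Code node rather than typing it into the HTTP Request node, a minimal sketch could look like the following. The field names are taken from the body above; the helper name buildExtractionRequest and the default file name are illustrative assumptions, not part of RenderIO's API.

```javascript
// Hypothetical helper: builds the JSON body shown above for one video URL.
function buildExtractionRequest(videoUrl, outputName = "extracted.mp3") {
  if (typeof videoUrl !== "string" || !/^https?:\/\//.test(videoUrl)) {
    throw new Error(`Expected an http(s) video URL, got: ${videoUrl}`);
  }
  return {
    // {{in_video}} and {{out_audio}} are kept as literal text for RenderIO;
    // they are keys into input_files / output_files, not n8n expressions.
    ffmpeg_command: "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
    input_files: { in_video: videoUrl },
    output_files: { out_audio: outputName },
  };
}

// Example: the object you would pass on to the HTTP Request node.
const exampleBody = buildExtractionRequest("https://example.com/interview1.mp4");
console.log(JSON.stringify(exampleBody, null, 2));
```

Building the body in code keeps the RenderIO placeholders as plain strings, which may help if the n8n expression editor tries to interpret the double curly braces itself.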

The job runs asynchronously: poll for completion, then read the output URL from the response (output_files.out_audio.storage_url).

Extraction formats

MP3 (most compatible)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
  "input_files": { "in_video": "{{ $json.videoUrl }}" },
  "output_files": { "out_audio": "audio.mp3" }
}

Best for: sharing, podcast distribution, general use.

WAV (lossless)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec pcm_s16le -ar 44100 {{out_audio}}",
  "input_files": { "in_video": "{{ $json.videoUrl }}" },
  "output_files": { "out_audio": "audio.wav" }
}

Best for: transcription services (they often prefer WAV), audio editing, archival.

AAC (Apple/streaming)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec aac -b:a 192k {{out_audio}}",
  "input_files": { "in_video": "{{ $json.videoUrl }}" },
  "output_files": { "out_audio": "audio.m4a" }
}

Best for: Apple devices, streaming platforms, smaller files than MP3 at the same quality.

FLAC (lossless compressed)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec flac {{out_audio}}",
  "input_files": { "in_video": "{{ $json.videoUrl }}" },
  "output_files": { "out_audio": "audio.flac" }
}

Best for: archival when you want lossless but smaller than WAV (typically 50-60% of WAV size).

OGG/Opus (web)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec libopus -b:a 128k {{out_audio}}",
  "input_files": { "in_video": "{{ $json.videoUrl }}" },
  "output_files": { "out_audio": "audio.ogg" }
}

Best for: web applications, voice recordings, VoIP.
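The five bodies above differ only in codec arguments and file extension, so one way to avoid maintaining five copies is a small lookup table. This is a sketch under that assumption; AUDIO_PRESETS and buildFormatRequest are hypothetical names, while the codec arguments and extensions are copied from the examples above.

```javascript
// Codec arguments and extensions copied from the format examples above.
const AUDIO_PRESETS = {
  mp3: { args: "-acodec libmp3lame -q:a 2", ext: "mp3" },
  wav: { args: "-acodec pcm_s16le -ar 44100", ext: "wav" },
  aac: { args: "-acodec aac -b:a 192k", ext: "m4a" },
  flac: { args: "-acodec flac", ext: "flac" },
  opus: { args: "-acodec libopus -b:a 128k", ext: "ogg" },
};

// Hypothetical helper: returns a request body for the chosen format.
function buildFormatRequest(videoUrl, format, baseName = "audio") {
  if (!Object.prototype.hasOwnProperty.call(AUDIO_PRESETS, format)) {
    const known = Object.keys(AUDIO_PRESETS).join(", ");
    throw new Error(`Unknown format "${format}". Expected one of: ${known}`);
  }
  const preset = AUDIO_PRESETS[format];
  return {
    ffmpeg_command: `-i {{in_video}} -vn ${preset.args} {{out_audio}}`,
    input_files: { in_video: videoUrl },
    output_files: { out_audio: `${baseName}.${preset.ext}` },
  };
}

console.log(buildFormatRequest("https://example.com/talk.mp4", "flac").ffmpeg_command);
// -i {{in_video}} -vn -acodec flac {{out_audio}}
```

In a workflow, the format could come from a spreadsheet column or a form field, so one HTTP Request node serves all five cases.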

Complete workflow: Extract and transcribe

Combine audio extraction with a transcription service:

Google Drive Trigger (new video)
  → HTTP Request: Extract audio (RenderIO)
  → Wait + Poll
  → HTTP Request: Download audio
  → HTTP Request: Send to Whisper API / AssemblyAI / Deepgram
  → Google Sheets: Write transcript
  → Slack: Notify team

Node 1: Google Drive Trigger
Watches a "Videos" folder for new uploads.

Node 2: Extract audio (HTTP Request)

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec pcm_s16le -ar 16000 -ac 1 {{out_audio}}",
  "input_files": { "in_video": "{{ $json.downloadUrl }}" },
  "output_files": { "out_audio": "for_transcription.wav" }
}

Note: -ar 16000 -ac 1 converts to 16 kHz mono, a format many transcription APIs accept or recommend. Files are smaller and uploads faster, usually with no loss in transcription accuracy, because speech models commonly work at 16 kHz anyway.
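To put a number on the size difference, uncompressed PCM size is simply sample rate × channels × bytes per sample × duration. The sketch below is plain arithmetic with hypothetical function names; it ignores the small WAV header and assumes the 44.1 kHz example is stereo, which depends on the source video.

```javascript
// Bytes of raw PCM audio produced per second of playback.
function pcmBytesPerSecond(sampleRate, channels, bitsPerSample = 16) {
  return sampleRate * channels * (bitsPerSample / 8);
}

// Same figure expressed as megabytes (10^6 bytes) per minute.
function pcmMegabytesPerMinute(sampleRate, channels, bitsPerSample = 16) {
  return (pcmBytesPerSecond(sampleRate, channels, bitsPerSample) * 60) / 1e6;
}

console.log(pcmMegabytesPerMinute(16000, 1)); // 1.92   (16 kHz mono, 16-bit)
console.log(pcmMegabytesPerMinute(44100, 2)); // 10.584 (44.1 kHz stereo, 16-bit)
```

So an hour-long interview comes to roughly 115 MB at 16 kHz mono versus roughly 635 MB at 44.1 kHz stereo. If your transcription service caps upload size, this is the arithmetic to check before choosing WAV over a compressed format.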

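Nodes 3-5: Poll and get result

Standard polling loop: a Wait node, an HTTP Request that checks the job and repeats until it finishes, then an HTTP Request that downloads the audio file as binary data.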
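The decision inside the loop (finished, keep waiting, or give up) can live in a small Code node. The sketch below treats the presence of output_files.out_audio.storage_url as the completion signal, since that is the field the transcription step reads. The "FAILED" status value and the attempt limit are assumptions for illustration; check the actual response shape of your jobs before relying on them.

```javascript
// Hypothetical helper: decides what the polling loop should do next.
// response: parsed JSON from the status check
// attempt:  how many times the job has been polled so far (1-based)
function nextPollAction(response, attempt, maxAttempts = 30) {
  const url = response?.output_files?.out_audio?.storage_url;
  if (url) {
    return { action: "done", url };
  }
  // Assumed failure marker; the real field name and value may differ.
  if (response?.status === "FAILED") {
    return { action: "fail", reason: "job reported failure" };
  }
  if (attempt >= maxAttempts) {
    return { action: "fail", reason: "gave up after too many polls" };
  }
  return { action: "wait" };
}

console.log(nextPollAction({ status: "PROCESSING" }, 1));
// { action: 'wait' }
```

An IF or Switch node can then route on the action value: "wait" loops back to the Wait node, "done" continues to the download, and "fail" goes to your error branch.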

Node 6: Send to transcription

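Method: POST
URL: https://api.openai.com/v1/audio/transcriptions
Authentication: OpenAI API credential (sent as an Authorization: Bearer header)
Body Content Type: Form-Data (multipart/form-data)
Body fields:
  model: whisper-1
  file: the binary audio from the download node (parameter type "n8n Binary File")

The endpoint expects the audio bytes themselves in the file field. A URL string such as {{ $json.output_files.out_audio.storage_url }} is not accepted there, which is why the workflow downloads the audio before this node.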

Batch extraction from a video library

Process an entire folder of videos:

Step 1: Get video list

Use a Code node or fetch from a spreadsheet:

const videos = [
  { url: "https://example.com/interview1.mp4", name: "interview1" },
  { url: "https://example.com/interview2.mp4", name: "interview2" },
  { url: "https://example.com/interview3.mp4", name: "interview3" },
];

return videos.map(v => ({ json: v }));

Step 2: Split in Batches (size: 5)
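For readers who have not used batching before, this is roughly what a batch size of 5 does to the video list. It is a plain JavaScript illustration with a hypothetical chunk helper, not the node's actual implementation.

```javascript
// Splits an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Twelve videos become three batches: 5, 5 and 2.
const names = Array.from({ length: 12 }, (_, i) => `interview${i + 1}`);
console.log(chunk(names, 5).map((batch) => batch.length)); // [ 5, 5, 2 ]
```

Smaller batches keep the number of jobs in flight low, which makes the polling step easier to reason about.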

Step 3: Submit extraction for each

{
  "ffmpeg_command": "-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}",
  "input_files": { "in_video": "{{ $json.url }}" },
  "output_files": { "out_audio": "{{ $json.name }}.mp3" }
}

Step 4: Poll and collect URLs

Step 5: Write results to spreadsheet

Video	Audio URL	Status
interview1	https://media.renderio.dev/interview1.mp3	extracted
interview2	https://media.renderio.dev/interview2.mp3	extracted
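A Code node can shape each finished job into a row with these three columns before the spreadsheet node runs. In this sketch the column names match the table above and the URL is read from output_files.out_audio.storage_url; the "failed" status and the toSheetRow name are assumptions added for illustration.

```javascript
// Hypothetical helper: turns one video name plus its job response into a row.
function toSheetRow(videoName, jobResponse) {
  const audioUrl = jobResponse?.output_files?.out_audio?.storage_url ?? "";
  return {
    Video: videoName,
    "Audio URL": audioUrl,
    // "extracted" matches the table above; "failed" is an assumed label.
    Status: audioUrl ? "extracted" : "failed",
  };
}

const row = toSheetRow("interview1", {
  output_files: { out_audio: { storage_url: "https://media.renderio.dev/interview1.mp3" } },
});
console.log(row.Status); // extracted
```

Writing a row for failed jobs as well makes it easy to spot which videos need a retry.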

Audio processing after extraction

Once you have the audio, you can process it further:

Normalize volume:

-i {{in_audio}} -af loudnorm=I=-16:TP=-1.5:LRA=11 {{out_audio}}

Trim silence from start/end:

-i {{in_audio}} -af silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB,areverse,silenceremove=start_periods=1:start_silence=0.5:start_threshold=-50dB,areverse {{out_audio}}

Convert sample rate:

-i {{in_audio}} -ar 44100 {{out_audio}}

Chain these into your workflow as additional processing steps after extraction.
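When several of these steps apply to the same file, FFmpeg also accepts them in a single command: filters passed to -af are separated by commas and run in order. Combining them avoids re-encoding the audio once per step, which matters for lossy formats such as MP3. The helper below is a hypothetical sketch that only assembles the command string.

```javascript
// Hypothetical helper: joins audio filters and an optional sample rate
// into one FFmpeg command using the placeholder style from this article.
function buildAudioCommand({ filters = [], sampleRate } = {}) {
  const parts = ["-i {{in_audio}}"];
  if (filters.length > 0) {
    parts.push(`-af ${filters.join(",")}`); // filters run left to right
  }
  if (sampleRate) {
    parts.push(`-ar ${sampleRate}`);
  }
  parts.push("{{out_audio}}");
  return parts.join(" ");
}

console.log(
  buildAudioCommand({
    filters: ["loudnorm=I=-16:TP=-1.5:LRA=11"],
    sampleRate: 44100,
  })
);
// -i {{in_audio}} -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 44100 {{out_audio}}
```

Order matters: trimming silence before normalizing, for example, keeps leading silence out of the loudness measurement.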

Error handling

Common extraction failures:

No audio track: Some screen recordings or animations have no audio, so FFmpeg exits with an error. Handle this with an IF node that checks whether the error message contains "does not contain any stream".

Corrupted audio: Add -err_detect ignore_err before -i to attempt extraction despite minor corruption.

Very long videos: These are rarely a problem. Extraction is fast (typically 10-30 seconds) because FFmpeg only processes the audio stream and skips the video, though processing time still grows somewhat with duration and input file size.
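The first two cases can be handled in a Code node placed on the error branch. This is a sketch with hypothetical helper names; where the FFmpeg error text appears in a failed job's response is not covered here, so the functions simply take the message and the command as strings.

```javascript
// Labels an FFmpeg error message so an IF or Switch node can route on it.
function classifyExtractionError(message) {
  const text = String(message ?? "").toLowerCase();
  if (text.includes("does not contain any stream")) {
    return "no_audio_track";
  }
  return "other";
}

// Inserts -err_detect ignore_err before the first -i, for a single retry.
function withErrorTolerance(ffmpegCommand) {
  if (ffmpegCommand.includes("-err_detect")) {
    return ffmpegCommand; // already present, leave unchanged
  }
  return ffmpegCommand.replace(/(^|\s)-i\s/, "$1-err_detect ignore_err -i ");
}

console.log(classifyExtractionError("Output file #0 does not contain any stream"));
// no_audio_track
console.log(withErrorTolerance("-i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}"));
// -err_detect ignore_err -i {{in_video}} -vn -acodec libmp3lame -q:a 2 {{out_audio}}
```

Videos labeled no_audio_track can be skipped and logged, while other failures get one retry with the tolerant command before being reported.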

Get started

The Starter plan at $9/mo includes 500 commands, enough to set up and test your audio extraction workflow.
