Automatically trim silence from video with ffmpeg and python

Donald Feury · Originally published at blog.feurious.com · 4 min read

Follow me on YouTube and Twitter for more video editing tricks with ffmpeg and a little python now!

I finally did it, I managed to figure out a little process to automatically remove the silent parts from a video.

Let me show y'all the process and the two main scripts I use to accomplish this.

Process

  1. Use ffmpeg's silencedetect filter to generate output of sections of the video's audio with silence
  2. Pipe that output through a few programs to get the output in the format that I want
  3. Save the output into a text file
  4. Use that text file in a python script that sections out the parts of the video with audio, and save the new version with the silence removed

Now, with the process laid out, let's look at the scripts doing the heavy lifting.

Scripts

Here is the script for generating the silence timestamp data:

#!/usr/bin/env sh

IN=$1
THRESH=$2
DURATION=$3

ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt

I'm passing in three arguments to this script:

  • IN - the file path to the video I want to analyze

  • THRESH - the volume threshold the filter uses to determine what counts as silence

  • DURATION - the length of time in seconds the audio needs to stay below the threshold to count as a section of silence

That leaves us with the actual ffmpeg command:

ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt

  • -hide_banner - hides the initial dump of info ffmpeg shows when you run it

  • -vn - ignore the input file's video stream; we only need the audio, and skipping the video stream speeds up the process a lot since ffmpeg doesn't have to demux and decode it.

  • -af "silencedetect=n=${THRESH}dB:d=${DURATION}" - detects the silence in the audio and writes the output to ffmpeg's log, which I pipe to other programs

The output of silencedetect looks like this:
[silencedetect @ 0x...] silence_start: 81.41988
[silencedetect @ 0x...] silence_end: 86.7141 | silence_duration: 5.29422

  • -f null - 2>&1 - don't actually write any output streams anywhere, and redirect stderr (where ffmpeg logs) into stdout so the filter's messages can be piped to the next program

  • grep "silence_end" - we first pipe the output to grep to keep only the lines that contain "silence_end"

  • awk '{print $5 " " $8}' > silence.txt - lastly, we pipe that output to awk, which prints the fifth and eighth fields (the end timestamp and the duration) into a text file
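If you'd rather do the filtering in Python instead of grep and awk, the same extraction can be sketched like this (`parse_silencedetect` is my own name for the helper, not part of any library):

```python
import re

def parse_silencedetect(log_lines):
    """Pull (silence_end, silence_duration) pairs out of ffmpeg's silencedetect log."""
    pairs = []
    for line in log_lines:
        # Lines of interest look like:
        # [silencedetect @ 0x...] silence_end: 86.7141 | silence_duration: 5.29422
        m = re.search(r"silence_end:\s*([\d.]+)\s*\|\s*silence_duration:\s*([\d.]+)", line)
        if m:
            pairs.append((float(m.group(1)), float(m.group(2))))
    return pairs

lines = [
    "[silencedetect @ 0x55] silence_start: 81.41988",
    "[silencedetect @ 0x55] silence_end: 86.7141 | silence_duration: 5.29422",
]
print(parse_silencedetect(lines))  # [(86.7141, 5.29422)]
```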

The final output looks like this:

86.7141 5.29422
108.398 5.57798
135.61 1.0805
165.077 1.06485
251.877 1.11594
283.377 5.21286
350.709 1.12472
362.749 1.24295
419.726 4.42077
467.997 5.4622
476.31 1.02338
546.918 1.35986

You might ask, why did I not grab the silence start timestamp? Because the two numbers I grabbed are the ending timestamp and the duration: if I just subtract the duration from the ending timestamp, I get the starting timestamp!
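In other words, each line of silence.txt converts to a start/end pair like so (using the first line of the sample output above):

```python
# One line of silence.txt: "<silence_end> <silence_duration>"
end, duration = "86.7141 5.29422".split()
silence_end = float(end)
silence_start = silence_end - float(duration)  # end minus duration gives the start
print(silence_start, silence_end)
```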

So finally we get to the python script that processes the timestamps. The script makes use of a python library called moviepy, you should check it out!

#!/usr/bin/env python

import sys
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Input file path
file_in = sys.argv[1]
# Output file path
file_out = sys.argv[2]
# Silence timestamps
silence_file = sys.argv[3]

# Ease in duration between cuts
try:
    ease = float(sys.argv[4])
except IndexError:
    ease = 0.0

minimum_duration = 1.0

def main():
    # number of clips generated
    count = 0
    # start of next clip
    last = 0

    in_handle = open(silence_file, "r", errors='replace')
    video = VideoFileClip(file_in)
    full_duration = video.duration
    clips = []
    while True:
        line = in_handle.readline()

        if not line:
            break

        end,duration = line.strip().split()

        to = float(end) - float(duration)

        start = float(last)
        clip_duration = float(to) - start
        # Clips shorter than one second don't seem to work
        print("Clip Duration: {} seconds".format(clip_duration))

        if clip_duration < minimum_duration:
            continue

        if full_duration - to < minimum_duration:
            continue

        if start > ease:
            start -= ease

        print("Clip {} (Start: {}, End: {})".format(count, start, to))
        clip = video.subclip(start, to)
        clips.append(clip)
        last = float(end)
        count += 1

    if full_duration - float(last) > minimum_duration:
        print("Clip {} (Start: {}, End: {})".format(count, last, 'EOF'))
        clips.append(video.subclip(float(last)-ease))

    processed_video = concatenate_videoclips(clips)
    processed_video.write_videofile(
        file_out,
        fps=60,
        preset='ultrafast',
        codec='libx264'
    )

    in_handle.close()
    video.close()

main()

Here I pass in three required arguments and one optional:

  • file_in - the input file to work on, should be the same as the one passed into the silence detection script

  • file_out - the file path to save the final version to

  • silence_file - the file path to the file generated by the silence detection

  • ease_in - a work-in-progress idea. I noticed the jumps between clips are kind of sudden and too abrupt, so I want to start each clip about half a second before the next one is supposed to begin, to make the cut less jarring.

You will see there is a minimum_duration; that is because I found in testing that moviepy will crash when trying to write out a clip that is less than a second long. There are a few sanity checks using it to decide whether a clip should be extracted yet. That part is still very rough.

I track when the next clip should start in the last variable, which records where the previous section of silence ended.

The logic for writing out clips works like so:

  • Get the starting timestamp of silence

  • Write out a clip from the end of the last section of silence, until the start of the next section of silence, and store it in a list

  • Store the end of the next section of silence in a variable

  • Repeat until all sections of silence are exhausted
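The steps above boil down to a pure function over the (end, duration) pairs. Here is a sketch (`compute_clips` is my own name, not from the script above) that returns the (start, end) ranges that would be kept, using the same minimum_duration and ease rules:

```python
def compute_clips(silences, full_duration, minimum_duration=1.0, ease=0.0):
    """Turn (silence_end, silence_duration) pairs into (start, end) ranges to keep."""
    clips = []
    last = 0.0  # where the previous section of silence ended
    for end, duration in silences:
        to = end - duration          # start of this silence = end minus duration
        start = last
        if to - start < minimum_duration:
            continue                 # too short for moviepy to write out
        if full_duration - to < minimum_duration:
            continue                 # too close to the end of the video
        if start > ease:
            start -= ease            # begin a touch early to soften the cut
        clips.append((start, to))
        last = end
    if full_duration - last > minimum_duration:
        clips.append((max(last - ease, 0.0), full_duration))  # the remainder
    return clips

print(compute_clips([(86.7141, 5.29422), (108.398, 5.57798)], 120.0, ease=0.5))
```

Running it on the first two lines of the sample silence.txt yields three ranges: everything before the first silence, the audio between the two silences (started half a second early), and the remainder of the video.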

Last, we write the remainder of the video as the final clip, use the concatenate_videoclips function from moviepy to combine the list of clips into one video clip, and call the write_videofile method of the VideoClip class to save the final output to the out path I passed into the script.

Tada! You now have a new version of the video with the silent parts removed!

I will try to show a before and after video of the process soon.
