Follow me on YouTube and Twitter for more video editing tricks with ffmpeg and a little python now!
I finally did it: I figured out a little process to automatically remove the silent parts from a video.
Let me show y'all the process and the two main scripts I use to accomplish this.
Process
- Use ffmpeg's silencedetect filter to report the silent sections of the video's audio
- Pipe that output through a few programs to get the output in the format that I want
- Save the output into a text file
- Use that text file in a python script that sections out the parts of the video with audio, and save the new version with the silence removed
Now, with the process laid out, let's look at the scripts doing the heavy lifting.
Scripts
Here is the script for generating the silence timestamp data:
#!/usr/bin/env sh
IN=$1
THRESH=$2
DURATION=$3
ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt
I'm passing in three arguments to this script:
- IN: the file path to the video I want to analyze
- THRESH: the volume threshold, in dB, that the filter uses to determine what counts as silence
- DURATION: the length of time in seconds the audio needs to stay below the threshold to count as a section of silence
That leaves us with the actual ffmpeg command:
ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt
- -hide_banner: hides the initial dump of info ffmpeg shows when you run it
- -vn: ignores the input file's video stream. We only need the audio, and skipping the video stream speeds up the process a lot because ffmpeg doesn't need to decode it
- -af "silencedetect=n=${THRESH}dB:d=${DURATION}": detects the silence in the audio and logs what it finds, which I pipe to other programs
The silence_end lines in the silencedetect output look like this (the address after the @ changes from run to run, so it is elided here):
[silencedetect @ 0x...] silence_end: 86.7141 | silence_duration: 5.29422
- -f null - 2>&1: -f null - tells ffmpeg to discard the processed streams instead of writing an output file. ffmpeg prints its log, including the silencedetect lines, to stderr, so 2>&1 redirects stderr to stdout where the pipe can read it
- grep "silence_end": we first pipe the output to grep, because I only want the lines that contain "silence_end"
- awk '{print $5 " " $8}' > silence.txt: lastly, we pipe that output to awk and print the fifth and eighth fields (the end timestamp and the duration) to a text file
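If you would rather do the grep and awk step in Python, here is a minimal sketch. It assumes you have already captured ffmpeg's log as one string; the function name parse_silence_ends and the sample log are made up for illustration (the numbers are borrowed from the first rows of silence.txt, and the 0x0 address is a placeholder). Because the regular expression scans the whole text instead of going line by line, it should not matter whether ffmpeg separated its messages with newlines or carriage returns.

```python
import re

# Matches the tail of a line such as:
#   [silencedetect @ 0x...] silence_end: 86.7141 | silence_duration: 5.29422
SILENCE_END = re.compile(
    r"silence_end:\s*(?P<end>-?[0-9.]+)\s*\|\s*"
    r"silence_duration:\s*(?P<duration>[0-9.]+)"
)


def parse_silence_ends(log_text):
    """Return a list of (end, duration) float pairs found in the log text."""
    return [
        (float(match.group("end")), float(match.group("duration")))
        for match in SILENCE_END.finditer(log_text)
    ]


# Hand-written sample log, not real ffmpeg output (assumption for the demo).
sample_log = (
    "[silencedetect @ 0x0] silence_start: 81.4199\n"
    "[silencedetect @ 0x0] silence_end: 86.7141 | silence_duration: 5.29422\n"
    "[silencedetect @ 0x0] silence_end: 108.398 | silence_duration: 5.57798\n"
)

for end, duration in parse_silence_ends(sample_log):
    # Same "end duration" layout that the awk command writes to silence.txt
    print(end, duration)
```

The silence_start line is skipped because it never matches the pattern, which is the same job grep does in the shell version.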
The final output looks like this:
86.7141 5.29422
108.398 5.57798
135.61 1.0805
165.077 1.06485
251.877 1.11594
283.377 5.21286
350.709 1.12472
362.749 1.24295
419.726 4.42077
467.997 5.4622
476.31 1.02338
546.918 1.35986
You might ask, why did I not grab the silence start timestamp? That is because those two numbers I grabbed were the ending timestamp and the duration. If I just subtract the duration from the ending timestamp, I get the starting timestamp!
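To make that subtraction concrete, here is a small sketch that turns "end duration" rows into (start, end) pairs. The function name silence_intervals is my own for this example; the two rows are the first two lines of the output above.

```python
def silence_intervals(lines):
    """Convert 'end duration' lines into (start, end) tuples of floats."""
    intervals = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        end, duration = float(parts[0]), float(parts[1])
        # start of the silence = end of the silence - how long it lasted
        intervals.append((end - duration, end))
    return intervals


rows = ["86.7141 5.29422", "108.398 5.57798"]
for start, end in silence_intervals(rows):
    print("silence from {:.3f} to {:.3f}".format(start, end))
```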
So finally we get to the Python script that processes the timestamps. The script makes use of a Python library called moviepy; you should check it out!
#!/usr/bin/env python
# Uses moviepy.editor and VideoFileClip.subclip, as found in moviepy 1.x
import sys

from moviepy.editor import VideoFileClip, concatenate_videoclips

# Input file path
file_in = sys.argv[1]
# Output file path
file_out = sys.argv[2]
# Silence timestamps
silence_file = sys.argv[3]

# Ease in duration between cuts
try:
    ease = float(sys.argv[4])
except IndexError:
    ease = 0.0

minimum_duration = 1.0


def main():
    # number of clips generated
    count = 0
    # start of next clip (the end of the previous section of silence)
    last = 0.0

    video = VideoFileClip(file_in)
    full_duration = video.duration
    clips = []

    with open(silence_file, "r", errors="replace") as in_handle:
        for line in in_handle:
            fields = line.split()
            if len(fields) != 2:
                # skip blank or malformed lines
                continue
            end, duration = float(fields[0]), float(fields[1])
            # the silence starts at its end timestamp minus its duration
            to = end - duration
            start = last
            clip_duration = to - start
            print("Clip Duration: {} seconds".format(clip_duration))
            # Clips shorter than one second don't seem to work
            if clip_duration < minimum_duration:
                continue
            if full_duration - to < minimum_duration:
                continue
            if start > ease:
                start -= ease
            print("Clip {} (Start: {}, End: {})".format(count, start, to))
            clips.append(video.subclip(start, to))
            last = end
            count += 1

    # Whatever is left after the final section of silence
    if full_duration - last > minimum_duration:
        print("Clip {} (Start: {}, End: {})".format(count, last, 'EOF'))
        # max() keeps the start from dropping below 0 when ease is larger than last
        clips.append(video.subclip(max(0.0, last - ease)))

    processed_video = concatenate_videoclips(clips)
    processed_video.write_videofile(
        file_out,
        fps=60,
        preset='ultrafast',
        codec='libx264'
    )
    video.close()


main()
Here I pass in three required arguments and one optional argument:
- file_in: the input file to work on; it should be the same file passed into the silence detection script
- file_out: the file path to save the final version to
- silence_file: the file path to the file generated by the silence detection script
- ease_in (optional): a work-in-progress concept. I noticed the jumps between the clips are kind of sudden and abrupt, so I want to start each clip about half a second before it is supposed to start to make the cut less jarring.
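Here is one way the padding idea could be sketched on its own, separate from moviepy. This is a variation, not what the script does today: apply_ease is a hypothetical helper, and it also stops the padded start from reaching back past the end of the previous segment, so two padded clips can never overlap.

```python
def apply_ease(segments, ease):
    """Start each (start, end) segment up to `ease` seconds earlier.

    The new start never goes below 0 or before the end of the previous
    segment, so the padded segments cannot overlap each other.
    """
    padded = []
    previous_end = 0.0
    for start, end in segments:
        padded.append((max(previous_end, start - ease), end))
        previous_end = end
    return padded


# Made-up segments in seconds, with half a second of padding
print(apply_ease([(0.0, 81.5), (86.5, 102.0)], 0.5))
```

With half a second of ease, the second segment in this made-up list starts at 86.0 instead of 86.5, while the first one stays at 0.0 because there is nothing before it to borrow from.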
You will see there is a minimum_duration. That is because I found in testing that moviepy will crash when trying to write out a clip that is less than a second long. A few sanity checks use it to decide whether a clip should be extracted yet or not. That part is still very rough, though.
I track when the next clip to be written out should start in the last variable, which holds the time the last section of silence ended.
The logic for writing out clips works like so:
- Get the starting timestamp of the next section of silence
- Write out a clip from the end of the last section of silence until the start of the next section of silence, and store it in a list
- Store the end of the next section of silence in a variable
- Repeat until all sections of silence are exhausted
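The steps above can be sketched without moviepy as a function that only works out which time spans to keep. This is a simplified variation under my own assumptions: keep_segments is a hypothetical name, silences are passed in as sorted (start, end) tuples, and a span shorter than minimum_duration is simply dropped while last still moves forward (the script above leaves last unchanged when it skips a clip).

```python
def keep_segments(silences, total_duration, minimum_duration=1.0):
    """Return the (start, end) spans between silences that are worth keeping.

    `silences` is a list of (start, end) tuples sorted by time.
    """
    segments = []
    last = 0.0  # end of the previous silence, where the next clip starts
    for silence_start, silence_end in silences:
        # Keep the audible span before this silence if it is long enough
        if silence_start - last >= minimum_duration:
            segments.append((last, silence_start))
        last = silence_end
    # Whatever follows the final silence
    if total_duration - last >= minimum_duration:
        segments.append((last, total_duration))
    return segments


# Made-up silences for a 30 second video
print(keep_segments([(5.0, 7.0), (7.5, 9.0), (20.0, 22.0)], 30.0))
```

In this made-up example the half second between 7.0 and 7.5 is too short to keep, so the result is three spans: 0 to 5, 9 to 20, and 22 to 30. Each span could then be handed to video.subclip(start, end) the way the script does.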
Last, we write the remainder of the video to the final clip. Then we pass the list of clips to moviepy's concatenate_videoclips function to combine them into one video clip, and call that clip's write_videofile method to save the final output to the out path I passed into the script.
Tada! You got a new version of the video with the silent parts removed!
I will try to show a before and after video of the process soon.
Top comments (1)
A fix I had to use for the original first script:
ffmpeg -hide_banner -vn -i $orig -af silencedetect -f null - 2>&1 | sed "s/\r/\n/" | awk '/_end/{print $5 " " $8}' > silence.txt
Since I am only using this for audio files, I decided to go with this: github.com/kkroening/ffmpeg-python...