Introduction
Problem : how to compress photos and videos to save storage.
TL;DR : Arch Linux + [ImageMagick -> JPEG XL / HEIC] + [ffmpeg -> MP4 HEVC].
I have a better solution than the previous one.
Better means a wider audience, more user-oriented, faster to set up, fewer steps, and intended to be enhanced by an AI agent of your choice.
Here I will mostly list the ideas and some critical steps; feel free to tailor them to your needs.
Setup
Yes, the Gentoo setup is efficient, but I find it a geek's way. Gentoo is great for those who love Gentoo.
Hence this time I will do the same with Arch Linux to simplify the setup.
Also, I will convert images to JPEG this time, thanks to the fantastic progress made by the JPEG XL team.
For videos I will stick to MP4 with HEVC.
pacman -Syu --needed \
ffmpeg \
imagemagick \
libheif \
libpng \
librsvg \
x264 \
x265 \
libfdk-aac \
lame \
opus \
libvpx \
jq \
perl-image-exiftool
Verify that HEIC support and the encoders are available :
magick -list format | grep -i HEIC
ffmpeg -encoders | grep -E 'libx264|libx265|libvpx|libopus|libmp3lame|libfdk_aac'
Feel free to feed this as is to an AI agent if you need any customisation.
Data verification
Okay, let's assume we have a set of files. First, let's see which file types it contains :
ls -R ./ | awk -F. '/\./ {print $NF}' | sort -u
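If a count per extension is useful too, the one-liner can be extended roughly as follows. This is only a sketch: the function name count_extensions is made up for illustration, it assumes GNU find (for -printf), and it ignores files without a dot in their name.

```shell
# Illustrative helper (not one of the original scripts): count files per
# extension, ignoring case, so "JPG" and "jpg" land in the same bucket.
count_extensions() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*.*' -printf '%f\n' \
        | awk -F. '{ print tolower($NF) }' \
        | sort | uniq -c | sort -k1,1nr -k2,2
}
```

Calling count_extensions ./input prints one line per extension, most frequent first, which makes stray formats easy to spot before converting anything.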
Then see if we have duplicates that differ only by letter case :
#!/bin/bash
# Find all subdirectories and process files in each subdirectory separately
find . -type d | while read -r dir; do
# Find files in the current directory
find "$dir" -maxdepth 1 -type f | \
# Remove the path, leaving only the filename
sed 's/.*\///' | \
# Convert filenames to lowercase for case-insensitive comparison
tr '[:upper:]' '[:lower:]' | \
# Sort the filenames
sort | \
# Find duplicates in the sorted list
uniq -d | \
# Print the duplicates with their directory path
while read -r filename; do
echo "Duplicates in '$dir':"
find "$dir" -maxdepth 1 -type f -iname "$filename"
done
done
Then convert all filenames to lower case :
#!/bin/bash
# Find all files recursively
find ./ -type f | while read -r file; do
dir=$(dirname "$file")
base=$(basename "$file")
# Convert the filename to lowercase
lower_base=$(echo "$base" | tr '[:upper:]' '[:lower:]')
# If the filename is different (case insensitive), do the two-step rename
if [[ "$base" != "$lower_base" ]]; then
# Step 1: Rename to an intermediate name (tmp_<lowercase filename>)
mv "$file" "$dir/tmp_$lower_base"
# Step 2: Rename to the final lowercase name (removing tmp_ prefix)
mv "$dir/tmp_$lower_base" "$dir/$lower_base"
fi
done
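Before renaming anything, a dry run can show what would change. The sketch below is an assumption of how such a preview could look (preview_lowercase is a hypothetical name); like the script above, it only touches file names, not directory names.

```shell
# Illustrative dry run: print "old -> new" for every file whose name
# contains upper-case characters, without renaming anything.
preview_lowercase() {
    local dir="${1:-.}"
    find "$dir" -type f | sort | while IFS= read -r file; do
        base=${file##*/}                                   # file name only
        lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
        if [ "$base" != "$lower" ]; then
            printf '%s -> %s\n' "$file" "${file%/*}/$lower"
        fi
    done
}
```

If two lines of the preview end in the same target name, the rename would overwrite one of the files, so those are worth resolving by hand first.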
I would also recommend keeping filenames simple: digits, lower-case letters and underscores only. Dots in the middle of a name might make the processing harder for ffmpeg and for the scripts below, which split names on the last dot.
For example :
OK :
- a_123.mp4
- a_123.mp4.metadata.json
NOK :
- a_123.b.mp4
- a_123.b.mp4.metadata.json
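The naming rule can be checked automatically. The following is a minimal sketch, assuming that a sidecar is always named after the media file plus one extra segment and .json (as in the examples above); is_simple_name is a hypothetical helper, not part of the original scripts.

```shell
# Illustrative check: succeed only for names made of lower-case letters,
# digits and underscores, with a single extension and an optional
# "<segment>.json" sidecar suffix.
is_simple_name() {
    printf '%s\n' "$1" \
        | LC_ALL=C grep -Eq '^[a-z0-9_]+\.[a-z0-9]+(\.[a-z0-9_-]+\.json)?$'
}
```

For example, is_simple_name a_123.mp4 succeeds, while is_simple_name a_123.b.mp4 fails, so the function can be used in a loop over find output to list the files that need renaming.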
Data preparation
You may get inspired by the following file + metadata collection script :
#!/bin/bash
INPUT_DIR="./input"
OUTPUT_DIR="./output"
mkdir -p "$OUTPUT_DIR"
find "$INPUT_DIR" -type f \( -iname "*.mov" -o -iname "*.mp4" \) | while read -r video_file; do
filename=$(basename "$video_file")
echo "Copying video: $filename"
cp "$video_file" "$OUTPUT_DIR/"
# Copy matching JSON sidecars
find "$(dirname "$video_file")" \
-maxdepth 1 \
-type f \
-iname "${filename}*.json" \
-exec cp {} "$OUTPUT_DIR/" \;
done
echo "Done."
Image processing
As mentioned, I will compress everything to JPEG this time, as it can later be converted to JPEG XL without loss. You may use HEIC as well.
It's faster to update the metadata, compress and resize in one go :
#!/bin/bash
set -Eeuo pipefail
###############################################################################
# Configuration
###############################################################################
INPUT_DIR="./input"
OUTPUT_DIR="./output"
OUTPUT_FIX_DIR="./output_to_fix"
MAX_SIZE=1280
JPG_QUALITY=90
mkdir -p "$OUTPUT_DIR"
mkdir -p "$OUTPUT_FIX_DIR"
###############################################################################
find_json_sidecar() {
local img="$1"
for candidate in "$img".*.json; do
[[ -f "$candidate" ]] || continue
echo "$candidate"
return 0
done
return 1
}
###############################################################################
# Main
###############################################################################
find "$INPUT_DIR" -type f \( \
-iname "*.jpg" -o \
-iname "*.jpeg" -o \
-iname "*.heic" \
\) -print0 | while IFS= read -r -d '' img_file
do
echo
echo "===================================================="
echo "Processing: $img_file"
filename=$(basename "$img_file")
basename_noext="${filename%.*}"
json_file=""
timestamp=""
description=""
###########################################################################
# Read JSON sidecar
###########################################################################
if json_file=$(find_json_sidecar "$img_file"); then
echo "JSON: $json_file"
timestamp=$(jq -r '
.photoTakenTime.timestamp //
.creationTime.timestamp //
empty
' "$json_file")
description=$(jq -r '
.description //
empty
' "$json_file")
else
echo "JSON not found"
fi
###########################################################################
# Fallback to existing EXIF
###########################################################################
if [[ -z "$timestamp" ]]; then
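exif_date=$(exiftool -s3 -DateTimeOriginal "$img_file" 2>/dev/null || true)
if [[ -n "$exif_date" ]]; then
# EXIF dates look like "YYYY:MM:DD HH:MM:SS"; GNU date needs dashes in the date part
iso_date="${exif_date:0:4}-${exif_date:5:2}-${exif_date:8:2} ${exif_date:11}"
timestamp=$(date -d "$iso_date" +%s 2>/dev/null || true)
if [[ -n "$timestamp" ]]; then
echo "Using EXIF timestamp"
fi
fi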
fi
###########################################################################
# No timestamp -> output_to_fix
###########################################################################
if [[ -z "$timestamp" ]]; then
output_file="${OUTPUT_FIX_DIR}/${basename_noext}.jpg"
echo "No timestamp available -> $output_file"
magick "$img_file" \
-auto-orient \
-quality "$JPG_QUALITY" \
"$output_file"
continue
fi
###########################################################################
# Build dates
###########################################################################
exif_date=$(date -d @"$timestamp" "+%Y:%m:%d %H:%M:%S" 2>/dev/null || true)
file_date=$(date -d @"$timestamp" "+%Y%m%d_%H%M%S" 2>/dev/null || true)
if [[ -z "$file_date" ]]; then
output_file="${OUTPUT_FIX_DIR}/${basename_noext}.jpg"
echo "Invalid timestamp -> $output_file"
magick "$img_file" \
-auto-orient \
-quality "$JPG_QUALITY" \
"$output_file"
continue
fi
###########################################################################
# Output filename
###########################################################################
output_file="${OUTPUT_DIR}/${file_date}_${basename_noext}.jpg"
counter=1
while [[ -e "$output_file" ]]; do
output_file="${OUTPUT_DIR}/${file_date}_${basename_noext}_${counter}.jpg"
((counter++))
done
###########################################################################
# Determine orientation and resize only if needed
###########################################################################
dimensions=$(identify -format "%wx%h" "$img_file")
width=${dimensions%x*}
height=${dimensions#*x}
echo "Dimensions: ${width}x${height}"
if (( height >= width )); then
resize_arg="x${MAX_SIZE}>"
else
resize_arg="${MAX_SIZE}x>"
fi
###########################################################################
# Convert to JPG + resize
###########################################################################
magick "$img_file" \
-auto-orient \
-resize "$resize_arg" \
-quality "$JPG_QUALITY" \
"$output_file"
###########################################################################
# Restore metadata
###########################################################################
if [[ -n "$exif_date" ]]; then
exiftool -overwrite_original \
-DateTimeOriginal="$exif_date" \
-CreateDate="$exif_date" \
-ModifyDate="$exif_date" \
"$output_file" >/dev/null 2>&1
fi
if [[ -n "$description" && "$description" != "null" ]]; then
exiftool -overwrite_original \
-Description="$description" \
-ImageDescription="$description" \
"$output_file" >/dev/null 2>&1
fi
echo "Created: $output_file"
done
echo
echo "===================================================="
echo "Done."
At this point all your images are compressed to around 300 kB each.
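The date handling is the part of the image script that is easiest to get wrong, so here it is in isolation. This is a sketch that assumes GNU date; the function names exif_to_epoch and epoch_to_prefix are made up for illustration. Note that GNU date does not accept the colons EXIF uses in the date part, hence the conversion to dashes.

```shell
# Convert an EXIF date ("YYYY:MM:DD HH:MM:SS") to a Unix timestamp.
# The colons of the date part are swapped for dashes first, because
# GNU date cannot parse "YYYY:MM:DD".
exif_to_epoch() {
    local exif="$1" date_part time_part
    date_part=$(printf '%s' "${exif%% *}" | tr ':' '-')
    time_part=${exif#* }
    date -d "$date_part $time_part" +%s
}

# Convert a Unix timestamp to the "YYYYMMDD_HHMMSS" filename prefix.
epoch_to_prefix() {
    date -d "@$1" +%Y%m%d_%H%M%S
}
```

Both functions use the local time zone, like the script above; set TZ explicitly (for example TZ=UTC) if the archive should not depend on the machine it was processed on.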
Video processing
Let's apply the same logic and proceed with the metadata update, conversion and rescaling in one go :
#!/bin/bash
INPUT_DIR="./input"
OUTPUT_DIR="./output"
OUTPUT_DIR_TO_FIX="./mp4_to_fix"
mkdir -p "$OUTPUT_DIR"
mkdir -p "$OUTPUT_DIR_TO_FIX"
find "$INPUT_DIR" -type f \( -iname "*.mp4" -o -iname "*.mov" \) | while read -r video_file; do
echo "Processing: $video_file"
filename=$(basename "$video_file")
extension="${filename##*.}"
extension_lower=$(echo "$extension" | tr '[:upper:]' '[:lower:]')
base_name="${filename%.*}"
# Look for Google Photos JSON sidecar
json_file=$(find "$(dirname "$video_file")" \
-maxdepth 1 \
-type f \
-iname "${filename}*.json" \
| head -n 1)
formatted_date=""
new_filename=""
if [[ -f "$json_file" ]]; then
echo "JSON found: $json_file"
creation_time=$(jq -r '.photoTakenTime.timestamp' "$json_file")
if [[ "$creation_time" =~ ^[0-9]+$ ]]; then
formatted_date=$(date -d @"$creation_time" +"%Y-%m-%d %H:%M:%S" 2>/dev/null)
fi
fi
if [[ -z "$formatted_date" ]]; then
echo "Missing or invalid timestamp."
last_dir=$(basename "$(dirname "$video_file")")
cp "$video_file" \
"${OUTPUT_DIR_TO_FIX}/${last_dir}_${filename}"
continue
fi
# Get dimensions
dimensions=$(ffprobe -v error \
-select_streams v:0 \
-show_entries stream=width,height \
-of csv=s=x:p=0 \
"$video_file")
width=$(echo "$dimensions" | cut -d'x' -f1)
height=$(echo "$dimensions" | cut -d'x' -f2)
echo "Dimensions: ${width}x${height}"
scale_filter=""
if [[ "$height" -ge "$width" && "$height" -gt 1280 ]]; then
scale_filter="scale=-2:1280"
echo "Scaling to -2:1280"
elif [[ "$width" -gt "$height" && "$width" -gt 1280 ]]; then
scale_filter="scale=1280:-2"
echo "Scaling to 1280:-2"
else
echo "No scaling required"
fi
filename_date=$(date -d "$formatted_date" +"%Y%m%d_%H%M%S")
output_file="${OUTPUT_DIR}/${filename_date}_${base_name}.mp4"
echo "Output: $output_file"
# -nostdin keeps ffmpeg from reading the file list piped into the loop,
# which would otherwise make the loop skip or mangle the following files
ffmpeg_args=(-nostdin -y -i "$video_file")
if [[ -n "$scale_filter" ]]; then
ffmpeg_args+=(-vf "$scale_filter")
fi
ffmpeg "${ffmpeg_args[@]}" \
-r 30 \
-c:v libx265 \
-crf 28 \
-preset medium \
-c:a aac \
-b:a 192k \
-metadata creation_time="$formatted_date" \
"$output_file"
done
echo "Processing complete."
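The scaling decision in the script above can be isolated into a small function, which makes it easy to try against a few resolutions without running ffmpeg. This is only a sketch: pick_scale_filter is a hypothetical name and the third argument (the maximum side, 1280 by default) is an assumption added for illustration.

```shell
# Print the ffmpeg scale filter for a given width and height, or nothing
# when the longest side is already within the limit.
# -2 lets ffmpeg keep the aspect ratio and round that side to an even number.
pick_scale_filter() {
    local width="$1" height="$2" max="${3:-1280}"
    if [ "$height" -ge "$width" ] && [ "$height" -gt "$max" ]; then
        echo "scale=-2:${max}"       # portrait or square: limit the height
    elif [ "$width" -gt "$height" ] && [ "$width" -gt "$max" ]; then
        echo "scale=${max}:-2"       # landscape: limit the width
    fi
}
```

For a 1920x1080 video this prints scale=1280:-2, for 1080x1920 it prints scale=-2:1280, and for 640x480 it prints nothing, meaning no -vf option is needed.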
Some of your videos may have issues that prevent ffmpeg from processing the whole batch correctly, so removing the already converted inputs before a rerun might be helpful :
INPUT_DIR="./input"
OUTPUT_DIR="./output"
find "$OUTPUT_DIR" -type f | while read -r output_file; do
filename=$(basename "$output_file")
search_name="${filename:16}"
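search_name="${search_name%.*}"
# Skip empty names: "**" would match, and delete, every input file
[[ -n "$search_name" ]] || continue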
echo "Looking for: $search_name"
find "$INPUT_DIR" -type f -name "*${search_name}*" -print -delete
done
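Since the cleanup deletes files, a preview that only prints the matches is a cheap safety net. The sketch below assumes the output names follow the "YYYYMMDD_HHMMSS_<original name>" scheme produced above; source_stem and preview_cleanup are hypothetical helpers.

```shell
# Recover the original name (without extension) from an output file name
# such as "20000101_000000_a_123.mp4".
source_stem() {
    local name=${1##*/}                       # drop the directory
    name=${name%.*}                           # drop the extension
    printf '%s\n' "${name#????????_??????_}"  # drop "YYYYMMDD_HHMMSS_"
}

# List the input files the cleanup would delete, without deleting them.
preview_cleanup() {
    local input_dir="$1" output_dir="$2"
    find "$output_dir" -type f | sort | while IFS= read -r out; do
        stem=$(source_stem "$out")
        [ -n "$stem" ] || continue
        find "$input_dir" -type f -name "*${stem}*" -print
    done
}
```

Keep in mind that the match is a substring match, so a stem like a_1 would also select a_12.mp4; and an output file left behind by a failed ffmpeg run still counts as converted, so such partial outputs are best removed before the cleanup.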
You may combine both into one monstrous script; I have listed them separately to make the idea easier to understand.
At this point you have all your videos processed.
Data export
Now let's adjust the names of the files for housekeeping :
#!/bin/bash
# Set the input and output directories
input_dir="./input"
output_dir="./output"
# Make sure the output directory exists, create it if it doesn't
mkdir -p "$output_dir"
# Counter variable, starting at 1
counter=1
# Loop through sorted .jpg files, handling files with spaces correctly
find "$input_dir" -type f -name "*.jpg" | sort | while IFS= read -r file; do
# Extract the date prefix of the filename (the first two underscore-separated fields)
base_name=$(basename "$file")
prefix=$(echo "$base_name" | cut -d'_' -f1-2) # Extract "20000101_000000" part
# Build the new filename with a 4-digit counter and "_jpg_" prefix
new_filename=$(printf "%s_jpg_%04d.jpg" "$prefix" "$counter")
# Copy the file to the output directory with the new filename
cp "$file" "$output_dir/$new_filename"
# Increment the counter
((counter++))
done
echo "Files renamed and copied to $output_dir"
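The naming scheme itself is easiest to see on a single file. A small sketch, with export_name as a hypothetical helper that takes a path and a counter:

```shell
# Build the export name: keep the "YYYYMMDD_HHMMSS" prefix of the file
# and append "_jpg_" plus a zero-padded 4-digit counter.
export_name() {
    local base=${1##*/} prefix
    prefix=$(printf '%s' "$base" | cut -d'_' -f1-2)
    printf '%s_jpg_%04d.jpg\n' "$prefix" "$2"
}
```

Here export_name ./input/20000101_000000_a_123.jpg 7 prints 20000101_000000_jpg_0007.jpg: the original name is dropped and only the date and the running number remain.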
Summary
Now all your images and videos are ready to be exported.
And you have done something eco-friendly, given that El Niño is on the rise and this year is expected to be record hot.