hexfloor

Posted on Jun 9

Photo and video processing in 2026 : more space in the cloud

#linux #productivity #cloud #ai

Introduction

Problem : how to compress photos and videos to save storage.
TL;DR : Arch Linux + [ImageMagick -> JPEG XL / HEIC] + [ffmpeg -> MP4 HEVC].
I have a better solution than the previous one.
Better means a wider audience, more user-oriented, faster to set up, fewer steps, and intended to be enhanced by an AI agent of your choice.
Here I will mostly list the ideas and some critical steps, feel free to tailor them to your needs.

Setup

Yes, the Gentoo setup is efficient, but I find it a geek's way. Gentoo is great for those who love Gentoo.
So this time I will do the same with Arch Linux to simplify the setup.
I will also convert images to JPEG this time, thanks to the fantastic progress made by the JPEG XL team.
For videos I will stick to MP4 with HEVC.

# perl-image-exiftool provides exiftool, which the image script relies on
pacman -Syu --needed \
    ffmpeg \
    imagemagick \
    perl-image-exiftool \
    libheif \
    libpng \
    librsvg \
    x264 \
    x265 \
    libfdk-aac \
    lame \
    opus \
    libvpx \
    jq

verify

magick -list format | grep -i HEIC
ffmpeg -encoders | grep -E 'libx264|libx265|libvpx|libopus|libmp3lame|libfdk_aac'

Feel free to feed this as is to an AI agent if you need any customisation.
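Before running the longer scripts, a quick check that every command they call is on the PATH can save a failed batch. This is only a minimal sketch: the helper name `missing_tools` is made up for this example, and the tool list in the comment is simply the set of commands the scripts in this post invoke.

```shell
#!/bin/bash

# Print every command from the argument list that is not available on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call with the tools used in this post:
# missing_tools magick identify ffmpeg ffprobe exiftool jq
```

An empty output means everything was found; any printed name still has to be installed.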

Data verification

Okay, let's assume we have a set of files. Let's see which file types it contains :

find . -type f -name '*.*' | awk -F. '{print $NF}' | sort -u
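If you also want to know how many files there are per type, a small variation can count them. This is a sketch under my own assumptions: the function name `count_extensions` is hypothetical, and extensions are folded to lower case so that `JPG` and `jpg` end up in the same bucket.

```shell
#!/bin/bash

# Print "extension count" pairs for all files under a directory,
# folding extensions to lower case so JPG and jpg are counted together.
count_extensions() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*.*' \
        | awk -F. '{ n[tolower($NF)]++ } END { for (e in n) print e, n[e] }' \
        | sort
}

# Example call:
# count_extensions ./input
```

Files without any dot in their name are skipped, since they have no extension to report.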

Then check for duplicates that differ only by letter case :

#!/bin/bash

# Find all subdirectories and process files in each subdirectory separately
find . -type d | while read -r dir; do
    # Find files in the current directory
    find "$dir" -maxdepth 1 -type f | \
        # Remove the path, leaving only the filename
        sed 's/.*\///' | \
        # Convert filenames to lowercase for case-insensitive comparison
        tr '[:upper:]' '[:lower:]' | \
        # Sort the filenames
        sort | \
        # Find duplicates in the sorted list
        uniq -d | \
        # Print the duplicates with their directory path
        while read -r filename; do
            echo "Duplicates in '$dir':"
            find "$dir" -maxdepth 1 -type f -iname "$filename"
        done
done

Then convert all filenames to lower case :

#!/bin/bash

# Find all files recursively
find ./ -type f | while read -r file; do
    dir=$(dirname "$file")
    base=$(basename "$file")

    # Convert the filename to lowercase
    lower_base=$(echo "$base" | tr '[:upper:]' '[:lower:]')

    # If the name contains upper-case characters, do the two-step rename
    if [[ "$base" != "$lower_base" ]]; then
        # Step 1: Rename to an intermediate name (tmp_<lowercase filename>)
        mv "$file" "$dir/tmp_$lower_base"

        # Step 2: Rename to the final lowercase name (removing tmp_ prefix)
        mv "$dir/tmp_$lower_base" "$dir/$lower_base"
    fi
done
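Since the rename changes files in place, it may be worth previewing it first. The following is a minimal dry-run sketch, not part of the original workflow: `preview_lowercase` is a name chosen for this example, and it only prints what would be renamed.

```shell
#!/bin/bash

# List the renames the lower-casing step would perform, without touching any file.
preview_lowercase() {
    local dir="${1:-.}" file base lower
    find "$dir" -type f | sort | while IFS= read -r file; do
        base=$(basename "$file")
        lower="${base,,}"   # bash 4+ lower-case expansion
        if [[ "$base" != "$lower" ]]; then
            echo "$base -> $lower"
        fi
    done
}

# Example call:
# preview_lowercase ./input
```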

I also recommend keeping filenames simple: digits, lower-case letters and underscores only. Dots in the middle of a name might make the processing harder for ffmpeg.
For example :
OK :

a_123.mp4
a_123.mp4.metadata.json

NOK :

a_123.b.mp4
a_123.b.mp4.metadata.json
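The naming rule above can be checked mechanically. Here is a sketch with a hypothetical helper, `is_simple_name`; the regular expression is my reading of the OK / NOK examples (a stem, one extension, and optionally one sidecar suffix ending in `.json`), so adjust it if your sidecars are named differently.

```shell
#!/bin/bash

# Succeed if a filename follows the "simple name" convention:
# a stem of [a-z0-9_], one extension, optionally one sidecar suffix ending in .json
is_simple_name() {
    local LC_ALL=C   # make [a-z] mean ASCII lower case only
    local re='^[a-z0-9_]+\.[a-z0-9]+(\.[a-z0-9_-]+\.json)?$'
    [[ "$1" =~ $re ]]
}

# Example: report every file under ./input that breaks the convention
# find ./input -type f | while IFS= read -r f; do
#     is_simple_name "$(basename "$f")" || echo "NOK: $f"
# done
```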

Data preparation

You may get inspired by the following file + metadata collection script :

#!/bin/bash

INPUT_DIR="./input"
OUTPUT_DIR="./output"

mkdir -p "$OUTPUT_DIR"

find "$INPUT_DIR" -type f \( -iname "*.mov" -o -iname "*.mp4" \) | while read -r video_file; do

    filename=$(basename "$video_file")

    echo "Copying video: $filename"
    cp "$video_file" "$OUTPUT_DIR/"

    # Copy matching JSON sidecars
    find "$(dirname "$video_file")" \
        -maxdepth 1 \
        -type f \
        -iname "${filename}*.json" \
        -exec cp {} "$OUTPUT_DIR/" \;

done

echo "Done."

Image processing

As mentioned, I will compress everything to JPEG this time, since JPEG can later be converted losslessly to JPEG XL. You may use HEIC as well.
It's faster to update the metadata, compress and resize in one go :

#!/bin/bash

set -Eeuo pipefail

###############################################################################
# Configuration
###############################################################################

INPUT_DIR="./input"
OUTPUT_DIR="./output"
OUTPUT_FIX_DIR="./output_to_fix"

MAX_SIZE=1280
JPG_QUALITY=90

mkdir -p "$OUTPUT_DIR"
mkdir -p "$OUTPUT_FIX_DIR"

###############################################################################

find_json_sidecar() {
    local img="$1"

    for candidate in "$img".*.json; do
        [[ -f "$candidate" ]] || continue
        echo "$candidate"
        return 0
    done

    return 1
}

###############################################################################
# Main
###############################################################################

find "$INPUT_DIR" -type f \( \
    -iname "*.jpg" -o \
    -iname "*.jpeg" -o \
    -iname "*.heic" \
\) -print0 | while IFS= read -r -d '' img_file
do

    echo
    echo "===================================================="
    echo "Processing: $img_file"

    filename=$(basename "$img_file")
    basename_noext="${filename%.*}"

    json_file=""
    timestamp=""
    description=""

    ###########################################################################
    # Read JSON sidecar
    ###########################################################################

    if json_file=$(find_json_sidecar "$img_file"); then

        echo "JSON: $json_file"

        timestamp=$(jq -r '
            .photoTakenTime.timestamp //
            .creationTime.timestamp //
            empty
        ' "$json_file")

        description=$(jq -r '
            .description //
            empty
        ' "$json_file")

    else
        echo "JSON not found"
    fi

    ###########################################################################
    # Fallback to existing EXIF
    ###########################################################################

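    if [[ -z "$timestamp" ]]; then

        exif_date=$(exiftool -s3 -DateTimeOriginal "$img_file" 2>/dev/null || true)

        if [[ -n "$exif_date" ]]; then
            # EXIF dates look like "2024:01:31 12:00:00"; GNU date needs dashes in the date part
            iso_date="${exif_date/:/-}"
            iso_date="${iso_date/:/-}"
            timestamp=$(date -d "$iso_date" +%s 2>/dev/null || true)

            if [[ -n "$timestamp" ]]; then
                echo "Using EXIF timestamp"
            fi
        fi
    fi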

    ###########################################################################
    # No timestamp -> output_to_fix
    ###########################################################################

    if [[ -z "$timestamp" ]]; then

        output_file="${OUTPUT_FIX_DIR}/${basename_noext}.jpg"

        echo "No timestamp available -> $output_file"

        magick "$img_file" \
            -auto-orient \
            -quality "$JPG_QUALITY" \
            "$output_file"

        continue
    fi

    ###########################################################################
    # Build dates
    ###########################################################################

    exif_date=$(date -d @"$timestamp" "+%Y:%m:%d %H:%M:%S" 2>/dev/null || true)

    file_date=$(date -d @"$timestamp" "+%Y%m%d_%H%M%S" 2>/dev/null || true)

    if [[ -z "$file_date" ]]; then

        output_file="${OUTPUT_FIX_DIR}/${basename_noext}.jpg"

        echo "Invalid timestamp -> $output_file"

        magick "$img_file" \
            -auto-orient \
            -quality "$JPG_QUALITY" \
            "$output_file"

        continue
    fi

    ###########################################################################
    # Output filename
    ###########################################################################

    output_file="${OUTPUT_DIR}/${file_date}_${basename_noext}.jpg"

    counter=1

    while [[ -e "$output_file" ]]; do
        output_file="${OUTPUT_DIR}/${file_date}_${basename_noext}_${counter}.jpg"
        ((counter++))
    done

    ###########################################################################
    # Determine orientation and resize only if needed
    ###########################################################################

    dimensions=$(identify -format "%wx%h" "$img_file")

    width=${dimensions%x*}
    height=${dimensions#*x}

    echo "Dimensions: ${width}x${height}"

    if (( height >= width )); then
        resize_arg="x${MAX_SIZE}>"
    else
        resize_arg="${MAX_SIZE}x>"
    fi

    ###########################################################################
    # Convert to JPG + resize
    ###########################################################################

    magick "$img_file" \
        -auto-orient \
        -resize "$resize_arg" \
        -quality "$JPG_QUALITY" \
        "$output_file"

    ###########################################################################
    # Restore metadata
    ###########################################################################

    if [[ -n "$exif_date" ]]; then

        exiftool -overwrite_original \
            -DateTimeOriginal="$exif_date" \
            -CreateDate="$exif_date" \
            -ModifyDate="$exif_date" \
            "$output_file" >/dev/null 2>&1
    fi

    if [[ -n "$description" && "$description" != "null" ]]; then

        exiftool -overwrite_original \
            -Description="$description" \
            -ImageDescription="$description" \
            "$output_file" >/dev/null 2>&1
    fi

    echo "Created: $output_file"

done

echo
echo "===================================================="
echo "Done."
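The date handling in the script above boils down to two `date` calls on the same Unix timestamp. The sketch below isolates that step so it can be tried on its own; `dates_from_epoch` is a hypothetical helper name, and it assumes GNU `date` (as shipped with Arch Linux).

```shell
#!/bin/bash

# Turn a Unix timestamp into the two date strings used by the image script:
# an EXIF-style date and a sortable filename prefix (both in the local time zone).
dates_from_epoch() {
    local ts="$1"
    [[ "$ts" =~ ^[0-9]+$ ]] || return 1
    date -d "@$ts" "+%Y:%m:%d %H:%M:%S"
    date -d "@$ts" "+%Y%m%d_%H%M%S"
}

# Example call:
# dates_from_epoch 1700000000
```

Because both strings use the local time zone, running the batch on machines with different `TZ` settings can give different filename prefixes for the same photo.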

At this point all your images are compressed to around 300 kB each.
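The later lossless step from JPEG to JPEG XL is not part of the script above. As a hedged sketch, it could look like the following, assuming the `cjxl` encoder from libjxl is installed and that your version accepts `--lossless_jpeg=1`; the function names `jxl_target` and `jpeg_to_jxl` are made up for this example.

```shell
#!/bin/bash

# Map "dir/name.jpg" to "dir/name.jxl".
jxl_target() {
    echo "${1%.*}.jxl"
}

# Repack every .jpg under a directory as JPEG XL, keeping the originals.
# Assumption: cjxl is available and --lossless_jpeg=1 requests a reversible
# transcode of the existing JPEG data instead of a new lossy encode.
jpeg_to_jxl() {
    local dir="${1:-./output}" jpg
    command -v cjxl >/dev/null 2>&1 || { echo "cjxl not found" >&2; return 1; }
    find "$dir" -type f -name '*.jpg' -print0 | while IFS= read -r -d '' jpg; do
        cjxl "$jpg" "$(jxl_target "$jpg")" --lossless_jpeg=1
    done
}

# Example call:
# jpeg_to_jxl ./output
```

The originals are left in place on purpose, so you can compare sizes and check the results before deleting anything.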

Video processing

Let's apply the same logic and proceed with the metadata update, conversion and rescaling in one go :

#!/bin/bash

INPUT_DIR="./input"
OUTPUT_DIR="./output"
OUTPUT_DIR_TO_FIX="./mp4_to_fix"

mkdir -p "$OUTPUT_DIR"
mkdir -p "$OUTPUT_DIR_TO_FIX"

find "$INPUT_DIR" -type f \( -iname "*.mp4" -o -iname "*.mov" \) | while read -r video_file; do

    echo "Processing: $video_file"

    filename=$(basename "$video_file")
    extension="${filename##*.}"
    extension_lower=$(echo "$extension" | tr '[:upper:]' '[:lower:]')

    base_name="${filename%.*}"

    # Look for Google Photos JSON sidecar
    json_file=$(find "$(dirname "$video_file")" \
        -maxdepth 1 \
        -type f \
        -iname "${filename}*.json" \
        | head -n 1)

    formatted_date=""
    new_filename=""

    if [[ -f "$json_file" ]]; then
        echo "JSON found: $json_file"

        creation_time=$(jq -r '.photoTakenTime.timestamp' "$json_file")

        if [[ "$creation_time" =~ ^[0-9]+$ ]]; then
            formatted_date=$(date -d @"$creation_time" +"%Y-%m-%d %H:%M:%S" 2>/dev/null)
        fi
    fi

    if [[ -z "$formatted_date" ]]; then
        echo "Missing or invalid timestamp."

        last_dir=$(basename "$(dirname "$video_file")")
        cp "$video_file" \
           "${OUTPUT_DIR_TO_FIX}/${last_dir}_${filename}"

        continue
    fi

    # Get dimensions
    dimensions=$(ffprobe -v error \
        -select_streams v:0 \
        -show_entries stream=width,height \
        -of csv=s=x:p=0 \
        "$video_file")

    width=$(echo "$dimensions" | cut -d'x' -f1)
    height=$(echo "$dimensions" | cut -d'x' -f2)

    echo "Dimensions: ${width}x${height}"

    scale_filter=""

    if [[ "$height" -ge "$width" && "$height" -gt 1280 ]]; then
        scale_filter="scale=-2:1280"
        echo "Scaling to -2:1280"
    elif [[ "$width" -gt "$height" && "$width" -gt 1280 ]]; then
        scale_filter="scale=1280:-2"
        echo "Scaling to 1280:-2"
    else
        echo "No scaling required"
    fi

    filename_date=$(date -d "$formatted_date" +"%Y%m%d_%H%M%S")

    output_file="${OUTPUT_DIR}/${filename_date}_${base_name}.mp4"

    echo "Output: $output_file"

    # Add the scale filter only when one was selected above
    vf_args=()
    if [[ -n "$scale_filter" ]]; then
        vf_args=(-vf "$scale_filter")
    fi

    # -nostdin stops ffmpeg from reading the file list that feeds this loop
    ffmpeg -nostdin -y \
        -i "$video_file" \
        "${vf_args[@]}" \
        -r 30 \
        -c:v libx265 \
        -crf 28 \
        -preset medium \
        -c:a aac \
        -b:a 192k \
        -metadata creation_time="$formatted_date" \
        "$output_file"

done

echo "Processing complete."
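The scaling decision in the script is easy to test in isolation. The sketch below extracts it into a function; `scale_filter_for` is a hypothetical name, and the third argument (the maximum side, 1280 by default) is added here only for illustration.

```shell
#!/bin/bash

# Print the ffmpeg scale filter that caps the longer side at a maximum,
# or nothing when the video is already small enough.
# -2 lets ffmpeg compute the other side, keeping the aspect ratio and an even value.
scale_filter_for() {
    local width="$1" height="$2" max="${3:-1280}"
    if (( height >= width && height > max )); then
        echo "scale=-2:${max}"
    elif (( width > height && width > max )); then
        echo "scale=${max}:-2"
    fi
}

# Example calls:
# scale_filter_for 1920 1080   # landscape, prints scale=1280:-2
# scale_filter_for 640 480     # small enough, prints nothing
```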

Some videos may have issues that prevent ffmpeg from processing the whole batch correctly, so before a retry it may help to remove the inputs that have already been converted :

INPUT_DIR="./input"
OUTPUT_DIR="./output"

find "$OUTPUT_DIR" -type f | while read -r output_file; do

    filename=$(basename "$output_file")

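    # Strip the 16-character "YYYYMMDD_HHMMSS_" prefix and the extension
    search_name="${filename:16}"
    search_name="${search_name%.*}"

    # Skip names without the prefix: an empty pattern would match every input file
    [[ -n "$search_name" ]] || continue

    echo "Looking for: $search_name"

    # Deletes the source video and its sidecars, so check the output first
    find "$INPUT_DIR" -type f -name "*${search_name}*" -print -delete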

done
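The cleanup relies on the fixed-width date prefix that the video script puts in front of each name. Here is a small sketch of just that string handling, with a hypothetical helper name `original_stem`, so you can check what would be searched for before anything gets deleted.

```shell
#!/bin/bash

# Recover the original stem from an output name such as
# "20240101_120000_a_123.mp4": drop the 16-character "YYYYMMDD_HHMMSS_"
# prefix and the extension.
original_stem() {
    local name
    name=$(basename "$1")
    name="${name:16}"
    echo "${name%.*}"
}

# Example call:
# original_stem ./output/20240101_120000_a_123.mp4
```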

You may combine both into one monstrous script; I have listed them separately to keep the idea easy to follow.
At this point you have all your videos processed.

Data export

Now let's adjust the names of the files for housekeeping :

#!/bin/bash

# Set the input and output directories
input_dir="./input"
output_dir="./output"

# Make sure the output directory exists, create it if it doesn't
mkdir -p "$output_dir"

# Counter variable, starting at 1
counter=1

# Loop through sorted .jpg files, handling files with spaces correctly
find "$input_dir" -type f -name "*.jpg" | sort | while IFS= read -r file; do
    # Extract the date prefix (the first two underscore-separated fields)
    base_name=$(basename "$file")
    prefix=$(echo "$base_name" | cut -d'_' -f1-2)  # Extract "20000101_000000" part

    # Build the new filename with a 4-digit counter and "_jpg_" prefix
    new_filename=$(printf "%s_jpg_%04d.jpg" "$prefix" "$counter")

    # Copy the file to the output directory with the new filename
    cp "$file" "$output_dir/$new_filename"

    # Increment the counter
    ((counter++))
done

echo "Files renamed and copied to $output_dir"
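The renaming rule itself is a `cut` plus a `printf`. The sketch below pulls it into a function, `export_name` (a name chosen for this example), which makes it easy to try on a single filename before copying a whole directory.

```shell
#!/bin/bash

# Build the export name from a processed file name and a running counter:
# keep the "YYYYMMDD_HHMMSS" prefix, then add "_jpg_" and a 4-digit counter.
export_name() {
    local base prefix
    base=$(basename "$1")
    prefix=$(echo "$base" | cut -d'_' -f1-2)
    printf '%s_jpg_%04d.jpg\n' "$prefix" "$2"
}

# Example call:
# export_name ./input/20000101_000000_a_123.jpg 7
```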

Summary

Now all your images and videos are ready to be exported.
You have also done something eco-friendly, given that El Niño is on the rise and this year is expected to be record hot.

DEV Community