Max Wheeler

How I Digitized over 20 Years of Home Videos and Photos and Built an AI-Powered Pipeline into Immich

I recently found myself staring at a closet full of MiniDV tapes, a stack of VHS cassettes, and many boxes of printed photos — more than 20 years of family memories sitting in formats that are one hardware failure away from being gone forever.

My wife and I also had growing Apple Photos libraries on our Macs that were eating through iCloud storage. I wanted all of it — the tapes, the prints, the phone photos — in one place, searchable, and under my control.

The answer for the primary library turned out to be Immich, a self-hosted Google Photos alternative, running on a Synology NAS. I chose a two-bay Synology with 6TB of usable storage (two drives configured as a mirror, so a single drive failure doesn't cost me anything) and bumped the RAM to 16GB, which left plenty of headroom for Immich's machine-learning and thumbnail-generation workloads.

"Under my control" has its own anxieties, though. If I'm replacing iCloud with a NAS sitting in my closet, I want an offsite backup. I set up Synology's Hyper Backup to push everything to AWS S3 Glacier Deep Archive — cheap to store (roughly $1/TB/month), slow to retrieve (hours), which is exactly the right tradeoff for a disaster-recovery copy of 20 years of family memories.

But getting legacy media into Immich in a way that's actually useful (with correct dates, descriptions, and organization) required building some tooling. I've open-sourced all of it: github.com/maxwheeler/immich-tools.

Here's what I built and what I learned.

The Setup

Here's the infrastructure I used:

  • NAS: Synology, running Immich in Docker (see the official Docker Compose install guide). I lean heavily on Immich's external libraries feature so I can keep files in a directory structure of my choosing rather than handing full ownership of storage to Immich.
  • Capture hardware: Sony DCR-PC100 camcorder (MiniDV via FireWire), VCR, ClearClick Video2USB (VHS tapes), Epson FastFoto FF-680W (photos)
  • Processing: MacBook with ffmpeg, exiftool, and Python 3
  • AI: Anthropic's Claude API for scene descriptions

A quick side quest: CD-ROM backups

Before I got to the tapes and prints, I dealt with a stack of CD-ROMs from the closet — backups I'd burned from a handful of early digital cameras. The good news: those files already had correct EXIF dates, so no scripting was required. The bad news: I hadn't owned a machine with an optical drive in years.

A $25 Amicool External DVD Drive from Amazon solved it. I copied each disc into its own directory on the NAS, pointed Immich's external library at the parent folder, and the photos slotted into the timeline on their original dates with zero additional work. If only everything had been that easy.

Problem 1: MiniDV Tapes

MiniDV was the prosumer video format of the early 2000s. The tapes are small, the quality is surprisingly good (DV codec, 720x480), and the cameras used FireWire for digital transfer — meaning you can capture the original digital stream with no quality loss.

Capture

Getting bits off the camcorder took more shopping than I expected. Modern Macs don't have FireWire, so I chained three adapters together to bridge the gap between the Sony DCR-PC100 and a USB-C port:

  1. PASOW FireWire Cable 9-pin to 4-pin — connects the camera to the FireWire adapter
  2. Apple Thunderbolt to FireWire Adapter — converts FireWire to Thunderbolt 2
  3. Apple Thunderbolt 3 (USB-C) to Thunderbolt 2 Adapter — gets the signal into a modern Mac

With the hardware sorted, I put the camera in Playback mode and recorded into OBS Studio using its default "Hybrid MOV" output, which produced a clean .mov file ready to feed into the rest of the pipeline. Two practical tips:

  • Each 60-minute tape produces roughly a 3GB MOV file, so plan disk space accordingly.
  • Use OBS's output timer to auto-stop recording after the tape length. I set it slightly long and let the --trim-deadspace flag on dv_scene_splitter.py (covered below) handle any trailing static. With that in place I could load a tape, press record, and walk away.

The Problem with Raw Captures

A raw tape capture is one giant 60-minute file. That's not useful in a photo library. Each tape contains dozens of distinct scenes — birthday parties, vacations, random Tuesday afternoons — all concatenated together with hard cuts between them.

I needed to:

  1. Detect where scenes change
  2. Figure out what each scene contains
  3. Split the file into individual clips with descriptive names
  4. Embed metadata that Immich can read
  5. Set an approximate recording date — pulled from whatever we'd scrawled on the tape label, or an educated guess — so clips land in the right neighborhood on the timeline

dv_scene_splitter.py

I built a Python script that handles the entire pipeline. Here's what it does:

Scene detection uses PySceneDetect with its ContentDetector, which analyzes frame-to-frame pixel differences to find hard cuts. DV camcorder footage has clean cuts between scenes (you literally pressed the record button), so this works well out of the box — but the defaults produced too many tiny micro-scenes for my taste, with every pan or exposure change spawning its own clip. After some experimentation I settled on --threshold 45 (a bit less sensitive than the default) and --min-scene-len 20 (ignore any "scene" shorter than 20 seconds), which matched how I actually wanted to browse the footage.

python dv_scene_splitter.py tape_001.mov \
  --date 2003-07-04 \
  --trim-deadspace \
  --min-scene-len 20 \
  --threshold 45
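The intuition behind ContentDetector is easy to sketch in pure Python: measure how much each frame differs from the previous one, and flag a cut when the per-pixel change exceeds the threshold. (This toy version works on flat lists of grayscale values; the real detector is PySceneDetect's, which compares frames in HSV space and handles video decoding for you.)

```python
# Toy illustration of content-based cut detection. Frames are flat lists
# of grayscale pixel values (0-255); the real pipeline uses PySceneDetect.
def find_cuts(frames, threshold=45.0):
    """Return indices where the mean absolute pixel change exceeds threshold."""
    cuts = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i]))
        if diff / len(frames[i]) > threshold:
            cuts.append(i)
    return cuts

steady = [[100] * 16] * 5           # five near-identical frames
cutaway = [[200] * 16] * 3          # hard cut to a much brighter scene
print(find_cuts(steady + cutaway))  # → [5]
```

Raising `threshold` is what suppresses the micro-scenes: small pans and exposure drifts produce modest frame-to-frame differences that stay under the bar, while a hard cut between recording sessions blows past it.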

AI-powered descriptions are the interesting part. For each detected scene, the script extracts a representative frame (sampled at 20% into the scene to avoid transition artifacts) and sends it to Claude's vision API. The prompt asks for two things: a short filename-safe label (like "kids_playing_in_backyard") and a detailed 1-2 sentence description.

This means my clips end up with names like 003_birthday_cake_cutting.mov instead of scene_003.mov, and Immich can surface them when I search for "birthday" or "cake."
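The per-scene request boils down to one image plus a two-part prompt. Here's a sketch of how the payload is assembled, assuming the Anthropic Python SDK's messages format — the prompt wording and the `build_message` helper are illustrative, not the script's exact code:

```python
# Build the messages payload for a vision request to Claude.
# The prompt asks for a filename-safe label plus a longer description.
import base64

PROMPT = (
    "Describe this home-video frame. Reply with two lines: "
    "1) a short filename-safe label (lowercase, underscores), "
    "2) a detailed 1-2 sentence description."
)

def build_message(jpeg_bytes):
    """Return the `messages` list to pass to client.messages.create(...)."""
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/jpeg",
                        "data": base64.b64encode(jpeg_bytes).decode()}},
            {"type": "text", "text": PROMPT},
        ],
    }]

msg = build_message(b"\xff\xd8\xff")  # stand-in JPEG bytes
print(msg[0]["content"][0]["type"])   # → image
```

The label half of the reply becomes the filename slug; the description half goes into the EXIF fields covered below.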

Metadata embedding was the trickiest part. I initially tried using FFmpeg's -metadata flag to set descriptions, but Immich ignores container-level metadata. After some digging, I found that Immich reads ImageDescription and XMP:Description — so the script uses exiftool to write those fields after FFmpeg splits the clip.

exiftool -overwrite_original \
  -ImageDescription="Kids blowing out candles on a chocolate birthday cake in the kitchen" \
  -XMP-dc:Description="Kids blowing out candles on a chocolate birthday cake in the kitchen" \
  -XMP-dc:Title="birthday cake cutting" \
  003_birthday_cake_cutting.mov

Date stamping ensures clips appear in the correct position on the Immich timeline. The --date flag sets a base recording date, and each clip gets that time offset by its position in the tape. So if the tape was recorded on July 4, 2003, scene 1 starts at 14:00:00, scene 2 at 14:03:22, etc.
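The offset arithmetic is trivial but worth making concrete — each clip's timestamp is just the base date plus that scene's start offset within the tape (the 14:00:00 anchor time here is an illustrative assumption):

```python
# One datetime per clip: base recording date + tape-relative scene offset.
from datetime import datetime, timedelta

def clip_timestamps(base_date, scene_starts, start_time="14:00:00"):
    """scene_starts: each scene's start offset into the tape, in seconds."""
    base = datetime.strptime(f"{base_date} {start_time}", "%Y-%m-%d %H:%M:%S")
    return [base + timedelta(seconds=s) for s in scene_starts]

stamps = clip_timestamps("2003-07-04", [0, 202, 415])
print([s.strftime("%H:%M:%S") for s in stamps])  # → ['14:00:00', '14:03:22', '14:06:55']
```

Preserving the relative offsets means clips from one tape sort in their original order on the timeline even though the wall-clock times are invented.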

Deadspace detection handles a common annoyance with home video: the camera was often left running after the last real scene, recording minutes of lens cap, floor, or a static shot of the couch. The script uses OpenCV to analyze the last few scenes, checking for low pixel variance (blank/uniform frames) and low inter-frame difference (no motion). If detected, those scenes are trimmed automatically with --trim-deadspace.
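The two checks combine into a simple predicate. Here's a hedged sketch of the heuristic on plain Python lists — the real script uses OpenCV frame arrays, and the threshold values below are illustrative assumptions, not the script's tuned numbers:

```python
# A frame is "deadspace" if it is near-uniform (low pixel variance, e.g. a
# lens cap or blank wall) AND barely changes from the previous frame.
from statistics import pvariance

VARIANCE_FLOOR = 50.0  # assumed: below this, the frame is near-uniform
DIFF_FLOOR = 2.0       # assumed: mean abs per-pixel change implying no motion

def is_deadspace(frame, prev_frame):
    """frame/prev_frame: flat lists of grayscale pixel values (0-255)."""
    low_variance = pvariance(frame) < VARIANCE_FLOOR
    mean_diff = sum(abs(a - b) for a, b in zip(frame, prev_frame)) / len(frame)
    return low_variance and mean_diff < DIFF_FLOOR

lens_cap = [3] * 100               # static, near-black frame
busy = list(range(100))            # high-variance frame with detail
print(is_deadspace(lens_cap, lens_cap))  # → True
print(is_deadspace(busy, lens_cap))      # → False
```

Requiring both conditions matters: a static shot of the couch has low motion but real variance, so the variance check alone would also catch legitimate locked-off shots.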

NAS-friendly staging was a late addition after I noticed that writing many small files to an SMB share was painfully slow and sometimes caused write errors. The script now stages all clips locally in ~/dv_splits, then moves them to the output directory in a single batch once all processing is complete.
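The staging pattern itself is a few lines — write everything locally, then sweep the whole batch onto the share at the end (directory names in this demo are stand-ins for `~/dv_splits` and the SMB mount):

```python
# Stage-then-move: do the many small writes on the local disk, then push
# the finished clips to the (slow) network share in one pass.
import shutil
import tempfile
from pathlib import Path

def batch_move(staging: Path, dest: Path):
    """Move every file from the local staging dir into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    for f in sorted(staging.iterdir()):
        shutil.move(str(f), dest / f.name)

# Demo with temp dirs standing in for ~/dv_splits and the SMB share.
stage = Path(tempfile.mkdtemp())
share = Path(tempfile.mkdtemp()) / "clips"
(stage / "001_birthday.mov").write_bytes(b"...")
batch_move(stage, share)
print(sorted(p.name for p in share.iterdir()))  # → ['001_birthday.mov']
```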

A typical tape produces 20-40 clips and takes about 5-10 minutes to process (most of that is the API calls for descriptions). The AI descriptions cost roughly $0.10-$0.50 per tape.

VHS: the same tool, a longer errand

VHS capture was the same problem with more errands. The tapes themselves were fine, but nobody in my house had owned a working VCR in at least 15 years — mine had died quietly on a shelf, and the one I eventually used was borrowed from my father-in-law. On the Mac side, a ClearClick Video2USB dongle handled the analog-to-digital hop: VCR composite out → dongle → USB port → capture with the bundled software.

The resulting .mov files have VHS-era quality (so don't expect miracles) but they feed into dv_scene_splitter.py exactly the same way as the MiniDV captures — scene detection, AI descriptions, date stamping, the works. The tooling didn't care about the source format.

Problem 2: Scanned Photos

I found someone local in the SF Bay Area (via Facebook Marketplace) who rents out an Epson FastFoto FF-680W for $50/day. It's a sheet-fed scanner designed for batch photo scanning: you load a stack of prints and it dumps everything into a folder.

The workflow I settled on:

  1. Sort physical photos into groups by approximate date, based on notes we'd written on the envelopes or, occasionally, dates printed on the backs of the photos
  2. Pre-create date-named folders with create_monthly_folders.py
  3. Scan each group into its corresponding folder
  4. Run set_photo_dates.py to stamp the correct dates on all files

set_photo_dates.py

This script walks through subdirectories of a root folder, parses the directory name as a date, and sets the creation/modification dates on all photo files inside. It handles a wide variety of date formats — 2024-07-04, July 4 2024, 20240704, etc.

# Preview what will happen
python set_photo_dates.py /path/to/scanned/photos --dry-run

# Apply dates
python set_photo_dates.py /path/to/scanned/photos

On macOS, it sets both the creation date (using SetFile from Xcode CLI tools) and the modification date, so when these files land in Immich, they appear in the correct spot on the timeline rather than all clustering on the scan date.
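The core logic is a folder-name parser plus a timestamp stamp. Here's a hedged sketch — the format list is illustrative rather than the script's exact set, and `os.utime` below covers only the access/modification times (the real script also shells out to `SetFile` for the macOS creation date):

```python
# Parse a directory name as a date, then stamp that date onto its files.
import os
from datetime import datetime

FORMATS = ["%Y-%m-%d", "%B %d %Y", "%Y%m%d"]  # illustrative, not exhaustive

def parse_folder_date(name):
    for fmt in FORMATS:
        try:
            return datetime.strptime(name, fmt)
        except ValueError:
            continue
    return None  # not a date-named folder; skip it

def stamp_folder(folder):
    when = parse_folder_date(os.path.basename(folder))
    if when is None:
        return
    ts = when.timestamp()
    for entry in os.scandir(folder):
        if entry.is_file():
            os.utime(entry.path, (ts, ts))  # access + modification times

print(parse_folder_date("July 4 2024"))  # → 2024-07-04 00:00:00
```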

Problem 3: Organizing External Libraries in Immich

Immich has a feature called "external libraries" that lets you point it at an existing directory of media without copying the files into Immich's own storage. This is great for a NAS setup — my 1.1TB photo library lives on the Synology's 6TB mirrored volume, and Immich just indexes it in place.

The limitation is that external libraries don't automatically create albums — and albums matter, because sharing with other Immich users is album-based. If I want my wife to see the "Vacation 2024" photos, those photos need to be in an album. If you have a folder structure like:

/photos/
├── Vacation 2024/
├── Christmas 2023/
└── Kids Soccer/

Immich will index all the files but they'll just show up in the main timeline — no album organization.

immich_album_from_library.py

This script bridges the gap. Given a directory path (as it appears inside the Immich Docker container) and an album name, it finds all matching assets via the Immich API and adds them to the album.

python immich_album_from_library.py \
    --server http://your-nas:2283 \
    --api-key YOUR_API_KEY \
    --root-path "/mnt/photos/Vacation 2024" \
    --album "Vacation 2024"

One gotcha that tripped me up: the --root-path must be the path as seen inside the Docker container, not the path on your NAS filesystem. Check your docker-compose.yml volume mounts to get the right path.

Problem 4: Apple Photos → Immich

I also wanted my iPhone photos available in Immich without paying for ever-growing iCloud storage. I've since extended this to my wife's library too — she's a blogger with a very large photo collection, so finally getting off a paid iCloud tier was especially satisfying there. The solution was simpler than the video pipeline:

  1. A shell script rsyncs the Apple Photos library from our Macs to the NAS every hour
  2. A macOS LaunchAgent triggers the sync automatically
  3. Immich's external library points at the originals subdirectory inside the synced Photos library package

The key insight is that Photos Library.photoslibrary is actually a macOS package (a directory that Finder displays as a single file). Inside it, the originals/ subdirectory contains the actual image and video files in a folder structure that Immich can index.

Setting it up on the Mac

The moving parts are all in the launchd/ directory of the repo:

  • photo-sync.sh — the actual rsync command, which reads its source/destination/log paths from ~/.config/photo-sync/env so nothing sensitive is hardcoded in the script
  • com.immich-tools.photo-sync.plist.template — a LaunchAgent plist template with a StartInterval of 3600 seconds (one hour)
  • install-launchd.sh — an installer that fills in the template with your actual paths, drops the plist into ~/Library/LaunchAgents/, and loads it with launchctl
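For reference, a LaunchAgent plist for this job looks roughly like the following — this is my hand-written approximation of what the template contains, with placeholder paths where the installer would substitute real ones; only the `StartInterval` of 3600 seconds comes from the repo:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.immich-tools.photo-sync</string>
  <key>ProgramArguments</key>
  <array><string>/path/to/photo-sync.sh</string></array>
  <key>StartInterval</key><integer>3600</integer>
  <key>StandardOutPath</key><string>/Users/you/photo-sync.log</string>
  <key>StandardErrorPath</key><string>/Users/you/photo-sync.log</string>
</dict>
</plist>
```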

The one-time setup looks like this:

# One-time: set up SSH key auth so rsync doesn't prompt for a password
ssh-copy-id user@your-nas-ip

# Install the LaunchAgent
cd launchd
./install-launchd.sh

# Edit the generated config with your source/dest
vim ~/.config/photo-sync/env
# Set PHOTO_SYNC_DEST="user@nas:/volume1/photos/backup"

After that, the sync runs unattended every hour. Output goes to ~/photo-sync.log so I can sanity-check that it's still working. To remove it later, ./install-launchd.sh --uninstall unloads the agent and removes the plist.

The SSH key bit is the non-obvious piece — rsync is running from a background LaunchAgent with no terminal, so there's nowhere to type a password. Set up passwordless auth to the NAS first, or the sync will fail every hour, with the only evidence buried in the log file.

This setup means I can delete photos from the Macs to free up local storage and shrink my iCloud plan, while keeping everything accessible and searchable in Immich.

What I Learned

Metadata is everything. The difference between a pile of files and a usable library is metadata — dates, descriptions, locations. Invest the time to get it right during ingestion, because fixing it later is painful.

Immich reads specific EXIF fields. Don't waste time with FFmpeg's -metadata for descriptions. Use exiftool to write ImageDescription and XMP:Description.

External libraries are powerful but have quirks. Metadata edits made through the Immich UI on external library assets are stored only in the database — they're not written back to the files. If Immich rescans the library, your edits can be overwritten. Always set metadata on the source files before importing.

AI descriptions are worth the cost. Being able to search "birthday cake" or "beach sunset" across 20 years of home video is genuinely magical.

Stage locally, then move to the NAS. Writing many small files over SMB is slow and error-prone. Batch your writes.

Try It Yourself

All the scripts are open source: github.com/maxwheeler/immich-tools

  • dv_scene_splitter.py — MiniDV scene detection, AI descriptions, and splitting
  • immich_album_from_library.py — Create albums from external library paths
  • set_photo_dates.py — Stamp dates on scanned photos from folder names
  • create_monthly_folders.py — Pre-create date-named folders for scanning
  • LaunchAgent for automated Apple Photos → NAS sync

If you've got old tapes or photos sitting in a closet, I'd encourage you to start digitizing. The hardware isn't getting any younger, and the process turned out to be more rewarding than I expected — I found footage I'd completely forgotten about.
