I take a lot of photos. Like most people, I'd been dumping them into Google Photos for years, mostly for the face-grouping feature — that quiet magic where it just knows which photos have your kid, your friend, your dog in them. Then I started wondering: could I build that myself, on my own hardware, without shipping every photo of my family to a cloud provider?
Turns out, yes — and it became one of the more satisfying infrastructure projects I've built in a while.
This post walks through the architecture, the interesting technical decisions, and a few of the harder problems I ran into along the way.
What it does
At a high level: you upload photos, the system detects faces in them, groups similar faces into "people," and gives you a dashboard to browse, tag, and correct those groupings — all running locally via Docker, no external API calls, no subscription.
Three repositories make up the project:
- face-pipeline — a FastAPI service that does the actual computer vision work
- face-pipeline-ui — a Laravel application that provides the dashboard, uploads, and real-time progress
- photo-organizer — the deployment layer, tying everything together with Docker Compose
The face recognition pipeline
The core of the project is InsightFace, using two of its models in sequence:
- SCRFD for face detection — finding the bounding boxes of faces in a photo
- ArcFace for generating embeddings — turning each detected face into a vector that captures its distinguishing features
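ArcFace is trained so that faces of the same person point in similar directions in embedding space, which is why embeddings are usually compared with cosine similarity on L2-normalized vectors. Here is a minimal sketch of that comparison in NumPy. The random 512-dimensional vectors are made-up stand-ins for real ArcFace output, not data from this project.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit length."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.clip(norms, 1e-12, None)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means same direction (likely same face); around 0.0 means unrelated."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))

# Synthetic stand-ins for ArcFace embeddings.
rng = np.random.default_rng(42)
face_a = rng.normal(size=512)
face_a_again = face_a + rng.normal(scale=0.05, size=512)  # same face, slight variation
face_b = rng.normal(size=512)                              # an unrelated face

print(round(cosine_similarity(face_a, face_a_again), 3))  # close to 1
print(round(cosine_similarity(face_a, face_b), 3))        # close to 0
```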
Once you have embeddings, the interesting problem becomes: how do you group them into "people" without knowing in advance how many people exist, or manually labeling anything?
I used HDBSCAN, a density-based clustering algorithm that doesn't require you to specify the number of clusters upfront — a good fit here since you have no idea how many distinct people are in a given photo library. Face embeddings get stored in pgvector (a Postgres extension for vector similarity search), which makes it fast to compare new faces against existing clusters.
Storing embeddings in pgvector matters more than it sounds. Naively, you'd re-cluster the entire dataset every time a new photo comes in, which is expensive and gets worse as your library grows. Instead, the pipeline does incremental centroid matching: each existing cluster has a centroid (essentially, its "average face"), and new detections get compared against those centroids first. Only when something doesn't confidently match an existing cluster does it get treated as a new person or an ambiguous case for review. This keeps ongoing processing fast even as the library scales.
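The matching step can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the pipeline's actual code: the 0.5 similarity threshold and the running-mean centroid update are hypothetical choices. In the real system the nearest-centroid lookup presumably runs inside Postgres, where pgvector's `<=>` operator returns cosine distance.

```python
import numpy as np

def match_face(embedding, centroids, threshold=0.5):
    """Compare one new embedding against every cluster centroid.

    Returns (best centroid index, similarity) if the best cosine similarity
    clears the threshold, otherwise (None, best similarity) so the face can be
    treated as a new person or sent for review.
    """
    if len(centroids) == 0:
        return None, 0.0
    e = embedding / np.linalg.norm(embedding)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = c @ e                      # cosine similarity to each centroid
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return best, float(sims[best])
    return None, float(sims[best])

def update_centroid(centroid, count, embedding):
    """Fold one newly matched face into a cluster's running-mean centroid."""
    unit = embedding / np.linalg.norm(embedding)
    return (centroid * count + unit) / (count + 1), count + 1

centroids = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(match_face(np.array([0.9, 0.1, 0.0]), centroids))  # matches cluster 0
print(match_face(np.array([0.0, 0.0, 1.0]), centroids))  # no match: new person or review
```

The point of the design is that each new face costs one comparison per cluster, rather than one full clustering run over the whole library.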
To keep cluster quality high, two filters run before anything reaches the clustering stage:
- Blur scoring, using variance-of-Laplacian, to filter out faces too blurry to be useful
- Relative face-size filtering, to discard faces that are too small in the frame to produce a reliable embedding
Garbage in, garbage out applies especially hard to face clustering — a handful of bad embeddings can pull a whole cluster in the wrong direction.
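Both filters are cheap to compute. Below is a NumPy-only sketch, assuming grayscale face crops and `(x1, y1, x2, y2)` bounding boxes. The common OpenCV one-liner for the blur score is `cv2.Laplacian(gray, cv2.CV_64F).var()`. This version applies the same 3x3 kernel but skips border pixels, so its values differ slightly. The `keep_face` helper, its size definition, and both thresholds are hypothetical, not the project's settings.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian response; low values mean few sharp edges (blurry)."""
    g = gray.astype(np.float64)
    # Kernel [[0,1,0],[1,-4,1],[0,1,0]] applied to interior pixels via slicing.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def keep_face(face_gray, bbox, image_shape, min_sharpness=100.0, min_relative_size=0.05):
    """Hypothetical quality gate: reject tiny or blurry faces before embedding."""
    x1, y1, x2, y2 = bbox
    img_h, img_w = image_shape[:2]
    # One possible definition: face's shorter side relative to image's shorter side.
    relative_size = min(x2 - x1, y2 - y1) / min(img_h, img_w)
    if relative_size < min_relative_size:
        return False
    return laplacian_variance(face_gray) >= min_sharpness

flat = np.full((64, 64), 128, dtype=np.uint8)  # no edges at all, like heavy blur
checker = (np.indices((64, 64)).sum(axis=0) % 2 * 255).astype(np.uint8)  # all edges

print(laplacian_variance(flat))           # 0.0
print(laplacian_variance(checker) > 100)  # True
```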
The application layer
The FastAPI service is deliberately narrow — it does computer vision and nothing else. Everything user-facing lives in a Laravel app: project management, photo uploads, browsing detected people, correcting misclassifications, and a real-time dashboard showing pipeline progress.
That real-time piece uses Laravel Reverb, Laravel's self-hosted WebSocket server. When you upload a batch of photos, they get processed asynchronously through the pipeline, and the UI updates live as each stage completes — detection, embedding, clustering — rather than making you refresh or poll.
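The post doesn't detail how stage completions travel from the pipeline to Reverb. Still, the shape of the flow (each photo moves through the stages and emits an event as each one finishes) can be sketched with asyncio. Everything here is hypothetical: `run_stage` is a placeholder for the real CV work, and the `on_progress` callback stands in for whatever eventually triggers a Reverb broadcast.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

STAGES = ("detection", "embedding", "clustering")

# (photo, completed stage) -> awaitable; in the real app this would end in a broadcast.
ProgressCallback = Callable[[str, str], Awaitable[None]]

async def run_stage(stage: str, photo: str) -> None:
    # Placeholder for SCRFD detection, ArcFace embedding, or clustering.
    await asyncio.sleep(0)

async def process_photo(photo: str, on_progress: ProgressCallback) -> None:
    for stage in STAGES:
        await run_stage(stage, photo)
        await on_progress(photo, stage)  # one event per finished stage

async def process_batch(photos: Iterable[str], on_progress: ProgressCallback) -> None:
    # Photos in a batch are processed concurrently; stages within a photo stay ordered.
    await asyncio.gather(*(process_photo(p, on_progress) for p in photos))

events = []

async def record(photo: str, stage: str) -> None:
    events.append((photo, stage))

asyncio.run(process_batch(["a.jpg", "b.jpg"], record))
print(events)
```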
Two deployment modes
One thing I wanted from the start: this shouldn't be an "advanced users only" project. So photo-organizer ships two ways to run it:
Single-container mode: everything (web server, queue workers, scheduler, WebSocket server) runs in one container. It's the simplest possible setup: clone, docker compose up, done. Good for trying it out or running it on modest hardware.
Multi-container mode — the same Laravel app split across independent containers for the web server, Horizon (queue workers), the scheduler, and Reverb, each able to scale independently. If you're processing a large photo library and want to throw more workers at the queue without touching anything else, this is the mode for that.
Splitting a working single-container setup into multiple containers turned out to be its own small adventure. Docker networking has opinions about what localhost means, and they don't survive the transition: inside a container, localhost is that container itself, not the host or the other services. I also hit a genuinely subtle bug where WebSocket broadcasts silently failed once the app was split, because the same environment variable was quietly doing two incompatible jobs: telling the browser where to connect, and telling the backend where to publish. That's a story for its own post, but the short version is: if you're planning to split a monolithic Docker container into services, audit every environment variable for hidden dual purposes before you do.
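A script can't tell you which variables serve two purposes. It can, however, produce a shortlist: any value pointing at localhost means something different once services live in separate containers. Here is a minimal sketch of that audit. The helper and the example variable names are hypothetical, not taken from the project's configuration.

```python
import os
import re
from typing import Mapping

LOOPBACK = re.compile(r"\b(localhost|127\.0\.0\.1)\b")

def loopback_vars(env: Mapping[str, str]) -> list:
    """Names of variables whose values point at a loopback address."""
    return sorted(name for name, value in env.items() if LOOPBACK.search(value))

# Hypothetical .env contents for illustration.
example = {
    "APP_URL": "http://localhost:8000",
    "WS_HOST": "localhost",
    "DB_HOST": "postgres",
}
print(loopback_vars(example))            # ['APP_URL', 'WS_HOST']
print(loopback_vars(dict(os.environ)))   # audit the current environment
```

Each flagged name still needs a human answer to one question: who reads this value, the browser or another container?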
What I'd do differently
A few honest reflections, since a lot of the value in writing these things up is being straight about the rough edges:
- Cluster tuning takes patience. HDBSCAN's min_cluster_size and min_samples parameters meaningfully change how aggressively faces get grouped, and the right values depend on your specific photo library: how many photos per person, how much lighting and angle variation, and so on. I exposed them as configurable environment variables rather than hardcoding a "correct" value, because there isn't one.
- Shared storage between services needs care. The FastAPI pipeline and the Laravel app both need access to the same photo and face-crop files, which means getting file permissions right across containers running as different users. A small thing, but it cost more debugging time than the computer vision code did.
- Incremental matching is worth the complexity. I almost skipped it in favor of simpler full re-clustering, and I'm glad I didn't — the difference in how the system feels as your library grows is significant.
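Reading the HDBSCAN knobs from the environment might look like the sketch below. The variable names HDBSCAN_MIN_CLUSTER_SIZE and HDBSCAN_MIN_SAMPLES and the default of 5 are assumptions for illustration, so check the face-pipeline repository for the real ones. Leaving min_samples unset falls back to HDBSCAN's own default, which is to use min_cluster_size.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClusterSettings:
    min_cluster_size: int
    min_samples: Optional[int]  # None -> HDBSCAN uses min_cluster_size

def load_cluster_settings(env: Mapping = os.environ) -> ClusterSettings:
    # Hypothetical variable names, not necessarily the project's.
    size = int(env.get("HDBSCAN_MIN_CLUSTER_SIZE", "5"))
    if size < 2:
        raise ValueError("HDBSCAN_MIN_CLUSTER_SIZE must be at least 2")
    samples_raw = env.get("HDBSCAN_MIN_SAMPLES", "")
    samples = int(samples_raw) if samples_raw else None
    return ClusterSettings(size, samples)

print(load_cluster_settings({"HDBSCAN_MIN_CLUSTER_SIZE": "10", "HDBSCAN_MIN_SAMPLES": "3"}))
```

Raising min_cluster_size makes the clustering more conservative about declaring a new person. Raising min_samples pushes more borderline faces into noise, which suits a review queue better than a wrong cluster does.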
Try it
Everything is open source (Apache-2.0):
- photo-organizer — start here for deployment
- face-pipeline — the FastAPI/InsightFace service
- face-pipeline-ui — the Laravel dashboard
If you're into self-hosting, computer vision, or just enjoy a good Docker Compose stack, I'd love feedback — issues, PRs, or just thoughts on the architecture are all welcome.